Friday 16 August 2019

Restoring the failed member in MySQL InnoDB cluster with MySQL Enterprise Backup

Using MySQL Enterprise Backup to restore and rebuild the failed member in secondary node

Testing Environment:

Linux OS: CentOS 7.4
MySQL Server Version: 8.0
MySQL Enterprise Backup 8.0
MySQL Enterprise Transparent Encryption Encryption 
MySQL InnoDB cluster 8.0

Host: primary01 as primary node
Host: second02 as secondary node

A. Pre-Task: 

1. Taking Full-backup with Compression on secondary using MySQL Enterprise Backup

mysqlbackup --defaults-file=my.cnf --backup_image=/backup/mysqld/my.mbi_`date +%d%m_%H%M` --backup-dir=/backup/mysqld/backup_`date +%d%m_%H%M` --user=mysbackup -p --host=second02 --port=3306 --encrypt-password= --compress --compress-level=5  backup-to-image &>> `date +%d%m_%H%M`_full_compress_tde_mysqld_backup.log

2. check backup log with "completed" and please ignore the warning since the backup was taken on secondary node.

ySQL Enterprise Backup  Ver 8.0.15-commercial for linux-glibc2.12 on x86_64 (MySQL Enterprise - Commercial)
Copyright (c) 2003, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

190816 16:56:26 MAIN    INFO: A thread created with Id '140676524181376'
190816 16:56:26 MAIN    INFO: Starting with following command line ...
 mysqlbackup
        --defaults-file=my.cnf
        --backup_image=/backup/mysqld/my.mbi_1608_1656
        --backup-dir=/backup/mysqld/backup_1608_1656
        --user=mysbackup -pxxxxxxxxxxxx --host=second02 --port=3306
        --encrypt-password=xxxxxxxxxx --compress --compress-level=5
        backup-to-image

190816 16:56:26 MAIN    INFO:
190816 16:56:26 MAIN    INFO: No SSL options specified.
190816 16:56:26 MAIN    INFO: MySQL server version is '8.0.15-commercial'
190816 16:56:26 MAIN    INFO: MySQL server compile os version is 'linux-glibc2.12'
190816 16:56:26 MAIN    INFO: SSL/TLS version used for connection is TLSv1.2
190816 16:56:26 MAIN WARNING: This backup operation cannot write to backup progress. The MySQL server is running with the --super-read-only option.
....
mysqlbackup completed OK! with 1 warnings

B. Task: Restore the backup on secondary node
1.Stop MySQL server
mysql -u mysql -p password  -hsecond02 -P3306 -e "shutdown"

2. remove all data files under mysql database data directory
rm -rf /data/mysqld
mkdir -p /data/mysqld

3. remove all log files and the keyring files 
rm -rf /data/mysql-keyring
mkdir -p /data/mysql-keyring
rm -rf /data/log
mkdir -p /data/log

4. create restore directory
mkdir -p /backup/mysqld/restore

5. restore backup 
mysqlbackup --defaults-file=/my.cnf --backup-image=/backup/mysqld/ --backup-dir=/backup/mysqld/restore --datadir=/data/mysqld --user=mysbackup -p password  --host=second02 --port=3306 --encrypt-password= encryption_password  --uncompress copy-back-and-apply-log &>> `date +%d%m_%H%M`_full_compress_tde_mysqld_restore.log

C. Post-Task: Re-join as a New Member in MySQL InnoDB cluster
1. Start restored MySQL Server
mysqld_safe --defaults-file=my.cnf &

2. Reset group replication member status
mysql -u mysql -p password -hsecond02 -P3306
mysql> RESET MASTER;
mysql> RESET SLAVE ALL;


*Remark: please follow the below steps if enabled auto recover using group replication's built-in mechanism with the variable "gtid_executed
//start

mysql> SET sql_log_bin=off;
mysql> source /data/mysqld/backup_gtid_executed.sql
mysql> SET sql_log_bin=on;

//end

3. Re-join the restored server in MySQL InnoDB cluster
mysqlsh --uri=mysql:password @primary01:3306

MySQL  primary01:3306 ssl  JS > var x = dba.getCluster();
MySQL  primary01:3306 ssl  JS > x.rescan();
Rescanning the cluster...

Result of the rescanning operation for the 'default' ReplicaSet:
{
    "name": "default", 
    "newTopologyMode": null, 
    "newlyDiscoveredInstances": [], 
    "unavailableInstances": [
        {
            "host": "second02:3306", 
            "label": "second02:3306", 
            "member_id": "4e1eff7d-bf22-11e9-93f4-080027ea2dcf"
        }
    ]
}

The instance 'second02:3306' is no longer part of the ReplicaSet.
The instance is either offline or left the HA group. You can try to add it to the cluster again with the cluster.rejoinInstance('second02:3306') command or you can remove it from the cluster configuration.
Would you like to remove it from the cluster metadata? [Y/n]: y 
Removing instance from the cluster metadata...
The instance 'second02:3306' was successfully removed from the cluster metadata.

MySQL  primary01:3306 ssl  JS > dba.checkInstanceConfiguration('group_replication_user:group_replication_password@second02:3306');
MySQL  primary01:3306 ssl  JS > dba.configureInstance('mysadm:mysadm43d@second02:3306', {clusterAdmin:'group_replication_user',clusterAdminPassword:'group_replication_password'});
dba.checkInstanceConfiguration('group_replication_user:group_replication_password@second02:3306');
MySQL  primary01:3306 ssl  JS > x.status();
MySQL  primary01:3306 ssl  JS >x.addInstance('group_replication_user:group_replication_password>@second02:3306');
MySQL  primary01:3306 ssl  JS > x.status();
{
    "clusterName": "CC_CLU01", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "primary01:3306", 
        "ssl": "REQUIRED", 
        "status": "OK_NO_TOLERANCE", 
        "statusText": "Cluster is NOT tolerant to any failures.", 
        "topology": {
            "primary01:3310": {
                "address": "primary01:3306", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }, 
            "second02:3310": {
                "address": "second02:3306", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "ONLINE"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "primary01:3306"
}


3. check member status after re-join the restored server in MySQL InnoDB cluster
mysql -u mysql -p password  -hsecond02 -P3306
mysql> SELECT member_host, member_port, member_state FROM performance_schema.replication_group_members;
+-------------+-------------+--------------+
| member_host | member_port | member_state |
+-------------+-------------+--------------+
| primary01   |        3306 | ONLINE       |
| second02    |        3306 | ONLINE       |
+-------------+-------------+--------------+
2 rows in set (0.00 sec)

mysql> 






Monday 18 March 2019

Resolve ORA-01034 and ORA-27101

Got error message "ORA-01034" & "ORA-27101" when connect the database with "@", this issue may caused by the listener service is not started well.
I tried to stop and start again the listener, the issue has been resolved.


1. Please check ORACLE_SID
2. Please check listener.ora configuration
3. Please try to restart the listener service 

Here is the example:
Got error message when connect to the database with  TNS name "@"

[oracle@primary01 ~]$ sqlplus 'sysbackup/abc123@testdgbdr as sysbackup'

SQL*Plus: Release 12.1.0.2.0 Production on Mon Mar 18 18:09:19 2019

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> show user
USER is "SYSBACKUP"
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
ERROR:
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux Error: 2: No such file or directory
Process ID: 0
Session ID: 0 Serial number: 0


SQL>exit

Try to startup the DB instance as sysdba
[oracle@primary01 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Mon Mar 18 18:10:24 2019

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area 2097152000 bytes
Fixed Size                  2999888 bytes
Variable Size            1207962032 bytes
Database Buffers          872415232 bytes
Redo Buffers               13774848 bytes
Database mounted.
Database opened.
SQL> show parameter local_listener

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
local_listener                       string      LISTENER_testdgbdr
SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

Check the listener service status 
[oracle@primary01 ~]$ lsnrctl status p01lsnr1521

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 18-MAR-2019 18:01:15

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=primary01)(PORT=1528)))
STATUS of the LISTENER
------------------------
Alias                     p01lsnr1521
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                19-JAN-2019 16:59:55
Uptime                    28 days 0 hr. 3 min. 20 sec
Trace Level               off
Security                  OFF
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/primary01/p01lsnr1521/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=primary01)(PORT=1528)))
Services Summary...
Service "testdgbdr" has 2 instance(s).
  Instance "testdgbdr", status UNKNOWN, has 1 handler(s) for this service...
  Instance "testdgbdr", status READY, has 1 handler(s) for this service...
Service "testdgbdrXDB" has 1 instance(s).
  Instance "testdgbdr", status READY, has 1 handler(s) for this service...

Try to stop the listener service 

[oracle@primary01 ~]$ lsnrctl stop p01lsnr1521

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 18-MAR-2019 18:17:49

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=primary01)(PORT=1521)))
The command completed successfully

Start the listener again.

[oracle@primary01 ~]$ lsnrctl start p01lsnr1521
LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 18-MAR-2019 18:01:15

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=primary01)(PORT=1528)))
STATUS of the LISTENER
------------------------
Alias                     p01lsnr1521
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                18-MAR-2019 18:17:55
Uptime                    28 days 0 hr. 3 min. 20 sec
Trace Level               off
Security                  OFF
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/primary01/p01lsnr1521/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=primary01)(PORT=1528)))
Services Summary...
Service "testdgbdr" has 2 instance(s).
  Instance "testdgbdr", status UNKNOWN, has 1 handler(s) for this service...
  Instance "testdgbdr", status READY, has 1 handler(s) for this service...
Service "testdgbdrXDB" has 1 instance(s).
  Instance "testdgbdr", status READY, has 1 handler(s) for this service...

Try to connect to the database with @, to stop and start the DB instance.
The issue has been resolved after restarted the listener, to stop the database instance without error ORA-01034 & ORA-27101

[oracle@primary01 ~]$ sqlplus 'sysbackup/abc123@testdgbdr as sysbackup'
The command completed successfully
SQL*Plus: Release 12.1.0.2.0 Production on Mon Mar 18 18:19:59 2019

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 2097152000 bytes
Fixed Size                  2999888 bytes
Variable Size            1207962032 bytes
Database Buffers          872415232 bytes
Redo Buffers               13774848 bytes
Database mounted.
Database opened.
SQL> exit