Yesterday, a MySQL 5.6 instance with GTID-mode replication enabled ran into a problem: no operation performed on the MASTER was applied on the SLAVE. The relay log of the SLAVE contained the records, but they could not be found in the BINLOG of the SLAVE. With GTID enabled, this is easy to check: parse the relay log and the BINLOG on the SLAVE into text files, search them for the UUID of the MASTER, and you can see whether the transactions replicated from the MASTER were applied on the SLAVE. During troubleshooting I suspected BINLOG-DO or IGNORE rules, REPLICATION-DO or IGNORE rules, and even a serious bug in GTID, but found no clue. SHOW SLAVE STATUS returned the following:
[Yejr@imysql.com]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
...
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: ...
Relay_Log_File: mysql-relay-bin.000003
Relay_Log_Pos: ...
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes    # both threads are working normally
Replicate_Do_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:    # no filter rules are set
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 2539    # binlog file and pos are both consistent with the MASTER node, that is, BINLOG reception and relay log apply are both normal
Relay_Log_Space: 2778
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
...
Seconds_Behind_Master: 0
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:    # server IDs whose BINLOG events are ignored
Master_Server_Id: 123315
Master_UUID: ...
Master_Info_File: /data/db11_3316/master.info
SQL_Delay: 0    # the replication delay policy
SQL_Remaining_Delay: ...
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: ...
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 35cc99c6-0297-11e4-9916-782bcb2c9453:1-451
Executed_Gtid_Set: 35cc99c6-0297-11e4-9916-782bcb2c9453:1-2455:792490-4517929
Auto_Position: 1
What stands out in the output above? Mainly these rows:
Retrieved_Gtid_Set: 35cc99c6-0297-11e4-9916-782bcb2c9453:1-451
Executed_Gtid_Set: 35cc99c6-0297-11e4-9916-782bcb2c9453:1-2455:792490-4517929
Auto_Position: 1
These rows make things clearer:
1. The range of GTIDs the SLAVE has retrieved from the MASTER is 1-451;
2. The range of GTIDs the SLAVE has executed is split into two sections: 1-2455 and 792490-4517929.
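The two sets can also be compared with MySQL's built-in GTID functions instead of by eye. The statements below are a minimal sketch, assuming MySQL 5.6 where GTID_SUBSET() and GTID_SUBTRACT() are available. The set literals are copied from the output above; the upper bound 4517929 in the second query is simply the highest executed number, used to ask what is missing below it.

```sql
-- Are all retrieved GTIDs already contained in the executed set?
-- A result of 1 means the SLAVE regards every retrieved transaction as already applied.
SELECT GTID_SUBSET(
  '35cc99c6-0297-11e4-9916-782bcb2c9453:1-451',
  '35cc99c6-0297-11e4-9916-782bcb2c9453:1-2455:792490-4517929'
) AS retrieved_is_executed;

-- Which GTIDs below the highest executed number are missing from the executed set?
SELECT GTID_SUBTRACT(
  '35cc99c6-0297-11e4-9916-782bcb2c9453:1-4517929',
  '35cc99c6-0297-11e4-9916-782bcb2c9453:1-2455:792490-4517929'
) AS missing_gtids;  -- the gap between the two sections, 2456-792489
```

If the first query returns 1, that would be consistent with the symptom described here: the relay log receives the events, but the SLAVE skips them because their GTIDs already count as executed.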
The executed range is supposed to be continuous, so how did this happen? Let's analyze why the GTIDs replicated from the MASTER to the SLAVE can end up with a breakpoint, that is, a gap.
Normally, after GTID is enabled in MySQL 5.6 and REPLICATION is deployed, you can set MASTER_AUTO_POSITION = 1 so that the SLAVE automatically selects an appropriate transaction point for REPLICATION based on GTID. DBAs then no longer need to worry about inconsistency between the master and the slave. With MASTER_AUTO_POSITION = 1, GTID gaps and breakpoints normally do not occur, unless you run into the following case:
1. Manually pause the SLAVE process;
2. Continue to write data on the MASTER;
3. Flush the logs on the MASTER;
4. Delete the old BINLOG files on the MASTER and keep only the latest one;
5. When the SLAVE is started again, it reports an error, as shown below:
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
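The steps above can be sketched as the following statements, intended only for a disposable test pair. The table test.t1 and the file name mysql-bin.000002 are hypothetical placeholders; use whatever SHOW BINARY LOGS reports on your MASTER.

```sql
-- On the SLAVE: pause replication.
STOP SLAVE;

-- On the MASTER: keep writing (test.t1 is a hypothetical table),
-- rotate to a new BINLOG, then purge everything before it.
INSERT INTO test.t1 (id) VALUES (1);
FLUSH LOGS;
SHOW BINARY LOGS;                           -- note the newest file name
PURGE BINARY LOGS TO 'mysql-bin.000002';    -- placeholder: the newest file

-- On the SLAVE: resume replication and inspect the I/O thread error.
START SLAVE;
SHOW SLAVE STATUS\G
```

With MASTER_AUTO_POSITION = 1, the SLAVE asks for GTIDs that now exist only in the purged files, which is what error 1236 reports.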
This problem can be resolved as follows:
1. Disable MASTER_AUTO_POSITION, that is, set MASTER_AUTO_POSITION = 0;
2. Manually specify the binlog file & POS.
After that, you cannot set MASTER_AUTO_POSITION = 1 again; otherwise, the same error is returned. A GTID gap can also arise in another way, for example:
1. Configure REPLICATION normally, but with MASTER_AUTO_POSITION = 0, that is, the traditional method of manually specifying the binlog file & POS;
2. Temporarily stop the SLAVE process during REPLICATION;
3. Manually modify the binlog file & POS so that they point to a new binlog file and POS;
4. Start the SLAVE, and the GTID breakpoint appears.
In master-slave high availability mode, the roles of the master and slave nodes may be switched and later switched back, and the breakpoint problem described above can occur then as well. We therefore recommend deploying high-availability switchover with dual-master nodes. This generally lets you switch back and forth freely without manually specifying new binlog file & POS information.
The last case is that RESET MASTER is executed on the MASTER. All BINLOG files and POS on the MASTER are then reset, so the information the SLAVE has already read is naturally inconsistent with them.
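A small sketch of what RESET MASTER does to the GTID state, to be run only on a test MASTER. It relies on the standard gtid_executed and gtid_purged system variables; the values in the comments are illustrative rather than taken from the incident above.

```sql
-- On a test MASTER only: RESET MASTER deletes all BINLOG files.
SELECT @@GLOBAL.gtid_executed, @@GLOBAL.gtid_purged;  -- for example '<uuid>:1-N' before the reset
RESET MASTER;
SELECT @@GLOBAL.gtid_executed, @@GLOBAL.gtid_purged;  -- both empty afterwards
SHOW MASTER STATUS;                                   -- back at the first BINLOG file
```

New transactions on the MASTER then start again from sequence number 1, while the SLAVE still remembers the old, higher numbers as executed.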
Now, let's talk about how to deal with the GTID breakpoint.
Method 1: manually modify binlog file & POS
1. Stop the SLAVE;
2. Manually change the binlog file & POS to point to the latest binlog file & POS generated on the MASTER, and set MASTER_AUTO_POSITION = 0;
3. Start the SLAVE.
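Method 1 might look like the following sketch. The file name and position in CHANGE MASTER TO are placeholders; replace them with the values that SHOW MASTER STATUS reports on your MASTER.

```sql
-- On the MASTER: read the current binlog file & POS.
SHOW MASTER STATUS;

-- On the SLAVE: switch to file/position-based replication.
STOP SLAVE;
CHANGE MASTER TO
  MASTER_AUTO_POSITION = 0,
  MASTER_LOG_FILE = 'mysql-bin.000001',   -- placeholder: File from SHOW MASTER STATUS
  MASTER_LOG_POS  = 2539;                 -- placeholder: Position from SHOW MASTER STATUS
START SLAVE;
SHOW SLAVE STATUS\G
```

Pointing the SLAVE at the latest position skips whatever the MASTER wrote before that point, so data consistency may need to be checked separately.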
Method 2: manually modify the GTID_PURGED value
1. Stop the SLAVE;
2. Execute RESET MASTER on the SLAVE to reset the binlog file & POS on the SLAVE;
3. Execute SET @@GLOBAL.GTID_PURGED = '35cc99c6-0297-11e4-9916-782bcb2c9453:1-2455'; on the SLAVE;
4. Start the SLAVE.
This may look confusing. It tells the SLAVE to discard the transactions in a given interval sent by the MASTER. In this example we discard the range 1-2455, so the SLAVE resumes applying the relay log from GTID 2456. Compare this with the information we started with:
Retrieved_Gtid_Set: 35cc99c6-0297-11e4-9916-782bcb2c9453:1-451
Executed_Gtid_Set: 35cc99c6-0297-11e4-9916-782bcb2c9453:1-2455:792490-4517929
We force the SLAVE to ignore the range 1-2455 and resume replication from 2456. This eliminates the stray range 792490-4517929 and ensures that all newly generated transactions are applied. For details on this practice, refer to the MySQL manual.
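Put together, Method 2 could be run on the SLAVE roughly as follows. This is a sketch assuming MySQL 5.6, where gtid_purged can only be set while gtid_executed is empty, which is why RESET MASTER comes first. Note that RESET MASTER also deletes the BINLOG files of the SLAVE itself.

```sql
-- All statements run on the SLAVE.
STOP SLAVE;
RESET MASTER;   -- clears the BINLOG files and GTID history of the SLAVE
SET @@GLOBAL.gtid_purged = '35cc99c6-0297-11e4-9916-782bcb2c9453:1-2455';
START SLAVE;

-- Verify: the executed set should now start as one continuous range.
SELECT @@GLOBAL.gtid_executed;
SHOW SLAVE STATUS\G
```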