Resolving MySQL master-slave replication errors when using GTID

Last Update:2017-01-13 Source: Internet

Author: User

Tags error handling mysql in reset


Anyone running MySQL master-slave replication will run into plenty of synchronization problems. Most are caused by a machine going down or restarting, or by a primary key conflict that stops the slave from working. This article collects such problems together with their solutions, for reference only.

1. If the network between master and slave is interrupted, or either the master or the slave is restarted, the slave automatically retries the connection to the master at the interval set in the configuration file (1 minute by default). Once the network and services are reachable again, replication resumes from where it left off, with no manual intervention.

2. When master and slave have fallen out of sync through human error, you can resynchronize with the following commands:

LOAD DATA FROM MASTER;
LOAD TABLE tblname FROM MASTER;

Note that these commands lock the master database. If the database is very large, it is better to do this during planned downtime, or to take a backup under a brief lock, record the output of SHOW MASTER STATUS, and then copy the database over. Also note that both statements were removed in MySQL 5.5, so they are only available on older servers.

3. When a SQL-level error in the binlog prevents master and slave from syncing, you can skip the failing statement and continue replication as follows:

STOP SLAVE;
SET GLOBAL sql_slave_skip_counter=1;
START SLAVE;

4. Running SET GLOBAL sql_slave_skip_counter=1 may itself fail with this error:
ERROR 1858 (HY000): sql_slave_skip_counter can not be set when the server is running with GTID_MODE = ON. Instead, for each transaction that you want to skip, generate an empty transaction with the same GTID as the transaction

The reason is clear: sql_slave_skip_counter is not supported on a server running with GTID_MODE = ON.
So what can be done?
The rest of this section describes how to skip a replication error in GTID mode.

Without further ado, here is the method. Execute the steps in order.
First determine the GTID position at which the replication error was recorded. Log in to MySQL, then run:

mysql> SHOW SLAVE STATUS\G

Look through the output and record this value:
Executed_Gtid_Set: 7f8d9eb8-a7fe-11e2-84fd-0015177c251e:1-260

Next, reset the master and slave state on the slave.
Note:
(This applies to a plain master-slave setup and is run on the slave. With master-master replication it is more cumbersome.)
(RESET MASTER deletes this slave's own binary logs, so any servers that replicate from this slave are affected. Its main purpose here is to empty gtid_executed. Simply running CHANGE MASTER TO cannot achieve this: it appears to succeed without error, but the slave is not actually updated, because the server takes its position from the Executed_Gtid_Set value shown in SHOW SLAVE STATUS\G.)

mysql> RESET MASTER;
Query OK, 0 rows affected (0.20 sec)
mysql> STOP SLAVE;
Query OK, 0 rows affected (0.05 sec)
mysql> RESET SLAVE;
Query OK, 0 rows affected (0.42 sec)

Now set the GTID position so that the failing transaction is skipped. Remember the Executed_Gtid_Set recorded in the first step? The transaction right after that set is the one that failed, so be conservative and skip only that one by adding 1 to the last ID.

mysql> SET GLOBAL gtid_purged='7f8d9eb8-a7fe-11e2-84fd-0015177c251e:1-261';
Query OK, 0 rows affected (0.18 sec)
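
The "add 1 to the last ID" step can also be scripted. The following is a minimal Python sketch, not part of the original procedure. The function name `skip_next_transaction` is hypothetical, and the sketch assumes the recorded value has the simple single-interval form `uuid:first-last` shown above. Sets with several intervals or several server UUIDs are rejected rather than guessed at.

```python
from typing import Tuple


def skip_next_transaction(executed_gtid_set: str) -> Tuple[str, str]:
    """Extend a simple GTID set by one transaction.

    Returns (new_set, skipped_gtid). Only the single-interval form
    'uuid:first-last' (or 'uuid:n') is handled; this is an assumption
    of the sketch, not a general GTID-set parser.
    """
    gtid_set = executed_gtid_set.strip()
    if "," in gtid_set or gtid_set.count(":") != 1:
        raise ValueError("only a single 'uuid:first-last' interval is supported")
    uuid, interval = gtid_set.split(":")
    first, _, last = interval.partition("-")
    last_id = int(last or first)  # 'uuid:5' means the single transaction 5
    next_id = last_id + 1
    return f"{uuid}:{first}-{next_id}", f"{uuid}:{next_id}"


if __name__ == "__main__":
    new_set, skipped = skip_next_transaction(
        "7f8d9eb8-a7fe-11e2-84fd-0015177c251e:1-260"
    )
    print("value for gtid_purged:", new_set)
    print("transaction being skipped:", skipped)
```

For the value recorded earlier, the first element of the result is the string to pass to gtid_purged and the second is the GTID of the transaction that will not be applied on the slave.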

Since we have just reset the master and slave state, we need to run CHANGE MASTER TO again:

CHANGE MASTER TO MASTER_HOST='192.168.1.136', MASTER_PORT=3306, MASTER_USER='dbadmin', MASTER_PASSWORD='123456', MASTER_AUTO_POSITION=1;

Then restart the slave:

START SLAVE;
SHOW SLAVE STATUS\G

Is the problem solved? Still getting an error? Then look closely: is it the same error as before? If it is a different one, the previous error has been skipped, and all you need to do is repeat the steps above until every failing transaction has been skipped. Don't find it a bother. After all, the data is very important.
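
The ERROR 1858 message quoted in step 4 names another way to skip a transaction: commit an empty transaction with the same GTID on the slave. The Python sketch below only builds the statement sequence commonly used for that. It is an illustration under assumptions, not the method followed in this article. The helper name `empty_transaction_statements` is hypothetical, and the GTID passed in must be the single failing transaction in `uuid:number` form.

```python
import re
from typing import List

# A single GTID: server UUID, a colon, and one transaction number.
GTID_RE = re.compile(
    r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}:[1-9][0-9]*$"
)


def empty_transaction_statements(gtid: str) -> List[str]:
    """Return the statements that commit an empty transaction under `gtid`."""
    if not GTID_RE.match(gtid):
        raise ValueError(f"not a single GTID (uuid:number): {gtid!r}")
    return [
        "STOP SLAVE;",
        f"SET GTID_NEXT = '{gtid}';",    # next transaction gets this GTID
        "BEGIN;",
        "COMMIT;",                       # empty transaction marks it as executed
        "SET GTID_NEXT = 'AUTOMATIC';",  # return to normal GTID assignment
        "START SLAVE;",
    ]


if __name__ == "__main__":
    # Example value only: the transaction after 1-260 on the same server UUID.
    for statement in empty_transaction_statements(
        "7f8d9eb8-a7fe-11e2-84fd-0015177c251e:261"
    ):
        print(statement)
```

The printed statements are meant to be run on the slave in the mysql client. Unlike the reset procedure, this approach leaves the slave's binary logs and gtid_executed history in place.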

Synchronous replication Error

This afternoon I set up MySQL replication with one master and three slaves. After all the servers were configured, the slaves reported the following error:

Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this does not always make sense; please check the manual before using it).

The message means that the slave's server_id is the same as the master's. On inspection I found that I had not commented out the server_id=1 line in the slave's /etc/my.cnf (I had set server_id again further down, in the replication section). I commented the line out and restarted MySQL, but the same error appeared.

Use the following command to view server_id:

mysql> SHOW VARIABLES LIKE 'server_id';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 1     |
+---------------+-------+
1 row in set (0.00 sec)

MySQL had not picked up the server_id from the my.cnf file, so the only option was to change it manually:

mysql> SET GLOBAL server_id=2;  # use the same value as in my.cnf
mysql> START SLAVE;

After this execution, slave returned to normal.

Later, however, I restarted the MySQL service with /etc/init.d/mysqld restart and checked the slave status again. The same error was back, and server_id had returned to 1.

I looked at /etc/my.cnf once more and confirmed that this file was not the problem, so I searched Google and learned that on startup MySQL looks for /etc/my.cnf, DATADIR/my.cnf, and USER_HOME/my.cnf.

So I ran:

find / -name "my.cnf"

Surprisingly, there was a my.cnf file in the /usr/local/mysql directory. I deleted that file and restarted the MySQL service, and everything went back to normal.
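
Once the candidate files are known, it helps to see which of them actually assign server_id. The Python sketch below is an illustration only. The function names are hypothetical, the two paths are simply the ones mentioned in this story, and the pattern assumes plain `server_id=N` or `server-id=N` lines (commented lines are ignored, include directives are not followed).

```python
import re
from typing import Iterable, List

# An active (uncommented) server_id / server-id assignment at line start.
SERVER_ID_RE = re.compile(r"^[ \t]*server[-_]id[ \t]*=[ \t]*(\d+)", re.MULTILINE)


def server_ids_in(option_file_text: str) -> List[int]:
    """Return every server_id value assigned in the text of an option file."""
    return [int(value) for value in SERVER_ID_RE.findall(option_file_text)]


def report(paths: Iterable[str]) -> None:
    """Print the server_id assignments found in each readable file."""
    for path in paths:
        try:
            with open(path, encoding="utf-8", errors="replace") as handle:
                ids = server_ids_in(handle.read())
        except OSError:
            continue  # file missing or unreadable
        print(path, ids)


if __name__ == "__main__":
    # Paths taken from the story above; adjust for your own installation.
    report(["/etc/my.cnf", "/usr/local/mysql/my.cnf"])
```

A file that reports a value different from the one you expect is a candidate for the kind of stray my.cnf described above.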

Some error handling and routine maintenance

To check replication, the SHOW SLAVE STATUS command is generally run on the slave:

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.0.100
Master_User: root
Master_Port: 3306
Connect_Retry: 3
Master_Log_File: mysql-bin.003
Read_Master_Log_Pos: 79
Relay_Log_File: mysql-relay-bin.003
Relay_Log_Pos: 548
Relay_Master_Log_File: mysql-bin.003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Last_Errno: 0
...

In the output above, the fields to watch are Slave_IO_Running and Slave_SQL_Running.
Slave_IO_Running: this thread on the slave reads the binlog from the master and writes it to the slave's relay log.
Slave_SQL_Running: this thread reads the relay log on the slave and executes its contents as SQL.
If either thread shows No, replication has stopped, and the cause can be seen in Last_Errno.
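
The check described above can be automated. The following Python sketch is a rough illustration: it parses the text printed by SHOW SLAVE STATUS\G and reports which thread is not running. The function names are hypothetical, and the sample text is a shortened made-up example that follows the field layout of the output above.

```python
from typing import Dict, Optional


def parse_vertical_output(text: str) -> Dict[str, str]:
    """Turn 'Name: value' lines of \\G-style output into a dictionary."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("*") or ":" not in line:
            continue  # skip blanks, the row separator, and '...' lines
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def replication_problem(fields: Dict[str, str]) -> Optional[str]:
    """Return None when both threads run, otherwise a short description."""
    stopped = [
        name
        for name in ("Slave_IO_Running", "Slave_SQL_Running")
        if fields.get(name) != "Yes"
    ]
    if not stopped:
        return None
    errno = fields.get("Last_Errno", "unknown")
    return f"{', '.join(stopped)} not running (Last_Errno={errno})"


if __name__ == "__main__":
    sample = """
    *************************** 1. row ***************************
    Slave_IO_Running: Yes
    Slave_SQL_Running: No
    Last_Errno: 1062
    """
    print(replication_problem(parse_vertical_output(sample)))
```

A script like this could be fed the output of the mysql client from a scheduled job, with any non-None result treated as an alert.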

Sometimes the master is updated so frequently that the slave falls behind. The causes vary, for example a poor network layout or weak hardware. The gap between master and slave keeps growing until some applications are affected. In that case the master and slave need to be synchronized periodically, using the following steps.
On the master:

mysql> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.03 sec)
mysql> SHOW MASTER STATUS\G
*************************** 1. row ***************************
File: mysql-bin.000004
Position: 102
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)

Record the log file name and offset. These are the coordinates the slave has to reach.

On the slave, use the MASTER_POS_WAIT() function to wait for those replication coordinates:

mysql> SELECT MASTER_POS_WAIT('mysql-bin.000004', '102');
+--------------------------------------------+
| MASTER_POS_WAIT('mysql-bin.000004', '102') |
+--------------------------------------------+
|                                          0 |
+--------------------------------------------+
1 row in set (0.00 sec)

This SELECT blocks until the slave has reached the specified log file and offset. It then returns the number of events it had to wait for, which is 0 when the slave was already at that position, as here. A return value of -1 means the timeout was exceeded. A result of 0 therefore shows that the slave is in sync with the master.
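
The return values described above can be handled in a script. The Python sketch below builds the MASTER_POS_WAIT() call with an explicit timeout and turns the result into a message. Both function names are hypothetical. The optional third argument is the timeout in seconds, and a NULL result (None here) is what MySQL returns when the slave SQL thread is not running or the arguments are invalid.

```python
import re
from typing import Optional

# Conservative check so the file name can be embedded in the statement.
_LOG_NAME_RE = re.compile(r"^[\w.-]+$")


def master_pos_wait_sql(log_file: str, log_pos: int, timeout_s: int = 0) -> str:
    """Build the SELECT; a timeout of 0 means wait without a time limit."""
    if not _LOG_NAME_RE.match(log_file):
        raise ValueError(f"unexpected binary log name: {log_file!r}")
    return f"SELECT MASTER_POS_WAIT('{log_file}', {int(log_pos)}, {int(timeout_s)});"


def interpret_master_pos_wait(result: Optional[int]) -> str:
    """Explain the value returned by MASTER_POS_WAIT()."""
    if result is None:
        return "no answer: SQL thread not running or arguments invalid"
    if result == -1:
        return "timed out before the slave reached the position"
    return f"in sync at that position after waiting for {result} event(s)"


if __name__ == "__main__":
    print(master_pos_wait_sql("mysql-bin.000004", 102, 60))
    for value in (0, -1, None):
        print(value, "->", interpret_master_pos_wait(value))
```

Passing a timeout keeps a monitoring script from blocking forever while the master is still holding the read lock.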

In some cases an update fails on the slave. First determine whether the table on the slave differs from the one on the master. If the failure is caused by a difference in table structure, alter the table on the slave to match the master and then run START SLAVE again.
If the failure is not caused by a difference in table structure, confirm that a manual update is safe, then skip the failing statement from the master with SET GLOBAL sql_slave_skip_counter=n. Use n=1 when the statement from the master does not use AUTO_INCREMENT or LAST_INSERT_ID(), and n=2 otherwise, because a statement that uses AUTO_INCREMENT or LAST_INSERT_ID() takes two events in the binary log.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.