MySQL Master-Slave Synchronization (4): Replication Delay Status Considerations (Seconds_Behind_Master and pt-heartbeat)

Source: Internet
Author: User

In general, we measure the delay of MySQL master-slave synchronization with the Seconds_Behind_Master value reported by "SHOW SLAVE STATUS\G" (see: MySQL Master-Slave Synchronization (4) - slave latency status monitoring). This works in most cases. But experience has taught me that relying solely on Seconds_Behind_Master to detect replication delay is absolutely unreliable!

I once ran into this pitfall:
After our MySQL master-slave environment was deployed, replication worked fine at first, and we watched Seconds_Behind_Master to detect any delay. After it had been running for a while, we discovered one day that new data written on the master was not reaching the slave in time. We immediately ran "SHOW SLAVE STATUS\G" on the slave. Seconds_Behind_Master was 0, and both Slave_IO_Running and Slave_SQL_Running were Yes, meaning the slave's connection to the master was still up and had not been dropped. Yet changes on the master were not replicated to the slave for a long time. Without manual intervention, the slave reconnected to the master on its own only one hour later, and then resumed replicating the master's changes.
In this situation, normal monitoring shows no delay on the slave. Clearly, Seconds_Behind_Master = 0 alone is not enough to tell whether replication is delayed.
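The Python sketch below shows the kind of check that relies only on SHOW SLAVE STATUS, and why it misses this failure. It assumes the status row has already been fetched into a dict keyed by column name. The helper name naive_replica_ok and its max_lag threshold are hypothetical. Given the values observed during the incident, it reports the replica as healthy.

```python
from typing import Dict, Optional


def naive_replica_ok(status: Dict[str, Optional[str]], max_lag: int = 0) -> bool:
    """Naive check that trusts SHOW SLAVE STATUS alone (hypothetical helper).

    `status` is assumed to be one row of SHOW SLAVE STATUS, already fetched
    into a dict of column name -> value string (None for SQL NULL).
    """
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"
    lag = status.get("Seconds_Behind_Master")
    # Seconds_Behind_Master is NULL when the lag cannot be computed.
    return io_ok and sql_ok and lag is not None and int(lag) <= max_lag


# Values observed during the incident: both threads "Yes", lag 0,
# even though the master's changes were not arriving at all.
stalled = {
    "Slave_IO_Running": "Yes",
    "Slave_SQL_Running": "Yes",
    "Seconds_Behind_Master": "0",
}

print(naive_replica_ok(stalled))  # True: the stall is invisible to this check
```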

Once we found the problem, manual intervention was simple: restart replication on the slave with these two commands:
mysql> STOP SLAVE;
mysql> START SLAVE;

After restarting replication, change the slave_net_timeout parameter as soon as possible.

Replication resumed only after one hour because slave_net_timeout defaults to 3600 seconds. This was the default in MySQL versions before 5.7.7; from 5.7.7 on, the default is 60 seconds. The parameter sets how many seconds the slave waits without receiving binary log events from the master before it treats the network connection as timed out. At that point the slave I/O thread reconnects to the master.

mysql> SHOW VARIABLES LIKE 'slave_net_timeout';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| slave_net_timeout | 3600  |
+-------------------+-------+
1 row in set (0.00 sec)

Suppose you deploy MySQL master-slave replication without setting slave_net_timeout on the slave. If you then hit the situation above, the slave reconnects to the master only after the default 3600 seconds (one hour) and then continues replicating the master's changes. The parameter should not be too large, or replication delays and a broken link to the master will not be detected in time. It should not be too small either, or the slave will reconnect frequently whenever the master has no data updates.
The right value for slave_net_timeout depends on how often data changes on your master. If the master is updated frequently, use a smaller value; if updates are infrequent, use a larger one.
Typical values are 5, 10, 15, 20, or 30 seconds.

How to set it:
Log in to MySQL on the slave and change it online:

mysql> SET GLOBAL slave_net_timeout = 5;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> SHOW VARIABLES LIKE 'slave_net_timeout';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| slave_net_timeout | 5     |
+-------------------+-------+
1 row in set (0.01 sec)

Or add it to my.cnf on the slave and restart MySQL:
cat /usr/local/mysql/my.cnf
...
[mysqld]
...
slave_net_timeout = 5

/etc/init.d/mysql restart

With this parameter set appropriately, the slave recovers from the problem above on its own. It reconnects to the master after the configured time and resumes replication, with no manual intervention needed.

Of course, this scenario is unusual and fairly unlikely, but as operations staff we still need to know how to handle it. That requires a thorough understanding of MySQL's replication retry mechanism.

Next, let's analyze this phenomenon using the principles of MySQL master-slave replication.
Replication is one of MySQL's key differentiators from other databases, and it is the foundation of its scalability and high availability. It is already quite smart. We only need to run CHANGE MASTER TO with the binlog file name and offset to set up replication from the master to the slave.
The MySQL replication threads record the current replication position automatically. When replication between master and slave is interrupted, they reconnect to the master and resume from where they left off. All of this is fully automatic and needs no human intervention. That is very convenient for operations staff, but it also hides many details. To truly understand the root cause of this problem and how to solve it, we need to understand how MySQL replication actually works.

1) Is MySQL master-slave replication "push" or "pull"?
MySQL replication is "push", not "pull".
"Pull" would mean the slave constantly polls the master to ask whether there are new data updates. That approach is resource-intensive and inefficient.
"Push" means the master pushes each data change to the slave when it happens, so the two interact only when data changes and use fewer resources.
Clearly, the "push" approach makes more economical use of resources.

So how exactly does MySQL "push"?
When the slave requests change records from the master, it must specify a binlog file (Master_Log_File) on the master and the exact byte offset within that file (Master_Log_Pos). In response, the master starts a Binlog Dump thread, which sends the change records from that position onward to the slave. The slave keeps listening for changes from the master and applies each change locally as it arrives.
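The request/stream flow above can be modelled in a few lines of Python. This is a toy model, not MySQL's wire protocol. The binlog is an in-memory list, the positions are illustrative numbers rather than real byte offsets, and binlog_dump and replicate are hypothetical names. The model shows that the replica chooses the starting (Master_Log_File, Master_Log_Pos) and the master then pushes every change from that point on.

```python
from typing import Iterator, List, Tuple

# Toy binlog on the master: (file, position, event) entries.
# Positions are illustrative; real ones are byte offsets within the file.
BINLOG: List[Tuple[str, int, str]] = [
    ("mysql-bin.000001", 120, "INSERT t1"),
    ("mysql-bin.000001", 350, "UPDATE t1"),
    ("mysql-bin.000002", 120, "DELETE t2"),
]


def binlog_dump(start_file: str, start_pos: int) -> Iterator[Tuple[str, int, str]]:
    """Toy Binlog Dump thread: push every event at or after (file, pos).

    Tuple comparison works here because binlog file names are zero-padded.
    """
    for file, pos, event in BINLOG:
        if (file, pos) >= (start_file, start_pos):
            yield file, pos, event


def replicate(master_log_file: str, master_log_pos: int) -> List[str]:
    """Toy replica: request a starting point, then apply each pushed event."""
    applied = []
    for _file, _pos, event in binlog_dump(master_log_file, master_log_pos):
        applied.append(event)  # stands in for applying the change locally
    return applied


print(replicate("mysql-bin.000001", 350))  # ['UPDATE t1', 'DELETE t2']
```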

2) Cause analysis
From the analysis above, we can roughly infer why SHOW SLAVE STATUS reports that everything is fine while changes on the master are not actually reaching the slave:
When the problem occurred, the master's Binlog Dump thread had been killed. The slave, as the listener, received no changes. It therefore assumed the master simply had no changes for a long time and so had nothing to push.
The slave cannot tell whether the Binlog Dump thread on the master terminated unexpectedly or the master just had no data changes for a long time. In both cases, it reports its status as normal.
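The ambiguity can be made concrete with a deliberately simplified Python model. Here the replica's only input is the list of events the dump thread actually pushed. The function names and the timeline are hypothetical. An idle master and a master whose dump thread died produce exactly the same observation, so only silence longer than slave_net_timeout makes the I/O thread act.

```python
from typing import List


def events_received(master_writes: List[float], dump_thread_alive: bool) -> List[float]:
    """The replica only sees what the master's Binlog Dump thread actually pushes."""
    return list(master_writes) if dump_thread_alive else []


def io_thread_action(received: List[float], last_connect: float, now: float,
                     slave_net_timeout: float) -> str:
    """Decide what the I/O thread does, based solely on what it has received."""
    last_heard = max(received, default=last_connect)
    if now - last_heard >= slave_net_timeout:
        return "reconnect"
    return "keep waiting"


# Scenario A: the master really had no writes.
idle_master = events_received(master_writes=[], dump_thread_alive=True)
# Scenario B: the master had writes at t=100s and t=200s, but its dump thread died.
dead_dump = events_received(master_writes=[100.0, 200.0], dump_thread_alive=False)

print(idle_master == dead_dump)                        # True: indistinguishable
print(io_thread_action(dead_dump, 0.0, 1800.0, 3600))  # keep waiting
print(io_thread_action(dead_dump, 0.0, 3600.0, 3600))  # reconnect
```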

So the key to this problem is:
The notice that the master's Binlog Dump thread was killed could not reach the slave because of network congestion or some other reason. The slave therefore believed the master's data had not changed, while in fact the data on the two sides had diverged.
The slave reconnects to the master only after the default 3600 seconds. Only then does it discover the master's changes and replicate them automatically, which is a long wait.

3) Avoiding the problem
From the analysis above, MySQL itself cannot completely prevent this situation, but there are two ways to deal with it:
1--Passive handling: change how delay is monitored, so that problems are found and handled in time.
2--Proactive prevention: set the replication retry parameters --master-retry-count, --master-connect-retry, and --slave-net-timeout correctly.

   1--Passive handling
   The most direct way to monitor MySQL delay is to read Seconds_Behind_Master from SHOW SLAVE STATUS.
   However, as described above, Seconds_Behind_Master does not truly measure the replication delay between master and slave.
   The recommended monitoring approach is Percona's pt-heartbeat.
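The idea behind pt-heartbeat can be sketched as follows. In pt-heartbeat, this corresponds to running it with --update on the master and with --monitor or --check on the replica. A process on the master writes the current time into a heartbeat table at a fixed interval. That row reaches the replica through ordinary replication. The replica's lag is therefore "now minus the newest heartbeat it has applied", and it grows whenever replication stalls, whatever the cause. The sketch assumes synchronized clocks; heartbeat_lag, is_delayed, the 30-second threshold, and the timeline are hypothetical and are not pt-heartbeat's code.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_lag(latest_replicated: datetime, now: datetime) -> float:
    """Seconds elapsed since the newest heartbeat timestamp the replica has applied.

    Assumes the master's and replica's clocks are synchronized.
    """
    return (now - latest_replicated).total_seconds()


def is_delayed(lag_seconds: float, threshold: float = 30.0) -> bool:
    """Hypothetical alert rule; choose the threshold to suit your workload."""
    return lag_seconds > threshold


# Hypothetical timeline: the master's Binlog Dump thread dies at t0, so no
# heartbeat row newer than t0 reaches the replica.
t0 = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
now = t0 + timedelta(minutes=10)
lag = heartbeat_lag(t0, now)
print(lag, is_delayed(lag))  # 600.0 True, while Seconds_Behind_Master may still read 0
```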

   2--Proactive prevention
   Besides manually restarting replication on the slave with STOP SLAVE and START SLAVE, you also need to set three parameters that control how the replication thread reconnects to the master:
   master-retry-count: the number of reconnection attempts.
   master-connect-retry: the number of seconds to wait between reconnection attempts after a connection failure.
   slave-net-timeout: described above.
   master-connect-retry is specified with CHANGE MASTER TO (MASTER_CONNECT_RETRY) when replication is set up. master-retry-count was a server startup option (from MySQL 5.6 it can also be set with MASTER_RETRY_COUNT in CHANGE MASTER TO). slave-net-timeout is a global variable that can be changed online while MySQL is running.
   Note that the --master-connect-retry startup option was removed in MySQL 5.5 and --master-retry-count was deprecated in MySQL 5.6. In MySQL 5.6 and later, the retry interval and count are therefore set through CHANGE MASTER TO, and slave_net_timeout is the only one of the three set as a server variable.

The specific retry policy is:
If the slave has received no data from the master for slave-net-timeout seconds, it makes its first reconnection attempt. After that, it tries to reconnect every master-connect-retry seconds, and it gives up only after master-retry-count attempts. If a reconnection succeeds, the slave considers the master healthy and again waits up to slave-net-timeout seconds for data.

The default value of slave-net-timeout is 3600 seconds, master-connect-retry defaults to 60 seconds, and master-retry-count defaults to 86400 attempts.
In other words, if the master sends no data changes for one hour, the slave attempts to reconnect to the master.
This explains why, in my scenario, the slave reconnected to the master and replicated the data changes only one hour later.
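The retry policy reduces to simple arithmetic. The Python sketch below lists when the replica would try to reconnect. The helper names are hypothetical, and it assumes every attempt fails and that the first attempt counts toward master-retry-count. With the defaults, the first attempt comes one hour after the last data received, and the replica keeps retrying for about two months before giving up.

```python
from typing import List


def reconnect_schedule(slave_net_timeout: int, master_connect_retry: int,
                       master_retry_count: int, limit: int = 5) -> List[int]:
    """Seconds (after the last data received) of the first reconnect attempts.

    Assumes every attempt fails and that the first attempt counts toward
    master_retry_count, following the policy described above.
    """
    attempts = min(master_retry_count, limit)
    return [slave_net_timeout + i * master_connect_retry for i in range(attempts)]


def give_up_after(slave_net_timeout: int, master_connect_retry: int,
                  master_retry_count: int) -> int:
    """Seconds until the last attempt, after which the replica stops retrying."""
    return slave_net_timeout + (master_retry_count - 1) * master_connect_retry


# Defaults quoted above: 3600 s, 60 s, 86400 attempts.
print(reconnect_schedule(3600, 60, 86400))                # [3600, 3660, 3720, 3780, 3840]
print(round(give_up_after(3600, 60, 86400) / 86400, 1))   # 60.0 (days)
# With slave_net_timeout lowered to 5 s, the first attempt comes after 5 s.
print(reconnect_schedule(5, 60, 86400, limit=3))          # [5, 65, 125]
```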
So if your master changes frequently, consider a smaller slave-net-timeout. Then a terminated Binlog Dump thread on the master cannot hold back the latest updates for long.
Of course, setting slave-net-timeout too small also has a drawback. If the master genuinely changes rarely, the slave will reconnect to it frequently, wasting resources.
