Slave latency monitoring in MySQL replication

Last Update:2014-08-30 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Http://imysql.com/2014/08/30/mysql-faq-howto-monitor-slave-lag.shtml

In a MySQL replication environment, we usually only judge the latency of the slave based on the value of the seconds_behind_master . This is acceptable in most cases, but is not accurate enough and should be considered for more factors.

First, let's look at the state of slave:

[Email protected] [(none)]> show slave status\g*************************** 1. Row ***************************slave_io_state:waiting for Master to send event***master_log_file: Mysql-bin.000327read_master_log_pos:668711237relay_log_file:mysql-relay-bin.002999relay_log_pos:214736858relay _master_log_file:mysql-bin.000327slave_io_running:yesslave_sql_running:yes***skip_counter:0exec_master_log_pos : 654409041relay_log_space:229039311***seconds_behind_master:3296* * *

You can see that the value of seconds_behind_master is 3296, which means that slave has delayed at least 3296 seconds.

Let's take a look at the status of the 2 replication processes on slave:

[Email protected] [(None)]> Show full processlist\g*************************** 1. Row ***************************id:6user:system userHost:db:NULLCommand:Connecttime:22005006State: Waiting for Master to send eventinfo:null*************************** 2. Row ***************************id:7user:system userHost:db:NULLCommand:Connecttime:3293state:updating info:update * * SET * * WHERE * *

You can see that the SQL thread has been performing the update operation, noting that the value of time is 3293, and that it looks like the update operation was executed for 3,293 seconds, a normal SQL, and certainly not take so long. In fact, in the replication process, the value of the time column may be in several cases: 1, the current binlog of the SQL thread (actually relay log) and the current timestamp of the IO thread, which is usually considered The Seconds_behind_master value is not the actual execution time of some SQL, 2, the SQL thread is currently not active SQL in the execution of the word value is the SQL thread is idle;

The time value of the IO thread, however, is the total length of time (in seconds) since the thread was started, which can cause the timing value to be abnormal, such as a negative number, or very large if the system is modified after the IO thread has started.

Look at the following states:

#设置pager, view only a few status values of interest [email protected] [(none)]> Pager Cat | Egrep-i ' System user| exec_master_log_pos| seconds_behind_master| Read_master_log_pos ' #这是没有活跃SQL的情况, time value is idle time, and Seconds_behind_master is 0[email protected] [(none)]> Show Processlist; Show Slave status\g| 6 | System user | | NULL | Connect | 22004245 | Waiting for Master to send event | NULL | | 7 | System user | | NULL | Connect | -| Have read all relay log;**read_master_log_pos:445167889exec_master_log_pos:445167889seconds_behind_master:0#和上面一样 [email protected] [(none)]> show processlist; Show Slave status\g| 6 | System user | | NULL | Connect | 22004248 | Waiting for Master to send event | NULL | | 7 | System user | | NULL | Connect | -| Have read all relay log;**read_master_log_pos:445167889exec_master_log_pos:445167889seconds_behind_master:0#这时有活跃SQL了, the time value is the same as Seconds_behind_master, that is, the SQL thread is "slower" than the IO thread for 1 seconds [email protected] [(none)]> show processlist; Show Slave status\g| 6 | System user | | NULL | Connect | 22004252 | Waiting for Master to send event | NULL | | 7 | System user | | Floweradmin | Connect |  1  | Updating | Update **read_master_log_pos:445182239exec_master_log_pos:445175263  Seconds_behind_master:1  #和上面一样 [email protected] [(none)]> show processlist; Show Slave status\g| 6 | System user | | NULL | Connect | 22004254 | Waiting for Master to send event | NULL | | 7 | System user | | Floweradmin | Connect |   1  | Updating | Update **read_master_log_pos:445207174exec_master_log_pos:445196837  Seconds_behind_master:1

OK, finally we say how to correctly judge the delay of slave: 1, first look at Relay_master_log_file and Master_log_file is different; 2, if Relay_master_log_file and Master_log_file that's the same thing, look again. Exec_master_log_pos and Read_ Master_log_pos , compare how many Binlog events the SQL thread is slower than the IO thread, 3, if relay_master_ Log_file and Master_log_file not the same, that means the delay may be large, Need to get Binlog status from master to judge the gap between the current Binlog and master;

Therefore, it is relatively more rigorous that, on the third-party monitoring node, both master and SLAVE are simultaneously initiating a request for show BINARY LOGS and show SLAVE status\g . Finally, the differences between the two binlog are judged, as well as The differences between Exec_master_log_pos and Read_master_log_pos .

For example: The result of executing SHOW BINARY LOGS on Master is:

+------------------+--------------+| Log_name | File_size |+------------------+--------------+| mysql-bin.000009 | 1073742063 | | mysql-bin.000010 | 107374193 |+------------------+--------------+

The result of performing SHOW SLAVE status\g on SLAVE is:

master_log_file:mysql-bin.000009 read_master_log_pos:668711237 relay_master_log_file:mysql-bin.000009slave_io_running:yesslave_sql_running:yes***Exec_Master_Log_Pos: 654409041* * *seconds_behind_master:3296* * *

At this time, slave actual delay should be: mysql-bin.000009 This binlog in the Binlog position 1073742063 and The differential delay between binlog position read on slave, i.e.:

1073742063-668711237 = 405030826 a binlog event

And also add mysql-bin.000010 This binlog has produced 107374193 binlog event, a total of

107374193 + 405030826 = 512405019 Binlog Event

PostScript Update:

A monitoring table can be maintained on master, which has only one field, which stores the latest timestamp (the higher version can be updated with Event_scheduler, the lower version can be updated with Cron in conjunction with the automatic looping script), and the time to read the field on slave. As long as the system time of master and slave is consistent, you can quickly know how much slave and master latencies are poor. However, in a highly concurrent system, this timestamp can be refined to milliseconds, otherwise even if the time is consistent, it is possible to delay the thousands of Binlog event.

Slave latency monitoring in MySQL replication

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Slave latency monitoring in MySQL replication

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Slave latency monitoring in MySQL replication

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support