Slave latency monitoring in MySQL replication

Source: Internet
Author: User

Slave latency monitoring in MySQL replication

Http://imysql.com/2014/08/30/mysql-faq-howto-monitor-slave-lag.shtml

In a MySQL replication environment, we usually only judge the latency of the slave based on the value of the seconds_behind_master . This is acceptable in most cases, but is not accurate enough and should be considered for more factors.

First, let's look at the state of slave:

[Email protected] [(none)]> show slave status\g*************************** 1. Row ***************************slave_io_state:waiting for Master to send event***master_log_file: Mysql-bin.000327read_master_log_pos:668711237relay_log_file:mysql-relay-bin.002999relay_log_pos:214736858relay _master_log_file:mysql-bin.000327slave_io_running:yesslave_sql_running:yes***skip_counter:0exec_master_log_pos : 654409041relay_log_space:229039311***seconds_behind_master:3296* * *

You can see that the value of seconds_behind_master is 3296, which means that slave has delayed at least 3296 seconds.

Let's take a look at the status of the 2 replication processes on slave:

[Email protected] [(None)]> Show full processlist\g*************************** 1. Row ***************************id:6user:system userHost:db:NULLCommand:Connecttime:22005006State: Waiting for Master to send eventinfo:null*************************** 2. Row ***************************id:7user:system userHost:db:NULLCommand:Connecttime:3293state:updating info:update * * SET * * WHERE * *

You can see that the SQL thread has been performing the update operation, noting that the value of time is 3293, and that it looks like the update operation was executed for 3,293 seconds, a normal SQL, and certainly not take so long. In fact, in the replication process, the value of the time column may be in several cases: 1, the current binlog of the SQL thread (actually relay log) and the current timestamp of the IO thread, which is usually considered The Seconds_behind_master value is not the actual execution time of some SQL, 2, the SQL thread is currently not active SQL in the execution of the word value is the SQL thread is idle;

The time value of the IO thread, however, is the total length of time (in seconds) since the thread was started, which can cause the timing value to be abnormal, such as a negative number, or very large if the system is modified after the IO thread has started.

Look at the following states:

#设置pager, view only a few status values of interest [email protected] [(none)]> Pager Cat | Egrep-i ' System user| exec_master_log_pos| seconds_behind_master| Read_master_log_pos ' #这是没有活跃SQL的情况, time value is idle time, and Seconds_behind_master is 0[email protected] [(none)]> Show Processlist; Show Slave status\g| 6 | System user | | NULL | Connect | 22004245 | Waiting for Master to send event | NULL | | 7 | System user | | NULL | Connect | -| Have read all relay log;**read_master_log_pos:445167889exec_master_log_pos:445167889seconds_behind_master:0#和上面一样 [email protected] [(none)]> show processlist; Show Slave status\g| 6 | System user | | NULL | Connect | 22004248 | Waiting for Master to send event | NULL | | 7 | System user | | NULL | Connect | -| Have read all relay log;**read_master_log_pos:445167889exec_master_log_pos:445167889seconds_behind_master:0#这时有活跃SQL了, the time value is the same as Seconds_behind_master, that is, the SQL thread is "slower" than the IO thread for 1 seconds [email protected] [(none)]> show processlist; Show Slave status\g| 6 | System user | | NULL | Connect | 22004252 | Waiting for Master to send event | NULL | | 7 | System user | | Floweradmin | Connect |  1  | Updating | Update **read_master_log_pos:445182239exec_master_log_pos:445175263  Seconds_behind_master:1  #和上面一样 [email protected] [(none)]> show processlist; Show Slave status\g| 6 | System user | | NULL | Connect | 22004254 | Waiting for Master to send event | NULL | | 7 | System user | | Floweradmin | Connect |   1  | Updating | Update **read_master_log_pos:445207174exec_master_log_pos:445196837  Seconds_behind_master:1     

OK, finally we say how to correctly judge the delay of slave: 1, first look at Relay_master_log_file and Master_log_file is different; 2, if Relay_master_log_file and Master_log_file that's the same thing, look again. Exec_master_log_pos and Read_ Master_log_pos , compare how many Binlog events the SQL thread is slower than the IO thread, 3, if relay_master_ Log_file and Master_log_file not the same, that means the delay may be large, Need to get Binlog status from master to judge the gap between the current Binlog and master;

Therefore, it is relatively more rigorous that, on the third-party monitoring node, both master and SLAVE are simultaneously initiating a request for show BINARY LOGS and show SLAVE status\g . Finally, the differences between the two binlog are judged, as well as The differences between Exec_master_log_pos and Read_master_log_pos .

For example: The result of executing SHOW BINARY LOGS on Master is:

+------------------+--------------+| Log_name | File_size |+------------------+--------------+| mysql-bin.000009 | 1073742063 | | mysql-bin.000010 | 107374193 |+------------------+--------------+

The result of performing SHOW SLAVE status\g on SLAVE is:

master_log_file:mysql-bin.000009 read_master_log_pos:668711237 relay_master_log_file:mysql-bin.000009slave_io_running:yesslave_sql_running:yes***Exec_Master_Log_Pos: 654409041* * *seconds_behind_master:3296* * * 

At this time, slave actual delay should be: mysql-bin.000009 This binlog in the Binlog position 1073742063 and The differential delay between binlog position read on slave, i.e.:

1073742063-668711237 = 405030826 a binlog event

And also add mysql-bin.000010 This binlog has produced 107374193 binlog event, a total of

107374193 + 405030826 = 512405019 Binlog Event

PostScript Update:

A monitoring table can be maintained on master, which has only one field, which stores the latest timestamp (the higher version can be updated with Event_scheduler, the lower version can be updated with Cron in conjunction with the automatic looping script), and the time to read the field on slave. As long as the system time of master and slave is consistent, you can quickly know how much slave and master latencies are poor. However, in a highly concurrent system, this timestamp can be refined to milliseconds, otherwise even if the time is consistent, it is possible to delay the thousands of Binlog event.

Slave latency monitoring in MySQL replication

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.