Slave latency monitoring in MySQL replication

Source: Internet
Author: User

From Imysql Chinese network http://imysql.com/2014/08/30/mysql-faq-howto-monitor-slave-lag.shtml

In a MySQL replication environment, we usually only judge the latency of the slave based on the value of the Seconds_behind_master. This is acceptable in most cases, but is not accurate enough and should be considered for more factors.

First, let's look at the state of slave:

[Email protected] [(None)]>Show Slave Status\g***************************1. Row ***************************slave_io_state:waiting forMaster to send event***Master_log_file:mysql-bin.000327Read_master_log_pos:668711237Relay_log_file:mysql-relay-bin.002999Relay_log_pos:214736858Relay_master_log_file:mysql-bin.000327Slave_IO_Running:YesSlave_SQL_Running:Yes***Skip_counter:0Exec_master_log_pos:654409041Relay_log_space:229039311***seconds_behind_master:3296***

You can see that The value of Seconds_behind_master is 3296, which means that slave has delayed at least 3296 seconds.

Let's take a look at the status of the 2 replication processes on slave:

[Email protected] [(None)]>Show full Processlist\g***************************1. Row ***************************Id:6User:system userHost:db:NULLCommand:Connect time: 22005006state:waiting forMaster to send Eventinfo:null***************************2. Row ***************************Id:7User:system userHost:db:NULLCommand:Connect time: 3293 state:updatinginfo:update* * SET * * WHERE * *

You can see that the SQL thread has been performing the update operation, noting that the value of time is 3293, and that it looks like the update operation was executed for 3,293 seconds, a normal SQL, and certainly not take so long. In fact, in the replication process, the value of the time column may have several things:

1, SQL thread currently executing binlog (actually relay log) in the timestamp and IO thread the latest timestamp difference, this is usually considered seconds_behind_master value, Not the actual execution time of some SQL;

2, SQL thread currently if there is no active SQL execution, the time value is the SQL thread idle time;

The time value of the IO thread, however, is the total length of time (in seconds) since the thread was started, which can cause the timing value to be abnormal, such as a negative number, or very large if the system is modified after the IO thread has started.

Look at the following states:

#设置pager, see only a few status values of interest
Cat Egrep ' system user| exec_master_log_pos| seconds_behind_master| Read_master_log_pos'

#这是没有活跃SQL的情况, the time value is idle time, and Seconds_behind_master is 0

[Email protected] [(None)]> show processlist; show slave status\g622004245for713  | Have read all relay log;**445167889445167889seconds_behind_master:0

#和上面一样

[Email protected] [(None)]> show processlist; show slave status\g622004248for716  | Have read all relay log;**445167889445167889seconds_behind_master:0

#这时有活跃SQL了, the time value is the same as Seconds_behind_master, where the SQL thread is "slower" than the IO thread for 1 seconds

[Email protected] [(None)]> show processlist; show slave status\g622004252for71  | Updating | UPDATE **4451822394451752631

#和上面一样

[Email protected] [(None)]> show processlist; show slave status\g622004254for71  | Updating | UPDATE * *445207174445196837seconds_behind_master:1


Well, finally we say how to correctly judge the delay of slave:

1, first see relay_master_log_file and master_log_file whether there are differences;

2. If relay_master_log_file and master_log_file are the same words, look again The difference between Exec_master_log_pos and read_master_log_pos, comparing how many binlog events the SQL thread slows than the IO thread;

3, if relay_master_log_file Master_log_file

Therefore, it is relatively more rigorous that, on the third-party monitoring node, both master and SLAVE are simultaneously initiating a request for show BINARY LOGS and show SLAVE status\g . Finally, the differences between the two binlog are judged, as well as the differences between Exec_master_log_pos and read_master_log_pos .

For example: The result of executing SHOW BINARY LOGS on Master is:

+------------------+--------------+| Log_name | File_size |+------------------+--------------+| mysql-bin.000009 | 1073742063 | | mysql-bin.000010 | 107374193 |+------------------+--------------+

The result of performing SHOW SLAVE status\g on SLAVE is:

master_log_file:mysql-bin.000009 read_master_log_pos:668711237 relay_master_log_file:mysql-bin.000009slave_io_running:yesslave_sql_running:yes***Exec_Master_Log_Pos: 654409041* * *seconds_behind_master:3296* * * 

At this time, slave actual delay should be: mysql-bin.000009 This binlog in the Binlog position 1073742063 and the difference delay between the Binlog position read on the slave, i.e.:

1073742063-654409041 = 419333022 a binlog event

And also add mysql-bin.000010 This binlog has produced 107374193 binlog event, a total of

107374193 + 419333022 = 526707215 Binlog Event

PostScript Update:

1, can maintain a monitoring table on the master, it has only one field, store the latest time stamp (high version can be updated with Event_scheduler, the lower version can be updated with cron combined with automatic looping script), the time to read the field on slave, As long as the system time of master and slave is consistent, you can quickly know how much slave and master latencies are poor. However, in a highly concurrent system, this timestamp can be refined to milliseconds, otherwise even if the time is consistent, it is possible to delay the thousands of Binlog event.

2, Netizens (Li Daiu, qq:407361231) Careful expenditure above the calculation delay error, should be mysql-bin.000009 the maximum number of events minus the number of events that have been executed, that is 1073742063–654409041= 419,333,022 Binlog event, plus mysql-bin.000010 this binlog has produced 107,374,193 Binlog event, a total of 526,707,215 Binlog event.

Slave latency monitoring in MySQL replication

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.