Slave latency monitoring in MySQL replication

Last Update:2014-11-11 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

From Imysql Chinese network http://imysql.com/2014/08/30/mysql-faq-howto-monitor-slave-lag.shtml

In a MySQL replication environment, we usually only judge the latency of the slave based on the value of the Seconds_behind_master. This is acceptable in most cases, but is not accurate enough and should be considered for more factors.

First, let's look at the state of slave:

[Email protected] [(None)]>Show Slave Status\g***************************1. Row ***************************slave_io_state:waiting forMaster to send event***Master_log_file:mysql-bin.000327Read_master_log_pos:668711237Relay_log_file:mysql-relay-bin.002999Relay_log_pos:214736858Relay_master_log_file:mysql-bin.000327Slave_IO_Running:YesSlave_SQL_Running:Yes***Skip_counter:0Exec_master_log_pos:654409041Relay_log_space:229039311***seconds_behind_master:3296***

You can see that The value of Seconds_behind_master is 3296, which means that slave has delayed at least 3296 seconds.

Let's take a look at the status of the 2 replication processes on slave:

[Email protected] [(None)]>Show full Processlist\g***************************1. Row ***************************Id:6User:system userHost:db:NULLCommand:Connect time: 22005006state:waiting forMaster to send Eventinfo:null***************************2. Row ***************************Id:7User:system userHost:db:NULLCommand:Connect time: 3293 state:updatinginfo:update* * SET * * WHERE * *

You can see that the SQL thread has been performing the update operation, noting that the value of time is 3293, and that it looks like the update operation was executed for 3,293 seconds, a normal SQL, and certainly not take so long. In fact, in the replication process, the value of the time column may have several things:

1, SQL thread currently executing binlog (actually relay log) in the timestamp and IO thread the latest timestamp difference, this is usually considered seconds_behind_master value, Not the actual execution time of some SQL;

2, SQL thread currently if there is no active SQL execution, the time value is the SQL thread idle time;

The time value of the IO thread, however, is the total length of time (in seconds) since the thread was started, which can cause the timing value to be abnormal, such as a negative number, or very large if the system is modified after the IO thread has started.

Look at the following states:

#设置pager, see only a few status values of interest

Cat Egrep ' system user| exec_master_log_pos| seconds_behind_master| Read_master_log_pos'

#这是没有活跃SQL的情况, the time value is idle time, and Seconds_behind_master is 0

[Email protected] [(None)]> show processlist; show slave status\g622004245for713  | Have read all relay log;**445167889445167889seconds_behind_master:0

#和上面一样

[Email protected] [(None)]> show processlist; show slave status\g622004248for716  | Have read all relay log;**445167889445167889seconds_behind_master:0

#这时有活跃SQL了, the time value is the same as Seconds_behind_master, where the SQL thread is "slower" than the IO thread for 1 seconds

[Email protected] [(None)]> show processlist; show slave status\g622004252for71  | Updating | UPDATE **4451822394451752631

#和上面一样

[Email protected] [(None)]> show processlist; show slave status\g622004254for71  | Updating | UPDATE * *445207174445196837seconds_behind_master:1

Well, finally we say how to correctly judge the delay of slave:

1, first see relay_master_log_file and master_log_file whether there are differences;

2. If relay_master_log_file and master_log_file are the same words, look again The difference between Exec_master_log_pos and read_master_log_pos, comparing how many binlog events the SQL thread slows than the IO thread;

3, if relay_master_log_file Master_log_file

Therefore, it is relatively more rigorous that, on the third-party monitoring node, both master and SLAVE are simultaneously initiating a request for show BINARY LOGS and show SLAVE status\g . Finally, the differences between the two binlog are judged, as well as the differences between Exec_master_log_pos and read_master_log_pos .

For example: The result of executing SHOW BINARY LOGS on Master is:

+------------------+--------------+| Log_name | File_size |+------------------+--------------+| mysql-bin.000009 | 1073742063 | | mysql-bin.000010 | 107374193 |+------------------+--------------+

The result of performing SHOW SLAVE status\g on SLAVE is:

master_log_file:mysql-bin.000009 read_master_log_pos:668711237 relay_master_log_file:mysql-bin.000009slave_io_running:yesslave_sql_running:yes***Exec_Master_Log_Pos: 654409041* * *seconds_behind_master:3296* * *

At this time, slave actual delay should be: mysql-bin.000009 This binlog in the Binlog position 1073742063 and the difference delay between the Binlog position read on the slave, i.e.:

1073742063-654409041 = 419333022 a binlog event

And also add mysql-bin.000010 This binlog has produced 107374193 binlog event, a total of

107374193 + 419333022 = 526707215 Binlog Event

PostScript Update:

1, can maintain a monitoring table on the master, it has only one field, store the latest time stamp (high version can be updated with Event_scheduler, the lower version can be updated with cron combined with automatic looping script), the time to read the field on slave, As long as the system time of master and slave is consistent, you can quickly know how much slave and master latencies are poor. However, in a highly concurrent system, this timestamp can be refined to milliseconds, otherwise even if the time is consistent, it is possible to delay the thousands of Binlog event.

2, Netizens (Li Daiu, qq:407361231) Careful expenditure above the calculation delay error, should be mysql-bin.000009 the maximum number of events minus the number of events that have been executed, that is 1073742063–654409041= 419,333,022 Binlog event, plus mysql-bin.000010 this binlog has produced 107,374,193 Binlog event, a total of 526,707,215 Binlog event.

Slave latency monitoring in MySQL replication

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Slave latency monitoring in MySQL replication

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Slave latency monitoring in MySQL replication

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support