In a MySQL replication environment, we usually only judge the latency of the slave based on the value of the seconds_behind_master . This is acceptable in most cases, but is not accurate enough and should be considered for more factors.
First, let's look at the state of slave:
[Email protected] [(none)]> show slave status\g*************************** 1. Row ***************************slave_io_state:waiting for Master to send event***master_log_file: Mysql-bin.000327read_master_log_pos:668711237relay_log_file:mysql-relay-bin.002999relay_log_pos:214736858relay _master_log_file:mysql-bin.000327slave_io_running:yesslave_sql_running:yes***skip_counter:0exec_master_log_pos : 654409041relay_log_space:229039311***seconds_behind_master:3296* * *
You can see that the value of seconds_behind_master is 3296, which means that slave has delayed at least 3296 seconds.
Let's take a look at the status of the 2 replication processes on slave:
[Email protected] [(None)]> Show full processlist\g*************************** 1. Row ***************************id:6user:system userHost:db:NULLCommand:Connecttime:22005006State: Waiting for Master to send eventinfo:null*************************** 2. Row ***************************id:7user:system userHost:db:NULLCommand:Connecttime:3293state:updating Info:update * * SET * * WHERE * *
You can see that the SQL thread has been performing the update operation, noting that the value of time is 3293, and that it looks like the update operation was executed for 3,293 seconds, a normal SQL, and certainly not take so long.
In fact, in the replication process, the value of the time column may have several things:
1, SQL thread currently executing binlog (actually relay log) in the timestamp and IO thread the latest timestamp difference, this is usually considered seconds_behind_master value, Not the actual execution time of some SQL;
2, SQL thread currently if there is no active SQL execution, the time value is the SQL thread idle time;
The time value of the IO thread, however, is the total length of time (in seconds) since the thread was started, which can cause the timing value to be abnormal, such as a negative number, or very large if the system is modified after the IO thread has started.
Look at the following states:
#设置pager, view only a few status values of interest [email protected] [(none)]> Pager Cat | Egrep-i ' System user| exec_master_log_pos| seconds_behind_master| Read_master_log_pos ' #这是没有活跃SQL的情况, time value is idle time, and Seconds_behind_master is 0[email protected] [(none)]> Show Processlist; Show Slave status\g| 6 | System user | | NULL | Connect | 22004245 | Waiting for Master to send event | NULL | | 7 | System user | | NULL | Connect | -| Have read all relay log;**read_master_log_pos:445167889exec_master_log_pos:445167889seconds_behind_master:0#和上面一样 [email protected] [(none)]> show processlist; Show Slave status\g| 6 | System user | | NULL | Connect | 22004248 | Waiting for Master to send event | NULL | | 7 | System user | | NULL | Connect | -| Have read all relay log;**read_master_log_pos:445167889exec_master_log_pos:445167889seconds_behind_master:0#这时有活跃SQL了, the time value is the same as Seconds_behind_master, that is, the SQL thread is "slower" than the IO thread for 1 seconds [email protected] [(none)]> show processlist; Show Slave status\g| 6 | System user | | NULL | Connect | 22004252 | Waiting for Master to send event | NULL | | 7 | System user | | Floweradmin | Connect |1| Updating | Update **read_master_log_pos:445182239exec_master_log_pos:445175263seconds_behind_master:1#和上面一样 [email protected] [(none)]> show processlist; Show Slave status\g| 6 | System user | | NULL | Connect | 22004254 | Waiting for Master to send event | NULL | | 7 | System user | | Floweradmin | Connect |1| Updating | Update **read_master_log_pos:445207174exec_master_log_pos:445196837seconds_behind_master:1
Well, finally we say how to correctly judge the delay of slave:
1, first see relay_master_log_file and master_log_file whether there are differences;
2, if relay_master_log_file and master_log_file are the same, then see exec_master_log_pos and Read_ Master_log_pos differences, compared to how many binlog events a SQL thread slows than an IO thread;
3, if relay_master_log_file and master_log_file are not the same, it indicates that the delay may be larger, need to obtain binlog status from the Master, Judging the current gap between Binlog and master;
Therefore, a relatively more rigorous approach is:
On the third-party monitoring node, the master and SLAVE simultaneously initiate the request of show BINARY LOGS and show SLAVE status\g , and finally determine the difference between the Binlog and exec_ the difference between Master_log_pos and Read_master_log_pos .
For example:
The result of executing SHOW BINARY LOGS on Master is:
+------------------+--------------+| Log_name | File_size |+------------------+--------------+| mysql-bin.000009 | 1073742063 | | mysql-bin.000010 | 107374193 |+------------------+--------------+
The result of performing SHOW SLAVE status\g on SLAVE is:
master_log_file:mysql-bin.000009 read_master_log_pos:668711237 relay_master_log_file:mysql-bin.000009slave_io_running:yesslave_sql_running:yes***Exec_Master_Log_Pos: 654409041* * *seconds_behind_master:3296* * *
At this time, the actual delay of slave should be:
mysql-bin.000009 The difference delay between 1073742063 slave read on binlog position Binlog and position in this binlog, i.e.:
1073742063-654409041 = 419333022 a binlog event
And also add mysql-bin.000010 This binlog has produced 107374193 Binlog event, a total of
107374193 + 419333022 = 526707215 Binlog Event
Monitor slave latency status under MySQL master-slave replication