Basic tutorial on solving the slave latency problem in MySQL

Last Update:2015-11-27 Source: Internet

Author: User

I. Cause Analysis
Generally, a slave lags well behind its master. The root cause is that the replication thread on the slave cannot truly run concurrently. Simply put, transactions on the master are committed concurrently (mainly thanks to the InnoDB engine), while the slave has only a single SQL thread to apply the binlog, so it is no surprise that the slave falls far behind the master under high concurrency.

Oracle MySQL 5.6 supports multi-threaded replication. You can set the slave_parallel_workers option to enable multi-threaded concurrent replication on the slave. However, it only supports concurrent replication across different databases within an instance; it cannot replicate multiple tables of the same database in parallel. Therefore, under a heavy concurrent load the slave still cannot catch up with the master in time, and other optimizations are needed.
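The per-database limitation can be illustrated with a toy model. This is only a sketch, not MySQL's actual scheduler: the function name, the round-robin assignment of databases to workers, and the sample database names are all assumptions made for illustration.

```python
from collections import defaultdict


def assign_workers(transactions, workers):
    """Toy model of per-database parallel apply.

    `transactions` is a list of (database, txn_id) pairs in binlog order.
    Every transaction of a given database goes to the same worker, so
    parallelism is bounded by the number of distinct databases.
    """
    db_to_worker = {}
    queues = defaultdict(list)
    for db, txn_id in transactions:
        if db not in db_to_worker:
            # Assumption: databases are handed to workers round-robin.
            db_to_worker[db] = len(db_to_worker) % workers
        queues[db_to_worker[db]].append(txn_id)
    return dict(queues)


# Hypothetical workloads: one busy database vs. several databases.
single_db = [("shop", i) for i in range(6)]
multi_db = [("shop", 1), ("blog", 2), ("logs", 3), ("shop", 4)]

print(assign_workers(single_db, 4))  # every transaction lands on one worker
print(assign_workers(multi_db, 4))   # work spreads over three workers
```

With four workers configured, the single-database workload still uses only one of them, which is why this mode helps little when the load is concentrated in one database.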

Another important reason is that traditional MySQL replication is asynchronous. That is, a transaction is applied on the slave only after it has been committed on the master, which is not real synchronization. Even the later semi-synchronous replication (semi-sync replication) is not real synchronization, because it only guarantees that the transaction has been transmitted to the slave; it does not wait until the transaction has been committed there. Since replication is asynchronous, there must be latency. Therefore, strictly speaking, MySQL replication should not be called MySQL synchronization (a perfectionist "Virgo" interviewer may well take you to task for saying "MySQL synchronization" during an interview).

In addition, many people do not consider the slave very important, so they do not give it servers of the same specification as the master. Some even use worse servers and run multiple instances on them.

Given these causes, the following methods can help the slave keep up with the master as closely as possible:

1. Use the MariaDB distribution, which implements parallel replication in a relatively real sense and is far better than Oracle MySQL in this respect. In my scenario, MariaDB is used for the slave instances, and it is almost always able to keep up with the master in time.
2. Explicitly define a primary key on every table. Without a primary key, every modification in row mode requires a full table scan on the slave. On large tables the latency becomes much worse, and the entire slave can even hang. For more information, see the case: hang of the slave database due to the lack of a MySQL primary key.
3. Let the application do more so that MySQL does less, especially for IO-related activity. For example, the front end can use an in-memory cache or a local write queue to merge multiple reads and writes into one, or even eliminate some write requests.
4. Apply an appropriate database/table sharding policy to reduce the replication pressure on any single database or table, so that pressure on one database or table does not delay replication for the entire instance.
5. Improve IOPS performance by other means. I have roughly ranked the options by effectiveness:
A. Switch to SSD, PCIe SSD, or similar I/O devices. Their IOPS capability is hundreds, tens of thousands, or even hundreds of thousands of times that of ordinary 15K SAS disks.
B. Add physical memory and increase the InnoDB buffer pool size so that more hot data stays in memory and physical IO happens less often.
C. Change the file system to XFS or ReiserFS, which greatly improves IOPS capability compared with ext3 and gives more stable IOPS than ext4 under high IOPS pressure (some people think XFS has serious problems in special scenarios; however, we have not encountered any data loss except when the remaining disk space was below 10%).
D. Change the RAID level to RAID 1+0, which clearly improves IOPS compared with RAID 1 and RAID 5. If all devices are already SSDs, you can build RAID 1 from two disks or RAID 5 from several disks (and set a global hot spare disk to improve array fault tolerance); some users with deep pockets even build RAID 50 directly from multiple SSDs.
E. Set the RAID write cache policy to WB or FORCE WB. For details, refer to: Commonly Used PC server array cards, hard disk health monitoring, and PC server array card management simple manual.
F. Change the kernel I/O scheduler, preferring deadline. If SSDs are used, the noop policy can be used. In some cases IOPS improves by at least several times compared with the default cfq.
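The idea of using a local write queue at the application end to merge several writes into one can be sketched as follows. This is a minimal illustration under stated assumptions: the class name, the "last write wins per key" merge rule, and the sample keys are hypothetical, and a real application would send the flushed batch to MySQL instead of just returning it.

```python
class CoalescingWriteQueue:
    """Buffers writes in memory and keeps only the latest value per key."""

    def __init__(self):
        self._pending = {}
        self.received = 0

    def put(self, key, value):
        """Record a write; a later write to the same key replaces the earlier one."""
        self.received += 1
        self._pending[key] = value

    def flush(self):
        """Return the merged writes and clear the buffer.

        Assumption: the caller writes the returned batch to the database
        in one go, so MySQL sees far fewer statements than the application issued.
        """
        batch = list(self._pending.items())
        self._pending.clear()
        return batch


q = CoalescingWriteQueue()
for key, value in [
    ("user:1:views", 10),
    ("user:2:views", 3),
    ("user:1:views", 11),
    ("user:1:views", 12),
]:
    q.put(key, value)

batch = q.flush()
print(q.received, len(batch))  # 4 writes received, 2 writes sent on
```

Fewer statements reaching the master also means fewer binlog events for the slave's single SQL thread to apply.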

II. Solutions
Generally, many alarms about master/standby latency are received, for example:

check_ins_slave_lag (err_cnt:1)critical-slavelag on ins:3306=39438

I believe slave latency is a problem that every MySQL DBA encounters. First, consider the risks that slave latency causes:
A. The master-slave HA pair cannot switch over when an exception occurs. The HA software needs to check data consistency, and while the slave is delayed the master and slave are inconsistent.
B. A hung replication thread on the standby database causes backups to fail (flush tables with read lock will time out).
C. Data backed up from the slave is not up to date but delayed.
How can we solve and avoid such problems? Let us analyze several causes of standby database delay.
1. In ROW mode, the table has no primary key or index, or its index has low selectivity.

This case has the following symptoms:
A. show slave status shows that the position does not change.
B. show open tables shows that a table stays at In_use = 1.
C. show create table shows that the table has no primary key or index, or that its index has low selectivity.
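The first symptom can be checked mechanically by comparing two samples of show slave status taken some seconds apart. The sketch below assumes the samples have already been fetched into dictionaries; the function name and all sample values are hypothetical, while the field names are the ones show slave status reports.

```python
def sql_thread_stuck(prev, curr):
    """Return True if the SQL thread is running but its applied position
    has not moved between two samples while the slave is behind."""
    same_position = (
        prev["Relay_Master_Log_File"] == curr["Relay_Master_Log_File"]
        and prev["Exec_Master_Log_Pos"] == curr["Exec_Master_Log_Pos"]
    )
    running = curr["Slave_SQL_Running"] == "Yes"
    behind = curr["Seconds_Behind_Master"]  # None when replication is stopped
    lagging = behind is not None and behind > 0
    return same_position and running and lagging


# Hypothetical samples taken 30 seconds apart.
sample_1 = {
    "Relay_Master_Log_File": "mysql-bin.000123",
    "Exec_Master_Log_Pos": 4512,
    "Slave_SQL_Running": "Yes",
    "Seconds_Behind_Master": 39408,
}
sample_2 = dict(sample_1, Seconds_Behind_Master=39438)
sample_3 = dict(sample_2, Exec_Master_Log_Pos=98211)

print(sql_thread_stuck(sample_1, sample_2))  # position unchanged while lag grows
print(sql_thread_stuck(sample_2, sample_3))  # position advanced
```

A single long-running event can also keep the position unchanged for a while, so this check is only a hint to look at show open tables and the table structure next.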

Solution:
A. Use the following queries to find columns with high selectivity in the table:

select count(*) from xx;
select count(*) from (select distinct col from xx) t;

If the second count is close to the first, the column is highly selective and can be indexed.
B. On the slave database, run stop slave;
This may take a long time because the transaction being applied needs to be rolled back.
C. On the slave database, run:

  set sql_log_bin=0;
  alter table xx add key xx(xx);

In earlier versions, the slave only picks the first index when applying the binlog, so the newly added index needs to come first. You can drop the old index, create the new index, and then re-create the old index, all in a single SQL statement.
D. On the slave database, run start slave;
If InnoDB is used, you can run show engine innodb status (show innodb status in older versions) and look at the rows inserted, updated, deleted, and read figures to judge progress.
If a large number of rows are modified per second, replication is progressing quickly.
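The two numeric checks in the steps above can be written down as small helpers. This is a sketch under stated assumptions: the function names and all sample numbers are hypothetical, and the second helper reads the Innodb_rows_inserted, Innodb_rows_updated, and Innodb_rows_deleted counters as reported by show global status rather than parsing the InnoDB status text.

```python
def selectivity(total_rows, distinct_values):
    """Ratio of distinct values to total rows; 1.0 means every row is unique."""
    if total_rows == 0:
        return 0.0
    return distinct_values / total_rows


def rows_modified_per_second(before, after, interval_seconds):
    """Rows inserted + updated + deleted per second between two counter samples."""
    counters = ("Innodb_rows_inserted", "Innodb_rows_updated", "Innodb_rows_deleted")
    changed = sum(after[name] - before[name] for name in counters)
    return changed / interval_seconds


# Hypothetical result of the two count queries: 1,000,000 rows, 950,000 distinct values.
print(selectivity(1_000_000, 950_000))

# Hypothetical counter samples taken 10 seconds apart on the slave.
before = {"Innodb_rows_inserted": 1000, "Innodb_rows_updated": 500, "Innodb_rows_deleted": 200}
after = {"Innodb_rows_inserted": 4000, "Innodb_rows_updated": 1500, "Innodb_rows_deleted": 700}
print(rows_modified_per_second(before, after, 10))
```

A selectivity close to 1 suggests the column is a good index candidate, and a rate that rises after the index is added suggests the slave is applying row events faster.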

2. In MIXED mode, a missing index or a slow SQL statement
Run show full processlist on the slave to see the SQL statement being executed.
Solution:
A. If the SQL is relatively simple, check whether an index is missing and add it.
B. The other kind is an insert into ... select ... from statement. If the select contains group by or multi-table joins, it may be relatively inefficient.
In that case, you can change binlog_format to row on the master database.

3. Large transactions on the master database cause slave database latency
Symptom: parsing the binlog shows a large transaction.

Solution:
Communicate with developers: add caching, write data to the database asynchronously, and reduce the amount of data written directly to the database.
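One way to reduce the amount written in a single go is to break a bulk change into many small transactions. This technique is not described in the original text; it is a commonly used complement, sketched here with a hypothetical helper name, a hypothetical table old_events, and an assumed integer primary key id.

```python
def split_into_batches(first_id, last_id, batch_size):
    """Yield inclusive (start, end) id ranges, each covering at most batch_size ids."""
    start = first_id
    while start <= last_id:
        end = min(start + batch_size - 1, last_id)
        yield (start, end)
        start = end + 1


# Hypothetical bulk delete over ids 1..10, split into ranges of at most 4 ids.
ranges = list(split_into_batches(1, 10, 4))
statements = [
    f"DELETE FROM old_events WHERE id BETWEEN {start} AND {end};"
    for start, end in ranges
]
for statement in statements:
    # Assumption: each statement is committed as its own transaction.
    print(statement)
```

Each small transaction reaches the slave as a separate, short binlog event group, so the SQL thread is never blocked for long on one huge transaction.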

4. Frequent writes on the master database that the slave database cannot keep up with
The main symptom here is that the database handles a very large number of insert/update/delete (IUD) operations, and the slave cannot catch up with the master because sql_thread is single-threaded.
Solution:
A. Upgrade the hardware of the slave database, for example to SSD or Fusion-io (fio) cards.
B. Use @dingqi's warm-up (prefetch) tool, relay fetch.
Before the SQL thread of the slave database executes an update, the tool loads the corresponding data into memory in advance. It does not improve the ability of sql_thread to execute SQL, nor does it speed up log reading by io_thread.
C. Use the multi-threaded replication solution implemented by the Alibaba MySQL team: row-based parallel replication.
This scheme allows two transactions that modify the same table to be executed in parallel, as long as they modify different rows of the table. It achieves a higher degree of concurrency between transactions, but it requires binlog in Row format. Only Row-format binlog reveals which rows a transaction modified; Statement-format binlog only reveals which table objects were modified.
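The rule that two transactions may run in parallel when they touch different rows can be sketched as a simple conflict check. This is not the Alibaba implementation; the function names, the greedy batching strategy, and the sample transactions are assumptions made for illustration. Each transaction is modeled as the set of (table, primary key) pairs it modifies, which is the information a Row-format binlog makes available.

```python
def can_run_in_parallel(rows_a, rows_b):
    """Two transactions conflict only if they modify at least one common row."""
    return rows_a.isdisjoint(rows_b)


def schedule(transactions):
    """Greedily group transactions, in binlog order, into batches whose
    members modify disjoint rows and could therefore be applied concurrently."""
    batches = []
    current, touched = [], set()
    for name, rows in transactions:
        if touched.isdisjoint(rows):
            current.append(name)
            touched |= rows
        else:
            # Conflict with the current batch: close it and start a new one.
            batches.append(current)
            current, touched = [name], set(rows)
    if current:
        batches.append(current)
    return batches


# Hypothetical transactions; t1 and t3 both modify row 1 of table orders.
txns = [
    ("t1", {("orders", 1)}),
    ("t2", {("orders", 2)}),
    ("t3", {("orders", 1), ("orders", 3)}),
    ("t4", {("users", 1)}),
]
print(schedule(txns))
```

Here t1 and t2 modify the same table but different rows, so they share a batch; a per-database or per-table scheme would have had to serialize them.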

5. A large number of MyISAM tables in the database cause slave latency during backup

Near the end of a backup, the xtrabackup tool runs flush tables with read lock to lock the tables for a consistent backup. The MyISAM table locks then block slave_sql_thread, which stalls and hangs.
A better solution to this problem is to convert the tables to the InnoDB storage engine.
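The conversion statements can be generated from a table listing. The sketch below assumes the (schema, table, engine) rows were obtained beforehand, for example from information_schema.tables; the function name and the sample rows are hypothetical, and the generated statements should be reviewed before being run.

```python
def myisam_to_innodb_statements(tables):
    """Build one ALTER TABLE statement per MyISAM table.

    `tables` is an iterable of (schema, table, engine) tuples.
    """
    statements = []
    for schema, table, engine in tables:
        if engine and engine.lower() == "myisam":
            statements.append(f"ALTER TABLE `{schema}`.`{table}` ENGINE=InnoDB;")
    return statements


# Hypothetical listing of tables and their storage engines.
rows = [
    ("shop", "orders", "InnoDB"),
    ("shop", "audit_log", "MyISAM"),
    ("blog", "hits", "MyISAM"),
]
converted = myisam_to_innodb_statements(rows)
for statement in converted:
    print(statement)
```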

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
