A basic tutorial to solve the slave delay problem in Mysql _mysql

Source: Internet
Author: User
Tags mysql version server array

First, the reason analysis
generally speaking, the slave relative master delay is large, the root cause is that the replication thread on slave is not able to really concurrency. Simply put, the concurrency mode (InnoDB engine) completes the transaction submission on master, while on slave, the replication thread has only one SQL thread for Binlog apply, so it is no wonder that slave is far behind master in high concurrency.

ORACLE MySQL version 5.6 begins to support multi-threaded replication, and configuration options slave_parallel_workers enables multi-threaded concurrent replication on slave. However, it can only support concurrent replication between multiple database instances under one instance, and can not really do multiple tables concurrent replication. Therefore, in a larger concurrent load, slave still have no way to catch up with master in time, need to find ways to optimize.

Another important reason is that the traditional MySQL replication is asynchronous (asynchronous), that is, after the master submitted, before the slave on the application again, is not a true sense of synchronization. Even the later Semi-sync repication (semi-synchronous replication) is not a true synchronization, because it only guarantees that the transaction is routed to the slave, but does not require that the transaction commit succeeds until it is confirmed. Since it's asynchronous, that's sure to be a little late. Therefore, strictly speaking, MySQL replication can not be called MySQL synchronization (Virgo interviewers may be in the interview will be said to the MySQL synchronization of all the brush off OH).

In addition, many people in the concept of slave is relatively less important, so will not provide the same level of server with master configuration. Some even not only use worse servers, but also run multiple instances on top.

To synthesize these two main reasons, slave wants to keep up with Master's progress as soon as possible, and you can try the following methods:

The

uses the MARIADB release, which achieves a relatively real parallel replication that is much better than Oracle MySQL. In my scene, using mariadb as an example of slave almost always keeps up with master in time. Each table must explicitly specify a primary key, if you do not specify a primary key, will result in row mode, each modification should be full table scan, especially the large table is very terrible, the delay will be more serious, and even lead to the entire slave library is suspended, can refer to the case: the lack of MySQL primary key lead to the reserve hang;
The application side to do more things, so that the MySQL side less work, especially with IO-related activities, such as: the front-end through memory cache or local write queue, etc., merge multiple read and write for once, or even eliminate some write requests; The
makes the appropriate sub library, the table strategy, reduces the library Tanku duplication pressure, avoids because the library Tanku pressure causes the entire instance the replication delay;
Other ways to improve IOPS performance, according to the effect, I made a simple sort:
to replace the SSD, or PCIe SSD, such as IO equipment, the increase in IOPS capacity of the average 15K SAS disk number of hundred times, million, or even hundreds of thousands of times times;
Increase physical memory, The corresponding increase of InnoDB Buffer pool size, so that more thermal data in memory, reduce the occurrence of physical IO frequency;
Adjust the file system to XFS or ReiserFS, compared to ext3 can greatly improve IOPS capabilities. Under high ioPS pressure, compared to EXT4 has a more robust IOPS performance (some people think that XFS in a special scenario will be a big problem, but we have less than 10% of the remaining disk space when throwing data, other not encountered);
Adjust the raid level to raid 1+0, which increases IOPS performance compared to RAID1, RAID5, and more. If you have all SSD devices, you can make RAID 1 on 2 disks, or as many as RAID 5 (and you can set up a global hot spare, improve array fault tolerance), and even some tyrants users to make a RAID 50 directly from multiple SSD disks;
Adjust the raid write cache policy for WB or Force WB, please refer to: Common PC Server array card, hard disk health monitoring and PC server array card management simple handbook;
Adjust the IO scheduler of the kernel, prioritize the use of deadline, and if it is SSD, you can use the NoOp policy, compared to the default CFQ, the performance of the IOPS is increased by at least several times.

Second, how to solve
usually receive more about the main standby delay of the alarm:

Check_ins_slave_lag (err_cnt:1) Critical-slavelag on ins:3306=39438

It is believed that the slave delay is a long conversation problem for MySQL DBA. First, analyze the risks associated with the slave delay.
A. Under exceptional circumstances, the principal and subordinate ha cannot switch. The HA software needs to check the consistency of the data and the host is inconsistent when the delay occurs.
B. backup copy hang can cause backups to fail (flush tables with read lock 900s timeout)
C. Backups based on slave, the data is not up to date, but is deferred.
How to deal with such problems, how to avoid? Analyze several reasons that lead to a standby delay
1. Row mode has no primary key, no index, or low index discrimination.

Has the following characteristics
A. Show slave status shows position has not changed
B. Show open tables shows that a table has been in_use for 1
C. Show create table look at the table structure to see that there are no primary keys, no indexes, or poor index differentiation.

Workaround:
A. To find several fields with a higher table-distinguishing ratio, you can use this method to determine:

Select COUNT (*) from XX; 
  Select COUNT (*) from (select distinct xx from xxx) t;

If the results of 2 query count (*) are similar, you can index these fields
B. Prepare the library stop slave;
may be performed longer because the transaction needs to be rolled back.
C. Standby repository

  Set sql_log_bin=0;
  ALTER TABLE XX add key xx (XX);

The old version of the slave application Binlog will only select the first index, you need to put the new index on the front, you can first delete the old index, build a new index, and then build the old index. Can be executed in one SQL.
D. Prepare the library start slave
If it is InnoDB, you can use show InnoDB status to see rows_inserted,updated,deleted,selected these indicators to judge.
If more records are modified per second, replication is executing at a faster rate.

2 Mixed mode without index or SQL slow
show the full processlist from the library to see the SQL being executed.
Workaround:
A. SQL is simpler, check for missing indexes, and add indexes.
B. The other is the statement that inserts into the select from, and if the select contains group by, multiple table associations may be less efficient.
This class can go to the main library to change the Binlog_format to row.

3 The main library has a large transaction, resulting in a delay from the library
phenomenon Analysis Binlog found similar to the following picture of the situation look

Workaround:
Communicate with development, increase caching, write asynchronously to the database, and reduce the number of writes directly to DB.

4. The main library writes frequently, from the storehouse pressure does not keep to cause the delay
The main phenomenon of this kind of reason is that the IUD operation of the database is very many, slave because of sql_thread single-threaded reason cannot catch up with the main storehouse.
Workaround:
A upgrades the hardware configuration from the library, such as Ssd,fio.
b Use the @ Ding preheating tool-relay Fetch
Before the standby SQL thread performs an update, loading the appropriate data into memory does not improve the ability of the sql_thread thread to execute SQL, or speed up the io_thread thread to read the log.
C Use multithreading replication Ali MySQL team implementation-based on parallel replication of rows.
This scenario allows two transactions that modify the same table to execute in parallel, as long as the two transactions modify different rows in the table. This scheme can achieve higher concurrency between transactions, but the limitation is that Binlog must be used in row format. Because only binlog with the row format can know the scope of a row modified by a transaction, Binlog with the statement format can only know the modified Table object.

5. There are a large number of MyISAM tables in the database, resulting in slave latency during backup

Because the Xtrabackup tool backs up to the end, flash tables with read lock is executed, the database is locked for a consistent backup, and then for the MyISAM table lock, the Salve_sql_thread stall is blocked and the hang
A better solution to the problem now is to modify the table structure as a InnoDB storage engine.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.