Detailed description of replication consistency with pt-table-checksum and Pt-table-sync checks

Source: Internet
Author: User
Tags crc32 install perl percona

First, Introduction


Pt-table-checksum is one of the Percona-toolkit series tools that can be used to detect the consistency of the data in the master and from the database. The principle is to run on the main library, checksum the synchronized tables and record them. It then compares the checksum of the main table to determine whether the data is consistent. The detection process in blocks, for large tables can be differentiated into multiple blocks, so as to avoid the lock table (according to a unique index to split the table into blocks) will automatically determine the replication delay, the load of master, after the threshold will automatically pause detection.


Pt-table-sync, as its name implies, is used to repair inconsistencies between multiple instances of data. It allows the master-slave data to be repaired to the final consistency, or it can be fixed by applying multiple unrelated database instances that are double-written or multi-writable. At the same time, it also integrates the Pt-table-checksum verification function, which can be repaired on one side or based on the pt-table-checksum calculation results.



Second, the principle of work


1. Calculation of single-row data checksum values

The computational logic, like Pt-table-checksum, is to check the table structure first, get the data type of each column, convert all data types to strings, and then use the The Concat_ws () function joins to calculate the checksum value of the row. Checksum is calculated by default with CRC32.


2. Calculation of checksum value of data block

like the Pt-table-checksum tool, Pt-table-sync intelligently parses the index on the table and then split the table's data into several chunk, calculated with chunk as units. It can be understood that the data of all rows in the chunk is stitched together, and then the value of CRC32 is computed, that is, the checksum value of the chunk is obtained.


3. Bad block detection and repair

In the first two steps, Pt-table-sync and Pt-table-checksum are the same algorithm and principle. And then down, it starts to look different:
Pt-table-checksum is just a checksum, so it stores the checksum results in the statistics table and logs the executed SQL statements into the Binlog, and the task is completed. Statement -level replication passes the calculation logic to the slave library and performs the same calculations from the library. Pt-table-checksum algorithm itself does not care about the delay from the library, the delay is the same calculation (some colleagues do not understand this, can refer to my previous article), does not affect the correctness of the calculation results (but we will still detect the delay, because too much delay will affect  Business, so always add-max-lag to limit the flow).

Pt-table-sync are different. It first completes the calculation of the checksum value of the chunk, and once the checksum value of the same chunk is found to be different, it goes deep into the chunk, compares the rows and fixes the problematic rows. The calculation logic is described below (to fix data inconsistencies in the master-slave structure, for example, the business double-write situation is more complex to fix -because of the problem of conflict resolution and benchmark selection, which is limited to space, not described here):

for each one from the library, each table, looping through the following checksum repair process. for each chunk, a for update lock is added to the checksum. Once the lock is acquired, the show master status value for the current main library is recorded.

Execute the Select master_pos_wait () function from the library, waiting for the location from the library SQL thread to be executed to the show master status. As a guarantee, the contents of the chunk on the master and slave are no longer changed.

Perform a checksum on the chunk, and then compare it to the checksum of the main library.

If the checksum is the same, the master-slave data is consistent, the next chunk continues.

If the checksum is different, it indicates that the chunk is inconsistent. Deep inside the chunk, calculate the checksum and compare rows by line (the comparison process of a single line of checksum is the same as the chunk comparison process, and the single line is actually Chunk's special case of size 1).

If a row is found to be inconsistent, mark it down. Continue to detect the remaining lines until the chunk ends.

a row that is not consistent with the found master, takes the replace into statement, executes the Binlog in the main library once to generate the full amount of the row, and synchronizes to the slave library, which is based on the main library data to fix from the library, for rows in the main library that are not available from the library, insert with replace on the main library (must not be an insert), and for rows that have no main library from the library, Delete is performed in the main library (Pt-table-sync strongly recommends that all data repairs be made only in the main library, not directly from the library, but there are exceptions, as described later ).

All inconsistent rows are fixed until the chunk is repaired. Continue to check and repair the next chunk.

Until this ends all the table fixes from the library. Start repairing the next slave library.


Third, calibration test


1. Related introduction:


Mysql version:5.1.72

OS Version:centos Release 6.4 (Final)

Master ip:10.1.1.6

Slave ip:10.1.1.7


2.Master Server Install Yum Dependency package:


Yum Install Perl perl-devel perl-time-hires perl-dbi perl-dbd-mysq


3. Installing the Percona-toolkit Kit


wget http://www.percona.com/get/percona-toolkit.tar.gz
Tar zxf percona-toolkit-2.2.13.tar.gz
CD percona-toolkit-2.2.13
Perl makefile.pl
Make && make install


4.master/slave database creation user and authorization


Create database pt CHARACTER SET UTF8;
GRANT Update,insert,delete,select, PROCESS, SUPER, REPLICATION SLAVE on * * to ' checksums ' @ ' 10.1.1.6 ' identified by ' ll2g$ 26DQM ';
GRANT all on pt.* to ' checksums ' @ ' 10.1.1.6 ' identified by ' LL2G$26DQM ';

Use PT;
CREATE TABLE IF not EXISTS checksums (
DB char (+) is not NULL,
TBL Char (+) is not NULL,
chunk int not NULL,
Chunk_time float NULL,
Chunk_index varchar (+) NULL,
Lower_boundary text NULL,
Upper_boundary text NULL,
THIS_CRC char (+) not NULL,
this_cnt int not NULL,
MASTER_CRC char (max) NULL,
master_cnt int NULL,
TS timestamp not NULL for DEFAULT current_timestamp on UPDATE current_timestamp,
PRIMARY KEY (db, TBL, chunk),
INDEX ts_db_tbl (TS, DB, TBL)
) Engine=innodb;


5. Check (Master server running)

Pt-table-checksum--nocheck-binlog-format--nocheck-plan--nocheck-replication-filters--replicate=pt.checksums-- Set-vars innodb_lock_wait_timeout=120--databases newtable-u ' checksums '-P ' checksums '-h10.1.1.6


#-h-u-p-p-s-d Connection Information

#--nocheck-replication-filters detection ignores mysql configuration parameter binlog_ignore_db and so on.

#--nocheck-binlog-format Log Format not detected

#--replicate Specifies the checksum stored db and table, such as Test.checksum

#--chunk-size,--chunk-size-limit used to specify the size of the detection block. Stronger controllability

#--ignore-databases/tables/column jumps out of the filter of the specified element

#--lock-wait-timeout InnoDB lock timeout setting, default is 1

#--max-load Set maximum number of concurrent connections

#--replicate-check-only only output inconsistent information.

#--help has this on the line, as well as other details in the documentation.


Note:--no-check-binlog-format ignore check binlog format, otherwise will error, default will go to check statement mode, generally we all use row mode




TS: Time to complete the check.

ERRORS: The number of errors and warnings that occurred while checking.

Diffs:0 is consistent, and 1 means inconsistent. When--no-replicate-check is specified, it will always be 0 when the specified--replicate-check-only displays different information.

Rows: The number of rows in the table.

CHUNKS: The number of blocks that are divided into tables.

Skipped: The number of blocks skipped due to errors or warnings or too large.

Time: The duration of the execution.

Table: The names of the tables being checked.


Note: The main observation diffs column, identifies the difference number.


6. View differences (Slave library Run)


Select db, TBL, sum (this_cnt) as Total_rows, COUNT (*) as chunks  

From checksums

where (master_cnt <> this_cnt or master_crc <> THIS_CRC or isnull ( MASTER_CRC) <> IsNull (THIS_CRC) )

Group by DB, TBL;


7. Copy, eliminate differences (master server run)


Suppose there is a difference in the Newtables database tb1 table

I. Automatic elimination of differences (not recommended)

Pt-table-sync--print--execute--sync-to-master h=10.0.0.7,p=3306,u=checksums,p= ' checksums '--databases=newtables- -tables=tb1


II. Print out SQL statements, human intervention to Slave library execution (recommended)

Pt-table-sync--print--sync-to-master h=10.0.0.249,p=3306,u=checksums,p= ' checksums '--databases=newtables--tables =tb1

Pt-table-sync--print--sync-to-master h=10.0.0.249,p=3306,u=checksums,p= ' checksums '--replicate pt.checksums


#--sync-to-master: Specifies a DSN, which is the IP from which he will go through show processlist or show slave status to Auto Brahma.

#--replicate: Specifies the table obtained through Pt-table-checksum, which will almost always be used for all 2 tools.

#--print: print, but do not execute the command.

#--execute: Executes the command.


Note: Slave need to authorize the main library drop and create temporary tables permissions


8. Inspection


Re-execute the pt-table-checksum once to see if there are any differences.



Iv. Matters of note


1. Replace into to repair the master is never consistent, you must ensure that the table is replaced with a primary key or unique key, or replace into degenerate into insert, can not repair the effect. In this case, Pt-table-sync uses other checksum repair algorithms, but is inefficient, such as group by for all columns and then count (*) (the table must have a primary key!). )。

2. The master-slave data inconsistency needs to be fixed by replace into, which must be a statement-level SQL statement. Pt-table-sync will set all the SQL statements it initiates to the statement format, regardless of the global Binlog_format value. This in the Cascade A-B-C structure, also encounter pt-table-checksum once encountered problems, causing the row format of the relay library from Cucacu is inevitable. Pt-table-sync, however, checks and warns from the Binlog format of the library by default, infinitely recursive.


3. Since pt-table-sync can only fix one table at a time, if the parent table is repaired, it may cause the child table data to be repaired, which may fix an inconsistency and introduce another inconsistency, and you may encounter the same problem if there are triggers on the table. So use caution in cases where there are triggers and primary foreign KEY constraints. Pt-table-sync Tools also do not welcome the structure of master-slave heterogeneous. The Pt-table-sync tool will check for prerequisites by default.

4.pt-table-sync cannot tolerate the delay from the library during the repair process, which is exactly the opposite of Pt-table-checksum. If there is too much delay from the library, Pt-table-sync will hold a for update lock on Chunk for a long time, and then wait for the master_pos_wait from the library to complete or time out. The greater the delay from the library, the longer the wait process, and the longer the primary insecure library lock, the greater the impact on the line. Therefore, the Max-lag must be set strictly.

5. The repair of the data from the library is usually performed in the main library to synchronize the SQL to the slave library. Therefore, when there are multiple slave libraries, repairing the data from the library actually synchronizes the repair statement to all the slave libraries. The cost of data repair depends on the degree of inconsistency from the library to the main library, and if one is very inconsistent with the main library from the library data, for example, if the library has only a table structure, then all the data from the main library needs to be re-poured and then synchronized through Binlog and passed to all the slave libraries. This can put a lot of pressure on the line and even bring down the cluster. The correct approach is to use the Pt-table-checksum check again, determine the degree of inconsistency: if the number of non-synchronization, with Pt-table-sync directly repaired; otherwise, replace it with a backup, and then use Pt-table-sync repair. Description: This actually provides a way of thinking about MyISAM backup: If there is only one MyISAM main library to add from the library, you can: mysqldump the table structure to the from library, start the synchronization, and then use Pt-table-sync to fix the data.


Detailed description of replication consistency with pt-table-checksum and Pt-table-sync checks

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.