MySQL data consistency check tool-Pt-table-checksum

Source: Internet
Author: User
Tags percona

Mysql itself does not have a Data Consistency Verification tool (mainly for master-slave). It is easy to see data inconsistency between master-slave, so someone developed such a tool, more famous is maatkit a series of tools, but the background it seems not updated, now mainly maintained by http://www.percona.com/doc/percona-toolkit/2.1/ percona.

In fact, the maatkit tool was known last year, because at that time it was always felt that the data between the master-slave was highly inconsistent, but the previous database was mainly a MyISAM table, I am afraid of a large number of table deadlocks.

Percona has a large number of functions in this script, mostly statistics, but there are several awesome scripts. To put it simply, Pt-table-checksum and Pt-table-sync. You can read the document by yourself. I will focus on the principle.

Pt-table-checksum is the script used to check whether the data in the two databases is consistent (the table can also be checked)
Pt-table-sync is a script for synchronizing data differences between two databases (or tables ).

These two scripts are both written in Perl and have over lines for a long time.

Pt-table-checksum
Pt-table-checksumThe main method to check data consistency is to scan a table by using a suitable index. This script is only available, and it is not executed if the index is not suitable, he divides a table into several trunks. If a table has 100 rows and trunks, each trunk has 3 million rows, and he locks the trunk, calculate the checksum and match.

 For example, if both master and slave have a table t, run the following command:

1./Pt-table-checksum -- create-replicate-table -- Host = 'hostnmae-I '-- databases = test -- tables = t -- Port =4331-- Explain-uroot-p123

If this parameter is used for the first time, it cannot be less: -- create-replicate-table, because it mainly initializes the system library. Add -- explain to the end to print more detailed information, such as my test. t table:

1Replace into 'percona '. 'checksums' (dB, TBL, Chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) Select ?, ?, ?, ?, ?, ?, Count (*) as CNT, coalesce (lower (Conv (bit_xor (cast (CRC32 (concat_ws ('#', 'A', 'B', Concat (isnull ('A'), isnull ('B') as unsigned )),10,16)),0) As CRC from 'test'. 'T'/*Checksum table*/ 

 

The preceding statement will be executed on both the master database and slave database. The calculation result is used to determine whether the trunk data is the same.

This statement is mainly used to execute Perl scripts with over lines.CodeIt mainly handles some adaptive work.
This SQL statement looks very complicated. Let's explain how it works:

It will combine all the fields in each row of data in a trunk into a string, and then take the 32-bit verification code for the string. Then, all the calculated verification codes in the trunk will be used for an exception or operation, convert from decimal to hexadecimal. Therefore, both the master database and slave database have a result called CRC, which is used to determine whether the data is consistent by comparing the CRC. Although an operation is performed by a trunk and a trunk, it performs select for update on each trunk mainly because it is worried that there will be write during checksum, so each checksum is not suitable for a large value, he will wait until slave and mater are fully synchronized, and then perform a checksum for the trunk, and then save the checksum result to the table. If slave and master have a delay, he will wait for 1 s for full synchronization and then perform the checksum.

I tested a table of about 50 GB and ran the script. It should be done within an hour.
A table in the system database is used to store the validation results of each trunk.

Pt-table-sync
The Pt-table-sync tool is similar to the Pt-table-checksum principle. It is also the trunk that finds data inconsistency first. After finding the tool, it will compare data row by row. If this table does not have a primary key, this process cannot be successful, because if there is no primary key, it cannot be judged whether the data is the same. If the data difference between the two trunks is small, there will be an update operation to fix it, if a large number of inconsistencies are found, the table on the slave database will be rebuilt.

References:
<A href = 'HTTP: // container> percona_toolkit_2_1_2_operations_manual </a>

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.