Use pt-table-sync to restore inconsistent data

Source: Internet
Author: User
Tags crc32
In the previous article, pt-table-checksum was used to verify data consistency. This section describes how to verify master-slave data consistency. when determining whether master-slave data is inconsistent, you must consider how to fix it. Push here

In the previous article, pt-table-checksum was used to verify data consistency. This section describes how to verify master-slave data consistency. when determining whether master-slave data is inconsistent, you must consider how to fix it. Push here

In the previous article, pt-table-checksum was used to verify data consistency. This section describes how to verify master-slave data consistency. when determining whether master-slave data is inconsistent, you must consider how to fix it. We recommend another important tool of percona-toolkit: pt-table-sync.

Percona-toolkit installation and introduction

Percona-toolkit: pt-summary, pt-mysql-summary, pt-config-diff, and pt-variable-advisor

Introduction to pt-table-sync

As the name suggests, it is used to repair data inconsistency between multiple instances. It can restore the master-slave data to the final consistency, or repair multiple unrelated database instances through the Application dual-write or multi-write. At the same time, it also integrates the pt-table-checksum verification function, which can be repaired while being verified, or repaired based on the calculation results of pt-table-checksum.

Working Principle 1. Calculation of the checksum value of a single row

Like pt-table-checksum, the calculation Logic first checks the structure, obtains the Data Types of each column, converts all data types to strings, and then uses concat_ws () function to calculate the checksum value of the row. Checksum is calculated using crc32 by default.

2. Calculation of the checksum value of the data block

Like the pt-table-checksum tool, pt-table-sync intelligently analyzes table indexes and splits table data into several chunks. chunks are used as the unit for calculation. It can be understood that the data of all rows in the chunk is spliced, and then the crc32 value is calculated to obtain the checksum value of the chunk.

3. Bad Block Detection and repair

In the previous two steps, pt-table-sync and pt-table-checksum have the same algorithm and principle. Next, it starts to be different:

Pt-table-checksum is only for verification. Therefore, it stores the checksum result to the statistical table and records the executed SQL statement to the binlog. The task is completed. Statement-level replication transfers the computing logic to the slave database and performs the same computing in the slave database. The pt-table-checksum algorithm does not care about the slave database latency. The latency is calculated in the same way (some colleagues do not understand this. For details, refer to my previous article ), it will not affect the correctness of the calculation results (but we will still check the delay because too much delay will affect the business, so we always need to add-max-lag to limit the traffic ).
Pt-table-sync is different. It first needs to complete the checksum value calculation of the chunk. Once the checksum value of the same chunk on the master and slave is found to be different, it goes deep into the chunk and compares and fixes problematic rows row by row. Its computing logic is described as follows (for example, to fix data inconsistency between the master and slave structures, business dual writes are more complicated to fix-due to issues related to conflict resolution and benchmark selection, it is limited to space, ):

Important options Security Options

-[No] check-triggers: check whether a trigger exists. If yes, a warning is returned.
-[No] foreign-key-checks the primary and foreign key constraints by default. If any, a warning is displayed.
-[No] unique-checks: Check whether a unique index exists. If no index exists, a warning is displayed.

Filter options

-Ignore-databases
-Ignore-engines
-Ignore-tables

Other options

-Replicate = s is used in combination with pt-table-checksum. It is repaired only and not verified. Results verified before pt-table-checksum
-Bidirectional synchronization. Generally, the data in the master database prevails. If bidirectional synchronization is enabled, it is complicated to define conflict resolution rules.

Example

Assume 10.55.55.55 is the master database and 10.73.73.73 is its slave database. The port number is 3306.
1. Verify first:
PTDEBUG = 1. /pt-table-checksum -- user = user -- password = pass -- host = 10.55.55.55 -- port = 3306 -- databases = elink -- tables = my_cms_10 -- recursion-method = processlist 2. based on the verification results, only the inconsistency between the slave database 10.73.73.73 and the master database is fixed:
PTDEBUG = 1. /pt-table-sync -- execute -- replicate percona. checksums -- sync-to-master h = 10.73.73.73, P = 3306, u = user, p = pass 3. after the repair, verify again. Execute the statement in step 1.
4. check the repair result: log on to 10.73.73.73 and run the following SQL statement. If the returned result is null, the repair is successful: select * from percona. checksums where master_cnt <> this_cnt OR master_crc <> this_crc or isnull (master_crc) <> ISNULL (this_crc)

Summary

The pt-table-sync tool is complex and powerful. It modifies the database content, which may cause security incidents. To ensure security, we recommend that you use pt-table-checksum to regularly check data consistency, manually re-build the slave database with severe damage, and finally Use pt-table-sync to fix the remaining slave database.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.