pt-table-checksum verifying MySQL master-slave data consistency

Source: Internet
Author: User
Tags crc32 vcard percona

Consistency checking of master-slave data is a headache, and occasionally it is frustrating to be inconsistent with the master-slave data from a business complaint, or a few inconsistencies between the data from the library. Usually we only have one way to hot-spare the main library and then replace all the slave libraries. This is not only very costly, but also similar to the solution of a cure, it is very disturbing. So we need the right tools, at least for us to answer the following three questions:

    • is the delay from the library causing the user to see inconsistent data, or is it true that the master-slave data is inconsistent?
    • If not, how large is this ratio?
    • Will you show up again next time?

Answering these questions will help us decide whether to fix them and how to fix them, and also help us find inconsistent data to pinpoint the source of the problem. And Percona's pt-table-checksum is what we want.

Pt-table-checksum Introduction

Pt-table-checksum is one of the tools of the famous Percona-toolkit toolset. It generates the checksum of the main library data block by executing statement-based SQL statements in the main library, passes the same SQL statements to the Slave library, computes the checksum of the same data block from the library, and finally compares the checksum values of the same data blocks on the master and slave libraries. This determines whether the master-slave data is consistent. This check is done on a sub-table and is chunked within each table, and the PT tool itself provides a very limited number of current-limiting options, so the impact on the online service is small.

Checksum calculation Principle 1. Calculation of single-row data checksum values

The PT tool checks the structure of the table, obtains the data type of each column, converts all data types to strings, and then joins with the Concat_ws () function to calculate the checksum value of the row. Checksum by default with CRC32, you can define more efficient UDFs yourself.

2. Calculation of checksum value of data block

If a row of calculations checksum and compared from the library, then the efficiency will be very low. The PT tool selects the index on the Smart analysis table and then split the table data into a chunk, which is also calculated in chunk. Therefore, the aggregation function Bit_xor () is introduced. Its function can be understood as the chunk of all the rows in the data, and then calculate the value of CRC32, you get the checksum value of this chunk. The SQL statements are as follows:
There is also count (*), which is used to calculate the number of rows that the chunk contains. Each time the chunk is checksum, the PT tool will perform statistical analysis of time-consuming and intelligently adjust the size of the next chunk to avoid the impact of chunk too large on the line, while also avoiding the chunk being too small and inefficient.

3. How consistency guarantees

When the PT tool calculates the checksum of a chunk on the main library, the main library may be in the update, while the library may be delayed so that there are updates related to this chunk data in Relay-log, how do you ensure that the main library and the From library are "the same" data? The answer is to add for update the current read lock, which guarantees the consistency of the internal data of one of the chunk's main libraries. Otherwise, 1000 people chekcusm the same 1000 rows of data, you can get 1000 different results, you can not avoid MVCC interference! After obtaining the for update lock, the PT tool starts to calculate the checksum value of the chunk and saves the results to the PT tool's self-built result table (in the form of replace into select) and then releases the lock. The statement is eventually passed to the library and executes the same calculation logic.

Internal work process

With the key points above, let's look at the internal working process of the PT tool, such as: a simple explanation of the working process:

  • 1. Connect to the main library: the PT tool connects to the main library and then automatically discovers all the slave libraries from the main library. By default, show full processlist is used to find the slave library, but this is only valid if the master-slave instance port is the same.
  • 3. Find the main library or whether there are replication filtering rules from the library: This is the option to check for security and default. You can close this check, but this can cause checksum SQL statements to either not sync to the from library or to the table from the library that is not being checksum from the library, which will cause the library to be synchronized from the repository.
  • 5. Start to get the table, a calculation of each.
  • 6. If it is the first chunk of the table, then chunk-size is generally 1000; if it is not the first chunk of the table, then the results analyzed in 19 steps are taken.
  • 7. Check the table structure, make data type conversion, and generate checksum SQL statements.
  • 8. Select the most appropriate method for the split table based on the index and the distribution of the data on the table.
  • 9. Start the checksum table.
  • 10. By default, before chunk a table, the calculation results related to the last table are deleted. Unless-resume.
  • 14. Based on the results of explain, determine whether the size of chunk exceeds the upper limit of the chunk-size you define. If this is exceeded, the chunk will be ignored in order not to affect the performance on the line.
  • 15. Add the for update lock for the line to be checksum and calculate.
  • 17-18. Store the results of the calculation in the MASTER_CRC master_count column.
  • 19. Adjust the size of the next chunk.
  • 20. Wait for the main library to be chased from the library. If there is no delay in running the backup from the library, it is best to check all from the library, and if the delay is found to be the largest from the library delay of more than max-lag seconds, the PT tool will be paused here.
  • 21. If the max-load of the main library is found to exceed a certain threshold, the PT tool will be paused here.
  • 22. Continue to the next chunk until the table is chunk complete.
  • 23-24. Wait for the checksum to be completed from the library to generate aggregated statistical results. Each table is summarized and counted once.
  • 25-26. Loop through each table until the end.
    After the checksum is complete, execute the following SQL statement on each slave library to see if there is a master-slave inconsistency:
  • SELECT * from Percona.checksums where master_cnt <> this_cnt or MASTER_CRC <> THIS_CRC or ISNULL (MASTER_CRC) & Lt;> ISNULL (THIS_CRC) \g

      

Important Options Security Options:

-check-replication-filters whether to check replication filtering rules
-check-slave-tables Check if all the tables and columns are checked from the library
-chunk-size-limit each chunk maximum cannot exceed this size, ignoring it

Speed Limit options:

-check-interval how often to check the master-slave delay, the main library load is up to the limit
-check-slave-lag whether to check only this delay from the library
-max-lag maximum delay, wait more than this
-max-load maximum load, more than this wait

Filter options:

-databases Check only some libraries
-tables Check only some tables
These filtering options are useful when repairing inconsistent data and checking for repair effects.

Other options

-resume interrupted for some reason, next time, do not start from scratch
-chunk-time the time that each chunk is calculated, typically 0.5 seconds by default
-replicate-check-only: Show only information that is not synchronized
-no-check-binlog-format: Do not check the format of Binlog
-create-replicate-table you need to create a table the first time you use it

Examples of usage

master:10.0.0.96
slave:10.0.0.30

Master-Slave data Check command (Main library execution):
./pt-table-checksum--nocheck-replication-filters--databases=db1--replicate=db1.checksums-- Create-replicate-table--host=10.0.0.96--port=3306-uroot-pqazwsx$%^456
./pt-table-checksum--nocheck-replication-filters--databases=common--replicate=common.checksums-- Create-replicate-table--host=10.0.0.96--port=3306-uroot-pqazwsx$%^456--no-check-binlog-format

Note: It is a good idea to create a new user that is dedicated to verifying master-slave data, and the user needs to have the following permissions: SELECT, PROCESS, SUPER, REPLICATION SLAVE (Master library Authorization)

Defects and Precautions
    • If the table does not have a primary key or a unique index, or simply does not have any indexes, then the PT tool will be at a loss when chunk the table. However, we must have a primary key for each table when we are building a table.
    • -check-binlog-format is the default option and it is not recommended to close it. All SQL statements generated by the Pt-table-checksum tool itself are synchronized to the slave library based on the statement format, which is determined by its implementation principle. However, in A-B-C's cascade replication structure, if B is a copy of the row format, then the data consistency check for B and C will not work. Setting the SQL statement to a statement level on a does not log the set action to Binlog, and this property cannot be passed in cascade.
    • In the case of master-slave heterogeneity, the checksum statement may fail to execute from the library, even if the index is inconsistent. For example, there is an index of force index in the SQL statement, but there is no index on the table from the library, which results in a card library.
Summarize

Pt-table-checksum is the best tool for verifying master-slave data inconsistencies. Due to MySQL duplication of its own flaws, or the master-slave switch is not rigorous, or backup software bugs, etc., may lead to inconsistent master-slave data. Regardless of whether you do not care, inconsistencies are there, it is important to see the data for you, the importance of the regular check and repair it.
And see tell: Repair inconsistent data with Pt-table-sync

Article Reprint Connection http://nettedfish.sinaapp.com/blog/2013/06/04/check-replication-consistency-by-pt-table-checksum/

English connection http://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html

Pt-table-checksum check MySQL master-slave data consistency

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.