Checking master-slave data consistency is a headache. Occasionally a business complaint reveals that the data on a slave does not match the master, or that a few rows differ between slaves. Usually the only remedy is to take a hot backup of the master and rebuild every slave from it, which is not only very costly but also treats the symptom rather than the cause. What we need is the right tool, one that can at least answer the following three questions:
- Is the inconsistency the user sees merely replication lag on the slave, or is the master-slave data truly inconsistent?
- If it is truly inconsistent, how much of the data is affected?
- Will it happen again?
Answering these questions helps us decide whether and how to repair the data, and finding the inconsistent rows helps us pinpoint the source of the problem. Percona's pt-table-checksum is exactly what we want.
Introduction to pt-table-checksum
pt-table-checksum is one of the tools in the well-known Percona Toolkit. On the master it executes checksum SQL statements in statement-based binlog format; those statements replicate to the slaves, which recompute the checksum of the same data blocks, and the tool then compares the checksum of each block on the master against each slave to determine whether the data is consistent. The check proceeds table by table and, within each table, chunk by chunk, and the tool itself provides throttling options, so the impact on the online service is small.
How the checksum is calculated
1. Checksum of a single row
The tool inspects the table structure, learns the data type of each column, converts every column to a string, concatenates them with CONCAT_WS(), and computes the checksum of the resulting row string. The checksum function is CRC32 by default; you can also define a more collision-resistant hash yourself as a UDF.
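As a minimal sketch, assuming a hypothetical table `t(id INT, name VARCHAR(20))`, the per-row expression has this shape (the statement the tool actually generates also accounts for NULLs and character sets):

```sql
-- A sketch of the per-row checksum; the table t(id, name) and the
-- '#' separator are illustrative, not the tool's exact output.
SELECT id,
       CRC32(CONCAT_WS('#', id, name)) AS row_checksum
FROM t;
```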
2. Checksum of a data chunk
Computing and comparing checksums row by row against the slave would be very inefficient, so the tool analyzes the table's indexes and splits the data into chunks, calculating the checksum chunk by chunk. For this it introduces the aggregate function BIT_XOR(): conceptually, it XORs together the CRC32 values of every row in the chunk, and the result is the chunk's checksum. The SQL statement looks roughly like this:
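The following is a simplified reconstruction for the same hypothetical table `t`, chunked on its primary key; the real statement also records a NULL bitmap and uses boundaries the tool derived from the index:

```sql
-- Checksum one chunk of the hypothetical table t: XOR the per-row
-- CRC32 values together and count the rows in the chunk.
SELECT COUNT(*) AS cnt,
       COALESCE(
         LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', id, name)) AS UNSIGNED)), 10, 16)),
         0
       ) AS crc
FROM t FORCE INDEX (`PRIMARY`)
WHERE id >= 1 AND id <= 1000;  -- this chunk's boundaries, derived from the index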
The statement also includes COUNT(*), which records how many rows the chunk contains. Each time a chunk is checksummed, the tool measures how long it took and adaptively adjusts the size of the next chunk, avoiding both chunks so large that they hurt the online workload and chunks so small that the check becomes inefficient.
3. How consistency is guaranteed
While the tool checksums a chunk on the master, the master may still be taking updates, and a lagging slave may still have updates touching that chunk sitting in its relay log. How, then, do we guarantee that master and slave checksum "the same" data? The answer is a locking current read (FOR UPDATE), which keeps the chunk's data on the master internally consistent for the duration of the calculation; without it, 1,000 sessions checksumming the same 1,000 rows could get 1,000 different results, because MVCC would hand each of them a different snapshot. Holding the lock, the tool computes the chunk's checksum, saves the result into a result table it creates for itself (via REPLACE INTO ... SELECT), and then releases the lock. The statement eventually replicates to the slave and runs the same calculation logic there.
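A hedged sketch of how one chunk's result is written on the master, following the article's description of the FOR UPDATE lock; the `checksums` table and `db1.t` here are simplified (the result table the tool actually creates has more columns, such as chunk_index, master_cnt, master_crc, and ts):

```sql
-- Write one chunk's checksum into the result table under a locking
-- read. Replicated in statement format, the same SELECT re-runs on
-- the slave, so the slave's row records the slave's own cnt and crc.
REPLACE INTO db1.checksums
        (db, tbl, chunk, lower_boundary, upper_boundary, this_cnt, this_crc)
SELECT 'db1', 't', 5, '1', '1000',
       COUNT(*),
       COALESCE(
         LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', id, name)) AS UNSIGNED)), 10, 16)),
         0
       )
FROM db1.t FORCE INDEX (`PRIMARY`)
WHERE id >= 1 AND id <= 1000
FOR UPDATE;
```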
Internal workflow
With the key points above in place, the tool's internal workflow can be summarized roughly as follows:
1. Connect to the master and discover its slaves.
2. Run the safety checks (replication filters, binlog format, and so on) and set the session binlog format to STATEMENT.
3. For each table, split the data into chunks along an index and checksum the chunks one by one on the master with REPLACE INTO ... SELECT.
4. Throttle as needed (--max-lag, --max-load) while the statements replicate and each slave recomputes the chunk checksums against its own data.
5. Compare the master's checksum of each chunk with each slave's and report any tables that differ.
Important options
Safety options:
--check-replication-filters    whether to check for replication filter rules
--check-slave-tables    whether to verify that all checked tables and columns also exist on the slaves
--chunk-size-limit    the maximum size a chunk may grow to; chunks exceeding it are skipped
Throttling options:
--check-interval    how often to check whether master-slave lag and master load have reached their limits
--check-slave-lag    check lag only on the specified slave
--max-lag    the maximum allowed lag; when exceeded, the tool pauses and waits
--max-load    the maximum allowed load; when exceeded, the tool pauses and waits
Filter options:
--databases    check only the listed databases
--tables    check only the listed tables
These filter options are handy when repairing inconsistent data and re-checking the result of a repair.
Other options:
--resume    if a run was interrupted, resume it next time instead of starting from scratch
--chunk-time    the target time to checksum each chunk, 0.5 seconds by default
--replicate-check-only    only report the differences already recorded, without checksumming again
--no-check-binlog-format    do not check the binlog format
--create-replicate-table    create the result table; needed the first time the tool is run
Usage example
master: 10.0.0.96
slave: 10.0.0.30
Master-slave data check command (run on the master):
./pt-table-checksum --nocheck-replication-filters --databases=db1 --replicate=db1.checksums --create-replicate-table --host=10.0.0.96 --port=3306 -uroot -pqazwsx$%^456
./pt-table-checksum --nocheck-replication-filters --databases=common --replicate=common.checksums --create-replicate-table --host=10.0.0.96 --port=3306 -uroot -pqazwsx$%^456 --no-check-binlog-format
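When a run finishes, the differences can be located by querying the --replicate table on each slave. A query of roughly this shape (adapted from the Percona documentation, using the db1.checksums table from the example above) lists the tables with mismatching chunks:

```sql
-- Run on each slave: list tables whose recorded chunk checksums or
-- row counts differ from the master's values.
SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM db1.checksums
WHERE master_cnt <> this_cnt
   OR master_crc <> this_crc
   OR ISNULL(master_crc) <> ISNULL(this_crc)
GROUP BY db, tbl;
```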
Note: it is a good idea to create a dedicated user for verifying master-slave data. The user needs the following privileges, granted on the master: SELECT, PROCESS, SUPER, REPLICATION SLAVE. A sketch of the grant follows.
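A minimal sketch of that grant; the user name, host pattern, and password are placeholders, and the second statement is an assumption beyond the four privileges listed above:

```sql
-- Placeholders throughout: adjust the user name, host pattern, and
-- password to your environment.
GRANT SELECT, PROCESS, SUPER, REPLICATION SLAVE
  ON *.* TO 'checksum_user'@'10.0.0.%' IDENTIFIED BY 'your_password';
-- Assumption: the tool also writes its results, so the account needs
-- write access to the database holding the --replicate table.
GRANT ALL PRIVILEGES ON db1.* TO 'checksum_user'@'10.0.0.%';
```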
Defects and Precautions
- If a table has no primary key or unique index, or no index at all, the tool has no good way to split it into chunks. This is one more reason to give every table a primary key at design time.
- --check-binlog-format is enabled by default, and disabling it is not recommended. By its very design, every SQL statement pt-table-checksum generates must reach the slaves in statement format. In an A-B-C cascaded replication topology, however, if B replicates in row format, the consistency check between B and C will not work: the session-level `SET SESSION binlog_format = 'STATEMENT'` the tool issues on A is not written to the binlog, so the setting cannot propagate down the cascade.
- If master and slave are heterogeneous, the checksum statement may fail on the slave, even when only the indexes differ. For example, if the statement contains a FORCE INDEX hint but the slave's table lacks that index, the statement errors out on the slave and replication stalls.
Summary
pt-table-checksum is the best tool for detecting master-slave data inconsistency. Flaws in MySQL replication itself, a sloppy master-slave switchover, backup software bugs, and the like can all leave the master and slaves inconsistent. Whether or not you care, the inconsistency is there; what matters is being able to see it, hence the importance of checking and repairing regularly.
Coming next: repairing inconsistent data with pt-table-sync.
Reprinted from: http://nettedfish.sinaapp.com/blog/2013/06/04/check-replication-consistency-by-pt-table-checksum/
Official documentation (English): http://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html