Use pt-table-sync to restore inconsistent data
In the previous article, pt-table-checksum was used to verify data consistency. This section describes how to verify master-slave data consistency. when determining whether master-slave data is inconsistent, you must consider how to fix it. We recommend another important tool of percona-toolkit: pt-table-sync.
Percona-toolkit installation and introduction
Percona-toolkit: pt-summary, pt-mysql-summary, pt-config-diff, and pt-variable-advisor
Introduction to pt-table-sync
As the name suggests, it is used to repair data inconsistency between multiple instances. It can restore the master-slave data to the final consistency, or repair multiple unrelated database instances through the Application dual-write or multi-write. At the same time, it also integrates the pt-table-checksum verification function, which can be repaired while being verified, or repaired based on the calculation results of pt-table-checksum.
Working Principle 1. Calculation of the checksum value of a single row
Like pt-table-checksum, the calculation Logic first checks the structure, obtains the Data Types of each column, converts all data types to strings, and then uses concat_ws () function to calculate the checksum value of the row. Checksum is calculated using crc32 by default.
2. Calculation of the checksum value of the data block
Like the pt-table-checksum tool, pt-table-sync intelligently analyzes table indexes and splits table data into several chunks. chunks are used as the unit for calculation. It can be understood that the data of all rows in the chunk is spliced, and then the crc32 value is calculated to obtain the checksum value of the chunk.
3. Bad Block Detection and repair
In the previous two steps, pt-table-sync and pt-table-checksum have the same algorithm and principle. Next, it starts to be different:
Pt-table-checksum is only for verification. Therefore, it stores the checksum result to the statistical table and records the executed SQL statement to the binlog. The task is completed. Statement-level replication transfers the computing logic to the slave database and performs the same computing in the slave database. The pt-table-checksum algorithm does not care about the slave database latency. The latency is calculated in the same way. (Some colleagues do not understand this. refer to my previous article ), it will not affect the correctness of the calculation results (but we will still check the delay because too much delay will affect the business, so we always need to add-max-lag to limit the traffic ).
Pt-table-sync is different. It first needs to complete the checksum value calculation of the chunk. Once the checksum value of the same chunk on the master and slave is found to be different, it goes deep into the chunk and compares and fixes problematic rows row by row. Its computing logic is described as follows (for example, to fix data inconsistency between the master and slave structures, business dual writes are more complicated to fix-due to issues related to conflict resolution and benchmark selection, it is limited to space, ):
- Perform the following checksum and repair processes for each slave database and table cyclically.
- Apply the for update lock to each chunk during verification. Once the lock is obtained, the show master status value of the current master database is recorded.
- Execute the select master_pos_wait () function on the slave database, and wait for the SQL thread from the slave database to run to the position obtained by show master status. In this way, the contents of the chunk on the master and slave nodes will not change.
- Run checksum on the chunk and compare it with the checksum of the master database.
- If the checksum is the same, the master and slave data are consistent, and the next chunk is continued.
- If the checksum is different, the chunk is inconsistent. Go deep into the chunk and calculate and compare the checksum row by row (the comparison process of single row checksum is the same as that of chunk, and the single row is actually a special case of chunk size 1 ).
- If a row is inconsistent, mark it. Continue to detect the remaining rows until the chunk ends.
- Use the replace into statement to run the replace into Statement on the primary database to generate the full binlog of the row and synchronize it to the slave database, this will fix the slave database based on the data in the master database. For the rows in the master database but not in the slave database, replace is used to insert the rows in the master database (which must not be insert ); delete the rows from the slave database but not the master database by executing delete in the master database (pt-table-sync strongly recommends that all data repair be performed only in the master database, it is not recommended to directly modify the slave database data. However, there are also special cases, which will be discussed later ).
- Until all inconsistent rows of the chunk are fixed. Continue to check and fix the next chunk.
- Until the repair of all the tables in the slave database ends. Start to fix the next slave database.
Important options Security Options
-[No] check-triggers: check whether a trigger exists. If yes, a warning is returned.
-[No] foreign-key-checks the primary and foreign key constraints by default. If any, a warning is displayed.
-[No] unique-checks: Check whether a unique index exists. If no index exists, a warning is displayed.
Filter options
-Ignore-databases
-Ignore-engines
-Ignore-tables
Other options
-Replicate = s is used in combination with pt-table-checksum. It is repaired only and not verified. Results verified before pt-table-checksum
-Bidirectional synchronization. Generally, the data in the master database prevails. If bidirectional synchronization is enabled, it is complicated to define conflict resolution rules.
Example
Assume 10.55.55.55 is the master database and 10.73.73.73 is its slave database. The port number is 3306.
1. Verify first:
PTDEBUG=1 ./pt-table-checksum --user=user --password=pass --host=10.55.55.55 --port=3306 --databases=elink --tables=my_cms_10 --recursion-method=processlist
2. Based on the verification results, only the inconsistency between the slave database 10.73.73.73 and the master database is fixed:
PTDEBUG=1 ./pt-table-sync --execute --replicate percona.checksums --sync-to-master h=10.73.73.73,P=3306,u=user,p=pass
3. After the repair, verify again. Execute the statement in step 1.
4. Check the repair result: log on to 10.73.73.73 and run the following SQL statement. If the returned result is null, the repair is successful:select * from percona.checksums where master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)
Notes
- Replace into is used to fix Master inconsistencies. You must ensure that the table to be replace has a primary key or a unique key. Otherwise, replace into degrades to insert into, which cannot be restored. In this case, pt-table-sync uses other checksum repair algorithms, but the efficiency is very low. For example, to group by all columns, calculate count (*) (The table must have a primary key !).
- The master-slave data inconsistency must be solved through replace into. The SQL statement must be at the statement level. Pt-table-sync sets all SQL statements it initiates to the statement format regardless of the global binlog_format value. In the cascade A-B-C structure, this will also encounter pt-table-checksum once encountered problems, causing the row format of the relay library from the library card library is inevitable. However, by default, pt-table-sync checks the binlog format of the slave database infinitely recursively and warns:
- Since pt-table-sync can only repair one table at a time, if the parent table is repaired, the sub-table data may be jointly and severally repaired, this may fix one inconsistency and introduce another inconsistency; if a trigger exists on the table, the same problem may occur. Therefore, use it with caution when there are triggers and foreign key constraints. The pt-table-sync tool also does not welcome master-slave heterogeneous structures. By default, the pt-table-sync tool checks the prerequisites.
- Pt-table-sync cannot tolerate slave database latency during restoration, which is exactly the opposite of pt-table-checksum. If the slave database has too much latency, pt-table-sync will hold the for update lock on the chunk for a long time, and then wait until the master_pos_wait of the slave database is completed or timed out. The longer the slave database latency, the longer the wait process. The longer the master database is locked, the greater the impact on the online database. Therefore, you must strictly Set max-lag.
- The restoration of slave database data is usually to execute SQL statements in the master database to synchronize data to the slave database. Therefore, when multiple slave databases exist, the repair statement is actually synchronized to all slave databases when the data in a slave database is repaired. The cost of data restoration depends on the degree of inconsistency between the slave database and the master database. If the data of a slave database is inconsistent with that of the master database, for example, the slave database has only a table structure, in this case, all data in the master database needs to be refilled, and then synchronized through binlog, and transmitted to all slave databases. This puts a lot of pressure on the online, or even drag down the cluster. The correct method is to use pt-table-checksum to verify the inconsistency. If not, Use pt-table-sync to fix the inconsistency. Otherwise, replace it with backup, and then fix it with pt-table-sync. Note: This actually provides a idea for myisam backup: if there is only one myisam master database, to add a slave database to it, you can: First mysqldump the table structure to the slave database, start synchronization and Use pt-table-sync to restore data.
Summary
The pt-table-sync tool is complex and powerful. It modifies the database content, which may cause security incidents. To ensure security, we recommend that you use pt-table-checksum to regularly check data consistency, manually re-build the slave database with severe damage, and finally Use pt-table-sync to fix the remaining slave database.
This article permanently updates the link address: