MySQL Crash-safe Replication

Last Update:2018-08-01 Source: Internet

Author: User

Tags dba mysql version percona

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

MySQL database success is inseparable from its replicaiton, compared to Oracle DG and Microsoft SQL Server Log Shipping, it is easy to get started, basically 1, 2 minutes in accordance with the manual to complete the environment. However, with the depth of use, replication's own problems are slowly revealed, and the non-crash safe features make many DBAs feel headache and even fail to understand the cause of their problems. Simply put, crash-safe replication means that when the server restarts, the data of the Master/slave is guaranteed to be consistent when the master/slave of any node goes down unexpectedly.

Crash-safe Master is relatively straightforward, as long as the transaction's storage engine is used, and the correct configuration can achieve the crash safe effect. For the most common InnoDB storage engine, the following settings are required in the configuration file:

Sync_binlog=1 InnoDB-flush-Log-at-Trx-Commit=1
There was a bug before MySQL version 5.6 that when these two parameters are enabled, the group commit of the InnoDB storage engine is invalidated, resulting in a dramatic drop in performance in a write-intensive environment. As a result, the DBA compromises performance and data consistency, typically setting the parameter Innodb-flush-log-at-trx-commit to 2, which leads to the fact that Master is no longer crash safe and the master-slave data may be inconsistent. Mariadb really solves this problem, so many branch versions, such as Percona,facebook Mysql,innosql, have ported the MARIADB Group commit scheme to their own branch to resolve the issue of group commit failure. The Crash-safe slave situation is somewhat complex, and this may be a more common problem for DBAs. For example, slave constantly reported 1062 errors, or found that the master-slave data is inconsistent (especially if the table does not have a primary key). At this point, the DBA's choice is often very helpless, basically is the whole library rebuilt. So when you have the experience of having more than 200 MySQL servers in operation, you'll find that's a big problem. There are two reasons why Crash-safe slave cannot be implemented, that is, SQL thread and IO thread in replication. First look at SQL thread, which mainly completes two operations:

Run the corresponding transaction information in the relay log
Update Relay-info.log file

The Relay-info.log file is updated to record where the relay log has been executed, and the relay log can continue to be synchronized based on this location when the slave is restarted. However, here the user will find that the two operations are not in one transaction, one is a database operation, one is a file operation, so the atomic effect cannot be achieved. In addition, the MySQL database defaults to the file Relay-info.log is written to the operating system cache, so in the event of downtime can cause a large number of updated locations lost, resulting in repeated execution of SQL statements, the final phenomenon is the master-slave data inconsistency. MySQL 5.5 Adds the parameter sync_relay_log_info, which controls the relay-info.log of a fdatasync operation after each transaction update, which adds to the burden of the system and, even so, the possibility of the last transaction being lost. As early as MySQL 4.0, Google released a patch to solve the problem (https://code.google.com/p/google-mysql-tools/wiki/TransactionalReplication), It records the location information of the binary log to the segment header of the transaction system segment each time the InnoDB storage engine commits. Regenerate the Relay-info.log file after the slave reboot to save this part of the information. Percona also uses this approach and controls whether to replace the Relay-info.log file at startup with the parameter Innodb_recovery_update_relay_log. MySQL 5.6 uses another method, which is to save relay-info.log information in the INNODB transaction table, when two operations are database operations, in a transaction can be atomic. For example, for slave log playback, the process is:

BEGIN; Apply log event;
Apply log event; UPDATE MySQL. Slave_relay_log_info SET master_log_pos = exec_master_log_pos, master_log_name = relay_master_log_file, relay_log_name = relay_log_file, relay_log_pos = relay_log_pos; COMMITThis approach solves the atomicity problem of updating relay-info.log, but is this the ultimate perfect solution? Unfortunately, there are some flaws. you can see that because the final table The Slave_relay_log_info update locks the record, causing the transaction commits on the slave to be serial. Although MySQL 5.6 supports parallel replication, the serial Update table Slave_relay_log_info, again causing group commit to fail. Therefore, by--log-slave-updates the Cascade replication, the performance will be limited. MARIADB is working on this issue and is very likely to see the perfect solution in the mariadb GA release. The IO thread is used to synchronize binary logs on Master, but it can still cause inconsistent data when crash. IO Thread writes the received binary log to the relay log, with each binary log composed of multiple log event, so that every log event is accepted to update master-info.log. Like Relay-info.log, it is also written to the operating system cache, parameter Sync_master_info can control Fdatasync time. Because the update of IO thread does not perform atomic operations on a transaction like SQL thread, it has an impact on data consistency and assumes that a log event is routed two times in relay log. However, it is good to start with the MySQL 5.5 version of the parameter relay_log_recovery, when the occurrence of crash lead to re-connected master, It does not reconnect according to the information of Master-info.log, but instead restarts the log data on the master based on the location information from Relay-info to master (although it is necessary to ensure that the log still exists on Master, otherwise ...) Crash-safe is a point that OPS personnel cannot ignore, or the DBA will be busy dealing with situations where the slave service is stopped due to these abnormal conditions. MySQL 5.6 offers a solution for crash safe, but there are still some imperfect possibilities. So, little friends, let's look forward to the release of the Mariadb GA release. Source: http://insidemysql.blog.163.com/blog/static/202834042201385190333/

MySQL Crash-safe Replication

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More