Analysis on innodb_support_xa and innodb_flush_log_at_trx_commit

Source: Internet
Author: User

For a long time I misunderstood innodb_support_xa. I thought it only controlled external XA transactions, and that internal XA transactions were handled internally with no way for a user to intervene (the internal XA transaction here is the one MySQL uses to keep the binlog and the InnoDB redo log consistent). I did not read the manual carefully until I saw people discussing MySQL data security on Weibo. innodb_support_xa has come up in discussion again these days, so I decided to write this down.

First, read the explanation of innodb_support_xa in the official manual:

"Enables InnoDB support for two-phase commit in XA transactions, causing an extra disk flush for transaction preparation. This setting is the default. The XA mechanism is used internally and is essential for any server that has its binary log turned on and is accepting changes to its data from more than one thread. If you turn it off, transactions can be written to the binary log in a different order from the one in which the live database is committing them. This can produce different data when the binary log is replayed in disaster recovery or on a replication slave. Do not turn it off on a replication master server unless you have an unusual setup where only one thread is able to change data."

From the official explanation, innodb_support_xa does two things. First, it supports distributed transactions across multiple instances (external XA transactions), which are mostly used in distributed database environments. Second, it supports internal XA transactions; put bluntly, it keeps the binlog and the InnoDB redo log consistent. Today's focus is the second kind, internal XA transactions.

First, we need to understand why the binlog and the redo log must be kept consistent.
There are two aspects to this.

First, every transaction in the binlog must also exist in the redo log; that is, the binlog must never contain more transactions than the redo log (it may contain fewer, because some transactions recorded in the redo log have no commit record and may eventually be rolled back). Consider this scenario (all scenarios from here on assume the binlog is enabled): in an AB (master/slave) replication environment, the master crashes and then performs crash recovery. If the transactions in the binlog do not match those in the redo log, the master recovers its data from the redo log, but the extra binlog content is still replicated to the slave, and the master and slave end up with different data. So the set of transactions in the binlog and the redo log must be kept consistent.

Second, the order of transactions in the binlog must match the order in the redo log. This is just as important. Suppose the two orders differ: for example, the master executes transactions in the order ta, tb, tc, td, but the binlog records ta, tc, tb, td. When the binlog is replicated to the slave, the master and slave data diverge. Of course, the way this ordering is enforced is also what broke group commit (BGC); I will not go into detail here.

How does MySQL achieve these two points? The answer is internal XA transactions (the core of which is 2PC, two-phase commit). MySQL processes a transaction internally like this:

1. Prepare, and persist the redo log to disk.
2. If the prepare succeeds, persist the transaction to the binlog.
3. If that succeeds, write a commit record to the redo log.

If a crash happens during any of these three steps, how does crash recovery work?
Recovery reads the transactions from the redo log starting at the last checkpoint and checks them against the transactions in the binlog. If the crash happened during step 1, the whole transaction is naturally rolled back. If it happened during step 2, the whole transaction is also rolled back. If it happened during step 3 (only the commit record failed to be written), it does not matter: step 2 already recorded the transaction in the binlog, so the transaction is committed. In summary, for every transaction in the redo log that was prepared successfully but has no commit record, recovery looks in the binlog to see whether the transaction exists (by its XID; you have probably seen Xid = xxxx in the binlog, and that is the XA transaction id). If it exists, the transaction is committed; otherwise it is rolled back.

Because of the persistence requirement, each of these three steps needs an fsync. But if every step needs an fsync, what do the parameters sync_binlog and innodb_flush_log_at_trx_commit mean? This is not clear to me yet. I hope to fill it in later, or that someone can help me answer it.

The explanation so far covers how internal XA transactions guarantee that the binlog never records more transactions than the redo log (or, put indirectly, that the binlog records only committed transactions). The reason for doing so is to keep the master and slave consistent after crash recovery.

Next is how the order of transactions in the binlog and the redo log is kept consistent. The reason the order must match was explained above. As anyone who knows BGC is aware, the mechanism that guarantees it is the notorious prepare_commit_mutex, which is also what makes BGC impossible under normal circumstances.
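The crash recovery rule described earlier (commit a prepared transaction only if its XID is found in the binlog, otherwise roll it back) can be sketched in a few lines of Python. This is only an illustration of the decision rule, not MySQL's actual code: RedoTxn and recover are hypothetical names, and the redo log and binlog are reduced to a list and a set.

```python
from dataclasses import dataclass


@dataclass
class RedoTxn:
    """Hypothetical view of one transaction as found in the redo log."""
    xid: int
    prepared: bool   # the prepare record reached the redo log (step 1)
    committed: bool  # the commit record reached the redo log (step 3)


def recover(redo_txns, binlog_xids):
    """Decide the fate of each transaction read from the redo log.

    binlog_xids is the set of XIDs found in the binlog (step 2 completed).
    Returns a dict mapping xid to "commit" or "rollback".
    """
    decisions = {}
    for txn in redo_txns:
        if txn.committed:
            # All three steps finished before the crash.
            decisions[txn.xid] = "commit"
        elif txn.prepared and txn.xid in binlog_xids:
            # Crash during step 3: the binlog already has the transaction.
            decisions[txn.xid] = "commit"
        else:
            # Crash during step 1 or step 2: the binlog does not have it.
            decisions[txn.xid] = "rollback"
    return decisions
```

For example, with transactions 1 (not prepared), 2 (prepared, not in the binlog), and 3 (prepared, in the binlog), calling recover with binlog_xids = {3} rolls back 1 and 2 and commits 3, matching the three crash cases above.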
During each XA transaction, the transaction first acquires a global prepare_commit_mutex in the prepare stage, then performs the persistence (fsync) of the redo log and the binlog described above, and releases prepare_commit_mutex only after the fsync completes. The effect is to serialize transactions. This guarantees that the binlog and the redo log have the same order, but every transaction needs its own fsync. As we all know, fsync is the most expensive operation in persisting data, while operations that do not reach the disk, such as write(), cost comparatively little. The core of BGC is therefore to merge the fsyncs needed by many transactions into a single fsync.

All of the above was meant to explain the value of innodb_support_xa = 1. But as mentioned, XA transactions need several fsync operations, so enabling it affects performance. On the Percona blog I saw that TPS dropped by half in their test after enabling it, but in my own test with mysql-5.5.12 + sysbench-0.5 + 10 SAS disks (RAID 10) the drop was much less obvious. In oltp mode the TPS was almost the same, but that mode uses its default read/write ratio. I then switched to a pure update test, and performance with XA transactions enabled dropped by about 5%, nowhere near as large a difference. I suspect two possible reasons. First, MySQL performance today is much better than it was in '06. Second, my test machine is better (10 SAS disks in RAID 10), so even though enabling XA transactions requires many more fsyncs, the storage can absorb them and the disadvantage does not show.

Next, let's look at what innodb_flush_log_at_trx_commit means and how to set it sensibly. innodb_flush_log_at_trx_commit has three values, 0, 1, and 2, each representing a different policy for getting the redo log onto disk.
0 means the log is flushed once per second and nothing is done at transaction commit (fsync is called every second to put the data on disk. Note, however, that if the underlying storage has its own cache, such as a RAID cache, the data has not actually reached the disk at that point; because RAID cards generally have a backup power supply, the data is still generally considered safe).

1 means the log is flushed at every transaction commit, which is the safest mode.

2 means the log is flushed once per second. At transaction commit it is not flushed; instead, write is called to move the redo log from the redo log buffer into the OS page cache.

Now compare the advantages and disadvantages of the three policies. With 1, the redo log reaches disk at every transaction commit, so it is the safest, but performance drops sharply because of the larger number of fsync operations. With 0, the log is flushed once per second and nothing is done at commit, so a crash of either MySQL or the OS loses up to about one second of transactions. Compared with 0, 2 adds a write at every commit. Even though the data has not reached the disk at that point, the transaction is not lost as long as the OS does not crash, even if MySQL itself crashes. So 2 is slightly safer than 0.

Because data consistency is so important, I recommend enabling innodb_support_xa = 1 on the master and leaving it off on the slave (the slave generally does not write a binlog). For innodb_flush_log_at_trx_commit, I recommend 2 unless the data is so important that no transaction can be lost. I have seen some companies set it to 0.
Personally, I think that if you are considering 0, there is little reason not to use 2 instead: the only performance cost 2 adds is one write per transaction, which is far cheaper than an fsync, and in exchange transactions are not lost even if MySQL crashes.
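To make the difference between the three innodb_flush_log_at_trx_commit values concrete, here is a small Python model of the redo log path as described above: log buffer inside mysqld, then the OS page cache (write), then disk (fsync). It is a toy under stated assumptions, not InnoDB's implementation: RedoLog and its methods are hypothetical names, tick stands in for the once-per-second background flush, and the RAID cache case is ignored.

```python
class RedoLog:
    """Toy model of the three redo log flush policies (0, 1, 2)."""

    def __init__(self, policy):
        if policy not in (0, 1, 2):
            raise ValueError("policy must be 0, 1 or 2")
        self.policy = policy
        self.log_buffer = []  # redo log buffer in mysqld memory
        self.os_cache = []    # OS page cache, survives a mysqld crash
        self.disk = []        # durable storage, survives an OS crash

    def _write(self):
        # write(): move records from the log buffer to the OS page cache.
        self.os_cache.extend(self.log_buffer)
        self.log_buffer.clear()

    def _fsync(self):
        # fsync(): force the OS page cache contents onto disk.
        self.disk.extend(self.os_cache)
        self.os_cache.clear()

    def commit(self, txn):
        self.log_buffer.append(txn)
        if self.policy in (1, 2):
            self._write()   # policies 1 and 2 call write at every commit
        if self.policy == 1:
            self._fsync()   # only policy 1 also calls fsync at every commit

    def tick(self):
        # Once-per-second background flush, present for every policy.
        self._write()
        self._fsync()

    def survivors(self, crash):
        """Transactions still recoverable after a 'mysqld' or 'os' crash."""
        if crash == "mysqld":
            return self.disk + self.os_cache
        if crash == "os":
            return list(self.disk)
        raise ValueError("crash must be 'mysqld' or 'os'")
```

Committing three transactions with no tick in between and then crashing shows the trade-off: with policy 0 nothing survives either kind of crash, with policy 2 all three survive a mysqld crash but none survive an OS crash, and with policy 1 all three survive both.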
 
