Go MySQL Log Group Submission

Source: Internet
Author: User
Tags mysql in

Original: https://jin-yang.github.io/post/mysql-group-commit.html

Group commit is to optimize the brush disk problem when writing logs, from initially supporting only the InnoDB Redo log group submission, to the 5.6 official version support redo log and Binlog group submissions, greatly improving the transaction performance of MySQL.

The following will take the InnoDB storage engine as an example, detailing how group submissions are implemented at each stage.

Brief introduction

Since 5.1, Binlog and InnoDB have adopted similar two-phase submissions, but group commit is not supported, and in 5.6, the commit phase of binlog is divided into three stages: flush stage, Sync stage, and comm It stage.

In each of these three phases, each stage is maintained with a queue, and the individual lists are defined as follows.

Mutex_queue m_queue[STAGE_COUNTER];

As above, each stage is maintaining a queue, the first to enter the queue as a leader thread, otherwise as a follower thread; leader thread collects follower transactions and is responsible for sync,follower threads waiting for leader notification operations to complete.

Although three queues are maintained, all of the THD in the queue are actually connected by Next_to_commit. Binlog begins the process of 3 stages during the commit phase of the transaction, that is, in the Mysql_bin_log::ordered_commit () function.

Next, look at how the transaction is committed in MySQL.

Transaction commit

Next, look at the process submitted by InnoDB and Binlog.

Two-phase commit

Details the next two stages of the submission process.

When Binlog is not turned on

InnoDB recovers the database (safe crash recovery) through the redo and undo logs, and when the data is restored, all transactions that have been committed within the storage engine are applied redo log recovery via the redo logs, all prepared but no C Ommit transactions are rolled back by undo log.

The client then connects to see that the data that has been committed exists in the database, and that uncommitted rollback data needs to be re-executed.

When Binlog is turned on

To ensure consistency between the storage engine and MySQL Binlog, a two phase commit was introduced (the phase commit, 2pc).

Because the repositories replay the transactions submitted by the main library through Binlog, assuming that the main repository storage engine has been committed and the binlog is not consistent, the loss of the repository data results in inconsistent primary and standby data.

Two-phase commit

The following is the two-phase commit process.

The detailed execution process is:

    1. InnoDB transaction Prepare phase, that is, SQL has successfully executed and generated redo and undo memory logs;

    2. Binlog commits, writes the Binlog memory log data to the file system cache via write ();

    3. Fsync () permanently writes the Binlog file system cache log data to disk;

    4. InnoDB internal commits, commit phase commits within the storage engine, controlled by the Innodb_flush_log_at_trx_commit parameter, so that undo and redo are permanently written to disk.

When you turn on Binlog MySQL in crash Recovery (crash recovery):

    • When the prepare phase crashes, the transaction is not written to Binlog and InnoDB is not committed, and the transaction is rolled back directly;

    • The Binlog has Fsync () permanently written to Binlog, but the InnoDB crashes when the commit is not available, and when it resumes, the submitted information is obtained from the Binlog, the transaction is redo and committed, and InnoDB and Binlog are always consistent.

The above mentioned two-phase commit process for a single transaction ensures that InnoDB and binlog are consistent, but how do you ensure that the storage engine and Binlog commit in the same order in concurrency? When concurrent commits, what is the impact if the two inconsistencies?

Group Commit exception

First of all, for the above problem, when concurrent commit, if the two inconsistencies will have any impact?

As shown above, transactions begin in T1, T2, T3 order, and are written in the same order as the Binlog log file system cache, call Fsync () for a group commit, and permanently write log files to disk.

But the order of the storage engine commits is T2, T3, T1, and when T2, T3 commits the transaction, a on-line backup program is created to create a new slave to replicate, and when the repository is built, CHANGE MASTER TO the log offset is after the T3 transaction.

Then the transaction T1 in the standby recovery MySQL database, found that T1 is not committed within the storage engine, then when the recovery, the T1 transaction will be rolled back, this will cause the master and standby data inconsistency.

Conclusion: it is necessary to ensure that the Binlog write order is consistent with the InnoDB transaction submission order for Xtrabackup backup recovery.

Early Solutions

Earlier, the Prepare_commit_mutex was used to guarantee the order that the next transaction could perform the Prepara operation only if the lock was released after the last transaction commit, and Binlog no Fsync () call in each transaction procedure.

Because memory data is expensive to write to disk, if you frequently Fsync () log data to disk permanently, the performance of the database will drop sharply. To do this, provide the Sync_binlog parameter to set how many Binlog logs are generated when a fsync () is called to flash the binary log into the disk to improve overall performance, which is set to:

    • Sync_binlog=0, the operation of binary log Fsync () is based on system automatic execution.

    • Sync_binlog=1, each transaction commit invokes Fsync (), which maximizes data security, but affects performance.

    • Sync_binlog=n, a N-1 transaction may be lost when the database crashes.

Prepare_commit_mutex's locking mechanism can severely affect performance at high concurrency, and Binlog cannot perform group submissions.

Improvement Programme

Next, take a look at how to ensure that the Binlog write order is consistent with the order in which the storage engine is submitted, and be able to commit binlog groups. 5.6 Introduces group submission and divides the commit process into three stages of Flush stage, Sync stage, commit stage.

In this way, transaction commits are divided into the following phases:

InnoDB, Prepare    SQL已经成功执行并生成了相应的redo和undo内存日志;Binlog, Flush Stage    所有已经注册线程都将写入binlog缓存;Binlog, Sync Stage    binlog缓存将sync到磁盘,sync_binlog=1时该队列中所有事务的binlog将永久写入磁盘;InnoDB, Commit stage    leader根据顺序调用存储引擎提交事务;

Each stage stage has its own queue, which allows transactions to be queued for each session, improving concurrency performance.

If a thread registers to an empty queue, the thread acts as a leader for that queue, and the thread that is registered to the queue is follower, and subsequent operations are follower behavior in the leader control queue.

Leader will also lead all the current queue follower to the next stage to execute, when the next stage is a non-empty queue, leader will become follower registered to this queue; Note: Follower threads are never likely to become leader.

Configuration parameters

The parameters related to Binlog group submission mainly include the following two parameters.

Binlog_max_flush_queue_time

The unit is subtle, used to take a transaction from the flush queue time-out, which is primarily to prevent concurrent transactions from being too high, causing the RT of some transactions to rise, and the details to see the function MYSQL_BIN_LOG::process_flush_stage_queue() .

Note: This parameter has been canceled after 5.7.

Binlog_order_commits

When set to 0 o'clock, transactions may be submitted in a different order from Binlog, with slightly improved performance, but not particularly noticeable.

SOURCE parsing

Binlog group submissions are managed through Stage_manager, where the core content is as follows.

class Stage_manager {  public: enum StageID { // binlog的组提交包括了三个阶段 FLUSH_STAGE, SYNC_STAGE, COMMIT_STAGE, STAGE_COUNTER }; private: Mutex_queue m_queue[STAGE_COUNTER];};

Group commit three-stage process, implemented in detail as follows.

Mysql_bin_log::ordered_commit () ← Performing transactional order submission, Binlog Group commit mainstream | |-#########>>>>>>>> >← into Stage_manager::flush_stage stage |-change_stage (..., &lock_log) | |-stage_manager.enroll_for (                                      ← Add the current thread to M_queue[flush_stage] | | | |                        ← (Follower) returns true | |-mysql_mutex_lock () ← (leader) lock on Lock_log and return False | |-finish_commit () ← (follower) for follower Direct return | |-ha_commit_low () | |-process_flush_stage_queue () ← (leader) for Followe R is returned directly | |-fetch_queue_for () ← Get the member in the queue by Stage_manager | | |-fetch_and_empty () ← Get the element and Empty Queue | |-ha_flush_log () | |-flush_thread_caches () ← Do this for each thread | |-my_b_tell () ← If the break is over max_bin_log_size, then switch binlog file | |-flush_cache_to_file () ← (follower) writes the contents of the I/O cache to the file |-run_hoo K () ← Call hook function, i.e. Binlog_storAge->after_flush () | |-#########>>>>>>>>>← into the stage_manager::sync_stage order   Duan |-change_stage () |-sync_binlog_file () | |-mysql_file_sync () |     |-my_sync () |                     |-fdatasync () ← Call system API to write to disk or Fsync () | |-#########>>>>>>>>> ← Enter Stage_manager::commit_stage stage |-change_stage () ← This phase is subject to binlog_order_commits parameter limits |-pro   Cess_commit_stage_queue () ← will be all over the thread and then call the following storage Engine interface | |-ha_commit_low () |   |-ht->commit () ← Call storage Engine Handlerton->commit () |                                    |   ←### Note that the actual invocation of the following two functions |   |-binlog_commit () |                           |-innobase_commit () |-process_after_commit_stage_queue () ← Post-submission processing, such as Semisync | |-run_hook () ← Call Transaction->after_commit | |-stage_manager.signal_done () ← Notify other threads that the transaction has been committed | |-finish_commit ()

In the Enroll_for () function, the thread that was just added is set to the leader thread if it is the first thread of the queue, or follower thread, at which point the thread sleeps until it is awakened by leader (M_cond_done).

Note that the Binlog_max_flush_queue_time parameter has been canceled.

Commit Stage

As mentioned above, the commit phase is affected by the parameter binlog_order_commits, when the parameter is closed, the Lock_sync is released directly, each session enters the InnoDB commit phase by itself, so that Binlog and transaction C are not guaranteed Ommit in the same order.

Of course, if you are not concerned about the consistency of the two, you can turn off this option to slightly improve the performance of the point, and when the above parameters are turned on, the commit stage is entered.

Go MySQL Log Group Submission

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.