InnoDB two-phase log Submission mechanism

Source: Internet
Author: User

InnoDB two-phase log Submission mechanism

ViewThe innodb_flush_log_at_trx_commit official website has some questions.For more information about the innodb_flush_log_at_trx_commit parameter, see the official website.

One of them is written as follows: With a value of 2, the contents of InnoDBLog buffer are written to the log file after each transaction commit and the log file is flushed to disk approximately once per second. it means that if the innodb_flush_log_at_trx_commit value is set to 2, the content in the log buffer will be written to the log file each time it is submitted, and then the logfile will be flush to the disk. Because innodb's log file is ib_logfile on the hard disk as far as I know, I am confused that the log file is flushed to disk, is there a layer of structure between log buffer and disk that can cache log files? After reading a large amount of Chinese and English documents, I finally got a preliminary understanding. I. Glossary in the innodb Storage engine, there is a unique log file, that is, redo log file. Therefore, for the innodb Storage engine, there are two types of logfiles: redo log and binlog. redo log: ib_logfile0 and ib_logfile1 under the data Directory (the number is controlled by innodb_log_files_in_group). The innodb Storage engine is unique and has the corresponding redo log buffer in the memory. Therefore, when writing redo logs, the layer-3 structure is: redo log buffer ---> redo logfile in the file system cache ---> redo log filebinlog on disk: by default, it is in the data directory, you can also use the log_bin parameter to directly specify the path. The file name is a file with the default
  • The value 0 indicates that the redo log buffer content is written to the redo log cached by the file system every second and flushed to the redo log file on the disk.
  • 1 indicates that the content of the redo log buffer will be written to the redo log cached by the file system during the transaction commit, and flush (solidified) to the redo log file on disk.
  • 2 indicates that the content of the redo log buffer will be written to the redo log cached by the file system during the transaction commit, and the redo log cached by the file system will be flushed once per second (solidified) to the redo log file on disk.
  • 2. write binlog stage: this stage calls two methods write () and fsync (). The former is responsible for writing the binlog in the file system cache, the latter is responsible for writing the binlog in the file system cache to the bin log on disk. The former will be called at this stage, and the latter's calling mechanism is controlled by the sync_binlog parameter. About sync_binlog parameters:
    • Sync_binlog = 0: indicates that the fsync () call is completely handed over to the operating system, that is, whether the binlog in the file system cache is refreshed to the disk is completely controlled by the operating system.
    • Sync_binlog = 1: indicates that the binlog will be solidified to the disk when the transaction is committed.
    • Sync_binlog = N (N> 1): When the database crashes, N-1 transactions may be lost. For more information, see
    3. final commit stage: this stage mainly includes: the server tells the storage engine that both binlog and redo log have been written (at least at the file system cache level). submit data according to the normal mechanism, then, return committed's confirmation submission information to the session. Iii. fault recovery resolution 1. If a crash occurs after the first phase, the data is apparently not committed because the binlog is not written. This is a failed transaction and does not require roll-back or rollback. (For Oracle databases, the situation is more complex. Some large transactions may have solidified data even if they are not committed, so rollback is required. It is not clear whether there is a mechanism for writing uncommitted data into the disk in advance in mysql large transactions.) 2. if the system crashes after the second stage, only one situation can ensure that data is not lost at all, that is, innodb_flush_log_at_trx_commit and sync_binlog are both set to 1. At this time, both the redo log and binlog are solidified to the disk, this ensures that data not written after commit is rolled forward and submitted during recovery. If any of them is not 1, binlog and redo log may be inconsistent, and transactions may be lost. Therefore, to ensure full consistency between the master and slave databases, both innodb_flush_log_at_trx_commit and sync_binlog of the master database must be set to 1. As for the three-phase group commit mechanism that emerged after 5.6 to solve the concurrent transaction commit exception, further research is required.

    Related Article

    Contact Us

    The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

    If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

    A Free Trial That Lets You Build Big!

    Start building with 50+ products and up to 12 months usage for Elastic Compute Service

    • Sales Support

      1 on 1 presale consultation

    • After-Sales Support

      24/7 Technical Support 6 Free Tickets per Quarter Faster Response

    • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.