InnoDB Redo Flush and Dirty page refresh mechanism in-depth analysis

Source: Internet
Author: User
Tags server memory

Profile:

We know that InnoDB uses the write Ahead log policy to prevent the loss of downtime data, that is, when a transaction commits, the redo log is written, and then the memory data page is modified, resulting in dirty pages. Now that you have redo logs to ensure data persistence, you can also fetch data directly from the buffer pool page when querying, why refresh the dirty pages to disk? If the redo log can grow infinitely and the buffer pool is large enough to cache all the data, it is not necessary to flush the dirty pages in the buffer pool to disk. However, there are usually several issues:

    • Limited server memory, buffer pool not enough to cache all data
    • Redo Log Unlimited increase cost requirements too high
    • If redo all log recovery time is too long during downtime
In fact, when the database goes down, the database does not need to redo all the logs, only the logs after the last brush-in point are executed. This point is called Checkpoint, which solves the problem above:
    • Reduce Database recovery time
    • Flush a dirty page to disk when the buffer pool is not sufficient
    • Refresh dirty pages When redo logs are not available

Redo logs are designed to be recyclable, and when a log file is full, the data in the redo log that has been flushed to the disk is no longer required to be overwritten and reused.

The InnoDB engine marks the version with the LSN (log Sequence number), which is the end point of each log in the log space, expressed as a byte offset. Each page has a Lsn,redo log and Lsn,checkpoint also has an LSN. can be observed by command show engine innodb status :

[Plain]View PlainCopy
    1. ---
    2. LOG
    3. ---
    4. Log Sequence Number 1039878815567
    5. Log flushed up to 1039878815567
    6. Pages flushed up to 1039878814486
    7. Last checkpoint at 1039878814486
    8. 0 Pending Log writes, 0 pending CHKP writes
    9. 5469310 log I/O ' s done, 1.00 log I/O ' S/second

How many pages does the checkpoint mechanism refresh each time, where to fetch dirty pages, and when to trigger a refresh? These are all very complex. There are two kinds of checkpoint, respectively:
    • Sharp Checkpoint
    • Fuzzy Checkpoint
Sharp checkpoint occurs when the database is closed and all dirty pages are brushed back to disk. A partial dirty page refresh is performed at run time using the fuzzy checkpoint. Some dirty pages are refreshed in the following ways:
    • Master Thread Checkpoint
    • Flush_lru_list Checkpoint
    • Async/sync Flush Checkpoint
    • Dirty Page Too much Checkpoint
The master thread Checkpoint master thread refreshes a percentage of the page-back disk from the dirty pages list of the buffer pool at a rate of every second or every 10 seconds. This process is asynchronous and does not block the query thread.

Flush LRU List Checkpoint

InnoDB to ensure that there are around 100 free pages available in the LRU list. Before the innodb1.1.x version, to check whether there are enough pages in the LRU for the user query operation thread, if not, the end of the LRU list will be retired, if there is a dirty page in the obsolete page, will be forced to perform checkpoint brush back dirty page data to disk, obviously this will block the user query thread. Starting with the innodb1.2.x version, this check is placed in a separate page Cleaner thread, and the user can innodb_lru_scan_depth control the number of pages available in the LRU list by default of 1024.

Async/sync Flush Checkpoint

When redo log files are not available, you need to force some pages in the dirty pages list to be flushed back to disk. This ensures that redo log files can be reused. Before the innodb1.2.x version, Async flush checkpoint blocks the user query thread that found the problem, and Sync flush checkpoint blocks all query threads. Innodb1.2.x is then placed on a separate page Cleaner Thread.

Dirty Page Too Much Checkpoint

With too many dirty pages, the InnoDB engine enforces checkpoint. The purpose is also to ensure that there are enough free pages available in the buffer pool. It can be set by parameter innodb_max_dirty_pages_pct , which defaults to 75%:

[Plain]View PlainCopy
    1. ([email protected]) [(None)]> show variables like ' innodb_max_dirty_pages_pct ';
    2. +----------------------------+-------+
    3. | variable_name | Value |
    4. +----------------------------+-------+
    5. | innodb_max_dirty_pages_pct | 75 |
    6. +----------------------------+-------+
    7. 1 row in Set (0.00 sec)

The above is a dirty page refresh of several trigger mechanism, next, elaborate on the log mechanism and the 3rd Async/sync flush checkpoint principle.     Log and Checkpoint Introduction

InnoDB transaction log refers to redo log, which is called log, stored in the log file ib_logfile* inside. InnoDB There is another log of undo logs, but the undo log is stored in the shared tablespace (ibdata* file).

Because log and checkpoint are closely related, these two parts are analyzed together.

Noun Explanation: LSN, log sequence number, InnoDB's log sequence number is a 64-bit integer.

Log Write

The LSN actually corresponds to the log file offset, the new lsn= old LSN + writes the log size. Examples are as follows:

LSN=1G, the log file size of a total of 600M, this write 512 bytes, the actual write operation is:

| ---The offset: Since the LSN value is much larger than the log file size, the offset is 400M by the method of taking redundancy.

| ---write log: Find the offset 400M, write 512 bytes of log content, the next transaction LSN is 1000000512;

Checkpoint Write

InnoDB implements the mechanism of fuzzy checkpoint, each fetch to the oldest dirty page, and then make sure that the LSN of the corresponding LSN of this dirty page has been written to the log file, and then the LSN of this dirty page as a checkpoint point to log file, meaning " The log and data for the LSN preceding this LSN have been written to the disk file ". When recovering a data file, InnoDB scans the log file, and when the LSN is found to be less than checkpoint, the recovery is considered complete.

The location where the checkpoint is written is at a fixed offset at the beginning of the log file, that is, each write checkpoint overwrites the previous checkpoint information.

    Flush Refresh Process and principle introduction

Because checkpoint and logs are closely related, the log and checkpoint are described together, the detailed implementation mechanism is as follows:

As shown, a transaction log for InnoDB has undergone a total of 4 stages:

1) Creation phase: Transaction creates a log;

2) Log brush disk: Log files written to disk;

3) Data Brush Disk: The log corresponding to the dirty page data written to the data file on disk;

4) Write CKP: Log is written as checkpoint log file;

For these 4 stages, the System Records 4 log-related information for various other processing uses:

Log Sequence Number (LSN1): The current system LSN maximum, and the new transaction log LSN will be generated on this basis (lsn1+ the size of the new log);

Log flushed up to (LSN2): The LSN that is currently written to the log file;

Pages flushed up to (LSN3): The LSN of the current oldest dirty page data, which writes the LSN directly to the log file when writing checkpoint;

Last checkpoint at (LSN4): LSN that is currently written to checkpoint;

For the system, the above 4 LSN is decremented, namely: Lsn1>=lsn2>=lsn3>=lsn4.

The specific examples are as follows (using the show engine InnoDB status \g command)

[Plain]View PlainCopy
    1. ---
    2. LOG
    3. ---
    4. Log Sequence Number 1039878815567
    5. Log flushed up to 1039878815567
    6. Pages flushed up to 1039878814486
    7. Last checkpoint at 1039878814486
    8. 0 Pending Log writes, 0 pending CHKP writes
    9. 5469310 log I/O ' s done, 1.00 log I/O ' S/second

    Async/sync Flush Checkpoint principle

InnoDB data is not real-time write disk, in order to avoid downtime data loss, to ensure the acid properties of the data, InnoDB at least to ensure that the data corresponding to the log can not be lost. For different situations, InnoDB take different countermeasures:

1) log loss due to downtime

InnoDB has a log brush disk mechanism, can be controlled by innodb_flush_log_at_trx_commit parameters;

2) Log overwrite causes log loss

InnoDB log file size is fixed, write the time by taking the remainder to calculate the offset, so that there are two LSN write to the same location, the next write to the front is covered, in the "Write mechanism" section of the sample as an example, The offsets for the lsn=100000000 and lsn=1600000000 two logs are the same. In this case, in order to ensure data consistency, must require that the lsn=1000000000 corresponding dirty page data has been brushed to disk, that is, the last checkpoint corresponding LSN must be greater than 1000000000, or overwrite Dogo Katsuragi zhi also no, data also no brush disk , the data is lost once the outage is made.

In order to solve the problem of data loss in the second situation, InnoDB implements a set of log protection mechanisms, which are implemented in detail as follows:

, the line represents the log space (log cap, approximately equal to the total size of the log file *0.8,0.8 is a factor of safety), CKP age and Buf age are two floating points, Buf async, Buf sync, Ckp async, CKP sync are several fixed points. The meanings of each concept are as follows:

Concept Calculation Meaning
CKP Age Lsn1-lsn4 There is no log range for checkpoint, if CKP age exceeds log space, the log and data for the overwritten log (lsn1-lsn4-log cap) are "probably" not being brushed to disk
Buf Age Lsn1-lsn3 There is no scope for the log of the dirty page brush, if BUF age exceeds the log space, the corresponding data for the overwritten log (lsn1-lsn3-log cap) is "definitely" not being brushed to disk
Buf Async Log space Size * 7/8 Forcing the Buf to age-buf the dirty page brush of async, at which point the transaction can continue to execute, so there is no direct impact on the execution speed of the transaction (with indirect effects such as CPU and disk being busier, and the execution speed of the transaction may be affected)
Buf Sync Log space Size * 15/16 Forcing a dirty page brush on the Buf age-buf async, where the transaction stops executing, so for sync, there is a lot of dirty page brush disk, so the time to block is longer than CKP sync.
CKP Async Log space Size * 31/32 Force write checkpoint, at which time the transaction can continue to execute, so for async, there is no impact on the execution speed of the transaction (the indirect effect is not significant, because the operation of writing checkpoint is relatively simple)
CKP Sync Log space Size * 64/64 Force write checkpoint, at this time the transaction stop execution, so for sync, but due to write checkpoint operation is relatively simple, even if blocked, the time is very short

When a transaction executes faster than the dirty page brush speed, CKP age and buf age grow, and when the async point is reached, a dirty page brush or write checkpoint is enforced, and if this does not catch up with the speed of the transaction execution, in order to avoid data loss, when the sync point is reached, will block all other transactions, specifically for dirty page brush or write checkpoint.

So theoretically, as long as the transaction executes faster than the dirty page brush speed, it will eventually trigger the log protection mechanism, and then block the transaction, causing the MySQL operation to hang.

Since writing checkpoint itself is simpler than writing dirty pages, consuming much less time, and CKP sync points after the buf sync Point, most of the blocking is blocked at the buf Sync Point, which is why the IO is high when the transaction is blocked, Because this time in constantly brush dirty page data to disk. For example, the following log shows that many transactions are blocked at the buf Sync point:

Original blog:http://blog.csdn.net/melody_mr/article/details/48930739

InnoDB Redo Flush and Dirty page refresh mechanism in-depth analysis

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.