InnoDB Storage Engine Introduction (4): Checkpoint Mechanism, Part One

Last Update:2017-07-29 Source: Internet

Author: User

How checkpoints work:

InnoDB automatically maintains a checkpoint mechanism known as fuzzy checkpointing (the sharp checkpoint is the other kind of checkpoint). With fuzzy checkpointing, the modified data pages in the buffer pool are flushed to disk in small batches. There is no need to flush the whole buffer pool in a single batch, which would disrupt other SQL statements that are executing.

To support crash recovery, InnoDB also writes a checkpoint record to the log file. All modifications made to the database before the checkpoint are known to be in the data files on disk, so during recovery InnoDB looks up the checkpoint information in the log file and then scans the log forward from that point, reapplying the logged changes (roll forward).

Page modifications are first made in the buffer pool and are flushed to the data files on disk later; a background flushing process handles this. A checkpoint, then, is a record of the last modification that has been written to the data files, expressed as an LSN (log sequence number).
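The checkpoint and roll-forward idea described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not InnoDB's implementation: the class `ToyEngine` and all of its fields are hypothetical names, LSNs are plain counters, and a "page" is a single value.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEngine:
    data_file: dict = field(default_factory=dict)    # page_id -> value on disk
    buffer_pool: dict = field(default_factory=dict)  # page_id -> (value, lsn)
    redo_log: list = field(default_factory=list)     # (lsn, page_id, value)
    checkpoint_lsn: int = 0
    next_lsn: int = 1

    def modify(self, page_id, value):
        """Log the change first, then dirty the page in memory."""
        lsn = self.next_lsn
        self.next_lsn += 1
        self.redo_log.append((lsn, page_id, value))
        self.buffer_pool[page_id] = (value, lsn)
        return lsn

    def checkpoint(self):
        """Flush every dirty page, then record the last flushed LSN."""
        for page_id, (value, _lsn) in self.buffer_pool.items():
            self.data_file[page_id] = value
        self.buffer_pool.clear()
        self.checkpoint_lsn = self.next_lsn - 1

    def crash(self):
        """Lose everything in memory; the data file and redo log survive."""
        self.buffer_pool.clear()

    def recover(self):
        """Roll forward: replay only the records newer than the checkpoint."""
        replayed = 0
        for lsn, page_id, value in self.redo_log:
            if lsn > self.checkpoint_lsn:
                self.data_file[page_id] = value
                replayed += 1
        return replayed


engine = ToyEngine()
engine.modify("p1", "a")     # LSN 1
engine.modify("p2", "b")     # LSN 2
engine.checkpoint()          # everything up to LSN 2 is now on disk
engine.modify("p1", "c")     # LSN 3, only in memory and in the redo log
engine.crash()
replayed = engine.recover()  # only the record after the checkpoint is replayed
```

In this sketch, recovery never has to look at log records older than the checkpoint LSN, which is the point of recording it.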

Here is a quick look at the MySQL processes and mechanisms associated with checkpoints:

Fuzzy checkpointing: A background process periodically flushes a portion of the dirty pages in the buffer pool to disk.

Sharp checkpoint: Flushes all dirty pages in the buffer pool to the data files at once. It occurs before MySQL reuses a log file. Because the log files are used circularly, sharp checkpoints can occur frequently under high load.

Adaptive flushing: An algorithm that reduces the I/O burden caused by checkpoints. Instead of flushing all dirty pages at once, adaptive flushing flushes only a portion of the dirty pages at a time, and it automatically calculates a suitable flushing interval based on the speed and frequency of data flushing.

Flush: Writes changes to the data files on disk. In the InnoDB storage structure, the redo log, the undo log, and the buffer pool are all flushed regularly. When does a flush happen? One case is when a memory area is full: a new change needs buffer pool space to hold it, so a flush is triggered to make room. When there is no immediate need to write the entire buffer pool to disk, the work is normally done a little at a time by the fuzzy checkpointing process.

With those terms in place, how does a checkpoint actually work? Read on:

InnoDB's checkpoint algorithm is not well documented, because it is difficult to understand; you need to understand many other parts of InnoDB before checkpoints make sense.

The first thing to know is that there are two kinds of checkpoint: the sharp checkpoint and the fuzzy checkpoint.

As described above, a sharp checkpoint flushes all dirty pages in the buffer pool to the data files at once and records the LSN (log sequence number) of the most recently committed transaction. In this simplified description, changes made by uncommitted transactions are not flushed to disk. This differs from SQL Server, which flushes both committed and uncommitted changes to disk, something that appears to violate the rules of write-ahead logging. During recovery, redo starts from that last LSN, which is where the checkpoint occurred. A sharp checkpoint is "sharp" because all the data it writes to disk is consistent as of a single point in time, namely the LSN at which the checkpoint took place.

The fuzzy checkpoint is more complex. Rather than happening at a single point in time, it flushes pages gradually as time passes, until it has flushed every page that a sharp checkpoint would have flushed. When it completes, it records two LSNs: the LSN at which the checkpoint started and the LSN at which it ended. The flushed pages are not necessarily consistent with each other as of a single point in time, which is why the checkpoint is called fuzzy. A page that was flushed early may have been modified again since then, and a page that was flushed late may carry an LSN newer than the starting LSN. Conceptually, a fuzzy checkpoint can be turned into a sharp one by applying the redo log from the starting LSN to the ending LSN. During recovery, redo therefore starts from the LSN at which the last checkpoint began.
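To make the "fuzzy" part concrete, here is a small simulation. It is a sketch under stated assumptions rather than InnoDB's code: `fuzzy_checkpoint` and `redo` are hypothetical helper names, each page is a `(value, lsn)` pair, and exactly one user write is allowed to slip in after each page flush.

```python
def fuzzy_checkpoint(memory, disk, log, concurrent_writes):
    """Flush dirty pages one at a time while new writes keep arriving.

    memory and disk map page_id -> (value, lsn); log is a list of
    (lsn, page_id, value). concurrent_writes is a list of (page_id, value)
    applied between page flushes to imitate user activity.
    """
    start_lsn = log[-1][0] if log else 0
    to_flush = sorted(memory, key=lambda p: memory[p][1])  # oldest first
    pending = list(concurrent_writes)
    for page_id in to_flush:
        disk[page_id] = memory[page_id]      # flush the page's current version
        if pending:                          # a user write slips in
            pid, value = pending.pop(0)
            lsn = log[-1][0] + 1
            log.append((lsn, pid, value))
            memory[pid] = (value, lsn)
    end_lsn = log[-1][0]
    return start_lsn, end_lsn


def redo(disk, log, from_lsn):
    """Replay records after from_lsn, skipping pages that are already newer."""
    for lsn, page_id, value in log:
        if lsn > from_lsn and disk.get(page_id, (None, 0))[1] < lsn:
            disk[page_id] = (value, lsn)


log = [(1, "A", "a1"), (2, "B", "b1"), (3, "C", "c1")]
memory = {"A": ("a1", 1), "B": ("b1", 2), "C": ("c1", 3)}
disk = {}
start_lsn, end_lsn = fuzzy_checkpoint(
    memory, disk, log, [("A", "a2"), ("C", "c2")]
)
fuzzy_image = dict(disk)    # A is from LSN 1, but C is already from LSN 5
redo(disk, log, start_lsn)  # replaying from the start LSN repairs the image
```

After the checkpoint, the disk image holds page A as of LSN 1 and page C as of LSN 5, a combination that never existed at any single moment. Replaying the log from the starting LSN brings the image to a consistent state as of the ending LSN.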

Fuzzy checkpoints are generally far more frequent than sharp checkpoints. A sharp checkpoint is triggered when the database is shut down or when redo log files are switched; the rest of the time, fuzzy checkpoints are the norm.

During normal operation, a sharp checkpoint does not occur; instead, fuzzy checkpoints keep occurring as time goes on. Flushing dirty pages is a routine, everyday operation of the database.

InnoDB maintains a large buffer pool so that modified data does not have to be written to disk immediately. It keeps modified pages in the buffer pool in the hope that they will be modified many times before they are written to disk, which is called write combining. Pages in the buffer pool are managed through several lists: the free list records the pages that are available for use, the LRU list orders pages by how recently they were accessed, and the flush list records all dirty pages in LSN order, least recently modified first.
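The three lists and the effect of write combining can be sketched as follows. This is a toy model, not InnoDB's data structures: `ToyBufferPool` and its methods are hypothetical names, and a counter stands in for actual disk writes.

```python
from collections import OrderedDict


class ToyBufferPool:
    def __init__(self, capacity):
        self.free_list = list(range(capacity))  # unused frame numbers
        self.lru_list = OrderedDict()    # page_id -> frame, least recent first
        self.flush_list = OrderedDict()  # page_id -> LSN of first modification
        self.disk_writes = 0
        self.next_lsn = 1

    def access(self, page_id):
        """Bring a page into the pool if needed and mark it most recently used."""
        if page_id not in self.lru_list:
            if not self.free_list:
                self._evict()
            self.lru_list[page_id] = self.free_list.pop()
        self.lru_list.move_to_end(page_id)

    def modify(self, page_id):
        """Dirty a page; it joins the flush list only on its first change."""
        self.access(page_id)
        lsn = self.next_lsn
        self.next_lsn += 1
        self.flush_list.setdefault(page_id, lsn)
        return lsn

    def flush_oldest(self):
        """Write the least recently modified dirty page to disk."""
        page_id, _lsn = self.flush_list.popitem(last=False)
        self.disk_writes += 1
        return page_id

    def _evict(self):
        """Free the least recently used frame, flushing it first if dirty."""
        page_id, frame = self.lru_list.popitem(last=False)
        if page_id in self.flush_list:
            del self.flush_list[page_id]
            self.disk_writes += 1
        self.free_list.append(frame)


pool = ToyBufferPool(capacity=2)
for _ in range(5):
    pool.modify("p1")          # five changes to the same page, LSN 1 to 5
pool.modify("p2")              # LSN 6
flushed = pool.flush_oldest()  # p1 goes to disk once, carrying all five changes
```

Five modifications to `p1` cost a single disk write here, which is the benefit write combining is after. Keeping the flush list in order of first modification is what lets the oldest dirty page be found at its head.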

Focus on the flush list here. The InnoDB buffer pool is limited in size. If the buffer pool is full and new data has to be read from disk, a page must be freed first, which may require a flush. Because that takes time, InnoDB flushes pages regularly in the background, so that enough clean pages are always available and a read does not have to wait for a flush. Each round of flushing evicts the oldest pages from the flush list, which also helps keep the buffer pool hit ratio high. Whether a page counts as old or new is decided by its position and by its LSN (the time of its last modification).

InnoDB's log files are used circularly, but log records are never overwritten while the page changes they describe have not yet been flushed to disk. If log records for unflushed changes were overwritten and the database then went down, all of those changes would be lost. As a result, data modification is also constrained by log space, because new and running transactions need log space too. The larger the log, the looser this constraint. Each fuzzy checkpoint flushes the oldest, least recently accessed data, so the log records that are overwritten next always describe changes that are already on disk. The LSN of the oldest dirty page is the low-water mark of the transaction log, and InnoDB continually tries to advance this LSN so that the buffer pool has enough free space for new data and the transaction log has enough reusable space when it wraps around. A larger transaction log reduces the urgency of freeing log space, which can improve performance considerably.
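The log-space constraint above comes down to simple arithmetic, sketched here under simplifying assumptions: LSNs are treated as byte offsets into an endless log, the physical log is a single circular area of `log_capacity` bytes, and the function names and numbers are hypothetical.

```python
def log_space_available(log_capacity, current_lsn, oldest_dirty_lsn):
    """Bytes of redo log that can be written without overwriting needed records.

    Records at or after oldest_dirty_lsn still protect unflushed pages, so
    the distance between the two LSNs is log space that cannot be reused yet.
    """
    checkpoint_age = current_lsn - oldest_dirty_lsn
    return log_capacity - checkpoint_age


def can_write(log_capacity, current_lsn, oldest_dirty_lsn, record_size):
    """True if a record fits without wrapping over unflushed changes."""
    return record_size <= log_space_available(
        log_capacity, current_lsn, oldest_dirty_lsn
    )


# A 1000-byte log with 900 bytes still protecting dirty pages: 100 bytes left.
tight = log_space_available(1000, current_lsn=5900, oldest_dirty_lsn=5000)
blocked = can_write(1000, 5900, 5000, record_size=150)

# Flushing old pages advances the low-water mark and frees log space.
after_flush = can_write(1000, 5900, 5600, record_size=150)

# The same workload with a larger log has far more headroom.
roomy = log_space_available(4000, current_lsn=5900, oldest_dirty_lsn=5000)
```

The 150-byte record cannot be written until flushing advances the low-water mark, while the larger log leaves plenty of room without any urgent flushing.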

When InnoDB flushes dirty pages, it finds the LSN of the oldest dirty page, treats it as the low-water mark, and writes it to the header of the transaction log. Dirty pages are therefore always flushed starting from the head of the flush list. Advancing the oldest LSN in this way is, in effect, a checkpoint.
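A minimal sketch of that step follows, assuming the flush list is an ordered mapping from page to the LSN of its first modification and a plain dictionary stands in for the log header. The name `flush_batch` and the sample LSNs are hypothetical.

```python
from collections import OrderedDict


def flush_batch(flush_list, log_header, current_lsn, batch_size):
    """Flush up to batch_size pages from the head, then advance the checkpoint.

    flush_list: OrderedDict of page_id -> oldest-modification LSN, oldest first.
    log_header: dict standing in for the transaction log header.
    """
    flushed = []
    for _ in range(min(batch_size, len(flush_list))):
        page_id, _lsn = flush_list.popitem(last=False)
        flushed.append(page_id)
    if flush_list:
        # Every change older than the oldest remaining dirty page is on disk.
        low_water = next(iter(flush_list.values()))
    else:
        # No dirty pages remain, so the checkpoint can move to the current LSN.
        low_water = current_lsn
    log_header["checkpoint_lsn"] = low_water
    return flushed


flush_list = OrderedDict([("p7", 110), ("p3", 125), ("p9", 140), ("p1", 180)])
header = {"checkpoint_lsn": 100}

first = flush_batch(flush_list, header, current_lsn=200, batch_size=2)
after_first = header["checkpoint_lsn"]  # oldest remaining dirty page is p9

flush_batch(flush_list, header, current_lsn=200, batch_size=5)
```

No separate "checkpoint event" appears anywhere in this sketch: the checkpoint LSN moves forward simply as a side effect of flushing from the head of the list.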

When InnoDB shuts down, it does some extra work. First, it stops all data updates. Second, it flushes the dirty pages in the buffer pool to disk. Third, it records the last LSN; as described above, this is a sharp checkpoint. The LSN is also written to the header of the database data files to mark the position at which the last checkpoint occurred.

The more frequently dirty pages are flushed, the higher the load on the database as a whole; the less frequently, the lower the pressure. Making the log files very large reduces the disk I/O caused by checkpointing. Ideally the total log size is about the same as the size of the buffer pool. However, if the log files are too large, crash recovery takes longer (in versions before MySQL 5.5).

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
