MySQL InnoDB engine details


InnoDB is MySQL's transaction-safe storage engine and the preferred storage engine for core tables in OLTP applications.

The first MySQL storage engine to support transactions was BDB (Berkeley DB).

InnoDB was the first MySQL storage engine to fully support ACID transactions.

InnoDB's key features are row-level locking, MVCC, foreign keys, and consistent non-locking reads, and it is designed to use memory and CPU as efficiently as possible.

Version	Features
Early versions of InnoDB	ACID support, row-level locking, MVCC
InnoDB 1.0.x	Added the Compressed and Dynamic row formats
InnoDB 1.1.x	Added Linux AIO and multiple rollback segments
InnoDB 1.2.x	Added full-text indexes and online index creation

InnoDB architecture

The InnoDB storage engine has multiple memory blocks that together can be regarded as one large memory pool, which is responsible for:

1. Maintaining the internal data structures that all processes/threads need to access

2. Caching data from disk so it can be read quickly, and buffering changes before they are written to the data files on disk

3. Buffering the redo log (the redo log buffer)


The main job of the background threads is to flush data in the memory pool so that the buffer pool caches the most recent data. They also flush modified data to the disk files and ensure that InnoDB can recover to a normal state if the database fails.

Background threads

InnoDB uses a multi-threaded model: several background threads are each responsible for different tasks.

1. Master Thread

The Master Thread is the core background thread. It asynchronously flushes data from the buffer pool to disk to keep data consistent, including flushing dirty pages, merging the insert buffer, and recycling undo pages.

2. IO Thread

InnoDB makes extensive use of AIO (asynchronous IO) to handle write IO requests, which greatly improves database performance. The IO threads are mainly responsible for the callback processing of these IO requests.

Before InnoDB 1.0 there were four IO threads: write, read, insert buffer, and log. On Linux the number of IO threads could not be changed, while on Windows it could be increased with the innodb_file_io_threads parameter.

Starting with InnoDB 1.0.x, the number of read threads and write threads was each increased to four, and innodb_file_io_threads was replaced by two separate parameters, innodb_read_io_threads and innodb_write_io_threads.

You can observe InnoDB's IO threads with the SHOW ENGINE INNODB STATUS command:

SHOW ENGINE INNODB STATUS\G

3. Purge Thread

After a transaction commits, the undo log it used may no longer be needed, so a Purge Thread is needed to reclaim the undo pages that were allocated and used.

Before InnoDB 1.1, purge was performed only inside the Master Thread.

Starting with InnoDB 1.1, purge can run in a separate thread, which reduces the Master Thread's workload and improves CPU utilization and storage engine performance.

To enable a separate Purge Thread, add the following to the MySQL configuration file:

[mysqld]

innodb_purge_threads=1

In InnoDB 1.1, even if innodb_purge_threads is set to a value greater than 1, it is reset to 1 at startup and a corresponding warning is written to the error log.

Starting with InnoDB 1.2, multiple Purge Threads are supported, which further speeds up undo page reclamation. Because the Purge Threads read undo pages at scattered locations, multiple threads also make better use of the disk's random read performance. For example, you can configure four Purge Threads.

4. Page Cleaner Thread

The Page Cleaner Thread was introduced in InnoDB 1.2.x.

Purpose: it moves the dirty-page flushing that earlier versions performed in the Master Thread into a separate thread, reducing the Master Thread's workload and the blocking of user query threads.

Memory

1. Buffer pool

The InnoDB storage engine is disk-based and manages its records in pages, so it can be regarded as a disk-based database system. Because of the gap between CPU and disk speeds, disk-based database systems usually use buffer pool technology to improve overall performance.

The buffer pool is an area of memory that uses the speed of memory to offset the impact of slow disks on database performance. When the database reads a page, it first stores the page read from disk in the buffer pool. The next time the same page is read, the system first checks whether the page is in the buffer pool. If it is (a hit), the page is read directly from the buffer pool; otherwise, it is read from disk.

When the database modifies a page, it first modifies the copy in the buffer pool and then flushes it to disk at a certain frequency. Note that a page is not flushed from the buffer pool to disk every time it is updated; instead, flushing is triggered by a mechanism called checkpoint. This, too, is done to improve overall database performance.
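The read path and deferred write-back described above can be sketched as a toy page cache. This is a minimal Python illustration, not InnoDB code: the class ToyBufferPool, the dict that stands in for the data files, and the checkpoint() method (which here simply writes back every dirty page) are hypothetical, and eviction is omitted.

```python
class ToyBufferPool:
    """Hypothetical read-through page cache with deferred write-back."""

    def __init__(self, disk):
        self.disk = disk          # page_no -> bytes; stands in for the data files
        self.pages = {}           # pages currently cached in memory
        self.dirty = set()        # page numbers modified since the last flush
        self.disk_reads = 0
        self.disk_writes = 0

    def read(self, page_no):
        # Hit: serve the page directly from memory.
        if page_no in self.pages:
            return self.pages[page_no]
        # Miss: read from disk and keep a copy in the pool.
        self.disk_reads += 1
        self.pages[page_no] = self.disk[page_no]
        return self.pages[page_no]

    def write(self, page_no, data):
        # Modify only the cached copy; the disk copy is updated later.
        self.read(page_no)
        self.pages[page_no] = data
        self.dirty.add(page_no)

    def checkpoint(self):
        # Simplification: write every dirty page back to disk.
        for page_no in sorted(self.dirty):
            self.disk[page_no] = self.pages[page_no]
            self.disk_writes += 1
        self.dirty.clear()


disk = {1: b"a", 2: b"b"}
pool = ToyBufferPool(disk)
pool.read(1)
pool.read(1)           # second read of page 1 is a hit
pool.write(2, b"B")    # disk still holds b"b"
pool.write(2, b"BB")   # still no disk write
pool.checkpoint()      # a single write for page 2
```

Two updates to page 2 cost only one disk write, which is the performance benefit of deferring writes until a checkpoint.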

The size of the buffer pool directly affects the overall performance of the database.

The buffer pool caches the following types of pages: index pages, data pages, undo pages, the insert buffer, the adaptive hash index, InnoDB lock information (lock info), data dictionary information, and so on. The buffer pool should not be thought of as caching only index pages and data pages; those simply take up a large part of it.

Starting with InnoDB 1.0.x, multiple buffer pool instances are allowed, and each page is assigned to one of the instances based on its hash value. This reduces resource contention inside the database and increases its capacity for concurrent processing. The number of instances is controlled by innodb_buffer_pool_instances.
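To illustrate why splitting the buffer pool helps, the sketch below maps each page to exactly one instance. The hash (tablespace id shifted left and combined with the page number, modulo the instance count) is a hypothetical stand-in; InnoDB's real hash function differs. The point is only that every page has a single home instance, so threads working on different pages tend to contend on different instances.

```python
from collections import Counter


def instance_for_page(space_id: int, page_no: int, n_instances: int) -> int:
    """Map a page to one buffer pool instance (hypothetical hash)."""
    key = (space_id << 32) | page_no   # combine tablespace id and page number
    return key % n_instances


# Spread 8,000 pages of tablespace 5 over 8 instances.
counts = Counter(instance_for_page(5, p, 8) for p in range(8000))
print(sorted(counts.values()))  # [1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000]
```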

SHOW VARIABLES LIKE 'innodb_buffer_pool_instances'\G

Setting this parameter to a value greater than 1 creates multiple buffer pool instances; you can then run SHOW ENGINE INNODB STATUS\G to observe the status of each instance.

Starting with MySQL 5.6, buffer pool status can also be observed through the INNODB_BUFFER_POOL_STATS table in the information_schema database.

2. LRU List, Free List, and Flush List

The database buffer pool is usually managed with the LRU (Least Recently Used) algorithm: the most frequently used pages sit at the front of the LRU list and the least used pages at the end. When the buffer pool cannot hold a newly read page, the page at the end of the LRU list is released first.

In InnoDB the default page size in the buffer pool is 16 KB, and the buffer pool is also managed with an LRU algorithm. The difference is that InnoDB optimizes the traditional LRU algorithm by adding a midpoint to the LRU list: a newly read page is not placed at the head of the list but at the midpoint. This is called the midpoint insertion strategy. By default, the midpoint is located at 5/8 of the LRU list length.

The midpoint position is controlled by the innodb_old_blocks_pct parameter:

SHOW VARIABLES LIKE 'innodb_old_blocks_pct'\G

The default value is 37, meaning a newly read page is inserted 37% of the way from the tail of the LRU list (roughly the 3/8 point). The part of the list before the midpoint is called the new sublist and holds the most active (hot) pages; the part after it is called the old sublist.

Why use a midpoint

Some SQL operations can flush hot pages out of the buffer pool and reduce its efficiency. Typical examples are index or data scans, which access many or even all pages of a table. The data they read is usually needed only for that query and is not active hot data. If those pages were placed at the head of the LRU list, the truly hot pages would very likely be evicted, and the next time they were needed InnoDB would have to read them from disk again.

innodb_old_blocks_time

This parameter specifies how long, in milliseconds, a page must wait after being read into the midpoint position before it can be moved to the hot end of the LRU list. So before running a scan-type SQL operation like those described above, you can use the following setting to keep the hot data in the LRU list from being flushed out as far as possible:

SET GLOBAL innodb_old_blocks_time=1000;

If you estimate that your active hot data is more than 63% of the buffer pool (the default size of the new sublist, 100% - 37%), you can also lower innodb_old_blocks_pct before running the SQL statement (for example, SET GLOBAL innodb_old_blocks_pct=20;) to reduce the probability of hot pages being flushed out.
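The midpoint insertion strategy and innodb_old_blocks_time can be modeled with a short Python class. This is a simplified sketch under stated assumptions: the class MidpointLRU and its run() helper are hypothetical, the list head is the hot end, and a page already in the new sublist is never moved (real InnoDB has further refinements). Setting old_pct=100 turns it into a plain LRU that inserts at the head, which makes the comparison easy.

```python
class MidpointLRU:
    """Toy LRU list with midpoint insertion (hypothetical, simplified)."""

    def __init__(self, capacity, old_pct=37, old_time_ms=0):
        self.capacity = capacity
        self.old_pct = old_pct            # plays the role of innodb_old_blocks_pct
        self.old_time_ms = old_time_ms    # plays the role of innodb_old_blocks_time
        self.pages = []                   # index 0 = head (hot end), last = tail
        self.first_access = {}            # page -> time it entered the list
        self.made_young = 0
        self.not_made_young = 0

    def _midpoint(self):
        # Pages at indexes >= midpoint form the old sublist.
        old_len = len(self.pages) * self.old_pct // 100
        return len(self.pages) - old_len

    def access(self, page, now_ms):
        if page in self.pages:
            if self.pages.index(page) >= self._midpoint():  # in the old sublist
                if now_ms - self.first_access[page] >= self.old_time_ms:
                    self.pages.remove(page)
                    self.pages.insert(0, page)              # page made young
                    self.made_young += 1
                else:
                    self.not_made_young += 1                # page not made young
            return
        if len(self.pages) >= self.capacity:
            victim = self.pages.pop()                       # evict from the tail
            del self.first_access[victim]
        self.pages.insert(self._midpoint(), page)           # midpoint insertion
        self.first_access[page] = now_ms


def run(old_pct, old_time_ms):
    """Touch hot pages A and B, then scan S1..S4 (two rows per page)."""
    lru = MidpointLRU(capacity=4, old_pct=old_pct, old_time_ms=old_time_ms)
    t = 0
    for page in ["A", "B", "A", "B"]:
        lru.access(page, t)
        t += 1
    for page in ["S1", "S2", "S3", "S4"]:
        lru.access(page, t)   # first row on the page
        lru.access(page, t)   # second row, same millisecond
        t += 1
    return lru


midpoint = run(old_pct=37, old_time_ms=1000)
plain = run(old_pct=100, old_time_ms=0)
print(midpoint.pages, plain.pages)  # ['A', 'B', 'S4', 'S3'] ['S4', 'S3', 'S2', 'S1']
```

With midpoint insertion and a nonzero old-blocks time, the hot pages A and B survive the scan; with a plain LRU, the scan evicts them. With only four slots the integer rounding of the old sublist size is coarse, so the example is meant only to show the contrast.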

The LRU list manages pages that have already been read, but when the database starts, the LRU list is empty and all pages are on the Free list. When a page needs to be allocated from the buffer pool, InnoDB first looks for a free page on the Free list. If one is available, it is removed from the Free list and added to the LRU list; otherwise, a page at the end of the LRU list is evicted according to the LRU algorithm and its memory is given to the new page. When a page moves from the old part of the LRU list to the new part, the operation is called page made young; when a page is kept from moving from the old part to the new part because of the innodb_old_blocks_time setting, the operation is called page not made young.

You can observe the usage and running state of the LRU list and the Free list with SHOW ENGINE INNODB STATUS. Note that this output is not the instantaneous current state but covers a recent time window: a line such as "Per second averages calculated from the last 24 seconds" means the figures are averages over the past 24 seconds.

Pages made young shows how many times pages in the LRU list have been moved to the front. If innodb_old_blocks_time is left at 0 while the server is running, the not young count stays at 0. youngs/s and non-youngs/s show the per-second rates of these two operations.

Buffer pool hit rate shows the buffer pool hit ratio. A value close or equal to 1 indicates that the buffer pool is running in very good shape; normally it should not fall below 95%. If it does, check whether the LRU list is being polluted by full table scans.
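The hit rate can also be approximated from two server status counters, Innodb_buffer_pool_read_requests (logical page read requests) and Innodb_buffer_pool_reads (reads that had to go to disk), as reported by SHOW GLOBAL STATUS. The sketch below assumes you have already fetched those two values; note that they are cumulative since startup, whereas the figure in SHOW ENGINE INNODB STATUS covers only the recent sampling window. The function names and the 95% check are illustrative.

```python
def buffer_pool_hit_rate(read_requests: int, disk_reads: int) -> float:
    """Fraction of logical page reads served without going to disk."""
    if read_requests == 0:
        return 1.0  # no reads yet: treat as a perfect hit rate
    return 1.0 - disk_reads / read_requests


def needs_investigation(hit_rate: float, threshold: float = 0.95) -> bool:
    """Flag hit rates below the ~95% guideline (e.g. LRU pollution by scans)."""
    return hit_rate < threshold


# Example counter values (illustrative, not from a real server).
rate = buffer_pool_hit_rate(read_requests=1_000_000, disk_reads=20_000)
print(f"{rate:.2%}", needs_investigation(rate))  # 98.00% False
```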

INNODB_BUFFER_POOL_STATS (a table in information_schema): shows the running status of the buffer pool.

INNODB_BUFFER_PAGE_LRU (also in information_schema): shows details of each page in the LRU list.

Starting with InnoDB 1.0.x, page compression is supported: a 16 KB page can be compressed into a 1 KB, 2 KB, 4 KB, or 8 KB page. Because compressed pages differ in size, the LRU list was changed slightly: pages that are not 16 KB are managed through the unzip_LRU list, whose status appears in the output of SHOW ENGINE INNODB STATUS.

Note that the LRU len value includes the pages counted in unzip_LRU len.

The unzip_LRU list manages pages of each compressed size separately and allocates memory with the buddy algorithm. For example, requesting a 4 KB page from the buffer pool works as follows:

1. Check the 4 KB unzip_LRU list for an available free page.

2. If there is one, use it directly.

3. Otherwise, check the 8 KB unzip_LRU list.

4. If a free page is available there, split it into two 4 KB pages and store them in the 4 KB unzip_LRU list.

5. If no free page can be obtained, request a 16 KB page from the LRU list and split it into one 8 KB page and two 4 KB pages, which are stored in the corresponding unzip_LRU lists.
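The allocation steps above follow the buddy algorithm. Here is a minimal Python sketch, assuming a hypothetical ToyUnzipAllocator that keeps one free list per compressed size (standing in for the per-size unzip_LRU lists) and identifies blocks by (page id, offset in KB, size in KB). Freeing blocks and merging buddies back together are omitted.

```python
SIZES_KB = [1, 2, 4, 8, 16]


class ToyUnzipAllocator:
    """Hypothetical buddy-style allocator for compressed page frames."""

    def __init__(self):
        # One free list per compressed size (1-8 KB).
        self.free = {size: [] for size in SIZES_KB[:-1]}
        self.full_pages_taken = 0   # 16 KB pages requested from the LRU list

    def alloc(self, size_kb):
        if self.free[size_kb]:
            return self.free[size_kb].pop()    # a free block of this size exists
        # Look for the smallest larger free block, up to a fresh 16 KB page.
        bigger = size_kb * 2
        while bigger < 16 and not self.free[bigger]:
            bigger *= 2
        if bigger == 16:
            self.full_pages_taken += 1
            block = (self.full_pages_taken, 0, 16)
        else:
            block = self.free[bigger].pop()
        # Halve repeatedly, keeping the upper half (the "buddy") as free.
        page_id, offset, size = block
        while size > size_kb:
            size //= 2
            self.free[size].append((page_id, offset + size, size))
        return (page_id, offset, size)


alloc = ToyUnzipAllocator()
print(alloc.alloc(4))   # (1, 0, 4): a fresh 16 KB page was split
print(alloc.free)       # {1: [], 2: [], 4: [(1, 4, 4)], 8: [(1, 8, 8)]}
```

The first 4 KB request leaves exactly one free 8 KB block and one free 4 KB block, matching step 5; the next two 4 KB requests are served by steps 2 and 4 without taking another 16 KB page.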

The pages in the unzip_LRU list can also be observed through the INNODB_BUFFER_PAGE_LRU table in information_schema.

When a page in the LRU list is modified, it is called a dirty page: the copy of the page in the buffer pool no longer matches the copy on disk. The database then flushes dirty pages back to disk through the checkpoint mechanism. The Flush list is the list of dirty pages. A dirty page is on both the LRU list and the Flush list: the LRU list manages the availability of pages in the buffer pool, while the Flush list manages flushing pages back to disk, and the two do not interfere with each other.

3. Redo log buffer

InnoDB first writes redo log records into this buffer and then flushes them to the redo log files at a certain frequency.

Parameter: innodb_log_buffer_size, 8 MB by default (16 MB as of MySQL 5.7.6)

The redo log buffer is flushed to the redo log files on disk in the following cases:

1. The Master Thread flushes the redo log buffer to the redo log files every second

2. Each time a transaction commits, the redo log buffer is flushed to the redo log files (with the default innodb_flush_log_at_trx_commit=1)

3. When less than half of the redo log buffer is free, it is flushed to the redo log files
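The three flush triggers can be modeled with a small class. This is a hypothetical sketch: ToyRedoLogBuffer and its method names are invented for illustration, byte counts stand in for real log records, and the commit trigger assumes innodb_flush_log_at_trx_commit=1.

```python
class ToyRedoLogBuffer:
    """Hypothetical model of the three redo log buffer flush triggers."""

    def __init__(self, size_bytes):
        self.size = size_bytes
        self.used = 0
        self.flushed_bytes = 0
        self.flushes = []            # reason recorded for each flush

    def _flush(self, reason):
        if self.used:
            self.flushed_bytes += self.used
            self.used = 0
            self.flushes.append(reason)

    def append(self, n_bytes):
        self.used += n_bytes
        # Trigger 3: less than half of the buffer is still free.
        if self.size - self.used < self.size / 2:
            self._flush("half_full")

    def commit(self):
        self._flush("commit")        # trigger 2: transaction commit

    def master_thread_tick(self):
        self._flush("every_second")  # trigger 1: once per second


buf = ToyRedoLogBuffer(size_bytes=100)
buf.append(30)
buf.master_thread_tick()   # flushes 30 bytes
buf.append(60)             # only 40 bytes free, so it flushes immediately
buf.append(10)
buf.commit()
print(buf.flushes)         # ['every_second', 'half_full', 'commit']
```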

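4. Extra memory pool

InnoDB manages memory through a mechanism called a memory heap. Memory for some of its internal data structures is allocated from the extra memory pool, and when that area runs short, memory is requested from the buffer pool.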

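4. Checkpoint technology

To avoid data loss, InnoDB, like most transactional databases, uses a Write Ahead Log policy: when a transaction commits, the redo log is written first and the page is modified afterward. If data is lost because of a crash, it is recovered from the redo log.

If the redo log could grow without limit and the buffer pool could hold all data, dirty pages would never need to be flushed to disk, and a crash could be recovered from the redo log alone. That approach, however, depends on the following conditions:

1. The buffer pool can cache all data in the database.

This is hard to guarantee for production databases: as a database grows over time, memory is usually not large enough to cache all of its data.

2. The redo log can grow without limit.

This is costly and hard to operate: a DBA or SA cannot easily tell when the redo log is approaching the disk's available space, and making storage dynamically expandable requires special techniques and hardware support.

3. Recovery time must be acceptable.

Even if both conditions held, a database that had run for months or years would need a very long time to replay its redo log after a crash, so recovery would be very costly.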

Checkpoint technology aims to solve the following problems:

1. Shorten database recovery time

2. Flush dirty pages to disk when the buffer pool runs short

3. Flush dirty pages when the redo log becomes unavailable

With checkpoints, the database does not need to replay the entire redo log after a crash, because all pages before the checkpoint have already been flushed to disk. It only needs to recover from the redo log written after the checkpoint, which greatly shortens recovery time.
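The recovery shortcut can be illustrated with a toy redo replay. In this hypothetical sketch, each RedoRecord carries an LSN, a page number, and the page's new contents (real redo records describe physical changes, not full page images). The recover() function skips every record at or below checkpoint_lsn, as well as any record a page already reflects according to its page LSN.

```python
from dataclasses import dataclass


@dataclass
class RedoRecord:
    lsn: int        # position of the record in the redo log
    page_no: int
    value: str      # new page contents (stand-in for a physical change)


def recover(disk_pages, page_lsn, redo_log, checkpoint_lsn):
    """Replay only the redo written after the checkpoint (simplified)."""
    replayed = 0
    for rec in redo_log:
        if rec.lsn <= checkpoint_lsn:
            continue  # already reflected in the data files
        if page_lsn.get(rec.page_no, 0) >= rec.lsn:
            continue  # page was flushed after this change was logged
        disk_pages[rec.page_no] = rec.value
        page_lsn[rec.page_no] = rec.lsn
        replayed += 1
    return replayed


log = [RedoRecord(100, 1, "a1"), RedoRecord(200, 2, "b1"), RedoRecord(300, 1, "a2")]
disk = {1: "a1", 2: "b1"}          # pages flushed up to the checkpoint
page_lsn = {1: 100, 2: 200}
print(recover(disk, page_lsn, log, checkpoint_lsn=200), disk)  # 1 {1: 'a2', 2: 'b1'}
```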

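When the buffer pool runs short, the least recently used pages are evicted according to the LRU algorithm. If an evicted page is dirty, a checkpoint must be forced to flush the dirty page, that is, the newest version of the page, back to disk.

Redo log reuse: the redo log becomes unavailable because it is designed to be used cyclically rather than to grow without limit. The part of the redo log that can be reused is the part no longer needed for crash recovery. If the redo log still needs space that has not yet been released, a checkpoint must be forced to flush pages in the buffer pool at least up to the current position of the redo log.

InnoDB marks versions with the LSN (Log Sequence Number), an 8-byte number measured in bytes. Each page has an LSN, the redo log has an LSN, and the checkpoint has an LSN. These values can be observed in the LOG section of SHOW ENGINE INNODB STATUS.

Whatever its type, a checkpoint flushes dirty pages from the buffer pool to disk; checkpoints differ in how many pages are flushed each time, where the dirty pages are taken from, and when the checkpoint is triggered.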

Checkpoint types

1. Sharp checkpoint

Flushes all dirty pages to disk when the database shuts down. This is the default shutdown behavior.

Parameter: innodb_fast_shutdown=1

2. Fuzzy checkpoint

If the database used sharp checkpoints while running, its availability would suffer greatly. Therefore, during normal operation InnoDB uses fuzzy checkpoints, which flush only some of the dirty pages rather than all of them back to disk.

Fuzzy checkpoints occur in the following situations:

1. Master Thread checkpoint

Every second or every ten seconds, the Master Thread flushes a certain percentage of pages from the buffer pool's dirty page list to disk. This happens asynchronously, so user query threads are not blocked.

2. FLUSH_LRU_LIST checkpoint

InnoDB needs to keep about 100 free pages available in the LRU list. Before InnoDB 1.1.x, this check was performed in the user query thread, so it blocked the user's query. If fewer than 100 free pages are available, InnoDB evicts pages from the end of the LRU list, and if any of those pages are dirty, a checkpoint is required.

Starting with InnoDB 1.2.x, this check is performed in the separate Page Cleaner Thread, and the number of free pages to keep available in the LRU list is controlled by innodb_lru_scan_depth, which defaults to 1024.
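Here is a simplified model of this check. It assumes a hypothetical flush_lru_tail() that treats innodb_lru_scan_depth as both the number of free pages to keep and the maximum number of tail pages scanned per pass; the real page cleaner logic is more involved. Clean pages at the tail are reclaimed directly, while dirty ones are counted as written to disk first.

```python
def flush_lru_tail(lru, free_list, dirty, lru_scan_depth=1024):
    """Refill the free list from the LRU tail (hypothetical simplification).

    lru: list of page ids, head first; dirty: set of dirty page ids.
    Returns how many dirty pages had to be written before reuse.
    """
    written = 0
    scanned = 0
    while len(free_list) < lru_scan_depth and lru and scanned < lru_scan_depth:
        page = lru.pop()             # take the page at the tail
        scanned += 1
        if page in dirty:
            dirty.discard(page)      # must be flushed to disk before reuse
            written += 1
        free_list.append(page)
    return written


lru = list(range(10))       # page ids, head first
free_list, dirty = [], {3, 8, 9}
written = flush_lru_tail(lru, free_list, dirty, lru_scan_depth=4)
print(written, free_list, lru)   # 2 [9, 8, 7, 6] [0, 1, 2, 3, 4, 5]
```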

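3. Async/Sync Flush checkpoint

When the redo log is about to become unavailable, some pages must be forcibly flushed to disk; these dirty pages are taken from the dirty page list. Let redo_lsn be the LSN already written to the redo log and checkpoint_lsn be the LSN of the latest page flushed to disk. Then:

checkpoint_age = redo_lsn - checkpoint_lsn

async_water_mark = 75% * total_redo_log_file_size
sync_water_mark = 90% * total_redo_log_file_size

If checkpoint_age < async_water_mark, no pages need to be flushed. If async_water_mark <= checkpoint_age < sync_water_mark, an Async Flush is triggered, flushing enough dirty pages from the Flush list that checkpoint_age falls back below async_water_mark. If checkpoint_age >= sync_water_mark, a Sync Flush is triggered with the same goal; this is rare unless the redo log files are too small and a bulk load such as LOAD DATA is running. Async/Sync Flush checkpoints thus ensure that the redo log remains reusable.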
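The water marks translate directly into a decision function. This sketch assumes async_water_mark at 75% and sync_water_mark at 90% of the total redo log size, as in the simplified model above (the actual InnoDB source computes its own internal margins); flush_decision() is a hypothetical name. For example, with two 1 GB redo log files, async_water_mark is 1.5 GB and sync_water_mark is 1.8 GB.

```python
GIB = 1024 ** 3
ASYNC_PCT = 0.75  # async_water_mark as a fraction of total redo log size
SYNC_PCT = 0.90   # sync_water_mark as a fraction of total redo log size (assumed)


def flush_decision(redo_lsn: int, checkpoint_lsn: int,
                   total_redo_log_file_size: int) -> str:
    """Return which flush, if any, the checkpoint age calls for."""
    checkpoint_age = redo_lsn - checkpoint_lsn
    async_water_mark = ASYNC_PCT * total_redo_log_file_size
    sync_water_mark = SYNC_PCT * total_redo_log_file_size
    if checkpoint_age < async_water_mark:
        return "none"
    if checkpoint_age < sync_water_mark:
        return "async"   # flush until checkpoint_age < async_water_mark
    return "sync"        # same goal, but more urgent


total = 2 * GIB  # two 1 GB redo log files
for age_gib in (1.0, 1.6, 1.9):
    print(age_gib, flush_decision(int(age_gib * GIB), 0, total))
# 1.0 none / 1.6 async / 1.9 sync
```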

Before InnoDB 1.2.x, an Async Flush checkpoint blocked the user query thread that detected the problem, while a Sync Flush checkpoint blocked all user query threads and waited for the dirty pages to be flushed. Starting with InnoDB 1.2.x, these flushes are also performed in the Page Cleaner Thread, so user query threads are no longer blocked.

The official MySQL release does not show whether a flush was triggered from the Flush list or the LRU list, nor how many Async/Sync Flushes the redo log caused. The InnoSQL branch, however, reports this in the output of SHOW ENGINE INNODB STATUS.

4. Dirty Page too much checkpoint

When there are too many dirty pages, InnoDB forces a checkpoint. The goal, again, is to ensure that enough pages are available in the buffer pool.

Parameter: innodb_max_dirty_pages_pct. When the percentage of dirty pages in the buffer pool exceeds this value, a checkpoint is forced to flush some of them to disk.

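5. How the Master Thread works

Master Thread before InnoDB 1.0.x

The Master Thread has the highest priority of all the threads. Internally it consists of several loops: the main loop (loop), the background loop, the flush loop, and the suspend loop. The Master Thread switches among these loops according to the state of the database.

Main loop

Most operations take place in the main loop, which has two parts: operations performed every second and operations performed every ten seconds. The loop is implemented with thread sleep, so "every second" and "every ten seconds" are approximate; under heavy load, the operations may be delayed.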

Operations performed every second:

1. Flush the log buffer to disk, even if the transaction has not yet committed (always)

2. Merge the insert buffer (possibly)

3. Flush up to 100 dirty pages from the InnoDB buffer pool to disk (possibly)

4. Switch to the background loop if there is no user activity (possibly)

This explains why even a large transaction commits quickly: InnoDB flushes the contents of the redo log buffer to the redo log file every second, even before the transaction commits, so little remains to be written at commit time.

The insert buffer is not necessarily merged every second. InnoDB checks whether fewer than 5 I/O operations occurred in the current second; if so, it considers the I/O load low and merges the insert buffer.

Likewise, 100 dirty pages are not flushed every second. InnoDB checks whether the proportion of dirty pages in the buffer pool (buf_get_modified_ratio_pct) exceeds innodb_max_dirty_pages_pct (90 by default in these versions, that is, 90%). If it does, InnoDB decides that disk synchronization is needed and writes 100 dirty pages to disk.
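The per-second decisions can be written as a small function. The name master_thread_one_second(), its inputs, and the action labels are hypothetical; the thresholds (fewer than 5 I/Os, innodb_max_dirty_pages_pct defaulting to 90) are the ones described above.

```python
def master_thread_one_second(io_ops_this_second: int, dirty_pct: float,
                             max_dirty_pages_pct: float = 90,
                             user_active: bool = True) -> list:
    """Actions of one per-second pass of the pre-1.0.x main loop."""
    actions = ["flush_log_buffer"]                 # always, even mid-transaction
    if io_ops_this_second < 5:
        actions.append("merge_insert_buffer")      # only when I/O load is low
    if dirty_pct > max_dirty_pages_pct:
        actions.append("flush_100_dirty_pages")    # too many dirty pages
    if not user_active:
        actions.append("goto_background_loop")
    return actions


print(master_thread_one_second(io_ops_this_second=2, dirty_pct=95))
# ['flush_log_buffer', 'merge_insert_buffer', 'flush_100_dirty_pages']
```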

Operations performed every ten seconds:

1. Flush 100 dirty pages to disk (possibly)

2. Merge up to 5 insert buffers (always)

3. Flush the log buffer to disk (always)

4. Delete useless undo pages (always)

5. Flush 100 or 10 dirty pages to disk (always)

The process is as follows:

1. InnoDB first checks whether fewer than 200 disk I/O operations occurred in the past 10 seconds.

2. If so, InnoDB considers that it has enough I/O capacity and flushes 100 dirty pages to disk.

3. InnoDB merges the insert buffer. Unlike the per-second merge, this merge is always performed at this stage.

4. InnoDB flushes the log buffer to disk again, the same operation that happens every second.

5. InnoDB performs a full purge to delete useless undo pages. When a table is updated or rows are deleted, the original rows are only marked as deleted, because consistent reads may still need the earlier versions. During the full purge, InnoDB checks whether these deleted rows can really be removed, since some queries may still need to read the previous version; if a row can be deleted, InnoDB deletes it immediately.

6. If buf_get_modified_ratio_pct is above 70%, InnoDB flushes 100 dirty pages to disk; otherwise it flushes only 10 dirty pages.
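The six steps can be condensed into one function. The name master_thread_ten_seconds() and the (action, count) tuples are hypothetical labels; the thresholds (200 I/Os, 70% dirty pages) and page counts are the ones listed above.

```python
def master_thread_ten_seconds(io_ops_last_10s: int, dirty_pct: float) -> list:
    """Actions of the pre-1.0.x main loop's every-ten-seconds part."""
    actions = []
    if io_ops_last_10s < 200:
        actions.append(("flush_dirty_pages", 100))  # spare I/O capacity
    actions.append(("merge_insert_buffer", 5))      # always
    actions.append(("flush_log_buffer", None))      # always
    actions.append(("full_purge", None))            # delete useless undo pages
    if dirty_pct > 70:
        actions.append(("flush_dirty_pages", 100))
    else:
        actions.append(("flush_dirty_pages", 10))
    return actions


busy = master_thread_ten_seconds(io_ops_last_10s=500, dirty_pct=30)
print(busy)
# [('merge_insert_buffer', 5), ('flush_log_buffer', None), ('full_purge', None), ('flush_dirty_pages', 10)]
```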

Background loop: entered when there is no user activity (the database is idle) or when the database is shutting down.

Its operations are:

1. Delete useless undo pages (always)

2. Merge 20 insert buffers (always)

3. Jump back to the main loop (always)

4. Keep flushing 100 pages at a time until the condition is met (possibly; this is done by jumping to the flush loop)

If there is nothing left to do in the flush loop, InnoDB switches to the suspend loop, which suspends the Master Thread until an event occurs. If the InnoDB storage engine is enabled but no InnoDB tables are used, the Master Thread stays suspended.
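The transitions among the background, flush, and suspend loops can be sketched as a tiny state machine. Everything here is hypothetical: ToyMasterThread, its trace labels, and run_idle(), which starts in the background loop and stops once control would return to the main loop (immediately if a user is active, otherwise after being woken from the suspend loop).

```python
class ToyMasterThread:
    """Hypothetical model of the background, flush, and suspend loops."""

    def __init__(self, dirty_pages, total_pages, max_dirty_pages_pct=90):
        self.dirty_pages = dirty_pages
        self.total_pages = total_pages
        self.max_dirty_pages_pct = max_dirty_pages_pct
        self.user_active = False
        self.trace = []

    def dirty_pct(self):
        return 100 * self.dirty_pages / self.total_pages

    def background_loop(self):
        self.trace.append("full_purge")                # delete useless undo pages
        self.trace.append("merge_20_insert_buffers")
        return "main_loop" if self.user_active else "flush_loop"

    def flush_loop(self):
        flushed = min(100, self.dirty_pages)
        self.dirty_pages -= flushed
        self.trace.append(f"flush_{flushed}_pages")
        if self.dirty_pct() > self.max_dirty_pages_pct:
            return "flush_loop"                        # condition not met yet
        return "suspend_loop"

    def suspend_loop(self):
        self.trace.append("suspend")                   # sleep until an event arrives
        return "main_loop"

    def run_idle(self):
        state = "background_loop"
        while state != "main_loop":
            state = getattr(self, state)()
        return self.trace


idle = ToyMasterThread(dirty_pages=950, total_pages=1000, max_dirty_pages_pct=75)
print(idle.run_idle())
# ['full_purge', 'merge_20_insert_buffers', 'flush_100_pages', 'flush_100_pages', 'suspend']
```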

Master Thread before InnoDB 1.2.x

InnoDB 1.0.x introduced the innodb_io_capacity parameter, which indicates the disk's I/O throughput and defaults to 200. The number of pages flushed to disk is now controlled as a percentage of innodb_io_capacity:

1. When merging the insert buffer, the number of insert buffers merged is 5% of innodb_io_capacity.

2. When flushing dirty pages from the buffer pool, the number of dirty pages flushed is innodb_io_capacity.

This solves a problem in write-intensive applications: with the old hard-coded limits (at most 100 dirty pages flushed and 20 insert buffers merged at a time), the Master Thread could not keep up when more than 100 dirty pages or more than 20 insert buffers were generated per second.
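The two rules can be expressed as a small calculation. This sketch assumes, as in the InnoDB Plugin, that the insert buffer merge count is 5% of innodb_io_capacity and the dirty page flush count is 100% of it; io_capacity_budget() is a hypothetical name, and the value 2000 below is only an example of a setting for faster storage.

```python
def io_capacity_budget(innodb_io_capacity: int = 200) -> dict:
    """Per-batch limits derived from innodb_io_capacity (simplified)."""
    return {
        "insert_buffers_merged": innodb_io_capacity * 5 // 100,  # 5%
        "dirty_pages_flushed": innodb_io_capacity,              # 100%
    }


print(io_capacity_budget())        # {'insert_buffers_merged': 10, 'dirty_pages_flushed': 200}
print(io_capacity_budget(2000))    # {'insert_buffers_merged': 100, 'dirty_pages_flushed': 2000}
```

Because both limits scale with one parameter, raising innodb_io_capacity on faster disks lifts the old fixed ceilings of 100 pages and 20 insert buffers.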

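Parameter innodb_max_dirty_pages_pct: its default was 90 before InnoDB 1.0.x, which was too high, and it was lowered to 75 starting with InnoDB 1.0.x.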

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
