InnoDb system architecture and features (Innodb Storage engine Reading Notes summary), innodb Reading Notes

Last Update:2017-04-04 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Background thread

• Master Thread

The core background thread is responsible for Asynchronously refreshing the data in the buffer pool to the disk. For example, refresh dirty pages, merge insert buffering, and recycle undo pages.

Operations per second:

1. The log buffer is refreshed to the disk even if the transaction has not been committed. This operation always happens. This is to make a large transaction and the commit time is very short.

2. When the I/O pressure is very small (the number of I/O operations in 1 s is less than 5% innodb_io_capacity), the insert buffer of 5% innodb_io_capacity is merged.

3. When the proportion of dirty pages is greater than innodb_max_dirty_pages_cnt, refresh the dirty pages in the innodb_io_capacity buffer pool to the disk. Otherwise, if innodb_adaptive_flush is enabled, select the appropriate number of dirty pages to be refreshed Based on buf_flush_get_desired_flush_rate.

Operations every 10 seconds:

1. If the I/O operation in the past 10 S is less than innodb_io_capacity, refresh the dirty pages in the innodb_io_capacity buffer pool to the disk.

2. Merge 5% innodb_io_capacity insert buffers.

3. Refresh the log buffer to the disk.

4. Delete useless undo pages.

5. If the proportion of dirty pages in the buffer pool exceeds 70%, refresh innodb_io_capacity dirty pages to the disk again. Otherwise, refresh the 10% innodb_io_capacity dirty pages.

Background loop (when the database is idle or the database is closed ):

1. Delete useless undo pages.

2. Merge innodb_io_capacity insert buffers.

Flush loop (idle database ):

1. Refresh innodb_io_capacity dirty pages

• IO Thread

The Innodb Storage engine uses a lot of AIO, And the IO Thread is mainly responsible for the callback of IO requests. You can use innodb_read_io_threads and innodb_write_io_threads to adjust the parameter list.

• Purge Thread

After the transaction is committed. Undolog related to this transaction may no longer be required. Purge Thread is used to recycle unwanted undo pages.

• PageCleaner Thread

Updates dirty pages. Reduce the work of the master thread and the blocking of the user query thread.

Memory Buffer Pool

To modify pages in a database, first modify the pages in the buffer pool, and then refresh the pages to the disk at a certain frequency. This means that the disk is refreshed not every time the page in the buffer pool is modified, but by using the checkpoint technology. You can set the buffer pool size through innodb_buffer_pool_size.

The data page types of the buffer pool include: data page, index page, undo page, insert buffer, adaptive hash index, innodb stored lock information, and dictionary information.

Currently, the innodb Storage engine allows multiple buffer pool instances. In this way, the lock competition is reduced by hash to different buffer pool instances. This parameter can be set through innodb_buffer_pool_instance.

The buffer pool is a large memory area, and the database is managed using the LRU algorithm. However, because of the full table scan operation. Therefore, the simple LRU algorithm is not used. The position of the midpoint added to the LRU list. The newly read page is not directly placed in the header of the lru list, but in the midpoint position. By default, 5/8 of the lru list length is displayed. Controlled by the innodb_old_blocks_pct parameter.

Insert Buffer

For non-clustered Index Insert and update operations, the Innodb Storage engine is not directly inserted into the index page, but the Insert Buffer. Then, insertbuffer and merge of the secondary index leaf node are performed at a certain frequency. Usually Merge multiple random inserts into one operation. This greatly improves the performance of non-clustered index insertion.

Innodb uses the insertbuffer condition:

• Non-clustered Indexes

• The index is not unique (if it is unique, you need to find the index to ensure unique)

Insert Buffer internal implementation

The data structure of Insert Buffer is a B + tree. After Mysql 4.1, there is only one B + tree in the world, which is responsible for inserting Buffer into the secondary indexes of all tables. In addition, this tree is stored in the shared tablespace, Which is ibdata1 by default. Therefore, if you only restore data in an independent tablespace ibd file, it may fail. You also need to use insert buffer to recover secondary indexes on the table.

The non-leaf nodes of the Insert Buffer store the query key. The structure is as follows: space (4 bytes) + marker (1 byte) + offset (4 bytes ). Space indicates the tablespace ID of the table where the record is located, and offset indicates the offset of the page. Marker is used to be compatible with earlier versions.

The structure of the Insert Buffer leaf is as follows: space + marker + offset + metadata + records. Space, marker, and offset have the same meaning as above. IBUF_REC_OFFSET_COUNT in metadata stores two-byte Integers to sort records into the Insert Buffer sequence. The correct value of the record is returned in this order. The records actually inserted start from column 5th of the Insert Buffer leaf node.

After the Insert Buffer Index is enabled, records on the secondary index page may be inserted into the Insert Buffer B + tree. To ensure that the buffer is successfully inserted for each merge, you must have a place that can mark the available space of each secondary index page. Insert Buffer uses a special page to mark. The page type is Insert Buffer Bitmap. Each Insert Buffer Bitmap page is used to track 16384 pages, that is, 256 zones. Each Insert Buffer Bitmap page is located on the second page of the 16384 page. Each secondary index page occupies 4 bytes in the Bit map, which is mainly used to indicate the number of available secondary index pages.

Merge insert Buffer

The records in the Insert Buffer are merged into the real secondary index in the following cases:

• The secondary index page is read to the buffer pool;

• When the Insert Buffer Bitmap page traces that the secondary index page has no available space;

• Master Thread scheduling;

In this way, multiple record operations on the secondary index page are merged to the original secondary index page through one operation, thus improving performance.

Double Write)

InsertBuffer improves the performance of the Innodb Storage engine, and the two writes bring the reliability of data pages to the Innodb Storage engine.

There may be questions. If a write failure occurs, isn't it possible to restore it by redoing the log? This is indeed a method, but it must be known that redo logs record physical page operations, such as offset 800, write 'aaa' records. However, it is meaningless to redo the damaged page. This means that a correct copy of the page must exist before you modify the page. When a write failure occurs, restore the page through the copy of the page and then redo it, this is double write.

Double write is composed of two parts, one of which is the double write buffer in the memory. The other part is 128 consecutive pages in the shared tablespace on the physical disk. The size is the same as that in the memory (2 MB. When refreshing pages in the buffer pool, the disk is not directly written, but memcpy to double write buffer. then, the double write buffer is used for two times to write 1 Mbit/s data to the shared tablespace in sequence, and fsync is immediately called to synchronize the disk. Because the double write pages of the shared tablespace are continuous, the overhead is not very large. After writing the double write page, writing the pages in the double write buffer to each tablespace is a discrete write.

If the operating system crashes when writing pages to the disk. Then, you can find the copy of the page from the double write buffer page in the shared tablespace. Copy it to the tablespace and then apply the redo log.

Adaptive HASH Index

The Innodb Storage engine monitors queries on the index pages of a table. If you find that creating a hash index can increase the speed. A hash index is called an adaptive hash index (AHI ).

AHI has a requirement that the continuous access mode for this page must be the same. For example, the joint index (a, B) can enable the access mode:

WHERE a = xxx

WHERE a = xxx and B = yyy

The same Access Mode means that the query conditions are the same. If the preceding query operation is executed alternately. Then no AHI is created.

In addition, AHI requires 100 accesses in the same mode and N accesses to pages in this mode, where N = records in the page * 1/16

Refresh the adjacent page

When a dirty page is refreshed, the Innodb Storage engine checks all pages in the dirty area of the page. If it is a dirty page, it refresh them together.

Asynchronous IO

Innodb uses asynchronous IO to process disk operations.

Check Point Technology

To avoid data loss, transaction database systems generally adopt the write ahead log policy. That is, when a transaction is submitted, the redo log is first written and then the page is modified.

However, redo logs cannot be infinitely increased, and the buffer value (dirty pages not refreshed to the disk) cannot be infinitely large. Even if it can be infinitely large, it will take a long time to restore the database after it goes down. Therefore, Check Point technology is required. This technology can solve the following problems:

• Shorten the database recovery time;

• When the buffer pool is insufficient, You can refresh the dirty pages to the disk;

• When redo logs are unavailable (redo logs are recycled), refresh the dirty pages to the disk;

When the database is restarted after it is down, you do not need to redo all the logs. Because the pages before Check Point have been refreshed to the disk, the database only needs to restore the redo logs after Check Point. This greatly shortens the recovery time.

For Innodb, we actually use LSN (Log Sequence Number) to compare versions. The LSN is an 8-byte number. Each page has an LSN, the redo log has an LSN, and the CheckPoint also has an LSN. You can see through the following command

mysql> show engine innodb status\G;.............Log sequence number 92561351052Log flushed up to 92561351052Last checkpoint at 92561351052

The above InnoDb architecture and features (Innodb Storage engine Reading Notes summary) are all the content that I have shared with you. I hope you can give us a reference and support for more customers.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

InnoDb system architecture and features (Innodb Storage engine Reading Notes summary), innodb Reading Notes

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

InnoDb system architecture and features (Innodb Storage engine Reading Notes summary), innodb Reading Notes

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support