Key Features of the InnoDB Storage Engine

Source: Internet
Author: User

1. Insert Buffer
A. Insert Buffer
Judging by the name alone, you might assume the insert buffer is part of the buffer pool. It is not. The InnoDB buffer pool does hold insert buffer information, but the insert buffer itself, like the data pages, is also stored in physical pages. In InnoDB, rows are normally inserted in primary key order, so inserts into the clustered index (primary key) are generally sequential and do not require random disk reads. But not all primary keys are sequential: if the primary key is a UUID, inserts into both the clustered index and the secondary indexes become random. This is why the primary key of a table is usually defined as an auto-increment, NOT NULL column.
For an INSERT or UPDATE operation on a non-clustered (secondary) index, InnoDB does not insert directly into the index page every time. Instead, it first checks whether the target secondary index page is already in the buffer pool: if it is, the record is inserted directly; if not, the record is placed in the insert buffer. Insert buffer entries are then merged into the secondary index leaf pages at a certain frequency and under certain conditions. Multiple inserts can often be merged into a single operation (because they fall within the same index page), which greatly improves insert performance for non-clustered indexes. However, the insert buffer can only be used when two conditions are met: 1. the index is a secondary index; 2. the index is not unique. If the index were unique, the database would have to read the index page to verify the uniqueness of the inserted record, and that discrete read would defeat the purpose of the insert buffer. You can view insert buffer information with SHOW ENGINE INNODB STATUS. In write-intensive workloads, however, the insert buffer can consume too much of the buffer pool: by default it may occupy up to 1/2 of it, which can affect other operations. Percona released a patch for this situation; the limit can be set with the ibuf_pool_size_per_max_size parameter (see the official documentation for details).
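The buffer-or-apply decision described above can be sketched as a toy in Python. This is purely illustrative, not InnoDB code; the names (`insert_secondary`, `merge`, the `(index, page_no)` keys) are invented for the sketch:

```python
# Toy sketch of the insert buffer decision path: apply the record directly
# when the target secondary index page is cached, otherwise buffer it and
# merge all pending records in one pass when the page is eventually read.
from collections import defaultdict

buffer_pool = {("idx_a", 3)}          # (index, page_no) pages currently cached
pages = defaultdict(list)             # the secondary index pages on "disk"
insert_buffer = defaultdict(list)     # pending records keyed by target page

def insert_secondary(index, page_no, rec):
    if (index, page_no) in buffer_pool:
        pages[(index, page_no)].append(rec)          # page cached: apply directly
    else:
        insert_buffer[(index, page_no)].append(rec)  # otherwise, buffer it

def merge(index, page_no):
    # When the page is later read in, all buffered records merge in one pass.
    pages[(index, page_no)].extend(insert_buffer.pop((index, page_no), []))

insert_secondary("idx_a", 3, "r1")    # cached -> direct
insert_secondary("idx_a", 9, "r2")    # not cached -> buffered
insert_secondary("idx_a", 9, "r3")
merge("idx_a", 9)
assert pages[("idx_a", 9)] == ["r2", "r3"]
```

The point of the merge step is that two (or many) logical inserts land on the page with a single page access.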
B. Change Buffer

InnoDB introduced the change buffer in version 1.0.x. It buffers DML operations — INSERT, DELETE, and UPDATE — and accordingly consists of the insert buffer, delete buffer, and purge buffer. Like the insert buffer, the change buffer applies only to non-unique secondary indexes. An UPDATE on a record can be divided into two steps: 1. mark the record as deleted; 2. actually purge the record. The delete buffer corresponds to the first step and the purge buffer to the second. The parameter innodb_change_buffering controls which buffers are enabled; its possible values are inserts, deletes, purges, changes, all, and none. The value changes enables both inserts and deletes, all enables everything, and none disables buffering; the default is all. Since InnoDB 1.2.x, the parameter innodb_change_buffer_max_size controls the maximum percentage of memory the change buffer may use.


The SHOW ENGINE INNODB STATUS output shows the merged operations and discarded operations, and below them the number of times each type of operation occurred in the change buffer: insert corresponds to the insert buffer, delete mark to the delete buffer, and delete to the purge buffer. discarded operations counts merges that were abandoned because the table had already been dropped by the time the merge occurred, so the records no longer needed to be merged into the secondary index.
C. Internal Implementation of the Insert Buffer
In versions prior to MySQL 4.1, each table had its own insert buffer B+ tree. In current versions there is a single global insert buffer B+ tree, responsible for buffering inserts to the non-unique secondary indexes of all tables, and this B+ tree resides in the shared tablespace. This is why attempting to recover a table's data from a standalone tablespace (.ibd) file alone often causes CHECK TABLE to fail: some of the table's secondary index records may still sit in the insert buffer, so after restoring from the .ibd file you also need to rebuild the table's secondary indexes with REPAIR TABLE.

The insert buffer is a B+ tree, so it consists of leaf nodes and non-leaf nodes. The non-leaf nodes hold the search key, which is constructed as follows:


The search key occupies 9 bytes: space (4 bytes) is the tablespace ID of the table into which the record is to be inserted (in InnoDB every table has a unique space ID, by which the table can be identified); marker (1 byte) is used for compatibility with the old insert buffer format; offset (4 bytes) is the offset of the page within the tablespace.
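The 9-byte layout can be made concrete with a small packing sketch. The field sizes come from the text above; the byte order is an assumption for illustration, since the real on-disk encoding is an InnoDB internal detail:

```python
import struct

def ibuf_search_key(space: int, offset: int, marker: int = 0) -> bytes:
    """Pack the 9-byte insert buffer search key described above:
    space (4 bytes) + marker (1 byte) + offset (4 bytes).
    Big-endian is assumed here purely for illustration."""
    return struct.pack(">IBI", space, marker, offset)

key = ibuf_search_key(space=15, offset=42)
assert len(key) == 9
```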

When a record is to be inserted into a secondary index page (space, offset) and that page is not in the buffer pool, InnoDB first constructs a search key according to the rules above, then queries the insert buffer B+ tree and inserts the record into one of its leaf nodes. Records are not inserted into the leaf nodes of the insert buffer B+ tree verbatim; they are constructed according to the following rules:


The space, marker, and offset fields have the same meaning as in the non-leaf nodes. The metadata field occupies 4 bytes and stores the following:


IBUF_REC_OFFSET_COUNT is a 2-byte integer (stored within metadata) used to order the records entered into the insert buffer. The actually inserted record begins at the 5th column of the insert buffer leaf record. An insert buffer B+ tree record therefore carries 13 bytes of overhead (the 9-byte search key plus 4 bytes of metadata) on top of the original record.
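A sketch of the leaf record layout shows where the 13 bytes come from. Again, field order and byte order are assumptions for illustration; only the field sizes are from the text:

```python
import struct

# Illustrative layout of an insert buffer leaf record, per the sizes quoted
# above: 9-byte search key (space, marker, offset) + 4 bytes of metadata,
# followed by the original record.
def ibuf_leaf_record(space: int, offset: int, counter: int, rec: bytes) -> bytes:
    key = struct.pack(">IBI", space, 0, offset)   # space + marker + offset
    metadata = struct.pack(">HH", counter, 0)     # IBUF_REC_OFFSET_COUNT + rest
    return key + metadata + rec

rec = b"payload"
buffered = ibuf_leaf_record(space=15, offset=42, counter=0, rec=rec)
assert len(buffered) - len(rec) == 13             # the 13-byte overhead
```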

Because the insert buffer is in use, records destined for a secondary index page (space, page_no) may reside in the insert buffer B+ tree. To guarantee that every merge of insert buffer records into a secondary index page will succeed, a special page is needed to track the free space of each secondary index page (space, page_no). This page type is the Insert Buffer Bitmap page. Each Insert Buffer Bitmap page tracks 16384 secondary index pages, i.e. 256 extents, and is itself the second page of each group of 16384 pages. Each secondary index page is described by 4 bits in the bitmap page, with the following structure:
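The numbers above fit together as follows; this is a worked-arithmetic sketch, with `bitmap_page_for` an invented helper name:

```python
PAGE_SIZE = 16 * 1024          # default InnoDB page size
PAGES_PER_BITMAP = 16384       # 256 extents x 64 pages per extent
BITS_PER_PAGE = 4

# 4 bits per tracked page -> bytes needed by one bitmap page's map
map_bytes = PAGES_PER_BITMAP * BITS_PER_PAGE // 8
assert map_bytes == 8192       # comfortably fits in one 16 KB page

def bitmap_page_for(page_no: int) -> int:
    """The bitmap page is the second page (index 1) of each 16384-page group."""
    return (page_no // PAGES_PER_BITMAP) * PAGES_PER_BITMAP + 1

assert bitmap_page_for(5) == 1
assert bitmap_page_for(20000) == 16385
```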


D. Merge Insert Buffer
In summary, a merge insert buffer operation can occur in the following situations:
1. When a secondary index page is read into the buffer pool;
2. When the Insert Buffer Bitmap page indicates that a secondary index page has no free space left;
3. In the Master Thread.
In the first case, when a secondary index page is read into the buffer pool (for example during a SELECT), the Insert Buffer Bitmap page is checked to see whether that secondary index page has records stored in the insert buffer B+ tree. If it does, the records for that page are merged from the insert buffer B+ tree into the secondary index page.
In the second case, the Insert Buffer Bitmap page tracks the free space of each secondary index page, guaranteeing that at least 1/32 of a page remains available. If, when inserting into the insert buffer, InnoDB detects that inserting the record would leave less than 1/32 of the page free, a merge is forced: the secondary index page is read in, and the records for that page in the insert buffer B+ tree, together with the record being inserted, are applied to the secondary index page.
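The 1/32 threshold is easy to make concrete for the default page size; this is just the arithmetic from the paragraph above:

```python
PAGE_SIZE = 16 * 1024        # default 16 KB InnoDB page
threshold = PAGE_SIZE // 32  # the guaranteed free-space floor
assert threshold == 512      # a merge is forced below ~512 bytes of free space
```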
In the third case, the Master Thread performs a merge insert buffer operation in both its per-second and per-10-seconds loops; the difference is the number of pages merged each time. The number of pages is not fixed: how many secondary index pages are actually merged is determined as a percentage of srv_innodb_io_capacity. In the insert buffer B+ tree, entries are sorted by (space, offset), so pages could be selected in that order; instead, InnoDB randomly selects a page in the insert buffer B+ tree and reads that page's space and the required number of pages after it. If the table targeted by a merge has already been dropped, the buffered insert/change buffer records for it can simply be discarded.

2. Doublewrite

While the insert buffer improves InnoDB's performance, doublewrite brings reliability to InnoDB's data pages. When the database crashes, InnoDB may be in the middle of writing a page and have written only part of it (for example, only the first 4 KB of a 16 KB page); this is known as a partial page write. You might think the redo log could handle recovery, and in a sense it can. But the redo log records physical operations on pages, such as "write 'AAAA' at offset 800"; if the page itself is already corrupt, replaying the redo log against it is meaningless. What is needed is a copy of the page taken before the redo log is applied: when a partial write occurs, the page is first restored from its copy, and only then is redo applied. This is doublewrite, as shown in the figure below.


Doublewrite consists of two parts: a doublewrite buffer in memory, 2 MB in size, and 128 consecutive pages (2 extents) in the shared tablespace on disk, also 2 MB. When dirty pages in the buffer pool are flushed, they are not written directly to their final locations. Instead, the dirty pages are first copied into the in-memory doublewrite buffer with memcpy, then written from there to the doublewrite area of the shared tablespace in two sequential 1 MB chunks, after which fsync is called immediately to flush them to disk and avoid problems with OS buffering. Once the doublewrite pages are safely on disk, the pages in the doublewrite buffer are written to their individual tablespace files; these writes are discrete. Status can be observed with SHOW GLOBAL STATUS LIKE '%innodb_dblwr%';
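The crash-safety argument can be sketched as a toy protocol. This is not InnoDB code; `flush_dirty_pages` and `recover` are invented names, and Python lists stand in for the sequential doublewrite area and the discrete tablespace writes:

```python
# Toy sketch of the doublewrite protocol: write pages to a sequential scratch
# area first (and "fsync"), then write them to their real slots. A crash
# between the two phases leaves an intact copy to recover from.

def flush_dirty_pages(pages, doublewrite_area, tablespace):
    # Phase 1: sequential write of all pages to the doublewrite area + fsync.
    doublewrite_area.clear()
    doublewrite_area.extend(pages)        # stands in for two 1 MB writes + fsync
    # Phase 2: discrete writes to each page's real location.
    for page_no, data in pages:
        tablespace[page_no] = data

def recover(doublewrite_area, tablespace, corrupt_page_no):
    # On crash recovery, a partially written page is restored from its copy.
    for page_no, data in doublewrite_area:
        if page_no == corrupt_page_no:
            tablespace[page_no] = data

dw, ts = [], {}
flush_dirty_pages([(7, b"new"), (9, b"also new")], dw, ts)
ts[7] = b"torn"                           # simulate a partial page write
recover(dw, ts, 7)
assert ts[7] == b"new"
```

The sequential phase is cheap because the 2 MB area is contiguous; only the second phase pays the cost of discrete I/O.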


Here doublewrite has written a total of 1,413,988 pages (Innodb_dblwr_pages_written) in 111,623 actual writes (Innodb_dblwr_writes). If the ratio Innodb_dblwr_pages_written : Innodb_dblwr_writes is well below 64:1, the write pressure on the system is not very high. The status variable Innodb_buffer_pool_pages_flushed gives the number of pages flushed from the buffer pool to disk so far, but the safest way to count pages written in a production environment is through Innodb_dblwr_pages_written. The parameter innodb_doublewrite controls whether doublewrite is enabled; starting the server with skip_innodb_doublewrite also disables the feature.
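The ratio check from the paragraph above, worked through with the quoted numbers:

```python
pages_written = 1_413_988   # Innodb_dblwr_pages_written from the status output
writes = 111_623            # Innodb_dblwr_writes
ratio = pages_written / writes
assert ratio < 64           # ~12.7 pages per doublewrite flush: light write pressure
```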
Note: some file systems provide their own protection against partial page writes, for example ZFS. In that case, doublewrite does not need to be enabled.
3. Adaptive Hash Index
Hashing is a very fast lookup method, generally O(1) in time. A B+ tree lookup, by contrast, depends on the height of the tree; in a production environment a B+ tree is typically 3-4 levels high, so a lookup requires 3-4 page accesses. The InnoDB storage engine monitors queries against the indexes on each table. If it observes that building a hash index would improve speed, it builds one automatically; this is the Adaptive Hash Index (AHI). The AHI is constructed from the B+ tree pages already in the buffer pool, so it is very fast to build, and it is not built over the whole table: InnoDB automatically builds hash indexes only for certain hot pages, based on access frequency and pattern.

AHI has a requirement: the pattern of consecutive accesses (the query condition) to a page must be the same. For example, for a composite index (a, b), accesses may take the forms: 1. WHERE a=xxx; 2. WHERE a=xxx AND b=xxx. If these two patterns alternate, InnoDB will not build an AHI for the page. In addition, AHI requires that: a. the page has been accessed in this pattern 100 times; b. the page has been accessed through this pattern N times, where N = number of records on the page / 16. According to the official documentation, with AHI enabled, read and write speed can double, and join performance on indexes can improve fivefold. The design idea is to let the database tune itself, without DBAs manually adjusting it.
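The two thresholds can be combined into a small eligibility check; `ahi_candidate` is an invented name for illustration of the conditions quoted above, not InnoDB's actual decision logic:

```python
# Toy check of the two AHI thresholds: >= 100 same-pattern accesses, and
# >= N accesses where N = records on the page / 16.
def ahi_candidate(same_pattern_accesses: int, page_records: int) -> bool:
    n = page_records // 16
    return same_pattern_accesses >= 100 and same_pattern_accesses >= n

assert ahi_candidate(120, page_records=800)        # 120 >= 100 and 120 >= 50
assert not ahi_candidate(90, page_records=800)     # fewer than 100 accesses
```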


The SHOW ENGINE INNODB STATUS output shows usage information for AHI, including its size, usage, and AHI searches per second. Note that a hash index can only serve equality lookups and cannot be used for other query types, which is why non-hash searches/s appears as well. The parameter innodb_adaptive_hash_index controls whether AHI is enabled.
4. Asynchronous IO
To improve the performance of disk operations, current database systems use asynchronous I/O (AIO). Prior to InnoDB 1.1.x, AIO was simulated by code inside the InnoDB storage engine; since then, kernel-level AIO support, called native AIO, has been provided. Native AIO requires operating system support: Windows and Linux support it, while Mac OS X does not. This is a factor to consider when choosing the operating system for a MySQL database server. The parameter innodb_use_native_aio controls whether native AIO is enabled. In InnoDB, read-ahead is done entirely through AIO, and dirty page flushing is also performed through AIO.
5. Flush Neighbor Pages
When InnoDB flushes a dirty page, it checks all the pages in that page's extent and flushes any that are also dirty along with it. The benefit is that, through AIO, several I/O writes can be merged into a single I/O operation, which is a significant advantage on traditional mechanical disks. But two questions are worth considering: a. might this write pages that are barely dirty, only for them to become dirty again soon after? b. given the high IOPS of solid-state drives, is this feature still needed? For this reason, InnoDB 1.2.x introduced the parameter innodb_flush_neighbors to control whether it is enabled. It is recommended for traditional mechanical disks and can be turned off for solid-state drives.
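The neighbor-gathering step can be sketched as follows. This is a toy (`neighbors_to_flush` is an invented name); it only illustrates picking dirty pages from the same extent so their writes can be merged:

```python
PAGES_PER_EXTENT = 64   # 64 x 16 KB pages = 1 MB extent

def neighbors_to_flush(page_no: int, dirty: set) -> list:
    """Toy flush-neighbors: when flushing page_no, also pick every dirty
    page in the same extent so the writes can be merged into one I/O."""
    start = (page_no // PAGES_PER_EXTENT) * PAGES_PER_EXTENT
    return [p for p in range(start, start + PAGES_PER_EXTENT)
            if p in dirty or p == page_no]

# Page 70 lives in the extent covering pages 64-127; dirty neighbors 65 and
# 127 are flushed with it, while page 200 (another extent) is left alone.
assert neighbors_to_flush(70, {65, 200, 127}) == [65, 70, 127]
```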
