InnoDB Key Features


The key features of the InnoDB storage engine mainly include:

Insert buffer, doublewrite, adaptive hash index (Adaptive Hash Index, AHI), asynchronous IO (Async IO, AIO), and flush neighbor page.

1. Insert Buffer

1.1 Insert Buffer

The insert buffer is one of the most exciting of the InnoDB storage engine's key features. The name, however, may lead people to think that the insert buffer is a component of the buffer pool. In fact, the InnoDB buffer pool does hold insert buffer information, but the insert buffer is, like the data pages, also a component of the physical pages.

In general, the primary key is a unique identifier for a row, and applications typically insert rows in increasing primary key order. Inserts into the clustered index are therefore usually sequential and do not require random disk reads, so inserts of this kind are very fast. (If the primary key is something like a UUID, however, inserts are just as random as for a secondary index.)

If the index is a non-unique secondary index, the situation is different. During inserts, the data stored in the secondary index leaf nodes is not in insertion order, so the secondary index pages must be accessed discretely, and these random reads degrade insert performance. This is determined by the properties of the B+ tree: secondary index inserts are inherently discrete.

The insert buffer is designed so that, for inserts and updates on a non-unique secondary index, the record is not written directly to the index page every time. Instead, InnoDB first checks whether the target secondary index page is in the buffer pool. If it is, the record is inserted directly; if not, the record is first placed into an insert buffer object. To the outside it looks as if the secondary index leaf node has already been written, when in fact the record is stored in another location. The insert buffer is then merged with the secondary index page at a certain frequency and under certain conditions, at which point multiple inserts can usually be merged into one operation (because they fall within the same index page). This greatly improves insert performance for non-unique secondary indexes.
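
As an illustrative sketch (the table and column names here are hypothetical, not from the original text), the following table shows exactly the case the insert buffer targets: a clustered index that is inserted sequentially, plus a non-unique secondary index whose inserts arrive in random order:

CREATE TABLE t (
    id  BIGINT AUTO_INCREMENT PRIMARY KEY,  -- clustered index: inserts are sequential
    uid INT NOT NULL,                        -- values arrive in no particular order
    KEY idx_uid (uid)                        -- non-unique secondary index: inserts are
                                             -- discrete, so they qualify for buffering
) ENGINE = InnoDB;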

Two conditions must be met for the insert buffer to be used:

the index is a secondary index;

the secondary index is not unique.

The index cannot be unique because, when buffering the insert, the database does not read the index page to check the uniqueness of the inserted record. Looking it up would itself cause discrete reads, which would defeat the purpose of the insert buffer.

1.2 Change Buffer

InnoDB introduced the change buffer in version 1.0.x; it can be considered an upgrade of the insert buffer. Starting with that version, the InnoDB storage engine can buffer all DML operations (INSERT, DELETE, UPDATE) using, respectively, the insert buffer, the delete buffer, and the purge buffer.

Of course, as with the insert buffer before it, the change buffer still applies only to non-unique secondary indexes.

An UPDATE of a record can be divided into two steps: marking the record as deleted, and then physically deleting the record.

The delete buffer corresponds to the first step of the update operation, marking the record for deletion; the purge buffer corresponds to the second step, actually deleting the record.
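
As an illustrative sketch using the hypothetical table t from section 1.1, an UPDATE that changes the indexed column uid touches the secondary index idx_uid in exactly these buffered steps:

-- On idx_uid, the entry for the old uid value is first delete-marked (delete buffer)
-- and later physically removed by purge (purge buffer); the entry for the new
-- value uid = 100 is an ordinary insert (insert buffer).
UPDATE t SET uid = 100 WHERE id = 1;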

The InnoDB storage engine provides the parameter innodb_change_buffering to control which operations are buffered. Its possible values are inserts, deletes, purges, changes, all, and none. inserts, deletes, and purges enable the three cases discussed above; changes enables both inserts and deletes; all enables everything; none disables buffering entirely. The default value is all.

Starting with InnoDB 1.2.x, the parameter innodb_change_buffer_max_size controls the maximum amount of memory the change buffer may use. It defaults to 25, meaning at most 1/4 of the buffer pool memory is used; note that the maximum valid value for this parameter is 50.
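
A minimal illustrative configuration of the two parameters (the values shown are the defaults; both can be changed at runtime):

mysql> SET GLOBAL innodb_change_buffering = 'all';     -- buffer inserts, deletes and purges
mysql> SET GLOBAL innodb_change_buffer_max_size = 25;  -- use at most 1/4 of the buffer pool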

In MySQL 5.5, the command SHOW ENGINE INNODB STATUS lets you observe something similar to the following:

mysql> SHOW ENGINE INNODB STATUS\G
...
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 34397, seg size 34399, 10875 merges
merged operations:
 insert 20462, delete mark 20158, delete 4215
discarded operations:
 insert 0, delete mark 0, delete 0
...

You can see the merged operations and discarded operations here, with per-operation counts for the change buffer on the lines below them: insert stands for the insert buffer, delete mark for the delete buffer, and delete for the purge buffer. discarded operations counts records that no longer needed to be merged into the secondary index because, by the time the change buffer records were to be merged, the table had already been deleted. In the output above, 44,835 records (20,462 + 20,158 + 4,215) were merged in 10,875 merge operations, roughly four records per merge, which illustrates how merging batches discrete inserts.

1.3 Merge Insert Buffer

From the previous sections, the reader should already know that the insert/change buffer is itself a B+ tree. If the secondary index page that a record belongs to is not in the buffer pool, the secondary index record is first inserted into this B+ tree. But when are the records in the insert buffer merged into the real secondary index?

In summary, a merge of the insert buffer may occur when: a secondary index page is read into the buffer pool; the Insert Buffer bitmap page detects that a secondary index page has no available space left; or the Master Thread runs.

The first case occurs when a secondary index page is read into the buffer pool, for example while performing a normal SELECT query. The Insert Buffer bitmap page is checked to confirm whether that secondary index page has records stored in the insert buffer B+ tree. If it does, the records for the page are inserted from the insert buffer B+ tree into the secondary index page. Since multiple record operations on the page are merged into the original secondary index page in a single operation, performance improves significantly.

The Insert Buffer bitmap page is used to track the free space of each secondary index page and ensures that at least 1/32 of a page remains available (512 bytes for a 16KB page). If, when a secondary index record is inserted into the insert buffer, it is detected that less than 1/32 of a page would remain free after the insert, a merge operation is forced: the secondary index page is read, and both the records for that page in the insert buffer B+ tree and the record being inserted are written into the secondary index page. This is the second case mentioned above.

The remaining case: as discussed in the analysis of the Master Thread, a merge of the insert buffer is performed in the master thread either every second or every 10 seconds, the difference being the number of pages merged per operation.

2. Doublewrite

2.1 Doublewrite Application Scenario

We know that the InnoDB data page size is generally 16KB, that MySQL accesses data in units of pages, and that the operating system does not guarantee atomic writes of a whole data page. This means that while a page is being written, the database may go down after only part of it (say 8KB) has reached disk. This condition is called a partial page write, and it results in data loss.

One may ask: can't the data be recovered from the redo log? The answer is both yes and no, depending on the case. 1. If the database goes down but the physical files are intact, crash recovery through the redo log works. 2. If the database goes down while a page is being flushed to disk, a partial page write occurs and that data page on disk is corrupted; then the redo log alone cannot recover the data. Why? We must realize that the redo log records physical operations on pages. For instance, a redo record might say "page XX, offset 800, write record 'ABC'": when the page itself is corrupted, applying that redo record is meaningless. It is precisely in this situation that doublewrite comes into play.

2.2 Doublewrite Structure and Workflow

Doublewrite consists of two parts: a doublewrite buffer in memory, 2MB in size, and 128 contiguous pages in the shared tablespace (ibdata) on disk, that is, 2 extents, also 2MB in total. The doublewrite workflow is as follows. When one of several mechanisms (the master thread, a checkpoint, etc.) triggers the flushing of dirty pages from the buffer pool, the pages are not written directly to disk; instead they are first copied to the in-memory doublewrite buffer via the memcpy function. From the doublewrite buffer they are then written to the doublewrite area of the shared tablespace in two sequential passes of 1MB each, and fsync is called immediately to force them to disk. Because the doublewrite pages are stored contiguously, this is a sequential write and therefore very fast. Only after doublewrite completes are the dirty pages written to their actual positions in the tablespace files, and those writes are discrete. (Strictly speaking, the very first step is writing the redo records for the dirty pages into the log buffer and flushing the log buffer to the redo log file; that step is omitted here for simplicity.)
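
Whether doublewrite is enabled can be confirmed through the innodb_doublewrite variable, which is ON by default and can only be changed at server startup (a quick illustrative check):

mysql> SHOW VARIABLES LIKE 'innodb_doublewrite';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| innodb_doublewrite | ON    |
+--------------------+-------+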

To observe doublewrite activity, you can execute the following command:

mysql> SHOW GLOBAL STATUS LIKE 'Innodb_dblwr%'\G
*************************** 1. row ***************************
Variable_name: Innodb_dblwr_pages_written
        Value: 61932183
*************************** 2. row ***************************
Variable_name: Innodb_dblwr_writes
        Value: 15237891
2 rows in set (0.00 sec)

The data above shows that doublewrite has written a total of 61,932,183 pages in 15,237,891 writes. As noted earlier, with doublewrite enabled, every dirty page flush must first go through the doublewrite area, which consists of two contiguous extents on disk, each made up of consecutive pages, normally up to 64 pages; so a single IO write should be able to cover up to 64 pages. On my system the ratio of Innodb_dblwr_pages_written to Innodb_dblwr_writes is about 4, far below 64, so from this angle we can also see that the write pressure on this system is not high.

2.3 Crash Recovery

If the operating system crashes while a page is being written to disk, then during recovery the InnoDB storage engine can find a recent copy of the page in the doublewrite area of the shared tablespace, copy it back to the tablespace file, and then apply the redo log to complete recovery. Because a copy exists, there is no need to worry about whether the data page in the tablespace is corrupted.

3. Adaptive Hash Index

Hashing is a very fast lookup method, generally with O(1) time complexity. The number of B+ tree lookups, by contrast, depends on the height of the tree; in production environments the height of a B+ tree is generally 3 to 4 levels, so a lookup requires 3 to 4 page reads.

The InnoDB storage engine monitors queries against the indexes on a table. If it observes that building a hash index would speed up access, it builds one; this is called the adaptive hash index (Adaptive Hash Index, AHI). The AHI is built from B+ tree pages already in the buffer pool, so it is built very quickly, and no hash index needs to be built over the entire table. The InnoDB storage engine automatically builds hash indexes for certain hot pages based on access frequency and access pattern.

The AHI has a requirement: the pattern of consecutive accesses to the page (the query condition) must be the same. For example, for a composite index (a,b), the access pattern can be one of the following: 1. WHERE a=xxx; 2. WHERE a=xxx AND b=xxx.

If these two queries alternate, the InnoDB storage engine will not build an AHI for the page. In addition, the AHI has the following requirements: a. the page has been accessed 100 times with this pattern; b. the page has been accessed N times with this pattern, where N = (records on the page) / 16.
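
The effect of the AHI can be observed in the same SHOW ENGINE INNODB STATUS section shown in section 1.2: below the insert buffer statistics, it reports the hash table size and how many lookups per second are served by the hash index versus the B+ tree (the figures below are illustrative). The feature can also be switched off at runtime if a workload does not benefit from it:

mysql> SHOW ENGINE INNODB STATUS\G
...
Hash table size 34673, node heap has 1 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
...
mysql> SET GLOBAL innodb_adaptive_hash_index = OFF;  -- disable the AHI if it does not help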

According to the official documentation, enabling the AHI can roughly double read and write speed, and can improve the performance of join operations on secondary indexes by about 5 times. The design idea is that the database tunes itself, with no need for the DBA to make manual adjustments.

4. Asynchronous IO

To improve disk operation performance, current database systems use asynchronous IO (AIO) to handle disk operations, and InnoDB is no exception.

The counterpart of AIO is Sync IO: each IO operation must complete before the next one can begin. But if a user issues a query that scans an index, the SQL statement may need to scan multiple index pages, that is, perform multiple IO operations. It is not necessary to wait for each page scan to complete before starting the next. With AIO, the user can issue another IO request immediately after the previous one, and once all IO requests have been issued, wait for all of them to complete.

Another advantage of AIO is IO merging: multiple IOs can be merged into a single IO operation, improving IOPS performance.

Prior to InnoDB 1.1.x, AIO was simulated by code inside the InnoDB storage engine. Since then, kernel-level AIO support, known as Native AIO, has been available. Native AIO requires operating system support: Windows and Linux support it, while Mac OS X does not. This is a factor to consider when choosing an operating system for a MySQL database server.

MySQL decides whether to enable Native AIO via the parameter innodb_use_native_aio. In the InnoDB storage engine, read-ahead is performed through AIO, and dirty page flushing is also performed through AIO.
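
As an illustrative check on Linux, where Native AIO is available, you can verify the setting; the parameter is read-only at runtime and must be set in the configuration file before startup, and it defaults to ON:

mysql> SHOW VARIABLES LIKE 'innodb_use_native_aio';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_use_native_aio | ON    |
+-----------------------+-------+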

5. Flush Neighbor Page

When the InnoDB storage engine flushes a dirty page, it checks all pages in that page's extent and, if any of them are also dirty, flushes them together. The benefit is that multiple IO writes can be merged into one IO operation through AIO. This mechanism is a clear win on traditional mechanical disks. But two questions need to be considered: might it write out pages that are only slightly dirty, only for them to quickly become dirty again? And solid state drives have high IOPS, so do they need this feature at all?

For this reason, the InnoDB storage engine has provided the parameter innodb_flush_neighbors since version 1.2.x to control whether the mechanism is enabled. Enabling it is recommended for traditional mechanical hard drives, while for solid state drives it can be turned off.
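
A minimal tuning sketch, assuming MySQL 5.6 or later where the parameter is dynamic (there, 1, the traditional neighbor-flushing behavior, is the default; 0 disables it, which suits SSDs):

mysql> SHOW VARIABLES LIKE 'innodb_flush_neighbors';
mysql> SET GLOBAL innodb_flush_neighbors = 0;  -- SSD: flush only the requested page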

