Key Features of InnoDB: Insert Buffer, Doublewrite, and Adaptive Hash Index


Key features of the InnoDB storage engine include the insert buffer, doublewrite, and the adaptive hash index. These features give the InnoDB storage engine better performance and higher reliability.

Insert Buffer

The insert buffer is perhaps the most exciting feature of the InnoDB storage engine. The name, however, can mislead: it suggests that the insert buffer is merely a component of the buffer pool. In fact, although the InnoDB buffer pool does hold insert buffer information, the insert buffer also exists as physical pages in the shared tablespace, just like data pages.

In InnoDB, the primary key is the unique identifier of a row. Applications usually insert row records in ascending primary key order, so inserts into the clustered index are generally sequential and do not require random disk reads.

For example, suppose we define a table with the following SQL statement:

create table t (id int auto_increment, name varchar(30), primary key (id));

The id column is auto-increment: when an insert is executed, the id value grows automatically, and the row records on the page are stored in id order. There is generally no need to randomly read records from another page, so inserts of this kind complete quickly. However, a table rarely has only the clustered index; in most cases it also has several secondary (non-clustered) indexes. For example, we may also need to look up rows by name, and the name field is not unique.

Such a table is defined by the following SQL statement:

create table t (id int auto_increment, name varchar(30), primary key (id), key (name));

This creates a non-clustered, non-unique index. During inserts, the data pages are still filled in primary key (id) order, but for the non-clustered index, inserts into the leaf pages are no longer sequential. The non-clustered index pages must therefore be accessed discretely, and insert performance drops. This is not a flaw in the index on the name field: the characteristics of the B+ tree dictate that non-clustered index inserts are discrete.

The InnoDB storage engine pioneered the insert buffer design. For an insert or update on a non-clustered index, InnoDB does not write directly into the index page every time. Instead, it first checks whether the target non-clustered index page is already in the buffer pool. If it is, the record is inserted directly; if not, the record is placed into an insert buffer first, as if the leaf node of the non-clustered index had already been reached. The insert buffer is then merged with the actual leaf pages of the non-clustered index at a certain frequency, at which point multiple inserts can usually be combined into a single operation (because they fall on the same index page). This greatly improves the performance of inserts and updates on non-clustered indexes.
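As a concrete illustration, here is a minimal sketch reusing table t defined above (the name values are made up): the primary key values arrive in order, while the name values arrive in no particular order, which is exactly the discrete secondary index traffic the insert buffer absorbs.

insert into t (name) values ('zhang');
insert into t (name) values ('alice');
insert into t (name) values ('ming');
-- the id values 1, 2, 3 are appended sequentially to the clustered index,
-- while 'zhang', 'alice', 'ming' would land on scattered pages of the
-- name index; those scattered changes can be buffered and merged later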

The insert buffer is used only when both of the following conditions are met:

1. The index is a secondary index.

2. The index is not unique.

When these two conditions are met, the InnoDB storage engine uses the insert buffer to improve insert performance. There is a catch, however. Consider an application performing a large number of inserts and updates that involve non-unique non-clustered indexes. If the database crashes in the middle of this work, a large amount of insert buffer data may not yet have been merged into the actual non-clustered indexes, and recovery can then take a long time; in extreme cases the merge during recovery may take several hours.

The secondary index cannot be unique because InnoDB does not look up the index page when placing a record into the insert buffer. If it had to look the page up, discrete reads would occur anyway, and the insert buffer would lose its purpose.
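To make the two conditions concrete, here is a hypothetical contrast (the table names t2 and t3 are made up for illustration):

-- qualifies: key (name) is a secondary, non-unique index
create table t2 (id int auto_increment, name varchar(30), primary key (id), key (name));
-- does not qualify: the unique constraint forces InnoDB to read the index
-- page at insert time to check for duplicates, defeating the buffering
create table t3 (id int auto_increment, email varchar(60), primary key (id), unique key (email));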

View the insert buffer information:

show engine innodb status\G
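The relevant part of the output looks like the following (the section header and field names are real; the numbers are illustrative only, and the exact layout varies between MySQL versions):

-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2,
6307 inserts, 6307 merged recs, 2104 merges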

seg size shows that the current size of the insert buffer is 2 * 16 KB; free list len is the length of the free list; and size is the number of record pages that have already been merged.

The following line may be what we really care about, because it shows the actual performance gain: inserts is the number of records inserted into the insert buffer; merged recs is the number of records that have been merged into the actual index pages; and merges is the number of merge operations performed.

The merged recs : merges ratio is about 3:1, which indicates that the insert buffer reduced the discrete IO requests against non-clustered index pages roughly threefold.

Problem:

At present, the insert buffer has one problem: under write-intensive workloads it occupies too much buffer pool memory. By default, it can take up to 1/2 of the buffer pool. Percona has released patches to address this excessive memory use; for details, refer to http://www.percona.com/percona-lab.html. Simply put, the patches let you modify IBUF_POOL_SIZE_PER_MAX_SIZE to control the size of the insert buffer. For example, if you change IBUF_POOL_SIZE_PER_MAX_SIZE to 3, the insert buffer can use at most 1/3 of the buffer pool memory.
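For reference, later MySQL releases (5.6 and up) expose a supported knob for the same purpose: the innodb_change_buffer_max_size variable, which caps the change buffer (the renamed, generalized insert buffer) as a percentage of the buffer pool. A minimal sketch:

-- check the current cap (default 25, i.e. 25% of the buffer pool)
show variables like 'innodb_change_buffer_max_size';
-- lower the cap at runtime; the variable is dynamic in those versions
set global innodb_change_buffer_max_size = 10;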

Doublewrite

If the insert buffer brings performance gains to the InnoDB storage engine, doublewrite brings it data reliability. When the database crashes, it may be in the middle of writing a page, and only part of that page gets written to disk; for example, only the first 4 KB of a 16 KB page. This is called a partial page write. Before doublewrite was introduced, the InnoDB storage engine could lose data because of partial page writes.

Some may think that a failed write could simply be repaired by replaying the redo log. That is one approach, but it must be understood that the redo log records physical operations on pages, such as "write record 'aaa' at offset 800". If the page itself is already corrupted, replaying the redo log against it is meaningless. This means a copy of the page is needed before the redo log is applied: when a write failure occurs, the page is first restored from its copy, and only then is the redo applied. This is doublewrite.

The architecture of the InnoDB storage engine's doublewrite is shown in Figure 2-4.

Doublewrite consists of two parts: a doublewrite buffer in memory, 2 MB in size, and 128 consecutive pages in the shared tablespace on the physical disk, that is, two extents, also 2 MB in size (the page copies). When dirty pages of the buffer pool are flushed, they are not written directly to disk. Instead, the dirty pages are first copied to the in-memory doublewrite buffer via the memcpy function; the doublewrite buffer is then written out in two passes of 1 MB each to the shared tablespace on disk, and fsync is called immediately afterwards to force the data to disk and avoid problems with buffered writes. Because the doublewrite pages are contiguous, this step is a sequential write, and its overhead is not very large. After the doublewrite pages have been written, the pages in the doublewrite buffer are written to the individual tablespace files; those writes are discrete.

Run the following command to check whether doublewrite is working:

show global status like 'Innodb_dblwr%'\G
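An illustrative result (the variable names are real; the values below are made up, chosen to match the ratio discussed next):

*************************** 1. row ***************************
Variable_name: Innodb_dblwr_pages_written
        Value: 445440
*************************** 2. row ***************************
Variable_name: Innodb_dblwr_writes
        Value: 10434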

In the illustrative output above, the ratio of Innodb_dblwr_pages_written to Innodb_dblwr_writes is roughly 42:1, which basically conforms to the 64:1 maximum, since each doublewrite can flush up to 64 pages (1 MB) at a time.

If you find that, even at your system's peak, Innodb_dblwr_pages_written : Innodb_dblwr_writes is far smaller than 64:1, it indicates that the write pressure on your system is not very high.

If the operating system crashes while a page is being written to disk, then during recovery the InnoDB storage engine finds a copy of the page in the doublewrite area of the shared tablespace, copies it back to the tablespace file, and then applies the redo log. The following log shows doublewrite performing such a restore:

090924 11:36:32 mysqld restarted
090924 11:36:33 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Error: space id in fsp header 0, but in the page header 4294967295
InnoDB: Error: tablespace id 4294967295 in file ./test/t.ibd is not sensible
InnoDB: Error: tablespace id 0 in file ./test/t2.ibd is not sensible
090924 11:36:33 InnoDB: Operating system error number 40 in a file operation.
InnoDB: Error number 40 means 'Too many levels of symbolic links'.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.0/en/operating-system-error-codes.html
InnoDB: File name ./now/member
InnoDB: File operation call: 'stat'.
InnoDB: Error: os_file_readdir_next_file() returned -1 in
InnoDB: directory ./now
InnoDB: Crash recovery may have failed for some .ibd files!
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...

The skip_innodb_doublewrite parameter disables doublewrite, in which case the write failure (partial page write) problem described earlier can occur. This may be acceptable if you have multiple slave servers and need faster performance from them (for example, slaves running on RAID0). On a master server that requires high data reliability, however, doublewrite should be enabled at all times.
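To confirm whether doublewrite is currently enabled on a server, you can check the corresponding status variable (the value shown depends on your configuration):

show variables like 'innodb_doublewrite';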

Note: some file systems provide their own mechanism against partial page writes, ZFS for example. In that case, we do not need to enable doublewrite.

Adaptive Hash Index

Hashing is a very fast lookup method; in general, a lookup has O(1) time complexity. It is often used for join operations, for example the hash join in SQL Server and Oracle. However, common databases such as SQL Server and Oracle do not support hash indexes. The default index type of MySQL's HEAP (MEMORY) storage engine is hash, and the InnoDB storage engine offers another approach: the adaptive hash index.

The InnoDB storage engine monitors the lookups performed on table indexes. If it observes that building a hash index would bring a speedup, it builds one, which is why the index is called adaptive. The adaptive hash index is built from the B+ tree pages already in the buffer pool, so it can be created very quickly. The InnoDB storage engine decides automatically, based on access frequency and access patterns, whether to build a hash index for particular pages.

According to the official InnoDB documentation, enabling the adaptive hash index can roughly double read and write speeds, and can improve the performance of join operations on secondary indexes as much as fivefold. The adaptive hash index is a good optimization: its design concept is self-tuning, that is, it requires no adjustment of the database by a DBA.

View the current usage of the adaptive hash index:

show engine innodb status\G
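The adaptive hash index portion of the output looks like the following (the field names and layout are real for 5.x-era servers; the numbers are illustrative only):

Hash table size 276671, node heap has 1107 buffer(s)
2346.00 hash searches/s, 3239.00 non-hash searches/s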

The output shows usage information for the adaptive hash index, including its size and usage, and the number of adaptive hash index searches per second. Note that a hash index can only serve equality lookups, such as select * from table where index_col = 'xxx'; it cannot serve other lookup types, such as range searches, which is why a non-hash searches/s figure also appears. The hash searches : non-hash searches ratio gives a rough idea of the efficiency gained from hash indexes.
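A quick illustration with the earlier table t (the queries are hypothetical): the first lookup below is an equality search and can be answered from the adaptive hash index once it is built, while the second is a range search and always goes through the B+ tree.

-- equality lookup: eligible for the adaptive hash index
select * from t where name = 'alice';
-- range lookup: not eligible, counted under non-hash searches
select * from t where name like 'a%';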

Because the adaptive hash index is controlled entirely by the InnoDB storage engine, the information here is for reference only. You can, however, disable or enable the feature with the innodb_adaptive_hash_index parameter; it is enabled by default.
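A minimal sketch of checking and toggling the feature (in recent MySQL versions the variable is dynamic; in older versions, set it in the configuration file instead):

-- check whether the adaptive hash index is enabled (ON by default)
show variables like 'innodb_adaptive_hash_index';
-- disable it at runtime
set global innodb_adaptive_hash_index = off;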

The above covers the key features of the InnoDB storage engine: the insert buffer, doublewrite, and the adaptive hash index. I hope it provides a useful reference.
