MySQL InnoDB Storage Engine Learning


The master thread has the highest priority among InnoDB's threads. It consists of several loops: the main loop, the background loop, the flush loop, and the suspend loop. The master thread switches among these loops according to the database's running state.

Operations performed once per second:
1. Flush the log buffer to disk, even if the transaction has not been committed (always).
2. Merge the insert buffer (possibly).
3. Flush up to 100 dirty pages from the InnoDB buffer pool to disk (possibly).
4. If there is no user activity, switch to the background loop (possibly).
Even if a transaction has not yet been committed, the InnoDB storage engine flushes the contents of the redo log buffer to the redo log file every second. This is important to know, because it explains why committing even a very large transaction takes only a short time.

Merging the insert buffer does not happen every second. The InnoDB storage engine checks whether the number of I/O operations in the last second was less than 5. If so, InnoDB considers the current I/O pressure low and performs a merge insert buffer operation.

Similarly, flushing 100 dirty pages does not happen every second. The InnoDB storage engine checks whether the ratio of dirty pages in the current buffer pool (buf_get_modified_ratio_pct) exceeds the innodb_max_dirty_pages_pct parameter in the configuration file (90 by default, meaning 90%). If it does, InnoDB considers disk synchronization necessary and writes 100 dirty pages to disk.
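The per-second decisions above can be sketched as follows. This is a minimal illustration, not InnoDB code: the thresholds (5 I/Os, 90%) come from the text, while the function and argument names are invented.

```python
IO_THRESHOLD = 5      # merge the insert buffer only if last second's I/Os < 5
MAX_DIRTY_PCT = 90    # default innodb_max_dirty_pages_pct

def once_per_second(last_second_ios, dirty_page_pct, user_active):
    """Return the actions the master thread would take this second."""
    actions = ["flush log buffer to redo log file"]   # always, even uncommitted
    if last_second_ios < IO_THRESHOLD:                # I/O pressure is low
        actions.append("merge insert buffer")
    if dirty_page_pct > MAX_DIRTY_PCT:                # too many dirty pages
        actions.append("flush up to 100 dirty pages")
    if not user_active:
        actions.append("switch to background loop")
    return actions
```

Note that only the log flush is unconditional; every other step depends on the current load.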

Operations the InnoDB storage engine performs every 10 seconds:
1. Flush up to 100 dirty pages to disk (possibly).
2. Merge up to 5 insert buffers (always).
3. Flush the log buffer to disk (always).
4. Delete useless undo pages (always).
5. Flush 100 or 10% of the dirty pages to disk (always).

In the above process, the InnoDB storage engine first checks whether the number of disk I/O operations in the past 10 seconds was less than 200. If so, InnoDB considers the disk to have sufficient I/O capacity and flushes 100 dirty pages to disk. Then it merges the insert buffer; unlike the per-second merge, which may or may not happen, this merge is always performed at this stage. InnoDB then flushes the log buffer to disk, the same operation performed every second.

The InnoDB storage engine then performs a full purge to delete useless undo pages. When operations such as UPDATE and DELETE are performed on a table, the original rows are only marked as deleted, and their version information must be retained for consistent reads; for example, a query may still need to read the undo information of an earlier version. During the full purge, InnoDB determines whether each row marked as deleted in the current transaction system can actually be removed, and if so, removes it immediately. The source code shows that InnoDB deletes at most 20 undo pages in each full purge.

The InnoDB storage engine then examines the ratio of dirty pages in the buffer pool (buf_get_modified_ratio_pct). If more than 70% of the pages are dirty, it flushes 100 dirty pages to disk; otherwise, it only needs to flush 10% of the dirty pages.

Finally, the InnoDB storage engine produces a checkpoint. InnoDB's checkpoint is a fuzzy checkpoint: it does not write all dirty pages in the buffer pool to disk at checkpoint time, because doing so would hurt performance; instead, it writes only the pages with the oldest log sequence number (oldest LSN) to disk.
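The 10-second cycle can be summarized in the same illustrative style (the 200-I/O, 70%, 20-page and 5-buffer constants are from the text; everything else is invented for the sketch):

```python
def every_ten_seconds(ios_last_10s, dirty_page_pct):
    """Return the ordered actions of one 10-second master thread cycle."""
    actions = []
    if ios_last_10s < 200:                 # enough spare disk I/O capacity
        actions.append("flush 100 dirty pages")
    actions.append("merge up to 5 insert buffers")                    # always
    actions.append("flush log buffer")                                # always
    actions.append("full purge: delete up to 20 useless undo pages")  # always
    if dirty_page_pct > 70:
        actions.append("flush 100 dirty pages")
    else:
        actions.append("flush 10% of dirty pages")
    actions.append("fuzzy checkpoint: write oldest-LSN pages")        # always
    return actions
```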

Next, let's look at the background loop. The master thread switches to this loop when there is no user activity (the database is idle) or the database is being shut down. This loop performs the following operations:
1. Delete useless undo pages (always).
2. Merge 20 insert buffers (always).
3. Jump back to the main loop (always).
4. Keep flushing 100 pages until the condition is met (possibly; jumps to the flush loop).
If there is nothing to do in the flush loop, the InnoDB storage engine switches to the suspend loop and suspends the master thread, waiting for an event to occur. If the InnoDB storage engine is enabled but no InnoDB tables are used, the master thread is always in the suspended state.
Starting with the InnoDB Plugin, you can use SHOW ENGINE INNODB STATUS to view the current status of the master thread, as shown below (the main loop has been executed 6300568 times):


mysql> SHOW ENGINE INNODB STATUS\G
*************************** 1. row ***************************
Type: InnoDB
2015-02-13 13:42:44 2ac95fdf3940 INNODB MONITOR OUTPUT
Per second averages calculated from the last 14 seconds
srv_master_thread loops: 6300568 srv_active, 0 srv_shutdown, 5298516 srv_idle
srv_master_thread log flush and writes: 11599055


As disk devices on the market become more advanced, I/O read and write capability keeps growing, and the InnoDB storage engine's previously hard-coded workload (processing 100 page writes and 20 insert buffer merges per second) became too slow for modern hardware. The InnoDB storage engine therefore introduced a parameter to indicate disk I/O throughput: innodb_io_capacity, with a default value of 200. The number of pages to flush is controlled as a percentage of innodb_io_capacity, with the following rules:
1. When merging the insert buffer, the number of insert buffer merges is 5% of the innodb_io_capacity value.
2. When flushing dirty pages from the buffer pool, the number of dirty pages flushed is innodb_io_capacity.
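The two rules can be shown with a small hypothetical helper (the function name is made up; the 5% and 100% factors are the rules above):

```python
def io_capacity_batches(innodb_io_capacity=200):
    """Batch sizes derived from innodb_io_capacity (default 200)."""
    return {
        # insert buffer pages merged per batch: 5% of capacity
        "insert_buffer_merges": innodb_io_capacity * 5 // 100,
        # dirty pages flushed per batch: the full capacity value
        "dirty_page_flushes": innodb_io_capacity,
    }
```

With the default of 200, that is 10 insert buffer merges and 200 dirty page flushes per batch; raising innodb_io_capacity scales both proportionally.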
Another issue is the default value of the innodb_max_dirty_pages_pct parameter. In MySQL 5.1 and earlier, the default value was 90, indicating that dirty pages could occupy up to 90% of the buffer pool. In later versions, the default value is 75.

Another parameter is innodb_adaptive_flushing (adaptive flushing), which affects the number of dirty pages flushed every second. The original rule was: if the proportion of dirty pages in the buffer pool is below innodb_max_dirty_pages_pct, dirty pages are not flushed; if it is above innodb_max_dirty_pages_pct, 100 dirty pages are flushed. With innodb_adaptive_flushing enabled, the InnoDB storage engine uses a function named buf_flush_get_desired_flush_rate to determine the appropriate number of dirty pages to flush. buf_flush_get_desired_flush_rate judges the most suitable number from the rate at which redo logs are generated. As a result, a certain number of dirty pages are flushed even when the dirty page ratio is below innodb_max_dirty_pages_pct.


The key features of the InnoDB storage engine include the insert buffer, doublewrite, and the adaptive hash index.
Insert Buffer
The insert buffer is a key feature of the InnoDB storage engine. Like data pages, the insert buffer is a component of physical pages. Inserts into a clustered index are generally sequential and do not require random disk reads, but in many cases a table has several secondary (non-clustered) indexes, and inserts into those are not sequential.

The InnoDB storage engine pioneered the design of the insert buffer. For insert or update operations on a non-clustered index, the record is not inserted directly into the index page every time. Instead, InnoDB first checks whether the target non-clustered index page is in the buffer pool. If it is, the record is inserted directly; if not, the record is first placed into an insert buffer object, as if the leaf node of the non-clustered index had already been reached. The insert buffer is then merged with the non-clustered index leaf pages at a certain frequency, which usually allows multiple inserts to be merged into a single operation (because they fall on the same index page). This greatly improves the performance of insert and update operations on non-clustered indexes.
The insert buffer is used only when the following two conditions are met:
1. The index is a secondary index.
2. The index is not unique.
The secondary index must not be unique, because InnoDB does not look up the index page when inserting into the insert buffer; looking it up would certainly cause discrete reads, and the insert buffer would lose its meaning.
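The two conditions can be expressed as a hypothetical predicate (the function name is invented for illustration):

```python
def can_use_insert_buffer(is_secondary_index, is_unique):
    """True only for non-unique secondary indexes."""
    # A unique index would force InnoDB to read the index page to verify
    # uniqueness, causing exactly the discrete reads the buffer avoids.
    return is_secondary_index and not is_unique
```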
Ibuf: size 1, free list len 0, seg size 2, 38 merges
Here, seg size 2 indicates that the current insert buffer size is 2 * 16 KB, and free list len 0 represents the length of the free list.
At present, the insert buffer has one problem: under write-intensive workloads, it occupies too much buffer pool memory. By default, it can occupy at most 1/2 of the buffer pool.

You can modify IBUF_POOL_SIZE_PER_MAX_SIZE to control the size of the insert buffer. For example, changing IBUF_POOL_SIZE_PER_MAX_SIZE to 3 limits the insert buffer to at most 1/3 of the buffer pool memory.
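The limit is simply the buffer pool divided by that constant, as this illustrative helper shows (the helper name is invented; only the 1/N relationship comes from the text):

```python
def max_insert_buffer(buffer_pool_bytes, ibuf_pool_size_per_max_size=2):
    """Upper bound on insert buffer memory: 1/N of the buffer pool."""
    return buffer_pool_bytes // ibuf_pool_size_per_max_size
```

With the default value 2, an 8 GB buffer pool allows the insert buffer up to 4 GB; raising the constant to 3 lowers the cap to about 2.7 GB.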

Doublewrite
If the insert buffer brings better performance to the InnoDB storage engine, doublewrite brings it data reliability. When the database crashes, InnoDB may be in the middle of writing a page, and only part of that page may have been written (for example, only the first 4 KB of a 16 KB page). This is called a partial page write.

Note that redo logs record physical operations on pages, such as "write the record 'aaa' at offset 800". If the page itself is already corrupted, replaying the redo log against it is meaningless. This means that a copy of the page is needed before the redo log is applied: when a partial write occurs, the page is first restored from its copy, and the redo log is then replayed. This is doublewrite.

Doublewrite consists of two parts: a doublewrite buffer in memory, 2 MB in size, and 128 consecutive pages in the shared tablespace on the physical disk, that is, two extents, also 2 MB in total. When dirty pages of the buffer pool are flushed, they are not written directly to disk. Instead, they are first copied to the in-memory doublewrite buffer via the memcpy function; the doublewrite buffer is then written to the shared tablespace on disk in two chunks of 1 MB each, and fsync is called immediately to synchronize the disk and avoid problems with buffered writes. In this step the doublewrite pages are contiguous, so the write is sequential and the overhead is not very large. After the doublewrite pages are written, the pages in the doublewrite buffer are written to the individual tablespace files; these writes are discrete. Run the following command to check the running status of doublewrite:
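The sequence above can be modeled as a toy flow. The 16 KB page size and 2 MB buffer mirror the figures in the text; the function and the log strings are purely illustrative, not InnoDB internals:

```python
PAGE_SIZE = 16 * 1024
DW_BUFFER_PAGES = (2 * 1024 * 1024) // PAGE_SIZE   # 128 pages per batch

def doublewrite(dirty_pages):
    """Return the ordered steps taken to persist the given dirty pages."""
    log = []
    for start in range(0, len(dirty_pages), DW_BUFFER_PAGES):
        batch = dirty_pages[start:start + DW_BUFFER_PAGES]
        log.append("memcpy %d pages into doublewrite buffer" % len(batch))
        # the 2 MB buffer goes to the shared tablespace as two 1 MB chunks
        log.append("sequential 1 MB write to shared tablespace")
        log.append("sequential 1 MB write to shared tablespace")
        log.append("fsync shared tablespace")
        # only now are the pages written to their real, scattered locations
        log.append("discrete writes of %d pages to tablespace files" % len(batch))
    return log
```

The key property is ordering: a durable copy exists in the shared tablespace before any page is written to its final location.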


mysql> SHOW GLOBAL STATUS LIKE 'Innodb_dblwr%'\G
*************************** 1. row ***************************
Variable_name: Innodb_dblwr_pages_written
        Value: 10005304
*************************** 2. row ***************************
Variable_name: Innodb_dblwr_writes
        Value: 3272391
2 rows in set (0.01 sec)


As you can see, doublewrite wrote a total of 10005304 pages using 3272391 actual write operations. If you find that at peak time the ratio Innodb_dblwr_pages_written : Innodb_dblwr_writes is far less than 64:1, the write pressure on your system is not high.
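A quick check of that ratio, using the sample counters above (the helper name is invented; the 64:1 ceiling reflects a full 128-page doublewrite buffer written in two batches):

```python
def dblwr_ratio(pages_written, writes):
    """Pages written per doublewrite operation; at most 64 under full load."""
    return pages_written / writes

# With the sample values above, the ratio is about 3.06, far below 64,
# so the write pressure on this system was low.
```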
If the operating system crashes while a page is being written to disk, the InnoDB storage engine can, during recovery, find a copy of the page in the doublewrite area of the shared tablespace, copy it to the tablespace file, and then apply the redo log.

The skip_innodb_doublewrite parameter disables the doublewrite feature; with it set, the partial page write failure described earlier can occur.

Note: some file systems, such as ZFS, provide their own mechanism to prevent partial page writes. In that case, doublewrite should not be enabled.
Adaptive Hash Index
Hashing is a fast lookup method; in general, its lookup time complexity is O(1). It is often used for join operations, such as hash joins in SQL Server and Oracle. However, common databases such as SQL Server and Oracle do not support hash indexes. The default index type of MySQL's HEAP (MEMORY) storage engine is hash, and the InnoDB storage engine offers another implementation: the adaptive hash index.

The InnoDB storage engine monitors the queries against the indexes on its tables. If it observes that building a hash index would increase speed, it builds one automatically; hence the index is called adaptive. The adaptive hash index is built from the B+ tree pages in the buffer pool, so it is created quickly.
Hash table size 9461399, node heap has 2933 buffer (s)
7767.23 hash searches/s, 130.87 non-hash searches/s
The output shows the adaptive hash index's size, usage, and per-second hash searches. Note that hash indexes can only be used for equality lookups, such as SELECT * FROM table WHERE index_col = 'xxx'; other lookup types, such as range searches, cannot use them, which is why the non-hash searches/s counter appears. The hash searches : non-hash searches ratio gives a rough idea of the efficiency gained by using hash indexes.
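The equality-only restriction is easy to see with a hash table modeled by a Python dict (the table contents are made up for illustration): an equality predicate is answered in O(1), while a range predicate must fall back to examining every key.

```python
hash_index = {"alice": 1, "bob": 2, "carol": 3}   # key -> row id

def equality_lookup(key):
    return hash_index.get(key)        # a "hash search": O(1)

def range_lookup(lo, hi):
    # no ordering in a hash table: scan all keys (a "non-hash search")
    return sorted(v for k, v in hash_index.items() if lo <= k <= hi)
```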
You can disable or enable this feature by using the innodb_adaptive_hash_index parameter. This feature is enabled by default.
