MySQL InnoDB Storage Engine Learning Notes


The master thread has the highest thread priority within InnoDB. Internally it consists of several loops: a main loop (loop), a background loop (background loop), a flush loop (flush loop), and a suspend loop (suspend loop). The master thread switches between the loop, background loop, flush loop, and suspend loop according to the database's running state.

Operations once per second:
1. The log buffer is flushed to disk, even if the transaction has not yet been committed (always).
2. Merge the insert buffer (possible).
3. Flush up to 100 dirty pages in InnoDB's buffer pool to disk (possible).
4. If there is currently no user activity, switch to the background loop (possible).

Even if a transaction has not yet been committed, the InnoDB storage engine still flushes the contents of the redo log buffer to the redo log file every second. This must be understood, and it explains well why even very large transactions take only a short time to commit.
Merging the insert buffer does not necessarily happen every second. The InnoDB storage engine checks whether the number of IO operations in the previous second was less than 5; if so, InnoDB judges the current IO pressure to be low and performs a merge of the insert buffer.
Likewise, flushing 100 dirty pages does not happen every second. The InnoDB storage engine checks whether the ratio of dirty pages in the current buffer pool (buf_get_modified_ratio_pct) exceeds the innodb_max_dirty_pages_pct parameter in the configuration file (the default is 90, representing 90%). If this threshold is exceeded, the InnoDB storage engine considers a disk synchronization necessary and writes 100 dirty pages to disk. The threshold can be inspected and tuned as sketched below.
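A minimal sketch of working with this threshold (the value 75 here is only illustrative; the parameter is dynamic, so no restart is needed):

mysql> SHOW VARIABLES LIKE 'innodb_max_dirty_pages_pct';
mysql> SET GLOBAL innodb_max_dirty_pages_pct = 75; -- illustrative value only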



The InnoDB storage engine performs the following operations every 10 seconds:
1. Flush up to 100 dirty pages to disk (possible).
2. Merge up to 5 insert buffers (always).
3. Flush the log buffer to disk (always).
4. Delete useless undo pages (always).
5. Flush 100 or 10 dirty pages to disk (always).
6. Produce a checkpoint (always).
In the above process, the InnoDB storage engine first checks whether the number of disk IO operations in the last 10 seconds was less than 200. If so, the InnoDB storage engine considers the disk to currently have sufficient IO capacity, and therefore flushes 100 dirty pages to disk. The InnoDB storage engine then merges insert buffers. Unlike the merge that may take place once per second, this merge of the insert buffer is always performed at this stage. The InnoDB storage engine next flushes the log buffer to disk, the same operation that is performed every second.
The InnoDB storage engine then performs a full purge operation, which removes useless undo pages. When operations such as UPDATE or DELETE are performed on a table, the original rows are only marked for deletion, and the old version information has to be kept because of consistent reads (consistent read). During full purge, however, the InnoDB storage engine checks whether the deleted rows in the current transaction system can really be removed (sometimes a query may still need to read an earlier version of the undo information); if they can, InnoDB deletes them immediately. From the source code, it can be found that the InnoDB storage engine deletes up to 20 undo pages at a time when performing full purge.
The InnoDB storage engine then judges the proportion of dirty pages in the buffer pool (buf_get_modified_ratio_pct): if more than 70% of the pages are dirty, it flushes 100 dirty pages to disk; otherwise it flushes only 10 dirty pages to disk.
Finally, the InnoDB storage engine produces a checkpoint (checkpoint). The InnoDB storage engine's checkpoint is what is known as a fuzzy checkpoint. At a checkpoint the InnoDB storage engine does not write all of the buffer pool's dirty pages to disk, because that would affect performance; it writes only the pages with the oldest log sequence number (oldest LSN) to disk. Checkpoint progress can be observed as sketched below.
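As a rough way to watch the checkpoint at work (a sketch; the exact labels vary across versions):

mysql> SHOW ENGINE INNODB STATUS\G
-- In the LOG section of the output, compare "Log sequence number" (the
-- current LSN) with "Last checkpoint at" (the oldest-LSN boundary described
-- above); the gap between the two is the redo a crash would have to replay.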

Next, look at the background loop. InnoDB switches to this loop if there is currently no user activity (the database is idle) or when the database is shut down. This loop performs the following operations:
1. Delete useless undo pages (always).
2. Merge 20 insert buffers (always).
3. Jump back to the main loop (always).
4. Keep flushing 100 pages until the condition is met (possible; jumps to the flush loop to complete this).
If there is nothing to do in the flush loop, the InnoDB storage engine switches to the suspend loop and suspends the master thread, which waits for an event to occur. If the InnoDB storage engine is enabled but no table uses the InnoDB storage engine, the master thread always stays in this suspended state.
Starting with the InnoDB Plugin, SHOW ENGINE INNODB STATUS lets you view the current state information of the master thread, as shown below. Here you can see that the main loop has executed 6,300,568 times:

mysql> SHOW ENGINE INNODB STATUS\G
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last ... seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 6300568 srv_active, 0 srv_shutdown, 5298516 srv_idle
srv_master_thread log flush and writes: 11599055

With more and more advanced disk equipment on the market, disk IO read and write throughput keeps climbing. If the InnoDB storage engine kept the earlier fixed behavior (processing at most 100 page writes and 20 insert buffer merges per second), it would be too slow for such hardware. The InnoDB storage engine therefore introduced a parameter to represent the throughput of disk IO: innodb_io_capacity, with a default value of 200. The number of pages flushed to disk is scaled as a percentage of innodb_io_capacity, according to the following rules:
1. When merging insert buffers, the number of insert buffers merged is 5% of the innodb_io_capacity value.
2. When flushing dirty pages from the buffer pool, the number of dirty pages flushed is innodb_io_capacity.
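A minimal sketch of inspecting and raising this throughput hint (the value 400 is only an illustration for faster storage; the parameter is dynamic):

mysql> SHOW VARIABLES LIKE 'innodb_io_capacity';
mysql> SET GLOBAL innodb_io_capacity = 400; -- illustrative; scales the 5% merge and flush rules above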

Another change concerns the default value of the parameter innodb_max_dirty_pages_pct. Before and including MySQL 5.1, its default value was 90, meaning dirty pages could occupy up to 90% of the buffer pool; currently the default value is 75.
Another parameter is innodb_adaptive_flushing (adaptive flushing), which affects the number of dirty pages flushed every second. The original rule was: if the proportion of dirty pages in the buffer pool is below innodb_max_dirty_pages_pct, dirty pages are not flushed; above it, they are. With the introduction of the innodb_adaptive_flushing parameter, the InnoDB storage engine instead uses a function called buf_flush_get_desired_flush_rate to determine the most appropriate number of dirty pages to flush. buf_flush_get_desired_flush_rate judges this by the speed at which redo logs are produced. As a result, a certain number of dirty pages are flushed even when the proportion of dirty pages is below innodb_max_dirty_pages_pct.
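Assuming a release where adaptive flushing is available (it is ON by default in modern versions), it can be checked and toggled at runtime:

mysql> SHOW VARIABLES LIKE 'innodb_adaptive_flushing';
mysql> SET GLOBAL innodb_adaptive_flushing = OFF; -- OFF restores the threshold-only rule; revert with ON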



Key Features
Key features of the InnoDB storage engine include the insert buffer, doublewrite, and the adaptive hash index (Adaptive Hash Index).
Insert Buffer
The insert buffer is a key feature of the InnoDB storage engine. Like a data page, the insert buffer is a component of physical pages.
Inserts into a clustered index are generally sequential and do not require random reads of the disk. In many cases, however, a table also has multiple nonclustered secondary indexes (secondary index), and inserts into those are discrete.
The InnoDB storage engine pioneered the insert buffer: an INSERT or UPDATE against a nonclustered index is not applied to the index page every time. Instead, the engine first checks whether the target nonclustered index page is in the buffer pool. If it is, the entry is inserted directly; if not, the entry is placed into an insert buffer, as if the nonclustered index entry had already reached its leaf node. Merges of the insert buffer into the actual nonclustered index pages are then performed at a certain frequency; because multiple buffered entries often belong to the same index page, they can frequently be combined into a single operation, which greatly improves the performance of inserts and updates on nonclustered indexes.
The use of the insert buffer requires the following two conditions to be met:
1. The index is a secondary index.
2. The index is not unique.
The secondary index cannot be unique, because when an entry is placed into the insert buffer, the database does not look up the index page to verify uniqueness. If it did, that lookup would inevitably cause discrete reads, and the insert buffer would lose its meaning. A table that qualifies is sketched below.
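For illustration, a hypothetical table whose secondary index qualifies for the insert buffer (table and column names are made up for this sketch):

CREATE TABLE t (
    id INT AUTO_INCREMENT PRIMARY KEY, -- clustered index: inserts are sequential
    b INT,
    KEY idx_b (b)                      -- secondary and non-unique: insert buffer applies
) ENGINE=InnoDB;
-- Declaring idx_b as UNIQUE would disqualify it: the uniqueness check would
-- require reading the index page, exactly the discrete reads the insert
-- buffer is designed to avoid.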
In the INSERT BUFFER AND ADAPTIVE HASH INDEX section of SHOW ENGINE INNODB STATUS, the insert buffer is reported like this:
Ibuf: size 1, free list len 0, seg size 2, merges
seg size shows that the current size of the insert buffer is 2*16KB, and free list len represents the length of the free list.
One problem with the current insert buffer is that, in write-intensive scenarios, the insert buffer consumes too much buffer pool memory; by default it can take up to 1/2 of the buffer pool memory.
The maximum size of the insert buffer can be controlled by modifying IBUF_POOL_SIZE_PER_MAX_SIZE in the source code; for example, changing IBUF_POOL_SIZE_PER_MAX_SIZE to 3 limits the insert buffer to at most 1/3 of the buffer pool memory.

Doublewrite
If the insert buffer brings performance to the InnoDB storage engine, doublewrite brings reliability of data to the InnoDB storage engine. When the database crashes while writing a page, and only part of that page has been written (for example, only the first 4KB of a 16KB page), we call this a partial write failure (partial page write).
Note: the redo log records physical operations on pages, for example writing the record 'AAA' at offset 800. If the page itself is already corrupted, redoing it makes no sense. In other words, before applying the redo log we need a copy of the page: when a partial write failure occurs, the page is first restored from its copy, and only then is redo applied. This is doublewrite.
Doublewrite consists of two parts: a doublewrite buffer in memory, 2MB in size, and 128 consecutive pages in the shared tablespace on the physical disk, i.e. two extents (extent), likewise 2MB in size. When the dirty pages of the buffer pool are flushed, they are not written to disk directly; instead, the dirty pages are first copied to the in-memory doublewrite buffer via the memcpy function, then written from there to the shared tablespace in two passes of 1MB each, after which the fsync function is called immediately to synchronize the disk and avoid problems caused by buffered writes. In this process the doublewrite pages are consecutive, so the writing is sequential and the overhead is not very large. After the doublewrite pages have been written, the pages in the doublewrite buffer are written to their individual tablespace files; these writes are discrete. Doublewrite activity can be observed with the following command:

mysql> SHOW GLOBAL STATUS LIKE 'innodb_dblwr%'\G
*************************** 1. row ***************************
Variable_name: Innodb_dblwr_pages_written
        Value: 10005304
*************************** 2. row ***************************
Variable_name: Innodb_dblwr_writes
        Value: 3272391
2 rows in set (0.01 sec)

As you can see, doublewrite wrote a total of 10,005,304 pages, but the actual number of doublewrite operations was 3,272,391. If you find that at peak time Innodb_dblwr_pages_written : Innodb_dblwr_writes is far less than 64:1, the write pressure on your system is not very high.
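To compute that ratio directly instead of reading the two counters by eye, something like the following works on versions where information_schema.GLOBAL_STATUS exists (newer releases moved these counters to performance_schema.global_status):

mysql> SELECT pw.VARIABLE_VALUE / w.VARIABLE_VALUE AS pages_per_dblwr_write
         FROM information_schema.GLOBAL_STATUS pw, information_schema.GLOBAL_STATUS w
        WHERE pw.VARIABLE_NAME = 'INNODB_DBLWR_PAGES_WRITTEN'
          AND w.VARIABLE_NAME  = 'INNODB_DBLWR_WRITES';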
If the operating system crashes while a page is being written to disk, then during recovery the InnoDB storage engine can find a copy of the page in the doublewrite area of the shared tablespace, copy it back to the tablespace file, and then apply the redo log.
The parameter skip_innodb_doublewrite disables the doublewrite feature; with it disabled, the partial write failure described earlier can occur.
Note: some file systems, such as ZFS, themselves provide a mechanism for preventing partial write failures. In that case, we need not enable doublewrite.
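Whether doublewrite is active can be checked at runtime (a sketch; in the 5.x series this variable is read-only and can only be changed at server startup, e.g. with skip_innodb_doublewrite):

mysql> SHOW VARIABLES LIKE 'innodb_doublewrite';
-- Returns OFF if the server was started with skip_innodb_doublewrite.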
Adaptive Hash Index
Hashing (hash) is a very fast lookup method; in general, the time complexity of a lookup is O(1). It is commonly used for join operations, such as the hash join in SQL Server and Oracle. However, common databases such as SQL Server and Oracle do not support hash indexes. The default index type of MySQL's HEAP storage engine is hash, while the InnoDB storage engine offers another approach: the adaptive hash index (Adaptive Hash Index).
The InnoDB storage engine monitors the lookups against the indexes on a table, and if it observes that building a hash index would bring an increase in speed, it builds one; hence "adaptive". The adaptive hash index is constructed from the B+ tree pages already in the buffer pool, and is therefore fast to build.
In SHOW ENGINE INNODB STATUS, the state of the adaptive hash index is shown like this:
Hash table size 9461399, node heap has 2933 buffer(s)
7767.23 hash searches/s, 130.87 non-hash searches/s
The main information here is the size and usage of the adaptive hash index, including the adaptive hash index searches per second. Note that a hash index can only be used for equality queries, such as SELECT * FROM table WHERE index_col = 'xxx'; it cannot be used for other lookup types such as range lookups, which is why the non-hash searches/s figure appears. The hash searches : non-hash searches ratio gives a rough picture of the efficiency gained from the hash index.
We can enable or disable this feature with the parameter innodb_adaptive_hash_index, which is ON by default, as in the sketch below.
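A minimal sketch of checking and toggling it (the parameter is dynamic, so no restart is required):

mysql> SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';
mysql> SET GLOBAL innodb_adaptive_hash_index = OFF; -- re-enable with ON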
