InnoDB Architecture and Features (InnoDB storage engine reading notes)


Background Thread

    • Master Thread

The core background thread, primarily responsible for asynchronously flushing data from the buffer pool to disk. Its work includes flushing dirty pages, merging the insert buffer, recycling undo pages, and so on.

Operations once per second:

    1. The redo log buffer is flushed to disk, even if the transaction has not yet committed. This always happens, which is why committing even a very large transaction is fast: most of its redo log has already been written out.
    2. When I/O pressure is low (the number of I/O operations in the last second is less than 5% of innodb_io_capacity), merge up to 5% of innodb_io_capacity insert buffer pages.
    3. When the dirty page ratio in the buffer pool exceeds innodb_max_dirty_pages_pct, flush innodb_io_capacity dirty pages to disk. Otherwise, if innodb_adaptive_flushing is enabled, buf_flush_get_desired_flush_rate is used to choose an appropriate number of dirty pages to flush.

Operations once every 10 seconds:

    1. If the number of disk I/O operations in the past 10 seconds was less than innodb_io_capacity, flush innodb_io_capacity dirty pages from the buffer pool to disk.
    2. Merge up to 5% of innodb_io_capacity insert buffer pages.
    3. Flush the redo log buffer to disk.
    4. Delete undo pages that are no longer needed.
    5. If the dirty page ratio in the buffer pool exceeds 70%, flush innodb_io_capacity dirty pages to disk; otherwise flush 10% of innodb_io_capacity dirty pages. (The parameters involved can be inspected as shown after this list.)
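
The flushing described above is governed by a few server variables. As a quick, hedged illustration (these are current MySQL variable names; defaults and whether a variable is dynamic vary by version), they can be inspected from the client and, where dynamic, adjusted:

mysql> SHOW VARIABLES LIKE 'innodb_io_capacity';
mysql> SHOW VARIABLES LIKE 'innodb_max_dirty_pages_pct';
mysql> SHOW VARIABLES LIKE 'innodb_adaptive_flushing';
mysql> SET GLOBAL innodb_io_capacity = 400;  -- illustrative value; tune to the actual I/O capability of the disks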

Background loop (when the database is idle or shutting down):

    1. Delete undo pages that are no longer needed.
    2. Merge innodb_io_capacity insert buffer pages.

Flush loop (when the database is idle):

    1. Flush innodb_io_capacity dirty pages.

    • IO Thread

The InnoDB storage engine makes heavy use of asynchronous I/O (AIO), and the IO threads are primarily responsible for handling the callbacks of these I/O requests. Their number can be adjusted with the innodb_read_io_threads and innodb_write_io_threads parameters.
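
The IO thread configuration can be checked with a pattern match; a small sketch, assuming a reasonably recent MySQL version (these variables are not dynamic, so changes are normally made in the configuration file and require a restart):

mysql> SHOW VARIABLES LIKE 'innodb_%io_threads';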

    • Purge Thread

After a transaction has committed, its undo log may no longer be needed. The purge thread reclaims undo pages that are no longer in use.

    • Page Cleaner Thread

Responsible for flushing dirty pages, taking this work off the master thread and reducing the blocking of user query threads.
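
The number of purge and page cleaner threads is exposed through variables in the MySQL versions that support these features; a hedged sketch:

mysql> SHOW VARIABLES LIKE 'innodb_purge_threads';
mysql> SHOW VARIABLES LIKE 'innodb_page_cleaners';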

Memory Buffer Pool

Modifications to database pages are first made to the pages in the buffer pool and then flushed to disk at a certain frequency. In other words, modifying a page in the buffer pool does not trigger an immediate flush; pages are written back to disk through checkpoint technology. The size of the buffer pool is configured with innodb_buffer_pool_size.

The buffer pool caches several kinds of pages: data pages, index pages, undo pages, the insert buffer, the adaptive hash index, InnoDB lock information, and data dictionary information.

The InnoDB storage engine allows multiple buffer pool instances. Pages are hashed to different instances, which reduces lock contention. The number of instances is set with the innodb_buffer_pool_instances parameter.

The buffer pool is a large memory area managed with an LRU algorithm. However, to cope with operations such as full table scans, a naive LRU algorithm is not used; instead InnoDB inserts at a midpoint in the LRU list. A newly read page is not placed at the head of the list but at this midpoint, which by default is at 5/8 of the list length. The position is controlled by the parameter innodb_old_blocks_pct.
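
The buffer pool settings mentioned above can be viewed directly; a minimal sketch, assuming a MySQL version recent enough to expose all three variables:

mysql> SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
mysql> SHOW VARIABLES LIKE 'innodb_buffer_pool_instances';
mysql> SHOW VARIABLES LIKE 'innodb_old_blocks_pct';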

Insert Buffer

For insert and update operations on nonclustered (secondary) indexes, the InnoDB storage engine does not always modify the index page directly; if the target page is not in the buffer pool, the change is recorded in the insert buffer instead. The insert buffer is then merged with the secondary index leaf pages at a certain frequency, usually combining multiple random inserts into a single operation, which significantly improves the performance of secondary index insertions.

Conditions for InnoDB to use the insert buffer (illustrated after this list):

    • The index is a secondary (non-clustered) index;
    • The index is not unique (if it were unique, the index page would have to be read anyway to check uniqueness, defeating the purpose of buffering).
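
As a minimal illustration of where the insert buffer applies (the table and index names below are invented for the example; innodb_change_buffering is the variable that controls the feature in MySQL 5.5 and later):

mysql> SHOW VARIABLES LIKE 'innodb_change_buffering';

CREATE TABLE t (
    id INT PRIMARY KEY,    -- clustered index: never buffered
    k  INT,
    KEY idx_k (k)          -- non-unique secondary index: changes to it can be buffered
) ENGINE=InnoDB;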

Insert Buffer Internal Implementation

The data structure of the insert buffer is a B+ tree. Since MySQL 4.1 there is only one such B+ tree globally, and it handles the insert buffer for the secondary indexes of all tables. The tree is stored in the shared tablespace, by default ibdata1. This is why recovery can fail when a table's data is restored only from its standalone .ibd tablespace file: the secondary indexes on the table must also be rebuilt using the insert buffer data.

A non-leaf node of the insert buffer B+ tree holds the search key, laid out as space (4 bytes) + marker (1 byte) + offset (4 bytes). space is the ID of the tablespace containing the record's table, offset is the offset of the page within that tablespace, and marker is kept for compatibility with older versions.

A leaf node record of the insert buffer B+ tree is laid out as space + marker + offset + metadata + the buffered record itself. space, marker, and offset have the same meanings as above. Within metadata, IBUF_REC_OFFSET_COUNT is a two-byte integer recording the order in which records entered the insert buffer; replaying records in this order yields the correct result. The columns of the record that was actually inserted start from the fifth column of the insert buffer leaf record.

Once the insert buffer is in use, records destined for secondary index pages may be stored in the insert buffer B+ tree. To guarantee that every merge back into a secondary index page succeeds, the free space available in each secondary index page must be tracked somewhere. InnoDB does this with a special page type called the insert buffer bitmap. Each insert buffer bitmap page tracks 16,384 pages, that is, 256 extents, and is located in the second page of each group of 16,384 pages. Each secondary index page occupies 4 bits in the bitmap, used mainly to record the amount of free space available in that page.

Merge Insert Buffer

Records in Insert Buffer are merged into a true secondary index in the following cases:

    • The secondary index page is read into the buffer pool;
    • The insert buffer bitmap page shows that the secondary index page has no free space left;
    • The master thread performs its periodic merge.

In this way, multiple operations on records of one secondary index page are merged into a single operation on that page, which improves performance.
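
The activity of the insert buffer and its merges can be observed with SHOW ENGINE INNODB STATUS. The exact wording and the numbers below are only illustrative and vary by version, but the relevant section looks roughly like this:

mysql> SHOW ENGINE INNODB STATUS\G
...
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 32, seg size 34, 5632 merges
merged operations:
 insert 6032, delete mark 0, delete 0
...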

Double Write

While the insert buffer improves the performance of the InnoDB storage engine, doublewrite improves the reliability of its data pages.

One might ask: if a partial write failure occurs, can the page not simply be recovered from the redo log? That sounds like a solution, but redo log records describe physical operations on a page, such as writing the record 'AAA' at offset 800. If the page itself has been corrupted by the failed write, replaying redo against it is meaningless. A correct copy of the page must therefore exist before the page is overwritten: when a write failure occurs, the page is first restored from that copy and only then is the redo log applied. This is doublewrite.

Doublewrite consists of two parts: a doublewrite buffer in memory, and 128 consecutive pages (two extents) in the shared tablespace on disk, the same size as the in-memory buffer (2 MB). When dirty pages in the buffer pool are flushed, they are not written to disk directly; they are first copied (via memcpy) into the doublewrite buffer. The doublewrite buffer is then written to the shared tablespace in two sequential chunks of 1 MB each, and fsync is called immediately afterwards to synchronize the disk. Because the doublewrite pages in the shared tablespace are contiguous, this sequential write is not very expensive. After the doublewrite pages have been written, the pages in the doublewrite buffer are written to their actual locations in the individual tablespaces, which are discrete writes.

If the operating system crashes while a page is being written to disk, InnoDB can, during recovery, find a copy of the page in the doublewrite area of the shared tablespace, copy it back to its tablespace location, and then apply the redo log.
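
Whether doublewrite is enabled, and how much it is being used, can be checked from the client; a hedged sketch (the Innodb_dblwr_* status counters are standard, the values will of course differ):

mysql> SHOW VARIABLES LIKE 'innodb_doublewrite';
mysql> SHOW GLOBAL STATUS LIKE 'Innodb_dblwr%';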

Adaptive Hash Index

The InnoDB storage engine monitors queries against the index pages of tables, and if it observes that building a hash index would speed them up, it builds one automatically. This is called the Adaptive Hash Index (AHI).

AHI requires that consecutive accesses to a page follow the same pattern. For example, for a composite index on (a, b), the access pattern could be:

WHERE a = xxx

WHERE a = xxx and B = yyy

The access pattern here refers to the query condition used. If the two forms of query above are issued alternately, InnoDB will not build an AHI for the page.

In addition, an AHI is built only when the page has been accessed with the same pattern 100 times, and the page has been accessed N times through this pattern, where N = (number of records in the page) / 16.
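
AHI is built and maintained entirely by InnoDB and cannot be tuned per table, but it can be switched on or off globally; a minimal sketch, assuming a MySQL version where the variable is dynamic:

mysql> SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';
mysql> SET GLOBAL innodb_adaptive_hash_index = OFF;  -- disable AHI if it does not benefit the workload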

Flush Neighbor Pages

When a dirty page is flushed, the InnoDB storage engine also checks the other pages in the same extent and flushes any of them that are dirty at the same time.
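
In MySQL 5.6 and later this behavior is controlled by a variable (older versions always flush neighbors); a hedged example:

mysql> SHOW VARIABLES LIKE 'innodb_flush_neighbors';
mysql> SET GLOBAL innodb_flush_neighbors = 0;  -- commonly disabled on SSDs, where neighbor flushing gains little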

Asynchronous IO

InnoDB uses asynchronous I/O (AIO) to handle disk operations.
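
Whether the operating system's native AIO implementation is in use can be checked through a variable available in MySQL 5.5 and later; on platforms without native AIO support, InnoDB falls back to its own simulated AIO:

mysql> SHOW VARIABLES LIKE 'innodb_use_native_aio';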

Checkpoint Technology

To avoid data loss, transactional database systems generally adopt a Write Ahead Log strategy: when a transaction commits, the redo log is written first, and the page is modified afterwards.

However, the redo log cannot grow without bound, and the buffer pool (which holds dirty pages not yet flushed to disk) cannot be infinitely large either. Even if it could be, the recovery time after a crash would be very long. Checkpoint technology is therefore needed, and it solves the following problems:

    • Shorten the time of database recovery;
    • When the buffer pool is insufficient, the dirty pages can be flushed to disk;
    • When the redo log becomes unavailable (it is reused cyclically), dirty pages are flushed to disk.

When the database restarts after a crash, it does not need to replay the entire redo log. Pages modified before the checkpoint have already been flushed to disk, so the database only needs to apply the redo log generated after the checkpoint, which greatly shortens recovery time.

Internally, InnoDB tracks how far pages, the redo log, and the checkpoint have progressed by comparing LSNs (Log Sequence Numbers). An LSN is an 8-byte number; every page has one, the redo log has one, and the checkpoint has one. They can be observed with the following command:

mysql> SHOW ENGINE INNODB STATUS\G

.............

Log Sequence Number 92561351052

Log flushed up to 92561351052

Last checkpoint at 92561351052
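
From the three values above, the checkpoint age (the amount of redo log that recovery would have to replay after a crash) can be read off directly; in this sample it happens to be zero:

checkpoint age = Log Sequence Number - Last checkpoint at
               = 92561351052 - 92561351052
               = 0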
