MySQL Kernel: InnoDB Storage Engine (Volume 1) Notes

Last Update: 2014-11-05 Source: Internet

Author: User

MySQL Kernel: InnoDB Storage Engine (Volume 1) Table of Contents

1. Overview

2. Basic Data Structures and Algorithms

3. Synchronization Mechanism

4. Redo Log

5. Mini-transaction (mtr)

6. Storage Management

7. Records

8. Index Pages

9. Locks

10. B+ Tree Index

11. Insert Buffer

12. Buffer Pool

13. Transaction Processing

14. Data Dictionary

15. Service Management

Overview

Basic Data Structures and Algorithms

Synchronization Mechanism
rw-lock/latch, s-/x- modes: x- is recursive, s- is not (?)
A latch is first acquired by spinning; after spinning for a while, the thread enters the wait array (semaphore?).
If all 1000 cells in sync_primary_wait_array are allocated, ut_error triggers a crash.
When the thread holding the latch releases it, sync_array_signal_object is called to wake up the waiting threads.

Redo Log
p42: the redo log ensures transaction durability (D); the undo log is used for rollback and MVCC.
innodb_flush_log_at_trx_commit = 0/1/2
Redo log vs. bin log: the redo log records physical-logical operations on pages.
Design philosophy: a physical log records modifications within a page (old and new values); a logical log records table operations (insert/delete).
LSN: the number of bytes that transactions have written to the redo log (?); for a checkpoint, the position up to which pages have been flushed to disk (?). In any case, the LSN changes monotonically over time.
Checkpoint: flush the pages in the buffer pool to disk.
The redo log size is fixed (3 GB) -> the size of the ib_logfile<N> files.
Redo log block (512 B - 12 - 8): the same size as a disk sector, so a block write is atomic and does not need doublewrite (?)
Redo log group *
Group commit: fsync -> log_flush_up_to; the last log block is copied.
Recovery: recovery_from_checkpoint_start
In the header of the first page of the tablespace, FIL_PAGE_FILE_FLUSH_LSN records the LSN last flushed when the database was shut down.
The 32 pages adjacent to the flushed page (?)

Mini-transaction (mtr)
FIX rule: before modifying a page, hold the latch of that page.
WAL: does every page need an LSN? What if the LSN overflows?
Force-log-at-commit
mtr_t mtr; mtr_start(&mtr); ... mtr_commit(&mtr);
At commit, if mtr->modified is TRUE: first modify the page in the buffer pool (*1), then release log_sys->mutex (this is a hotspot).
*1: log_reserve_and_write_fast / log_write_slow, that is, a fast path and a slow path.
When multiple rows are updated: MLOG_MULTI_REC_END

Storage Management
Page: identified by (space_id, offset); 16 KB.
1 extent = 64 consecutive pages.
Space header
Segment: each user table has at least two segments, the leaf-node segment and the non-leaf-node segment of the clustered index (B+ tree).
A single segment can manage up to 32 individual pages.
Tablespace data structures in main memory: fil_system/space/node_struct
Four asynchronous I/O threads: asynchronous read, asynchronous write, insert buffer, and redo log.

Records
p102 heap no: user records always start from 2; the pseudo records are Infimum/Supremum (they feel like the head/tail of a doubly linked list).
p103: a NULL VARCHAR does not occupy disk space, while a NULL CHAR is filled with 0x00.
Large records: BLOB/TEXT (overflow page, extern attribute).
Logical record: dtuple_struct; for large records: big_rec_struct.
The B+ tree index only locates the page; the records on the page then need to be scanned.
Two ways: by mtype or by prtype.
MVCC (just a column?): uses the hidden transaction ID column.
read_view_struct: low_limit_id / up_limit_id, trx_ids, n_trx_ids, creator
p114: read_view_sees_trx_id determines whether the current transaction can read the current version of a record.

Index Pages
Records are not stored in physical order.
Page Directory (location of the records in the page): slot? offset?
Primary key record
Page Cursor *

Locks
p136: in theory, the lower the isolation level, the fewer locks a transaction requests or the shorter it holds them.
Phantom read: predicate lock --> key-range locking --> next/previous-key locking
p138 intention lock: means that the transaction wants to lock at a finer granularity. Because InnoDB locks at row level, intention locks block no requests other than full table scans.
Row lock: lock_rec_struct = {space, page_no, n_bits}
All lock objects are protected by kernel_mutex (another hotspot!). Optimization: fine-grained splitting?
p144 LOCK_GAP (the range lock does not include the endpoint)
Explicit and implicit locks ** (omitted)
Maintenance of row locks * (important, omitted): insert, update, purge, consistent locking read (?), page split, page merge
Auto-increment lock (atomic?)
Deadlock *

B+ Tree Index
Clustered/secondary
Split: btr_page_split_and_insert
Merge: btr_compress
Lookup: for the key value of a unique constraint, the PAGE_CUR_GE mode is used instead of LE.
latch_mode, cursor
DML mode: btr_cur_optimistic_insert
Non-primary-key update (mainly because the column size can change): btr_cur_optimistic_update --> btr_cur_pessimistic_update (example omitted)
Primary key update and deletion
Persistent cursor: btr_pcur_struct
Adaptive hash index *

Insert Buffer
Merges multiple inserts into one operation (improves insert performance for secondary indexes without a unique constraint).
p237: the hardest part of the implementation is deadlock handling.
Logical hierarchy of pages: non-IB page, IB non-bitmap page, bitmap page
p241: asynchronous I/O threads can cause deadlock --> rw_lock_x_lock_move_ownership

Buffer Pool
LRU, Free, and Flush lists
Read-ahead
p258 random read-ahead: triggered only if 9 of the 32 pages have been accessed and are active.
Linear read-ahead
Flushing pages: the partial-write problem (?) --> doublewrite (exists in both the tablespace and memory; the size is 2 MB, which means at most 128 pages per flush)

Transaction Processing
Classification: flat, flat with savepoints, chained, nested, and distributed transactions
Transaction system segment *
Doublewrite segment *
Undo log storage
Consistent non-locking read: p282 reading a snapshot requires no locks.
Undo log implementation: rollback segment + undo segment, trx_undo_struct
Undo record
Purge *
Rollback: 7-byte roll_ptr hidden column {rseg_id (1), page_no (4), offset (2)}
Three rollback types: TRX_SIG_{TOTAL_ROLLBACK, ROLLBACK_TO_SAVEPT, ERROR_OCCURRED}
Commit

Data Dictionary

Service Management
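
The visibility check that the Records notes attribute to read_view_sees_trx_id (p114) can be sketched in C as follows. This is a minimal sketch, not the InnoDB source: it assumes the commonly described rule that ids below up_limit_id are visible, ids at or above low_limit_id are not, and ids in between are visible only if they are absent from trx_ids. The type and function names ending in _sketch are hypothetical, the creator field from the notes is left out, and a linear scan replaces whatever search the real code performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t trx_id_t;

/* Hypothetical, simplified stand-in for read_view_struct.
   Only the fields named in the notes are modeled; creator is omitted. */
typedef struct read_view_sketch {
    trx_id_t        low_limit_id; /* ids >= this value are never visible */
    trx_id_t        up_limit_id;  /* ids < this value are always visible */
    size_t          n_trx_ids;    /* number of entries in trx_ids */
    const trx_id_t *trx_ids;      /* transactions active when the view was created */
} read_view_sketch_t;

/* Returns true if changes made by transaction trx_id are visible to the view. */
bool read_view_sketch_sees_trx_id(const read_view_sketch_t *view, trx_id_t trx_id)
{
    size_t i;

    assert(view != NULL);

    if (trx_id < view->up_limit_id) {
        /* Older than every transaction that was active at view creation. */
        return true;
    }
    if (trx_id >= view->low_limit_id) {
        /* Started after the view was created. */
        return false;
    }
    /* In between: visible only if it was not active at view creation.
       A linear scan keeps the sketch independent of how trx_ids is ordered. */
    for (i = 0; i < view->n_trx_ids; i++) {
        if (view->trx_ids[i] == trx_id) {
            return false;
        }
    }
    return true;
}
```

For example, with up_limit_id = 12, low_limit_id = 20 and active ids {15, 12}, transaction ids 11 and 13 are visible to the view, while 12, 15 and 20 are not.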

This article is an English version of an article originally published in Chinese on aliyun.com.
