Efficiently traversing InnoDB B+Trees with the page directory
1、The purpose of the page directory
As described in the posts mentioned above, all records in INDEX pages are linked together in a singly-linked list in ascending order. However, list traversal through a page with potentially several hundred records in it is very expensive: every record's key must be compared, and this needs to be done at each level of the B+Tree until the record sought is found on a leaf page.
The page directory greatly optimizes this search by providing a fixed-width data structure with direct pointers to 1 of every 4-8 records, in order. Thus, it can be used for a traditional binary search of the records in each page, starting at the mid-point of the directory and progressively pruning the directory by half until only a single entry remains, and then linear-scanning from there. Since the directory is effectively an array, it can be traversed in either ascending or descending order, despite the records being linked in only ascending order.
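The search described above can be sketched in a few lines of Python. This is only a toy model, not InnoDB's code: the page is a plain list of keys (with -inf and +inf standing in for infimum and supremum), and the directory is a list of indices into it. The names `search_page`, `records` and `directory` are made up for this illustration.

```python
import math


def search_page(records, directory, target):
    """Return the index of `target` in `records`, or None if it is absent.

    records   -- keys in ascending order, with -inf / +inf standing in for
                 the infimum and supremum system records
    directory -- indices into `records` of the records that have a slot,
                 in ascending order (first is infimum, last is supremum)
    """
    lo, hi = 0, len(directory) - 1
    # Invariant: key at slot lo < target <= key at slot hi.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if records[directory[mid]] < target:
            lo = mid
        else:
            hi = mid
    # Only one group is left: the records owned by slot hi. Walk it in
    # ascending order, as a real page would by following next-record pointers.
    for i in range(directory[lo] + 1, directory[hi] + 1):
        if records[i] == target:
            return i
        if records[i] > target:
            break
    return None


# Toy page: 24 user records with keys 0..23, one slot for every 4th record.
records = [-math.inf] + list(range(24)) + [math.inf]
directory = [0, 4, 8, 12, 16, 20, 25]
```

With seven slots, the binary search needs at most three key comparisons before the linear scan, which then touches at most the handful of records in one group instead of all 24.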
2、The physical structure of the page directory
The structure is actually very simple. The number of slots (the page directory length) is specified in the first field of the INDEX header of the page. The page directory always contains an entry for the infimum and supremum system records (so the minimum size is 2 entries), and may contain 0 or more additional entries, one for each 4-8 user records. A record is said to “own” another record if it represents it in the page directory. Each entry in the page directory “owns” the records between the previous entry in the directory, up to and including itself. The count of records “owned” by each record is stored in the record header that precedes each record.
The page-directory-summary mode of innodb_space can be used to view the page directory contents, in this case for a completely empty table (with the same schema as the 1 million row table used in A quick introduction to innodb_ruby), showing the minimum possible page directory:
$ innodb_space -f t_page_directory.ibd -p 3 page-directory-summary
slot    offset  type          owned   key
0       99      infimum       1
1       112     supremum      1
If we insert a single record, we can see that it gets owned by the record with a greater key than itself that has an entry in the page directory. In this case, supremum will own the record (as previously discussed, supremum represents a record higher than any possible key in the page):
$ innodb_space -f t_page_directory.ibd -p 3 page-directory-summary
slot    offset  type          owned   key
0       99      infimum       1
1       112     supremum      2
The infimum record always owns only itself, since no record can have a lower key. The supremum record always owns itself, but has no minimum record ownership. Each additional entry in the page directory should own a minimum of 4 records (itself plus 3 others) and a maximum of 8 records (itself plus 7 others).
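The ownership rules above are easy to state as a check over the "owned" column printed by page-directory-summary. The following is a minimal sketch under that reading of the text; `check_owned_counts` is a hypothetical helper, not part of innodb_ruby or InnoDB.

```python
MIN_OWNED = 4  # minimum for a slot that is neither infimum nor supremum
MAX_OWNED = 8  # maximum for any slot


def check_owned_counts(owned):
    """Check a list of n_owned values, one per slot, infimum first.

    Returns True when the counts satisfy the rules described in the text:
    infimum owns exactly 1 record, supremum owns between 1 and 8, and every
    slot in between owns between 4 and 8.
    """
    if len(owned) < 2:          # infimum and supremum always have a slot
        return False
    if owned[0] != 1:           # infimum owns only itself
        return False
    if not 1 <= owned[-1] <= MAX_OWNED:   # supremum has no minimum beyond itself
        return False
    return all(MIN_OWNED <= n <= MAX_OWNED for n in owned[1:-1])
```

For example, the two summaries shown so far correspond to `[1, 1]` and `[1, 2]`, and both pass the check.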
To illustrate, each record with an entry in the page directory (bolded) owns the records immediately prior to it in the singly-linked list (K = Key, O = Number of Records Owned):
3、Growth of the page directory
Once any page directory slot would exceed 8 records owned, the page directory is rebalanced to distribute the records into 4-record groups. If we insert 6 additional records into the table, supremum will now own a total of 8 records:
$ innodb_space -f t_page_directory.ibd -p 3 page-directory-summary
slot    offset  type          owned   key
0       99      infimum       1
1       112     supremum      8
The next insert will cause a re-organization:
$ innodb_space -f t_page_directory.ibd -p 3 page-directory-summary
slot    offset  type          owned   key
0       99      infimum       1
1       191     conventional  4
2       112     supremum      5
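The growth shown in these outputs can be replayed with a small simulation. It only tracks the "owned" count per slot and only handles inserts in ascending key order, so every new record lands in supremum's group. The split rule used here (the new slot takes half of the overfull group, rounded down, and the old slot keeps the rest) is an assumption generalized from the single 4/5 split shown above; `insert_ascending` and `owned_after` are illustrative names.

```python
MAX_OWNED = 8  # a slot may own at most 8 records


def insert_ascending(owned):
    """Apply one insert whose key is greater than every existing key.

    `owned` is the list of n_owned values per slot (infimum first,
    supremum last). It is modified in place and also returned.
    """
    owned[-1] += 1                      # supremum owns the new record
    if owned[-1] > MAX_OWNED:           # 9 records: the group must be split
        total = owned[-1]
        lower = total // 2              # new slot takes the lower half (4)
        owned[-1] = total - lower       # supremum keeps the rest (5)
        owned.insert(len(owned) - 1, lower)
    return owned


def owned_after(n_inserts):
    """Owned counts after n ascending inserts into an empty page."""
    owned = [1, 1]                      # empty page: infimum and supremum
    for _ in range(n_inserts):
        insert_ascending(owned)
    return owned
```

Under these assumptions, `owned_after(7)` gives `[1, 8]` and `owned_after(8)` gives `[1, 4, 5]`, matching the two summaries above.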
4、A logical view of the page directory
At a logical level, the page directory (and records) for a page with 24 records (with keys from 0 to 23) would look like this:
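Infimum always owns only itself, so its slot has n_owned = 1.
Supremum always owns the last few records in the page, and its count may be less than 4.
Every other slot owns at least 4 records and at most 8.
The slots are stored in reverse order, starting at byte 16376, where the FIL trailer begins.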
Take note that:
Records are singly linked from infimum to supremum through all 24 user records, as previously discussed.
Approximately each 4th record is entered into the page directory, represented in the illustration both by bolding that record and by noting its offset in the page directory array represented at the top of the illustration.
The page directory is stored “backwards” in the page, so is reversed in this illustration compared to its ordering on disk.
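To make the "backwards" layout concrete, here is a minimal sketch that reads the directory out of a raw page image. It assumes things the text does not state: a 16 KiB page (which is what puts the FIL trailer at byte 16376), 2-byte slots, and big-endian record offsets. The function names are hypothetical, and the slot count is passed in rather than parsed from the INDEX header.

```python
import struct

PAGE_SIZE = 16384                        # assumed page size (16 KiB)
FIL_TRAILER_SIZE = 8                     # trailer occupies the last 8 bytes
SLOT_SIZE = 2                            # assumed: one 2-byte offset per slot
DIR_END = PAGE_SIZE - FIL_TRAILER_SIZE   # 16376, where the FIL trailer begins


def slot_position(n):
    """Byte position of slot n; slot 0 sits just below the FIL trailer."""
    return DIR_END - (n + 1) * SLOT_SIZE


def read_directory(page, n_slots):
    """Return the record offsets stored in slots 0..n_slots-1."""
    return [struct.unpack_from(">H", page, slot_position(n))[0]
            for n in range(n_slots)]


def make_toy_page(offsets):
    """Build an otherwise empty page image holding only a directory."""
    page = bytearray(PAGE_SIZE)
    for n, offset in enumerate(offsets):
        struct.pack_into(">H", page, slot_position(n), offset)
    return bytes(page)


# Directory from the last summary above: infimum, one user record, supremum.
page = make_toy_page([99, 191, 112])
```

Reading the bytes in increasing address order yields 112, 191, 99, the reverse of the logical slot order, while `read_directory(page, 3)` returns them as `[99, 191, 112]`.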
http://blog.jcole.us/2013/01/