MySQL's MyISAM and InnoDB engines both use B+ tree indexes by default (displayed as "BTREE" when you inspect an index, as in the example below). This article discusses two questions:
- Why do mainstream databases such as MySQL choose the B+ tree as their index structure?
- How should the common MySQL index optimization ideas be understood in terms of that index structure?
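As a side note, the index type can be checked with statements like the following (a minimal sketch; the table name `user` is hypothetical):

```sql
-- The Index_type column of SHOW INDEX reports BTREE for B+ tree indexes.
SHOW INDEX FROM user;

-- SHOW CREATE TABLE also lists the index method, e.g. `KEY ... USING BTREE`.
SHOW CREATE TABLE user;
```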
Why the index cannot be fully loaded into memory
The choice of index structure is premised on the fact that, at large data volumes, the index cannot be fully loaded into memory.
Why can't the index be fully loaded into memory? Assuming a tree structure is used to organize the index, a rough estimate:
- Assume a single index node takes 12 B and there are 10 million rows under a unique index; the leaf level alone takes on the order of 100 MB, and the entire tree at most about 200 MB.
- Assume a data row takes 200 B; the total data is about 2 GB.
Now assume the index is kept entirely in memory. For every 2 GB of data stored on physical disk, about 200 MB of memory is occupied, an index-to-data size ratio of roughly 1/10. Is a 1/10 ratio really that big? Physical disk is far cheaper than memory: take a server with 16 GB of memory and a 1 TB HDD as an example. To fill the 1 TB disk, at least 100 GB of memory would be needed, far more than 16 GB.
Considering that a table may have multiple indexes, composite indexes, smaller data rows, and so on, the actual index-to-data ratio is usually greater than 1/10, sometimes as high as 1/3. Under an index-backed storage scheme, the index-to-data ratio is too high for the index to fit in memory.
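The estimate, written out with the figures assumed above:

```latex
% index-to-data size ratio under the assumptions above
\frac{\text{index}}{\text{data}} \approx \frac{200\,\text{MB}}{2\,\text{GB}} = \frac{1}{10},
\qquad
1\,\text{TB} \times \frac{1}{10} = 100\,\text{GB} \gg 16\,\text{GB (RAM)}
```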
Problems with other structures
Since memory cannot hold the index, it is bound to rely on disk (or SSD) storage. Memory read/write speed is thousands of times that of disk (depending on the implementation), so the core problem is "how to reduce the number of disk reads and writes".
First, setting the paging mechanism aside and assuming every read and write goes straight to disk:
- Linear structure: O(n) reads/writes on average
- Binary search tree (BST): O(log2(n)) reads/writes on average; degrades to O(n) in the worst case if the tree is unbalanced
- Self-balancing binary search tree (AVL): adds self-balancing on top of the BST; at most O(log2(n)) reads/writes
- Red-black tree (RBT): another self-balancing search tree; at most O(log2(n)) reads/writes
BST, AVL, and RBT all do a good job of reducing the number of reads and writes from O(n) to O(log2(n)); AVL and RBT additionally self-balance, capping the number of reads and writes at O(log2(n)).
Assuming an auto-increment primary key, the key itself is ordered, and the number of reads and writes on a tree structure can be optimized down to the tree height: the lower the tree height, the fewer the reads and writes, while self-balancing keeps the tree structure stable. To optimize further, the B-tree and the B+ tree come in.
What problem does the B-tree solve
Many articles mistakenly call it the "B-minus tree", probably a misreading of its English name "B-tree" (some even call the B-tree a binary tree or binary search tree). This happens especially when it is mentioned alongside the B+ tree: people assume that if there is a B+ (plus) tree there must be a B- (minus) tree, when in fact the English name of the B+ tree is "B+-tree".
Setting maintenance operations aside, the B-tree is essentially an "m-way search tree" (m being the maximum number of subtrees per node), with time complexity O(log_m(n)). Beyond that, the B-tree defines simple and efficient maintenance operations that keep its depth between roughly log_m(n) and log_ceil(m/2)(n), greatly reducing the tree height.
Re-emphasize:
Don't dwell on time complexity here: unlike in-memory algorithms, the number of disk I/Os is the bigger factor. Readers can verify that the time complexity of the B-tree and the AVL tree is the same, but because the B-tree has fewer levels and therefore fewer disk I/Os, its practical performance is better than AVL and other binary trees.
Similar to a binary search tree, each node stores multiple keys and subtree pointers, with the keys and subtrees arranged in order.
The purpose of paging is to extend memory with external storage and to speed up disk reads and writes. A page is usually 4 KB (the same size as a disk data block; see analyses of inodes and blocks). The operating system loads content from disk into memory one page at a time (amortizing the seek cost), modifies the page in memory, and then writes the whole page back to disk. Taking advantage of this, the size of a B-tree node can be set to roughly one page (making m very large), so that loading a single page covers an entire node and is enough to choose the subtree at the next level. From the paging point of view, an AVL tree (or RBT) is equivalent to a B-tree with 1 key and 2 subtrees per node: since logically adjacent nodes are not physically adjacent, reading a 4 KB page wastes most of the space in that page.
Assume that each key and each subtree pointer occupies 4 B. A B-tree node then takes at most m * (4 + 4) B = 8m B, and the page size is 4 KB, so m = 4 * 1024 / 8 = 512: a 512-way B-tree. With 10 million rows, the maximum depth is log_{512/2}(10^7) = log_256(10^7) ≈ 2.9, i.e., at most 3 to 4 levels. Compare the depth of a binary tree such as AVL: log_2(10^7) ≈ 23.3, i.e., about 24 levels, a difference of more than 5x. Shocking! A B-tree index is this shallow!
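The same calculation laid out, using the 4 B keys and pointers, 4 KB pages, and 10^7 rows assumed above:

```latex
% fan-out m with 4 B keys and 4 B subtree pointers in a 4 KB page
m \times (4 + 4)\,\text{B} \le 4096\,\text{B} \;\Rightarrow\; m = 512
% depth with 10^7 keys: B-tree (nodes at least half full) vs. binary tree
\log_{256} 10^{7} \approx 2.9
\quad\text{vs.}\quad
\log_{2} 10^{7} \approx 23.3
```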
In addition, the B-tree is very friendly to the principle of locality: if the key is small (such as the 4 B auto-increment key above), then on top of paging, the CPU cache can speed things up even further.
What remaining problems does the B+ tree solve
However, to actually use it as a database index, the B-tree still has some problems:
- It cannot locate the data row
- It cannot handle range queries
Problem 1
A table record has multiple fields; locating the primary key alone is not enough, the data row must be located as well. There are three schemes:
- Store the data row(s) corresponding to the key directly in the node.
- Store data rows separately, and add a field to the node recording the location of the data row corresponding to the key.
- Change the comparison logic between keys and subtrees so that a subtree is greater than or equal to the previous key and less than the next key; all lookups eventually land on a leaf node, and the leaf node stores either the data row itself or the data row's location.
Scheme 1 is ruled out directly: storing data rows reduces the number of subtrees that fit in a page, so m shrinks and the tree height grows.
Scheme 2 adds a field to each node; assuming this pointer is also 4 B, the new fan-out is m = 4 * 1024 / 12 ≈ 341, and the maximum depth is log_{341/2}(10^7) = log_170(10^7) ≈ 3.1, i.e., at most 4.
Scheme 3 has the same m and depth as Scheme 2, but its time complexity stabilizes at O(log_m(n)).
Schemes 2 and 3 are both worth considering.
Problem 2
In real business, range queries are very frequent, but a B-tree can only locate a single index position (which may correspond to multiple rows), which makes range queries awkward to handle. Two schemes involve relatively small changes:
- Change nothing: find the left boundary, then the right boundary, then traverse the nodes between the two boundaries with DFS (or BFS).
- Building on Scheme 3 of Problem 1: since all data rows are stored in leaf nodes and the B-tree's leaf nodes are themselves ordered, add to each leaf node a pointer to the next leaf node in primary key order; find the left boundary, then the right boundary, then traverse from the left boundary to the right boundary along these pointers.
At first glance Scheme 1 seems better than Scheme 2: the time complexity and the constant factors are the same, and Scheme 1 requires no changes. But don't forget the principle of locality: regardless of whether nodes store data rows or data row locations, the advantage of Scheme 2 is that paging and caching still help when reading the next node, whereas Scheme 1 suffers from nodes being logically adjacent but physically scattered.
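As an illustration of why the leaf-level links matter, a range query of the following shape (hypothetical table `t` with primary key `id`) only has to descend the tree once to find the left boundary and can then walk the linked leaf nodes:

```sql
-- Range scan on the primary key: locate id = 100 in the tree, then follow
-- the leaf-node links rightward until id exceeds 200.
SELECT *
FROM t
WHERE id BETWEEN 100 AND 200;
```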
Drawing a B+ tree
To summarize: Scheme 2 of Problem 1 and Scheme 1 of Problem 2 can be combined into one design (a B-tree-based index), while Scheme 3 of Problem 1 and Scheme 2 of Problem 2 can be combined into another (a B+ tree-based index). In practice, some databases and file systems adopt the B-tree, and others adopt the B+ tree.
For reasons this monkey does not fully understand, mainstream databases, MySQL included, mostly choose the B+ tree. That is, the main changes relative to the B-tree are as described above:
- Change the organization of keys and subtrees so that index lookups always land on leaf nodes
- Link the leaf nodes in order (convenient for range queries)
Insert, delete, and search operations on B-trees and B+ trees
For B-tree insertion and deletion, refer to the "6. B-tree insertion and deletion operations" section of the article "From B-trees, B+ trees, and B* trees to R-trees"; B+ tree insertion and deletion are similar and are not repeated here.
MySQL index optimization
Based on the properties of the B+ tree, it is easy to understand a variety of common MySQL index optimization ideas.
Differences between storage engines are not considered here.
Prefer an auto-increment key as the primary key
In the earlier analysis, a 4 B auto-increment key used as the index lets m reach 512 and keeps the tree height at only about 3. Using an auto-increment key has two benefits:
- An auto-increment key is usually an int, so the key is compact, m can be very large, and the index takes up little space. As an extreme example, with a 50 B varchar key (length included), m = 4 * 1024 / 54 ≈ 76 and the maximum depth is log_{76/2}(10^7) = log_38(10^7) ≈ 4.4, i.e., about 5; add cache misses and the cost of string comparison, and the time cost grows considerably. Meanwhile, growing the key from 4 B to 50 B makes the space taken by the whole index tree balloon alarmingly (and if secondary indexes use the primary key to locate data rows, the space growth is even worse).
- The auto-increment property means new rows are always inserted at the far right of the index tree, so node splits happen less often and, ideally, the index tree can stay "full". A full index tree means a lower tree height on the one hand, and less node merging when rows are deleted on the other.
Optimization Experience:
This monkey once used a varchar(100) column storing a ContainerId as the primary key; within three or four days the 100 GB database was full, and the DBA politely expressed her contempt by email... After adding an auto-increment column as the primary key and keeping ContainerId as a unique secondary index, the improvement in both time and space was quite significant.
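A minimal sketch of that kind of change (table and column names are hypothetical):

```sql
-- A compact auto-increment primary key keeps the B+ tree shallow, while the
-- wide ContainerId column stays unique through a secondary index.
CREATE TABLE container (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT,
    container_id VARCHAR(100) NOT NULL,
    -- ... other columns ...
    PRIMARY KEY (id),
    UNIQUE KEY uk_container_id (container_id)
);
```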
Leftmost prefix match
An index can be as simple as a single column (a) or as complex as multiple columns (a, b, c, d); the latter is a composite index. With a composite index, the key is also composed of multiple columns. The index matches columns from left to right: equality conditions can keep matching further columns, while a range condition (>, <, BETWEEN, left-anchored LIKE) is the last one that can use the index, and columns after it degrade to linear filtering. Therefore, the order of the columns determines how many columns can hit the index.
Given an index (a, b, c, d) and the query condition a = 1 and b = 2 and c > 3 and d = 4, the lookup hits a, b, and c in turn at each node, but not d. This is the leftmost prefix matching principle.
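A sketch of this scenario (the table `t` and index name `idx_abcd` are hypothetical):

```sql
ALTER TABLE t ADD INDEX idx_abcd (a, b, c, d);

-- a and b are matched by equality and c by range; the range on c stops
-- further index matching, so d cannot narrow the index scan.
EXPLAIN
SELECT * FROM t
WHERE a = 1 AND b = 2 AND c > 3 AND d = 4;
```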
= and IN: the order is optimized automatically
There is no need to worry about the order of = and IN conditions; MySQL automatically reorders them to match as many index columns as possible.
Given the index (a, b, c, d), query conditions written as c > 3 and b = 2 and a = 1 and d < 4 or as a = 1 and c > 3 and b = 2 and d < 4 are both fine; MySQL automatically optimizes them to a = 1 and b = 2 and c > 3 and d < 4, which hits a, b, and c in turn.
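With the same hypothetical table and index as above, the two orderings below should behave identically:

```sql
-- Both predicates hit columns a, b, and c of idx_abcd; the optimizer
-- reorders the conditions, so no manual ordering is needed.
SELECT * FROM t WHERE c > 3 AND b = 2 AND a = 1 AND d < 4;
SELECT * FROM t WHERE a = 1 AND b = 2 AND c > 3 AND d < 4;
```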
Indexed columns cannot participate in calculations
Query conditions in which an indexed column participates in a calculation are unfriendly to the index (and may not be able to use it at all), for example from_unixtime(create_time) = '2014-05-29'.
The reason is simple: how would the corresponding key be found inside a node? A linear scan would have to recompute the expression for every key, which is far too expensive; a binary search would have to define an ordering with respect to from_unixtime.
Therefore, indexed columns must not participate in calculations. The condition above should instead be written as create_time = unix_timestamp('2014-05-29').
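The two forms side by side, assuming create_time stores a Unix timestamp (the table name `t` is hypothetical):

```sql
-- Bad: the function is applied to the indexed column, so the index on
-- create_time cannot be used for the lookup.
SELECT * FROM t WHERE FROM_UNIXTIME(create_time) = '2014-05-29';

-- Better: move the computation to the constant side so create_time is
-- compared directly against a precomputed value.
SELECT * FROM t WHERE create_time = UNIX_TIMESTAMP('2014-05-29');
```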
Extend an existing index rather than creating a new one
If an index (a) already exists and an index (a, b) is needed, prefer modifying index (a) into index (a, b).
The cost of creating a brand-new index is easy to see. When modifying index (a) into index (a, b), MySQL can build the new index directly on the existing B+ tree of index (a) through splits, merges, and so on.
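One way to express the change in SQL (hypothetical table and index names):

```sql
-- Replace the single-column index with a composite one instead of keeping
-- both; (a, b) already covers lookups on a alone.
ALTER TABLE t DROP INDEX idx_a, ADD INDEX idx_a_b (a, b);
```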
No need for an index that is a prefix of another index
If an index (a, b) already exists, there is no need for a separate index (a); however, an index (b) may still be worth considering if needed.
Choose highly selective columns for indexes
This is easy to understand: if an index can only split 10 million rows into two parts (say, 5 million male and 5 million female), the index is nearly useless.
Selectivity is computed as count(distinct <col>) / count(*), i.e., the proportion of distinct values in the column; the larger the ratio, the better the selectivity. A unique key has a selectivity of 1, while columns such as status or gender may have a selectivity close to 0 on large data sets.
What counts as "high enough" is hard to pin down; for columns involved in joins, the general requirement is a selectivity above 0.1, i.e., on average 10 records scanned per match.
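The formula as a query (table and column names are hypothetical):

```sql
-- Fraction of distinct values in the column; the closer to 1, the more
-- selective the column and the better it suits an index.
SELECT COUNT(DISTINCT col) / COUNT(*) AS selectivity FROM t;
```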