MySQL's B+Tree Index and Index Optimization
MySQL's MyISAM and InnoDB engines both use B+tree indexes by default (shown as "BTREE" when you inspect an index). This article discusses two questions:
- Why do mainstream databases such as MySQL choose the B+tree as their index structure?
- Based on that structure, how should we understand the common MySQL index optimization ideas?
Why can't indexes be fully loaded into memory?
The choice of index structure rests on one premise: when the data volume is large, the index cannot be fully loaded into memory.
Why not? Assume a tree structure is used to organize the index; a rough estimate (the exact figures matter less than the ratio):

- Assume 10 million data rows under a unique index, with each leaf-level index entry occupying about 20 B: the leaf nodes occupy about 200 MB, and the entire tree at most roughly double that.
- Assume each data row occupies about 200 B: the data occupies about 2 GB in total.

Now suppose the index is kept in memory. That means every 2 GB of data stored on the physical disk costs about 200 MB of memory, an index:data usage ratio of about 1/10. Is 1/10 large? Physical disks are far cheaper than memory. Take a server with 16 GB of memory and a 1 TB hard disk: filling the 1 TB disk would require at least about 100 GB of memory for indexes alone, far more than 16 GB.

Considering that a table may have multiple indexes and composite indexes, and that data rows may be smaller, the actual ratio is usually greater than 1/10, in some cases approaching 1/3. For an index-based storage architecture, such a ratio rules out keeping indexes in memory.

Therefore, indexes cannot be fully loaded into memory.
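The estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch using the article's assumed figures (10 million rows, ~200 B per row, ~20 B per leaf-level index entry; the entry size is an assumption for the estimate, not a MySQL constant):

```python
# Back-of-envelope estimate of index vs. data size.
ROWS = 10_000_000      # data rows covered by the unique index
ROW_SIZE = 200         # bytes per data row (assumed)
ENTRY_SIZE = 20        # bytes per leaf-level index entry (assumed)

data_bytes = ROWS * ROW_SIZE    # ~2 GB of table data
index_bytes = ROWS * ENTRY_SIZE # ~200 MB of leaf entries

ratio = index_bytes / data_bytes  # index : data usage
print(f"data  = {data_bytes / 1e9:.1f} GB")
print(f"index = {index_bytes / 1e6:.0f} MB")
print(f"index:data ratio = 1/{round(1 / ratio)}")

# Scaling to a full 1 TB disk at the same ratio:
index_for_full_disk = 10**12 * ratio
print(f"index memory for a full 1 TB disk = {index_for_full_disk / 1e9:.0f} GB")
```

At this ratio, a full 1 TB disk needs far more index memory than the 16 GB the example server has.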
Problems with other structures
Since the index cannot fit in memory, it must rely on disk (or SSD) storage. Memory reads and writes are some tens of thousands of times faster than disk (the exact factor depends on the implementation), so the core problem becomes: how to reduce the number of disk reads and writes.
First, leave the page table mechanism aside and suppose each read or write goes directly to disk. Then:
- Linear structure: O(n) reads/writes on average
- Binary search tree (BST): O(log2(n)) reads/writes on average; if the tree degenerates (becomes unbalanced), O(n) in the worst case
- Self-balancing binary search tree (AVL): a BST plus a self-balancing algorithm, at most O(log2(n)) reads/writes
- Red-black tree (RBT): another self-balancing search tree, at most O(log2(n)) reads/writes

BST, AVL, and RBT all cut the number of reads and writes from O(n) to O(log2(n)); AVL and RBT additionally self-balance, which bounds even the worst case at O(log2(n)).
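As a quick sanity check on these bounds, here is a sketch of the probe counts for 10 million rows (pure arithmetic, not a benchmark):

```python
import math

n = 10_000_000  # data rows

# Average/worst number of node visits ("reads") per lookup:
probes = {
    "linear structure, average": n / 2,
    "unbalanced BST, worst": float(n),
    "balanced AVL/RBT, worst": math.log2(n),
}
for name, count in probes.items():
    print(f"{name:<27} ~ {count:,.1f}")
```

Even the balanced trees still need ~24 node visits, which is the motivation for the B-tree below.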
Suppose an auto-increment primary key is used. The key itself is ordered, and the number of reads and writes for a tree structure can be optimized down to the tree height: the lower the tree, the fewer the reads and writes, while self-balancing keeps the structure stable. For further optimization, we can introduce the B-tree and B+tree.
What problems does the B-tree solve?
Many articles call the B-tree the "B-minus tree", probably a misreading of its English name "B-Tree" (worse, some call it a binary tree or binary search tree), especially when it is contrasted with the B+tree: if B+ means "plus", then B- must mean "minus". In fact, the English name of the B+tree is "B+-Tree".
Maintenance operations aside, a B-tree is simply an "m-way search tree" (m being the maximum number of subtrees per node), with time complexity O(logm(n)). The B-tree's real contribution is a set of efficient and simple maintenance operations that keep its depth between roughly log_ceil(m/2)(n) and log_m(n), which greatly reduces tree height.
To emphasize again:

Do not obsess over time complexity here. Unlike pure in-memory algorithms, the dominant cost is the number of disk I/Os. You can verify that the B-tree has the same time complexity as AVL; but because the B-tree has far fewer levels and therefore far fewer disk I/Os, it outperforms binary trees such as AVL in practice.
Like a binary search tree, each node stores multiple keys and subtrees, with the subtrees and keys arranged in order.
The page table mechanism is how the operating system extends memory with external storage and accelerates disk reads and writes. A page is usually 4 KB (the same size as a disk data block; see analyses of inode and block). The OS loads content from disk into memory one page at a time (amortizing the seek cost), and a modified page is later written back to disk as a whole. Given these nice properties, we can make each node occupy approximately one page (making m very large), so that loading one page covers exactly one node and lets us choose the next subtree. To the page mechanism, an AVL (or RBT) node is equivalent to a B-tree node with 1 key + 2 subtrees; since logically adjacent nodes are not physically adjacent, most of a 4 KB page read for a binary tree is filled with data useless to the current lookup.
Assume keys and subtree pointers each occupy 4 B; a B-tree node then occupies at most m * (4 + 4) = 8m B. With a 4 KB page, m = 4 * 1024 / 8 = 512. A 512-way B-tree holding 10 million rows of data has a maximum depth of roughly log_(512/2)(10^7) ≈ 2.91, i.e. 3 levels after rounding up. For comparison, the depth of a binary tree such as AVL is log_2(10^7) = 23.25 ≈ 24 levels, more than five times deeper. Astonishing how shallow the B-tree index is!
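The fanout and depth arithmetic can be checked in a few lines. A rough sketch, assuming 4 KB pages and 4 B keys and pointers as above (rounding conventions differ between sources, so figures may vary slightly):

```python
import math

PAGE = 4 * 1024   # page size (bytes)
KEY = 4           # key size, e.g. a 4 B auto-increment id
PTR = 4           # subtree pointer size

m = PAGE // (KEY + PTR)            # fanout when one node fills one page
n = 10**7                          # data rows

depth_btree = math.log(n, m // 2)  # conservative bound: fanout >= m/2
depth_avl = math.log2(n)

print(f"fanout m     = {m}")
print(f"B-tree depth ~ {depth_btree:.2f} -> {math.ceil(depth_btree)} levels")
print(f"AVL depth    ~ {depth_avl:.2f} -> {math.ceil(depth_avl)} levels")
```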
In addition, the B-tree is very friendly to the locality principle. When the key is small (such as the 4 B auto-increment key above), the CPU cache can accelerate lookups even further through prefetching, on top of the page mechanism. Delightful!
What problems does the B+tree solve?
However, applying the B-tree to database indexes still leaves some problems:
Problem 1
A data-table record has multiple fields, so locating the primary key alone is not enough; the data row must be located too. There are three solutions:

Solution 1: store the data row directly in the node. The rows take up page space, reducing the number of subtrees per page; m shrinks and the tree height grows.

Solution 2: add one field to each node entry storing the location of the data row, e.g. a 4 B pointer. The new m = 4 * 1024 / 12 ≈ 341, with maximum depth log_(341/2)(10^7) = 3.14, rounding up to 4.

Solution 3: keep internal nodes unchanged (keys only) and store the data row, or its location, only in the leaf nodes. m and depth remain unchanged, but since every lookup must now descend all the way to a leaf, the time complexity becomes a stable O(logm(n)).

Solutions 2 and 3 are both worth considering.
Problem 2
Real business workloads run range queries frequently, but the B-tree can only locate a single index position (which may correspond to multiple rows), making range queries awkward. Two solutions, each a slight modification of the structure:

Solution 1: locate the lower bound of the range, then use in-order traversal to visit successive keys until the upper bound is passed; no structural change is needed.

Solution 2: link the data-bearing leaf nodes together in key order; locate the lower bound, then scan linearly along the linked list until the upper bound is passed.

At first glance, solution 1 seems better than solution 2: the time complexity and constant factor are the same, and solution 1 requires no change at all. But don't forget the locality principle. Regardless of whether nodes store data rows or data-row locations, solution 2 can still use the page mechanism and cache to prefetch the next node's information, while solution 1 suffers from nodes being logically adjacent but physically scattered.
Deriving the B+tree
To sum up, solution 2 of problem 1 plus solution 1 of problem 2 combine into one design (an index based on the B-tree), and solution 3 of problem 1 plus solution 2 of problem 2 combine into another (an index based on the B+tree). In practice, some databases and file systems adopt the B-tree and others the B+tree.
For reasons this monkey has not yet fully figured out, mainstream databases, MySQL included, chose the B+tree. That is:
The major changes relative to the B-tree:

- The organization of keys and subtrees is modified: data rows (or their locations) live only in leaf nodes, so every index access descends to a leaf.
- Leaf nodes are linked together in key order, which makes range queries convenient.
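To make the two changes concrete, here is a toy, read-only model of a B+tree-style index: a keys-only internal level above chained leaves. It is an illustration of the structure, not MySQL's on-disk format; names like `Leaf`, `build`, and `range_scan` are invented for this sketch:

```python
from bisect import bisect_left, bisect_right

LEAF_CAP = 4  # tiny fanout for illustration; real nodes fill a 4 KB page

class Leaf:
    def __init__(self, pairs):
        self.keys = [k for k, _ in pairs]
        self.rows = [r for _, r in pairs]
        self.next = None  # sequential link to the right sibling

def build(pairs):
    """Build a static two-level 'B+tree' from (key, row) pairs."""
    pairs = sorted(pairs)
    leaves = [Leaf(pairs[i:i + LEAF_CAP]) for i in range(0, len(pairs), LEAF_CAP)]
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    seps = [leaf.keys[0] for leaf in leaves]  # internal level: keys only
    return seps, leaves

def lookup(seps, leaves, key):
    """Every lookup descends to a leaf; internal level holds no data."""
    leaf = leaves[max(0, bisect_right(seps, key) - 1)]
    i = bisect_left(leaf.keys, key)
    return leaf.rows[i] if i < len(leaf.keys) and leaf.keys[i] == key else None

def range_scan(seps, leaves, lo, hi):
    """Locate the lower bound, then walk the leaf chain (no re-descent)."""
    leaf = leaves[max(0, bisect_right(seps, lo) - 1)]
    out = []
    while leaf:
        for k, r in zip(leaf.keys, leaf.rows):
            if k > hi:
                return out
            if k >= lo:
                out.append(r)
        leaf = leaf.next
    return out

seps, leaves = build([(i, f"row{i}") for i in range(1, 21)])
print(lookup(seps, leaves, 7))
print(range_scan(seps, leaves, 6, 10))
```

The point of the sketch: the range scan touches only the lower-bound path plus a linear walk along physically prefetchable siblings.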
Insertion, deletion, and lookup in B-trees and B+trees
For the B-tree's insertion and deletion process, refer to section 6, "Insert and delete operations of the B-tree", in the article "B-tree, B+tree, B*tree". Insertion and deletion in the B+tree are essentially the same and are not repeated here.
MySQL index optimization
Given the properties of the B+tree, the common MySQL index optimization ideas are easy to understand.
For now, differences between storage engines are not considered.
Prefer an auto-increment key as the primary key
The earlier analysis assumed a 4 B auto-increment key as the index: m can reach 512 and the height stays very low. The auto-increment key has two advantages:

- The key is compact (4 B or 8 B), so m is large and the tree stays short, reducing disk I/Os.
- Each new key is the largest so far, so inserts append at the rightmost leaf, minimizing page splits and record movement.

A piece of optimization experience:
This monkey once used a varchar(100) column holding a containerId as the primary key. Within three or four days the disk was nearly full, and the DBA expressed her contempt for me by email... I then added an auto-increment column as the primary key and kept containerId as a unique secondary index; the time and space savings were remarkable.
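The same page arithmetic from earlier shows why the fat key hurt. A rough sketch, assuming 4 KB pages and 4 B child pointers (the function `est_depth` is invented for illustration):

```python
import math

PAGE, PTR, ROWS = 4 * 1024, 4, 10**7

def est_depth(key_bytes):
    """Rough fanout and depth when one node fills one 4 KB page."""
    m = PAGE // (key_bytes + PTR)
    return m, math.ceil(math.log(ROWS, m // 2))

for name, size in [("4 B auto-increment int", 4),
                   ("varchar(100) containerId", 100)]:
    m, d = est_depth(size)
    print(f"{name:<25} fanout = {m:<4} depth ~ {d}")
```

The 100 B key cuts the fanout by an order of magnitude and roughly doubles the tree height, i.e. the disk I/Os per lookup.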
Leftmost prefix match
An index can be as simple as a single column (a) or as complex as multiple columns (a, b, c, d); the latter is a composite (joint) index, whose key is composed of multiple columns. The index can only be searched by testing key columns for equality from left to right; once a range condition (>, <, between, or a left-anchored like) is encountered, the columns after it cannot use the index and degrade to scanning. So the order of columns determines how many columns the index can serve.

Given an index (a, b, c, d) and the condition a = 1 and b = 2 and c > 3 and d = 4, the lookup hits a, b, and c in turn but cannot use the index for d. This is the leftmost prefix matching principle.
=, in can appear in any order
There is no need to worry about the order of = and in conditions: MySQL automatically reorders them to match as many index columns as possible.

Given an index (a, b, c, d), conditions such as c > 3 and b = 2 and a = 1 and d < 4, or a = 1 and c > 3 and b = 2 and d < 4, are all automatically rewritten by MySQL as a = 1 and b = 2 and c > 3 and d < 4, hitting a, b, and c in turn.
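The two rules above can be modeled in a few lines. This is a toy model of leftmost-prefix matching, not MySQL's real optimizer; representing the WHERE clause as a column-to-condition-type mapping makes the textual order of = conditions irrelevant by construction:

```python
def usable_columns(index_cols, conditions):
    """conditions: {column: 'eq' | 'range'} -> columns the index can use.
    Index columns are consumed left to right; a range condition is the
    last one that can use the index."""
    used = []
    for col in index_cols:
        if col not in conditions:
            break                         # gap in the prefix: stop
        used.append(col)
        if conditions[col] == 'range':
            break                         # range stops further matching
    return used

idx = ('a', 'b', 'c', 'd')
# a = 1 and b = 2 and c > 3 and d = 4  -> hits a, b, c, not d
print(usable_columns(idx, {'a': 'eq', 'b': 'eq', 'c': 'range', 'd': 'eq'}))
# c > 3 and b = 2 and a = 1 and d < 4  -> same mapping, order irrelevant
print(usable_columns(idx, {'d': 'range', 'b': 'eq', 'a': 'eq', 'c': 'range'}))
```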
Index columns must not participate in calculation
Query conditions in which an index column participates in a calculation are unfriendly to the index (and may be unable to use it at all), such as from_unixtime(create_time) = '2014-05-29'.

The reason is simple: how would the engine find the matching keys inside a node? A linear scan would have to recompute the function for every key, which is too expensive; a binary search would need to understand ordering under from_unixtime. Therefore, index columns must not participate in calculation. The condition above should instead be written as create_time = unix_timestamp('2014-05-29').
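A small simulation shows why. Assume a sorted index over raw create_time values; `day_of` is a toy stand-in for from_unixtime. Applying the function to the column forces a full scan, while transforming the constant keeps binary search available:

```python
import math
from bisect import bisect_left

# A sorted "index" over raw create_time values (toy timestamps).
keys = list(range(0, 1_000_000, 10))

def day_of(ts):                      # toy stand-in for from_unixtime(...)
    return ts // 86_400

def scan_with_function(target_day):
    """from_unixtime(create_time) = X: the stored order says nothing
    about day_of(key), so every key must be transformed and checked."""
    evals, hits = 0, []
    for k in keys:
        evals += 1
        if day_of(k) == target_day:
            hits.append(k)
    return hits, evals

def seek_on_raw(target_day):
    """Transform the constant once instead, then binary-search the raw,
    already-ordered keys for the equivalent [lo, hi) range."""
    lo, hi = target_day * 86_400, (target_day + 1) * 86_400
    i, j = bisect_left(keys, lo), bisect_left(keys, hi)
    comparisons = 2 * math.ceil(math.log2(len(keys)))  # two binary searches
    return keys[i:j], comparisons

h1, e1 = scan_with_function(3)
h2, e2 = seek_on_raw(3)
print(f"same rows found: {h1 == h2}")
print(f"function on column : {e1} key evaluations")
print(f"constant transformed: ~{e2} key comparisons")
```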
Prefer extending an index to creating a new one
If index (a) already exists and you want index (a, b), try to modify index (a) into index (a, b) rather than creating a second index. The cost of creating a new index from scratch is easy to appreciate; modifying index (a) into index (a, b) lets MySQL work from the existing index instead.
Do not create indexes with a prefix-inclusion relationship

If index (a, b) already exists, there is no need to create a separate index (a); however, if necessary, an index (b) is still worth considering.
Choose columns with high selectivity as indexes
This is easy to understand with an extreme example: if gender is used as an index, the index can only split the rows into two parts (e.g. half male, half female), so it is almost useless.

Selectivity (degree of discrimination) is defined as count(distinct <col>) / count(*), the proportion of distinct values in the column; the larger, the better. A unique key has selectivity 1, while status or gender columns in a large table may have selectivity close to 0.

There is no hard threshold, but as a rule of thumb, columns used in joins should have selectivity above 0.1, i.e. each lookup scans about 10 records on average.
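The formula can be sketched over an invented table (column names are made up for illustration):

```python
# Selectivity = count(distinct col) / count(*), computed over a toy table.
rows = [
    {"uid": i, "status": i % 3, "gender": "M" if i % 2 else "F"}
    for i in range(10_000)
]

def selectivity(col):
    # distinct values in the column divided by total row count
    return len({r[col] for r in rows}) / len(rows)

for col in ("uid", "status", "gender"):
    print(f"{col:<7} selectivity = {selectivity(col):.4f}")
```

The unique uid scores 1.0 and would make a useful index; status and gender score near 0 and would not.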