About the B+ tree data structure
① The InnoDB storage engine supports two common kinds of index.
One is the B+ tree index and the other is the hash index.
The B in B+ tree stands for balance, not binary. The B+ tree evolved from the balanced binary tree, but it is not itself a binary tree.
Also, a B+ tree index does not locate the specific row for a given key value. It can only locate the page that contains the row. The database then reads that page into memory, searches it in memory, and finally returns the row it was looking for.
First, a look at the balanced binary tree:
This is a balanced binary tree. The keys in the left subtree are always smaller than the root's key, and the keys in the right subtree are always larger, so an in-order traversal (left subtree, then root, then right subtree, recursively) outputs the keys in sorted order: 9, 17, 28, 35, 39, 56, 65, 87. To find the record with key 28, start at the root. The root is greater than 28, so go to the left subtree. The root of the left subtree, 17, is less than 28, so go to its right subtree, where 28 is found. The node is found in 3 lookups.
If the nodes are distributed very unevenly, as in the second picture, then looking up node 39 is barely more efficient than a sequential scan, and in the worst case, looking up 65, the binary search tree has effectively degenerated into a linear list.
So for good lookup performance a binary search tree needs to be balanced. A balanced binary tree performs well on lookups, close to the best possible but not the best. The best performance requires an optimal binary tree, but building and maintaining one takes a lot of work, so a balanced binary tree is usually the better choice. Balanced binary trees are also mostly used for in-memory structures, so the cost of maintaining them is relatively small.
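The lookup counts described above can be reproduced with a small sketch. The pictures are not available here, so the tree shapes are assumptions: the insertion order for `balanced` is chosen only so that the path to 28 matches the walk-through (root, then 17, then 28), and `chain` is a hypothetical worst case built by inserting the keys in sorted order. The tree does no rebalancing, and all names are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into a plain (non-rebalancing) binary search tree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


def in_order(root):
    """Left subtree, then root, then right subtree."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)


def nodes_visited(root, key):
    """Return how many nodes a lookup touches, or None if key is absent."""
    visited = 0
    while root is not None:
        visited += 1
        if key == root.key:
            return visited
        root = root.left if key < root.key else root.right
    return None


# Assumed shapes: a reasonably balanced tree and a fully degenerate one.
balanced = build([35, 17, 56, 9, 28, 39, 65, 87])
chain = build([9, 17, 28, 35, 39, 56, 65, 87])

print(in_order(balanced))           # [9, 17, 28, 35, 39, 56, 65, 87]
print(nodes_visited(balanced, 28))  # 3
print(nodes_visited(chain, 87))     # 8, the same as a sequential scan
```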
② Why use B+ trees?
Binary search trees and balanced binary trees allow fast lookups, but a database's content lives on disk, and disk I/O is 10^5 to 10^6 times slower than memory access. To reduce disk I/O and speed up retrieval, the B+ tree is used instead. In other words, a B+ tree is a multi-way search tree designed for disks and other direct-access secondary storage devices.
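A rough calculation shows why a multi-way tree saves disk reads. This is a simplified model, not InnoDB's actual page layout: it assumes every node has exactly `fanout` children and that each level costs one read. The fanout of 100 and the one million keys are hypothetical numbers chosen for illustration.

```python
def levels_needed(n_keys, fanout):
    """Smallest number of levels such that fanout ** levels >= n_keys."""
    levels = 1
    reachable = fanout
    while reachable < n_keys:
        reachable *= fanout
        levels += 1
    return levels


# Hypothetical workload: one million keys.
print(levels_needed(1_000_000, 2))    # 20 levels for a two-way tree
print(levels_needed(1_000_000, 100))  # 3 levels with an assumed fanout of 100
```

Under these assumptions a lookup touches 3 nodes instead of 20, which matters when each node visit may be a disk read.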
③ What a B+ tree is and what its characteristics are
The formal definition of a B+ tree is rather involved, so it is easier to go straight to a picture, here one from Wikipedia:
As the picture shows, all records are stored in the leaf nodes, in sorted order. Starting from the leftmost leaf node and moving right, the key values come out in order: 1, 2, 3, 4, 5, 6, 7.
In a B+ tree, all records are placed in leaf nodes on the same level, ordered by key value, and the leaf nodes are linked to each other by pointers. Because a single node holds many keys, far fewer disk I/O operations are needed for a lookup.
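The structure described above can be sketched as a tiny hand-built tree. This is a minimal illustration, not InnoDB's implementation: the grouping of keys 1 to 7 into three leaves, the separator keys 3 and 5, and the class and function names are all assumptions, and insertion and splitting are left out.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None  # pointer to the right sibling leaf


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys, no records here
        self.children = children  # len(children) == len(keys) + 1


def find_leaf(node, key):
    """Descend from the root to the leaf that would contain key."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def scan_from(leaf):
    """Follow the leaf pointers to read keys in sorted order."""
    keys = []
    while leaf is not None:
        keys.extend(leaf.keys)
        leaf = leaf.next
    return keys


# Hand-built two-level tree holding the keys 1..7 (assumed layout).
leaves = [Leaf([1, 2]), Leaf([3, 4]), Leaf([5, 6, 7])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([3, 5], leaves)

print(find_leaf(root, 4).keys)        # [3, 4]
print(scan_from(leaves[0]))           # [1, 2, 3, 4, 5, 6, 7]
print(scan_from(find_leaf(root, 5)))  # [5, 6, 7]
```

Note that `find_leaf` returns a whole leaf rather than a single record, mirroring the point that the index locates a page and the record is then searched for inside it.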
Inserting into a B+ tree may require page splits to keep the tree balanced. Because the B+ tree is mainly used on disk, a page split means disk operations, so page splits should be avoided where possible. For this reason the B+ tree also provides rotation. Rotation and deletion are too involved to record in these notes for now. It is enough to understand why B+ trees are used and what their characteristics are.
About Indexes
The InnoDB storage engine uses a clustered index: the actual data rows are stored together with their key values. As a result, accessing data through a secondary index in InnoDB always takes two lookups instead of one, because the leaf nodes of a secondary index store the value of the primary key, not the physical location of the row. That is: the secondary index gives the primary key, the primary key index gives the leaf page, and the row is then found through the page directory in that leaf page.
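The two-step lookup can be sketched with two dictionaries. This only models the idea: the dictionaries stand in for B+ trees, and the table contents, the `name` column, and the function name are hypothetical.

```python
# Clustered index: primary key -> full row (the row lives in the index).
clustered_index = {
    1: {"id": 1, "name": "alice", "city": "Oslo"},
    2: {"id": 2, "name": "bob", "city": "Lima"},
    3: {"id": 3, "name": "carol", "city": "Kyiv"},
}

# Secondary index on name: indexed value -> primary key, not a row address.
secondary_index_on_name = {"alice": 1, "bob": 2, "carol": 3}


def lookup_by_name(name):
    """First lookup finds the primary key, second lookup finds the row."""
    pk = secondary_index_on_name.get(name)  # lookup 1: secondary index
    if pk is None:
        return None
    return clustered_index[pk]              # lookup 2: clustered index


print(lookup_by_name("bob"))  # {'id': 2, 'name': 'bob', 'city': 'Lima'}
```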
Every InnoDB table has a primary key index, so what happens if none is specified explicitly? If you do not define a primary key yourself, the InnoDB engine picks a unique, non-null column to serve as the primary key, and if there is no such column, it automatically generates a hidden column as the primary key.
Therefore, when designing InnoDB tables, prefer an auto_increment primary key that is independent of the business data over random (unordered) clustered keys such as UUIDs. Also, because every secondary index stores the primary key value, a long primary key makes every secondary index correspondingly larger.
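One way to see the difference between ordered and random keys is to count how many inserts land before an existing key instead of at the end. This is a simplified sketch under stated assumptions: a sorted Python list stands in for the clustered index, a seeded shuffle stands in for UUID-like keys, and the count only approximates the inserts that could force rows to move or pages to split. It does not model real InnoDB pages.

```python
import random
from bisect import bisect_left


def mid_inserts(keys):
    """Count inserts that do not land at the end of the sorted order."""
    ordered = []
    count = 0
    for key in keys:
        pos = bisect_left(ordered, key)
        if pos != len(ordered):
            count += 1  # key falls before some existing key
        ordered.insert(pos, key)
    return count


sequential = list(range(1, 1001))   # auto_increment-style keys
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)  # stand-in for random keys

print(mid_inserts(sequential))  # 0: every insert is an append
print(mid_inserts(shuffled))    # a large share of the 1000 inserts
```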
A clustered index is stored in a way that is logically contiguous, not physically contiguous. On the one hand, the pages are connected by a doubly linked list and ordered by primary key. On the other hand, the records inside each page are also maintained through a linked list, so physically they need not be stored in primary key order either.
In MySQL as of this writing, every addition or removal of an index works by creating a new temporary table, importing the data into it, dropping the original table, and then renaming the temporary table to the original name. If a table holds a lot of data, adding an index later therefore takes a long time, so it is best to design the indexes early, during database design.
That said, starting with the InnoDB Plugin version, the InnoDB storage engine supports a method called Fast Index Creation. It applies only to secondary indexes, however. Creating or dropping a primary key still requires rebuilding the table.
B+ tree index algorithm of the InnoDB storage engine