B + Tree | MySQL Index usage principles

Source: Internet
Author: User
Tags mysql query mysql index

MySQL has never known much, before the SQL prepared to submit to the production environment before the time, the old staff to help me check the next SQL, let me modify the storage engine, then I used the MyISAM, and later changed to InnoDB. Why to change to this, have not heard of the storage engine before, so the online check a bit.

In fact, there is a big difference between using different storage engines, as you can see from the Ape friends below.

A comparison of storage engines

Note: The B-Tree index mentioned above does not indicate an index of b-tree and b+tree, but the definition for B-tree and B + Tree is different.

In MySQL, there are four main types of indexes: B-tree index, Hash Index, Fulltext Index, and R-tree index.

The B-tree index is the most frequently used index type in a MySQL database, and all storage engines except the Archive storage engine support B-tree indexes. The Archive engine does not support indexing until MySQL 5.1, and only supports indexing a single auto_increment column.

Not only in MySQL, but in many other database management systems, the B-tree index is also the most important index type, mainly because the storage structure of the B-tree index has a very good performance in data retrieval of the database.

In general, the physical files of the B-tree index in MySQL are mostly stored in the structure of the Balance tree, that is, all the data that is actually needed is stored in the leaf node of the tree, and the shortest path to any of the leaf nodes is long Are exactly the same, so we all call it the B-tree index. Of course, it is possible that various databases (or MySQL's various storage engines) will slightly transform the storage structure when storing their own b-tree indexes. such as the B-tree index of the INNODB storage engine actually uses the storage structure is b+tree, that is, on the basis of the B-TREE data structure has made a small transformation , on each leaf Node above the information to hold the index key, Also stores pointer information to the next leafnode that is adjacent to the Leaf node (with the addition of sequential access pointers), primarily to speed up the efficiency of retrieving multiple neighboring leaf node.

InnoDB is the default storage engine for MySQL (Mysql5.5.5 before MyISAM)

It may be difficult for an ape friend who has not known the index to read this article, it is necessary to have a general understanding of the MySQL index, you can look at the little pigeon another article: Database query Optimization--mysql index. After reading this article, we will look back at the above text description.

Let's take a look at the concept of B-tree and B + trees first. Figure out why the index query will speed up.

Second, B-tree, + + Tree concept

B-Tree

Binary search tree:

1. All non-leaf nodes have a maximum of two sons (left and right);

2. All nodes store a keyword;

3. The left pointer of a non-leaf node points to a subtree smaller than its key, and the right pointer points to a subtree larger than its key;

Such as:

B-Tree

is a multi-path search tree (not two-pronged):

1. Definition of any non-leaf node up to only m sons; m>2;

2. The number of sons of the root node is [2, M];

3. The number of sons of non-leaf nodes outside the root node is [M/2, M];

4. Each node is stored at least m/2-1 (rounded) and up to M-1 keywords; (at least 2 keywords)

5. Number of key words for non-leaf nodes = number of pointers to sons-1;

6. Non-leaf node keywords: k[1], k[2], ..., k[m-1]; K[i] < k[i+1];

7. Pointers to non-leaf nodes: p[1], p[2], ..., p[m], where p[1] a subtree that points to a keyword less than k[1], p[m] a subtree that points to a keyword greater than k[m-1], and other p[i] to the subtree of the keyword belonging to (k[i-1], k[i]);

8. All leaf nodes are located on the same floor;

such as: (M=3)

B-Tree search, starting from the root node, a binary search of the keyword (ordered) sequence within the node, if

Hit the end, or enter the query keyword belongs to the range of the son node; repeat until the corresponding son pointer is

Empty, or already a leaf knot;

B-Tree Features:

1. The keyword set is distributed throughout the tree;

2. Any keyword appears and appears only in one node;

3. Search may end at non-leaf nodes;

4. Its search performance is equivalent to doing one-time binary search within the complete range of keywords;

5. Automatic level control;

Because it restricts non-leaf nodes outside of the root node, it contains at least M/2 sons, ensuring that the end point is at least utilized.

So the performance of B-tree is always equivalent to binary lookup (independent of M-value), there is no problem of B-tree equilibrium;

Due to the limitation of the M/2, when the node is inserted, if the node is full, it is necessary to divide the node into two M/2 nodes, and to delete the nodes, it is necessary to merge the two M/2 brothers nodes;

B + Tree

B + trees are variants of B-trees and are also a multi-path search tree:

1. Its definition is basically the same as the B-tree, except:

2. The sub-tree pointer of non-leaf node is the same as the number of keywords;

3. The subtree pointer of the non-leaf node p[i], pointing to the subtree (b-tree is open interval) of the key value belonging to [K[i], k[i+1]);

5. Add a chain pointer for all leaf nodes;

6. All keywords appear at the leaf node;

such as: (M=3)

B + 's search is basically the same as that of the B. C-tree, except that the difference is that B + trees only hit the leaf nodes (b-trees can be

Non-leaf node hit), and its performance is equivalent to the keyword in the complete collection of two-point search;

Features of B +:

1. All keywords appear in the list of leaf nodes (dense index), and the key words in the list are in order;

2. Cannot be hit on non-leaf nodes;

3. The non-leaf node is equivalent to the index of the leaf node (sparse index), and the leaf node is equivalent to the data layer of storing (key) data;

4. More suitable for file indexing system;

After understanding the concept of the b-/b+ tree, we continue to analyze the principle that B + trees improve efficiency.

Three, B + Tree index principle

For example, is a B + tree, the definition of B + tree can be seen in the B + tree, here is only a few points, the light blue block we call a disk block, you can see each disk block contains several data items (dark blue) and pointers (shown in yellow), such as disk Block 1 contains data items 17 and 35, including pointers P1, P3,P1 represents a disk block that is less than 17, P2 represents a disk block between 17 and 35, and P3 represents a disk block greater than 35. Real data exists at leaf nodes 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, 99. Non-leaf nodes do not store real data, only data items that guide the direction of the search, such as 17 and 35, do not exist in the data table.

The discovery process of B + trees

, if you want to find the data item 29, then the disk Block 1 is loaded into memory by disk, at this time Io, in memory with a binary lookup to determine 29 between 17 and 35, locking disk Block 1 P2 pointer, memory time because of very short (compared to the disk IO) can be negligible, Disk Block 1 through disk address of the P2 pointer to the disk block 3 is loaded into memory, the second io,29 between 26 and 30, locking disk block 3 of the P2 pointer, loading disk blocks 8 through the pointer to memory, a third Io, while in-memory binary find found 29, the end of the query, a total of three IO. The real situation is, the 3-tier B + tree can represent millions of data, if millions of data to find only three Io, the performance will be huge, if there is no index, each data item will occur once IO, then a total of millions of Io, it is obviously very expensive.

B + Tree Nature

1. Through the above analysis, we know that the number of IO depends on the height of B + H, assuming that the current data table data is N, the number of data items per disk block is M, then there is H=㏒ (m+1) n, when the amount of data n a certain case, m larger, h smaller, and m = size of disk block/data item The size of the disk block is also a data page size, is fixed, if the data items occupy less space, the more data items, the lower the height of the tree. This is why each data item, the index field, is as small as possible, such as an int accounting for 4 bytes, which is less than half the bigint8 byte. This is why the B + tree requires the real data to be placed on the leaf node instead of the inner node, and once placed in the inner node, the data items of the disk block will be greatly reduced, resulting in a higher tree. When the data item equals 1 o'clock, it will degenerate into a linear table.

2. When the data item of the B + tree is a composite data structure, such as (Name,age,sex), the B + number is based on the left-to-right order to establish the search tree, such as when the data (Zhang San, 20,f) is retrieved, the B + tree will first compare the name to determine the direction of the next search, If name is the same, then compare age and sex, and finally get the retrieved data, but when the (20,F) does not have the name of the data, B + tree does not know which node to check next, because the search tree when the name is the first comparison factor, You must search by name first to know where to go next. For example, when (Zhang San, F) such data to retrieve, B + tree can use name to specify the direction of the search, but the next field of age is missing, so only the name equal to Zhang San data are found, and then match the gender is the data of F, this is very important property, that is, the index of the leftmost matching characteristics.

Slow query optimization

About MySQL indexing principle is a relatively boring thing, we just need to have a perceptual understanding, do not need to understand very thoroughly and deeply. We look back at the beginning of the slow query we said, after understanding the index principle, do you have any ideas? Let's summarize some of the basic principles of indexing

Iv. several principles of index building

1. The leftmost prefix matching principle, very important principle, MySQL will always match right until it encounters a range query (>, <, between, like) to stop the match, such as a = 1 and B = 2 and C > 3 and D = 4 if established (a,b,c,d The index of the order, D is not indexed, if the establishment (A,B,D,C) of the index can be used, the order of a,b,d can be arbitrarily adjusted.

2.= and in can be disorderly, such as a = 1 and B = 2 and c = 3 build (a,b,c) index can be arbitrary order, the MySQL query optimizer will help you to optimize the form of the index can be recognized

3. Try to choose a high-differentiated column as the index, the formula for the degree of sensitivity is count (distinct col)/count (*), indicating that the field does not repeat the scale, the greater the proportion of the number of records we scan, the difference between the unique key is 1, and some states, The gender field may be 0 in front of big data, and one might ask, what is the empirical value of this ratio? Using different scenarios, this value is also difficult to determine, generally need to join the field we are required to be more than 0.1, that is, the average 1 scan 10 records

4. The index column cannot participate in the calculation, keep the column "clean", such as from_unixtime (create_time) = ' 2014-05-29 ' can not be used to the index, the reason is simple, B + tree is stored in the Data table field values, but when the retrieval, You need to apply all the elements to the function to compare, obviously the cost is too large. So the statement should be written create_time = Unix_timestamp (' 2014-05-29 ');

5. Expand the index as much as possible and do not create a new index. For example, the table already has an index of a, now to add (A, b) of the index, then only need to modify the original index

Reference article:
Http://blog.sina.com.cn/s/blog_4e0c21cc01010itp.html
Http://www.cnblogs.com/oldhorse/archive/2009/11/16/1604009.html
http://blog.jobbole.com/86594/

B + Tree | MySQL Index usage principles

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.