Overview of MySQL btree Indexes

Source: Internet
Author: User
Tags keyword list

Overview of MySQL btree Indexes

Today, I have studied the B-tree index in mysql. Through this article, you can understand the principle of the B-tree index in mysql and the data retrieval process, different btree indexes in innodb and myisam engines, as well as the benefits and restrictions of btree indexes.

B-Tree indexes are the most frequently used index types in MySQL databases. All storage engines except the Archive storage engine support B-Tree indexes. Not only in MySQL, but in many other database management systems, the B-Tree index is also the most important index type, this is mainly because the storage structure of B-Tree indexes has excellent performance in database data retrieval, it is worth noting that the B-tree indexes in innodb and myisam engines in mysql use B + tree (that is, each leaf node contains a pointer to the next leaf node, in this way, the range of leaf nodes is easily traversed, and only key values and pointers are stored on other nodes except leaf nodes ).

Generally, the physical files of the B-Tree index in MySQL are mostly stored in the B + tree structure, that is, all the actually needed data is stored in the Leaf Node of the Tree, in addition, the shortest path length to any Leaf Node is identical, and various databases (or various MySQL storage engines) may be used) when you store your own B-Tree indexes, the storage structure is slightly modified. For example, the actual storage structure used by the B-Tree index of the Innodb Storage engine is actually B + Tree, that is, a small transformation has been made on the basis of the B-Tree data structure, in addition to information related to the index key value and primary key, B + Tree also stores the pointer information pointing to the next LeafNode adjacent to the Leaf Node, this is mainly to accelerate the efficiency of retrieving multiple adjacent Leaf nodes.

I. The following describes the different implementation principles of the B-tree indexes of innodb and myisam in mysql;

1) MyISAM index implementation

The MyISAM engine uses B + Tree as the index structure. The data domain of the leaf node only stores the address (also called the row pointer) pointing to the data record. In MyISAM, the primary index and Secondary index (Secondary key) have no difference in structure, but the primary index requires the key to be unique, and the Secondary index key can be repeated.

2) InnoDB Index implementation

Although InnoDB uses B + Tree as the index structure, the implementation method is different from that of MyISAM.

As mentioned above, the MyISAM index file is separated from the data file, and the index file only stores the address (row pointer) of the data row record ). However, in the innodb engine, btree indexes are divided into two types: 1. Clustered indexes (primary key indexes), 2. Secondary indexes, or secondary indexes. The primary key index in InnoDB is a clustered index, and the table data file itself is an index structure organized by B + Tree, the data field of the leaf node of this tree stores the complete data records (the whole row of data ). The key of this index is the primary key of the data table, so the InnoDB table data file itself is the primary key index. However, the innodb secondary index stores the index column value and the pointer to the primary key. Therefore, we use the overwriting index to optimize mysql innodb indexes.

The summary is as follows:

Content stored by the leaf node in the MyISAM engine:

Primary Key Index: only store row pointers;

Secondary indexes: only row pointers are stored;

Content stored by leaf node in InnoDB Engine

Primary Key Index: clustered index stores complete data (full row data)

Secondary index: stores index column values + primary key information

The following figure shows the index implementation principles of innodb and myisam engines in mysql.

Ii. Next let's talk about the process of retrieving data through the btree index:

Both myisam and innodb engines use B + tree to implement btree indexes. During data retrieval, from the root node to the child node, locate the leaf node, the process for finding the data stored in the leaf node is the same, but because the data stored in myisam and innodb engine are different (as described above ), therefore, the process of finding the data in the leaf node and then finding the real data is different. Then, based on the different data stored in the leaf node data in the different storage engines described above, for example, the primary key index leaf node in innodb stores the complete data rows. Therefore, when we traverse data based on the primary key index in innodb, we can find the data of the leaf node, in myisam, the leaf node data stores the row pointer, that is, find the data of the leaf node, and then find the real data row based on the row pointer.

Next we will focus on the process of finding the data domain in the leaf node from the root node:

For comparison, Let's first look at the B-tree implementation principle:

B + tree implementation principles are as follows:

The two graphs show that, compared with B-tree, B + Tree root nodes and subnodes only store key values and pointers, all data record nodes are stored on the leaf nodes of the same layer according to the key value size sequence. This greatly increases the number of key values stored on each node and lowers the height of B + Tree, in addition, leaf nodes in B + Tree store more pointers to the next leaf node than B-tree, which makes it easier to traverse the range of leaf nodes.

Each node occupies the disk space of one disk block. A node has n sort-in-ascending keywords and (n + 1) pointers to the Child root node, this pointer stores the address of the disk block where the sub-node is located (note that n is calculated based on the data volume when creating the index. If the data volume is too large, the three layers may not be able to meet the requirements. The four layers of B + tree or more layers are required. Then, n keywords are divided into (n + 1) range fields, then each range field corresponds to a pointer to point to the subnode, And the subnode is further divided based on the keyword, and then the Pointer Points to the leaf node.

The following explains how B + tree indexes are implemented (Modified from the network ):

Each node occupies the disk space of one disk block. Each node has two keywords in ascending order and three pointers pointing to the Child root node, the three range fields divided by two keywords correspond to the range fields of the data that the three pointers point. Take the root node as an example. The keywords are 17 and 35. The data range of the Child tree pointed by the P1 pointer is smaller than 17, and that of the child tree pointed by the P2 pointer is 17 ~ 35. The data range of the Child tree pointed by the P3 pointer is greater than 35.

Then, we simulate the specific process of where id = 29: (first, mysql reads data in blocks (pages ).

First, find disk Block 1 based on the root node and read it into the memory. Disk I/O operation 1st times]

Compare the keyword 29 in the range (), find the pointer P2 of disk Block 1.

Locate disk block 3 based on the P2 pointer and read it into the memory. Disk I/O operation 2nd times]

When the comparison keyword 29 is in the range (), find the pointer P2 of disk Block 3.

Find disk block 8 Based on the P2 pointer and read it into the memory. Disk I/O operation 3rd Times]

In the disk block 8 keyword list, find the keyword 29.

After analyzing the above process, it is found that three disk I/O operations and three Memory search operations are required. Because the keywords in the memory are an ordered table structure, you can use the binary lookup method to improve efficiency. Three disk I/O operations affect the efficiency of the entire B-Tree search. Compared with AVLTree, B-Tree reduces the number of nodes, which makes the data retrieved from the memory by disk I/O more efficient.

Compared with B-tree, B + Tree root nodes and sub-nodes only store key values and pointers,

View the page size in mysql:

MySQL [meminfo]> show variables like 'innodb _ page_size ';

+ ------------------ + ------- +

| Variable_name | Value |

+ ------------------ + ------- +

| Innodb_page_size | 16384 |

+ ------------------ + ------- +

1 row in set (0.00 sec)

In the InnoDB Storage engine, the page size is 16 KB. Generally, the table's primary key type is INT (4 bytes occupied) or BIGINT (8 bytes occupied ), the pointer type is usually 4 or 8 bytes. That is to say, a page (a node in B + Tree) stores about 16 KB/(8B + 8B) = 1 K key values (for the convenience of calculation, the value of K here is ^ 3 ). That is to say, a B + Tree index with a depth of 3 can maintain 10 ^ 3*10 ^ 3*10 ^ 3 = 1 billion records.

In actual situations, each node may not be fully filled. Therefore, in the database, the height of B + Tree is generally 2 ~ Layer 4. The InnoDB Storage engine of MySQL is designed to resident the root node in the memory. That is to say, it only needs 1 ~ to find a row record of a key value ~ Three disk I/O operations.

Iii. Next we will talk about the differences between the traversal data of the non-clustered tables in mysql innodb engine and myisam.

For example (from the network ):

It seems that the efficiency of clustered indexes is significantly lower than that of non-clustered indexes, because each secondary index search requires two B + tree searches. Isn't that an option? What are the advantages of clustered index?

1. Because the row data and the leaf node are stored together, the primary key and row data are loaded into the memory together, And the row data can be returned immediately after the leaf node is found, if data is organized by the primary key Id, the data is obtained faster.

2. The secondary index uses the primary key as the "Pointer" instead of the row address value as the pointer. This reduces the maintenance of secondary indexes when row movement or data page splitting occurs, using the primary key value as a pointer will occupy more space for the secondary index. The advantage is that InnoDB does not need to update the "Pointer" in the secondary index when moving data ", using clustered indexes ensures that no matter how the node of the primary key B + tree changes, the secondary index tree is not affected.

Iv. Finally, let's talk about the benefits and restrictions of B + tree indexes in mysql (from the third edition of high-performance mysql)

(1) applicable situations:

You can use the query type of the btree index. The btree index is used for full key value, key value range, or key prefix search. The key prefix search is only suitable for searching based on the leftmost prefix. The index created in the preceding example is valid for the following types of queries:

1) full value matching

Full value matching means matching all the columns in the index to find the name and date of birth.

2) match the leftmost prefix

For example, only the last name is queried, that is, only the first column of the index is used.

3) match the column prefix

You can also match only the beginning of a column value. For example, if you match a person with a surname starting with J (like 'J % '), the first column of the index is used, and is part of the first column.

4) matching range value

For example, if you look for the person with the last name between allen and barrymore, only the first column of the index is used.

5) precisely match a column and match another column in the range

For example, you can find all the names starting with allen and starting with K, that is, the first column last_name exact match, and the second column first_name range match.

6) access only the index Query

B-tree generally supports queries that only access the index. That is, the query only needs to access the index without accessing the data rows. That is, this is the concept of covering the index. The data to be accessed is directly obtained from the index, which is for the btree index in innodb.

Because the nodes in the index tree are ordered, in addition to searching by value, the index can also be used for the order by operation in the query. Generally, if the btree can find the value in some way, this method can also be used for sorting. Therefore, if the order by clause meets the query types listed above, this index can also meet the corresponding sorting requirements.

(2) The following are limits on the btree index:

1) if you do not start searching based on the leftmost index, you cannot use the index. (Note that this is not the order of the where condition, that is, in the where condition, regardless of the conditional order, as long as the columns in the where clause can be consistent from the leftmost column in the Multi-column index, the multi-column index can be used)

2) You cannot skip the columns in the index. For example, if you have created multiple column indexes (such as the last name, name, and date of birth): the query condition is "last name" and "Birth date", and the name column is skipped, you can only use the last name for multiple column indexes.

3) if a query contains a range query for a column, indexes cannot be used to optimize the query for all columns on the right of the query, for example: where last_name = xxx and first_name like 'xxx % 'and dob = 'xxx'; in this way, the first_name column can use indexes, and the dob column after this column cannot use indexes.

Summary: The commonly used engines in mysql include innodb and myisam. The default indexes created in these two engines are B-tree indexes, which are implemented by the B + tree structure, in addition, the content stored on the specific leaf node of innodb and myisam is different, and the covered index is for the index of the innodb engine, in myisam engine, the leaf nodes of B-tree indexes only store row pointers.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.