The difference between MySQL's btree index and hash index

Source: Internet
Author: User

Hash index structure of the particularity, its retrieval efficiency is very high, index retrieval can be located at once, unlike B-tree index need from the root node to the side point, and finally access to the page node so many IO access, so the Hash index query efficiency is much higher than the B-tree index.

Probably a lot of people have doubts, since the efficiency of the hash index is much higher than b-tree, why do we not all use hash index and also use B-tree index? Everything has two sides, hash index is the same, although the hash index is high efficiency, but the hash index itself because of its particularity also brought a lot ofLimitations and drawbacks, mainly with the following.

(1) Hash index can only meet "=", "in" and "<=>" query, can not use range query.

Because the hash index comparison is the hash value after the hash operation, so it can only be used for the equivalent of filtering, can not be used for range-based filtering, because the corresponding hash algorithm after processing the hash value of the size of the relationship, and can not be guaranteed and hash before the exact same.

(2) Hash index cannot be used to avoid sorting operations of data.

Because the hash index is stored in the hash after the hash value, and the size of the hash value is not necessarily the same as the key value before the hash operation, so the database can not use the index data to avoid any sorting operations;

(3) Hash index cannot use partial index key query.

For the composite index, the hash index in the calculation of the hash value when the combination index key merge and then calculate the hash value together, rather than calculate the hash value alone, so by combining the index of the previous or several index key query, the Hash index can not be exploited.

(4) Hash index cannot avoid table scan at any time.

As I have known before,Hash Indexis to pass the index key through the hash operation, the hash value and the corresponding line pointer information stored in a hash table, because the different index keys exist the same hash value, so even if the number of records that satisfy a certain hash key value can not be directly completed from the hash index Query, or by accessing the actual data in the table to make a corresponding comparison, and get the corresponding results.

(5) When a hash index encounters a large number of equal hash values, performance is not necessarily higher than the B-tree index.

ForselectivityLower index keys, if you create a hash index, then there will be a large number of record pointer information stored in the same hash value associated. This can be very cumbersome to locate a record, wasting multiple table data access, resulting in poor overall performance

2. B-tree Index

The B-tree index is the most frequently used index type in a MySQL database, and all storage engines except the Archive storage engine support B-tree indexes. Not only in MySQL, but in many other database management systems, the B-tree index is also the most important index type, mainly because the storage structure of the B-tree index has a very good performance in data retrieval of the database.
In general, the physical files of the B-tree index in MySQL are mostly stored in the structure of the Balance tree, that is, all the data that is actually needed is stored in the leaf node of the tree, and the shortest path to any leaf node is exactly the same length The same, so we all call it B-tree index of course, it is possible that various databases (or MySQL's various storage engines) will slightly transform the storage structure when storing their own b-tree indexes. such as the B-tree index of the INNODB storage engine actually uses the storage structure is b+tree, that is, on the basis of the B-tree data structure made a small transformation, in each
The leaf node contains information about the index key, and it stores pointers to the last leafnode adjacent to the leaf node, primarily to speed up the efficiency of retrieving multiple neighboring leaf node.
In the INNODB storage engine, there are two different forms of indexes, one is the Cluster form of the primary key index (Primary key), the other is the same as other storage engines (such as the MyISAM storage engine) stored in the same general B-tree index, the index in the Inno The DB storage engine is known as secondary Index. Let's take a picture of how these two indexes are stored
form to make a comparison.

The Primary Key is stored as Clustered in the left side of the diagram, and the normal B-tree index is on the right. Both Root Node and Branch Nodes are exactly the same. And the Leaf Nodes there is a difference. In Prim, Leaf Nodes holds the actual data of the table, not only the data of the primary key field, but also the order of the data of the other fields in the primary key value. While secondary index is not much different from other common B-tree indexes, the Leaf Nodes contains information about the index key and stores the Innodb primary key value.

Therefore, in Innodb if access to data through the primary key is very high, and if the data is accessed through secondary index, INNODB first through the relevant information secondary index, through the corresponding index key to retrieve the Leaf node, It is necessary to retrieve the corresponding data row by the primary key value stored in Leaf Node and then through the primary key index. The primary key and non-primary key indexes of the MyISAM storage engine differ very little, except that the index key of the primary key index is a unique and non-null key. and the MyISAM Storage Engine index and INNODB of the secondary index storage structure is basically the same, the main difference is only MyISAM storage engine on the Leaf Nodes above the index key information, and then stored directly to the MyISAM data file Information in the corresponding data row (such as row number), but does not hold key value information for the primary key

The difference

Note : First, in the MySQL document, the B + Tree index is actually written as Btree, such as the following:

CREATE TABLE T (
aid int unsigned NOT NULL auto_increment,
userid int unsigned NOT NULL default 0,
Username varchar (+) NOT null default ' ',
Detail varchar (255) NOT null default ' ',
Primary key (AID),
Unique key (UID) USING BTREE,
Key (username) USING BTREE - Here the uname column creates only a partial index of the leftmost 12 character length
) Engine=innodb;

A classic B + Tree index data structure is shown in:

The B + Tree is a balanced multi-fork tree, where the difference in height from the root node to each leaf node is no more than 1, and pointers are linked to each other at the same level.

In the B + Tree of the regular search, from the root node to the leaf node of the searching efficiency is basically the same, there will be no large fluctuations, and index-based sequential scanning can also be used to quickly move around the two-way pointer, the efficiency is very high.

Therefore, the B + Tree index is widely used in databases, file systems and other scenarios. By the way, one of the reasons the Xfs file system is much more efficient than EXT3/EXT4 is that its file and directory index structures all use B + Tree indexes, while the EXT3/EXT4 file directory structure uses linked list, hashed B-tree, extents/ Index data structures such as bitmap, so their IOPS capability is inferior to XFS under high I/O pressure.

The hash index is the same:

Simply put, the hash index is to use a certain hashing algorithm , the key value into a new hash value, the retrieval does not need to resemble a B + tree from the root node to the leaf node to search, only once the hash algorithm can immediately locate the corresponding position, very fast.

From the above figure, the obvious difference between the B + Tree index and the hash index is:

    • if the query is equivalent, then the hash index obviously has an absolute advantage , because it only needs to go through an algorithm to find the corresponding key value; Of course, the premise is that the key value is unique. If the key value is not unique, you need to find the location of the key, and then follow the list back to scan, until the corresponding data is found;

    • It can also be seen, if the scope of query retrieval, when the hash index is useless , because the original is an ordered key value, after the hashing algorithm, it is possible to become discontinuous, there is no way to use the index to complete the scope of query retrieval;

    • Similarly, the hash index does not use the index to complete the sorting , and like ' xxx% ' such a partial fuzzy query (this part of the fuzzy query, in fact, is also the scope of query);

    • The hash index also does not support the leftmost matching rule for multiple-column federated indexes ;

    • B + Tree Index keyword retrieval efficiency is relatively average, not as large as the magnitude of the wave, in the case of a large number of duplicate key values, the efficiency of the hash index is very low, because there is a so-called hash collision problem .

Postscript

In MySQL, only the Heap/memory engine table can explicitly support hash indexes (NDB is also supported, but this is not commonly used), and the InnoDB engine's Adaptive Hash Index (Adaptive hash indexes) is not the only one that can be specified when the index is created.

It is also important to note that the Heap/memory engine table is lost after the MySQL instance restarts.

In general, the B + Tree index structure is suitable for most scenarios, such as the following: A hash index is more advantageous.

In the heap table, if the stored data repeatability is very low (that is, the cardinality is very large), the column data with the equivalent query-based, no scope query, no sorting, especially suitable for the hash index

For example, this sql:
SELECT ... From t WHERE C1 =?; -Only equivalent query

In most scenarios, there will be a range of query, sorting, grouping and other query features, with B + Tree index on it.

The difference between MySQL's btree index and hash index

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.