MySQL B+Tree index and hash index differences

Last Update:2016-07-06 Source: Internet

Author: User

Guide

MySQL commonly uses two kinds of indexes: B+Tree indexes and hash indexes. This article looks at the differences between the two index data structures and recommends when to use each.

The difference

Note: in the MySQL documentation, the B+Tree index is actually written as BTREE, as in the following example:

CREATE TABLE t (
  aid INT UNSIGNED NOT NULL AUTO_INCREMENT,
  userid INT UNSIGNED NOT NULL DEFAULT 0,
  -- The column length is garbled in the source; 20 is an assumed value (it must be at least 12 for the prefix index below)
  username VARCHAR(20) NOT NULL DEFAULT '',
  detail VARCHAR(255) NOT NULL DEFAULT '',
  PRIMARY KEY (aid),
  UNIQUE KEY (userid) USING BTREE,
  KEY (username(12)) USING BTREE -- only a partial (prefix) index on the leftmost 12 characters of username is created here
) ENGINE=InnoDB;

A classic B+Tree index data structure is shown in the following figure:

(Image from Network)

The B+Tree is a balanced multi-way search tree: every leaf node is at the same depth from the root node, and the nodes on the same level are linked to one another by pointers.

In an ordinary B+Tree lookup, the cost of searching from the root node down to any leaf node is essentially the same, with no large fluctuations. An index-based sequential scan can also move quickly between neighboring nodes through the bidirectional pointers, so it is very efficient.
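The two access patterns just described can be sketched in a few lines of Python. This is only an illustrative model, not how InnoDB implements its pages: it assumes a single root level of separator keys, models only the forward leaf pointer, and the names `Leaf`, `build_leaves`, `point_lookup`, and `range_scan` are hypothetical.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    """A leaf page: sorted keys plus a pointer to the next leaf."""

    def __init__(self, keys):
        self.keys = keys
        self.next = None


def build_leaves(sorted_keys, fanout=4):
    """Split sorted keys into linked leaves; return (leaves, first key of each leaf)."""
    leaves = [Leaf(sorted_keys[i:i + fanout])
              for i in range(0, len(sorted_keys), fanout)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    separators = [leaf.keys[0] for leaf in leaves]
    return leaves, separators


def find_leaf(leaves, separators, key):
    """Descend from the single-level root to the leaf that may hold key."""
    idx = max(bisect_right(separators, key) - 1, 0)
    return leaves[idx]


def point_lookup(leaves, separators, key):
    """Equality lookup: one descent, then a binary search inside the leaf."""
    leaf = find_leaf(leaves, separators, key)
    pos = bisect_left(leaf.keys, key)
    return pos < len(leaf.keys) and leaf.keys[pos] == key


def range_scan(leaves, separators, low, high):
    """Return all keys in [low, high] by walking the leaf chain."""
    leaf = find_leaf(leaves, separators, low)
    result = []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result
            if k >= low:
                result.append(k)
        leaf = leaf.next
    return result


leaves, separators = build_leaves(list(range(10, 200, 10)))
print(point_lookup(leaves, separators, 70))    # True
print(range_scan(leaves, separators, 35, 95))  # [40, 50, 60, 70, 80, 90]
```

Every point lookup performs the same descent regardless of the key, and the range scan needs only one descent before it follows the leaf pointers.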

Therefore, B+Tree indexes are widely used in databases, file systems, and similar scenarios. Incidentally, one reason the XFS file system is much more efficient than ext3/ext4 is that its file and directory index structures all use B+Trees, whereas ext3/ext4 use structures such as linked lists, hashed B-trees, extents, and bitmaps for their files and directories, so their IOPS capability under heavy I/O pressure is inferior to that of XFS.

Details can be found at:

https://en.wikipedia.org/wiki/Ext4
https://en.wikipedia.org/wiki/XFS

A hash index, by contrast, looks like this:

(Image from Network)

Simply put, a hash index applies a hashing algorithm to convert each key value into a new hash value. A lookup does not need to descend level by level from the root node to a leaf node as in a B+Tree; a single hash computation locates the corresponding position immediately, which is very fast.
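The idea can be illustrated with a toy hash index in Python. This is a minimal sketch under stated assumptions, not MySQL's implementation: the class `HashIndex`, its bucket count, and the use of Python's built-in `hash()` are all choices made for illustration, and entries that land in the same bucket are simply chained in a list.

```python
class HashIndex:
    """Toy hash index: buckets of (key, row_id) pairs, chained on collision."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # One hash computation picks the bucket; no tree descent is involved.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, row_id):
        self._bucket(key).append((key, row_id))

    def lookup(self, key):
        # Entries that share a bucket must still be compared one by one.
        return [row_id for k, row_id in self._bucket(key) if k == key]


idx = HashIndex()
for row_id, user_id in enumerate([42, 7, 42, 15]):
    idx.insert(user_id, row_id)

print(idx.lookup(42))  # [0, 2]
print(idx.lookup(99))  # []
```

The lookup for 42 returns two row ids because the key is not unique, so the chain in that bucket has to be walked after the bucket is found.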

From the figures above, the obvious differences between the B+Tree index and the hash index are:

For equality queries, the hash index has a clear advantage, because a single hash computation is enough to find the corresponding key value. The premise, of course, is that the key values are unique. If they are not unique, the lookup must first locate the key and then scan along the linked list until the matching data is found.
For range queries, the hash index is of no use: key values that were originally ordered may no longer be contiguous after hashing, so the index cannot be used to complete a range query.
For the same reason, a hash index cannot be used to complete sorting, nor partial fuzzy queries such as LIKE 'xxx%' (a partial fuzzy query of this kind is, in essence, also a range query).
A hash index also does not support the leftmost-prefix matching rule of multi-column composite indexes.
Key lookup efficiency in a B+Tree index is fairly uniform and does not fluctuate widely. A hash index, on the other hand, becomes very inefficient when there are many duplicate key values, because of the so-called hash collision problem.
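The range-query and leftmost-prefix points above can be made concrete with a small Python comparison. It is a sketch only: a sorted list searched with `bisect` stands in for a B+Tree on a composite key `(c1, c2)`, a `dict` stands in for a hash index on the same key, and the data and function names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical composite key (c1, c2) mapped to a row id.
entries = [(("a", 1), 10), (("a", 2), 11), (("b", 1), 12),
           (("b", 3), 13), (("c", 2), 14)]

ordered = sorted(entries)            # stands in for a B+Tree on (c1, c2)
keys = [key for key, _ in ordered]
hashed = dict(entries)               # stands in for a hash index on (c1, c2)


def prefix_lookup(c1):
    """Leftmost-prefix match on the ordered index: one search, then a short scan."""
    i = bisect_left(keys, (c1,))     # (c1,) sorts before every (c1, x)
    out = []
    while i < len(keys) and keys[i][0] == c1:
        out.append(ordered[i][1])
        i += 1
    return out


def range_lookup(low, high):
    """Row ids with low <= key <= high, read in key order."""
    i = bisect_left(keys, low)
    out = []
    while i < len(keys) and keys[i] <= high:
        out.append(ordered[i][1])
        i += 1
    return out


def hash_prefix_lookup(c1):
    """With only the hash index, a prefix match has to examine every entry."""
    return sorted(row for (k1, _), row in hashed.items() if k1 == c1)


print(hashed[("b", 3)])                   # 13
print(hashed.get(("b",)))                 # None
print(prefix_lookup("b"))                 # [12, 13]
print(range_lookup(("a", 2), ("b", 1)))   # [11, 12]
print(hash_prefix_lookup("b"))            # [12, 13]
```

The hash structure answers the full-key equality lookup directly, but it can serve the prefix match only by checking every entry, while the ordered structure handles both the prefix match and the range with a single search followed by a sequential read.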

Postscript

In MySQL, only tables that use the HEAP/MEMORY engine explicitly support hash indexes (NDB supports them too, but it is not commonly used). The InnoDB engine's adaptive hash index is not counted here, because it cannot be specified when an index is created.

It is also important to note that the data in a HEAP/MEMORY table is lost when the MySQL instance restarts.

In general, the B+Tree index structure is suitable for most scenarios. A hash index has the advantage only in cases like the following:

In a HEAP table, if the stored data has very low repetition (that is, the cardinality is very high) and the column is queried mainly by equality, with no range queries and no sorting, a hash index is especially suitable.

For example, this SQL:
SELECT ... FROM t WHERE c1 = ?; -- equality query only
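Why cardinality matters here can be illustrated with a rough Python model. It groups rows by key value, the way entries with the same key end up in the same chain of a hash index, and reports the longest chain an equality lookup might have to walk. The function name and the data are assumptions made for illustration, and the model covers duplicate keys only, not collisions between different keys.

```python
from collections import defaultdict


def longest_chain(keys):
    """Group row ids by key value and return the length of the longest group."""
    chains = defaultdict(list)
    for row_id, key in enumerate(keys):
        chains[key].append(row_id)
    return max(len(chain) for chain in chains.values())


high_cardinality = list(range(1000))             # every key value is distinct
low_cardinality = [i % 4 for i in range(1000)]   # only 4 distinct key values

print(longest_chain(high_cardinality))  # 1
print(longest_chain(low_cardinality))   # 250
```

With distinct keys each lookup touches a single entry, whereas with only four distinct values a lookup has to walk a chain of 250 entries.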

In most scenarios, queries involve range conditions, sorting, grouping, and similar features, so a B+Tree index is the right choice.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.