Guide
There are two B + tree indexes and hash indexes commonly used in MySQL, and we look at the differences between the two index data structures and their different application recommendations.
The difference
Note : First, in the MySQL document, the B + Tree index is actually written as Btree, such as the following:
CREATE TABLE T (
aid int unsigned NOT NULL auto_increment,
userid int unsigned NOT NULL default 0,
Username varchar (+) NOT null default ' ',
Detail varchar (255) NOT null default ' ',
Primary key (AID),
Unique key (UID) USING BTREE,
Key (username) USING BTREE - Here the uname column creates only a partial index of the leftmost 12 character length
) Engine=innodb;
A classic B + Tree index data structure is shown in:
(Image from Network)
The B + Tree is a balanced multi-fork tree, where the difference in height from the root node to each leaf node is no more than 1, and pointers are linked to each other at the same level.
In the B + Tree of the regular search, from the root node to the leaf node of the searching efficiency is basically the same, there will be no large fluctuations, and index-based sequential scanning can also be used to quickly move around the two-way pointer, the efficiency is very high.
Therefore, the B + Tree index is widely used in databases, file systems and other scenarios. By the way, one of the reasons the Xfs file system is much more efficient than EXT3/EXT4 is that its file and directory index structures all use B + Tree indexes, while the EXT3/EXT4 file directory structure uses linked list, hashed B-tree, extents/ Index data structures such as bitmap, so their IOPS capability is inferior to XFS under high I/O pressure.
Details can be found in:
Https://en.wikipedia.org/wiki/Ext4
Https://en.wikipedia.org/wiki/XFS
The hash index is the same:
(Image from Network)
Simply put, the hash index is to use a certain hashing algorithm , the key value into a new hash value, the retrieval does not need to resemble a B + tree from the root node to the leaf node to search, only once the hash algorithm can immediately locate the corresponding position, very fast.
From the above figure, the obvious difference between the B + Tree index and the hash index is:
if the query is equivalent, then the hash index obviously has an absolute advantage , because it only needs to go through an algorithm to find the corresponding key value; Of course, the premise is that the key value is unique. If the key value is not unique, you need to find the location of the key, and then follow the list back to scan, until the corresponding data is found;
It can also be seen, if the scope of query retrieval, when the hash index is useless , because the original is an ordered key value, after the hashing algorithm, it is possible to become discontinuous, there is no way to use the index to complete the scope of query retrieval;
Similarly, the hash index does not use the index to complete the sorting , and like ' xxx% ' such a partial fuzzy query (this part of the fuzzy query, in fact, is also the scope of query);
The hash index also does not support the leftmost matching rule for multiple-column federated indexes ;
B + Tree Index keyword retrieval efficiency is relatively average, not as large as the magnitude of the wave, in the case of a large number of duplicate key values, the efficiency of the hash index is very low, because there is a so-called hash collision problem .
Postscript
In MySQL, only the Heap/memory engine table can explicitly support hash indexes (NDB is also supported, but this is not commonly used), and the InnoDB engine's Adaptive Hash Index (Adaptive hash indexes) is not the only one that can be specified when the index is created.
It is also important to note that the Heap/memory engine table is lost after the MySQL instance restarts.
In general, the B + Tree index structure is suitable for most scenarios, such as the following: A hash index is more advantageous.
In the heap table, if the stored data repeatability is very low (that is, the cardinality is very large), the column data with the equivalent query-based, no scope query, no sorting, especially suitable for the hash index
For example, this sql:
SELECT ... From t WHERE C1 =?; -Only equivalent query
In most scenarios, there will be a range of query, sorting, grouping and other query features, with B + Tree index on it.
MySQL B + Tree index and hash index differences