MySQL btree and hash index applicable scenarios and limitations

Source: Internet
Author: User

Btree Index:

If there is no particular type, most of it is the btree index, which uses the btree data structure to store it, and most MySQL engines support this index,archive engine is an exception ,before 5.1 this engine does not support any indexes, and5.1 supports single-column self-increment indexes. InnoDB using b+tree=btree (btree is no longer in use)

The storage engine uses Btree indexes in different ways, each with its own merits, such as : MyISAM uses prefix compression technology to make the index smaller (but may also result in reduced Connection Table query performance), but InnoDB is stored in the original data format. As another example: theMyISAM index references the row being indexed by the physical location of the data, and InnoDB references the row being indexed based on the primary key.

Btree usually means that all values are stored in order, and each leaf page is the same distance from the root

, the InnoDB index work,MyISAM uses a different structure, but the basic idea is similar:

Images from high-performance MySQL third edition

The Btree index speeds up access to data because the storage engine no longer needs to perform a full table scan to get the data needed, instead of searching from the root node of the index, which holds pointers to child nodes in the slot of the root node, and the storage engine looks down the bottom layer based on these pointers. By comparing the values of the node pages and the values you are looking for, you can find the appropriate pointers to the next level of child nodes, which actually define the upper and lower limits of the values in the child node pages, and the final storage engine either finds the corresponding value or the record does not exist.

Leaf nodes are special, their pointers point to the data being indexed, not to other node pages (different engine pointer types), in fact, there may be many layers of node pages between the root node and the leaf node, the depth of the tree and the size of the table is directly related.

btree Tree index columns are stored sequentially, so it is well suited to look up range data, such as

There are tables:

CREATE TABLE people (last_name varchar () not null,first_name varchar (a) Not Null,dob date not null,gender enum (' m ', ' F ') Not Null,key (LAST_NAME,FIRST_NAME,DOB));

For each row of data in the table, the index contains the value of the last_name,first_name,dob column, showing how the index organizes the storage of the data:

Images from high-performance MySQL third edition

Note: The index sorts multiple values by the order of the columns when the index is defined in the CREATE TABLE statement, with the names of the last two values being the same, sorted by date of birth.

You can use btree indexed query types,btree indexes used for full-key values, key-value ranges, or key-prefix lookups, where key-prefix lookups are only appropriate for lookups based on the leftmost prefix. The Multi-column index created in the previous example is valid for queries of the following type:

A: Full Value matching

Full-value matching refers to matching all columns in the index and can be used to find names and date of birth

B: match the leftmost prefix

such as: Find only last name, that is, use only the first column of the index

C: Match column prefix

You can also match only the beginning of a column's value, such as: a person who matches a last name starting with J, and this is just the first column of the index, and is part of the first column

D: Match Range value

If you look for a person whose surname is between Allen and Barrymore, only the first column of the index is used here.

E: Exact Match of a column and range matches another column

If you find all surname Allen, and the first name letter is K, that is, last_name exact match, the second column first_name range matches

F: Query that accesses the index only

Btree can usually support queries that only access the index, that is, the query only needs to access the index, without having to access the data rows, that is, the concept of overriding the index. The data that needs to be accessed is obtained directly from the index.

Because the nodes in the index tree are ordered, the index can also be used for the order by operation in the query in addition to lookup by value, and in General, if Btree can be found in some way, then it can be used for sorting, so if the order The By clause satisfies the various query types listed earlier, and the index can also meet the corresponding ordering requirements.

Here are the limitations on the Btree index:

A: You cannot use an index if you do not start by the leftmost column of the index (note that this is not the order of where conditions, that is, the where condition, regardless of the condition order, as long as The columns that appear in the where can be used to multi-column indexes in multiple-column indexes that can be left -most coherent.

B: You cannot skip the columns in the index, such as: The query condition is the last name and date of birth, skipping the FirstName column, so that the multicolumn index can only use the last Name column

C: If the query has a range query for a column, none of its right columns can use the index-optimized query, such as:where last_name=xxx and first_name like ' xxx% ' and dob= ' xxx '; The _name column can use the index, and the DOB column after this column cannot use the index.

Hash index:

Based on a hash table implementation, only queries that accurately match all columns of the index are valid, and for each row of data, the storage engine computes a hash code for all indexed column values, the hash code is a small value, and the hash code for different key values is computed differently, and the hash index stores all the hash codes in the index. Also holds a pointer to each data row in the hash table.

In MySQL, only the memory engine explicitly supports the hash index, which is also the default index type of memory engine table,memory also supports btree, it is worth mentioning that The memory engine supports non-unique hash indexes. In the database world is quite different, if the hash value of multiple columns is the same, the index will be in the form of a list of multiple record pointers to the same hash entry.

Example:

Mysql> create table testhash (Fname varchar ()  not null,lname varchar (50 )  not null,key using hash (fname))  engine=memory; query ok, 0 rows affected  (0.01 sec)  mysql> insert into  Testhash values (' Arjen ', ' Lentz '), (' Baron ', ' Schwartz '), (' Peter ', ' Zaitsev '), (' Vadim ', ' Tkachenko '); query ok, 4 rows affected  (0.00 sec) records: 4  duplicates:  0  warnings: 0 mysql> select * from testhash;+-------+-----------+|  fname | lname     |+-------+-----------+| arjen |  lentz     | |  baron | schwartz  | |  peter | zaitsev   | |  vadim | tkachenko |+-------+-----------+4 rows in set  (0.00 sec)   Assuming that the index uses an imaginary hash function f (), it returns the following value: F (' ARJEn ') =2323f (' Baron ') =7437f (' Peter ') =8784f (' Vadim ') =2458  the data structure of the hash index is as follows: Slot:         value: 2323         Pointer to line 1th 2458         Pointer to 4th line 7437         Pointer to line 2nd 8784 &N Bsp       Pointer to line 3rd

The number of each slot is sequential, but the data rows are not sequential . Here's a look at the query:

Select LName from Testhash where fname= ' Peter ';

 

MySQL calculates the hash value of Peter first and uses that value to find the corresponding record pointer, because f (' Peter ') = 8784, so MySQL looks for 8784 in the index and can find a pointer to the third row, The final step is to compare whether the value of the third row is Peter, to ensure that it is the row you are looking for. Because the index itself only needs to store the corresponding hash value, the structure of the index is very compact, which also makes the hash index lookup very fast, however, the hash index is also limited, as follows:

A: The hash index contains only the hash and row pointers, not the field values, so you cannot use the values in the index to avoid reading rows (that is, you cannot use a hash index for overwriting index scans), but accessing rows in memory is fast (because the memory engine's data is kept in RAM), So the impact on performance is not obvious in most cases.

B: Hash index data is not stored in the order of the values of indexed columns, so it cannot be used for sorting

C: The hash index also does not support partial indexed column matching lookups, because the hash index always computes the hash value using the contents of all the column values of the index. For example,if a hash index is established on a data column (a, b), the index cannot be used if only data column A is queried.

D: Hash index only supports equivalent comparison queries, such as:=,in (),<=> (note that<> and <=> are different operations ), does not support any scope query (must be given a specific The Where condition value to calculate the hash value, so the range query is not supported).

E: Access to hash index data is very fast, unless there are many hash conflicts, when a hash conflict occurs, the storage engine must traverse all the row pointers in the linked list, compare rows by row, until all rows that match the criteria are found.

F: If there is a lot of hash conflicts, some index maintenance operations are expensive, such as: If a hash index (that is, a column of many duplicate values) is established on a very low-selectivity column, when a row is deleted from the table, the storage engine needs to traverse each row in the linked list of corresponding hashes to find and delete the corresponding reference. The greater the cost.

From what is described above, the hash index is only suitable for certain scenarios, and once the hash index is appropriate, the performance gains are noticeable, except that the NDB engine supports a unique hash index in addition to the memory engine,and the NDB storage engine is very special, but it is not discussed here.

The InnoDB engine has a special function called an adaptive hash Index, and when InnoDB notices that certain index values are used very frequently, it creates a hash index on top of the Btree index in memory, which allows Btree Index also has some advantages of hash index, such as: fast hash lookup, which is a fully automatic, internal behavior, the user cannot control or configure, but if necessary, you can choose to turn off this function (innodb_adaptive_hash_index= OFF, default is on ).

Scenarios and limitations for MySQL btree and hash indexing

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.