MySQL in the btree and Hash index more detailed

Last Update:2017-01-13 Source: Internet

Author: User

Tags character set hash mysql in

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

MySQL's most common index structure is btree (O (n)), but there are always situations where we want to use other types of indexes for better performance. Hash is one of the options, for example, when we retrieve the user ID by user name, they are always one-to-one relationships, the operator used is only =, if the use of hash as the index data structure, time complexity can be reduced to O (1). Unfortunately, in the current version of MySQL (5.6), the hash only supports the memory and NDB two kinds of engines, while our most common InnoDB and MyISAM do not support hash type indexes.

Anyway, to understand the difference between the two indexes, the following translation from the MySQL official website document on the two explanations.

B-tree Index Features

The B-tree index can be used on comparison operators like =,>,>=,<,<= and between. It can also be used with the LIKE operator as long as its query condition is a constant that does not begin with a wildcard character. You can use indexes like the following statements:

The code is as follows	Copy Code
SELECT * from Tbl_name WHERE key_col like ' patrick% '; SELECT * from Tbl_name WHERE key_col like ' pat%_ck% ';

The following two scenarios do not use indexes:

The code is as follows	Copy Code
SELECT * from Tbl_name WHERE key_col like '%patrick% '; SELECT * from Tbl_name WHERE key_col like Other_col;

The first is because it starts with a wildcard character and the second is because the constants are not used.

If you use ... Like '%string% ' and string more than three characters, MySQL uses the turbo Boyer-moore algorithm algorithm to initialize the query expression, and then use this expression to make the query faster.

One such query, Col_name is NULL, can be indexed with col_name.

Any index that does not overwrite all where and level conditions will not be used. In other words, to use an index, the first column in the index needs to appear in each and group.

The following where condition uses the index:

The code is as follows	Copy Code
... WHERE index_part1=1 and index_part2=2 and other_column=3 /* index = 1 OR index = 2 / ... WHERE index=1 OR a=10 and index=2 / Optimized to "index_part1= ' Hello '" * * ... WHERE index_part1= ' Hello ' and index_part3=5 /* can use INDEX1 Index but will not use INDEX2 and index3 * * ... WHERE index1=1 and index2=2 or index1=3 and index3=3;

The following where condition does not use an index:

The code is as follows

Copy Code

/* Index_part1 not used to * *
... WHERE Index_part2=1 and index_part3=2

/* Index index does not appear in each WHERE clause * *
... WHERE index=1 OR a=10

/* No index covering all columns * *
... WHERE index_part1=1 OR index_part2=10

Sometimes MySQL does not use an index, even if it is available. For example, when the MySQL estimate uses the index to read most of the row data. (In this case, a full table scan can be faster than using an index because it requires less retrieval). However, if the statement uses limit to qualify the number of rows returned, MySQL uses the index. Because it is more efficient to use indexes when there are fewer rows of results.

Hash Index Features
The index of the hash type has some characteristics that differ from those described above:

They can only be used for peer-to-peer comparisons, such as = and <=> operators (but much faster). They cannot be used for range query conditions such as <. If the system only needs to use a storage structure such as "key Pair", use the hash type index as much as possible.

The optimizer cannot use a hash index to accelerate the order by operator. (Such indexes cannot be used to search for values in the next order)

MySQL cannot determine the number of data between two values (this requires using the range query operator to determine which index to use). If you convert a MyISAM table to a memory table that relies on a hash index, some statements (performance) may be affected.

Only the full key can be used to search a row of data. (If you use the B-tree index, any fragment of a key can be used to find it.) I think it might mean that the wildcard like operator doesn't work.

Hash index is efficient, but the hash index itself has brought many limitations and drawbacks because of its particularity, mainly have the following.

(1) Hash index can only satisfy "=", "in" and "<=>" query, can not use range query.

Because the hash index is compared to the hash after the hash value, so it can only be used for equivalent filtering, can not be used for filtering based on the scope, because after the corresponding hash algorithm to deal with the size of the hash value of the relationship, and can not guarantee the same as the hash operation before.

(2) Hash index cannot be used to avoid sorting operations of data.

Because the hash index is stored in the hash after the hash value, and the size of the hash value is not necessarily the same as the hash operation before the key value is exactly the same, so the database can not use the index data to avoid any sort operation;

(3) Hash index can not use the partial index key query.

For the combined index, the hash index calculates the hash value when the combination index key merges and then calculates the hash value together, instead of calculating the hash value separately, the hash index can not be used when the query is made by the first or several index keys of the combined index.

(4) Hash index can not avoid table scan at any time.

As already known before, hash index is the key through the hash operation, the hash of the results and the corresponding row pointer information stored in a hash table, because of the same hash value of different key, so even if the data to meet a hash key value of the number of records, can not To complete the query directly from the Hash index, or to compare it by accessing the actual data in the table, and get the corresponding results.

(5) When a hash index encounters a large number of hash values, performance is not necessarily higher than the B-tree index.

For index keys with a low selectivity, if a hash index is created, there will be a large number of record pointer information that is associated with the same Hash value. This will be very cumbersome to locate a record, will waste a lot of table data access, resulting in poor overall performance.

Postscript

By the way, there are some problems encountered in using MySQL:

Sometimes using a script to migrate data will encounter garbled problem, even if the table character set into UTF8 also useless, this time before executing SQL add a set names UTF8 can.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More