Comparison between B-tree and Hash indexes in Mysql

Source: Internet
Author: User

The common index in mysql is btress, but sometimes we also use hash indexes. Next I will analyze the comparison between B-tree and Hash indexes for your reference.

The most common index structure of mysql is btree (O (log (n), but there are always some situations where we want to use other types of indexes for better performance. Hash is one of the options. For example, when we retrieve user IDs through user names, they always have a one-to-one relationship. The operator used is only =. If we use hash as the index data structure, the time complexity can be reduced to O (1 ). Unfortunately, in the current mysql version (5.6), hash only supports MEMORY and NDB engines, while our most commonly used INNODB and MYISAM do not support hash indexes.

In any case, you still need to understand the differences between the two indexes, which are translated from the mysql official documentation below.

B-Tree index features

B-Tree indexes can be used on comparison operators such as =,>, >=, <, <=, and. It can also be used for the LIKE operator as long as its query condition is a constant that does not start with a wildcard. Indexes can be used as follows:

The Code is as follows: Copy code
SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick % ';
SELECT * FROM tbl_name WHERE key_col LIKE 'pat % _ ck % ';

 

In the following two cases, indexes are not used:

The Code is as follows: Copy code
SELECT * FROM tbl_name WHERE key_col LIKE '% Patrick % ';
SELECT * FROM tbl_name WHERE key_col LIKE other_col;

 

The first is because it starts with a wildcard, and the second is because no constant is used.

If you use... LIKE '% string %' and the string contains more than three characters. MYSQL uses the Turbo Boyer-Moore algorithm to initialize the query expression, and then uses this expression to make the query faster.

For such a query, col_name is null can use the index of col_name.

Any index that does not cover all where and-level conditions will not be used. That is to say, to use an index, the first column of the index must appear in each AND group.

The following WHERE condition uses an index:

The Code is as follows: Copy code

... WHERE index_part1 = 1 AND index_part2 = 2 AND other_column = 3
/* Index = 1 OR index = 2 */
... WHERE index = 1 or a = 10 AND index = 2
/* Optimized to "index_part1 = 'hello '"*/
... WHERE index_part1 = 'hello' AND index_part3 = 5
/* Indexes of index1 can be used, but index2 and index3 are not used */
... WHERE index1 = 1 AND index2 = 2 OR index1 = 3 AND index3 = 3;

 

The following WHERE condition does not use an index:

The Code is as follows: Copy code

/* Index_part1 is not used */
... WHERE index_part2 = 1 AND index_part3 = 2

/* Index does not appear in each where clause */
... WHERE index = 1 or a = 10

/* No indexes cover all columns */
... WHERE index_part1 = 1 OR index_part2 = 10

 

Sometimes mysql does not use indexes even if they are available. For example, when mysql estimates that the index will read most of the row data. (In this case, a full table scan may be faster than using an index because it requires less retrieval ). However, if the statement uses LIMIT to LIMIT the number of returned rows, mysql uses an index. Because the efficiency of using indexes is higher when the number of results rows is small.

Hash index features
Hash indexes are different from the features described above:

They can only be used for peer-to-peer comparison, such as the = and <=> operators (but much faster ). They cannot be used for Range Query conditions such as <. If the system only needs to use a storage structure like "key-value pairs", try to use hash indexes.

The optimizer cannot use hash indexes to accelerate the order by operator. (These indexes cannot be used to search for values in the next order)

Mysql cannot determine how many data entries exist between two values (this requires the range query operator to determine which index to use ). If you convert a MyISAM table into a MEMORY table that relies on hash indexes, some statements (performance) may be affected ).

Only the complete key can be used to search a row of data. (If the B-tree index is used, any segment of a key can be used for searching. I think it may mean that the LIKE operator with wildcard characters does not work ).


Hash indexes are highly efficient. However, due to their particularity, Hash indexes also impose many restrictions and drawbacks, mainly including the following.

(1) The Hash index only supports "=", "IN" and "<=>" queries, and does not support range queries.

Because the Hash Index compares the Hash value after Hash calculation, it can only be used for equivalent filtering and cannot be used for range-based filtering, because the relationship between the size of Hash values processed by the corresponding Hash algorithm cannot be exactly the same as that before the Hash operation.

(2) Hash indexes cannot be used to avoid data sorting.

Hash indexes store Hash values after Hash calculation, and the relationship between Hash values is not necessarily the same as that before Hash calculation, therefore, the database cannot use the index data to avoid any sort operations;

(3) Hash indexes cannot be queried using some index keys.

For a composite index, when calculating the Hash value, the Hash value is calculated after the composite index is bonded, instead of separately calculating the Hash value, therefore, when one or more index keys are used to query a combined index, the Hash index cannot be used.

(4) Hash indexes cannot avoid table scanning at any time.

As we already know, the Hash index stores the Hash value of the Hash operation result and the row pointer information corresponding to the index key in a Hash table, because different index keys have the same Hash value, the query cannot be completed directly from the Hash index even if the number of records that meet the Hash key value is obtained, you still need to compare the actual data in the Access Table and obtain the corresponding results.

(5) When the Hash index encounters a large number of equal Hash values, the performance is not necessarily higher than the B-Tree index.

For low-selectivity index keys, if a Hash index is created, a large amount of Record Pointer information is stored in the same Hash value. In this way, it will be very troublesome to locate a record, which will waste multiple table data accesses, resulting in low overall performance.

Postscript

By the way, record some problems encountered during mysql usage:

Sometimes you may encounter garbled characters when using scripts to migrate data. Even setting the table character set to utf8 does not help. In this case, add a set names utf8 statement before executing the SQL statement.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.