High-performance MySql Darwin (6): Introduction to the principles and features of common index types

Last Update:2018-06-08 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

High-performance MySql evolution (6): the principles and features of common index types are well known. The impact of indexes on database performance is crucial, but why can indexes improve query efficiency, and the types and features of indexes may not be very clear. This article will give a brief introduction to common index types and features. 1. Why should we use indexes first?

High-performance MySql evolution (6): Introduction to the principles and features of common index types

As we all know, the impact of indexes on database performance is crucial, but it may not be clear why indexes can improve query efficiency and the types and characteristics of indexes, this article will give a brief introduction to common index types and features

1. Why index?

First, let's explain why indexes can improve query efficiency. The common Query Process is usually to get the expected results through the whole table scan. If the table has many records, the query efficiency will be very slow. Indexes increase the efficiency by minimizing the number of records scanned. Different types of indexes usually adopt different policies to reduce the number of records scanned, the specific policy will be described later.

First, let's look at a simple example to illustrate the role of the index.

In this example, a dictionary table containing about 100,000 entries is used to compare whether the query time of the index is included.

mysql> select id,word, mean from dictionary where mean='DEFAULT2';+--------+--------+----------+| id     | word  | mean     |+--------+--------+----------+| 110003 |Random | DEFAULT2 |+--------+--------+----------+1 row inset (0.05sec)mysql> select id,word, mean from dictionary where word='Random';+--------+--------+----------+| id     | word  | mean     |+--------+--------+----------+| 110004 |Random | DEFAULR# || 110003 |Random | DEFAULT2 |+--------+--------+----------+2 rows inset (0.00sec)

Next let's take a look at why there is a difference in time. We can see from the execution plan that the first statement executes a full table scan and queries 110486 records to get the expected results, the second statement uses indexes and retrieves only two records to get the desired results. This shows the principle of index acceleration: reducing the number of scanned records during query.

mysql> explain select id,word, mean from dictionary wheremean='DEFAULT2';+----+-------------+------------+-------+---------------+------+---------+------+--------+--------------------------+| id |select_type | table      | type  | possible_keys | key  | key_len | ref  | rows  | Extra                    |+----+-------------+------------+-------+---------------+------+---------+------+--------+--------------------------+|  1 | SIMPLE      | dictionary | All | NULL          | word | 135     | NULL | 110486 | Using where; Usingindex |+----+-------------+------------+-------+---------------+------+---------+------+--------+--------------------------+1 row inset (0.00 sec)mysql> explain select id,word, mean from dictionary where word='Random';+----+-------------+------------+------+---------------+------+---------+-------+------+--------------------------+| id |select_type | table      | type |possible_keys | key  | key_len | ref   | rows | Extra                    |+----+-------------+------------+------+---------------+------+---------+-------+------+--------------------------+|  1 | SIMPLE      | dictionary | ref  | word          | word | 102     | const |    2| Usingwhere; Using index |+----+-------------+------------+------+---------------+------+---------+-------+------+--------------------------+1 row inset (0.00 sec)

2. Index type

In most RDBMS, the Indexing feature is determined by the storage engine. Different storage engines may adopt different implementation methods for indexing, b-Tree indexes and Hash indexes are two common indexes. The two indexes use different underlying data structures. Therefore, these two indexes have their own characteristics during use.

2.1 B-Tree Index

B-Tree index is a widely used index type. In many databases (ORACLE and MYSQL), it is also used as the default index type, this index uses the B-Tree data structure to store data.

B-tree stores data in order and allows you to query, read, insert, and delete data structures at O (log n) running time. Generally, a node can have a binary search tree with more than two sub-nodes. In B-Tree, internal (non-leaf) nodes can have multiple subnodes within the preset range. When data is inserted or removed from a node, the number of its subnodes changes.

The structure of B-Tree is shown below.

Illustrates the working principle of B-Tree, defines the range of leaf node values in the root node, and stores actual values in the leaves. When searching, the condition value is used to select an appropriate leaf node range in the root node, and then compared with the value of the leaf node in a certain interval of the leaf layer.

For example, the student ID in the student table increases sequentially. In the figure, Key1 is 100, and Key2 is 200. if you want to query a student whose ID is 90, you can search in the leftmost leaf linked list. If you want to query a student whose ID is 130, it will be searched in the leaf linked list in the middle. This query method can greatly improve the efficiency because it avoids full table scanning.

Note that when the B-Tree index is created on multiple fields (for example, when the index is set to LastName, FirstName, or BrithDay), each Key value is LastName, for data structures such as FirstName and Brithday, the matching process of leaf node values is compared according to the Field Sequence defined in the index, therefore, you must use the index in this order. Otherwise, the index will not be correctly used (for example, the Order in the Where condition is Brithday, LastName, and FirstName ).

Because the index data stored in B-Tree is ordered, if Order by is executed on B-Tree indexes, the sorting efficiency will be greatly improved.

The working principle of B-Tree determines that it has good support for the following query methods:

(1) full index matching-the matching condition contains all the fields of the index and matches the order of the fields completely.

(2) match only the first column of the Index

(3) match only the prefix of the first column (right match), for example, "where lastName like Sun %"

(4) search the range in the first column-for example, "where lastName between" Steve "and" Tony"

(5) full match for the first column and prefix match for the second column

(6) The returned value must be a subset of the index, such as select LastName, FristName, Brithday from Student where LastName like "Tony ". because B-Tree contains the required value, therefore, in this case, data access can only occur in B-Tree to avoid access to data tables (Mysql has a special term "Overwrite Index ")

At the same time, the operating principle of B-Tree determines that the index function will be affected when the following query method is used:

(1) The query condition does not start from the first column of the index, for example, where firstname = "Eric" andbirthday = '2017-10-10'

(2) Use columns in the index without order, for example, where lastname = "Tony" andbirthday ="

(3) Some fields whose values are indexed due to fuzzy match, for example, where lastname = 'Tony 'andfirstname like 'Robert %' and birthday = '2017-10-10 ', only the lastname and firstname fields of the index are used here, brithday is blocked by the like operation.

Some Problems in the use of B-Tree indexes are listed above. These problems indicate that the order of fields in the query conditions has a great impact on the use of indexes. Therefore, when designing indexes or query conditions, pay attention to the order of fields. In some cases, multiple indexes with the same field but different order may be created to compensate for this order problem.

2.2 Hash Index

As the name suggests, this type of Index uses a Hash data structure to store indexes. The structure is roughly

During storage, the key is computed using the Hash function to obtain the Hash value of the key. Then, the Hash value is used as a pointer and the database record pointer is bound together. It is very important to select a good Hash function. A good Hash function can evenly distribute the calculated Hash values to reduce conflicts. Only when the conflicts are reduced can the query time of the Hash table be reduced. The query process can be divided into four steps:

(1) generate a Hash value based on the query conditions. For example, create a hash index on name, in the query condition where name = 'John Smith ', the hash value of 'John Smith' is 02.

(2) Use the Hash value of 02 to find the corresponding Bucket in the Hash index table

(3) Use the table pointer contained in the Bucket in step (2) (521-1234) to find a record in the database

(4) because different names may have the same Hash value, the last step needs to compare whether 'johnsmith 'is the same as the name of the database record that has been found. If it is the same, the current record is returned, otherwise, return to step 2, find another data record, and then match until the corresponding record is found.

The particularity of the Hash index structure determines that the retrieval efficiency is very high. Index retrieval can be located at a time, unlike B-Tree indexes that need to go from the root node to the branch node, the Hash index query efficiency is much higher than that of B-Tree indexes.

Many people may have doubts. Since Hash indexes are much more efficient than B-Tree indexes, why do we need to use B-Tree indexes instead of Hash indexes? Everything has two sides. The same is true for Hash indexes. Although Hash indexes are highly efficient, Hash indexes also impose many restrictions and drawbacks due to their particularity.

(1) The Hash index only supports "=", "IN" and "<=>" queries, and does not support range queries.
Because the Hash Index compares the Hash value after Hash calculation, it can only be used for equivalent filtering and cannot be used for range-based filtering, because the relationship between the size of Hash values processed by the corresponding Hash algorithm cannot be exactly the same as that before the Hash operation.

(2) Hash indexes cannot be used to avoid data sorting.
Hash indexes store Hash values after Hash calculation, and the relationship between Hash values is not necessarily the same as that before Hash calculation, therefore, the database cannot use the index data to avoid any sort operations;

(3) Hash indexes cannot be queried using some index keys.
For a composite index, when calculating the Hash value, the Hash value is calculated after the composite index is bonded, instead of separately calculating the Hash value, therefore, when one or more index keys are used to query a combined index, the Hash index cannot be used.

(4) Hash indexes cannot avoid table scanning at any time.
As we already know, the Hash index stores the Hash value of the Hash operation result and the row pointer information corresponding to the index key in a Hash table, because different index keys have the same Hash value, the query cannot be completed directly from the Hash index even if the number of records that meet the Hash key value is obtained, you still need to compare the actual data in the Access Table and obtain the corresponding results.

(5) When the Hash index encounters a large number of equal Hash values, the performance is not necessarily higher than the B-Tree index.
For low-selectivity index keys, if a Hash index is created, a large amount of Record Pointer information is stored in the same Hash value. In this way, it will be very troublesome to locate a record, which will waste multiple table data accesses, resulting in low overall performance.

It is worth mentioning that the default index type of most database management systems is B-Tree (Oracle, Mysql-InnoDB). Therefore, if you want to use Hash indexes, it must be set as a Hash index. Many Smart Data storage engines (such as Mysql's InnoDB) use an adaptive Hash index to improve query efficiency, this mechanism works when the storage engine uses the B-Tree Index type, if it finds that the value of an index is frequently retrieved, the storage engine automatically treats the value as a Hash to improve the efficiency of B-Tree.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

High-performance MySql Darwin (6): Introduction to the principles and features of common index types

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

High-performance MySql Darwin (6): Introduction to the principles and features of common index types

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support