Schema optimization and indexing

Last Update:2017-02-27 Source: Internet

Author: User

Tags hash

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Hash index

The hash index is based on hash table supremacy, and it is only valid for exact lookups, that is, you must look up each column on the index. For each row, the storage engine calculates the hash code for the indexed column. The hash code is very small, and the hash code is not the same for different values for other rows. The hash code is stored in the index and a pointer is stored in the hash table pointing to each row.

In MySQL, only the memory storage engine supports an explicit hash index. Although memory tables can also use B-tree indexes, they are the default index type, which is a hash index. The memory engine supports a non unique hash index, which is unusual in the database domain. If multiple values have the same hash code, the index is in the same hash table's entity, using a linked list to hold the pointer to the row that the values belong to.

Let's say an example, assuming the table structure is as follows

CREATE TABLE testhash ( fname VARCHAR(50) NOT NULL, lname VARCHAR(50) NOT NULL, KEY USING HASH(fname) ) ENGINE=MEMORY;

Contains the following data:

Suppose we have a hash function called F (), which returns the following values (these are examples, not real values):

f('Arjen') = 2323 f('Baron') = 7437 f('Peter') = 8784 f('Vadim') = 2458

The data structure of the index is

Slot Value 2323 Pointer to row 1 2458 Pointer to row 4 7437 Pointer to row 2 8784 Pointer to row 3

Note that this slot is orderly. But the line is not orderly. Now we execute the following query

Mysql> SELECT lname from Testhash WHERE fname= ' Peter ';

MySQL calculates the hash value of Peter and uses it to find pointers in the index. Because f (peter) = 8784. MySQL looks for 8784 in the index and finds its pointer, pointing to Line 3. The final step is to compare the value of row 3 with Peter. To determine if the row is the correct one.

Because the index itself stores a relatively short hash value, the hash index is compressed. The length of the hasn value and the type of the column are not related. The length of a hash index created for a tinyint is the same as the length of a very large column type.

So the lookup is very lightweight and fast. However, the hash index has the following limitations:

Because the index contains only the hash value and the pointer to the row, not the data itself, MySQL does not use the values in the index to avoid reading rows. Fortunately, accessing the rows in memory is very fast, so you generally do not degrade performance.

MySQL cannot use a hash index on the sort. Because they do not store the sorted rows.

The hash index does not support matching of partial keys because they calculate the hash value of the entire index value. So if you do hash index (A,B) and your query queries only with a condition of a, the hash index is invalidated.

The hash index only supports some comparisons that are equal. For example, =,in (),<=> operation is OK. (Note that <> and <=> are not the same). They cannot speed up the range query. Like where price>100.

Accessing data in a hash index is very fast, unless there are some conflicts (many values have the same hash value). When a conflict occurs, the storage engine must find the row pointed to by the pointer in each linked list, and compare the values of those rows to the values to be looked up until the correct row is found.

If there are many hash value conflicts, then the maintenance of the index will be very slow. For example, if you create an index on a column and the index value conflicts a lot, then you delete a row from the table, and the cost of finding the correct pointer from the index is very large. The storage engine finds a pointer in the linked list with each row of the same hash value and then finds the reference to delete the row.

These restrictions result in the use of a hash index only in exceptional cases. However, if they meet the needs of the application, the performance will be a fantastic upgrade. One example is about the Data Warehouse, and the classic star Shema needs a lot of connections to find the table. The hash index can accurately find which table is required.

In addition to the memory storage engine is a hash index, the NDB cluster storage engine also supports a unique hash index. This feature is focused on the NDB cluster index, which is beyond the scope of this book.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More