High Performance MySQL (5): Creating High-Performance Indexes, Hash Indexes

Source: Internet
Author: User

A hash index is built on a hash table and is useful only for exact lookups that use every column in the index. For each row, the storage engine computes a hash code of the indexed columns. This is a small value that will usually differ from the hash codes computed for rows with different key values, although collisions are possible. The index stores the hash codes in the hash table, together with a pointer to each row.

1. The Memory storage engine supports explicit hash indexes (its default index type) as well as B-Tree indexes. It also supports non-unique hash indexes, which is unusual. If multiple values have the same hash code, the index stores their row pointers in the same hash table entry as a linked list.

Example:

CREATE TABLE `testhash` (
  `fname` VARCHAR(50) NOT NULL,
  `lname` VARCHAR(50) NOT NULL,
  KEY `fname` (`fname`) USING HASH
) ENGINE=MEMORY DEFAULT CHARSET=utf8;

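Assume the table holds four rows whose fname values are, in order, Arjen, Baron, Peter, and Vadim, and that the index uses a function f() that produces the following hash codes:

f('Arjen') = 2323
f('Baron') = 7437
f('Peter') = 8784
f('Vadim') = 2458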


The data structure of the hash index is as follows:

Slot    Value
2323    Pointer to row 1
2458    Pointer to row 4
7437    Pointer to row 2
8784    Pointer to row 3

Note that the slots are ordered by hash code, but the rows are not.

Now consider this query:

SELECT * FROM testhash WHERE fname = 'Peter';

MySQL first computes the hash code of 'Peter' (8784), then looks up that slot in the index and finds the pointer to row 3. Finally, it compares the value in row 3 with 'Peter' to confirm that this is the row being searched for.
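The lookup steps above can be sketched in Python. This is only an illustration of the idea, not how the Memory engine is implemented: the function f() here is a stand-in built on CRC-32 (so its hash codes differ from the ones listed earlier), the lname values are sample data, and a row's position in a list plays the role of the row pointer.

```python
import zlib
from collections import defaultdict

def f(value: str) -> int:
    """Stand-in hash function (assumption): CRC-32 reduced to a four-digit range."""
    return zlib.crc32(value.encode("utf-8")) % 10000

# Rows in storage order; the lname values are illustrative sample data.
rows = [
    ("Arjen", "Lentz"),
    ("Baron", "Schwartz"),
    ("Peter", "Zaitsev"),
    ("Vadim", "Tkachenko"),
]

# Build the index: hash code -> list of row positions (the "row pointers").
# A list per slot mimics the linked list used when several keys share a hash code.
index = defaultdict(list)
for pos, (fname, _lname) in enumerate(rows):
    index[f(fname)].append(pos)

def lookup(fname: str):
    """Return every row whose fname equals the argument."""
    matches = []
    for pos in index.get(f(fname), []):
        # The index holds no column values, so the row must be read and compared.
        if rows[pos][0] == fname:
            matches.append(rows[pos])
    return matches

print(lookup("Peter"))
```

The final comparison against the row is what makes the lookup correct even when two different names land in the same slot.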

2. Hash index restrictions:

A. A hash index contains only hash codes and row pointers, not the column values themselves. MySQL therefore cannot use the values in the index to avoid reading the rows.

B. Hash index entries are not stored in index-value order, so the index cannot be used for sorting.

C. Hash indexes do not support partial key matching, because the hash code is computed from the entire indexed value. With a hash index on (A, B), a query whose WHERE clause refers only to A cannot use the index.

D. Hash indexes support only equality comparisons: =, IN(), and <=> (the NULL-safe equality operator; = and <=> differ in how they treat NULL). They cannot speed up range queries such as WHERE price > 100.

E. Lookups in a hash index are very fast unless there are many collisions (different index values with the same hash code). When collisions occur, the storage engine must follow every row pointer in the linked list and compare the values row by row until it finds all matching rows.

F. Some index maintenance operations become slow when there are many collisions. For example, on a column with very low selectivity (many rows sharing the same value), deleting a row requires walking the linked list for that hash code to find and remove the reference to that row.
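Restrictions C and D can be illustrated with a small Python sketch. The table contents, the price column, and the function names are all hypothetical, and Python's built-in hash() stands in for the engine's hash function. The point is only that the hash is computed from the whole key, so a partial key or a range has nothing to look up and must fall back to scanning the rows.

```python
# Hypothetical rows: (fname, lname, price)
rows = [
    ("Arjen", "A", 80),
    ("Baron", "B", 120),
    ("Peter", "C", 150),
    ("Vadim", "D", 95),
]

# Hash index over BOTH name columns: the hash covers the entire key.
index = {}
for pos, (fname, lname, _price) in enumerate(rows):
    index.setdefault(hash((fname, lname)), []).append(pos)

def find_by_full_key(fname, lname):
    """Equality on every indexed column: one hash computation, then verify the row."""
    candidates = index.get(hash((fname, lname)), [])
    return [rows[p] for p in candidates if rows[p][:2] == (fname, lname)]

def find_by_fname_only(fname):
    """Partial key: the hash of fname alone is unrelated to the hash of
    (fname, lname), so the index cannot help and every row is scanned."""
    return [r for r in rows if r[0] == fname]

def find_price_above(limit):
    """Range condition: nearby values do not have nearby hash codes,
    so this also needs a full scan."""
    return [r for r in rows if r[2] > limit]

print(find_by_full_key("Peter", "C"))
print(find_price_above(100))
```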

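The InnoDB engine has a special feature called the "adaptive hash index". When InnoDB notices that some index values are accessed very frequently, it builds a hash index for them in memory on top of the B-Tree indexes. The process is completely automatic and internal to InnoDB; you cannot control or configure it, although it can be disabled.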


3. Create a custom hash index

If the storage engine does not support hash indexes, you can emulate one on top of a B-Tree index. This is not the same as a real hash index: lookups still go through the B-Tree, but they use the hash of the key value instead of the key value itself. All you need to do is specify the hash function manually in the WHERE clause.

Example:

Suppose you need to store a large number of URLs and look them up by URL. A B-Tree index on the URL column itself would be very large, because URLs are long. A typical query looks like this:

SELECT * FROM url WHERE url = 'http://www.baidu.com';

If you drop the index on the url column and add an indexed url_crc column that holds the CRC32 of the URL, you can rewrite the query as follows:

SELECT * FROM url WHERE url_crc = CRC32('http://www.baidu.com') AND url = 'http://www.baidu.com';

This performs well because the optimizer can use the small, highly selective index on url_crc and compare integers instead of long strings.
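The same pattern can be tried without a MySQL server. The sketch below is an assumption-laden stand-in: it uses Python's sqlite3 module with an in-memory database, registers a crc32 SQL function backed by zlib.crc32 (SQLite has no built-in CRC32()), and uses an illustrative index name and sample URLs. It shows the shape of the query, not MySQL's behavior or performance.

```python
import sqlite3
import zlib

def crc32(text: str) -> int:
    """CRC-32 of the UTF-8 bytes, returned as an unsigned integer."""
    return zlib.crc32(text.encode("utf-8"))

conn = sqlite3.connect(":memory:")
# Register a crc32() SQL function, since SQLite does not provide one.
conn.create_function("crc32", 1, crc32)

conn.execute(
    "CREATE TABLE url ("
    " id INTEGER PRIMARY KEY,"
    " url TEXT NOT NULL,"
    " url_crc INTEGER NOT NULL)"
)
# The index is on the short integer column, not on the long URL string.
conn.execute("CREATE INDEX idx_url_crc ON url (url_crc)")

for u in ("http://www.baidu.com", "http://www.mysql.com", "http://www.example.com"):
    conn.execute("INSERT INTO url (url, url_crc) VALUES (?, crc32(?))", (u, u))

target = "http://www.baidu.com"
found = conn.execute(
    "SELECT id, url FROM url WHERE url_crc = crc32(?) AND url = ?",
    (target, target),
).fetchall()
print(found)
```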

The drawback of this approach is that the hash values have to be maintained. You can use triggers for this.

Create the table (url_crc is indexed so that lookups by hash can use the B-Tree):

CREATE TABLE `pseudohash` (
  `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `url` VARCHAR(255) NOT NULL,
  `url_crc` INT(10) UNSIGNED NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `url_crc` (`url_crc`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Create the triggers:

DELIMITER $$
-- insert
CREATE TRIGGER pseudohash_crc_ins BEFORE INSERT ON pseudohash
FOR EACH ROW BEGIN
  SET NEW.url_crc = CRC32(NEW.url);
END$$
-- update
CREATE TRIGGER pseudohash_crc_upd BEFORE UPDATE ON pseudohash
FOR EACH ROW BEGIN
  SET NEW.url_crc = CRC32(NEW.url);
END$$
DELIMITER ;

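Do not use SHA1() or MD5() as the hash function here. They return very long strings, which waste space and make comparisons slower; they are cryptographic functions designed to virtually eliminate collisions, which is more than this technique needs. If the table has many rows and CRC32() produces too many collisions, consider implementing a simple 64-bit hash function of your own, making sure it returns an integer rather than a string. One simple way is to use part of the value returned by MD5().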
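As a rough sketch of the "part of MD5()" idea, the Python function below keeps only the last 16 hexadecimal digits of the MD5 digest and converts them to an unsigned 64-bit integer. The name hash64 is made up for this example. In MySQL the equivalent expression would be along the lines of CONV(RIGHT(MD5(url), 16), 16, 10), stored in a BIGINT UNSIGNED column.

```python
import hashlib

def hash64(value: str) -> int:
    """Unsigned 64-bit integer taken from the last 16 hex digits of the MD5 digest."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()  # 32 hex characters
    return int(digest[-16:], 16)                             # 16 hex digits = 64 bits

h = hash64("http://www.baidu.com")
print(h)
```

The result is an integer, so it can be indexed and compared cheaply, while the 64-bit range makes collisions far less likely than with a 32-bit CRC.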

It is worth noting that:

When you query through the pseudo hash index, the WHERE clause must also compare the url column itself against the literal value. If a hash collision occurs, the comparison on the real value is what filters the candidates down to the correct row.
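Why the extra comparison matters can be seen with a deliberately weak hash. In the sketch below, weak_hash keeps only 8 bits of the CRC-32 so that collisions are guaranteed among 300 URLs; the URLs and names are invented for the illustration. Filtering by hash alone returns several candidates, while adding the comparison on the real value returns exactly the row that was asked for.

```python
import zlib
from collections import Counter

def weak_hash(url: str) -> int:
    """Deliberately weak 8-bit hash (assumption) so collisions are easy to produce."""
    return zlib.crc32(url.encode("utf-8")) & 0xFF

# 300 made-up URLs into 256 possible hash values: at least one collision must occur.
urls = [f"http://example.com/page/{i}" for i in range(300)]
table = [(u, weak_hash(u)) for u in urls]

# Pick a target that lives in the most crowded bucket.
counts = Counter(h for _u, h in table)
crowded = max(counts, key=counts.get)
target = next(u for u, h in table if h == crowded)

# WHERE url_crc = HASH(target): every URL sharing the hash comes back.
by_hash_only = [u for u, h in table if h == weak_hash(target)]

# WHERE url_crc = HASH(target) AND url = target: only the real match remains.
by_hash_and_url = [u for u, h in table if h == weak_hash(target) and u == target]

print(len(by_hash_only), by_hash_and_url)
```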



This article is from the "phper-a little bit every day ~" blog; please retain this attribution.
