Hash index
The hash index is based on hash table supremacy, and it is only valid for exact lookups, that is, you must look up each column on the index. For each row, the storage engine calculates the hash code for the indexed column. The hash code is very small, and the hash code is not the same for different values for other rows. The hash code is stored in the index and a pointer is stored in the hash table pointing to each row.
In MySQL, only the memory storage engine supports an explicit hash index. Although memory tables can also use B-tree indexes, they are the default index type, which is a hash index. The memory engine supports a non unique hash index, which is unusual in the database domain. If multiple values have the same hash code, the index is in the same hash table's entity, using a linked list to hold the pointer to the row that the values belong to.
Let's say an example, assuming the table structure is as follows
CREATE TABLE testhash (
fname VARCHAR(50) NOT NULL,
lname VARCHAR(50) NOT NULL,
KEY USING HASH(fname)
) ENGINE=MEMORY;
Contains the following data:
mysql> SELECT * FROM testhash;
+--------+-----------+
| fname | lname |
+--------+-----------+
| Arjen | Lentz |
| Baron | Schwartz |
| Peter | Zaitsev |
| Vadim | Tkachenko |
+--------+-----------+
Suppose we have a hash function called F (), which returns the following values (these are examples, not real values):
f('Arjen') = 2323
f('Baron') = 7437
f('Peter') = 8784
f('Vadim') = 2458
The data structure of the index is
Slot Value
2323 Pointer to row 1
2458 Pointer to row 4
7437 Pointer to row 2
8784 Pointer to row 3
Note that this slot is orderly. But the line is not orderly. Now we execute the following query
Mysql> SELECT lname from Testhash WHERE fname= ' Peter ';
MySQL calculates the hash value of Peter and uses it to find pointers in the index. Because f (peter) = 8784. MySQL looks for 8784 in the index and finds its pointer, pointing to Line 3. The final step is to compare the value of row 3 with Peter. To determine if the row is the correct one.
Because the index itself stores a relatively short hash value, the hash index is compressed. The length of the hasn value and the type of the column are not related. The length of a hash index created for a tinyint is the same as the length of a very large column type.
So the lookup is very lightweight and fast. However, the hash index has the following limitations:
Because the index contains only the hash value and the pointer to the row, not the data itself, MySQL does not use the values in the index to avoid reading rows. Fortunately, accessing the rows in memory is very fast, so you generally do not degrade performance.
MySQL cannot use a hash index on the sort. Because they do not store the sorted rows.
The hash index does not support matching of partial keys because they calculate the hash value of the entire index value. So if you do hash index (A,B) and your query queries only with a condition of a, the hash index is invalidated.
The hash index only supports some comparisons that are equal. For example, =,in (),<=> operation is OK. (Note that <> and <=> are not the same). They cannot speed up the range query. Like where price>100.
Accessing data in a hash index is very fast, unless there are some conflicts (many values have the same hash value). When a conflict occurs, the storage engine must find the row pointed to by the pointer in each linked list, and compare the values of those rows to the values to be looked up until the correct row is found.
If there are many hash value conflicts, then the maintenance of the index will be very slow. For example, if you create an index on a column and the index value conflicts a lot, then you delete a row from the table, and the cost of finding the correct pointer from the index is very large. The storage engine finds a pointer in the linked list with each row of the same hash value and then finds the reference to delete the row.
These restrictions result in the use of a hash index only in exceptional cases. However, if they meet the needs of the application, the performance will be a fantastic upgrade. One example is about the Data Warehouse, and the classic star Shema needs a lot of connections to find the table. The hash index can accurately find which table is required.
In addition to the memory storage engine is a hash index, the NDB cluster storage engine also supports a unique hash index. This feature is focused on the NDB cluster index, which is beyond the scope of this book.