Differences and applicability of several index types in MySQL

Source: Internet
Author: User

As you know, MySQL currently has the following index types: FULLTEXT, HASH, BTREE, and RTREE.

So, what are the functional and performance differences of these indexes?

FULLTEXT

That is, full-text indexing. When this article was written it was supported only by the MyISAM engine (InnoDB has also supported FULLTEXT indexes since MySQL 5.6). A FULLTEXT index can be created with CREATE TABLE, ALTER TABLE, or CREATE INDEX, but only on CHAR, VARCHAR, and TEXT columns. It is worth noting that for large data sets it is much faster to load the data into a table that has no FULLTEXT index and then create the index with CREATE INDEX than to define the FULLTEXT index first and then insert the data.

Full-text indexing is not an inherent part of MyISAM; it exists to address the poor performance of fuzzy text queries such as WHERE name LIKE "%word%". Without a full-text index, such a query has to scan the entire table, which is extremely time-consuming when there is a lot of data. Without asynchronous I/O the process is blocked for the whole scan, which wastes a great deal of time. Asynchronous I/O is beyond the scope of this article; readers who want to know more can look it up.
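To make the cost concrete, here is a minimal Python sketch, assuming a table modeled as a list of (id, name) tuples. The function name `like_contains` is hypothetical and this is not how MySQL executes queries internally; it only mimics WHERE name LIKE '%word%' with no usable index, so every row must be read and tested. The test is case-sensitive for simplicity, whereas MySQL's default collations are case-insensitive.

```python
def like_contains(rows, word):
    """Emulate WHERE name LIKE '%word%' without an index: every row is examined."""
    matches = []
    examined = 0
    for row_id, name in rows:
        examined += 1          # each row has to be read and tested
        if word in name:       # substring test, like a pattern with % on both sides
            matches.append(row_id)
    return matches, examined

# Hypothetical sample "table" of (id, name) rows.
rows = [(1, "mysql index types"), (2, "hash tables"), (3, "full-text search in mysql")]
print(like_contains(rows, "mysql"))  # ([1, 3], 3)
```

The number of rows examined always equals the size of the table, however few rows actually match.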

Using a full-text index is not complicated:

Create: ALTER TABLE table_name ADD FULLTEXT INDEX fullindex (cname1[, cname2 ...]);

Query: SELECT * FROM table_name WHERE MATCH (cname1[, cname2 ...]) AGAINST ('word' mode);

Here mode is the search modifier: IN BOOLEAN MODE, IN NATURAL LANGUAGE MODE, IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION, or WITH QUERY EXPANSION. The column list in MATCH() should match the column list of a FULLTEXT index on the table.

I won't explain these three search modes in detail here. Briefly: BOOLEAN MODE lets the search string contain special operators that express specific requirements, for example + means the word must be present, - means it must be absent, and * is a wildcard (reminiscent of regular expressions); NATURAL LANGUAGE MODE is a plain word match; and NATURAL LANGUAGE MODE WITH QUERY EXPANSION first runs a natural language search, then adds words from the most relevant rows it returned to the search string and searches again.
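As a rough illustration of the BOOLEAN MODE operators just described, here is a toy Python evaluator. It is a sketch under stated assumptions, not MySQL's parser: the functions `tokenize`, `term_matches`, and `boolean_match` are hypothetical, tokenization is a simple regex, and relevance ranking, stopwords, and minimum word length are ignored.

```python
import re

def tokenize(text):
    """Lowercase and split on non-word characters (a crude stand-in for MySQL's parser)."""
    return set(re.findall(r"\w+", text.lower()))

def term_matches(term, words):
    """A trailing * acts as a prefix wildcard; otherwise the word must match exactly."""
    if term.endswith("*"):
        return any(w.startswith(term[:-1]) for w in words)
    return term in words

def boolean_match(doc, query):
    """Toy IN BOOLEAN MODE: +term required, -term excluded, bare term optional."""
    words = tokenize(doc)
    has_required = False
    optional = []
    for term in query.lower().split():
        if term.startswith("+"):
            has_required = True
            if not term_matches(term[1:], words):
                return False
        elif term.startswith("-"):
            if term_matches(term[1:], words):
                return False
        else:
            optional.append(term)
    # With a + term present, optional terms would only affect ranking (ignored here);
    # without one, at least one optional term has to match.
    return has_required or any(term_matches(t, words) for t in optional)

docs = ["MySQL hash index", "MySQL btree index", "Redis hash table"]
print([d for d in docs if boolean_match(d, "+mysql -hash")])  # ['MySQL btree index']
print([d for d in docs if boolean_match(d, "+ind*")])          # ['MySQL hash index', 'MySQL btree index']
```

A query made only of excluded terms matches nothing in this sketch, which mirrors MySQL's behavior for such queries.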

Anyone with some understanding of search engines will know the concept of word segmentation (tokenization), and FULLTEXT indexes are likewise built on tokenized words. In languages written with the Latin alphabet, words are easily separated by spaces, but Chinese obviously cannot be segmented this way. What can be done? Here I'd like to introduce mysqlcft, a Chinese word segmentation plugin for MySQL that makes Chinese tokenization possible; readers who want to know more can look up mysqlcft. Other tokenizers are of course available as well.
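The tokenization principle can be sketched as a toy inverted index that maps each word to the rows containing it. This assumes a naive tokenizer that splits on spaces; `build_inverted_index` and `lookup` are hypothetical names, not MySQL functions. It shows that a word lookup needs no table scan, and also why Chinese text without spaces (here "全文索引", "full-text index") ends up stored as a single token, so searching for "索引" ("index") finds nothing.

```python
from collections import defaultdict

def build_inverted_index(rows):
    """Map each space-separated word to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, text in rows:
        for word in text.lower().split():   # naive tokenizer: split on whitespace
            index[word].add(row_id)
    return index

def lookup(index, word):
    """One dictionary lookup per word instead of scanning every row."""
    return sorted(index.get(word.lower(), set()))

rows = [(1, "full text search"), (2, "hash index"), (3, "text index"), (4, "全文索引")]
index = build_inverted_index(rows)
print(lookup(index, "text"))      # [1, 3]
print(lookup(index, "索引"))       # []  (the Chinese phrase was never split)
print(lookup(index, "全文索引"))   # [4]
```

A segmentation plugin such as mysqlcft addresses exactly the last two cases by splitting Chinese text into words before indexing.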

HASH

The word hash is something we have, you could say, been seeing and using constantly ever since the day we started writing code. A hash is essentially a key=>value mapping, much like a function in mathematics: several keys may map to the same value, but one key may never map to more than one value. This property is exactly what makes hashing suitable for indexing. When you build a hash index on one or more columns, the values of those columns are run through an algorithm to compute a hash value, and that hash value corresponds to one or more rows (this is where the analogy with a mathematical function ends, since one hash value may point to several rows; do not confuse the two). In Java, every class has a hashCode() method, which classes that do not define it explicitly inherit from Object; it plays an important role when comparing objects for equality and when storing them in hash-based collections such as HashMap. There are many ways to generate hash-like identifiers, and some are designed to be practically unique: in MongoDB, for example, the system generates a unique ObjectId for every document (made up of a timestamp, a host hash, the process PID, and an incrementing counter), which can also be seen as a form of hashing. Well, I seem to be drifting off topic -_-!.
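A minimal Python sketch of the idea that a hash index maps a computed hash value to one or more rows. The function `build_hash_index` and the bucket count of 8 are assumptions for illustration, not MySQL internals. The output is reproducible because CPython hashes small integers to themselves.

```python
from collections import defaultdict

def build_hash_index(rows, column, buckets=8):
    """Toy hash index: (hash(value) % buckets) -> list of row ids with that hash value."""
    index = defaultdict(list)
    for row_id, row in enumerate(rows):
        index[hash(row[column]) % buckets].append(row_id)
    return dict(index)

rows = [{"name": "a", "age": 18}, {"name": "b", "age": 26}, {"name": "c", "age": 30}]
print(build_hash_index(rows, "age"))  # {2: [0, 1], 6: [2]}
```

Ages 18 and 26 land on the same hash value, so one entry points at two rows, which is the "one hash value, several rows" situation described above.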

Because a hash index can locate data in a single step, instead of descending level by level as a tree index does, it is highly efficient. So why do we still need tree-shaped indexes?

Here I won't write my own summary; instead I'll borrow one from another expert on cnblogs, the article by 14 Road on the difference between MySQL's BTREE and HASH indexes:

(1) A HASH index can only satisfy "=", "IN", and "<=>" queries; it cannot be used for range queries.
Because a hash index compares the hash values produced by the hash function, it can only be used for equality filtering, not range filtering: the ordering of the hash values produced by the hash algorithm is not guaranteed to match the ordering of the original keys.
(2) A HASH index cannot be used to avoid sorting.
A hash index stores hash values, and the ordering of those values does not necessarily match the ordering of the key values before hashing, so the database cannot use the index data to avoid any sort operation.
(3) A HASH index cannot be used for queries on only part of the index key.
For a composite index, the hash value is computed over all the index key columns combined rather than over each column separately, so a query that uses only the first one or few columns of the composite index cannot make use of the hash index.
(4) A HASH index can never avoid reading the table data.
As explained above, a hash index stores the hash value of each index key together with the corresponding row pointer in a hash table. Because different index keys can produce the same hash value, even a query that only needs to count the records matching a given key cannot be answered from the hash index alone; it must still access the actual rows in the table to get the result.
(5) When there are many identical hash values, a HASH index does not necessarily perform better than a B-tree index.
For low-selectivity index keys, a hash index associates a large amount of row-pointer information with the same hash value. Locating a particular record then becomes cumbersome, wasting many accesses to table data and leading to poor overall performance.
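Points (1) and (3) can be seen in a small Python sketch. It assumes a hypothetical `ToyHashIndex` class that hashes the whole tuple of indexed columns, as point (3) describes, plus a hypothetical `range_scan` helper; it illustrates the reasoning and is not the MEMORY engine's code.

```python
from collections import defaultdict

class ToyHashIndex:
    """Hypothetical composite hash index: hash(tuple of column values) -> row ids."""

    def __init__(self, rows, columns):
        self.rows = rows
        self.columns = columns
        self.buckets = defaultdict(list)
        for row_id, row in enumerate(rows):
            self.buckets[hash(self._key(row))].append(row_id)

    def _key(self, row):
        return tuple(row[c] for c in self.columns)

    def equal(self, *values):
        """Equality on the full key: one hash, then re-check candidates (collisions)."""
        candidates = self.buckets.get(hash(tuple(values)), [])
        return [r for r in candidates if self._key(self.rows[r]) == tuple(values)]

def range_scan(rows, column, low, high):
    """Hash values carry no ordering, so a range filter must check every row."""
    return [i for i, row in enumerate(rows) if low <= row[column] <= high]

rows = [{"city": "BJ", "age": 18}, {"city": "SH", "age": 26}, {"city": "BJ", "age": 30}]
idx = ToyHashIndex(rows, ("city", "age"))
print(idx.equal("BJ", 30))              # [2]
print(idx.equal("BJ"))                  # []  a leading-column-only key hashes differently
print(range_scan(rows, "age", 20, 30))  # [1, 2]
```

The `equal` method re-checks every candidate row because distinct keys may share a hash value, which is also the reason behind point (4).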

Let me add a little about how a hash index works, which also explains points (4) and (5) above:

When we build a hash index on one or more columns (currently only the MEMORY engine explicitly supports this index type), a table similar to the following is generated:

Hash value        Storage address
1db54bc745a1      77#45b5
4bca452157d4      76#4556, 77#45cc ...
...

The hash value is computed from the specified column data by a particular algorithm, and the storage address is where the data row is stored on disk (the address could also take other forms; in fact, the MEMORY engine keeps the hash table in RAM).

So when we query WHERE age = 18, the same algorithm computes the hash value of 18 ==> the corresponding storage address is found in the hash table ==> the data is fetched from that storage address.

Therefore, the index alone never answers a query: after the matching hash value is found, the rows at its storage addresses still have to be read, which is point (4). As the amount of data grows, the hash table grows too, and when many rows share a hash value each lookup has more addresses to follow, so performance degrades, which is point (5).

BTREE

A BTREE index stores the index values in a tree-shaped data structure built according to a certain algorithm. Anyone who has studied data structures probably remembers learning about binary trees; I struggled with them a great deal while preparing for the software qualification exam, though that exam barely tested them. As with a binary tree, each lookup starts at the root of the tree and descends through the nodes in turn until it reaches a leaf.
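The root-to-leaf descent and the ordered leaves can be sketched in Python with a static two-level structure: a root of separator keys over sorted leaf pages, searched with the standard `bisect` module. This is a simplified assumption, not InnoDB or MyISAM code; real B-trees have many levels and handle inserts and page splits, which are omitted. `build_tree`, `find_leaf`, and `range_query` are hypothetical names.

```python
import bisect

def build_tree(sorted_keys, leaf_size=3):
    """Static two-level tree: root separator keys over sorted leaf pages (keys ascending)."""
    leaves = [sorted_keys[i:i + leaf_size] for i in range(0, len(sorted_keys), leaf_size)]
    separators = [leaf[0] for leaf in leaves[1:]]  # first key of every leaf but the first
    return separators, leaves

def find_leaf(tree, key):
    """Descend from the root; bisect_left keeps duplicates that spill into earlier leaves."""
    separators = tree[0]
    return bisect.bisect_left(separators, key)

def range_query(tree, low, high):
    """Locate `low` via the root, then walk leaves left to right until past `high`."""
    leaves = tree[1]
    result = []
    for leaf in leaves[find_leaf(tree, low):]:
        for key in leaf:
            if key > high:
                return result
            if key >= low:
                result.append(key)
    return result

tree = build_tree([3, 8, 12, 17, 21, 25, 30, 34, 41])
print(range_query(tree, 10, 30))  # [12, 17, 21, 25, 30]
```

Because the leaves are kept in key order, a range query is a single descent followed by a sequential walk, and the results come out already sorted; a hash index can offer neither.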

B-tree indexes take slightly different forms in MyISAM and InnoDB.

InnoDB has two forms. The first is the primary key (clustered) index, whose leaf nodes hold the row data: not only the indexed key but the values of the other columns as well. The second is the secondary index, whose leaf nodes resemble those of an ordinary B-tree but also store the primary key value of the row.

In MyISAM, the primary key index is not much different from any other index. Unlike InnoDB, however, MyISAM leaf nodes do not store primary key information; instead they point to the corresponding row in the data file.
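The difference in lookup paths can be modeled in a few lines of Python. This is a hypothetical in-memory model in which plain dicts stand in for B-tree lookups and list positions stand in for data-file offsets; all names are illustrative. It shows that an InnoDB secondary-index lookup takes two steps (secondary index to primary key, then clustered index to row), while a MyISAM index leaf points straight at the row.

```python
# InnoDB-style: the clustered (primary key) index holds the full rows.
innodb_clustered = {1: {"id": 1, "email": "a@x.com", "age": 18},
                    2: {"id": 2, "email": "b@x.com", "age": 26}}
innodb_secondary = {"a@x.com": 1, "b@x.com": 2}    # email -> primary key value

# MyISAM-style: rows live in a separate data file; every index stores row addresses.
myisam_data_file = [{"id": 1, "email": "a@x.com", "age": 18},
                    {"id": 2, "email": "b@x.com", "age": 26}]
myisam_primary = {1: 0, 2: 1}                       # id -> row address
myisam_secondary = {"a@x.com": 0, "b@x.com": 1}     # email -> row address

def innodb_by_email(email):
    pk = innodb_secondary[email]    # 1st lookup: secondary index yields the primary key
    return innodb_clustered[pk]     # 2nd lookup: clustered index yields the row

def myisam_by_email(email):
    addr = myisam_secondary[email]  # the index leaf points directly into the data file
    return myisam_data_file[addr]

print(innodb_by_email("b@x.com")["age"])  # 26
print(myisam_by_email("b@x.com")["age"])  # 26
```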

RTREE

R-tree indexes are rarely used in MySQL. They apply only to geometry data types, and the storage engines that support these types are MyISAM, BDB, InnoDB, NDB, and ARCHIVE.

Compared with a B-tree, the advantage of an R-tree is range lookup over spatial (multidimensional) data.
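The core R-tree idea, pruning whole groups of entries whose minimum bounding rectangle (MBR) misses the query region, can be sketched in Python. This is a simplified two-level structure under stated assumptions: entries are chunked in the given order instead of being grouped by a real R-tree split algorithm, and `Rect`, `mbr`, `build_nodes`, and `search` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    x1: float
    y1: float
    x2: float
    y2: float

    def intersects(self, other):
        return not (self.x2 < other.x1 or other.x2 < self.x1 or
                    self.y2 < other.y1 or other.y2 < self.y1)

def mbr(rects):
    """Minimum bounding rectangle enclosing all `rects`."""
    return Rect(min(r.x1 for r in rects), min(r.y1 for r in rects),
                max(r.x2 for r in rects), max(r.y2 for r in rects))

def build_nodes(entries, node_size=2):
    """Group (id, rect) entries into leaf nodes, each tagged with its MBR (naive chunking)."""
    nodes = []
    for i in range(0, len(entries), node_size):
        chunk = entries[i:i + node_size]
        nodes.append((mbr([r for _, r in chunk]), chunk))
    return nodes

def search(nodes, query):
    """Return (ids intersecting `query`, number of nodes visited); misses are pruned."""
    hits, visited = [], 0
    for node_mbr, chunk in nodes:
        if not node_mbr.intersects(query):
            continue                     # the whole node is skipped without checking entries
        visited += 1
        hits.extend(i for i, r in chunk if r.intersects(query))
    return hits, visited

entries = [("a", Rect(0, 0, 1, 1)), ("b", Rect(1, 1, 2, 2)),
           ("c", Rect(10, 10, 11, 11)), ("d", Rect(12, 12, 13, 13))]
nodes = build_nodes(entries)
print(search(nodes, Rect(0.5, 0.5, 1.5, 1.5)))  # (['a', 'b'], 1)
```

A B-tree orders keys along one dimension only, so a two-dimensional region query like this cannot be expressed as a single key range; bounding rectangles are what make the pruning possible.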

When to use each index type

(1) BTREE, MySQL's default index type, is suitable for general use.

(2) Because FULLTEXT supports Chinese poorly, it is best not to use it without a plugin. In fact, for small blog applications it is enough to build a keyword list while collecting the data and index on those keywords. That is also a good approach; at least, it is what I often do.

(3) For search-engine-scale applications, FULLTEXT is not a good solution either: MySQL's full-text index files are relatively large and not very efficient, and even with a Chinese tokenizer, support for Chinese is only mediocre. If you really run into this problem, Apache Lucene may be your choice.

(4) Because hash tables have an unmatched advantage when handling smaller amounts of data, hash indexes are well suited to caching (in-memory databases). For example, MemSQL (an in-memory, MySQL-compatible database), the widely used caching tool Memcached, and the NoSQL database Redis all use hash-based indexing of this kind. Of course, if you don't want to learn these tools, MySQL's MEMORY engine can also meet this need.

(5) As for RTREE, I have not used it so far, so I don't know the specifics. If you have experience with RTREE, let's compare notes!
