Btree, the default MySQL indexing method, is universally applicable

Last Update:2018-08-16 Source: Internet

Author: User

Article reposted from 52145477

MySQL currently has the following main indexing methods: FULLTEXT, HASH, BTREE, and RTREE.

So what are the functional and performance differences between these indexes?

FULLTEXT

This is full-text indexing. It was originally supported only by the MyISAM engine; InnoDB has also supported it since MySQL 5.6. A FULLTEXT index can be created with CREATE TABLE, ALTER TABLE, or CREATE INDEX, but only on CHAR, VARCHAR, and TEXT columns. It is worth mentioning that when the data volume is large, loading the data into a table that has no full-text index and then creating the FULLTEXT index with CREATE INDEX is much faster than creating the table with a FULLTEXT index first and then writing the data.

Full-text indexing was not part of MyISAM from the start. It was introduced to solve the inefficiency of fuzzy text queries such as WHERE name LIKE "%word%". Without a full-text index, such a query has to traverse the whole table, which is extremely time-consuming when the data volume is large. Without asynchronous I/O, the process is blocked for the whole duration, which wastes time. Asynchronous I/O is not explained further here; interested readers can look it up.

Using a full-text index is not complicated:

Create: ALTER TABLE table_name ADD FULLTEXT INDEX fullindex (cname1 [, cname2 ...]);

Use: SELECT * FROM table_name WHERE MATCH (cname1 [, cname2 ...]) AGAINST ('word' MODE);

Here MODE is the search mode: IN BOOLEAN MODE, IN NATURAL LANGUAGE MODE, or IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION (which can also be written WITH QUERY EXPANSION).

These three search modes are not explained in detail here. In short: boolean mode allows the search string to contain special characters that express specific requirements, for example + means the word must be present, - means it must be absent, and * is a wildcard at the end of a word, somewhat reminiscent of regular expressions. Natural language mode is simple word matching. Natural language mode with query expansion first runs a natural language search, then adds words from the most relevant rows it returned to the search string and searches again.
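To make the +, - and * operators concrete, here is a minimal Python sketch of a boolean-style matcher. It is only an illustration under simplifying assumptions: the function name boolean_match is made up, words are split on whitespace, and none of MySQL's relevance ranking, stopwords, or minimum word length rules are modelled.

```python
def boolean_match(text, query):
    """Return True if text satisfies a simplified boolean-mode query."""
    words = set(text.lower().split())

    def present(term):
        # A trailing * is treated as a prefix wildcard.
        if term.endswith("*"):
            prefix = term[:-1]
            return any(w.startswith(prefix) for w in words)
        return term in words

    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            optional.append(token)

    if any(present(t) for t in excluded):
        return False
    if not all(present(t) for t in required):
        return False
    # With no required terms, at least one optional term must appear.
    if not required:
        return any(present(t) for t in optional)
    return True


rows = [
    "mysql btree index",
    "mysql hash index",
    "postgres btree index",
]
# Must contain "mysql", must not contain "hash".
print([r for r in rows if boolean_match(r, "+mysql -hash")])
# Must contain a word starting with "post" and the word "index".
print([r for r in rows if boolean_match(r, "+post* +index")])
```

The first query keeps only the row with "mysql" and without "hash"; the second keeps only the row with a word beginning with "post".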

Anyone who knows a little about search engines will know the concept of word segmentation, and the FULLTEXT index is likewise built on word segmentation. In languages written with the Latin alphabet, words can easily be separated by spaces. Chinese, however, obviously cannot be segmented this way. So what can be done? One option is mysqlcft, a Chinese word segmentation plugin for MySQL, which makes Chinese word segmentation possible. Interested readers can search for mysqlcft; other word segmentation plugins are available as well.
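The segmentation problem can be seen in a small Python sketch of a word-based index. This is an assumption-laden toy, not how MySQL stores FULLTEXT data: the function name build_inverted_index and the sample rows are invented, and words are split on whitespace only.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each whitespace-separated word to the set of row ids containing it."""
    index = defaultdict(set)
    for row_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(row_id)
    return index


docs = {
    1: "full text index",
    2: "hash index",
    3: "全文索引很有用",  # no spaces, so the whole sentence becomes one "word"
}
index = build_inverted_index(docs)

# Rows containing the word "index".
print(sorted(index["index"]))
# The Chinese word for "index" was never split out, so nothing is found.
print(sorted(index.get("索引", set())))
```

Splitting on spaces works for the English rows, while the Chinese row is indexed as a single unbroken token, which is exactly why a word segmentation plugin is needed.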

HASH

We have been seeing and using the word hash constantly since the day we started writing code. A hash is in fact a set of key-value pairs (key => value). Like a function mapping in mathematics, it allows multiple keys to correspond to the same value, but does not allow one key to correspond to more than one value. This property makes hashing well suited to indexing: when a hash index is built on one or more columns, the values of those columns are run through an algorithm to compute a hash value, which corresponds to one or several rows of data (conceptually this differs from a function mapping, so do not confuse the two). In Java, every class has a hashCode() method, inherited from the Object class unless explicitly defined, which gives each object an identifying code and plays an important role in equality comparisons between objects and in serialized transmission. There are many ways to generate hashes that are sufficient to make the hash code unique in practice. In MongoDB, for example, every document has a system-generated unique ObjectId (made up of a timestamp, a host hash value, the process PID, and an auto-incrementing counter), which is also a kind of hash. Well, I seem to have drifted off topic.

Because a hash index can locate an entry in a single step, without descending level by level as a tree index does, it is highly efficient. So why are tree-shaped indexes needed at all?

I will not write my own summary here, but quote another blogger's article instead: the difference between the Btree index and the hash index of MySQL, from the 14 road

(1) A hash index can only serve "=", "IN", and "<=>" queries; it cannot be used for range queries.
Because a hash index compares the hash values produced by the hash operation, it can only be used for equality filtering, not range-based filtering: the ordering of the hash values after hashing is not guaranteed to match the ordering of the key values before hashing.
(2) A hash index cannot be used to avoid sorting the data.
Because a hash index stores the hash values produced by the hash operation, and the ordering of those values is not necessarily the same as that of the key values before hashing, the database cannot use the index data to avoid any sort operation.
(3) A hash index cannot be used with a partial index key.
For a composite index, the hash value is computed from all the index keys combined, rather than from each key separately, so a query that uses only the first key or the first few keys of the composite index cannot use the hash index.
(4) A hash index can never avoid accessing the table data.
As described above, a hash index stores the hash value of each index key together with the corresponding row pointer in a hash table. Because different index keys can have the same hash value, a query cannot be completed from the hash index alone, even one that only counts the records matching a given key value; the actual data in the table still has to be accessed and compared to obtain the result.
(5) When many hash values are equal, a hash index does not necessarily perform better than a B-tree index.
For low-selectivity index keys, a hash index will have a large number of record pointers associated with the same hash value. Locating a particular record then becomes cumbersome and costs many table data accesses, resulting in poor overall performance.
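Points (1) and (3) can be tried out with a toy hash index in Python. This is a sketch under stated assumptions, not the MEMORY engine's implementation: the helper h, the sample rows, and the use of a truncated SHA-256 digest as the hash function are all invented for illustration.

```python
import hashlib


def h(*key_parts):
    """Hash all key columns together, as point (3) describes."""
    joined = "\x00".join(str(p) for p in key_parts)
    return hashlib.sha256(joined.encode()).hexdigest()[:12]


rows = {
    1: {"city": "Beijing", "age": 18},
    2: {"city": "Beijing", "age": 25},
    3: {"city": "Hangzhou", "age": 18},
}

# Composite hash index on (city, age): hash value -> list of row ids.
hash_index = {}
for row_id, row in rows.items():
    hash_index.setdefault(h(row["city"], row["age"]), []).append(row_id)

# Equality on the full key is a single probe.
print(hash_index.get(h("Beijing", 18), []))

# A probe with only the leading column finds nothing: h("Beijing") is unrelated
# to h("Beijing", 18), so the index cannot serve a partial-key query.
print(hash_index.get(h("Beijing"), []))

# A range query (age between 18 and 20) has no usable order in the hash values,
# so every row has to be examined.
print([rid for rid, row in rows.items() if 18 <= row["age"] <= 20])
```

Only the full-key equality probe benefits from the index; the partial-key probe comes back empty and the range query falls back to scanning every row.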

Let me add a little about how a hash index lookup works, which also explains points (4) and (5) above:

When we build a hash index on one or more columns (currently only the MEMORY engine explicitly supports this type of index), a table similar to the following is generated:

Hash value	Storage Address
1db54bc745a1	77#45b5
4bca452157d4	76#4556,77#45cc ...

...

The hash value is computed from the data of the specified columns by a specific algorithm, and the storage address is the location where the data row is stored (on disk or elsewhere; the MEMORY engine, in fact, keeps the hash table in RAM).

Thus, when we query WHERE age = 18, the same algorithm computes the hash value of 18 ==> the corresponding storage address is found in the hash table ==> the data is fetched from that storage address.

So every query has to find the matching hash value in the hash table and then fetch the rows at the stored addresses, as in (4); and as the amount of data grows, the hash table becomes large and more addresses pile up under the same hash value, so lookups take longer and performance degrades, as in (5).
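The lookup steps above, including the extra row checks behind points (4) and (5), can be sketched in Python. Everything here is hypothetical: weak_hash is a deliberately poor hash chosen to force collisions, and list positions stand in for storage addresses.

```python
def weak_hash(age):
    """A deliberately poor hash so that different keys collide."""
    return age % 4


table = [
    {"name": "an", "age": 18},
    {"name": "bo", "age": 22},  # 22 % 4 == 18 % 4, so it collides with age 18
    {"name": "cy", "age": 18},
    {"name": "di", "age": 31},
]

# Hash value -> list of "storage addresses" (here: positions in `table`).
hash_table = {}
for address, row in enumerate(table):
    hash_table.setdefault(weak_hash(row["age"]), []).append(address)


def find_by_age(age):
    """Return the matching rows and how many rows had to be fetched."""
    candidates = hash_table.get(weak_hash(age), [])
    # The hash alone cannot tell 18 from 22, so each candidate row is fetched
    # and compared against the real key value.
    matches = [table[a] for a in candidates if table[a]["age"] == age]
    return matches, len(candidates)


matches, rows_fetched = find_by_age(18)
print([m["name"] for m in matches], rows_fetched)
```

Three rows are fetched to return two matches, and the gap widens as more keys share a hash value, which is the degradation described in (5).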

BTREE

A Btree index organizes the index values into a tree-shaped data structure according to a certain algorithm. Anyone who has studied data structures will remember learning about binary trees. I struggled with them myself while preparing for the software qualification exam, although the exam hardly seemed to test them. As with a binary tree, each query starts at the root of the tree and traverses the nodes in turn until it reaches a leaf.
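The root-to-leaf walk can be illustrated with a tiny hand-built tree in Python. This is only a sketch: the Node class and search function are invented names, the tree is assumed to keep row values in the leaves only (B+tree style), and insertion, node splitting, and balancing are left out entirely.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None, values=None):
        self.keys = keys          # sorted keys in this node
        self.children = children  # child nodes (internal node) or None (leaf)
        self.values = values      # row values aligned with keys (leaf only)


def search(node, key):
    """Walk from the root down to a leaf, returning (value, nodes_visited)."""
    visited = 1
    while node.children is not None:
        # Keys greater than or equal to keys[i] live in children[i + 1].
        node = node.children[bisect_right(node.keys, key)]
        visited += 1
    if key in node.keys:
        return node.values[node.keys.index(key)], visited
    return None, visited


leaves = [
    Node([5, 8], values=["row5", "row8"]),
    Node([12, 15], values=["row12", "row15"]),
    Node([20, 27], values=["row20", "row27"]),
]
root = Node([12, 20], children=leaves)

print(search(root, 15))  # key present in the middle leaf
print(search(root, 9))   # key absent; the search still ends at a leaf
```

Because the keys stay sorted at every level, the same structure can also answer range queries and return rows in key order, which a hash index cannot do.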

Btree takes slightly different forms in MyISAM and InnoDB.

In InnoDB there are two forms. The first is the primary key index, whose leaf nodes hold the data: not only the index key, but also the other fields of the row. The second is the secondary index, whose leaf nodes are similar to those of an ordinary Btree but also contain the primary key value that points to the row.

In MyISAM, the primary key index is not much different from the other indexes. Unlike in InnoDB, however, the leaf nodes hold not primary key information but a pointer to the corresponding data row in the data file.
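The difference between the two layouts can be modelled loosely in Python. This is an assumption-heavy sketch: plain dicts stand in for the Btrees, a list stands in for the MyISAM data file, and all names and sample rows are invented.

```python
# InnoDB-style layout: the primary key index holds the full rows.
clustered = {
    1: {"id": 1, "name": "an", "age": 18},
    2: {"id": 2, "name": "bo", "age": 22},
}
# Secondary index on name: its entries carry the primary key, not the row.
innodb_name_idx = {"an": 1, "bo": 2}


def innodb_find_by_name(name):
    pk = innodb_name_idx[name]  # first lookup: secondary index -> primary key
    return clustered[pk]        # second lookup: primary key index -> row


# MyISAM-style layout: rows sit in a separate data file (a list here), and
# every index, primary or not, stores the row's position in that file.
data_file = [
    {"id": 1, "name": "an", "age": 18},
    {"id": 2, "name": "bo", "age": 22},
]
myisam_pk_idx = {1: 0, 2: 1}
myisam_name_idx = {"an": 0, "bo": 1}


def myisam_find_by_name(name):
    return data_file[myisam_name_idx[name]]  # index -> row position -> row


print(innodb_find_by_name("bo"))
print(myisam_find_by_name("bo"))
```

Both calls return the same row, but the InnoDB-style path goes through two indexes while the MyISAM-style path goes from one index straight to the data file.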

RTREE

Rtree is rarely used in MySQL. It only supports the geometry data types, and the only storage engines that support these types are MyISAM, BDB, InnoDB, NDB, and ARCHIVE.

Compared with Btree, the advantage of Rtree lies in range lookups.
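The idea behind an Rtree range lookup can be sketched in Python with bounding rectangles. This is a simplified two-level illustration, not MySQL's spatial index: the grouping is fixed by hand, and the function names and sample rectangles are made up.

```python
def intersects(a, b):
    """Rectangles are (min_x, min_y, max_x, max_y)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def bounding_box(rects):
    return (
        min(r[0] for r in rects),
        min(r[1] for r in rects),
        max(r[2] for r in rects),
        max(r[3] for r in rects),
    )


# Two leaf groups of named rectangles; each group is summarized by its box.
groups = [
    {"a": (0, 0, 2, 2), "b": (1, 1, 3, 3)},
    {"c": (10, 10, 12, 12), "d": (11, 14, 13, 16)},
]
tree = [(bounding_box(list(g.values())), g) for g in groups]


def range_search(query):
    """Return the names of rectangles hitting the query and groups opened."""
    hits, groups_opened = [], 0
    for box, group in tree:
        if not intersects(box, query):
            continue  # the whole group is skipped with a single test
        groups_opened += 1
        hits += [name for name, r in group.items() if intersects(r, query)]
    return sorted(hits), groups_opened


print(range_search((2.5, 2.5, 4, 4)))
```

A query rectangle that misses a group's bounding box never looks at the rectangles inside it, which is where the saving on two-dimensional range lookups comes from.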

Usage of various indexes

(1) Btree, the default MySQL indexing method, is universally applicable.

(2) Because FULLTEXT does not support Chinese well, it is best not to use it without a plugin. In fact, for some small blog applications it is enough to build a keyword list when the data is collected and to index the keywords. This is also a good approach; at least, I often do it this way.

(3) For search-engine-level applications, FULLTEXT is not a good solution either. MySQL full-text index files are relatively large and not very efficient, and even with a Chinese word segmentation plugin, support for Chinese is only mediocre. If you really run into this problem, Apache Lucene may be the choice for you.

(4) Because hash tables have an unparalleled advantage when handling smaller amounts of data, hash indexes are well suited to caching (in-memory databases). MemSQL (an in-memory counterpart of the MySQL database), the widely used caching tool Memcached, the NoSQL database Redis, and others all use hash indexes. Of course, if you do not want to learn these tools, the MySQL MEMORY engine can also meet this need.
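The keyword list approach from point (2) might look like the following Python sketch. It uses the standard library's SQLite as a stand-in for MySQL, and the table names, column names, and sample posts are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE post_keyword (keyword TEXT, post_id INTEGER);
    CREATE INDEX idx_keyword ON post_keyword (keyword);
    """
)
conn.executemany(
    "INSERT INTO post VALUES (?, ?)",
    [(1, "MySQL 索引方法"), (2, "Redis 入门")],
)
# Keywords are chosen when the post is saved, so no word segmentation is
# needed at query time.
conn.executemany(
    "INSERT INTO post_keyword VALUES (?, ?)",
    [("索引", 1), ("mysql", 1), ("redis", 2)],
)

# An equality lookup on the keyword column can use the ordinary index.
titles = [
    title
    for (title,) in conn.execute(
        "SELECT p.title FROM post p "
        "JOIN post_keyword k ON k.post_id = p.id "
        "WHERE k.keyword = ?",
        ("索引",),
    )
]
print(titles)
```

The trade-off is that only the keywords chosen in advance are searchable, which is usually acceptable for a small blog.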

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
