Principle and optimization of database indexing

Source: Internet
Author: User
Tags mysql query mysql index

This article mainly comes from the mainstream articles on the Internet, just follow the personal understanding of a little integration, followed by reference links.

I. Summary

This paper takes MySQL database as the research object and discusses some topics related to database indexing. In particular, MySQL supports many storage engines, and the various storage engines support the indexes differently, so the MySQL database supports multiple index types such as btree indexes, hash indexes, full-text indexes, and so on. To avoid confusion, this article will focus only on the Btree index, as this is the primary index for dealing with MySQL, and the hash index and the full-text index are not discussed in this article.

Second, common query algorithm and data structure

Why do you want to talk about query algorithms and data structures here? Because the reason is to build an index, in order to build a data structure, you can apply an efficient query algorithm above, and ultimately improve the data query speed.

2.1 Nature of the index

The official MySQL definition of an index is: index is the data structure that helps MySQL to get data efficiently. By extracting the skeleton of a sentence, you can get the essence of the index: The index is the data structure.

2.2 Common query algorithms

We know that database query is one of the most important functions of database. We all want to query the data as fast as possible, so the designers of the database system are optimized from the point of view of the query algorithm. So what query algorithms can make queries faster?

2.2.1 Sequential lookup (linear search)

The most basic query algorithm, of course, is sequential lookup (linear search), which is the method of comparing each element, but the algorithm is extremely inefficient when the data is large.
Data structures: ordered or unordered queues
Complexity of: O(n)
Instance code:

//顺序查找int SequenceSearch(int a[], int value, int n){    int i;    for(i=0; i<n; i++)        if(a[i]==value)            return i;    return -1;}
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
2.2.22-point lookup (binary search)

A faster query than the sequential lookup method should be two-point lookup, the principle of binary lookup is that the search process starts from the middle element of the array, if the intermediate element is exactly the element to be found, then the search process ends; If a particular element is greater than or less than the middle element, it is found in the half of the array greater than or less And starting with the beginning of the comparison from the middle element. If an array of steps is empty, the representation cannot be found.
Data structures: Ordered arrays
Complexity of: O(logn)
Instance code:

//二分查找,递归版本int BinarySearch2(int a[], int value, int low, int high){    int mid = low+(high-low)/2;    if(a[mid]==value)        return mid;    if(a[mid]>value)        return BinarySearch2(a, value, low, mid-1);    if(a[mid]<value)        return BinarySearch2(a, value, mid+1, high);}
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
2.2.32 Fork Sorting Tree Lookup

The binary sorting tree is characterized by:

    1. If its left subtree is not empty, then the value of all nodes on the left subtree is less than the value of its root node;
    2. If its right subtree is not empty, the value of all nodes on the right subtree is greater than the value of its root node;
    3. Its left and right sub-trees are also two-fork sorting trees.

The principle of search:

    1. If B is an empty tree, the search fails, otherwise:
    2. If x equals the value of the data field of the root node of B, the lookup succeeds;
    3. If x is less than the value of the data field of the root node of B, the left subtree is searched; otherwise:
    4. Find the right subtree.

Data structure: Two-fork sorting tree
Complexity of Time:O(log2N)

2.2.4 Hashing hash method (hash table)

The principle is to first create a hash table (hash table) based on the key value and the hash function, the fuel consumption is based on the key value, through the hash function, the position of the data element.

Data structure: Hash table
Time complexity: Almost O(1) , depending on how much conflict is generated.

2.2.5 Block Lookup

Block lookup, also known as index order lookup, is an improved method for sequential lookups. The idea of the algorithm is to divide n data elements "ordered by block" into M-Block (m≤n). The nodes in each block do not have to be ordered, but the blocks and blocks must be "ordered by block"; that is, the keyword of any element in the 1th block must be less than the keyword of any element in the 2nd block, and any element in the 2nd block must be less than any element in the 3rd block, and so on.

Algorithm Flow:

    1. First, select the maximum keywords in each block to form an index table;
    2. Find two parts: first binary lookup or sequential lookup of the Index table to determine which piece the unknown origin is recorded in, and then find in the determined block using the sequential method.

Each comparison of this search algorithm reduces the search scope by half. Their query speed has been greatly improved, the complexity is. If you look at it a little bit, you will find that each lookup algorithm can only be applied to a particular data structure, such as a binary lookup requires an orderly retrieval of data, while a binary tree lookup can only be applied to a binary lookup tree, but the data itself cannot be fully organized to meet a variety of data structures (for example, It is theoretically impossible to organize both columns sequentially, so in addition to the data, the database system maintains a data structure that satisfies a particular lookup algorithm that references (points to) data in some way, so that an advanced find algorithm can be implemented on those data structures. This data structure is the index.

2.3 Balanced Multi-search tree B-tree (B-tree)

The above refers to the two-fork tree, its search time complexity O(log2N) , so its search efficiency and the depth of the tree, if you want to improve the speed of queries, it is necessary to reduce the depth of the tree. To reduce the depth of the tree, it is natural to use a multi-fork tree, combined with the idea of a balanced binary tree, we can build a balanced multi-fork tree structure, and then we can build a balanced multi-path search algorithm, improve the efficiency of large data volume.

2.3.1 B Tree

B-Tree (Balance tree) is also called B-tree (in fact B-is translated by B-tree, so B-tree and B-tree is a concept), it is a balanced multi-path search tree. is a typical B-tree:

We can see some of the characteristics of the B-tree, in order to better describe the B-tree, we define the record as a two-tuple [key, data] , key is the key value of the record, data represents other data (only key, not drawing data). Here is a detailed definition of the B-tree:

1. 有一个根节点,根节点只有一个记录和两个孩子或者根节点为空;2. 每个节点记录中的key和指针相互间隔,指针指向孩子节点;3. d是表示树的宽度,除叶子节点之外,其它每个节点有[d/2,d-1]条记录,并且些记录中的key都是从左到右按大小排列的,有[d/2+1,d]个孩子;4. 在一个节点中,第n个子树中的所有key,小于这个节点中第n个key,大于第n-1个key,比如中B节点的第2个子节点E中的所有key都小于B中的第2个key 9,大于第1个key 3;5. 所有的叶子节点必须在同一层次,也就是它们具有相同的深度;
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6

Because of the characteristics of B-tree, the algorithm of retrieving data by key in B-tree is very intuitive: first, binary lookup from the root node, if found to return the corresponding node of data, otherwise the corresponding interval pointer to the node to find recursively, until the node is found or a null pointer, the former lookup success , the latter lookup failed. The pseudo-code for the lookup algorithm on B-tree is as follows:

BTree_Search(node, key) {     if(node == null) return null;     foreach(node.key){          if(node.key[i] == key) return[i];          if(node.key[i] > key) return BTree_Search(point[i]->node);      }     return BTree_Search(point[i+1]->node);  }data = BTree_Search(root, my_key);
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10

About B-tree has a series of interesting properties, such as a degree of D B-tree, set its index n key, the upper limit of its tree height h is logd((N+1)/2) , retrieve a key, its search node number of the progressive complexity of O(logdN) . As can be seen from this point, B-tree is a very efficient index data structure.

In addition, since inserting deletes a new data record destroys the nature of the b-tree, it is necessary to do a split, merge, transfer, etc. to preserve the b-tree nature of the tree when inserting and deleting it, and this article does not intend to discuss the contents of B-tree completely. Because there is already a lot of information detailing the mathematical nature of b-tree and the insertion and deletion algorithm, interested friends can consult other literatures for detailed study.

2.3.2 B+tree

In fact, B-tree has many variants, the most common of which is b+tree, such as MySQL, which generally uses b+tree to implement its index structure. Compared to B-tree, B+tree has the following different points:

    • The pointer to each node is capped at 2d instead of 2d+1;
    • The inner node does not store data and stores only key;
    • Leaf nodes do not store pointers;

The following is a simple b+tree schematic.

Since not all nodes have the same domain, the b+tree and inner nodes are generally of different sizes. This is different from B-tree, although the number of keys and pointers stored in B-tree may be inconsistent, but the domain and the upper bound of each node are consistent, so in the implementation b-tree often apply for the same size of space for each node. In general, B+tree is more suitable than b-tree to achieve the external storage index structure, the specific reason and the principle of external memory and the principle of computer access, will be discussed below.

2.3.3 B+tree with sequential access pointers

Generally, the b+tree structure used in database system or file system is optimized on the basis of classical b+tree, and the sequential access pointers are added.

, a pointer to an adjacent leaf node is added to each leaf node of the B+tree, and a b+tree with sequential access pointers is formed. The purpose of this optimization is to improve the performance of the interval access, example 4, if you want to query key from 18 to 49 of all data records, when the 18 is found, simply follow the node and the pointer in order to traverse to all data nodes, and greatly mentions the efficiency of the interval query.

This section provides a brief introduction to B-tree and B+tree, and the next section, combined with the memory access principle, explains why the current b+tree is the preferred data structure for the database system implementation index.

Third, the index data structure setting related computer principle

As mentioned above, the binary tree, red black tree and other data structures can also be used to implement the index, but the file system and database system generally adopt b-/+tree as the index structure, this section will combine the computer composition principle related knowledge discusses b-/+tree as the theoretical basis of the index.

3.1 Two types of storage

In a computer system, there are generally two types of storage, computer main memory (RAM) and external memory (such as hard disks, CDs, SSDS, etc.). When designing the index algorithm and the storage structure, we have to take into account both types of storage characteristics. Main memory reading speed, relative to main memory, the data read rate of the external disk is several orders of magnitude slower than master and slave, the specific differences between them will be described in detail later. All of the above-mentioned query algorithms are hypothetical data stored in the main memory of the computer, the main memory of the computer is generally small, the actual database data are stored in the external memory.

In general, the index itself is large and cannot be stored in memory, so the index is often stored as an index file on the disk. In this way, the index lookup process will generate disk I/O consumption, relative to memory access, I/O access to the consumption of a few orders of magnitude, so the evaluation of a data structure as an index is the most important indicator of the number of disk I/O operations in the process of incremental complexity. In other words, the structural organization of the index minimizes the number of disk I/O accesses during the lookup process. Here is a detailed introduction to memory and disk access principles, and then combine these principles to analyze the efficiency of B-/+tree as an index.

3.2 Main Memory Access principle

At present, the main memory used in the computer is random read-write memory (RAM), modern RAM structure and access principle is more complex, here the paper gives up the specific differences, abstract a very simple access model to illustrate how RAM works.

From an abstract point of view, main memory is a series of storage units composed of a matrix, each storage unit stores a fixed size of data. Each storage unit has a unique address, modern main memory of the site rules are more complex, here it simplifies to a two-dimensional address: through a row address and a column address can be uniquely positioned to a storage unit. A 4 x 4 main memory model is presented.

The main memory access process is as follows:

When the system needs to read main memory, the address signal is put to address bus upload to main memory, main memory read the address signal, parse the signal and locate the designated storage unit, and then put this storage unit data on the bus for other parts to read. The process of writing main memory is similar, the system will write the unit address and data on the address bus and bus, respectively, main memory reads two bus contents, does the corresponding write operation.

As can be seen here, the main memory access time is only linear with the number of accesses, because there is no mechanical operation, two times access to the data "distance" will not have any impact on time, for example, first take A0 and then take A1 and take A0 and then take D3 time consumption is the same.

3.3 Disk Access principle

As mentioned earlier, indexes are typically stored as files on disk, and index retrieval requires disk I/O operations. Unlike main memory, disk I/O is mechanically active, so the time spent on disk I/O is huge.

The disk reads the data by the mechanical movement, when needs to read the data from the disk, the system will pass the data logical address to the disk, the disk's control circuit translates the logical address according to the addressing logic to the physical address, namely determines to read the data in which track, which sector. In order to read the data in this sector, the head needs to be placed above this sector, in order to achieve this, the head needs to move to the corresponding track, this process is called seek, the time spent is called seek time, and then the disk rotation of the target sector is rotated to the head, the process of time is called the rotation time, The last is the transmission of the read data. So the time spent on each reading of the data can be divided into seek time, rotation delay, transmission time three parts. which

    • The seek time is the time required for the magnetic arm to move to the specified track, and the main disk is typically below 5ms.
    • Rotational delay is what we often hear of disk speed, such as a disk 7200 rpm, indicating that can be rotated 7,200 times per minute, that is, 1 seconds can go 120 times, the rotation delay is 1/120/2 = 4.17ms.
    • Transfer time refers to the time that reads from disk or writes data to disk, typically in fraction milliseconds, and is negligible relative to the first two times.

Then the time to access a disk, that is, a disk IO time is approximately equal to 5+4.17 = 9ms, sounds pretty good, but to know that a 500-mips machine can execute 500 million instructions per second, because the instruction depends on the nature of the electricity, In other words, the time to execute an IO can execute 400,000 instructions, the database with 1.001 billion or even tens data, each time 9 milliseconds, it is obviously a disaster.

3.4 Principle of locality and disk pre-reading

Due to the characteristics of the storage media, the disk itself is much slower than main memory, coupled with mechanical movement, disk access speed is often one of the hundreds of of main memory, so in order to improve efficiency, to minimize disk I/O. To do this, the disk is often not read strictly on-demand, but is read-ahead every time, even if only one byte is required, and the disk starts from this location, sequentially reading a certain length of data into memory. The rationale for this is the well-known local principle of computer science: When a data is used, the data around it is usually used immediately. The data that is required during the program run is usually relatively centralized.

Due to the high efficiency of disk sequential reads (no seek time required and minimal rotational time), pre-reading can improve I/O efficiency for programs with locality. The length of the read-ahead is generally the integer multiple of the page. Page is the logical block of Computer Management memory, hardware and operating system tend to divide main memory and disk storage area into contiguous size equal blocks, each storage block is called a page (in many operating systems, the page size is usually 4k), main memory and disk in the page to exchange data. When the program to read the data is not in main memory, will trigger a page fault, the system will send a read signal to the disk, the disk will find the starting position of the data and sequentially read one or several pages back into memory, and then return unexpectedly, the program continues to run.

IV. data structure B-/+tree of database index and its performance analysis

Here you can finally analyze why database indexes are B-/+TREE storage structures. As mentioned above, database indexes are stored on disk and we generally use disk I/O times to evaluate the index structure. First, from the B-tree analysis, according to the definition of b-tree, it is necessary to retrieve up to h-1 one node at a time (root node resident memory). The designer of the database system skillfully exploits the principle of disk pre-reading, setting the size of a node equal to one page, so that each node can be fully loaded with only one I/O. In order to achieve this, in the actual implementation of the B-tree also need to use the following techniques: each time a new node, directly request a page of space, so that a node can be physically stored in a page, in addition to the computer storage allocation is page-aligned, the implementation of a node only one time I/O.

The b-tree requires a maximum of h-1 secondary I/O (root node resident memory) in a single retrieval, and a progressive complexity of O(h)=O(logdN) . In general practice, the out-of-size D is a very large number, usually more than 100, so H is very small (usually not more than 3).

In summary, if we adopt b-tree storage structure, the number of I/O in the search is generally not more than 3 times, so using B-tree as index structure efficiency is very high.

Performance analysis of 4.1 B + Tree

We know from the above that the search complexity of the B-tree is, so the larger the size O(h)=O(logdN) of the tree, the smaller the depth h, the less I/O. B+tree can increase the width of the degree d, because each node size is one page size, so the upper limit depends on the size of the key and data within the node:

    • 1
    • 2

Because the node in the B+tree removes the data domain, it can have a greater degree of out-of-order and better performance.

4.2 B + Tree discovery process

B-Tree and B + Tree search process are basically the same. As shown, if you want to find the data item 29, then the disk Block 1 is loaded into memory, the first time Io, in memory with a binary lookup determined 29 between 17 and 35, locking disk Block 1 of the P2 pointer, memory time because very short (compared to the disk IO) can be negligible, Disk Block 1 through disk address of the P2 pointer to the disk block 3 is loaded into memory, the second io,29 between 26 and 30, locking disk block 3 of the P2 pointer, loading disk blocks 8 through the pointer to memory, a third Io, while in-memory binary find found 29, the end of the query, a total of three IO. The real situation is, the 3-tier B + tree can represent millions of data, if millions of data to find only three Io, the performance will be huge, if there is no index, each data item will occur once IO, then a total of millions of Io, it is obviously very expensive.

This chapter discusses data structures and algorithms related to indexes from a theoretical point of view, and the next chapter will discuss how B+tree is specifically implemented as an index in MySQL, while combining the MyISAM and INNDB storage engines to introduce two different types of index implementations for nonclustered and clustered indexes.

V. MySQL INDEX implementation

In MySQL, the index is the concept of storage engine level, different storage engine implementation of the index is different, this article mainly discusses the MyISAM and InnoDB two storage engine index implementation way.

5.1 MyISAM Index Implementation

The MyISAM engine uses B+tree as the index structure, and the data domain of the leaf node holds the address of the record. is a schematic diagram of the MyISAM index:

There are three columns in the table, assuming that we have Col1 as the primary key, it is a MyISAM table's main index (Primary key) schematic. You can see that the index file of MyISAM only stores the address of the data record. In MyISAM, the primary index and secondary index (secondary key) are structurally indistinguishable, except that the primary index requires that key be unique, and the secondary index key can be duplicated. If we establish a secondary index on Col2, the structure of this index is as follows:

It is also the address of a b+tree,data field that holds data records. Therefore, the algorithm of index retrieval in MyISAM is to search the index according to the B+tree search algorithm first, if the specified key exists, the value of its data field is fetched, then the data record is read with the value of the data field.
The index of MyISAM is also called "non-aggregation", and the reason for this is to differentiate it from InnoDB's clustered index.

5.2 InnoDB Index Implementation

Although InnoDB also uses B+tree as the index structure, the implementation is very different from MyISAM.

The first major difference is that the InnoDB data file itself is the index file. As you know above, the MyISAM index file and the data file are detached, and the index file only holds the address of the data record. In InnoDB, the table data file itself is an index structure organized by B+tree, and the tree's leaf node data field holds the complete data record. The key of this index is the primary key of the data table, so the InnoDB table data file itself is the primary index.

Is the InnoDB primary index (also the data file), you can see that the leaf node contains the complete data record. This index is called a clustered index. Because the InnoDB data file itself is clustered by the primary key, the INNODB requires that the table must have a primary key (MyISAM can not), and if it is not explicitly specified, the MySQL system automatically selects a column that uniquely identifies the data record as the primary key, and if no such column exists, Then MySQL automatically generates an implicit field for the InnoDB table as the primary key, which is 6 bytes long and has a length of type.

The second difference from the MyISAM index is that the secondary index of the InnoDB data field stores the value of the corresponding record primary key instead of the address. In other words, all secondary indexes of InnoDB refer to the primary key as the data domain. For example, to define a secondary index on the Col3:

Here is the ASCII code of the English character as the comparison criterion. Clustered index This implementation makes search by primary key efficient, but secondary index search needs to retrieve two times index: first retrieves the secondary index to obtain the primary key, and then retrieves the record with the primary key to the primary index.

Understanding how index implementations of different storage engines can be useful for proper use and optimization of indexes, such as knowing the InnoDB index implementation, makes it easy to understand why it is not recommended to use too-long fields as primary keys, because all secondary indexes refer to the primary index, and the long primary index makes the secondary index too large. For example, using non-monotonic fields as primary keys is not a good idea in InnoDB because the InnoDB data file itself is a b+tree, and a non-monotonic primary key causes the data file to be frequently split in order to maintain b+tree characteristics when inserting new records, which is inefficient, Using the self-increment field as the primary key is a good choice.

The next chapter will discuss these index-related optimization strategies in detail.

Vi. index usage Strategy and optimization

MySQL optimization is mainly divided into structural optimization (Scheme optimization) and query Optimization (optimization). The high performance indexing strategy discussed in this chapter belongs to the structure optimization category. The content of this chapter is based entirely on the theoretical basis above, in fact, once understanding the mechanism behind the index, then the choice of high-performance strategy becomes pure reasoning, and can understand the logic behind these strategies.

6.1 Federated index and leftmost prefix Principle Federated index (composite Index)

Let's start by introducing a federated index. Federated indexes are simple enough that there is only one field relative to a generic index, and a federated index can create an index for multiple fields. It is also very simple, for example, when we create a federated index on the (a,b,c) field, the index record is sorted first by the A field, then by the B field and then by the C field, so the federated index is characterized by:

    • The first field must be an orderly one.
    • When the first field value is equal, the second field is ordered, for example, when a=2 in the following table, all B values are ordered, and so on, when the same B is worth all of the C fields ordered.

      | A | B | C |
      | 1 | 2 | 3 |
      | 1 | 4 | 2 |
      | 1 | 1 | 4 |
      | 2 | 3 | 5 |
      | 2 | 4 | 4 |
      | 2 | 4 | 6 |
      | 2 | 5 | 5 |

In fact, the search for a federated index is the same as the dictionary, first check according to the first letter, and then according to the second letter, or only according to the first letter, but can not skip the first letter from the second letter to start the search. This is the so-called leftmost prefix principle.

The leftmost prefix principle

Let's take a look at the Federated Index query in more detail. In the example above, we (a,b,c) have built a federated index on the field, so this index is arranged by a and then B and then C, so:

The following query methods can be used in the index

select * from table where a=1;select * from table where a=1 and b=2;select * from table where a=1 and b=2 and c=3;
    • 1
    • 2
    • 3
    • 4

The above three queries (a ), (a,b ),(a,b,c ) can be used in the order of the index, which is the leftmost prefix match.

If the query statement is:

select * from table where a=1 and c=3; 那么只会用到索引a。
    • 1
    • 2

If the query statement is:

select * from table where b=2 and c=3; 因为没有用到最左前缀a,所以这个查询是用户到索引的。
    • 1
    • 2

If you use the leftmost prefix, but the reverse order will use the index code?

Like what:

select * from table where b=2 and a=1;select * from table where b=2 and a=1 and c=3;
    • 1
    • 2
    • 3

If you use the leftmost prefix instead of just reversing the order, you can use the index, because the MySQL query optimizer will determine the order in which the SQL statement should be executed in the most efficient way, and finally generate a real execution plan. However, it is best to query the index order so that the query optimizer does not have to recompile.

Prefix index

In addition to the federated index, there is actually a prefix index for MySQL. Prefix index is to use the prefix of the column instead of the entire column as the index key, the current prefix length is appropriate, you can make the prefix index is selectively close to the full-column index, and because the index key shortened and reduce the size of the index file and maintenance costs.

In general, you can use the prefix index in the following situations:

    • String columns (Varchar,char,text, etc.) that require a full field match or a previous match. That is, ' xxx ' or like ' xxx% '
    • The string itself may be longer, and the first few characters start to be different. For example, we have a Chinese name prefix index does not make any sense, because Chinese names are very short, and the recipient address using prefix index is not very practical, because on the one hand, the address is generally the beginning of xx province, that is, the first few characters are similar, and the pickup address is generally like '% xxx% ', will not use a previous match. On the contrary, the names of foreigners can be indexed by prefix because their characters are longer and the first few characters are more selective. The same email is also a field that can be indexed using a prefix.
    • The index selectivity of the first half of the characters is already close to the index selectivity of the full field. If the length of the whole field is 20, the index selectivity is 0.9, and we prefix the first 10 characters with a selectivity of only 0.5, then we need to continue to increase the length of the prefix character, but this time the advantage of the prefix index is not obvious, there is not much need to build a prefix index.

Some of the articles also mention:

The MySQL prefix index can effectively reduce the size of the index file and increase the speed of the index. But the prefix index also has its drawbacks: MySQL cannot use prefix indexes in ORDER by or GROUP by, nor can they be used as an overlay index (covering index).

6.2 Index Optimization Strategy
  • The left-most prefix-matching principle, as mentioned above
  • Primary key external inspection must be indexed
  • Using indexes on columns that appear in Where,on,group By,order by
  • Try to choose a high-sensitivity column as the index, the formula for the degree of differentiation is count (distinct col)/count (*), which indicates that the field does not repeat the scale, the larger the proportion of the number of records we scan, the difference between the unique key is 1, and some states, sex fields may be in front of the big data is 0
  • Use indexes on smaller columns of data, which makes the index file smaller and allows more index keys to be loaded in memory
  • The index column cannot participate in the calculation, keeping the column "clean", such as from_unixtime (create_time) = ' 2014-05-29 ' cannot be used to the index, for the simple reason that the B + tree contains the field values in the data table, but when it is retrieved, You need to apply all the elements to the function to compare, obviously the cost is too large. So the statement should be written create_time = Unix_timestamp (' 2014-05-29 ');
  • Using the prefix index for longer strings
  • Expand the index as much as possible and do not create new indexes. For example, the table already has an index of a, now to add (A, b) of the index, then only need to modify the original index
  • Do not create indexes too much, weigh the number of indexes with the DML relationship, and DML is the INSERT, delete data operation. There is a need to weigh a problem, the purpose of indexing is to improve the efficiency of the query, but the establishment of too many indexes, will affect the speed of inserting and deleting data, because we modify the table data, the index needs to be adjusted to rebuild
  • For like queries, "%" is not placed in front.
    SELECT * FROMHoudunwang WHERE uname LIKE‘后盾%‘ -- 走索引
    SELECT * FROMHoudunwang WHERE unameLIKE "%后盾%" -- 不走索引
  • query where condition data type does not match and cannot use index
    Strings and numbers do not use indexes;
    CREATE TABLEA ( a char(10));
    EXPLAIN SELECT * FROMA WHERE ="1" – Walk Index
    EXPLAIN SELECT * from a WHERE a =1– do not go index
    Regular expressions do not use indexes, which should be well understood, so why it is difficult to see the REGEXP keyword in sql

Reference article:


Principle and optimization of database indexing

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.