MySQL Index Usage Strategies and Optimization (High-Performance Index Strategies)

Source: Internet
Author: User
MySQL optimization mainly comprises structure optimization (schema optimization) and query optimization. The high-performance index strategies discussed in this chapter fall mainly under structure optimization.

The content of this chapter rests entirely on the theoretical foundation laid out above. Once you understand the mechanisms behind indexes, choosing a high-performance strategy becomes a matter of pure reasoning, and the logic behind each strategy becomes clear.

Sample Database

To discuss index strategies, we need a sample database with a reasonably large amount of data. This article uses one of the sample databases provided in the official MySQL documentation: employees. This database has moderate relational complexity and a fairly large amount of data. Figure 2 is the E-R diagram of this database (taken from the official MySQL manual):

  

Figure 2

In the official MySQL documentation, the page for this database is http://dev.mysql.com/doc/employee/en/employee.html. It describes the database in detail and provides download and import instructions. If you are interested in importing the database into your own MySQL instance, refer to that page.

Principle and optimization of leftmost prefixes

The first requirement for using indexes efficiently is knowing which queries can use an index. This is governed by the "leftmost prefix principle" of the B+Tree. The following examples illustrate it.

First, a word on composite indexes. So far we have assumed that an index references a single column. In fact, an index in MySQL can reference multiple columns in a specific order; such an index is called a composite index. In general, a composite index is an ordered tuple in which each element is a column of the table. A strict definition of an index requires relational algebra, but that would take us too far afield and make for dull reading, so we will not define it strictly here. A single-column index can be regarded as the special case of a composite index with exactly one element.
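To make the "ordered tuple" idea concrete, here is a minimal Python sketch that models a composite index as a sorted list of tuples. It is only an analogy for how a B+Tree orders its keys, not how MySQL is implemented; the rows and the helper name prefix_lookup are made up for illustration.

```python
import bisect

# Hypothetical (emp_no, title, from_date) entries; the values are invented.
index = sorted([
    (10001, "Senior Engineer", "1986-06-26"),
    (10002, "Staff", "1996-08-03"),
    (10004, "Engineer", "1986-12-01"),
    (10004, "Senior Engineer", "1995-12-01"),
    (10005, "Senior Staff", "1996-09-12"),
    (10005, "Staff", "1989-09-12"),
])


def prefix_lookup(sorted_index, prefix):
    """Return all entries whose leading columns equal `prefix`.

    Binary search works because entries are sorted by the first column,
    then the second, and so on, exactly like keys in a composite index.
    """
    n = len(prefix)
    lo = bisect.bisect_left(sorted_index, prefix)
    hi = lo
    while hi < len(sorted_index) and sorted_index[hi][:n] == prefix:
        hi += 1
    return sorted_index[lo:hi]


# A leading-column prefix can be located by binary search.
by_emp = prefix_lookup(index, (10004,))
by_emp_title = prefix_lookup(index, (10004, "Senior Engineer"))

# A non-leading column is not sorted globally, so every entry must be examined.
by_title_scan = [entry for entry in index if entry[1] == "Staff"]
```

Because the list is ordered by the first column first, only conditions that start from the first column can narrow the search; this is the intuition behind the leftmost prefix principle.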

Take the employees.titles table as an example. First, look at the indexes defined on it:

The output shows that the titles table has a primary index <emp_no, title, from_date> and a secondary index <emp_no>. To avoid the complexity of multiple indexes (the MySQL query optimizer behaves in more complicated ways when several indexes are involved), we drop the secondary index here:

This lets us focus on the behavior of the PRIMARY index.

Case 1: full column match.

Obviously, the index can be used when the query performs an exact match on all columns in the index (here, exact match means an "=" or "IN" match). Note that an index is in theory sensitive to column order, but the MySQL query optimizer automatically reorders the conditions in the WHERE clause so that a suitable index can be used. For example, if we reverse the order of the conditions in the WHERE clause:

The results are the same.

Case 2: match the leftmost prefix.

When the query conditions exactly match one or more consecutive columns on the left of the index, for example <emp_no> or <emp_no, title>, the index can be used, but only partially: only the leftmost prefix formed by those conditions is used. The analysis of the query above shows that the PRIMARY index is used, but key_len is 4, which indicates that only the first column of the index is used.

Case 3: the query conditions exactly match columns in the index, but a column in the middle is not specified.

In this case, index usage is the same as in Case 2. Because title is not specified, the query uses only the first column of the index. Although from_date is also in the index, it cannot be joined to the leftmost prefix because title is missing, so the rows found through emp_no have to be scanned and filtered on from_date (here the emp_no condition already narrows the result to very few rows, so the scan is negligible). If you want from_date to use an index instead of a WHERE filter, you can add a secondary index <emp_no, from_date>, which this query would then use. Alternatively, you can use an optimization technique called "column isolation" to fill the gap between emp_no and from_date.

First, let's see how many distinct values title has:

There are only 7. When the column that forms the gap has relatively few distinct values, you can consider filling the gap with "IN" so that the conditions form a leftmost prefix:

This time key_len is 59, which indicates that the index is fully used. However, type and rows show that IN actually executes a range query, here checking 7 keys. Let's compare the performance of the two queries:

Performance improves slightly after "filling the gap". If a large number of rows remained after filtering by emp_no, the advantage of the second query would be more pronounced. Of course, if title has many distinct values, filling the gap is not appropriate and a secondary index must be created instead.
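The key_len values quoted above (4 when only emp_no is used, 59 when all three columns are used) can be reproduced with simple arithmetic. The sketch below assumes the column definitions of the employees sample database (emp_no INT, title VARCHAR(50), from_date DATE, all NOT NULL) and a single-byte character set; none of this is stated in the text, so treat the byte sizes as assumptions. The helper key_len is hypothetical.

```python
def key_len(columns):
    """Sum the bytes EXPLAIN would report for the index columns in use.

    Assumed rules: a variable-length column adds 2 bytes for its length
    prefix, and a nullable column adds 1 byte for the NULL flag.
    """
    total = 0
    for col in columns:
        total += col["bytes"]
        if col.get("variable_length"):
            total += 2
        if col.get("nullable"):
            total += 1
    return total


# Assumed definitions: INT = 4 bytes, VARCHAR(50) in a single-byte
# character set = 50 bytes, DATE = 3 bytes; all columns NOT NULL.
EMP_NO = {"name": "emp_no", "bytes": 4}
TITLE = {"name": "title", "bytes": 50, "variable_length": True}
FROM_DATE = {"name": "from_date", "bytes": 3}

first_column_only = key_len([EMP_NO])                 # matches key_len 4 in Case 2
full_index = key_len([EMP_NO, TITLE, FROM_DATE])      # matches key_len 59 after filling the gap
```

With a multi-byte character set the title column would contribute more bytes, so the totals would differ from the ones in this article.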

Case 4: the first column of the index is not specified in the query condition.

Because it is not the leftmost prefix, indexes cannot be used for such queries.

Case 5: match the prefix string of a column.

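The index can be used in this case. It can also be used when the pattern contains a wildcard elsewhere, as long as the wildcard % does not appear at the beginning; only the literal prefix before the first wildcard can then be matched through the index. If the pattern starts with a wildcard, the index cannot be used.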
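Why a trailing wildcard can use the index while a leading one cannot is easy to see on a sorted list: a literal prefix corresponds to one contiguous range of keys. The following Python sketch illustrates this; the title values are assumed for illustration and like_prefix_range is a hypothetical helper, not a MySQL function.

```python
import bisect

# Assumed title values, kept sorted like keys in an index.
titles = sorted([
    "Assistant Engineer", "Engineer", "Manager", "Senior Engineer",
    "Senior Staff", "Staff", "Technique Leader",
])


def like_prefix_range(sorted_values, prefix):
    """Emulate `LIKE 'prefix%'` as a range lookup on sorted values.

    `prefix` must be non-empty. The upper bound is the prefix with its
    last character bumped by one, so the match is a contiguous slice.
    """
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo = bisect.bisect_left(sorted_values, prefix)
    hi = bisect.bisect_left(sorted_values, upper)
    return sorted_values[lo:hi]


# 'Senior%' maps to the range ["Senior", "Senios"): two binary searches.
senior = like_prefix_range(titles, "Senior")

# '%Engineer' has no literal prefix, so every value must be checked.
engineers = [t for t in titles if t.endswith("Engineer")]
```

The matches for a leading-wildcard pattern are scattered across the sorted order, which is why no range of keys can stand in for them.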

Case 6: query by range.

A range condition can use the index on the range column (which must be part of the leftmost prefix), but the columns after the range column cannot use the index. Moreover, the index can serve at most one range column, so if the query conditions contain two range columns, the index cannot be used for both.

We can see that the index is of no help for the second range condition. One quirk of MySQL deserves special attention here: EXPLAIN alone cannot distinguish a range condition from a multi-value match, because both are displayed as range in the type column. Also, using "BETWEEN" does not necessarily mean a range query, as in the following query:

It looks as if two range conditions are used, but the "BETWEEN" on emp_no is actually equivalent to "IN", that is, emp_no is really a multi-value exact match. We can see that this query uses all three columns of the index. So be careful to distinguish multi-value matching from range matching in MySQL, or its behavior will be confusing.
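The rules from Cases 1 through 6 can be summarized in a small model. The Python sketch below is a deliberate simplification of the reasoning in this section, not of the MySQL optimizer itself; the function name and the "eq"/"range" labels are made up for illustration, with "eq" covering both "=" and "IN".

```python
def used_index_columns(index_columns, conditions):
    """Return the index columns a query can use under the leftmost prefix rule.

    `conditions` maps a column name to "eq" (an "=" or "IN" match) or
    "range". Columns are consumed from the left; a missing column stops
    the prefix, and nothing after a range column can be used.
    """
    used = []
    for column in index_columns:
        kind = conditions.get(column)
        if kind is None:
            break              # gap in the prefix (Cases 3 and 4)
        used.append(column)
        if kind == "range":
            break              # columns after a range are unusable (Case 6)
    return used


PRIMARY = ["emp_no", "title", "from_date"]

case1 = used_index_columns(PRIMARY, {"emp_no": "eq", "title": "eq", "from_date": "eq"})
case2 = used_index_columns(PRIMARY, {"emp_no": "eq"})
case3 = used_index_columns(PRIMARY, {"emp_no": "eq", "from_date": "eq"})
case4 = used_index_columns(PRIMARY, {"from_date": "eq"})
case6 = used_index_columns(PRIMARY, {"emp_no": "range", "title": "eq", "from_date": "range"})
```

Treating the "BETWEEN" on emp_no as "eq", as the text describes, makes the model report all three columns as used for the last query discussed above.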

Case 7: the query condition contains a function or expression.

Unfortunately, if a query condition contains a function or expression, MySQL does not use the index for that column (even though some such conditions could use it in a mathematical sense). For example:

Although this query does the same thing as the one in Case 5, the LEFT function prevents the index from being used for the title column, whereas the LIKE in Case 5 can use it. Another example:

This query is obviously equivalent to looking up emp_no = 10001, but because the condition is an expression, MySQL cannot use the index for it. Apparently MySQL is not yet smart enough to simplify such expressions automatically. When writing queries, therefore, avoid placing the indexed column inside an expression; do the algebra by hand first and rewrite the condition so that the column stands alone.
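The same effect can be observed without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in; SQLite is a different engine with a different optimizer, so this only illustrates the general principle that a column wrapped in an expression cannot be matched against an ordinary index. The table layout and the index name idx_emp_no are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (emp_no INTEGER, title TEXT, from_date TEXT)")
conn.execute("CREATE INDEX idx_emp_no ON titles (emp_no)")


def query_plan(sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# The bare column can be looked up through the index.
plan_direct = query_plan("SELECT * FROM titles WHERE emp_no = 10001")

# The equivalent expression hides the column, so the table is scanned.
plan_expression = query_plan("SELECT * FROM titles WHERE emp_no - 1 = 10000")
```

Printing the two plans should show an index search for the first query and a scan for the second; the exact wording of the plan text varies between SQLite versions.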

Index selectivity and prefix indexes

Since indexes speed up queries, should we create an index whenever a query could use one? The answer is no. Although an index speeds up queries, it also has costs: the index file itself consumes storage space, and the index increases the cost of inserting, deleting, and updating records. MySQL also consumes resources at runtime to maintain indexes. Therefore, more indexes are not always better. Creating an index is generally not recommended in two cases.

The first case is a table with relatively few records. If a table holds only a few hundred or one or two thousand records, there is no need for an index; a full table scan is good enough. Opinions differ on how many records count as "few". My personal rule of thumb is to use 2000 as the dividing line: below 2000 records, consider not creating an index; above 2000, consider creating indexes as appropriate.

The other case is an index with low selectivity. Index selectivity is the ratio of the number of distinct index values (also called cardinality) to the number of records in the table (#T):

Index Selectivity = Cardinality / #T

Obviously, selectivity lies in the range (0, 1]. The higher the selectivity, the more valuable the index, which follows from the nature of the B+Tree. For example, in the employees.titles table used above, if the title column is frequently queried on its own, should it get an index? Let's look at its selectivity:

The selectivity of title is below 0.0001 (the exact value is 0.00001579), so there is no real need to create a separate index for it.
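The selectivity formula is straightforward to compute on any list of column values. The Python sketch below is a minimal illustration with made-up data; the function name selectivity is hypothetical.

```python
def selectivity(values):
    """Index selectivity = number of distinct values / number of rows."""
    values = list(values)
    if not values:
        raise ValueError("selectivity is undefined for an empty table")
    return len(set(values)) / len(values)


# A column with few distinct values repeated many times has low selectivity.
low = selectivity(["Engineer", "Staff"] * 5000)      # 2 / 10000

# A unique column has the maximum selectivity of 1.
high = selectivity(range(10000))                     # 10000 / 10000
```

A value close to 0, as for the title column above, means each index entry points at a large share of the table, so the index narrows the search very little.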

One index optimization strategy related to selectivity is the prefix index: using a prefix of the column instead of the whole column as the index key. With a suitable prefix length, the selectivity of the prefix index comes close to that of the full-column index, while the shorter key reduces the size of the index file and its maintenance overhead. The following uses the employees.employees table as an example to show how to choose and use a prefix index.

From Figure 2, we can see that the employees table has only one index, so searching for a person by name requires a full table scan:

If employees are frequently searched by name, this is clearly inefficient, so we can consider creating an index. There are two options: <first_name> or <first_name, last_name>. Let's check the selectivity of the two indexes:

The selectivity of <first_name> is clearly too low. The selectivity of <first_name, last_name> is good, but first_name and last_name together are 30 characters long. Is there a way to balance length and selectivity? We can index first_name together with the first few characters of last_name, for example <first_name, left(last_name, 3)>. Let's look at its selectivity:

The selectivity is decent but still some way from 0.9313, so we increase the last_name prefix to 4:

Now the selectivity is close to ideal, and the index is only 18 characters long, nearly half the length of <first_name, last_name>. Let's create this prefix index:

Run the name query again and compare the results with those from before the index was created:

The performance improvement is significant: the query is more than 120 times faster.

A prefix index balances index size against query speed. Its drawback is that it cannot be used for ORDER BY or GROUP BY operations, nor as a covering index (that is, an index that itself contains all the data the query needs, so the data file no longer has to be accessed).
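The search for a prefix length that keeps selectivity high can also be tried offline on a sample of names. The Python sketch below mirrors the reasoning of this section on a tiny made-up data set; the names and the function prefix_selectivity are hypothetical, and real figures such as 0.9313 come only from the actual employees table.

```python
def prefix_selectivity(names, first_len, last_len):
    """Selectivity of an index on <left(first_name, n), left(last_name, m)>."""
    keys = {(first[:first_len], last[:last_len]) for first, last in names}
    return len(keys) / len(names)


# Invented (first_name, last_name) pairs for illustration only.
names = [
    ("Georgi", "Facello"), ("Georgi", "Fackler"), ("Georgi", "Farris"),
    ("Bezalel", "Simmel"), ("Bezalel", "Simmons"), ("Parto", "Bamford"),
]

first_name_only = prefix_selectivity(names, 14, 0)    # 3 distinct keys / 6 rows
with_prefix_3 = prefix_selectivity(names, 14, 3)      # 4 distinct keys / 6 rows
with_prefix_4 = prefix_selectivity(names, 14, 4)      # 5 distinct keys / 6 rows
full_columns = prefix_selectivity(names, 14, 16)      # 6 distinct keys / 6 rows
```

Lengthening the last_name prefix raises selectivity step by step toward the full-column value, and the shortest prefix that gets close enough is the one worth indexing.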

InnoDB primary key selection and insert optimization

When using the InnoDB storage engine, unless you have a special need, always use an auto-increment column unrelated to the business data as the primary key.

I often see posts and blogs discussing primary key selection. Some recommend an auto-increment primary key unrelated to the business data; others think this is unnecessary and that a unique field such as a student ID or national ID number works perfectly well. Whichever side they take, most of the arguments are at the business level. From the perspective of database index optimization, using InnoDB without an auto-increment primary key is definitely a bad idea.

The index implementation of InnoDB was discussed above. InnoDB uses a clustered index: data records are stored in the leaf nodes of the primary index (a B+Tree). This requires the records within a leaf node (one memory page or disk page in size) to be stored in primary key order, so whenever a new record is inserted, MySQL places it in the appropriate node and position according to its primary key. If the page reaches the load factor (15/16 by default in InnoDB), a new page (node) is allocated.

If the table uses an auto-increment primary key, each new record is appended after the last record of the current index node, and when a page is full, a new page is allocated automatically, as shown in Figure 3:

  

Figure 3

This produces a compact index structure in which pages are filled approximately in order. Because no existing data has to be moved on insertion, this is very efficient and adds little overhead for index maintenance.

If the primary key is not auto-incrementing (for example, a national ID number or student ID), each inserted key value is effectively random, so each new record has to be inserted somewhere in the middle of an existing index page, as shown in Figure 4:

  

Figure 4

In this case, MySQL has to move data in order to insert the new record at the right position. The target page may even have been written back to disk and evicted from the cache, so it has to be read back from disk, which adds considerable overhead. Frequent moves and page splits also cause heavy fragmentation and leave the index structure less compact than it should be, so that OPTIMIZE TABLE eventually has to be run to rebuild the table and optimize page fill.
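The difference between sequential and random primary keys can be seen in a toy simulation. The Python sketch below models leaf pages as small sorted lists with a fixed capacity and counts page splits; it is a rough illustration of the argument above, not a model of InnoDB internals (for instance, it fills pages completely rather than to 15/16, and the capacity of 16 records is an arbitrary assumption).

```python
import bisect
import random

PAGE_CAPACITY = 16  # records per page; a toy value, not a real page size


def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert keys into a toy clustered index.

    Returns (page_count, split_count, average_fill_ratio).
    """
    pages = [[]]   # leaf pages in key order, each a sorted list of keys
    splits = 0
    for key in keys:
        # Find the page whose key range covers the new key.
        idx = bisect.bisect_right([page[0] for page in pages[1:]], key)
        page = pages[idx]
        if len(page) >= capacity:
            if idx == len(pages) - 1 and key > page[-1]:
                # Append pattern: open a fresh page, nothing has to move.
                pages.append([key])
                continue
            # Insert into the middle of a full page: split it in half,
            # which moves half of its records to a new page.
            mid = len(page) // 2
            right = page[mid:]
            del page[mid:]
            pages.insert(idx + 1, right)
            splits += 1
            if key >= right[0]:
                page = right
        bisect.insort(page, key)
    fill = sum(len(p) for p in pages) / (len(pages) * capacity)
    return len(pages), splits, fill


# Auto-increment style keys: always larger than every existing key.
seq_pages, seq_splits, seq_fill = simulate_inserts(range(1, 1601))

# Random keys, standing in for ID-number style primary keys.
rng = random.Random(0)
rand_pages, rand_splits, rand_fill = simulate_inserts(rng.sample(range(10**6), 1600))
```

In this model the sequential run never splits a page and leaves every page full, while the random run splits pages repeatedly and ends with more, partly empty pages, which is the fragmentation described above.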

Therefore, whenever possible, use an auto-increment column as the primary key on InnoDB.
