Using Indexes for MySQL Query Optimization

Last Update:2013-11-25 Source: Internet

Author: User

Tags mysql query optimization


Indexes are the most important tool for speeding up queries. Other techniques are available, but the correct use of indexes generally makes the biggest difference in performance. On the MySQL mailing list, people often ask how to make a query run faster. In most cases, the first thing to suspect is that the table has no indexes, and adding one usually solves the problem immediately. It does not always work that way, because optimization is not always simple. However, if you do not use indexes, trying other ways to improve performance is in many cases a waste of time. Use indexes first to get the largest gain, and then see which other techniques help.

This section describes what an index is and how indexes improve query performance. It also discusses the circumstances in which indexes can degrade performance and gives guidelines for choosing indexes wisely. The next section discusses the MySQL query optimizer, which tries to find the most efficient way to execute a query. Some knowledge of the optimizer is a useful supplement to knowing how to create indexes, because it helps you make better use of the indexes you have created. Certain ways of writing a query actually prevent an index from being used, and in general you should avoid them.

Advantages of Indexes

Let's look at how indexes work, starting with a table that has none. A table without indexes is just an unordered collection of rows. The ad table shown in Figure 1 is an example. To find the rows for a specific company, you must examine every row in the table to see whether it matches the target value. This is a full table scan, which is slow and is especially inefficient when the table is large but contains only a few matching records.

Figure 1: non-indexed ad table

Figure 2 shows the same table with an index on the company_num column of the ad table. The index contains an entry for every row in the ad table, but the index entries are sorted by company_num value. Instead of searching the table row by row, we can now use the index. Suppose we are looking for all rows for company 13. We scan the index and find three entries for that company. Then we reach the index entry for company 14, which is larger than the value we are looking for. Index values are sorted, so when we read the entry containing 14, we know there can be no more matches and can stop searching. One benefit of the index, then, is that it tells us where the matching rows end, so that we can skip the rest. Another is that a positioning algorithm can find the first matching entry without a linear scan from the start of the index (for example, a binary search is much faster than a linear scan). That way we can go quickly to the first matching value and save a lot of search time. Databases use a variety of techniques to locate index values quickly, but the details do not matter here. What matters is that these techniques work and that indexing is a good thing.

Figure 2: Indexed ad table
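
The lookup described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how MySQL implements its indexes: the index is modeled as a sorted list of (company_num, row position) pairs, and the sample rows and the function name find_rows are made up for the example.

```python
from bisect import bisect_left

# Hypothetical ad table: unordered rows of (company_num, ad_num).
ad_rows = [(14, 48), (23, 49), (17, 52), (13, 55), (23, 62), (23, 63),
           (23, 64), (13, 77), (23, 99), (14, 101), (13, 102), (17, 119)]

# The index: one (company_num, row position) entry per row, sorted by company_num.
company_index = sorted((row[0], pos) for pos, row in enumerate(ad_rows))


def find_rows(company_num):
    """Return the matching rows and the number of index entries examined."""
    # Binary search jumps to the first entry whose key is >= company_num.
    i = bisect_left(company_index, (company_num, -1))
    matches, examined = [], 0
    while i < len(company_index):
        key, pos = company_index[i]
        examined += 1
        if key != company_num:
            break  # the index is sorted, so a larger key means no more matches
        matches.append(ad_rows[pos])
        i += 1
    return matches, examined


rows_13, examined_13 = find_rows(13)
print(rows_13, examined_13)
```

In this sketch the search for company 13 reads the three matching entries plus the first entry for company 14 and then stops, while a scan of the unindexed table would have to read all twelve rows.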

You may ask why we don't simply sort the data rows and do without the index. Wouldn't that give the same improvement in search speed? Yes, if the table had only one index. But you might want to add a second index, and you cannot sort the data rows in two different ways at once (for example, you might want one index on customer names and another on customer IDs or phone numbers). Keeping index entries separate from the data rows solves this problem and allows multiple indexes to be created. In addition, index entries are generally shorter than data rows. When you insert or delete values, it is easier to move shorter index entries around to maintain the sort order than to move longer data rows.
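
As a rough illustration of why the index entries are kept apart from the rows, the sketch below builds two independent indexes over the same unsorted rows. The customer data and the helper build_index are hypothetical, and a sorted Python list stands in for whatever structure a storage engine really uses.

```python
# Hypothetical customer rows, stored in arbitrary (insertion) order.
customers = [
    {"id": 3, "name": "Mori", "phone": "555-0101"},
    {"id": 1, "name": "Zhang", "phone": "555-0199"},
    {"id": 4, "name": "Abbot", "phone": "555-0142"},
    {"id": 2, "name": "Khan", "phone": "555-0117"},
]


def build_index(rows, column):
    """Return a sorted list of (column value, row position) pairs."""
    return sorted((row[column], pos) for pos, row in enumerate(rows))


# Two indexes over the same rows; the rows themselves are never reordered.
name_index = build_index(customers, "name")
id_index = build_index(customers, "id")

# Walking each index visits the same rows in a different order.
names_in_order = [customers[pos]["name"] for _, pos in name_index]
ids_in_order = [customers[pos]["id"] for _, pos in id_index]
print(names_in_order)
print(ids_in_order)
```

Each index holds only a key and a row position, which is also why its entries are shorter than the rows they point to.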

The details of index implementation vary among MySQL storage engines. For a MyISAM table, for example, the table's data rows are kept in a data file and the index values in an index file. A table may have more than one index, but all of them are stored in the same index file. Each index in the index file consists of a sorted array of key records that are used for fast access into the data file.

In contrast, the BDB and InnoDB storage engines do not separate data rows and index values in this way, although they also treat indexes as sets of sorted values. By default, the BDB engine uses a single file per table to store both data and index values. InnoDB uses a single tablespace within which it manages data and index storage for all InnoDB tables. InnoDB can be configured to create each table in its own tablespace, but even then, a table's data and indexes are stored in the same tablespace file.

The preceding discussion describes the benefit of an index for single-table queries, where avoiding a full table scan speeds up searches significantly. Indexes are even more valuable when you run queries involving joins on multiple tables. In a single-table query, the number of values you need to examine per column is the number of rows in the table. In a multiple-table query, the number of possible combinations can grow dramatically, because it is the product of the numbers of rows in the tables.

Assume that you have three unindexed tables, t1, t2, and t3. Each table contains the columns i1, i2, and i3 and has 1,000 rows holding the numbers 1 through 1000. A query to find all combinations of rows with matching values might look like this:

SELECT t1.i1, t2.i2, t3.i3
FROM t1, t2, t3
WHERE t1.i1 = t2.i2 AND t2.i2 = t3.i3;

The result of this query should be 1,000 rows, each containing three equal values. If the query is processed without indexes, there is no way to know which rows contain which values without scanning the tables completely. Consequently, every combination must be tried to find those that satisfy the WHERE clause. The number of possible combinations is 1,000 x 1,000 x 1,000 (one billion!), which is a million times the number of matching rows. That is a lot of wasted effort. The example also shows that, without indexes, the time needed to process a join grows much faster than the tables themselves, so performance deteriorates as the tables grow. Indexing the tables speeds things up considerably, because the indexes allow the query to be processed as follows:

1. Select the first row from table t1 and see what value the row contains.

2. Use the index on table t2 to go directly to the row that matches the value from t1. Similarly, use the index on table t3 to go directly to the row that matches the value from t2.

3. Proceed to the next row of table t1 and repeat the preceding procedure. Do this until all rows in t1 have been examined.

In this case, we still perform a full scan of table t1, but we can do indexed lookups on tables t2 and t3 to pull rows out of those tables directly. In principle, the query runs about a million times faster this way. This example is contrived to make a point, of course, but the problem it illustrates is real. Adding an index to a table that has none often results in a dramatic performance improvement.
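
The difference between the two strategies can be sketched by counting the work each one does. The Python below is a simplified model, not MySQL's join algorithm: each table is reduced to a single sorted list of values, a binary search stands in for an index lookup, and N is set to 100 rather than 1,000 so that the brute-force version finishes quickly. All function names are hypothetical.

```python
from bisect import bisect_left

N = 100  # smaller than the article's 1,000 so the triple loop stays fast

# Each table is modeled as one column holding the numbers 1 through N.
t1 = list(range(1, N + 1))
t2 = list(range(1, N + 1))
t3 = list(range(1, N + 1))


def join_without_index():
    """Try every combination of rows, as a join on unindexed tables must."""
    combos, result = 0, []
    for a in t1:
        for b in t2:
            for c in t3:
                combos += 1
                if a == b and b == c:
                    result.append((a, b, c))
    return result, combos


def index_lookup(sorted_values, value):
    """Binary search standing in for an index lookup; returns the match or None."""
    i = bisect_left(sorted_values, value)
    if i < len(sorted_values) and sorted_values[i] == value:
        return sorted_values[i]
    return None


def join_with_index():
    """Scan t1 once and do one lookup each on t2 and t3 per t1 row."""
    lookups, result = 0, []
    for a in t1:
        b = index_lookup(t2, a)
        lookups += 1
        if b is None:
            continue
        c = index_lookup(t3, b)
        lookups += 1
        if c is not None:
            result.append((a, b, c))
    return result, lookups


slow_result, combos = join_without_index()
fast_result, lookups = join_with_index()
print(len(slow_result), combos, lookups)
```

Both functions return the same N rows, but the first examines N x N x N combinations while the second performs only two lookups per row of t1. Each lookup is itself a short binary search, so the real saving is somewhat smaller than the raw counts suggest, but the gap still widens rapidly as N grows.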

MySQL uses indexes in several ways:

· As just described, indexes are used to speed up searches for rows that match the conditions of a WHERE clause or that match rows in other tables when performing joins.

· For queries that use the MIN() or MAX() functions, the smallest or largest value in an indexed column can be found quickly without examining every row.

· MySQL can often use indexes to perform the sorting and grouping required by ORDER BY and GROUP BY clauses quickly.

· Sometimes MySQL can use an index to read all the information a query needs. Suppose that you select values only from an indexed column of a MyISAM table and select no other columns from the table. In this case, the values MySQL reads from the index file are the same values it would get from the data file. There is no reason to read them twice, so the data file need not be consulted at all.
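
The last three uses in the list follow directly from the index being sorted, as the sketch below suggests. It is a toy model with hypothetical sample rows, again using a sorted list of (value, row position) pairs in place of a real index.

```python
# Hypothetical rows of (company_num, ad_text) in arbitrary order.
rows = [(17, "c"), (13, "a"), (23, "d"), (14, "b"), (13, "e")]

# Index on company_num: sorted (company_num, row position) pairs.
index = sorted((company, pos) for pos, (company, _) in enumerate(rows))

# MIN() and MAX(): the first and last index entries, no scan of the rows.
min_company = index[0][0]
max_company = index[-1][0]

# ORDER BY company_num: walk the index and fetch rows in that order.
ordered_rows = [rows[pos] for _, pos in index]

# Selecting only the indexed column: the index alone supplies the values,
# so the rows are never touched.
company_values = [key for key, _ in index]

print(min_company, max_company)
print(ordered_rows)
print(company_values)
```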

Costs of Indexes

In general, if MySQL can figure out how to use an index to process a query more quickly, it will. This means that, for the most part, if you do not index your tables, you hurt yourself. That is the rosy picture of indexing painted so far. Are there disadvantages? Yes: indexes have costs in both time and space. In practice, the benefits of indexes generally outweigh these costs, but you should know what the costs are.

First, indexes speed up retrieval but slow down inserts and deletes, as well as updates of values in indexed columns. That is, indexes slow down most write operations. This happens because writing a record requires writing not only the data row but also changes to every index. The more indexes a table has, the more changes need to be made, and the greater the average performance degradation. The "efficient loading of Data" section of this article looks at these effects in more detail and at what can be done about them.
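
A minimal sketch of that write overhead, assuming the same toy model of an index as a sorted list: every insert has to place a new entry in each index as well as store the row. The Table class and its index_writes counter are hypothetical names invented for the illustration.

```python
from bisect import insort


class Table:
    """Toy table that maintains one sorted index per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = []
        self.indexes = {col: [] for col in indexed_columns}
        self.index_writes = 0  # counts index entries written so far

    def insert(self, row):
        # Writing the data row itself.
        self.rows.append(row)
        pos = len(self.rows) - 1
        # Every index must also be updated, keeping its entries sorted.
        for col, entries in self.indexes.items():
            insort(entries, (row[col], pos))
            self.index_writes += 1


new_rows = [
    {"id": 2, "name": "Khan", "phone": "555-0117"},
    {"id": 1, "name": "Zhang", "phone": "555-0199"},
    {"id": 4, "name": "Abbot", "phone": "555-0142"},
    {"id": 3, "name": "Mori", "phone": "555-0101"},
]

unindexed = Table([])
heavily_indexed = Table(["id", "name", "phone"])
for row in new_rows:
    unindexed.insert(row)
    heavily_indexed.insert(row)

print(unindexed.index_writes, heavily_indexed.index_writes)
```

With three indexes, the four inserts cause twelve additional index writes, while the unindexed table incurs none.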

Second, indexes take up disk space, and multiple indexes take up correspondingly more. This may cause you to reach a table size limit more quickly than if there were no indexes:

· For a MyISAM table, indexing it heavily may cause the index file to reach its maximum size more quickly than the data file.

· A BDB table stores data and index values together in the same file, so adding indexes causes the table to reach the maximum file size more quickly.

· All tables located in the InnoDB shared tablespace compete for the same common pool of space, so adding indexes depletes storage within the tablespace more quickly. However, unlike the files used for MyISAM and BDB tables, the InnoDB shared tablespace is not bound by the operating system's file size limit, because it can be configured to use multiple files. As long as you have additional disk space, you can expand the tablespace by adding new components to it.

InnoDB tables that use individual tablespaces are constrained in the same way as BDB tables, because their data and index values are stored together in a single file.

The practical implication of these factors is this: if you do not need a particular index to help queries run faster, do not create it.

Choosing Indexes

Suppose you already know the syntax for creating indexes. The syntax does not tell you how a table should be indexed. That requires some thought about how the table will be used. This section gives guidelines for identifying candidate columns for indexing and for setting up indexes well:

Index the columns that you use for searching, sorting, or grouping, not the columns that you only display as output. In other words, the best candidate columns for indexing are those that appear in your WHERE clause, in join clauses, or in ORDER BY or GROUP BY clauses. Columns that appear only in the output column list following the SELECT keyword are not good candidates:

SELECT
col_a <- not a candidate column
FROM
tbl1 LEFT JOIN tbl2
ON tbl1.col_b = tbl2.col_c <- candidate columns
WHERE
col_d = expr; <- candidate column

Of course, the columns that you display may be the same ones that you use in the WHERE clause. The point is that appearing in the output list does not, by itself, make a column a good candidate for indexing.

Join clause

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
