Using Indexes for MySQL Query Optimization

Source: Internet
Author: User
Tags mysql query optimization

Indexes are the most important tool for speeding up queries. Other techniques are available, but in general the biggest performance difference comes from using indexes correctly. On the MySQL mailing list, people often ask how to make a query run faster. In most cases the first thing to suspect is that the table has no indexes, and adding one often solves the problem immediately. It is not always that simple, because optimization is not always simple. Still, if you do not use indexes, trying other ways to improve performance is in many cases a waste of time. Use indexes first to get the largest gain, and then see which other techniques help.

This section describes what an index is and how indexes improve query performance. It also discusses the circumstances under which indexes can degrade performance and gives guidelines for choosing indexes for your tables wisely. The next section discusses the MySQL query optimizer, which tries to find the most efficient way to execute a query. Some knowledge of the optimizer is a useful complement to knowing how to create indexes, because it helps you make better use of the indexes you have created. Certain ways of writing a query actually prevent indexes from being used, and in general you should avoid them.

Advantages of Indexes

Let's look at how indexes work, starting with a table that has none. A table without indexes is just an unordered collection of rows. The ad table shown in Figure 1 is an example. To find the rows for a particular company, you must examine each row in the table to see whether it matches the target value. This is a full table scan, which is slow, and it is especially inefficient when the table is large but contains only a few matching rows.

Figure 1: non-indexed ad table

Figure 2 shows the same table with an index added on the company_num column of the ad table. The index contains an entry for each row in the ad table, but the index entries are sorted by company_num value. Now, instead of examining the table row by row, we can use the index to find matching rows. Suppose we are looking for all rows for company 13. We begin scanning the index and find three entries for that company. Then we reach the index entry for company 14, which is larger than the value we are looking for. Because index values are sorted, once we read the entry containing 14 we know there can be no more matches and we can stop. So one benefit of the index is that we can tell where the matching rows end and skip the rest. Another is that a positioning algorithm can find the first matching entry without a linear scan from the start of the index (a binary search, for example, is much faster than a linear scan). That way we can go quickly to the first matching value and save a lot of search time. Databases use a variety of techniques to locate index values quickly, but the details do not matter here. The point is that it can be done, and that indexing is a good thing.

Figure 2: Indexed ad table
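
The lookup just described can be mimicked in a few lines of Python. This is only a sketch of the idea, not how MySQL stores or searches its indexes: the row values are made up for illustration, and the names ad_rows, index, full_scan, and index_lookup are hypothetical. The index is a list of (company_num, row position) pairs kept in sorted order, searched with the standard bisect module.

```python
from bisect import bisect_left

# Hypothetical rows of the ad table: (company_num, ad_num, hit_fee), in no order.
ad_rows = [
    (14, 48, 0.01), (23, 49, 0.02), (17, 52, 0.01), (13, 55, 0.03),
    (23, 62, 0.02), (23, 63, 0.01), (23, 64, 0.02), (13, 77, 0.03),
    (23, 99, 0.03), (14, 101, 0.01), (13, 102, 0.01), (17, 119, 0.02),
]

# The "index": (company_num, row position) pairs sorted by company_num.
index = sorted((row[0], pos) for pos, row in enumerate(ad_rows))


def full_scan(rows, company):
    """Check every row. Returns the matches and the number of rows examined."""
    matches = [row for row in rows if row[0] == company]
    return matches, len(rows)


def index_lookup(rows, idx, company):
    """Binary-search to the first entry, then stop at the first larger key."""
    # Row positions are never negative, so (company, -1) sorts before
    # every real entry for that company.
    i = bisect_left(idx, (company, -1))
    matches, examined = [], 0
    while i < len(idx):
        key, pos = idx[i]
        examined += 1
        if key != company:  # keys are sorted, so no later entry can match
            break
        matches.append(rows[pos])
        i += 1
    return matches, examined
```

With this sample data, index_lookup(ad_rows, index, 13) returns the same three rows as full_scan(ad_rows, 13) but examines 4 index entries (the three matches plus the first entry for company 14) instead of all 12 rows.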

You may ask why we don't simply sort the data rows and do without the index. Wouldn't that give the same kind of search speedup? Yes, if the table had only one index. But you might want to add a second index, and you cannot sort the data rows in two different ways at once (for example, you might want one index on customer name and another on customer ID or phone number). Keeping index entries separate from the data rows solves this problem and lets us create multiple indexes. In addition, index entries are generally shorter than data rows. When you insert or delete values, it is easier to move short index values around to maintain the sort order than to move long data rows.

The details of index implementation vary among MySQL storage engines. For a MyISAM table, for example, the table's data rows are kept in a data file and the index values in an index file. A table can have multiple indexes, but they are all stored in the same index file. Each index in the index file consists of a sorted array of key records that are used for fast access into the data file.

By contrast, the BDB and InnoDB storage engines do not separate data rows and index values in this way, although they also maintain indexes as sets of sorted values. By default, the BDB engine uses a single file to store both data and index values. InnoDB uses a single tablespace within which it manages data and index storage for all InnoDB tables. InnoDB can be configured to create each table in its own tablespace, but even then a table's data and indexes are stored in the same tablespace file.

The preceding discussion describes the benefit of an index for single-table queries, where it speeds up searches significantly by eliminating full table scans. Indexes are even more valuable for queries that join multiple tables. In a single-table query, the number of values you need to examine per column is the number of rows in the table. In a multiple-table query, the number of possible combinations skyrockets because it is the product of the numbers of rows in the tables.

Suppose you have three unindexed tables t1, t2, and t3, containing the columns i1, i2, and i3 respectively, and that each table consists of 1000 rows holding the numbers 1 through 1000. A query to find all combinations of rows in which the values are equal might look like this:

SELECT t1.i1, t2.i2, t3.i3
FROM t1, t2, t3
WHERE t1.i1 = t2.i2 AND t2.i2 = t3.i3;

The result of this query should be 1000 rows, each containing three equal values. If the query is processed without indexes, there is no way to know which rows contain which values without scanning all the tables, so every combination must be tried to find those that satisfy the WHERE clause. The number of possible combinations is 1000 x 1000 x 1000 (one billion), which is a million times the number of matches. That is a lot of wasted work. The example shows that, without indexes, the time to process a join grows far faster than the tables themselves as they get larger, and performance becomes very poor. Indexing the tables speeds things up dramatically, because the indexes allow the query to be processed as follows:

1. Select the first row from table t1 and see what value it contains.

2. Use the index on table t2 to go directly to the row that matches the value from t1. Similarly, use the index on table t3 to go directly to the row that matches the value from t2.

3. Proceed to the next row of table t1 and repeat the preceding procedure. Do this until all rows in t1 have been examined.

In this case we still perform a full scan of table t1, but we can use indexed lookups on t2 and t3 to pull rows directly from those tables. In principle, the query runs about a million times faster this way. The example is contrived to make a point, but the problems it illustrates are real. Adding indexes to a table that has none often produces dramatic performance gains.
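
The difference between the two strategies can be counted directly. The following Python sketch is an illustration under stated assumptions, not MySQL's join algorithm: it uses 20 rows per table instead of 1000 so the unindexed version finishes instantly, it stands in for each index with a dictionary (a hash lookup rather than a sorted B-tree), and the names N, build_index, join_without_indexes, and join_with_indexes are hypothetical.

```python
from collections import defaultdict
from itertools import product

N = 20  # hypothetical small size; the text uses 1000 rows per table

t1 = list(range(1, N + 1))  # values of t1.i1
t2 = list(range(1, N + 1))  # values of t2.i2
t3 = list(range(1, N + 1))  # values of t3.i3


def join_without_indexes():
    """Try every combination of rows, as an unindexed join must."""
    examined, result = 0, []
    for a, b, c in product(t1, t2, t3):
        examined += 1
        if a == b and b == c:
            result.append((a, b, c))
    return result, examined


def build_index(values):
    """Map each value to the list of rows holding it (stand-in for an index)."""
    idx = defaultdict(list)
    for v in values:
        idx[v].append(v)
    return idx


def join_with_indexes():
    """Scan t1 once and look up matching rows in t2 and t3 directly."""
    index_t2, index_t3 = build_index(t2), build_index(t3)
    examined, result = 0, []
    for a in t1:                          # step 1: next row of t1
        examined += 1
        for b in index_t2.get(a, []):     # step 2: matching rows in t2
            examined += 1
            for c in index_t3.get(b, []):  # ... and matching rows in t3
                examined += 1
                result.append((a, b, c))
    return result, examined
```

Both functions return the same 20 rows. The first examines 20 x 20 x 20 = 8000 combinations, while the second touches 60 rows (one from each table per result row). The gap widens as the cube of the table size versus a linear factor, which is where the million-fold figure for 1000-row tables comes from.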

MySQL uses indexes in several ways:

· As just described, indexes speed up searches for rows that match terms of a WHERE clause or rows that match rows in other tables when performing joins.

· For queries that use the MIN() or MAX() functions, the smallest or largest value in an indexed column can be found quickly without examining every row.

· MySQL can often use indexes to perform sorting and grouping quickly for ORDER BY and GROUP BY clauses.

· Sometimes MySQL can use an index to read all the information a query requires. Suppose you select values from an indexed column in a MyISAM table and select no other columns from the table. In this case, MySQL reads the values from the index file, and they are the same values it would get from the data file. There is no reason to read the values twice, so the data file need not be consulted at all.

Costs of Indexes

In general, if MySQL can figure out how to use an index to process a query more quickly, it will. This means that, for the most part, not indexing your tables hurts performance. That is the rosy picture of indexing I have painted so far. Are there disadvantages? Yes: indexes have costs in both time and space. In practice the benefits of indexes generally outweigh these drawbacks, but you should know what they are.

First, indexes speed up retrievals but slow down inserts and deletes, as well as updates of values in indexed columns. That is, indexes slow down most write operations. This happens because writing a row requires not only writing the data row but also changing every index on the table. The more indexes a table has, the more changes must be made, and the greater the average performance degradation. The "efficient loading of Data" section of this article covers this in more detail and describes what you can do about it.
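
A toy model makes the write cost visible. In this Python sketch (an illustration only, with the hypothetical names Table, insert, load, and index_writes), every indexed column gets its own sorted list, and each inserted row must be placed into every one of them.

```python
from bisect import insort


class Table:
    """Toy table: a list of rows plus one sorted list per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = []
        self.indexes = {col: [] for col in indexed_columns}
        self.index_writes = 0  # how many index entries have been written

    def insert(self, row):
        pos = len(self.rows)
        self.rows.append(row)  # the data row itself
        for col, entries in self.indexes.items():
            # Each index must also be updated and kept in sorted order.
            insort(entries, (row[col], pos))
            self.index_writes += 1


def load(table, n):
    """Insert n made-up rows and report the extra index writes required."""
    for i in range(n):
        table.insert({"id": i, "name": f"name{i % 7}", "zip": str(10000 + i)})
    return table.index_writes
```

Loading 50 rows costs no index writes with no indexes, 50 with one index, and 150 with three. The extra work per row grows in step with the number of indexes, which is the effect the paragraph above describes.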

Second, an index takes up disk space, and multiple indexes take up correspondingly more. This may cause you to reach a table size limit more quickly than if there were no indexes:

· For a MyISAM table, indexing it heavily may cause the index file to reach its maximum size more quickly than the data file.

· For a BDB table, which stores data and index values in the same file, adding indexes causes the table to reach the maximum file size more quickly.

· All tables allocated within the InnoDB shared tablespace compete for the same common pool of space, so adding indexes depletes storage within the tablespace more quickly. However, unlike the files used for MyISAM and BDB tables, the InnoDB shared tablespace is not bound by the operating system's file size limit, because it can be configured to use multiple files. As long as you have additional disk space, you can expand the tablespace by adding new components to it.

InnoDB tables that use individual tablespaces are constrained in the same way as BDB tables, because their data and index values are stored together in a single file.

The practical implication of these factors is that if you do not need a particular index to help queries run faster, do not create it.

Choosing Indexes

Suppose you already know the syntax for creating indexes. The syntax does not tell you how your tables should be indexed; that requires some thought about how the tables are used. This section gives guidelines for identifying candidate columns for indexing and for setting up indexes well:

Index columns that you use for searching, sorting, or grouping, not columns you only display as output. In other words, the best candidate columns for indexing are those that appear in the WHERE clause, in join clauses, or in ORDER BY or GROUP BY clauses. Columns that appear only in the output column list following the SELECT keyword are not good candidates:

SELECT
  col_a                          <- not a candidate
FROM
  tbl1 LEFT JOIN tbl2
  ON tbl1.col_b = tbl2.col_c     <- candidates
WHERE
  col_d = expr;                  <- a candidate

Of course, the columns that you display may be the same ones used in the WHERE clause. The point is that appearing in the output column list is not in itself a good reason to index a column.

Columns that appear in join clauses or in expressions of the form col1 = col2 in WHERE clauses are especially good candidates for indexing. col_b and col_c in the query just shown are examples. If MySQL can optimize a query using joined columns, it cuts down the potential table-row combinations considerably by eliminating full table scans.

Consider column cardinality. The cardinality of a column is the number of distinct values it contains. For example, a column containing the values 1, 3, 7, 4, 7, and 3 has a cardinality of 4. Indexes work best on columns whose cardinality is high relative to the number of rows in the table (that is, columns with many distinct values and few duplicates). If a column contains many different age values, an index will readily differentiate rows. For a column that records sex and contains only the two values "M" and "F", an index is of little use. If the values occur about equally often, you get about half of the rows whichever value you search for. Under these circumstances the index may never be used at all, because the query optimizer generally skips an index in favor of a full table scan when it determines that a value occurs in a large percentage of a table's rows. The conventional wisdom for this percentage used to be "30%". Nowadays the optimizer is more complex and takes other factors into account, so the percentage is not the sole determinant of whether MySQL prefers a scan over an index.
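
Cardinality is easy to compute for a sample of values. The Python sketch below uses the example values from the paragraph above plus a made-up two-valued column; the helper names cardinality, selectivity, and fraction_matching are hypothetical, and the arithmetic is only a rough guide, not the statistics the MySQL optimizer actually keeps.

```python
def cardinality(values):
    """Number of distinct values in the column."""
    return len(set(values))


def selectivity(values):
    """Distinct values per row: near 1.0 is ideal, near 0.0 is poor."""
    return cardinality(values) / len(values)


def fraction_matching(values, target):
    """Share of rows an equality search for `target` would return."""
    return sum(1 for v in values if v == target) / len(values)


sample = [1, 3, 7, 4, 7, 3]   # the example column from the text
sex = ["M", "F"] * 500        # hypothetical 1000-row two-valued column

print(cardinality(sample))            # 4
print(cardinality(sex))               # 2
print(fraction_matching(sex, "M"))    # 0.5
```

A search on the two-valued column returns half the table whichever value is requested, so reading the rows through an index saves little over scanning them all.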

Index short values. Use smaller data types when possible. For example, do not use a BIGINT column if MEDIUMINT is large enough to hold the values you need to store. Do not use CHAR(100) if none of your values is longer than 25 characters. Smaller values improve index processing in several ways:

· Shorter values can be compared more quickly, so index lookups are faster.

· Smaller values result in smaller indexes that require less disk I/O.

· With shorter key values, index blocks in the key cache hold more key values. MySQL can keep more keys in memory at once, which improves the likelihood of locating key values without reading additional index blocks from disk.

For storage engines such as InnoDB and BDB that use clustered indexes, it is especially beneficial to keep the primary key short. In a clustered index, data rows are stored together with the primary key values. Other indexes are secondary indexes, which store the primary key value together with the secondary index values. A lookup in a secondary index yields a primary key value, which is then used to locate the data row. The implication is that primary key values are duplicated in each secondary index, so if primary key values are long, each secondary index requires that much more space.

Index prefixes of string values. If you are indexing a string column, specify a prefix length whenever it is reasonable to do so. For example, if you have a CHAR(200) column, do not index the entire column if most values are unique within the first 10 or 20 characters. Indexing the first 10 or 20 characters saves a lot of space in the index and will probably make your queries faster as well. With shorter index values you gain the benefits of faster comparisons and reduced disk I/O. Use common sense, of course. Indexing only the first character of a column is unlikely to help much, because there will not be many distinct values in the index.

You can index prefixes of CHAR, VARCHAR, BINARY, VARBINARY, BLOB, and TEXT columns.
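
One way to judge a prefix length is to count how many distinct prefixes it leaves. The Python sketch below does this for a small made-up list of names; the helpers distinct_prefixes and shortest_full_prefix are hypothetical, and on a real table you would run an equivalent count against the actual column data.

```python
def distinct_prefixes(values, length):
    """Number of distinct values when only the first `length` characters are kept."""
    return len({v[:length] for v in values})


def shortest_full_prefix(values):
    """Smallest prefix length that keeps every distinct value distinct."""
    target = len(set(values))
    longest = max(len(v) for v in values)
    for length in range(1, longest + 1):
        if distinct_prefixes(values, length) == target:
            return length
    return longest


# Hypothetical sample of a name column.
names = ["Anderson", "Andrews", "Andersen", "Baker",
         "Bakerfield", "Barnes", "Barnett", "Carter"]

for length in (1, 3, 6, 7):
    print(length, distinct_prefixes(names, length))
```

For this sample, a one-character prefix leaves only 3 distinct values out of 8, a three-character prefix leaves 4, and a seven-character prefix already distinguishes all 8, even though the longest name has 10 characters.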

Take advantage of leftmost prefixes. When you create a composite index on several columns, you actually create multiple indexes that MySQL can use. A composite index can serve as several indexes because any leftmost set of columns in the index can be used to match rows. Such a set is called a "leftmost prefix" (this is different from indexing a prefix of a column, which uses the first few characters of the column as the index values).

Suppose you have a table with a composite index on columns named state, city, and zip. Rows in the index are sorted in state/city/zip order, so they are automatically sorted in state/city order and in state order as well. This means that MySQL can take advantage of the index even if you specify only state values in a query, or only state and city values. The index can therefore be used to search the following combinations of columns:

state, city, zip
state, city
state

MySQL cannot use the index for searches that do not involve a leftmost prefix. For example, if you search by city or by zip, the index is not used. If you search for a given state and a particular ZIP code (columns 1 and 3 of the index), the index cannot be used for the combination of values, although MySQL can narrow the search using the index to find rows that match the state.
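
The reason only leftmost prefixes work follows from the sort order, and a sorted list of tuples shows it. This Python sketch is an analogy, not MySQL's implementation; the sample rows and the names prefix_range and scan_by_city are hypothetical.

```python
from bisect import bisect_left

# Hypothetical (state, city, zip) rows.
rows = [
    ("TX", "Austin", "73301"), ("IL", "Chicago", "60601"),
    ("IL", "Springfield", "62701"), ("CA", "Fresno", "93650"),
    ("IL", "Chicago", "60614"), ("MO", "Springfield", "65801"),
    ("MN", "Austin", "55912"),
]

# Composite "index": entries sorted by state, then city, then zip.
index = sorted(rows)


def prefix_range(idx, prefix):
    """Entries whose leading columns equal `prefix`, found by binary search."""
    n = len(prefix)
    lo = bisect_left(idx, prefix)  # a shorter tuple sorts before its extensions
    hi = lo
    while hi < len(idx) and idx[hi][:n] == prefix:
        hi += 1
    return idx[lo:hi]


def scan_by_city(idx, city):
    """No leftmost prefix: matching entries are scattered, so check them all."""
    return [entry for entry in idx if entry[1] == city]


by_state = prefix_range(index, ("IL",))                 # uses the sort order
by_state_city = prefix_range(index, ("IL", "Chicago"))  # uses the sort order
by_city = scan_by_city(index, "Austin")                 # full scan of the index

# State plus zip: the index narrows the search to one state,
# then the zip values must be checked entry by entry.
by_state_zip = [e for e in prefix_range(index, ("IL",)) if e[2] == "62701"]
```

Entries for one state, or one state and city, sit next to each other in the sorted list, so a binary search finds them. Entries for the city "Austin" fall under both MN and TX with another state in between, so nothing short of checking every entry will find them all.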

Do not over-index. Do not index everything in sight on the assumption that "the more, the better." As mentioned earlier, every additional index takes extra disk space and hurts the performance of write operations. Indexes must be updated, and possibly reorganized, when you modify the contents of your tables. If an index is rarely or never used, it slows down table modifications for no benefit. In addition, MySQL considers indexes when generating an execution plan for retrievals, so creating extra indexes creates more work for the query optimizer. It is also possible (if unlikely) that MySQL will fail to choose the best index when there are too many. Maintaining only the indexes you need helps the query optimizer avoid such mistakes.

If you are thinking about adding an index to a table that is already indexed, consider whether the index you want to add is a leftmost prefix of an existing multiple-column index. If so, do not bother adding it, because in effect you already have it. (For example, if you already have an index on state, city, and zip, there is no point in adding an index on state.)

Match index types to the type of comparisons you perform. When you create an index, most storage engines choose the index implementation they will use. For example, InnoDB usually uses B-tree indexes. MyISAM also uses B-tree indexes, except that it uses R-tree indexes for spatial data types. However, the MEMORY storage engine supports both hash indexes and B-tree indexes and lets you select which one you want. To choose an index type, consider what kinds of comparison operations you plan to perform on the indexed column:

· For a hash index, a hash function is applied to each column value. The resulting hash values are stored in the index and used to perform lookups. A hash function implements an algorithm that is likely to produce distinct hash values for distinct input values. The advantage of hash values is that they can be compared more efficiently than the original values. Hash indexes are very fast for exact-match comparisons performed with the = or <=> operators. However, they perform poorly for comparisons against a range of values, as in these expressions:

id < 30
weight BETWEEN 100 AND 150

· B-tree indexes can be used effectively for exact or range-based comparisons that use the <, <=, =, >=, >, <>, !=, and BETWEEN operators. B-tree indexes can also be used for LIKE pattern matches if the pattern begins with a literal string rather than a wildcard character.

If you use a MEMORY table only for exact-value lookups, a hash index is a good choice. This is the default index type for MEMORY tables, so you need do nothing special. If you need to perform range-based comparisons on a MEMORY table, use a B-tree index instead. To specify this index type, add USING BTREE to the index definition. For example:

CREATE TABLE lookup
(
id INT NOT NULL,
name CHAR(20),
PRIMARY KEY USING BTREE (id)
) ENGINE = MEMORY;

If the kinds of statements you expect to execute warrant it, a single MEMORY table can have both hash indexes and B-tree indexes, even on the same column.
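
The trade-off between the two index types can be imitated with ordinary Python data structures. In this sketch, a dict plays the role of a hash index and a sorted list searched with bisect plays the role of a B-tree; both are stand-ins chosen for illustration, not the structures the MEMORY engine uses, and the sample ids and function names are hypothetical.

```python
from bisect import bisect_left

ids = [42, 7, 19, 88, 3, 61, 25, 30]  # hypothetical id column, in row order

hash_index = {value: pos for pos, value in enumerate(ids)}
sorted_index = sorted((value, pos) for pos, value in enumerate(ids))
sorted_keys = [value for value, _ in sorted_index]


def hash_exact(value):
    """Exact match: one hash lookup, no scanning."""
    return hash_index.get(value)


def hash_range_below(limit):
    """Range test: hash order is unrelated to value order, so check every key."""
    examined = len(hash_index)
    positions = sorted(pos for value, pos in hash_index.items() if value < limit)
    return positions, examined


def sorted_range_below(limit):
    """Range test: binary-search the cut-off and take everything before it."""
    end = bisect_left(sorted_keys, limit)
    positions = sorted(pos for _, pos in sorted_index[:end])
    return positions, end
```

For the condition id < 30, both range functions return row positions [1, 2, 4, 6], but the hash version has to look at all 8 keys while the sorted version reads only the 4 entries below the cut-off. For an exact match such as hash_exact(88), the hash lookup goes straight to the row.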

Some types of comparisons cannot use indexes. If you perform comparisons only by passing column values to a function such as STRCMP(), there is no value in indexing the column. The server must evaluate the function for each row, which precludes use of an index on the column.

Use the slow-query log to identify queries that may be performing badly. This log can help you find queries that might benefit from indexing. The log is written as a text file, so you can view it directly, or you can summarize its contents with the mysqldumpslow utility. If a given query shows up in the slow-query log over and over, that is a clue that the query may not be written optimally and that you may be able to rewrite it to run faster. Keep in mind when evaluating the slow-query log that "slow" is measured in real time, so more queries will appear in the log on a heavily loaded server than on a lightly loaded one.
