Abstract: This article explains how to tune a database system so that it processes queries as quickly as possible. It covers the role of indexes in query optimization and the principles behind it. Indexes are the most commonly used and most effective way to optimize queries. Creating appropriate indexes on a table, especially a large one, can greatly increase query speed. The article explains the various uses of MySQL indexes and gives examples.
Use of MySQL Indexes
We discuss indexes first because they are the most important tool for speeding up queries. Other techniques exist, but the most effective one is the proper use of indexes. On the MySQL mailing list, people often ask how to make a query run faster. In many cases the table has no index, and adding one solves the problem immediately. This does not always work, because optimization is not always that simple. Still, if you do not use indexes, trying to improve performance by other means is in many cases a waste of time. Use indexes first to get the largest gain, and then look at other techniques that may help.
This section describes what an index is, how it improves query performance, when it may degrade performance, and how to choose indexes for a table. The next section discusses the MySQL query optimizer. Besides knowing how to create indexes, it helps to know something about the optimizer, because you can then make better use of the indexes you create. Some ways of writing a query actually prevent indexes from being used, and these should generally be avoided. (Not always, though. Sometimes you will want to override the optimizer, and we cover those cases as well.)
Impact of indexes on queries of a single table
An index is used to quickly find rows with a specific value in a column. Without an index, MySQL has to start with the first record and read through the whole table until it finds the relevant rows. The larger the table, the longer this takes. If the table has an index on the queried columns, MySQL can quickly find a position to seek to in the data file without looking at all the data. If a table has 1000 rows, this is at least 100 times faster than reading sequentially. Note, however, that if you need to access almost all 1000 rows, reading sequentially is faster, because it avoids disk seeks.
For example, for the following student table:
mysql> SELECT * FROM student;
+----+---------+---------+---------+---------+
| id | name    | english | chinese | history |
+----+---------+---------+---------+---------+
| 12 | Tom     |      66 |      93 |      67 |
| 56 | Paul    |      78 |      52 |      75 |
| 10 | Marry   |      54 |      89 |      74 |
|  4 | Tina    |      99 |      83 |      48 |
| 39 | William |      43 |      96 |      52 |
| 74 | Stone   |      42 |      40 |      61 |
| 86 | Smith   |      49 |      85 |      78 |
| 37 | Black   |      49 |      63 |      47 |
| 89 | White   |      94 |      31 |      52 |
+----+---------+---------+---------+---------+
Because this table has no index, any query with a condition on it has to scan the whole table, which is slow. For example, to find all students who failed English (a score below 60):
mysql> SELECT name, english FROM student WHERE english < 60;
+---------+---------+
| name    | english |
+---------+---------+
| Marry   |      54 |
| William |      43 |
| Stone   |      42 |
| Smith   |      49 |
| Black   |      49 |
+---------+---------+
The WHERE condition has to be checked against every record. For a table this small the cost hardly matters. A large table is another story: a very large school may store many thousands of records, and a retrieval like this one then takes a considerable amount of time.
If we create an index for the english column:
mysql> ALTER TABLE student ADD INDEX (english);
+-------------------+
| Index for english |
+-------------------+
|                42 |
|                43 |
|                49 |
|                49 |
|                54 |
|                66 |
|                78 |
|                94 |
|                99 |
+-------------------+
As the preceding table shows, the index is stored in the index file and contains the english value of every row in the table, but sorted by english. MySQL no longer has to check the table row by row against the condition; it can search the index instead. To find all rows with a score below 60, it scans the index from the start and obtains five rows. It then reaches the entry with the value 66, which belongs to Tom's record and is larger than the values we are looking for. Because index values are sorted, once we read Tom's entry we know that no further matches exist and can stop. If the values we are looking for do not begin until somewhere in the middle of the index, a positioning algorithm such as binary search can find the first matching entry without scanning sequentially from the start. This locates the first match quickly and saves a great deal of search time. Databases use a variety of techniques to locate index values quickly. The details are not important here. What matters is that they work and that indexing is a good thing.
Now the same query is executed again:
mysql> SELECT name, english FROM student WHERE english < 60;
The result is:
+---------+---------+
| name    | english |
+---------+---------+
| Stone   |      42 |
| William |      43 |
| Smith   |      49 |
| Black   |      49 |
| Marry   |      54 |
+---------+---------+
Note that the rows come back in a different order than before the english column was indexed. They are now sorted by english, for exactly the reason described above: the rows are read through the sorted index. (Without an ORDER BY clause this ordering is a side effect of the index scan, not a guarantee.)
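The difference between the two result orders can be reproduced with a small model. The following Python sketch is an illustration only: it uses a sorted list in place of MySQL's real index structure, and the function names (`full_scan`, `build_index`, `index_scan`) are invented for this example.

```python
from bisect import bisect_left

# Rows of the student table from the example: (id, name, english).
STUDENTS = [
    (12, "Tom", 66), (56, "Paul", 78), (10, "Marry", 54),
    (4, "Tina", 99), (39, "William", 43), (74, "Stone", 42),
    (86, "Smith", 49), (37, "Black", 49), (89, "White", 94),
]

def full_scan(rows, limit):
    """Check every row against the condition english < limit."""
    return [(name, eng) for _, name, eng in rows if eng < limit]

def build_index(rows):
    """Toy index: (english, row position) pairs sorted by english."""
    return sorted((eng, pos) for pos, (_, _, eng) in enumerate(rows))

def index_scan(rows, index, limit):
    """Binary-search the cut-off, then read only the matching entries."""
    stop = bisect_left(index, (limit,))  # first entry with english >= limit
    return [(rows[pos][1], eng) for eng, pos in index[:stop]]

INDEX = build_index(STUDENTS)
print(full_scan(STUDENTS, 60))          # rows in table order
print(index_scan(STUDENTS, INDEX, 60))  # rows in index (sorted) order
```

Both functions return the same five students. The full scan returns them in table order, while the index scan returns them sorted by score and never looks at the entries from 66 upward.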
Impact of indexes on queries of multiple tables
The preceding discussion described the benefits of indexes for single-table queries, where an index eliminates the full table scan and speeds up the search considerably. Indexes are even more valuable for joins involving several tables. In a single-table query, the number of values to examine per column is the number of rows in the table. In a multiple-table query, the number of possible combinations can be huge, because it is the product of the numbers of rows in the tables.
Assume three unindexed tables t1, t2, and t3, each containing a single column (c1, c2, and c3 respectively) and each consisting of 1000 rows that hold the numbers 1 to 1000. A query to find all combinations of table rows in which the values are equal looks like this:
mysql> SELECT t1.c1, t2.c2, t3.c3 FROM t1, t2, t3 WHERE t1.c1 = t2.c2 AND t1.c1 = t3.c3;
The result of this query should be 1000 rows, each containing three equal values. If we process the query without indexes, there is no way of knowing which rows contain which values, so every combination must be tried to find those that match the WHERE clause. The number of possible combinations is 1000 × 1000 × 1000 (one billion), which is one million times the number of matches. That is a lot of wasted work, and the query will be very slow, even in a database as fast as MySQL. And this is with only 1000 rows per table. What if each table had one million rows? Performance would obviously be extremely poor. If each table is indexed, the query can be sped up greatly, because with indexes it is processed as follows:
1) Select the first row from table t1 and see what value it contains.
2) Use the index on table t2 to go directly to the row in t2 that matches the value from t1. Similarly, use the index on table t3 to go directly to the row in t3 that matches the value from t1.
3) Move to the next row of table t1 and repeat the preceding step until all rows in t1 have been examined.
In this case we still perform a full scan of table t1, but we can use index lookups on t2 and t3 to pull out the rows from those tables directly. In principle, the query runs about one million times faster than it would without indexes.
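The amount of work in the two strategies can be compared with a small model. This Python sketch is a rough illustration under stated assumptions: a dictionary stands in for the index (MySQL's own index structure is different), the tables hold 20 rows instead of 1000 so that the brute-force version finishes quickly, and all function names are made up for the example.

```python
from itertools import product

def join_without_index(t1, t2, t3):
    """Examine every combination of rows and count how many were examined."""
    examined = 0
    matches = []
    for a, b, c in product(t1, t2, t3):
        examined += 1
        if a == b and a == c:
            matches.append((a, b, c))
    return matches, examined

def build_index(table):
    """Toy index: value -> list of row positions holding that value."""
    index = {}
    for pos, value in enumerate(table):
        index.setdefault(value, []).append(pos)
    return index

def join_with_index(t1, t2, t3):
    """Scan t1 fully, but go straight to the matching rows in t2 and t3."""
    idx2, idx3 = build_index(t2), build_index(t3)
    lookups = 0
    matches = []
    for a in t1:
        lookups += 2  # one index lookup in t2, one in t3
        for p2 in idx2.get(a, []):
            for p3 in idx3.get(a, []):
                matches.append((a, t2[p2], t3[p3]))
    return matches, lookups

N = 20  # the article uses 1000; kept small for the brute-force version
t1 = t2 = t3 = list(range(1, N + 1))

slow_matches, examined = join_without_index(t1, t2, t3)
fast_matches, lookups = join_with_index(t1, t2, t3)
print(len(slow_matches), examined)  # N matches after N**3 combinations
print(len(fast_matches), lookups)   # N matches after 2*N index lookups
```

The gap grows with table size: the first count is the product of the table sizes, while the second grows only with the size of t1.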
To summarize, MySQL uses indexes to speed up the search for rows that match the conditions in a WHERE clause, and to speed up the search for rows that match rows in other tables when performing a join.
Impact of Multi-column indexes on queries
Assume that you issue the following SELECT statement:
mysql> SELECT * FROM tbl_name WHERE col1 = val1 AND col2 = val2;
If a multi-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimizer tries to find the more restrictive index by deciding which one will match fewer rows, and uses that index to fetch the rows.
You can create a multi-column index as follows:
mysql> ALTER TABLE tbl_name ADD INDEX (col1, col2);
Separate single-column indexes, by contrast, are created like this:
mysql> ALTER TABLE tbl_name ADD INDEX (col1);
mysql> ALTER TABLE tbl_name ADD INDEX (col2);
If a table has a multi-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).
MySQL cannot use the index if the columns do not form a leftmost prefix of it. Consider the SELECT statements shown below:
mysql> SELECT * FROM tbl_name WHERE col1=val1;
mysql> SELECT * FROM tbl_name WHERE col2=val2;
mysql> SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;
If an index exists on (col1, col2, col3), only the first query shown above uses it. The second and third queries do involve indexed columns, but (col2) and (col2, col3) are not leftmost prefixes of (col1, col2, col3).
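The leftmost-prefix rule for equality conditions can be written down as a few lines of Python. This is a simplified sketch of the rule as stated above, not of the MySQL optimizer itself, and the helper name `usable_prefix` is hypothetical.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that the query constrains with '='.

    An empty result means the query cannot use the index for lookups.
    """
    prefix = []
    for column in index_columns:
        if column not in equality_columns:
            break  # the prefix ends at the first unconstrained column
        prefix.append(column)
    return prefix

INDEX_COLUMNS = ("col1", "col2", "col3")

print(usable_prefix(INDEX_COLUMNS, {"col1"}))          # first query
print(usable_prefix(INDEX_COLUMNS, {"col2"}))          # second query
print(usable_prefix(INDEX_COLUMNS, {"col2", "col3"}))  # third query
```

Only the first call returns a non-empty prefix, which matches the three SELECT statements above.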
If the LIKE parameter is a constant string that does not start with a wildcard character, MySQL also uses indexes for LIKE comparison. For example, the following SELECT statement uses an index:
mysql> SELECT * FROM tbl_name WHERE key_col LIKE "Patrick%";
mysql> SELECT * FROM tbl_name WHERE key_col LIKE "Pat%_ck%";
In the first statement, only the rows with "Patrick" <= key_col < "Patricl" are considered. In the second statement, only the rows with "Pat" <= key_col < "Pau" are considered.
The following SELECT statements do not use indexes:
mysql> SELECT * FROM tbl_name WHERE key_col LIKE "%Patrick%";
mysql> SELECT * FROM tbl_name WHERE key_col LIKE other_col;
In the first statement, the LIKE value starts with a wildcard character. In the second statement, the LIKE value is not a constant.
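The range that a constant LIKE pattern turns into can be derived mechanically. The Python sketch below is a simplification: it ignores escape characters and collation rules, it does not handle a prefix ending in the highest possible character, and the function name `like_prefix_range` is invented for this example.

```python
def like_prefix_range(pattern):
    """Return (low, high) such that matching values satisfy low <= value < high.

    Returns None when the pattern starts with a wildcard, in which case
    no range can be derived and an index on the column does not help.
    """
    prefix = ""
    for ch in pattern:
        if ch in "%_":  # LIKE wildcards end the literal prefix
            break
        prefix += ch
    if not prefix:
        return None
    # The upper bound is the prefix with its last character incremented.
    high = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, high

print(like_prefix_range("Patrick%"))
print(like_prefix_range("Pat%_ck%"))
print(like_prefix_range("%Patrick%"))
```

The first two calls give the ranges quoted above, and the third returns None. The case LIKE other_col is not modeled here, because the pattern is not known until each row is read.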
A search using column_name IS NULL also uses an index if column_name is indexed.
MySQL normally uses the index that finds the fewest rows. An index is used for columns that you compare with the following operators: =, >, >=, <, <=, BETWEEN, and LIKE with a non-wildcard prefix such as 'something%'.
An index that does not span all AND levels of the WHERE clause is not used to optimize the query. In other words, for an index to be usable, a prefix of the index must appear in every AND group.
The following WHERE clauses use indexes:
... WHERE index_part1 = 1 AND index_part2 = 2
... WHERE index = 1 OR a = 10 AND index = 2        /* index = 1 OR index = 2 */
... WHERE index_part1 = 'hello' AND index_part_3 = 5
          /* optimized like "index_part1 = 'hello'" */
These WHERE clauses do not use indexes:
... WHERE index_part2 = 1 AND index_part3 = 2      /* index_part_1 is not used */
... WHERE index = 1 OR a = 10                      /* No index */
... WHERE index_part1 = 1 OR index_part2 = 10      /* No index spans all rows */
Functions of MySQL Indexes
All MySQL indexes (PRIMARY, UNIQUE, and INDEX) are stored in B-trees. Strings are automatically prefix- and end-space compressed. See the CREATE INDEX syntax.
Indexes are used to:
Quickly find the rows that match a WHERE clause.
Retrieve rows from other tables when performing joins across multiple tables.
Find the MAX() or MIN() value for a specific indexed column.
Sort or group a table if the sorting or grouping is done on a leftmost prefix of a usable index (for example, ORDER BY key_part_1, key_part_2). The index is read in reverse order if all key parts are followed by DESC.
In some cases, retrieve values without consulting the data file. If all columns used from a table are numeric and form a leftmost prefix of some index, the values can be retrieved from the index tree for greater speed.
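Several of these uses follow directly from the index being sorted. The Python sketch below is a toy model with a sorted list standing in for the index tree. It shows how MIN(), MAX(), an ordered read, and a count can be answered from the index entries alone, without touching the data rows. The function names are made up for the illustration.

```python
from bisect import bisect_left

# Toy index on the english column: sorted (english, row position) entries.
ENGLISH_INDEX = sorted([(66, 0), (78, 1), (54, 2), (99, 3), (43, 4),
                        (42, 5), (49, 6), (49, 7), (94, 8)])

def index_min(index):
    """MIN(english): the first entry of a sorted index."""
    return index[0][0]

def index_max(index):
    """MAX(english): the last entry of a sorted index."""
    return index[-1][0]

def ordered_values(index, descending=False):
    """ORDER BY english [DESC]: read the index forward or in reverse."""
    entries = reversed(index) if descending else index
    return [value for value, _ in entries]

def count_below(index, limit):
    """COUNT(*) WHERE english < limit, answered from the index alone."""
    return bisect_left(index, (limit,))

print(index_min(ENGLISH_INDEX), index_max(ENGLISH_INDEX))
print(ordered_values(ENGLISH_INDEX, descending=True))
print(count_below(ENGLISH_INDEX, 60))
```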
Disadvantages of Indexing
In general, if MySQL can work out how to use an index to process a query faster, it will do so. This means that, for the most part, if you do not index your tables you are hurting yourself. So far this article has painted a rosy picture of indexing. Are there any disadvantages? Yes, there are. In practice these drawbacks are usually outweighed by the advantages, but you should know what they are.
First, index files take up disk space. If you have many indexes, the index file may reach the maximum file size sooner than the data file. Second, indexes speed up retrieval but slow down inserts and deletes, as well as updates to values in indexed columns (that is, most write operations), because a write involves not only the data row but usually the indexes as well. The more indexes a table has, the greater the average slowdown for write operations. Section 8.4.4, on data loading and modification speed, covers these performance issues in more detail and discusses how to deal with them.
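The write overhead can be made concrete with a toy table that counts how many structures each insert touches. This Python sketch is only a model under simple assumptions: every index is a sorted list, each update counts as one write, and the class and function names are hypothetical.

```python
from bisect import insort

class Table:
    """Toy table that counts how many structures each insert has to touch."""

    def __init__(self, indexed_columns):
        self.rows = []
        self.indexes = {column: [] for column in indexed_columns}
        self.writes = 0

    def insert(self, row):
        self.rows.append(row)
        self.writes += 1  # the data row itself
        position = len(self.rows) - 1
        for column, index in self.indexes.items():
            insort(index, (row[column], position))  # keep the index sorted
            self.writes += 1  # one extra write per index

def writes_for(indexed_columns, rows):
    """Load rows into a fresh table and return the total number of writes."""
    table = Table(indexed_columns)
    for row in rows:
        table.insert(row)
    return table.writes

ROWS = [
    {"name": "Tom", "english": 66, "chinese": 93, "history": 67},
    {"name": "Paul", "english": 78, "chinese": 52, "history": 75},
    {"name": "Marry", "english": 54, "chinese": 89, "history": 74},
]
print(writes_for([], ROWS))                                 # no indexes
print(writes_for(["english", "chinese", "history"], ROWS))  # three indexes
```

With three indexes, loading the same three rows takes four times as many writes as loading them into an unindexed table.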
Index selection criteria
The syntax for creating an index is described in section 4.5, on index attributes, and we assume here that you have read it. However, knowing the syntax does not tell you how a table should be indexed. To decide that, you have to consider how the table is used. This section gives guidelines for identifying and choosing the columns to index:
1. Index the columns you search on, not necessarily the columns you select
In other words, the best candidates for indexing are the columns that appear in the WHERE clause or in a join clause, not the columns in the selection list after the SELECT keyword. For example:
SELECT col_a                          /* not a candidate for indexing */
FROM tbl1 LEFT JOIN tbl2
  ON tbl1.col_b = tbl2.col_c          /* candidates for indexing */
WHERE col_d = expr;                   /* a candidate for indexing */
Of course, the selected columns and the columns used in the WHERE clause may be the same. The point is that appearing in the selection list is not, by itself, a good reason to index a column.
Columns that appear in join clauses or in expressions of the form col1 = col2 are especially good candidates for indexing. col_b and col_c in the query just shown are examples. If MySQL can optimize a query using the joined columns, it substantially cuts down the number of table-row combinations by eliminating full table scans.
2. Use a unique index
Consider how the values in a column are distributed. Indexes work best on columns with unique values and worst on columns with many duplicate values. For example, a column holding many different age values distinguishes rows well. A column that records gender contains only "M" and "F", so indexing it is of little use: whichever value you search for, you still get about half of the rows.
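One rough way to compare candidate columns is the fraction of distinct values in each. The Python sketch below uses made-up sample data and a hypothetical helper named `selectivity`. It illustrates the idea and is not a MySQL feature.

```python
def selectivity(values):
    """Fraction of distinct values: 1.0 means every value is unique."""
    return len(set(values)) / len(values)

# Hypothetical sample columns for eight rows.
ages = [18, 19, 20, 21, 22, 23, 24, 25]
genders = ["M", "F", "M", "F", "M", "F", "M", "F"]

print(selectivity(ages))     # every value distinct
print(selectivity(genders))  # only two distinct values
```

The closer the value is to 1.0, the fewer rows a lookup on that column has to return on average, and the more an index is likely to help.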
3. Use short indexes
If you index a string column, specify a prefix length whenever it is reasonable to do so. For example, if you have a CHAR(200) column and most values are unique within the first 10 or 20 characters, do not index the entire column. Indexing the first 10 or 20 characters saves a lot of index space and probably makes queries faster. A smaller index involves less disk I/O, and shorter values compare faster. More importantly, with shorter key values, blocks in the index cache hold more key values, so MySQL can keep more keys in memory at once. This improves the chance of locating rows without reading additional index blocks from disk. (Use some common sense, of course. Indexing only the first character of a column is unlikely to help much, because the index will not contain many distinct values.)
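A reasonable prefix length can be estimated from the data. The Python sketch below uses a hypothetical helper named `shortest_unique_prefix`. It finds the smallest number of leading characters that still keeps all distinct values apart, and runs it on the names from the student table.

```python
def shortest_unique_prefix(values):
    """Smallest n such that the first n characters keep distinct values distinct."""
    distinct = set(values)
    longest = max(len(value) for value in distinct)
    for n in range(1, longest + 1):
        if len({value[:n] for value in distinct}) == len(distinct):
            return n
    return longest

NAMES = ["Tom", "Paul", "Marry", "Tina", "William",
         "Stone", "Smith", "Black", "White"]
print(shortest_unique_prefix(NAMES))
```

On real data you would usually accept a prefix that keeps most values distinct rather than all of them. In MySQL the chosen length is given in parentheses after the column name, as in ADD INDEX (name(10)).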
4. Use the leftmost prefix
When you create an index on n columns, you actually create n indexes that MySQL can use. A multi-column index serves as several indexes because any leftmost set of columns in the index can be used to match rows. Such a set is called a leftmost prefix. (This is different from indexing a prefix of a column, which uses the first n characters of the column as the index value.)
Assume that a table has an index on three columns named state, city, and zip. Entries in the index are sorted in state/city/zip order, so they are automatically sorted in state/city order and in state order as well. This means MySQL can use the index even if a query specifies only state values, or only state and city values. The index can therefore be used for searches on the following column combinations:
state, city, zip
state, city
state
MySQL cannot use the index for searches that do not involve a leftmost prefix. For example, a search by city alone or by zip alone cannot use this index. If you search for a given state and a particular zip code (columns 1 and 3 of the index), the index cannot be used for the combination of values. However, MySQL can still use it to find the rows that match the state and so narrow the search.
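The state/city/zip behavior can be modeled with a sorted list of tuples. In the Python sketch below, the rows are invented sample data and the function names are hypothetical. The `keys` list is rebuilt on every call only to keep the example short, whereas a real index is already stored in key order.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index entries, kept sorted in (state, city, zip) order.
INDEX = sorted([
    ("CA", "Fresno", "93650"), ("CA", "Oakland", "94601"),
    ("NY", "Albany", "12201"), ("NY", "Buffalo", "14201"),
    ("TX", "Austin", "73301"),
])

def prefix_range(index, prefix):
    """Binary-search the entries whose leading columns equal prefix."""
    n = len(prefix)
    keys = [entry[:n] for entry in index]  # leading columns of each entry
    return index[bisect_left(keys, prefix):bisect_right(keys, prefix)]

def search_state_zip(index, state, zip_code):
    """state + zip: the index narrows by state, then zip is checked per row."""
    candidates = prefix_range(index, (state,))
    return [entry for entry in candidates if entry[2] == zip_code]

print(prefix_range(INDEX, ("NY",)))           # leftmost prefix: state
print(prefix_range(INDEX, ("CA", "Oakland"))) # leftmost prefix: state, city
print(search_state_zip(INDEX, "NY", "14201")) # state via index, zip filtered
```

A search by city alone has no such shortcut, because the entries for one city are scattered across the states, so every entry would have to be examined.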
5. Do not over-index
Do not assume that more indexes are always better and index everything in sight. As discussed earlier, every additional index takes extra disk space and hurts the performance of write operations. Indexes must be updated, and possibly reorganized, when the contents of the table change, and the more indexes there are, the longer this takes. An index that is rarely or never used slows down table modifications unnecessarily. In addition, MySQL has to consider the indexes when it generates an execution plan, which takes time. Redundant indexes give the query optimizer more work, and with too many indexes MySQL may fail to choose the best one. Maintaining only the indexes you need helps query optimization.
Before adding an index to a table that is already indexed, check whether the new index would be a leftmost prefix of an existing multi-column index. If so, do not bother adding it, because in effect it already exists.
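The redundancy check described above is easy to express in code. This Python sketch uses a hypothetical helper named `is_redundant` and represents each index as a tuple of column names.

```python
def is_redundant(new_index, existing_indexes):
    """True if new_index is a leftmost prefix of some existing index."""
    new_index = tuple(new_index)
    return any(tuple(existing[:len(new_index)]) == new_index
               for existing in existing_indexes)

EXISTING = [("state", "city", "zip")]

print(is_redundant(("state",), EXISTING))         # covered by the existing index
print(is_redundant(("state", "city"), EXISTING))  # covered by the existing index
print(is_redundant(("city",), EXISTING))          # not a leftmost prefix
```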
6. Consider the types of comparison performed on a column
Indexes are used for the <, <=, =, >=, >, and BETWEEN operations. They are also used for LIKE operations when the pattern has a literal prefix. If you use a column only for other kinds of operations, such as STRCMP(), there is no value in indexing it.
Summary
This section described the role of indexes in optimizing queries, including the principles by which indexes speed up queries, the benefits of indexes in various situations, and their disadvantages: they take up more storage space and slow down data loading and modification.
Indexes are the most commonly used and most effective way to optimize queries. Creating appropriate indexes on a table, especially a large one, can greatly increase query speed.