Using Indexes to Optimize MySQL Queries
The purpose of an index is to make queries more efficient. An index can be compared to a dictionary. To look up the word "mysql", you first locate the letter m, then find the letter y under it, and then the remaining letters "sql". Without an index, you might have to read every word to find the one you want.
1. Advantages of Indexes
Suppose you have three unindexed tables, t1, t2, and t3, containing the columns i1, i2, and i3, respectively. Each table has 1000 rows holding the values 1 through 1000. A query that finds combinations of rows in which the values are equal might look like this:
SELECT t1.i1, t2.i2, t3.i3
FROM t1, t2, t3
WHERE t1.i1 = t2.i2 AND t2.i2 = t3.i3;
The result should have 1000 rows, each containing three equal values. If the query is processed without indexes, there is no way to know which rows contain which values without scanning all of the tables, so every combination must be tried to find the rows that satisfy the WHERE clause. The number of possible combinations is 1000 x 1000 x 1000 (1 billion!), which is 1 million times the number of matching rows. That is a lot of wasted work. The example also shows that without indexes, the time needed to process a join grows much faster than the tables themselves, so performance degrades badly as the tables get larger. Indexing the tables speeds things up dramatically, because the indexes allow the query to be processed like this:
1. Select the first row from table t1 and look at its value.
2. Use the index on table t2 to go directly to the row that matches the value from t1. Similarly, use the index on table t3 to go directly to the row that matches the value from t2.
3. Move on to the next row of table t1 and repeat the procedure, continuing until all rows in t1 have been examined.
In this case, a full scan of table t1 is still performed, but indexed lookups into t2 and t3 retrieve rows from those tables directly. In theory, running the query this way is about 1 million times faster. Of course, this example is contrived to make a point, but the problem it illustrates is real: adding indexes to a table that has none often yields a dramatic performance improvement.
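To make the comparison concrete, here is a small Python sketch of the indexed procedure. The "index" is just a dictionary from value to row positions, which is an illustrative stand-in and not how MySQL stores indexes; build_index and indexed_join are hypothetical helper names. The sketch counts index lookups and compares them with the number of combinations a brute-force join would have to examine.

```python
from collections import defaultdict

N = 1000  # rows per table, as in the example
t1 = list(range(1, N + 1))  # values of t1.i1
t2 = list(range(1, N + 1))  # values of t2.i2
t3 = list(range(1, N + 1))  # values of t3.i3

def build_index(values):
    """Toy index: map each value to the row positions that hold it."""
    index = defaultdict(list)
    for pos, value in enumerate(values):
        index[value].append(pos)
    return index

def indexed_join(t1, t2, t3):
    """Full scan of t1; indexed lookups into t2 and t3 (a nested-loop join)."""
    idx2, idx3 = build_index(t2), build_index(t3)
    rows, lookups = [], 0
    for v1 in t1:                        # steps 1 and 3: walk every row of t1
        lookups += 1
        for p2 in idx2.get(v1, []):      # step 2: jump straight to t2 matches
            v2 = t2[p2]
            lookups += 1
            for p3 in idx3.get(v2, []):  # step 2: jump straight to t3 matches
                rows.append((v1, v2, t3[p3]))
    return rows, lookups

rows, lookups = indexed_join(t1, t2, t3)
brute_force = len(t1) * len(t2) * len(t3)  # combinations tried without indexes
print(len(rows), lookups, brute_force)     # 1000 2000 1000000000
```

The indexed version performs 2,000 lookups instead of examining 1 billion combinations; the real savings in MySQL depend on the index structure and the data distribution.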
2. Costs of Indexes
First, indexes speed up retrievals but slow down inserts and deletes, as well as updates of values in indexed columns. In other words, indexes slow down most write operations. This happens because writing a record requires writing not only the data row but also changes to every index. The more indexes a table has, the more changes must be made, and the greater the average performance penalty. In the "loading data efficiently" section of this article, we will look at these effects in more detail and see how to address them.
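A toy model makes this write cost visible. In the hypothetical sketch below (ToyTable and writes_per_insert are illustrative names), each index is a sorted Python list of (value, row_id) pairs kept in order with bisect.insort, and the code counts how many structures a single insert has to modify. Real storage engines have very different costs per structure; only the "one more index, one more write" relationship is the point.

```python
import bisect

class ToyTable:
    """Toy table: a row list plus one sorted list of (value, row_id) per index."""

    def __init__(self, indexed_columns):
        self.rows = []
        self.indexes = {col: [] for col in indexed_columns}
        self.writes = 0  # structures modified so far

    def insert(self, row):
        row_id = len(self.rows)
        self.rows.append(row)
        self.writes += 1                      # write the data row itself
        for col, entries in self.indexes.items():
            bisect.insort(entries, (row[col], row_id))
            self.writes += 1                  # ...and update every index
        return row_id

def writes_per_insert(num_indexes):
    """Structures touched by one insert into a table with num_indexes indexes."""
    cols = [f"c{i}" for i in range(num_indexes)]
    table = ToyTable(cols)
    table.insert({c: 0 for c in cols})
    return table.writes

for n in (0, 1, 3, 5):
    print(n, writes_per_insert(n))  # 1, 2, 4, 6 structures respectively
```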
Second, indexes take up disk space, and multiple indexes take up correspondingly more. This can cause a table to reach its size limit sooner:
· For MyISAM tables, heavy indexing can make the index file reach its maximum size sooner than the data file.
· BDB tables store data and index values in the same file, so adding indexes makes the table reach the maximum file size sooner.
· All tables allocated in the InnoDB shared tablespace compete for the same common pool of space, so adding indexes uses up tablespace storage more quickly. However, unlike the files used by MyISAM and BDB tables, the InnoDB shared tablespace is not bound by the operating system's file-size limit, because it can be configured to use multiple files. As long as you have additional disk space, you can expand the tablespace by adding new files (components) to it.
InnoDB tables that use individual tablespaces are subject to the same constraint as BDB tables, because their data and index values are stored together in a single file.
The practical implication of all this: if you don't need a particular index to help queries run faster, don't create it.
3. Choosing Indexes
We assume you already know the syntax for creating indexes, but syntax doesn't tell you how your tables should be indexed. That requires thinking about how the tables will be used. This section describes how to identify candidate columns for indexing and how best to create indexes:
Index columns that you use for searching, sorting, or grouping, not columns you only display as output. In other words, the best candidates for indexing are columns that appear in your WHERE clause, columns named in join clauses, and columns that appear in ORDER BY or GROUP BY clauses. Columns that appear only in the output list following the SELECT keyword are not good candidates:
SELECT
    col_a                           <- not a candidate
FROM
    tbl1 LEFT JOIN tbl2
        ON tbl1.col_b = tbl2.col_c  <- candidates
WHERE
    col_d = expr;                   <- a candidate
Of course, a column you display might also be one you use in the WHERE clause. The point is that appearing in the output list does not by itself make a column a good candidate for indexing.
Columns that appear in join clauses, or in expressions of the form col1 = col2 in WHERE clauses, are especially good candidates for indexing. col_b and col_c in the preceding query are examples. If MySQL can optimize a query using the join columns, it cuts down the potential table-row combinations substantially by eliminating full table scans.
Consider the cardinality of a column. Cardinality is the number of distinct values the column contains. For example, a column containing the values 1, 3, 7, 4, 7, and 3 has a cardinality of 4. Indexes work best when a column's cardinality is high relative to the number of rows in the table (that is, the column contains many distinct values and few duplicates). An index on a column of widely varying ages identifies rows quickly. An index on a column that records gender (only the values "M" and "F") is of little use: if both values occur about equally often, you get roughly half the rows whichever value you search for. In such cases it is best not to index the column at all, because when the query optimizer determines that a value occurs in a high percentage of a table's rows, it usually ignores the index and performs a full table scan. The traditional cutoff was about 30%. The optimizer is now more sophisticated and takes other factors into account, so this percentage is not the only factor MySQL uses to choose between a scan and an index.
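The sketch below computes cardinality and applies the old 30% rule of thumb to decide between an index lookup and a full scan. would_use_index is a hypothetical, deliberately simplified rule for illustration; it is not MySQL's actual cost model, which weighs many more factors. The ages and genders lists are made-up sample data.

```python
from collections import Counter

def cardinality(values):
    """Number of distinct values in a column."""
    return len(set(values))

def would_use_index(values, lookup, threshold=0.30):
    """Hypothetical, simplified rule: use the index only if `lookup` matches
    at most `threshold` of the rows; otherwise prefer a full scan."""
    fraction = Counter(values)[lookup] / len(values)
    return fraction <= threshold

ages = [23, 35, 41, 29, 52, 35, 61, 19, 44, 38]
genders = ["M", "F", "F", "M", "M", "F", "M", "F", "F", "M"]

print(cardinality([1, 3, 7, 4, 7, 3]))  # 4
print(would_use_index(ages, 35))        # True  (2 of 10 rows)
print(would_use_index(genders, "F"))    # False (5 of 10 rows)
```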
Index short values. Use smaller data types when possible. For example, don't use a BIGINT column if a MEDIUMINT is large enough to hold the values you need to store. If your values are never longer than 25 characters, don't use CHAR(100). Smaller values improve index processing in several ways:
· Shorter values can be compared more quickly, so index lookups are faster.
· Smaller values result in smaller indexes that require less disk I/O.
· With shorter key values, each index block in the key cache holds more keys. MySQL can keep more keys in memory at once, which increases the likelihood of finding key values without reading additional index blocks from disk.
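A little arithmetic shows why key size matters. The sketch assumes a hypothetical 1024-byte index block and a 6-byte pointer per entry and ignores block headers; both numbers are illustrative assumptions, not exact MySQL internals. The integer sizes (MEDIUMINT 3 bytes, INT 4, BIGINT 8) are the standard MySQL storage sizes. It computes how many entries fit in a block and how many tree levels are needed to address 1,000,000 keys.

```python
BLOCK_SIZE = 1024   # assumed index block size in bytes (illustrative only)
POINTER_SIZE = 6    # assumed bytes per row/child pointer (illustrative only)

# Storage size in bytes of MySQL integer types.
KEY_SIZES = {"MEDIUMINT": 3, "INT": 4, "BIGINT": 8}

def keys_per_block(key_size, block_size=BLOCK_SIZE, pointer_size=POINTER_SIZE):
    """(key, pointer) entries that fit in one block, ignoring block headers."""
    return block_size // (key_size + pointer_size)

def tree_height(num_rows, fanout):
    """Levels a balanced tree with this fanout needs to address num_rows keys."""
    levels, capacity = 1, fanout
    while capacity < num_rows:
        capacity *= fanout
        levels += 1
    return levels

for name, size in KEY_SIZES.items():
    fanout = keys_per_block(size)
    print(name, fanout, tree_height(1_000_000, fanout))
# MEDIUMINT 113 3
# INT 102 3
# BIGINT 73 4
```

Under these assumptions the BIGINT index needs an extra level, which means one more block read per lookup whenever that level is not cached.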
For storage engines that use clustered indexes, such as InnoDB and BDB, keeping the primary key short is even more beneficial. In a clustered index, data rows are stored together with the primary key values. Other indexes are secondary indexes, which store the secondary index values together with the primary key values; a lookup through a secondary index uses the primary key value to locate the data row. This implies that primary key values are duplicated in every secondary index, so a long primary key requires more extra space in each secondary index.
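The extra space is easy to estimate. This sketch (pk_copy_bytes is a hypothetical helper) multiplies rows, secondary indexes, and primary key size, ignoring per-entry overhead. The 36-byte case assumes a CHAR(36) key in a single-byte character set; multibyte character sets would make it larger.

```python
def pk_copy_bytes(num_rows, num_secondary_indexes, pk_size):
    """Bytes spent on primary-key copies: each secondary-index entry
    carries the row's primary key value (per-entry overhead ignored)."""
    return num_rows * num_secondary_indexes * pk_size

rows, secondaries = 1_000_000, 3
int_pk = pk_copy_bytes(rows, secondaries, 4)    # INT primary key
char_pk = pk_copy_bytes(rows, secondaries, 36)  # assumed CHAR(36), 1 byte/char
print(int_pk, char_pk)  # 12000000 108000000
```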
Index prefixes of string values. If you index a string column, specify a prefix length whenever it is reasonable to do so. For example, if you have a CHAR(200) column whose values are mostly distinct within the first 10 or 20 characters, don't index the entire column. Indexing only the first 10 or 20 characters saves a lot of space and may make your queries faster. Shorter index values give you the comparison-speed and disk I/O benefits described above. Of course, use common sense: indexing only the first character of a column is unlikely to help much, because the index will not contain many distinct values.
You can index the prefixes of CHAR, VARCHAR, BINARY, VARBINARY, BLOB, and TEXT data columns.
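In MySQL a prefix index is declared with a length after the column name, for example INDEX (name(10)). One way to choose the length is to compare COUNT(DISTINCT LEFT(col, n)) for several values of n against the full distinct count. The Python sketch below does the same comparison on an in-memory list; shortest_useful_prefix, its 0.95 target ratio, and the sample emails are hypothetical choices for illustration.

```python
def distinct_prefixes(values, length):
    """Distinct values left when only the first `length` characters are kept
    (the Python analogue of COUNT(DISTINCT LEFT(col, length)))."""
    return len({v[:length] for v in values})

def shortest_useful_prefix(values, min_ratio=0.95):
    """Smallest prefix length whose distinct count reaches min_ratio of the
    full column's distinct count. min_ratio is an arbitrary illustrative target."""
    full = len(set(values))
    longest = max(len(v) for v in values)
    for length in range(1, longest + 1):
        if distinct_prefixes(values, length) >= min_ratio * full:
            return length
    return longest

emails = [
    "alice@example.com",
    "albert@example.com",
    "bob@example.com",
    "bobby@example.org",
    "carol@example.net",
]
for n in (1, 2, 3, 4):
    print(n, distinct_prefixes(emails, n))  # 3, 3, 4, 5
print(shortest_useful_prefix(emails))       # 4
```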
Take advantage of leftmost prefixes. When you create a composite (multiple-column) index, you actually create several indexes that MySQL can use. A composite index can serve as multiple indexes because any leftmost set of its columns can be used to match rows. Such a set of columns is called a "leftmost prefix" (this is different from indexing a prefix of a single column, which uses the first few characters of the column as the index value).
Suppose you create a composite index on the state, city, and zip columns of a table. Rows in the index are sorted in state/city/zip order, so they are automatically sorted in state/city order and in state order as well. This means MySQL can use the index even if a query specifies only state values, or only state and city values. The index can therefore be used to search the following combinations of columns:
state, city, zip
state, city
state
MySQL cannot use this index for searches that do not involve a leftmost prefix. For example, if you search by city or by zip alone, the index is not used. If you search for a given state and a specific ZIP code (columns 1 and 3 of the index), the index cannot be used for the combined values, although MySQL can still use it to find rows matching the state and narrow the search.
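Why leftmost prefixes work can be seen with a sorted list of tuples standing in for the (state, city, zip) index. Entries sharing a leftmost prefix are contiguous, so a binary search plus a short forward read finds them; a search on city alone has no such ordering to exploit. The index list, prefix_lookup, and scan_lookup are hypothetical names for this sketch, and a real B-tree is far more elaborate.

```python
import bisect

# Toy composite index on (state, city, zip): entries sorted as tuples.
index = sorted([
    ("TX", "Austin", "78701"),
    ("CA", "San Jose", "95112"),
    ("NY", "Buffalo", "14201"),
    ("CA", "Oakland", "94607"),
    ("NY", "Albany", "12207"),
])

def prefix_lookup(index, prefix):
    """Leftmost-prefix search: binary-search to the first entry >= prefix,
    then read forward while entries still start with the prefix."""
    start = bisect.bisect_left(index, prefix)
    matches = []
    for entry in index[start:]:
        if entry[:len(prefix)] != prefix:
            break
        matches.append(entry)
    return matches

def scan_lookup(index, position, value):
    """Non-prefix search (e.g. city alone): matches are scattered, so every
    entry has to be examined."""
    return [entry for entry in index if entry[position] == value]

print(prefix_lookup(index, ("NY",)))
print(prefix_lookup(index, ("CA", "Oakland")))
print(scan_lookup(index, 1, "Austin"))
```

The state-and-zip case corresponds to calling prefix_lookup with the state alone and then filtering the returned entries on zip, which is the "narrow the search" behavior described above.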
Don't over-index. Don't assume that more indexes mean better performance, and don't index every column. As mentioned earlier, each additional index takes more disk space and slows down writes. When you modify a table's contents, its indexes must be updated and possibly reorganized. If an index is rarely or never used, it slows down table modifications for no benefit. In addition, MySQL considers indexes when it generates an execution plan for a retrieval, so extra indexes create more work for the query optimizer. With too many indexes, MySQL may (though not necessarily) fail to choose the best one. Keeping only the indexes you need helps the optimizer avoid such mistakes.
If you are thinking about adding an index to a table that is already indexed, check whether the new index would be a leftmost prefix of an existing multiple-column index. If so, don't add it, because you effectively have it already (for example, if you have an index on state, city, and zip, you don't need a separate index on state).
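The redundancy check described above is simple enough to automate. In this sketch, indexes are tuples of column names, and is_leftmost_prefix and redundant_with are hypothetical helpers; in practice you would read the existing index definitions from the table (for example with SHOW INDEX) rather than hard-coding them.

```python
def is_leftmost_prefix(candidate, existing):
    """True if the candidate's columns are a leftmost prefix of `existing`."""
    return tuple(existing[:len(candidate)]) == tuple(candidate)

def redundant_with(candidate, existing_indexes):
    """Existing indexes that already cover the candidate index."""
    return [idx for idx in existing_indexes if is_leftmost_prefix(candidate, idx)]

existing = [("state", "city", "zip"), ("last_name",)]
print(redundant_with(("state",), existing))  # [('state', 'city', 'zip')]
print(redundant_with(("city",), existing))   # []
```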
Match the index type to the type of comparisons you perform. When you create an index, most storage engines choose the index implementation they will use. For example, InnoDB usually uses B-tree indexes. MyISAM also uses B-tree indexes, except for spatial data types, for which it uses R-tree indexes. The MEMORY storage engine, however, supports both hash and B-tree indexes and lets you choose which one to use. To choose an index type, consider what kinds of comparisons you will perform on the indexed column:
· For a hash index, a hash function is applied to each column value, and the resulting hash values are stored in the index and used for lookups. A hash function is designed to produce different hash values for different input values, and comparing hash values can be more efficient than comparing the original values. Hash indexes are very fast for exact-match comparisons such as = or <=>, but they perform poorly for comparisons that look for a range of values, such as:
id < 30
weight BETWEEN 100 AND 150
· B-tree indexes can be used efficiently for exact-match or range comparisons (<, <=, =, >=, >, <>, !=, and BETWEEN). B-tree indexes can also be used for LIKE pattern matches, provided the pattern begins with a literal string rather than a wildcard character.
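The difference between the two index types can be sketched in Python: a dict plays the hash index (one probe for an exact match, but no ordering), and a sorted list searched with bisect plays the B-tree (seek to the start of a range, read forward). The weights data and all function names are hypothetical, and neither structure reflects how the MEMORY engine actually lays out its indexes.

```python
import bisect

weights = {101: 120, 102: 95, 103: 140, 104: 160, 105: 110}  # id -> weight

# Hash-style index: unordered mapping from key to row ids.
hash_index = {}
for row_id, w in weights.items():
    hash_index.setdefault(w, []).append(row_id)

# B-tree stand-in: (key, row_id) pairs kept in sorted order.
btree_index = sorted((w, row_id) for row_id, w in weights.items())

def hash_equal(key):
    """Exact match: a single hash probe."""
    return hash_index.get(key, [])

def hash_range(lo, hi):
    """Range on a hash index: ordering is lost, so every key must be checked."""
    return sorted(r for k, ids in hash_index.items() if lo <= k <= hi for r in ids)

def btree_range(lo, hi):
    """Range on ordered keys: seek to lo, read forward until past hi."""
    start = bisect.bisect_left(btree_index, (lo,))
    result = []
    for key, row_id in btree_index[start:]:
        if key > hi:
            break
        result.append(row_id)
    return sorted(result)

def like_prefix(sorted_words, prefix):
    """LIKE 'prefix%' on ordered keys is also a range: seek, then read forward."""
    start = bisect.bisect_left(sorted_words, prefix)
    result = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break
        result.append(word)
    return result

print(hash_equal(95))         # [102]
print(btree_range(100, 150))  # [101, 103, 105]
print(like_prefix(sorted(["apple", "banana", "apricot", "application"]), "app"))
```

Both range functions return the same rows; the difference is that hash_range must visit every key, while btree_range stops as soon as it passes the upper bound.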
If a MEMORY table is used only for exact-value lookups, a hash index is a good choice. It is the default index type for MEMORY tables, so you don't need to specify it. If you need range comparisons on a MEMORY table, use a B-tree index instead. To specify this index type, add USING BTREE to the index definition. For example:
CREATE TABLE lookup
(
    id   INT NOT NULL,
    name CHAR(20),
    PRIMARY KEY USING BTREE (id)
) ENGINE = MEMORY;
If the statements you run call for both kinds of comparison, a single MEMORY table can have both hash and B-tree indexes, even on the same column.
Some types of comparisons cannot use indexes. If you compare a column only by passing its values to a function such as STRCMP(), there is no value in indexing it. The server must evaluate the function for every row, which rules out using an index on the column.
Use the slow query log to identify queries that perform poorly. The log helps you find queries that might benefit from indexing. You can view the log directly (it is a text file) or summarize it with the mysqldumpslow utility. If a given query shows up in the slow query log repeatedly, that is a clue that it may be poorly optimized and that you may be able to rewrite it to run faster. Keep in mind when evaluating the slow query log that "slow" is measured in real (wall-clock) time, so a heavily loaded server will log more queries as slow.
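mysqldumpslow groups similar statements by abstracting literal values (numbers become N and strings become 'S'), so queries that differ only in their constants are counted together. The sketch below imitates that idea on statements assumed to be already extracted from the log; parsing the log file itself is left out, and normalize, top_queries, and the sample statements are hypothetical.

```python
import re
from collections import Counter

def normalize(sql):
    """Abstract literals so statements differing only in values group together
    (numbers -> N, quoted strings -> 'S'), then collapse whitespace."""
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "'S'", sql)  # string literals
    sql = re.sub(r"\b\d+\b", "N", sql)               # standalone integers
    return re.sub(r"\s+", " ", sql).strip()

def top_queries(statements, n=3):
    """Most frequent normalized statements, with their counts."""
    return Counter(normalize(s) for s in statements).most_common(n)

stmts = [
    "SELECT * FROM orders WHERE customer_id = 42",
    "SELECT * FROM orders WHERE customer_id = 7",
    "SELECT name FROM users WHERE email = 'a@example.com'",
    "SELECT * FROM orders  WHERE customer_id = 1001",
]
for query, count in top_queries(stmts):
    print(count, query)
```

Note that digits inside identifiers such as t1 are left alone, because the pattern only matches standalone numbers.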
4. Indexing Principles
4.1. The leftmost prefix matching principle is very important. MySQL matches index columns from left to right and stops when it reaches a range condition (>, <, BETWEEN, LIKE). For example, for the condition a = 1 AND b = 2 AND c > 3 AND d = 4, an index on (a, b, c, d) cannot use d, whereas an index on (a, b, d, c) can use all four columns, and a, b, and d may appear in the query in any order.
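The rule can be written as a small function: walk the index columns from the left, use each column that has an equality condition, use the first range column and then stop. usable_index_columns is a hypothetical helper and a simplified model of MySQL's behavior (it ignores, for example, index condition pushdown). Because the conditions are passed as sets, the order in which they are written does not matter, which is also the point of 4.2.

```python
def usable_index_columns(index_columns, equality, ranges):
    """Simplified leftmost-prefix rule: take equality columns from the left,
    take the first range column, then stop.

    equality: columns constrained with = or IN
    ranges:   columns constrained with >, <, BETWEEN, LIKE 'prefix%'
    """
    used = []
    for col in index_columns:
        if col in equality:
            used.append(col)
        elif col in ranges:
            used.append(col)  # a range column can be used...
            break             # ...but nothing to its right can
        else:
            break             # gap in the prefix: stop
    return used

# a = 1 AND b = 2 AND c > 3 AND d = 4
eq, rng = {"a", "b", "d"}, {"c"}
print(usable_index_columns(["a", "b", "c", "d"], eq, rng))  # ['a', 'b', 'c']
print(usable_index_columns(["a", "b", "d", "c"], eq, rng))  # ['a', 'b', 'd', 'c']
```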
4.2. = and IN conditions can appear in any order. For example, for a = 1 AND b = 2 AND c = 3, an index on (a, b, c) can be used no matter how the conditions are ordered in the query; the MySQL query optimizer rewrites them into a form the index can use.
4.3. Choose columns with high selectivity for indexes. Selectivity is count(DISTINCT col)/count(*), the proportion of distinct values in the column. The larger the ratio, the fewer rows a lookup has to scan. A unique key has a selectivity of 1, while status or gender columns in large tables may have a selectivity close to 0. Is there a rule of thumb for this ratio? It is hard to give one value because use cases differ, but for columns used in joins we generally require a selectivity above 0.1, that is, about 10 rows scanned per value on average.
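The selectivity formula translates directly into code. In this sketch, selectivity and avg_rows_per_value are hypothetical helpers, and the three columns are synthetic data chosen to hit the cases mentioned above: a unique key, a two-value gender column, and a join column where each value repeats 10 times (selectivity 0.1).

```python
def selectivity(values):
    """count(DISTINCT col) / count(*): 1.0 for a unique key, near 0 for flags."""
    return len(set(values)) / len(values) if values else 0.0

def avg_rows_per_value(values):
    """Inverse view: rows an equality lookup touches on average."""
    return len(values) / len(set(values))

user_ids = list(range(1000))                 # unique key
genders = ["M", "F"] * 500                   # two distinct values
join_col = [i // 10 for i in range(1000)]    # each value repeated 10 times

print(selectivity(user_ids))        # 1.0
print(selectivity(genders))         # 0.002
print(selectivity(join_col))        # 0.1
print(avg_rows_per_value(join_col)) # 10.0
```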
4.4. Keep indexed columns "clean": do not perform calculations on them. For example, the condition from_unixtime(create_time) = '2017-05-29' cannot use an index. The reason is simple: the B+ tree stores the raw column values, so evaluating the condition would require applying the function to every entry before comparing, which is far too costly. Write the condition as create_time = unix_timestamp('2017-05-29') instead.
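The sketch below contrasts the two forms on a sorted list of timestamps standing in for an index on create_time. from_unixtime and unix_timestamp here are toy Python stand-ins for the MySQL functions, assuming the session time zone is UTC; the data is hypothetical hourly timestamps starting at 2017-05-28 00:00 UTC. Applying the function to the column forces a pass over every entry, while converting the constant once allows a binary search.

```python
import bisect
from datetime import datetime, timezone

# Toy stand-ins for MySQL's functions, assuming the session time zone is UTC.
def from_unixtime(ts):
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

def unix_timestamp(day):
    return int(datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc).timestamp())

# Toy index on create_time: hourly timestamps from 2017-05-28 00:00 UTC.
create_time_index = [1495929600 + i * 3600 for i in range(72)]

def where_function_on_column(target):
    """from_unixtime(create_time) = target: the function runs on every
    index entry before comparing, so the whole index is read."""
    checked, hits = 0, []
    for ts in create_time_index:
        checked += 1
        if from_unixtime(ts) == target:
            hits.append(ts)
    return hits, checked

def where_clean_column(day):
    """create_time = unix_timestamp(day): convert the constant once,
    then binary-search the sorted index."""
    key = unix_timestamp(day)
    pos = bisect.bisect_left(create_time_index, key)
    hits = []
    while pos < len(create_time_index) and create_time_index[pos] == key:
        hits.append(key)
        pos += 1
    return hits

print(where_function_on_column("2017-05-29 00:00:00"))  # ([1496016000], 72)
print(where_clean_column("2017-05-29"))                  # [1496016000]
```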
4.5. Extend existing indexes where possible instead of creating new ones. For example, if the table already has an index on a and you want an index on (a, b), just modify the original index.