MySQL: Using Indexes to Optimize Queries

Source: Internet
Author: User

The goal of an index is to make queries more efficient. Think of a dictionary: to look up the word "MySQL", you first locate the letter M, then find the Y under it, and then the remaining letters SQL. Without an index, you might have to read through every word to find the one you want.
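The dictionary analogy can be made concrete with a small Python model. This is only an illustration: MySQL's indexes are B-trees, not a sorted Python list, and the word list is made up. The model counts how many entries a plain scan examines compared with a binary search over sorted keys. Exploiting sorted order in this way is what an ordered index does.

```python
def linear_search(words, target):
    """Examine entries one by one, like reading a dictionary with no ordering."""
    for steps, word in enumerate(words, start=1):
        if word == target:
            return True, steps
    return False, len(words)


def binary_search(sorted_words, target):
    """Halve the search range each step, relying on the entries being sorted."""
    lo, hi, steps = 0, len(sorted_words), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_words[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_words) and sorted_words[lo] == target
    return found, steps


# Hypothetical word list: 100,000 distinct keys, already in sorted order.
words = [f"word{i:06d}" for i in range(100_000)]
print(linear_search(words, "word099999"))  # examines every entry
print(binary_search(words, "word099999"))  # at most about log2(100,000) steps
```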

1. Advantages of indexing

Suppose you have three unindexed tables, t1, t2, and t3. They contain the columns i1, i2, and i3 respectively, and each holds 1000 rows with the numbers 1 through 1000. A query that looks for combinations of rows in which the values are equal might look like this:

SELECT t1.i1, t2.i2, t3.i3
FROM t1, t2, t3
WHERE t1.i1 = t2.i2 AND t2.i2 = t3.i3;

The result of this query should be 1000 rows, each containing three equal values. If the query is processed without indexes, there is no way to know which rows contain which values without scanning all of the tables. Every combination must therefore be tried to find the records that match the WHERE condition. The number of possible combinations is 1000 x 1000 x 1000 (one billion!), which is a million times the number of matching rows. That is a lot of wasted work. The example also shows that, without indexes, the time needed to join tables grows faster than the tables themselves, so performance degrades badly. Indexing the tables speeds things up considerably, because the indexes allow the query to be processed like this:

1. Select the first row in table t1 and look at its value.

2. Use the index on table t2 to go directly to the row in t2 that matches the value from t1. Similarly, use the index on table t3 to go directly to the row in t3 that matches the value from t2.

3. Move on to the next row in t1 and repeat the procedure. Continue until every row in t1 has been examined.
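The three steps above can be modelled in Python. This is a sketch, not MySQL's executor, and it rests on two simplifications. First, a Python dict stands in for the index on t2 and on t3. Second, each table is reduced to a single list of values 1 through n, with n = 50 so that the brute-force version finishes quickly. The model counts how many row combinations the unindexed approach checks and how many index lookups the indexed approach makes.

```python
def brute_force_join(t1, t2, t3):
    """No indexes: try every combination of rows and test the WHERE condition."""
    checks, result = 0, []
    for a in t1:
        for b in t2:
            for c in t3:
                checks += 1
                if a == b and b == c:
                    result.append((a, b, c))
    return result, checks


def build_index(values):
    """Map each value to the row positions holding it (stand-in for an index)."""
    index = {}
    for pos, value in enumerate(values):
        index.setdefault(value, []).append(pos)
    return index


def indexed_join(t1, t2, t3):
    """Scan t1 fully; use the t2 and t3 indexes to jump to matching rows."""
    idx2, idx3 = build_index(t2), build_index(t3)
    lookups, result = 0, []
    for a in t1:                        # step 1: take the next row of t1
        lookups += 1
        for p2 in idx2.get(a, []):      # step 2: index lookup in t2
            b = t2[p2]
            lookups += 1
            for p3 in idx3.get(b, []):  # ...then index lookup in t3
                result.append((a, b, t3[p3]))
    return result, lookups              # step 3 is the outer loop itself


n = 50
t1 = t2 = t3 = list(range(1, n + 1))
slow_rows, checks = brute_force_join(t1, t2, t3)
fast_rows, lookups = indexed_join(t1, t2, t3)
print(len(slow_rows), checks)   # n matching rows after n**3 checks
print(len(fast_rows), lookups)  # the same rows after 2 * n index lookups
```

With n = 1000, the brute-force count grows to 1000**3, or one billion checks. The indexed version still makes only 2 * 1000 lookups.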

In this case we still perform a full scan of table t1, but index lookups on t2 and t3 fetch the matching rows from those tables directly. In theory, running the query this way is about a million times faster. The example is, of course, contrived to make the point. Still, the problem it illustrates is real: adding indexes to tables that have none often yields a dramatic performance improvement.

2. Costs of Indexes

First, indexes speed up retrievals but slow down inserts and deletes, as well as updates to values in indexed columns. In other words, indexes slow down most operations that involve writing. This happens because writing a record requires writing not only the data row but also changes to every index. The more indexes a table has, the more changes must be made, and the greater the average performance degradation. The "Efficient Loading Data" section looks at these effects in more detail and at how to deal with them.
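A toy model makes the write amplification visible. The `ToyTable` class below is hypothetical and only illustrative: real storage engines write pages, logs, and buffers rather than Python dicts. Its single point is that every index adds one more structure to update for each inserted row.

```python
class ToyTable:
    """A table as a list of row dicts plus one dict-based index per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = []
        self.indexes = {col: {} for col in indexed_columns}

    def insert(self, row):
        """Insert a row and return how many structures had to be written."""
        self.rows.append(row)
        row_id = len(self.rows) - 1
        writes = 1                                  # the data row itself
        for col, index in self.indexes.items():
            index.setdefault(row[col], []).append(row_id)
            writes += 1                             # one more write per index
        return writes


row = {"id": 1, "name": "alice", "city": "Boston"}
print(ToyTable([]).insert(row))                       # data row only
print(ToyTable(["id", "name", "city"]).insert(row))   # data row plus 3 indexes
```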

Second, indexes take up disk space, and multiple indexes take up correspondingly more. This can cause a table to reach its size limit sooner:

· For MyISAM tables, heavy indexing can cause the index file to reach its maximum size faster than the data file.

· BDB tables store data and index values in the same file, so adding indexes causes the table to reach the maximum file size faster.

· All tables in the InnoDB shared tablespace compete for the same common pool of space, so adding indexes uses up the tablespace's storage more quickly. Unlike the files used by MyISAM and BDB tables, however, the InnoDB shared tablespace is not bound by the operating system's file-size limit, because it can be configured to use multiple files. As long as you have free disk space, you can extend the tablespace by adding new components.

An InnoDB table that uses its own tablespace has the same constraint as a BDB table, because its data and index values are stored in a single file.

The practical upshot is this: if you don't need a particular index to help queries run faster, don't create it.

3. Choosing Indexes

This section assumes you already know the syntax for creating indexes. Syntax alone, however, doesn't tell you how a table should be indexed; that requires thinking about how the table is used. The guidelines below show how to identify candidate columns for indexing and how best to set up indexes:

Index columns that you use for searching, sorting, or grouping, not columns you only display as output. In other words, the best candidates for indexing are columns that appear in the WHERE clause, in join clauses, or in ORDER BY or GROUP BY clauses. Columns that appear only in the output column list following the SELECT keyword are not good candidates:

SELECT
  col_a                            <- not a candidate
FROM
  tbl1 LEFT JOIN tbl2
    ON tbl1.col_b = tbl2.col_c     <- candidates
WHERE
  col_d = expr;                    <- a candidate

Of course, a displayed column may also be one that is used in the WHERE clause. The point is that appearing in the output list does not, by itself, make a column a good candidate for indexing.

Columns in join clauses, or in WHERE-clause expressions of the form col1 = col2, are especially good candidates for indexing. col_b and col_c in the query just shown are examples. If MySQL can optimize a query by using the join columns, it eliminates full table scans and so greatly reduces the number of potential table-row combinations.

Consider column cardinality. The cardinality of a column is the number of distinct values it contains. For example, a column containing the values 1, 3, 7, 4, 7, and 3 has a cardinality of 4. Indexes work best for columns whose cardinality is high relative to the number of rows in the table, that is, columns with many distinct values and few duplicates. If a column contains many different ages, an index can distinguish rows quickly. An index will not help much for a column that records sex and contains only the values "M" and "F". If the two values occur about equally often, you get about half of the rows whichever value you search for. In such cases it may be best not to use an index at all: when the query optimizer sees that a value occurs in a high percentage of a table's rows, it generally ignores the index and performs a full table scan. The customary threshold has been about 30%. Today's optimizer is more sophisticated and takes other factors into account, so this percentage is no longer the only factor in MySQL's decision between a scan and an index.
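The sketch below computes cardinality, plus the fraction of rows that a lookup for one value would return. It also applies the 30% rule of thumb from the text. The threshold is the historical heuristic quoted above, not MySQL's actual decision logic, and the `sex` and `ages` columns are hypothetical data.

```python
from collections import Counter


def cardinality(values):
    """Number of distinct values in a column."""
    return len(set(values))


def match_fraction(values, value):
    """Fraction of rows that a lookup for `value` would return."""
    return Counter(values)[value] / len(values)


def scan_likely(values, value, threshold=0.30):
    """Rule of thumb from the text: above ~30% of rows, expect a full scan."""
    return match_fraction(values, value) > threshold


print(cardinality([1, 3, 7, 4, 7, 3]))   # the example from the text

sex = ["M", "F"] * 500                   # hypothetical 1,000-row column
ages = list(range(18, 68)) * 20          # hypothetical: 50 ages, 20 rows each
print(match_fraction(sex, "M"), scan_likely(sex, "M"))
print(match_fraction(ages, 30), scan_likely(ages, 30))
```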

Index short values. Use the smallest data types you can. For example, don't use a BIGINT column if MEDIUMINT is large enough for the values you need to store. If none of your values is longer than 25 characters, don't use CHAR(100). Smaller values improve index processing in several ways:

· Shorter values can be compared more quickly, so index lookups are faster.

· Smaller values result in smaller indexes, requiring less disk I/O.

· With shorter key values, each index block in the key cache holds more keys. MySQL can therefore keep more keys in memory at once, which makes it more likely that a key value can be located without reading additional index blocks from disk.
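The key-cache point can be checked with simple arithmetic. The sketch assumes a hypothetical 16 KB index block and ignores per-entry pointers and block headers, so the counts are upper bounds rather than what any storage engine actually achieves. The CHAR sizes assume a single-byte character set.

```python
import math


def keys_per_block(block_size, key_size):
    """Idealized: ignores per-entry pointers and block headers."""
    return block_size // key_size


def blocks_needed(n_keys, block_size, key_size):
    """Blocks required to hold n_keys keys at the idealized density."""
    return math.ceil(n_keys / keys_per_block(block_size, key_size))


BLOCK = 16 * 1024  # hypothetical 16 KB index block
for label, size in [("BIGINT", 8), ("MEDIUMINT", 3), ("CHAR(100)", 100), ("CHAR(25)", 25)]:
    print(label, keys_per_block(BLOCK, size), blocks_needed(1_000_000, BLOCK, size))
```

Fewer blocks for the same million keys means a larger share of the index fits in the cache.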

For storage engines that use clustered indexes, such as InnoDB and BDB, keeping the primary key short is even more beneficial. In a clustered index, data rows are stored together with the primary key values. The other indexes are secondary indexes, which store the secondary index value along with the primary key value. A lookup in a secondary index yields a primary key value, which is then used to locate the data row. This means primary key values are duplicated in every secondary index, so a long primary key makes every secondary index require more space.
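The two-step lookup and the space cost can be sketched with plain dicts. This is a model, not InnoDB's on-disk format. The table, its columns, and the 36-byte key size are hypothetical, and the size estimate ignores all per-entry overhead.

```python
# Clustered index: primary key -> full row.
clustered = {
    101: {"id": 101, "email": "a@example.com", "city": "Boston"},
    102: {"id": 102, "email": "b@example.com", "city": "Denver"},
    103: {"id": 103, "email": "c@example.com", "city": "Boston"},
}

# Secondary index on city: maps each city value to primary keys, not row locations.
by_city = {}
for pk, row in clustered.items():
    by_city.setdefault(row["city"], []).append(pk)


def lookup_by_city(city):
    """Two steps: the secondary index gives primary keys, the clustered index gives rows."""
    return [clustered[pk] for pk in by_city.get(city, [])]


def secondary_index_bytes(n_rows, secondary_key_bytes, primary_key_bytes):
    """Rough size estimate: every entry repeats the primary key (overhead ignored)."""
    return n_rows * (secondary_key_bytes + primary_key_bytes)


print([r["id"] for r in lookup_by_city("Boston")])
print(secondary_index_bytes(1_000_000, 4, 4))    # 4-byte integer primary key
print(secondary_index_bytes(1_000_000, 4, 36))   # hypothetical 36-byte primary key
```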

Index prefixes of string values. If you're indexing a string column, specify a prefix length whenever it's reasonable to do so. For example, if you have a CHAR(200) column and the values are mostly distinct within the first 10 or 20 characters, don't index the entire column. Indexing just the first 10 or 20 characters saves a lot of space and may make your queries faster. Indexing shorter values gives you the advantages already described for comparison speed and disk I/O. Use common sense, of course: indexing only the first character of a column is unlikely to help much, because the index will not contain many distinct values.
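A common way to choose a prefix length is to measure how many distinct values each prefix length preserves. In MySQL, that corresponds to the ratio COUNT(DISTINCT LEFT(col, n)) / COUNT(*). The Python sketch below computes the same ratio over a hypothetical list of e-mail addresses. The target of 1.0 (every prefix unique) is an illustrative choice; in practice a value close to the full column's ratio is usually enough.

```python
def prefix_selectivity(values, length):
    """Distinct prefixes divided by row count, like COUNT(DISTINCT LEFT(col, n)) / COUNT(*)."""
    return len({v[:length] for v in values}) / len(values)


def shortest_good_prefix(values, target=1.0, max_length=50):
    """Smallest prefix length whose selectivity reaches `target` (None if none does)."""
    for length in range(1, max_length + 1):
        if prefix_selectivity(values, length) >= target:
            return length
    return None


emails = ["alice@example.com", "alan@example.com", "bob@example.com", "bea@example.com"]
for n in (1, 2, 3):
    print(n, prefix_selectivity(emails, n))
print(shortest_good_prefix(emails))
```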

You can index prefixes of CHAR, VARCHAR, BINARY, VARBINARY, BLOB, and TEXT columns.

Take advantage of leftmost prefixes. When you create a composite index on several columns, you actually create several indexes that MySQL can use. A composite index can serve as multiple indexes because any leftmost set of its columns can be used to match rows. Such a set of columns is called a "leftmost prefix". This is different from indexing a prefix of a column, which uses the first few characters of a column's values as the index values.

Suppose a table has a composite index on the columns state, city, and zip. Rows in the index are sorted in state/city/zip order, so they are automatically sorted in state/city order and in state order as well. This means MySQL can use the index even if a query specifies only state values, or only state and city values. The index can therefore be used to search the following combinations of columns:

state, city, zip
state, city
state

MySQL cannot use the index for searches that don't involve a leftmost prefix. For example, if you search by city or by zip alone, the index is not used. If you search for a given state and a particular zip code (columns 1 and 3 of the index), the index cannot be used for that combination of values. MySQL can still use the index to find the rows that match the state, which narrows the search.
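The reason leftmost prefixes work can be seen by keeping hypothetical (state, city, zip) entries in sorted order, as the index does. Any leftmost prefix then selects one contiguous run of entries, which can be found by binary search. This is a simplified model; a real B-tree descends directly to the first matching entry instead of building a list of keys.

```python
from bisect import bisect_left, bisect_right

# Hypothetical entries of a (state, city, zip) index, kept in index order.
index_rows = sorted([
    ("MA", "Springfield", "01103"),
    ("IL", "Chicago", "60614"),
    ("IL", "Springfield", "62701"),
    ("MA", "Boston", "02108"),
    ("IL", "Chicago", "60601"),
])


def prefix_range(rows, prefix):
    """Return the contiguous run of rows whose leading columns equal `prefix`."""
    k = len(prefix)
    # Truncating sorted tuples keeps them sorted, so binary search still works.
    keys = [row[:k] for row in rows]
    return rows[bisect_left(keys, prefix):bisect_right(keys, prefix)]


print(prefix_range(index_rows, ("IL",)))                         # state
print(prefix_range(index_rows, ("IL", "Chicago")))               # state, city
print(prefix_range(index_rows, ("MA", "Springfield", "01103")))  # state, city, zip

# city alone is not a leftmost prefix: its matches are scattered through the index.
print([i for i, row in enumerate(index_rows) if row[1] == "Springfield"])
```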

Don't over-index. Don't index every column on the assumption that more indexes mean better performance. As mentioned earlier, every additional index takes extra disk space and slows down write operations. When you modify a table's contents, its indexes must be updated and possibly reorganized. An index that is rarely or never used slows down table modifications for no benefit. In addition, MySQL considers the available indexes when it generates execution plans for retrievals, so extra indexes create more work for the query optimizer. With too many indexes, it is also possible (though not certain) that MySQL fails to choose the best one. Keeping only the indexes you need helps the optimizer avoid such mistakes.

If you're thinking about adding an index to a table that is already indexed, check whether the new index would be a leftmost prefix of an existing multiple-column index. If so, don't add it, because in effect you already have it. For example, if you already have an index on state, city, and zip, there is no point in adding an index on state.
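The check described above can be written as a small helper. `is_redundant` is a hypothetical name, and the function looks only at column order; in a real schema you would also compare prefix lengths, uniqueness, and index type before dropping anything.

```python
def is_redundant(new_index, existing_indexes):
    """True if new_index is a leftmost prefix of some existing index."""
    n = len(new_index)
    return any(tuple(existing[:n]) == tuple(new_index)
               for existing in existing_indexes if len(existing) >= n)


existing = [("state", "city", "zip")]
print(is_redundant(("state",), existing))          # already covered
print(is_redundant(("state", "city"), existing))   # already covered
print(is_redundant(("city",), existing))           # not a leftmost prefix
print(is_redundant(("state", "zip"), existing))    # columns 1 and 3: not a prefix
```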

Match the index type to the type of comparisons you perform. When you create an index, most storage engines choose which index implementation to use. For example, InnoDB always uses B-tree indexes. MyISAM also uses B-tree indexes, and uses R-tree indexes only for spatial data types. The MEMORY storage engine, however, supports both hash indexes and B-tree indexes and lets you choose which to use. To choose an index type, consider what kind of comparisons you will perform on the indexed column:

· For a hash index, a hash function is applied to each column value. The resulting hash values are stored in the index and used to perform lookups. A hash function implements an algorithm that is likely to produce distinct hash values for distinct input values. The advantage of hash values is that they can be compared more efficiently than the original values. Hash indexes are very fast for exact-match comparisons performed with the = or <=> operators. However, they are poor for comparisons that look for a range of values:

IDs <
weight BETWEEN 150

· A B-tree index can be used efficiently for exact or range-based comparisons, using the <, <=, =, >=, >, <>, !=, and BETWEEN operators. A B-tree index can also be used for LIKE pattern matches, provided that the pattern begins with a literal string rather than a wildcard character.
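The difference between the two index types can be modelled in Python, with a dict playing the hash index and a sorted list searched with `bisect` playing the B-tree. This is a conceptual sketch with made-up data, not how the MEMORY engine is implemented. The dict answers only exact matches. The sorted list can also seek to the start of a range or of a literal LIKE prefix and read forward from there. A pattern that begins with a wildcard, such as '%ice', gives it no starting point.

```python
from bisect import bisect_left, bisect_right

names = ["carol", "alice", "bob", "alfred", "dave", "alicia"]

# Hash-style index: value -> row positions. Good for =, no help for ranges.
hash_index = {}
for pos, name in enumerate(names):
    hash_index.setdefault(name, []).append(pos)

# Ordered index: (value, row position) pairs kept sorted, as a B-tree keeps keys.
ordered = sorted((name, pos) for pos, name in enumerate(names))
keys = [key for key, _ in ordered]


def exact(value):
    """Equality lookup through the hash index."""
    return hash_index.get(value, [])


def between(lo, hi):
    """Range scan: start at the first key >= lo, stop after the last key <= hi."""
    return [pos for _, pos in ordered[bisect_left(keys, lo):bisect_right(keys, hi)]]


def like_prefix(prefix):
    """LIKE 'prefix%': seek to the prefix, then read forward while keys still match."""
    out = []
    for key, pos in ordered[bisect_left(keys, prefix):]:
        if not key.startswith(prefix):
            break
        out.append(pos)
    return out


print(exact("bob"))
print(between("b", "cz"))
print(like_prefix("ali"))
```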

A hash index is a good choice if you use a MEMORY table only for exact-value lookups. Hash is the default index type for MEMORY tables, so you need not specify it explicitly. If you want to perform range-based comparisons on a MEMORY table, use a B-tree index instead. To specify this index type, add USING BTREE to the index definition. For example:

CREATE TABLE lookup
(
  id   INT NOT NULL,
  name CHAR(20),
  PRIMARY KEY USING BTREE (id)
) ENGINE = MEMORY;

If the kinds of statements you run warrant it, a single MEMORY table can have both hash indexes and B-tree indexes, even on the same column.

Some types of comparisons cannot use indexes. If you perform comparisons only by passing column values to a function such as STRCMP(), there is no value in indexing the column. The server must evaluate the function for every row, which rules out using an index on the column.

Use the slow-query log to identify queries that may be performing badly. The log can help you find queries that might benefit from indexing. You can view the log directly (it is a text file) or use the mysqldumpslow utility to summarize its contents. If a given query shows up in the slow-query log over and over, that is a clue that it may be written suboptimally. You may be able to rewrite it so that it runs faster. When assessing the slow-query log, keep in mind that "slow" is measured in real (wall-clock) time. More queries therefore show up in the log on a heavily loaded server than on a lightly loaded one.
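By default, mysqldumpslow groups similar statements by abstracting numbers to N and strings to 'S', so queries that differ only in their literal values are counted together. The Python sketch below imitates only that grouping idea. It takes a list of query strings rather than parsing the real log format, and its regular expressions are a simplification that ignores cases such as double-quoted strings.

```python
import re
from collections import Counter


def abstract_query(sql):
    """Replace literals so that queries differing only in values group together."""
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "'S'", sql)   # string literals -> 'S'
    sql = re.sub(r"\b\d+\b", "N", sql)                # integer literals -> N
    return " ".join(sql.split())                      # normalize whitespace


def summarize(queries):
    """Count each abstracted query, most frequent first."""
    return Counter(abstract_query(q) for q in queries).most_common()


queries = [
    "SELECT * FROM orders WHERE customer_id = 42",
    "SELECT * FROM orders WHERE customer_id = 7",
    "SELECT * FROM users  WHERE name = 'bob'",
    "SELECT * FROM orders WHERE customer_id = 1999",
]
for text, count in summarize(queries):
    print(count, text)
```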

4. Principles for Building Indexes

4.1. Leftmost-prefix matching. This is a very important principle: MySQL matches index columns from left to right and stops at the first range condition (>, <, BETWEEN, LIKE). For example, for the condition a = 1 AND b = 2 AND c > 3 AND d = 4, an index on (a, b, c, d) cannot use column d. An index on (a, b, d, c) can use all four columns, and the order of a, b, and d within it can be arranged freely.

4.2. Conditions using = and IN can appear in any order. For example, for a = 1 AND b = 2 AND c = 3, an index on (a, b, c) can be used whatever order the conditions are written in. The MySQL query optimizer rewrites the query into a form the index can use.
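Rules 4.1 and 4.2 can be expressed as a small function. It is a simplified model of how many index columns a WHERE clause can use, not MySQL's optimizer: it ignores IN-list expansion, index condition pushdown, and cost decisions. Passing the conditions as sets reflects rule 4.2, since the order in which they are written does not matter.

```python
def usable_index_columns(index_columns, equality, ranges):
    """Walk the index left to right; stop after the first range column or a gap."""
    used = []
    for col in index_columns:
        if col in equality:
            used.append(col)        # = / IN: keep matching to the right
        elif col in ranges:
            used.append(col)        # >, <, BETWEEN, LIKE 'x%': used, then matching stops
            break
        else:
            break                   # unconstrained column: the usable prefix ends here
    return used


# WHERE a = 1 AND b = 2 AND c > 3 AND d = 4
eq, rng = {"a", "b", "d"}, {"c"}
print(usable_index_columns(["a", "b", "c", "d"], eq, rng))   # d is not used
print(usable_index_columns(["a", "b", "d", "c"], eq, rng))   # all four used
print(usable_index_columns(["d", "a", "b", "c"], eq, rng))   # a, b, d in any order
```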

4.3. Prefer highly selective columns for indexes. Selectivity is computed as COUNT(DISTINCT col) / COUNT(*), the proportion of distinct values in the column. The larger the ratio, the fewer rows a lookup has to scan. A unique key has a selectivity of 1, while a status or gender column in a large table may have a selectivity close to 0. What is a good empirical value for this ratio? It depends on the use case and is hard to pin down. As a rule of thumb, columns used in joins should have a selectivity of at least 0.1, which means a lookup scans 10 rows on average.
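The selectivity formula and its reciprocal, the average number of rows that share one value, are easy to compute. The three columns below are hypothetical. The 0.1 threshold is the rule of thumb quoted above, not a MySQL setting.

```python
def selectivity(values):
    """COUNT(DISTINCT col) / COUNT(*)."""
    return len(set(values)) / len(values)


def avg_rows_per_lookup(values):
    """How many rows share one value on average (the reciprocal of selectivity)."""
    return len(values) / len(set(values))


user_ids = list(range(10_000))                # hypothetical unique key
status = [i % 3 for i in range(10_000)]       # hypothetical 3-value status column
join_col = [i // 10 for i in range(10_000)]   # hypothetical: each value on 10 rows

for name, col in [("user_id", user_ids), ("status", status), ("join_col", join_col)]:
    print(name, selectivity(col), avg_rows_per_lookup(col), selectivity(col) >= 0.1)
```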

4.4. Indexed columns must not take part in calculations; keep the column "clean". For example, FROM_UNIXTIME(create_time) = '2014-05-29' cannot use an index. The reason is simple. The B+ tree stores the column's raw values, so evaluating the condition would require applying the function to every entry before comparing, which is clearly too expensive. The condition should instead be written as create_time = UNIX_TIMESTAMP('2014-05-29').
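The same idea applies when a query means "all rows on a given day". Instead of applying a function to every stored value, compute the boundaries once and search the sorted column directly. The Python sketch below contrasts the two approaches over hypothetical hourly timestamps. It makes two assumptions. First, it uses a one-day range, a variant of the single-value rewrite shown above. Second, it works in UTC, whereas MySQL's FROM_UNIXTIME() and UNIX_TIMESTAMP() use the session time zone.

```python
from bisect import bisect_left
from datetime import date, datetime, timezone


def day_of(ts):
    """Convert a Unix timestamp to a UTC calendar date."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).date()


def rows_on_day_function(timestamps, day):
    """Like wrapping create_time in a function: the function runs on every row."""
    return [ts for ts in timestamps if day_of(ts) == day]


def rows_on_day_range(sorted_timestamps, day):
    """Bounds computed once, then a seek in the sorted (indexed) column."""
    start = int(datetime(day.year, day.month, day.day, tzinfo=timezone.utc).timestamp())
    end = start + 86_400                       # exclusive upper bound: next midnight
    lo = bisect_left(sorted_timestamps, start)
    hi = bisect_left(sorted_timestamps, end)
    return sorted_timestamps[lo:hi]


base = int(datetime(2014, 5, 28, tzinfo=timezone.utc).timestamp())
create_time = [base + k * 3_600 for k in range(72)]   # hourly rows over 3 days
target = date(2014, 5, 29)
print(rows_on_day_function(create_time, target) == rows_on_day_range(create_time, target))
print(len(rows_on_day_range(create_time, target)))
```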

4.5. Extend existing indexes where possible instead of creating new ones. For example, if a table already has an index on a and you now want an index on (a, b), simply modify the existing index.

