Indexing is the most important tool for improving query speed. Other techniques exist, but correct use of indexes generally makes the biggest performance difference. On the MySQL mailing list, people often ask how to make queries run faster. In most cases the first thing to suspect is that the table has no indexes, and adding one usually resolves the problem immediately. It is not always that easy, because optimization is not always simple. Still, if you do not use indexes, in many cases you are wasting time trying to improve performance by other means. Use indexes first to get the largest gains, then see whether other techniques help.
This section describes what an index is and how indexes improve query performance. It also discusses situations in which indexes can degrade performance and gives guidelines for choosing indexes for your tables wisely. The next section discusses the MySQL query optimizer, which tries to find the most efficient way to execute a query. Knowing something about the optimizer, in addition to knowing how to create indexes, helps you make better use of the indexes you build. Some ways of writing queries actually prevent indexes from being used, and in general you should avoid them.
Benefits of indexing
Let's start by looking at how an index works, beginning with a table that has none. A table without an index is simply an unordered collection of rows. For example, the ad table shown in Figure 1 has no indexes, so to find the rows for a particular company you must check every row in the table to see whether it matches the target value. This is a full table scan, which is slow and inefficient when the table is large but contains only a few qualifying rows.
Figure 1: Ad table with no indexes
Figure 2 shows the same table with an index added on the Company_num column of the ad table. The index contains one entry for each row in the ad table, but the index entries are sorted by Company_num value. Now, instead of searching the table row by row for matching items, we use the index. Suppose we look up all the rows for company 13. We begin scanning the index and find three entries for that company. Then we reach the index entry for company 14, which is larger than the value we are searching for. Index values are sorted, so when we read the entry containing 14, we know there can be no more matches and we can stop. So one benefit of using the index is that we can tell where the matching rows end and skip the rest. Another benefit comes from positioning algorithms that find the first matching entry without a linear scan from the start of the index (for example, a binary search is much faster than a linear scan). In this way we can quickly locate the first matching value and save a great deal of search time. Databases use a variety of techniques to locate index values quickly, but the details do not matter in this article. The point is that it can be done, and that indexes are a good thing.
Figure 2: The Ad table after indexing
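The lookup for company 13 described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how MySQL implements its indexes: the row values, the function names, and the use of a sorted list with the standard bisect module are all assumptions made for the example.

```python
from bisect import bisect_left

# Hypothetical ad table rows: (company_num, ad_num, hit_fee), in no particular order.
ad_rows = [
    (14, 48, 0.23),
    (23, 49, 0.15),
    (17, 52, 0.25),
    (13, 55, 0.27),
    (97, 57, 0.02),
    (13, 59, 0.28),
    (14, 60, 0.31),
    (13, 61, 0.19),
]


def full_scan(rows, company):
    """Check every row; return matching row positions and rows examined."""
    matches = [pos for pos, row in enumerate(rows) if row[0] == company]
    return matches, len(rows)


def build_index(rows):
    """Return a sorted list of (company_num, row position), one entry per row."""
    return sorted((row[0], pos) for pos, row in enumerate(rows))


def index_lookup(index, company):
    """Binary-search to the first entry, then read until the value changes."""
    i = bisect_left(index, (company, -1))
    matches, examined = [], 0
    while i < len(index):
        examined += 1
        key, pos = index[i]
        if key != company:
            break  # entries are sorted, so no later entry can match
        matches.append(pos)
        i += 1
    return matches, examined


index = build_index(ad_rows)
scan_matches, scan_examined = full_scan(ad_rows, 13)
idx_matches, idx_examined = index_lookup(index, 13)
print(scan_matches, scan_examined)
print(idx_matches, idx_examined)
```

Both approaches return the same row positions. The full scan examines all eight rows, while the index lookup reads the three entries for company 13 plus the entry for company 14 that tells it to stop.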
You might ask why we don't simply sort the data rows and dispense with the index. Wouldn't that give the same search speedup? Yes, if the table has only one index, it might. But you may want to add a second index, and you cannot sort the data rows in two different ways at once (for example, you might want one index on customer name and another on customer ID number or phone number). Keeping index entries separate from the data rows solves this problem and allows multiple indexes to be created. In addition, index entries are generally shorter than data rows. When you insert or delete a value, it is easier to move shorter index values to maintain the sort order than to move longer data rows.
Some storage engines, such as MyISAM, keep data rows in a data file and index values in a separate index file. By contrast, the BDB and InnoDB storage engines do not separate data rows and index values in this way, although they also maintain each index as a sorted collection of values. By default, the BDB engine uses a single file per table to store both data and index values. InnoDB uses a single tablespace to manage data and index storage for all InnoDB tables. InnoDB can be configured to create each table in its own tablespace, but even then the data and indexes for a table are stored in the same tablespace file.
The preceding discussion describes the benefit of indexes for single-table queries, where an index speeds up searches significantly by eliminating the full table scan. Indexes are even more valuable for queries that join multiple tables. In a single-table query, the number of values you need to examine per column is the number of rows in the table. In a multiple-table query, the number of possible combinations rises sharply because it is the product of the numbers of rows in the tables.
Suppose you have three unindexed tables t1, t2, and t3, containing columns i1, i2, and i3, respectively, and each table holds 1000 rows with the values 1 through 1000. A query to find all combinations of rows with matching values might look like this:
SELECT t1.i1, t2.i2, t3.i3 FROM t1, t2, t3 WHERE t1.i1 = t2.i2 AND t2.i2 = t3.i3;
The result of this query should be 1000 rows, each containing three equal values. If the query is processed without indexes, we cannot know which rows contain which values without scanning the tables completely, so we must try all combinations to find those that match the WHERE clause. The number of possible combinations is 1000 × 1000 × 1000 (1 billion!), which is 1 million times the number of matching rows. That is a great deal of wasted work. The example also shows that, without indexes, the time needed to process a join grows faster and faster as the tables grow, resulting in poor performance. We can speed things up considerably by indexing the tables, because the indexes allow the query to be processed like this:
1. Select the first row from table t1 and see what value the row contains.
2. Use the index on table t2 to go directly to the row that matches the value from t1. Similarly, use the index on table t3 to go directly to the row that matches the value from t2.
3. Proceed to the next row of table t1 and repeat the preceding procedure. Do this until all rows in t1 have been examined.
In this case we still perform a full scan of table t1, but we can use index lookups on t2 and t3 to fetch rows from those tables directly. In theory, the query runs about 1 million times faster this way. The example is contrived to make a point, of course. However, the problem it illustrates is real, and adding indexes to a table that has none often produces a dramatic performance gain.
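The difference between the two strategies can be made concrete with a small Python model that counts the work each one does. It is a sketch under stated assumptions: the tables are plain lists, the function names are invented for the example, a dictionary stands in for each index (a hash lookup rather than the sorted structure described earlier), and each table holds 50 rows instead of 1000 so that the exhaustive version finishes quickly.

```python
n = 50  # the text uses 1000 rows per table; smaller here so the naive loop runs quickly

# Each table holds a single column with the values 1..n.
t1 = list(range(1, n + 1))
t2 = list(range(1, n + 1))
t3 = list(range(1, n + 1))


def join_without_index(t1, t2, t3):
    """Try every combination of rows and keep those satisfying the WHERE clause."""
    result, combinations = [], 0
    for a in t1:
        for b in t2:
            for c in t3:
                combinations += 1
                if a == b and b == c:
                    result.append((a, b, c))
    return result, combinations


def build_index(table):
    """Map each value to the positions of the rows that hold it."""
    index = {}
    for pos, value in enumerate(table):
        index.setdefault(value, []).append(pos)
    return index


def join_with_index(t1, t2, t3):
    """Scan t1 fully; use index lookups to reach matching rows in t2 and t3."""
    idx2, idx3 = build_index(t2), build_index(t3)
    result, lookups = [], 0
    for a in t1:
        lookups += 1  # one lookup in the t2 index
        for p2 in idx2.get(a, []):
            b = t2[p2]
            lookups += 1  # one lookup in the t3 index
            for p3 in idx3.get(b, []):
                result.append((a, b, t3[p3]))
    return result, lookups


slow_rows, combinations = join_without_index(t1, t2, t3)
fast_rows, lookups = join_with_index(t1, t2, t3)
print(len(slow_rows), combinations)
print(len(fast_rows), lookups)
```

Both functions return the same n rows. The exhaustive version examines n × n × n combinations, while the indexed version performs two lookups for each row of t1, so the gap widens rapidly as n grows.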
MySQL has several ways to use indexes:
· As described above, indexes speed up searches for rows that match the conditions of a WHERE clause, and for rows that match rows in other tables when a join is performed.
· For queries that use the MIN() or MAX() functions, the minimum or maximum value in an indexed column can be found quickly without checking every row.
· MySQL can often use indexes to perform the sorting and grouping required by ORDER BY and GROUP BY clauses quickly.
· Sometimes MySQL can use an index to read all the information a query needs. Suppose you select values from an indexed numeric column in a MyISAM table and do not select any other columns from the table. In this case, the values MySQL reads from the index file are the same values it would get by reading the data file. There is no reason to read the same values twice, so the data file need not be consulted at all.
Index creation rules:
1. The primary key and foreign keys of a table must be indexed.
These are some common criteria for creating indexes. In short, indexes must be created with care: the need for each index should be analyzed carefully, and there should be a sound basis for creating it.
Too many indexes, like too few or incorrect ones, hurt performance: each index on a table adds storage overhead and adds processing overhead to insert, delete, and update operations.
In addition, when single-column indexes already exist, a large number of composite indexes generally adds little value. On the contrary, they reduce performance when data is added or deleted, and the negative impact is greater for tables that are updated frequently.
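The write overhead mentioned above can be illustrated with a toy Python model in which every insert must also update each index on the table. The Table class, its column numbering, and the sample rows are hypothetical and exist only for this sketch. A real storage engine does considerably more work per index update than a single sorted-list insertion.

```python
from bisect import insort


class Table:
    """Toy table: a list of row tuples plus one sorted index per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = []
        self.indexes = {col: [] for col in indexed_columns}
        self.index_writes = 0  # number of index entries written so far

    def insert(self, row):
        pos = len(self.rows)
        self.rows.append(row)
        # Every index must receive a new entry, kept in sorted order.
        for col, entries in self.indexes.items():
            insort(entries, (row[col], pos))
            self.index_writes += 1


rows = [(3, "c", 30), (1, "a", 10), (2, "b", 20)]
lightly_indexed = Table(indexed_columns=[0])
heavily_indexed = Table(indexed_columns=[0, 1, 2])
for row in rows:
    lightly_indexed.insert(row)
    heavily_indexed.insert(row)
print(lightly_indexed.index_writes, heavily_indexed.index_writes)
```

With the same three rows inserted, the table with three indexes writes three times as many index entries as the table with one, and it stores three times as many entries as well.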
Reprint: http://dev.yesky.com/381/2108381.shtml?412
MySQL Query optimization: Working with indexes