The pros and cons of indexing and how to determine

8.4.5 The pros and cons of indexes and how to determine whether an index is needed

Most readers know that an index can greatly improve the efficiency of data retrieval and make queries run faster. What not everyone knows is that, while an index improves retrieval efficiency, it also has some negative effects on the database. The following is a brief analysis of the advantages and disadvantages of indexes in MySQL.

Benefits of indexing

Many readers may think the benefit of an index is simply that it "improves the efficiency of data retrieval and reduces the IO cost of the database."

Indeed, the greatest benefit of creating an index on a field of a table is that it can greatly improve retrieval efficiency, shorten retrieval time, and reduce the amount of data that has to be read during retrieval. But is improving retrieval efficiency the only benefit? Of course not. An index has another very important use: reducing the cost of sorting data.

The data in every index is stored sorted by the index key value. So when a query contains a sort operation and the sort field exactly matches the index key field, the MySQL Query Optimizer tells mysqld not to sort after fetching the data, because the data read from the index already satisfies the requested order.

What about a grouping operation? An index cannot complete a grouping operation directly. However, grouping first sorts and then groups, so when a query contains a grouping operation and the grouping fields match the index key fields, mysqld can likewise take advantage of the fact that the index is already ordered and skip the sort step of the grouping.

Sorting and grouping mainly consume memory and CPU, so using an index for these operations can greatly reduce CPU consumption.
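As a minimal sketch of the point above, EXPLAIN can be used to check whether MySQL needs an extra sort. The `orders` table, its columns, and the index name are hypothetical and only serve as an illustration. When the index order is usable, the Extra column of the plan would typically not report `Using filesort`, although the actual plan depends on the data and the server version.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE orders (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    customer_id INT UNSIGNED NOT NULL,
    amount      DECIMAL(10, 2) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_orders_customer (customer_id)
);

-- Sort field matches the index key: rows can be read in index order.
EXPLAIN SELECT customer_id FROM orders ORDER BY customer_id;

-- Grouping field matches the index key: grouping can reuse the index order.
EXPLAIN SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id;

-- Sort field is not indexed: the plan is expected to need an explicit sort.
EXPLAIN SELECT customer_id, amount FROM orders ORDER BY amount;
```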

Disadvantages of indexing

The benefits of indexing are clear, but we cannot look only at these benefits and treat indexes as the bible of query optimization, putting every condition in the WHERE clause into an index as soon as a query turns out not to be fast enough.

Indeed, an index can greatly improve the efficiency of data retrieval and the performance of sorting and grouping operations. But one problem cannot be ignored: an index is a separate set of data stored apart from the underlying table data. Suppose an index Idx_ta_ca is created on column CA of table TA. Any update to column CA then requires MySQL to update column CA in the table and also update the index data for column CA, adjusting the index entries whose key values changed because of the update. If column CA were not indexed, MySQL would only have to update column CA in the table. The most obvious cost is therefore the extra IO of the update and the extra computation needed to adjust the index. In addition, the index Idx_ta_ca occupies storage space, and as the volume of data in table TA grows, so does the space occupied by Idx_ta_ca. So an index also increases storage consumption.
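The cost described above can be sketched as follows. TA, CA, and Idx_ta_ca come from the text; the `id` and `CB` columns are assumptions added so the example is self-contained. The last query reads the approximate data and index sizes that MySQL reports in `information_schema.TABLES`.

```sql
-- TA, CA and Idx_ta_ca come from the text; id and CB are hypothetical.
CREATE TABLE TA (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    CA INT NOT NULL,
    CB VARCHAR(64) NOT NULL,
    PRIMARY KEY (id)
);
CREATE INDEX Idx_ta_ca ON TA (CA);

-- Changes CA: both the table row and the Idx_ta_ca entry must be maintained.
UPDATE TA SET CA = CA + 1 WHERE id = 1;

-- Changes only CB: Idx_ta_ca does not need to be adjusted.
UPDATE TA SET CB = 'x' WHERE id = 1;

-- Approximate space used by the table data versus its indexes.
SELECT table_name, data_length, index_length
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name = 'TA';
```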

How to determine if you need to create an index

After understanding the pros and cons of indexing, how do we determine if an index should be created?

In fact, there is no hard rule that defines which fields should be indexed and which should not, because application scenarios are too complex and differ too much. Still, a few basic decision strategies can help with the analysis.

1. Fields that are frequently used as query criteria should be indexed

The most effective way to improve query efficiency is to reduce the amount of data that has to be accessed, and as the benefits above show, an index is the most effective way to reduce the IO of a query that filters on the index key field. So in general, you should create indexes on fields that are frequently used as query criteria.

2. Fields with very poor uniqueness are not suitable for an index of their own, even if they are frequently used as query criteria

Which fields have poor uniqueness? Fields such as a status field or a type field, which hold only a handful or a few dozen distinct values in total, each repeated in thousands of records or more. There is absolutely no need to create a separate index on such a field. Even if the index is created, the MySQL Query Optimizer will not choose it most of the time, and if it does choose it, this can cause a serious performance problem. Because each key value in the index corresponds to a large number of records, the storage engine incurs a lot of random IO when it accesses data through the index, and even a lot of repeated IO.

This is mainly due to how data is accessed through an index scan. When we access table data through an index, MySQL visits rows in the order of the index key values. A data page usually stores many records, but most of the records on a given page may not match the key value being looked up.

Consider the following scenario: we use the index to look up data with key values A and B. After the first matching record for key value A is found, the storage engine reads data page X, where that record is located. It then continues through the index and finds another matching record for key value A, but this record is on data page Y rather than X, so the storage engine discards page X and reads page Y. This continues until all records for key value A have been found. Then it moves on to key value B and finds that the record it needs is on page X. But the previously read page X has already been discarded, so page X must be read again. At this point page X has been read twice. The same kind of repeated read may happen again and again during the lookup, which greatly increases the IO load on the storage engine.

Moreover, if a key value corresponds to too many records, that is, a lookup by this key value returns a large proportion of the table, overall IO performance degrades even without any repeated reads, because the random IO generated by an index scan is much less efficient than the sequential IO of a full table scan.

Many experienced query tuning experts say that when a query returns more than 15% of the rows in a table, it should not be completed with an index scan. We cannot say whether the "15%" figure is accurate, but it at least indirectly supports the point that fields with poor uniqueness are not suitable for indexing.
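One rough way to judge the uniqueness of a field before indexing it is to measure it directly. The sketch below assumes a hypothetical `tickets` table with a `status` column (both names are illustrative). A selectivity close to 1 means the values are almost unique, and a value close to 0 means a few values are repeated across many rows.

```sql
-- Hypothetical table with a low-uniqueness status column.
CREATE TABLE tickets (
    id     INT UNSIGNED NOT NULL AUTO_INCREMENT,
    status TINYINT UNSIGNED NOT NULL,
    title  VARCHAR(200) NOT NULL,
    PRIMARY KEY (id)
);

-- Ratio of distinct values to total rows for the candidate column.
SELECT COUNT(DISTINCT status)            AS distinct_values,
       COUNT(*)                          AS total_rows,
       COUNT(DISTINCT status) / COUNT(*) AS selectivity
FROM tickets;

-- Number of rows that a lookup on each single value would return.
SELECT status, COUNT(*) AS rows_per_value
FROM tickets
GROUP BY status
ORDER BY rows_per_value DESC;
```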

3. Fields that are updated very frequently are not suitable for indexing

As analyzed in the disadvantages above, when an indexed field is updated, MySQL must update not only the data in the table but also the index data, to keep the index accurate. This leads to a large increase in IO, which not only affects the response time of the update statement but also raises the resource consumption and load of the entire storage system.

Of course, this does not mean that every field that gets updated is unsuitable for indexing. Note the wording of this strategy: fields updated "very frequently". What update frequency counts as "very frequent"? Every second? Every minute? Every hour? To tell the truth, it is hard to define. It is usually judged by comparing the number of updates in a given period with the number of queries that use the field as a condition in the same period. If queries on the field are rare, perhaps running once every few hours or less, while updates are more frequent than queries, the field is certainly not suitable for indexing. Conversely, if queries on the field are frequent and updates are not, for example one update for every few dozen queries or more, I personally think the extra cost of the update is acceptable.

4. Fields that do not appear in the WHERE clause should not be indexed

Nobody should need to ask why. I also think this one goes without saying, haha.

8.4.6 Single-key index or composite index

With a general idea of the various types of MySQL indexes, the pros and cons of indexes, and how to judge whether a field needs an index, it is time to start creating indexes to optimize queries. In many cases, the filter conditions in the WHERE clause do not involve just a single field; several fields often appear together as filter conditions. You then have to decide whether to index only the field with the best filtering or to create a composite index on all the fields in the filter conditions.

For this kind of question it is hard to reach an absolute conclusion. You need to analyze it from several angles, weigh the pros and cons of the two schemes, and choose the better one. As the previous section showed, an index improves the performance of some queries but also makes certain updates less efficient. Because a composite index contains more than one field, the probability that it has to be updated is in theory much higher than for a single-key index, so its extra cost is also higher. However, when the conditions in a WHERE clause involve more than one field, a composite index made up of those fields is more efficient for the query than an index on only one of the fields. The data filtered through a single-key index is filtered only partially, so the storage engine has to access more records than with the composite index, which naturally means more data accessed and higher IO cost.

Some readers might say that you can create several single-key indexes instead. It is true that you can create a single-key index on each field in the WHERE clause. But does this really work? In this case, the MySQL Query Optimizer will most of the time select only one of the indexes and ignore the others. Even if it chooses to use two or more indexes through index_merge, the result may not be more efficient than using just one of the single-key indexes, because index_merge has to access several indexes and merge their results, which may cost more than using the single most efficient index.
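The comparison above can be tried out with a sketch like the following. The `users` table, its columns, the index names, and the literal values are all hypothetical. With only the two single-key indexes, the plan would typically show one chosen index or an `index_merge` access type. After the composite index is added, the optimizer can satisfy both conditions from a single index. Which plan is actually chosen depends on the data and the server version.

```sql
-- Hypothetical table with one single-key index per filter column.
CREATE TABLE users (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    city VARCHAR(32) NOT NULL,
    age  TINYINT UNSIGNED NOT NULL,
    PRIMARY KEY (id),
    KEY idx_city (city),
    KEY idx_age (age)
);

-- Only single-key indexes are available for the two conditions.
EXPLAIN SELECT id FROM users WHERE city = 'Hangzhou' AND age = 30;

-- Add a composite index covering both filter columns.
ALTER TABLE users ADD KEY idx_city_age (city, age);

-- Both conditions can now be resolved inside one index.
EXPLAIN SELECT id FROM users WHERE city = 'Hangzhou' AND age = 30;
```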

In typical application scenarios, unless one of the filter fields can already filter out more than 90% of the data in most cases while the other filter fields are updated frequently, I generally lean toward creating a composite index, especially under high concurrency. When concurrency is high, even a small IO saving per query adds up to a significant total because the query is executed so many times.

Of course, creating a composite index does not mean putting every field of a query condition into one index. You should also try to make one index serve several queries, keeping the number of indexes on a table as small as possible. This reduces the index maintenance cost caused by data updates and also reduces the storage space consumed by indexes.

In addition, MySQL provides another feature for optimizing indexes: the prefix index. In MySQL, you can index just the leading part of a field, which reduces the storage space occupied by the index and improves the efficiency of index access. Of course, a prefix index only suits fields whose prefix is fairly random and rarely repeated. If the prefix of the field contains many duplicates, the filtering power of the index drops and the amount of data accessed through the index increases. The prefix index then still saves storage space, but query efficiency may fall sharply, and the gain is not worth the loss.
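A prefix index might be evaluated and created as in the sketch below. The `members` table, the `email` column, and the prefix lengths 8 and 16 are assumptions chosen for illustration. The first query compares how much filtering power is lost at each prefix length, and the prefix length in the CREATE INDEX statement would be picked based on that comparison.

```sql
-- Hypothetical table with a long string column.
CREATE TABLE members (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    email VARCHAR(255) NOT NULL,
    PRIMARY KEY (id)
);

-- Compare the uniqueness of the full column with that of two prefixes.
SELECT COUNT(DISTINCT email) / COUNT(*)           AS full_selectivity,
       COUNT(DISTINCT LEFT(email, 8)) / COUNT(*)  AS prefix8_selectivity,
       COUNT(DISTINCT LEFT(email, 16)) / COUNT(*) AS prefix16_selectivity
FROM members;

-- Index only the first 16 characters of the column.
CREATE INDEX idx_members_email_prefix ON members (email(16));
```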

Excerpt from http://www.canphp.com/article/show-130.html

Composite Index Optimization

An index on two or more columns is called a composite index.
With additional columns in the index you can narrow the search, but using one index on two columns is different from using two separate indexes. The structure of a composite index is similar to a phone book in which a person's name consists of a last name and a first name: the phone book is sorted first by last name, and people with the same last name are then sorted by first name. The phone book is useful if you know the last name and even more useful if you know both the last name and the first name, but it is useless if you know only the first name.
So when you create a composite index, consider the order of the columns carefully. A composite index is useful when you search on all of its columns or only on its first few columns; it is useless when you search only on the later columns.
For example, create a composite index on name, age, and sex.

The principle of establishing a composite index:

If you are likely to search on only one of the columns much of the time, that column should be the first column in the composite index. If you are likely to search separately on each of the two columns of a two-column index, you should create another index that contains only the second column.
As in the example above, if a query needs to filter on age and sex, you should create a new composite index on age and sex.
A primary key that contains more than one column always automatically creates an index in the form of a composite index, whose columns are in the order specified in the primary key definition rather than the order in which they appear in the table definition. When you consider the searches that will be performed through the primary key in the future, decide which column should come first.
Note that a composite index should contain only a few columns, and they should be columns that are often used in SELECT queries. Having too many columns in a composite index brings little additional benefit, and because storing the values of all those columns takes quite a lot of memory, the result can be memory pressure and degraded performance.
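The name, age, sex example might look like the following sketch. The `people` table, the index names, and the literal values are hypothetical. The comments describe which conditions can seek into the (name, age, sex) index through its leading columns; a condition on age and sex alone cannot, which is why a second index is added for it.

```sql
-- Hypothetical table for the (name, age, sex) example.
CREATE TABLE people (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(64) NOT NULL,
    age  TINYINT UNSIGNED NOT NULL,
    sex  CHAR(1) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_name_age_sex (name, age, sex)
);

-- These conditions start with the first index column, so they can seek into it.
EXPLAIN SELECT id FROM people WHERE name = 'Li';
EXPLAIN SELECT id FROM people WHERE name = 'Li' AND age = 30;

-- This condition skips name, so it cannot seek into idx_name_age_sex.
EXPLAIN SELECT id FROM people WHERE age = 30 AND sex = 'F';

-- A separate composite index serves searches on age and sex.
CREATE INDEX idx_age_sex ON people (age, sex);
EXPLAIN SELECT id FROM people WHERE age = 30 AND sex = 'F';
```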

Optimizing the sorting of composite indexes:

A composite index can optimize only ORDER BY clauses whose column order and sort directions are exactly the same as, or exactly the opposite of, the index definition.
When you create a composite index, each column is defined as ascending or descending. For example, define a composite index:
SQL code
CREATE INDEX Idx_example
ON table1 (col1 ASC, col2 DESC, col3 ASC)

The three columns are col1 ascending, col2 descending, and col3 ascending. Now suppose we execute two queries:
1: SELECT col1, col2, col3 FROM table1 ORDER BY col1 ASC, col2 DESC, col3 ASC
Same as the index order.
2: SELECT col1, col2, col3 FROM table1 ORDER BY col1 DESC, col2 ASC, col3 DESC
Exactly opposite to the index order.
Both query 1 and query 2 can be optimized by the composite index.
If the query is:
SELECT col1, col2, col3 FROM table1 ORDER BY col1 ASC, col2 ASC, col3 ASC
the sort order is neither the same as nor the opposite of the index order, so the query cannot be optimized by the composite index.

The role of the query optimizer in the WHERE query:

If a multiple-column index exists on columns col1 and col2, the following statement can fetch the matching rows directly through it: SELECT * FROM table WHERE col1 = val1 AND col2 = val2. If separate single-column indexes exist on col1 and col2 instead, the query optimizer tries to determine which index will find fewer rows and then uses that index to fetch the rows.
1. If a multiple-column index exists, any leftmost prefix of the index can be used by the optimizer. The column order of a composite index therefore affects whether the index can be chosen; try to put the column that narrows the result to fewer rows first.
For example, with a multiple-column index on (col1, col2, col3),
the index is usable for searches on (col1), (col1, col2), and (col1, col2, col3).

SQL code
SELECT * FROM TB WHERE col1 = val1
SELECT * FROM TB WHERE col1 = val1 AND col2 = val2
SELECT * FROM TB WHERE col1 = val1 AND col2 = val2 AND col3 = val3

2. If the columns in the condition do not form a leftmost prefix of the index, the index will not be used.
For example:
SQL code
SELECT * FROM TB WHERE col3 = val3
SELECT * FROM TB WHERE col2 = val2
SELECT * FROM TB WHERE col2 = val2 AND col3 = val3

3. An index can be used if the pattern in a LIKE condition does not start with a wildcard character.
For example, LIKE '%car' and LIKE '%car%' do not use the index.
LIKE 'car%' uses the index.
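The LIKE rule in item 3 can be checked with EXPLAIN, as in this minimal sketch. The `products` table, its columns, and the index name are hypothetical. A pattern with a fixed leading part can be turned into a range over the index, while a pattern that starts with a wildcard cannot.

```sql
-- Hypothetical table with an index on the searched column.
CREATE TABLE products (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_products_name (name)
);

-- Fixed prefix: the index can be used as a range lookup.
EXPLAIN SELECT id FROM products WHERE name LIKE 'car%';

-- Leading wildcard: the index cannot be used to narrow the search.
EXPLAIN SELECT id FROM products WHERE name LIKE '%car';
EXPLAIN SELECT id FROM products WHERE name LIKE '%car%';
```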
Disadvantages of indexing:
1. Indexes occupy disk space.
2. Indexes increase the time of insert and delete operations. The more indexes a table has, the slower inserts and deletes become. A system that requires fast data entry should not have too many indexes.

The following are some common index restriction issues (the examples use Oracle syntax)

1. Using the not-equal operator (<>, !=)
In the following case, even if there is an index on the column staff_num, the query still performs a full table scan:
SELECT * FROM dept WHERE staff_num <> 1000;
But development sometimes does need such a query. Is there a solution?
Yes.
You can avoid the full table scan by replacing the not-equal condition with an OR of two range conditions. The statement above changes to the following, which can use the index:
SQL code
SELECT * FROM dept WHERE staff_num < 1000 OR staff_num > 1000;

2. Using IS NULL or IS NOT NULL
Using IS NULL or IS NOT NULL also restricts the use of indexes, because the database treats NULL as an undefined value. If the indexed column contains many NULLs, the index will not be used (unless it is a bitmap index; bitmap indexes will be explained in detail in a future blog post). Using NULL in SQL statements can cause a lot of trouble.
The solution is to define the columns that need to be indexed as NOT NULL when you create the table.

3. Using functions
Unless a function-based index exists, the optimizer ignores an index when a function is applied to the indexed column in the WHERE clause. The following query will not use the index:
SQL code
SELECT * FROM staff WHERE TRUNC(birthdate) = '01-MAY-82';

But if the function is applied to the value in the condition rather than to the column, the index can take effect. Changed to the following statement, the query can search through the index:
SQL code
SELECT * FROM staff WHERE birthdate >= TO_DATE('01-MAY-82') AND birthdate < (TO_DATE('01-MAY-82') + 0.9999);
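The same idea can be sketched in MySQL syntax, since the rest of this article is about MySQL. The `staff` table definition, the index name, and the date literals are assumptions for illustration. The first query wraps the indexed column in a function, so it cannot seek into the index. The second expresses the same day as a range on the bare column, which can.

```sql
-- Hypothetical MySQL version of the staff table.
CREATE TABLE staff (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
    birthdate DATETIME NOT NULL,
    PRIMARY KEY (id),
    KEY idx_staff_birthdate (birthdate)
);

-- Function applied to the indexed column: no range lookup on the index.
EXPLAIN SELECT id FROM staff WHERE DATE(birthdate) = '1982-05-01';

-- Same day expressed as a range on the bare column: the index can be used.
EXPLAIN SELECT id FROM staff
WHERE birthdate >= '1982-05-01' AND birthdate < '1982-05-02';
```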

4. Comparing mismatched data types
Comparisons between mismatched data types are among the harder performance problems to spot.
In the following example, dept_id is a VARCHAR2 field with an index on it, but the statement performs a full table scan:
SQL code
SELECT * FROM dept WHERE dept_id = 900198;

This is because Oracle automatically converts the WHERE clause to TO_NUMBER(dept_id) = 900198, which is exactly the situation described in item 3 and prevents the index from being used.
Changing the SQL statement to the following form allows the index to be used:
SQL code
SELECT * FROM dept WHERE dept_id = '900198';
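A similar pitfall exists in MySQL when a string column is compared with a numeric constant, and it can be checked with EXPLAIN. The sketch below assumes a hypothetical MySQL version of the `dept` table; the column sizes and the index name are illustrative.

```sql
-- Hypothetical MySQL version of the dept table with a string key.
CREATE TABLE dept (
    dept_id   VARCHAR(16) NOT NULL,
    dept_name VARCHAR(64) NOT NULL,
    KEY idx_dept_id (dept_id)
);

-- String column compared with a number: the values must be converted,
-- so the index on dept_id is not expected to be used for the lookup.
EXPLAIN SELECT * FROM dept WHERE dept_id = 900198;

-- String column compared with a string: the index can be used.
EXPLAIN SELECT * FROM dept WHERE dept_id = '900198';
```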

Finally, here are a few more things to note:

From Lao Wang's blog (http://hi.baidu.com/thinkinginlamp/blog/item/9940728be3986015c8fc7a85.html)

For example, suppose there is an articles table and we want to display a list of the articles in a category in reverse chronological order:

SELECT * FROM articles WHERE category_id = ... ORDER BY created DESC LIMIT ...

This kind of query is very common; almost every application contains a lot of similar SQL. Academically minded readers who see the SQL above may say that SELECT * is bad and that only the required fields should be queried. So let's go all the way and change the SQL to the following form:

SELECT id FROM articles WHERE category_id = ... ORDER BY created DESC LIMIT ...

We assume that id is the primary key. The article content itself can be stored in a key-value cache such as memcached, so the academically minded readers should have nothing left to pick on. Now let's consider how to index this SQL:

Leaving aside special circumstances such as data distribution, any competent web developer knows that this SQL calls for a composite index on "category_id, created". But is that the best answer? Not necessarily. It is time to look back at the title: MySQL. When indexing, you should consider the type of storage engine.

If the storage engine is InnoDB, a composite index on "category_id, created" is the best answer. Look at the InnoDB index structure, which has one special feature: the leaf nodes of the B-tree of a non-primary-key index also store the corresponding primary key value. The most direct benefit is a covering index: there is no need to go to the data file to fetch the value of id, because it can be read directly from the index.

If the storage engine is MyISAM, a composite index on "category_id, created" is not the best answer. In the MyISAM index structure, a non-primary-key index does not additionally store the corresponding primary key value, so to get a covering index you should create a composite index on "category_id, created, id".
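The two cases can be sketched as follows. The table definitions, the index names, the second table name `articles_myisam`, and the literal values 1 and 10 are assumptions for illustration; the original query leaves those values elided. When an index covers the query, the Extra column of EXPLAIN would typically report `Using index`.

```sql
-- InnoDB: the secondary index already carries the primary key id.
CREATE TABLE articles (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    category_id INT UNSIGNED NOT NULL,
    created     DATETIME NOT NULL,
    PRIMARY KEY (id),
    KEY idx_category_created (category_id, created)
) ENGINE = InnoDB;

EXPLAIN SELECT id FROM articles
WHERE category_id = 1 ORDER BY created DESC LIMIT 10;

-- MyISAM: id must be added to the index explicitly to cover the query.
CREATE TABLE articles_myisam (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    category_id INT UNSIGNED NOT NULL,
    created     DATETIME NOT NULL,
    PRIMARY KEY (id),
    KEY idx_category_created_id (category_id, created, id)
) ENGINE = MyISAM;

EXPLAIN SELECT id FROM articles_myisam
WHERE category_id = 1 ORDER BY created DESC LIMIT 10;
```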

That's enough chatter; you should see what I mean. I hope that in the future you will think about indexes more comprehensively. There are many similar issues in practice. For example, most people judge whether an index is appropriate only from its cardinality (SHOW INDEX FROM ...). Cardinality is the number of unique values. In general, if the unique values make up less than 20% of the total number of rows, the cardinality can be considered too small, and the index does little for SELECT apart from slowing down INSERT, UPDATE, and DELETE. Another detail is that the effect of the character set is often overlooked when indexing. For a username field that allows only English letters, underscores, and similar symbols, do not use a character set such as GBK or UTF-8; use a simple character set such as Latin1 or ASCII instead. The index file will be much smaller and the speed will naturally be much faster. These details are left for readers to pay attention to themselves, and I will not say more.
Excerpt from http://blog.chinaunix.net/uid-7692530-id-2567605.html
For statements with two conditions joined by AND, a multi-column index has some advantage when the two columns are weakly correlated.
For statements with two conditions joined by AND, a multi-column index has a great advantage when the two columns are strongly correlated.
For statements with two conditions joined by OR, separate single-column indexes have the advantage: a multi-column index results in a full table scan in this case, whereas separate indexes can be used with the index merge optimization.
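The OR case can be tried with a sketch like the one below. The `events` table, its columns, the index names, and the literal values are hypothetical. With one single-column index per condition, the plan may show the `index_merge` access type, whereas a single multi-column index on (user_id, device_id) could not be used to seek on the second condition of the OR.

```sql
-- Hypothetical table with one single-column index per OR condition.
CREATE TABLE events (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id   INT UNSIGNED NOT NULL,
    device_id INT UNSIGNED NOT NULL,
    payload   VARCHAR(255) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_events_user (user_id),
    KEY idx_events_device (device_id)
);

-- Each side of the OR can be resolved by its own index and the results merged.
EXPLAIN SELECT * FROM events WHERE user_id = 1 OR device_id = 2;
```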

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
