The benefits and disadvantages of database indexing, and how to decide whether a field needs an index

Last Update: 2014-11-27 Source: Internet

Author: User

Tags mysql query


Article Source: http://chenwei.me/p/24.html

Benefits of indexing

Many readers think the benefit of an index is simply "improving data retrieval efficiency and reducing the database's IO cost." Indeed, the biggest benefit of creating an index on a column of a table is that, when the column is used as a retrieval condition, the index greatly improves retrieval efficiency, shortens retrieval time, and reduces the amount of data that has to be read during retrieval.

But is faster retrieval of table data the only benefit of an index? Of course not. An index has another very important use: reducing the cost of sorting data. The data in every index is kept sorted by index key value, so when a query contains a sort operation and the sort columns exactly match the index key columns, the MySQL Query Optimizer tells mysqld that no sort is needed after the data is fetched, because the data obtained through the index already satisfies the requested order.

What about grouping operations? A grouping operation cannot be completed directly by the index. However, grouping first sorts and then groups, so when a query contains a grouping operation and the grouping columns exactly match the index key columns, mysqld can likewise take advantage of the index already being sorted and skip the sort step of the grouping. Sorting and grouping mainly consume memory and CPU, so if an index can be used for these operations, CPU consumption drops considerably.

Disadvantages of indexing

The benefits of indexing are clear, but we should not look only at them and treat indexes as the bible of query optimization, adding the WHERE-clause conditions to an index whenever a query does not run fast enough.
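Before weighing those costs, the sort- and group-avoidance benefit described under "Benefits of indexing" can be seen in a query plan. The sketch below is a hedged illustration that swaps in SQLite (through Python's built-in sqlite3 module) for MySQL so it runs without a server; the names ta, ca and idx_ta_ca borrow TA/CA/IDX_TA_CA from the example in the next paragraph, and the data is made up. The queries select only ca so that the index alone covers them, which keeps the planner's choice predictable.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Join the 'detail' column of EXPLAIN QUERY PLAN into one readable line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ta (id INTEGER PRIMARY KEY, ca INTEGER, cb TEXT)")
conn.executemany(
    "INSERT INTO ta (ca, cb) VALUES (?, ?)",
    [(i % 50, f"row-{i}") for i in range(1000)],  # made-up data
)

sort_sql = "SELECT ca FROM ta ORDER BY ca"
group_sql = "SELECT ca, COUNT(*) FROM ta GROUP BY ca"

# Without an index on ca, SQLite builds a temporary B-tree to sort or group.
sort_before = query_plan(conn, sort_sql)
group_before = query_plan(conn, group_sql)

conn.execute("CREATE INDEX idx_ta_ca ON ta (ca)")

# With the index, rows are read in key order, so the extra sort step is skipped.
sort_after = query_plan(conn, sort_sql)
group_after = query_plan(conn, group_sql)

for label, plan in [
    ("ORDER BY, no index", sort_before),
    ("GROUP BY, no index", group_before),
    ("ORDER BY, with index", sort_after),
    ("GROUP BY, with index", group_after),
]:
    print(f"{label:22} {plan}")
```

In MySQL itself, the analogous signal is the Extra column of EXPLAIN, which typically shows "Using filesort" (and for GROUP BY often also "Using temporary") when no index supplies the order. Exact plan text differs between database versions, so treat the output as indicative rather than definitive.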
It is true that indexes can greatly improve data retrieval and the performance of sorting and grouping, but one problem cannot be overlooked: an index is a separate body of data, stored entirely apart from the table's underlying data. Suppose column CA in table TA has an index IDX_TA_CA. Then for any update to CA, MySQL must update the index data for CA and adjust the index information, in addition to updating CA in the table itself. If CA were not indexed, MySQL would only have to update CA in the table. The most obvious resource cost is therefore the extra IO generated by each update and the extra computation needed to adjust the index. In addition, IDX_TA_CA occupies storage space of its own, and that space grows as the data volume of TA grows, so an index also increases storage consumption.

How to determine whether to create an index

Having understood the pros and cons of indexes, how do we decide whether an index should be created? In fact, there is no clear rule that defines exactly which fields should be indexed and which should not, because application scenarios are too complex and differ too much. Still, a few basic decision strategies can help with the analysis.

1. Fields that are frequently used as query conditions should be indexed

The most effective way to improve query efficiency is to reduce the amount of data that has to be accessed. From the benefits above, we know that an index is the most effective means of reducing the IO of queries that use the index key column as a search condition. Therefore, fields that are frequently used as query conditions should generally be indexed.

2. Fields with poor uniqueness are not suitable for an index of their own, even if they are frequently used as query conditions

Which fields have poor uniqueness? Status fields, type fields and the like, whose stored values repeat: the column holds only a handful, or a few dozen, distinct values, and each value appears in thousands of records or more. There is no need at all to create a standalone index on such a field. Even if the index is created, the MySQL Query Optimizer will usually not choose it, and if it does, I regret to say this can cause serious performance problems. Because each value of the indexed field matches a large number of records, the storage engine may generate a lot of random IO when accessing data through the index, sometimes even a lot of repeated IO.

This follows from how an index scan reads data. When table data is accessed through an index, MySQL visits the rows in the order of the index key values. Each data page generally stores more than one record, but most of those records are probably not in the order of the index key you are using. Consider the following scenario: we use the index to find the records with key values A and B. After finding the first matching record for key A, the storage engine reads data page X, which holds that record. It then continues through the index and finds another record for key A, but that record is on data page Y rather than X, so the storage engine discards page X and reads page Y. This continues until all records for key A have been found.
It then moves on to key value B and finds that the needed record is on page X. Because the previously read page X has already been discarded, it has to read page X again, so page X has now been read twice. Such repeated reads can happen again and again during later lookups, which greatly increases the IO load on the storage engine.

Moreover, if a key value matches too many records, that is, if a single key value returns a large proportion of the table's records, the random IO produced by the index scan is far less efficient than the sequential IO of a full table scan, so overall IO performance drops even without any repeated reads. Many experienced query tuning experts say that when a query returns more than 15% of the table's records, it should not be answered with an index scan. We cannot confirm whether the figure of 15% is accurate, but it at least shows that fields with very poor uniqueness are not suitable for indexing.

3. Fields that are updated very frequently are not suitable for indexing

As analyzed under the disadvantages of indexing, when a field in an index is updated, not only the table data but also the index data must be updated to keep the index accurate. This greatly increases IO, which affects not only the response time of the update query but also the resource consumption and load of the whole storage system. Of course, this does not mean that any field that is ever updated is unsuitable for an index; as the wording of this strategy says, it applies to fields that are updated "very frequently". So what update frequency counts as "very frequent"? Once per second? Once per minute? Once per hour? To be honest, it is hard to define.
In most cases the judgment is made by comparing, over the same period, the number of updates with the number of queries that use the field as a condition. If few queries go through the field, say one every few hours or less often, while updates are more frequent than queries, the field is certainly not suitable for an index. Conversely, if the field is used in queries frequently but updated rarely, for example one update for every few dozen or more queries, I personally think the extra update cost is acceptable.

4. Fields that do not appear in the WHERE clause should not be indexed
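Returning to rule 2, the X/Y page example can be replayed with a small simulation. This is a simplified model, not how InnoDB actually works: it assumes the storage engine holds only one data page in memory at a time (a real buffer pool may well still contain page X), and the row and page names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def pages_read_via_index(
    index_entries: List[Tuple[str, str]],
    row_to_page: Dict[str, str],
) -> List[str]:
    """Follow index entries in key order while keeping only one data page in memory.

    Returns the sequence of data pages the storage engine has to read.
    """
    reads: List[str] = []
    buffered: Optional[str] = None  # the single page currently held in memory
    for _key, row_id in sorted(index_entries):  # index order = key order
        page = row_to_page[row_id]
        if page != buffered:
            reads.append(page)  # page not in memory: read it, discarding the old one
            buffered = page
    return reads


def pages_read_via_full_scan(row_to_page: Dict[str, str]) -> List[str]:
    """A full table scan walks the pages sequentially, reading each one once."""
    return sorted(set(row_to_page.values()))


# The scenario from rule 2: key A has rows on pages X and Y, key B has a row on page X.
row_to_page = {"a1": "X", "a2": "Y", "b1": "X"}
index_entries = [("A", "a1"), ("A", "a2"), ("B", "b1")]

index_reads = pages_read_via_index(index_entries, row_to_page)
scan_reads = pages_read_via_full_scan(row_to_page)
print("index scan reads:", index_reads)  # ['X', 'Y', 'X']
print("full scan reads: ", scan_reads)   # ['X', 'Y']
```

The index path reads page X twice, while the sequential scan touches each page once; with many rows per key value, this gap is what makes an index on a low-uniqueness field costly.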
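Rule 2's "poor uniqueness" and the 15% rule of thumb can also be checked with simple arithmetic. The sketch below assumes uniqueness is measured as selectivity, the number of distinct values divided by the number of rows (a common measure, not one the article defines), and uses made-up status and user_id columns; the 15% threshold is the experts' rule of thumb quoted under rule 2, which the article itself does not vouch for.

```python
from collections import Counter
from collections.abc import Hashable


def selectivity(values: list[Hashable]) -> float:
    """Distinct values divided by row count: near 1.0 is highly unique, near 0 is poor."""
    return len(set(values)) / len(values)


def values_over_threshold(values: list[Hashable], threshold: float = 0.15) -> set[Hashable]:
    """Values whose rows make up more than `threshold` of the table.

    Lookups on these values return such a large share of the table that, by the
    rule of thumb, a full table scan is likely the better plan.
    """
    total = len(values)
    return {value for value, count in Counter(values).items() if count / total > threshold}


# Hypothetical columns of a 10,000-row table.
status = ["active"] * 7000 + ["suspended"] * 2000 + ["deleted"] * 1000
user_id = list(range(10000))

print(f"status selectivity:  {selectivity(status):.4f}")
print(f"user_id selectivity: {selectivity(user_id):.4f}")
print("status values above 15%:", sorted(values_over_threshold(status)))
```

Here status has a selectivity of 0.0003 and two of its three values each cover more than 15% of the rows, so a standalone index on it would rarely help; only lookups on "deleted" (10% of the rows) stay under the threshold.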
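The update overhead described under the disadvantages of indexing, and recalled in rule 3, can be sketched with a toy model. This is a hypothetical illustration, not how MySQL stores data: a sorted Python list of (CA, row_id) pairs stands in for IDX_TA_CA, and the writes counter is an assumed proxy for the extra IO.

```python
from __future__ import annotations

import bisect


class ToyTable:
    """Hypothetical in-memory model of table TA with an optional index on column CA.

    `writes` counts every structure that has to be modified; it stands in for
    the extra IO described in the article and is not a real MySQL metric.
    """

    def __init__(self, indexed: bool) -> None:
        self.rows: dict[int, int] = {}  # row_id -> CA value
        self.index: list[tuple[int, int]] | None = [] if indexed else None  # sorted (CA, row_id)
        self.writes = 0

    def insert(self, row_id: int, ca: int) -> None:
        self.rows[row_id] = ca
        self.writes += 1
        if self.index is not None:
            bisect.insort(self.index, (ca, row_id))  # keep the index in key order
            self.writes += 1

    def update_ca(self, row_id: int, new_ca: int) -> None:
        old_ca = self.rows[row_id]
        self.rows[row_id] = new_ca
        self.writes += 1
        if self.index is not None:
            # The index entry must move to the position matching its new key value.
            pos = bisect.bisect_left(self.index, (old_ca, row_id))
            del self.index[pos]
            bisect.insort(self.index, (new_ca, row_id))
            self.writes += 1


plain = ToyTable(indexed=False)
indexed = ToyTable(indexed=True)
for table in (plain, indexed):
    for row_id in range(100):
        table.insert(row_id, row_id % 10)
    table.writes = 0  # count only the update phase
    for row_id in range(100):
        table.update_ca(row_id, row_id % 10 + 1)

print("writes without index:", plain.writes)        # 100
print("writes with index:   ", indexed.writes)      # 200
print("index entries stored:", len(indexed.index))  # 100
```

With the same 100 updates the indexed table performs twice as many writes, and its index holds one entry per row, so it grows with the table. Rule 3's trade-off is whether the queries that use the index run often enough, relative to these updates, to pay for that overhead.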


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
