In the previous section I took over an optimization project: the company needed the indexes on its existing tables optimized, improving the useful ones and eliminating the invalid ones. Given the serious manpower shortage (we counted roughly two and a half people's worth of resources at the time), the idea of changing the databases by hand, one at a time, was abandoned. The current program structure does not allow for revolutionary changes, only incremental improvement, so we had to build an automated tool. After the prototype was developed, it met resistance from the O&M DBA team in a meeting, for reasons unknown, and the project was postponed indefinitely. Here I share our approach; comments and criticism are welcome.
The overall idea: indexes exist to serve queries, but inappropriate indexes hurt inserts and updates. The existing indexes on a table are not necessarily the right ones for how it is actually used, and merely patching them based on current data usage is not enough. Instead, build new indexes from the actual queries, and drop every existing index that does not match. This is quite risky on a running system, but it can be validated with a careful trial first.
The specific solution is as follows:
First, periodically capture the cached execution plans on the hot-standby database server (I originally wanted to capture the raw SQL text, but found that some statements simply cannot be parsed automatically), and dump them, together with their execution counts and the tables' statistics, into data tables on a backup server.
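As a minimal sketch of this capture step: assuming the plans are pulled from SQL Server's plan-cache DMVs (the project's actual code is not published), the query below joins cached plan statistics with the showplan XML. Connection handling (e.g. via pyodbc) is deliberately omitted.

```python
# Sketch: the T-SQL used to pull cached plans plus execution counts from
# the plan-cache DMVs on the hot-standby server. Only the query is built
# here; opening a connection is left to the caller.
CAPTURE_QUERY = """
SELECT
    qs.execution_count,          -- how many times the cached plan ran
    qs.last_execution_time,
    qp.query_plan                -- the showplan XML we parse later
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan IS NOT NULL
"""

def capture_plans(cursor):
    """Run the capture query on an open DB-API cursor and return all rows,
    ready to be dumped into the backup server's data tables."""
    cursor.execute(CAPTURE_QUERY)
    return cursor.fetchall()
```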
Once execution plans have accumulated over several capture rounds, parsing begins. An execution plan is a well-formed XML document, and Microsoft provides an XSD for it, so we can reverse-engineer the SQL predicates that each node corresponds to (I have found no official documentation for this XSD, so the associations can only be reverse-engineered). For index creation we care about three kinds of predicates: select, join, and where. Once we have these, we can build a good index. The principle is very simple: join and where columns form the basis of the index key, and select columns can go into the index's INCLUDE list.
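A minimal sketch of pulling column references out of the plan XML; the fragment and the table/column names are invented for illustration, but the namespace is the real one the showplan XSD describes.

```python
import xml.etree.ElementTree as ET

# Real namespace used by SQL Server showplan XML documents.
NS = "{http://schemas.microsoft.com/sqlserver/2004/07/showplan}"

# Hand-written fragment shaped like a plan's OutputList; names invented.
SAMPLE = """
<OutputList xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <ColumnReference Database="[demo]" Table="[tablea]" Column="col1"/>
  <ColumnReference Database="[demo]" Table="[tablea]" Column="col2"/>
</OutputList>
"""

def columns_of(node):
    """Collect (table, column) pairs from every ColumnReference under node."""
    return [(c.get("Table"), c.get("Column"))
            for c in node.iter(NS + "ColumnReference")]

root = ET.fromstring(SAMPLE)
print(columns_of(root))  # [('[tablea]', 'col1'), ('[tablea]', 'col2')]
```

The same walk, pointed at join and where nodes instead of the output list, yields the key-column candidates.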
During parsing, all execution plans are aggregated together rather than processed individually. The advantage is that we learn which table columns are referenced most and which columns are queried repeatedly. For example, we may conclude that over a day's business, column col1 of tablea is joined 10 million times and appears in WHERE 20,000 times, while col2 is selected 100 times and rarely appears in WHERE. In this way we create indexes per table rather than per individual query. The final index weighs both execution frequency and query importance: if a business query is especially important but not executed often, we can assign it a weight so its index is still created with priority. Of course, creating an index also requires consulting the table's data distribution to decide the order of the columns within the index.
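The aggregation can be sketched as a weighted tally per (table, column); the per-kind weights below are hypothetical placeholders for the business weighting described above, and the counts echo the example in the text.

```python
from collections import defaultdict

# Each record: (table, column, predicate kind, count) harvested from the
# accumulated execution plans. Numbers mirror the example above.
observations = [
    ("tablea", "col1", "join",   10_000_000),
    ("tablea", "col1", "where",  20_000),
    ("tablea", "col2", "select", 100),
]

# Hypothetical weights: key predicates (join/where) count fully, select
# columns much less; an important business query could get a boost here.
WEIGHT = {"join": 1.0, "where": 1.0, "select": 0.1}

def score(observations):
    """Aggregate weighted usage per (table, column) across all plans."""
    totals = defaultdict(float)
    for table, col, kind, count in observations:
        totals[(table, col)] += count * WEIGHT[kind]
    return dict(totals)

print(score(observations))
```

Ranking columns by this score gives the per-table view the text describes, instead of a per-query one.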
Now the preparation is complete and it is time to create indexes. What we have: the tables' data distributions, the number of times each table column is queried (select, join, where), and the counts of those SQL predicates. My first idea was to analyze column by column, consider every possibility, and then create the indexes; it turns out that method only works for a human brain, and making it feasible for a computer would first require raising its IQ above 120. Reverse thinking proved useful here too: since we cannot create the single most appropriate index in one shot, we create all the candidate index combinations drawn from the execution plans, with all join and where columns placed in the index key. For example:
SELECT t1.a, t1.b, t1.c, t2.j, t2.k FROM Table1 t1 JOIN Table1 t2 ON t1.a = t2.j WHERE t1.a = 'param'
The initial indexes are:
index (a) include (b, c) and index (j) include (k)
If a selected column is of a small data type and, per the execution plans, is rarely modified, it goes into the INCLUDE list; if it is a large data type or frequently modified, forget it. We then eliminate duplicates among the candidates: fully overlapping indexes are merged outright, while partially overlapping ones are decided by execution frequency and query importance — and if the difference between them is small, they are merged into one. For example:
1. Index (a, b, c) include (d)
2. Index (a, b, d) include (c)
These merge directly into:
Index (a, b) include (c, d)
Of course, if modifications are rare, they can be merged into Index (a, b, c, d); whether to do so depends on how frequently the c and d columns are modified. Finally, remove indexes that overlap with the primary key. With this we basically have the indexes we need.
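The merge rule above can be sketched roughly as follows. Each candidate is modeled as a (key columns, include columns) pair, and the covering check is my interpretation of "if the difference is small, merge": two candidates merge when they differ only in trailing key columns that the other already carries as INCLUDE columns.

```python
def merge(ix1, ix2):
    """Merge two candidate indexes, each a (key_cols, include_cols) pair.
    Returns the merged candidate, or None when they are not mergeable."""
    key1, inc1 = ix1
    key2, inc2 = ix2
    # Find the shared key prefix.
    prefix = []
    for a, b in zip(key1, key2):
        if a != b:
            break
        prefix.append(a)
    tail1, tail2 = key1[len(prefix):], key2[len(prefix):]
    # Mergeable only if each leftover key column is covered by the other
    # candidate's INCLUDE list, as in the (a,b,c)/(a,b,d) example above.
    if set(tail1) <= set(inc2) and set(tail2) <= set(inc1):
        include = sorted(set(tail1) | set(tail2) | set(inc1) | set(inc2))
        return (prefix, include)
    return None

print(merge((["a", "b", "c"], ["d"]), (["a", "b", "d"], ["c"])))
# (['a', 'b'], ['c', 'd'])
```

A real implementation would also consult the update frequency of the merged columns before deciding between INCLUDE and key placement.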
I will skip the process of screening the candidates against the existing indexes and handling coverage; it simply pulls the existing CREATE INDEX definitions and parses them. Deployment is easy: write a script that drops and creates the indexes during off-peak hours. The project's source code will not be published, since the design belongs to the company. One caveat: the cached execution plans of simple queries live relatively briefly and are evicted whenever the cache runs short, so watch for the resulting skew in those statements' execution counts.
The SQL Server 2008 R2 showplan XSD: http://schemas.microsoft.com/sqlserver/2004/07/showplan/sql2008/showplanxml.xsd
The node mappings I have summarized are as follows:
The execution plan of a query is contained in the StmtSimple node; if this node is absent, the plan is generally for some other kind of SQL statement.
Join predicates are generally contained in the Hash and Merge join nodes. When a join column is used as a where condition, it appears in SeekKeys and Compare nodes; since join columns appear in pairs, they are easy to recognize there: if one operand is a parameter (starting with @) or a constant (type="const"), it must be a where condition.
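The pairing rule can be sketched as a small classifier. The operand dictionaries are an invented stand-in for the real SeekKeys/Compare operands; the point is only the column-vs-value test.

```python
# Classify a predicate found in a SeekKeys/Compare node: two column
# operands mean a join; a column paired with a parameter ('@'-prefixed)
# or a constant operand means a WHERE filter.
def classify(lhs, rhs):
    def is_value(op):
        return op["kind"] == "const" or (
            op["kind"] == "column" and op["name"].startswith("@"))
    return "where" if is_value(lhs) or is_value(rhs) else "join"

print(classify({"kind": "column", "name": "t1.a"},
               {"kind": "column", "name": "t2.j"}))     # join
print(classify({"kind": "column", "name": "t1.a"},
               {"kind": "const",  "name": "'param'"}))  # where
```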
The final output columns of the SELECT are easier to find: they are in the first OutputList node.
Note that each ColumnReference node generally carries the database name, table name, and column information, except for system tables; those references should be removed.
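A hedged sketch of that filtering step, keeping only references that carry a full database/table qualification (the XML fragment and names are invented; computed expressions and system references typically lack these attributes):

```python
import xml.etree.ElementTree as ET

NS = "{http://schemas.microsoft.com/sqlserver/2004/07/showplan}"

# Invented fragment: one fully-qualified user column plus one internal
# expression reference that carries no Database/Table attributes.
SAMPLE = """
<OutputList xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <ColumnReference Database="[demo]" Schema="[dbo]" Table="[orders]" Column="id"/>
  <ColumnReference Column="Expr1002"/>
</OutputList>
"""

def user_columns(xml_text):
    """Keep only ColumnReference nodes that name a database and a table;
    drop system/internal references that lack them."""
    root = ET.fromstring(xml_text)
    return [(c.get("Table"), c.get("Column"))
            for c in root.iter(NS + "ColumnReference")
            if c.get("Database") and c.get("Table")]

print(user_columns(SAMPLE))  # [('[orders]', 'id')]
```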