The previous paragraph took over an optimization project, the requirement is to the company's existing 1W multiple tables indexed optimization, perfect existing, eliminate invalid index. In view of the serious shortage of manpower (at that time, the resources of two and a half people), the idea of manually changing libraries was discouraged. The current program structure is not allowed to engage in revolutionary practices, only to make improvements, so ready to get an automated tool to deal with. Prototype has just been developed, the meeting in order to get out of the operation and maintenance of the DBA team strongly resisted, the specific reasons unknown. The last indefinite extension. Here to share the idea. Welcome to Pat Bricks.
The whole idea is that indexes are services for querying and updating, but inappropriate indexes can negatively affect insertions and updates. It is unlikely that the existing indexes on the table want to identify those that are valid. Then rebuilding all the new indexes based on the existing data usage will not solve the problem. A new index is generated from the query, and then the existing contrast, which does not match all of the deletions, was not originally created. Although the risk is quite large for a running system. But you can do a critical test.
The specific solution is as follows:
First of all, periodically crawl the cache execution plan on a hot standby database server (you might want to crawl SQL to find out that some of the SQL is too mixed up, without the possibility of automated parsing), and then down to the data table of a standby server along with the execution count of the execution of the table.
When the execution plan accumulates several times, it begins parsing. Because the execution plan is a well-formed XML file, plus Microsoft provides an XSD file to execute the plan. We can reverse the corresponding SQL predicate of each node (this XSD has not found the official instructions, only to reverse the introduction of the association). For example, to build an index we are more concerned about three kinds of predicates, namely: Select,join,where. As long as we get these we can build a good index. The principle is simple, the join and the where are the basis for the index key, and select can be added to index's include.
Parsing is not for a single execution plan, but for all the execution plans are decomposed and then statistically processed. The advantage is to be able to know that those table fields are referenced most, those are foreign key columns. The data was repeatedly queried. For example, it can be concluded that the Col1 column of TableA was 10W times in a day's business process and was where2w. And Col2 was select 10W, only 100 times where. So the basis of our indexing is based on the table rather than on a single query. The resulting index will weigh the query frequency and the importance of the query, if a business query is particularly important, but the execution frequency is not high we can provide weights, priority to establish an index. Of course, creating index also refers to the table's data distribution to determine the order of the fields in index.
OK, ready to finish, start indexing. The current conditions, the table data distribution, the number of reference times for table fields (Select,join,where), and the number of occurrences of these SQL predicates. The idea of starting with how to create an index is to analyze each one, consider all the possibilities and then create it. Found that this way only for the human brain, so that the computer to do first let the computer's IQ increase to more than 120 to be feasible. Discovering reverse thinking is also useful here, and since it's not possible to create the most appropriate one at a time, we create all the index combinations based on the combination of the execution plan. All joins and where are placed in the key of index. For example:
Select T1. A, T1. B, T1. C, T2. J, t2.k from Table1 T1 Join Table1 T2 on T1. A = t2.j Where t1. A = ' param '
The initial index is:
Index (A,B) includ (C) and index (j) include (J,k)
About Select if it is a small data type and the data modification frequency is small in the execution plan of ALTER, put it in the include. Large data types and modifications are more frequent. So we eliminate the mutual coverage. Partially overlapping, partially overlapping in the end retains that reference execution frequency and query importance. The difference is very small and is merged into one, such as:
1, Index (a,b,c) Include (D)
2, Index (a,b,d) Include (C)
Merge directly into:
Index (a,b) Include (c,d)
Of course, if alert is very small can also be combined into index (A,B,C,D) to refer to the C,d field to modify the frequency. and primary key overlap of the elimination. That leaves us with basically the index we need.
The process of screening coverage by comparing existing indexes is skipped. Simple pull out create Index to parse processing. It's easy to publish. Write a script to do the drop and create when the business is relatively small. Project source code because the design to the company's confidentiality problem will not upload. A place to note the SQL Execution plan cache for simple queries is shorter and will be cleaned up once the cache is insufficient. Be aware of the error in the execution frequency of these SQL.
SqlserverR2 xsd:schemas.microsoft./sqlserver/2004/07/showplan/sql2008/showplanxml.xsd
The summarized node mappings are listed as follows:
The query SQL execution plan is included in the node "StmtSimple", without which the node is generally the execution plan for other types of SQL.
Join associated nodes and their own types are generally included in Hash,marger, how to join and where conditions will appear in the Seekkey and compare nodes, because the join columns are in pairs, which is easy to recognize, one is the argument (@ begins with or constant (type= "Const") must be the Where condition.
The Select final output field is relatively easy to find, and the first outputlist node is.
Note that because the general columns each columnreference contains the library name, the table name, the column information, but the system table does not. Pay attention to culling.