In many systems, including the database I currently manage, indexes are abused, and some were even created in bulk with DTA (Database Engine Tuning Advisor). (In my view, DTA's real value is in discovering missing statistics. On a previous project, DTA reported many columns that lacked statistics; after I created those statistics manually and changed nothing else, the performance improvement was obvious. Statistics will be covered in another article.) One table even has more than 20 indexes. There is no standard for the number of indexes, but it should be reasonable: each index should support a large number of queries, or the lookup part of insert, update, and delete operations, to justify its existence. Too many indexes put heavy pressure on the server, and as the saying goes: an unreasonable index is worse than no index. Clearly, indexes should not be created casually.
How do you know which columns need an index? Which indexes can be dropped, merged, or modified? There are many articles and methods on the Internet, but they are usually fragmented and laborious to use. This article combines my own work experience with online material into a method that I consider effective. If I find improvements in later work and study, I will update the text accordingly.
Note: Unless otherwise stated, the indexes mentioned here are nonclustered indexes.
Introduction:
Most people know the benefits of indexes, but few have really felt their drawbacks. I have always believed that nothing in SQL Server (or in any database management system, or in most things in the world) is absolutely good or absolutely bad. Suppose you are given an axe and a razor blade to cut down a tree and cut a sheet of paper. If you insist on cutting the tree with the blade and the paper with the axe, and then declare both tools useless, that is obviously unreasonable (this is not a joke or a brain teaser, so don't dwell on it).
Every feature SQL Server provides has scenarios where it applies and scenarios where it does not. Used well, an index can take a query from a few hours down to a few seconds; used badly, it can take a query from a few seconds up to a few hours. This article does not discuss the internal mechanism of indexes in depth; more detail can be found in chapter 6 of "The Art of SQL Server Performance Optimization and Management". The point here is simply this: create an index only when you have a sound reason, and drop or merge one only when you have a sound reason. Dropping indexes because you feel they are inappropriate, or adding them because you feel they will help, is often counterproductive.
To cover the rational use of indexes, this article is divided into several parts:
- The first part introduces the problem of too many indexes;
- The second part introduces the problem of insufficient indexes;
- The third part introduces the problem of unreasonable indexes;
- The last part demonstrates how to handle the three problems above in practice. Because that part is long, it is published as a separate demonstration to avoid reading fatigue.
The motivation for this article is a vicious circle: each of the problems below leads to the others, leaving the indexes on a table in disorder and hurting the efficiency of the whole system. Each problem is described below.
Insufficient indexes:
Insufficient indexing is easy to understand: either there is no index at all other than the clustered index (and sometimes not even that), or there are indexes but they do not support the operations on the table well enough.
First, the case where there is only a clustered index. (Heap tables are rarely seen in real systems; if you have one, think carefully about whether it really needs to be a heap.) This usually happens with a newly created table or a small table. However, as features are added and the data grows (assuming the table has many columns), using the clustered index every time becomes increasingly inefficient. Many queries need only a few columns, but the clustered index loads all the columns of the table, which also consumes memory. At this point you need to add appropriate nonclustered indexes to support the queries (queries here means more than just SELECT). That leads to the next situation: nonclustered indexes that do not support the operations on the table well.
When nonclustered indexes are created, the so-called "rules" and "iron laws" that circulate widely on the Internet often lead people to limit them to a very small number of columns. As the system develops, it then becomes common for the columns required by important, data-heavy queries not to be in any nonclustered index, so the optimizer chooses a clustered index scan or a key lookup. Neither access method is a problem in itself, but both are usually less efficient than fetching the data directly from the nonclustered index.
Put plainly, insufficient indexing means the columns a query needs are not covered. How much coverage to aim for depends on the actual situation. Not every system can have a covering index for every query, especially on core tables where the queries are many and varied. Indexing for all of them leads to the too-many-indexes case described below, and eventually to a vicious circle.
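As a starting point for finding queries whose columns are not covered, SQL Server records the optimizer's own missing-index suggestions in a set of dynamic management views. The following is a minimal sketch, not a definitive method: the ranking expression in ORDER BY is only a commonly used heuristic and is an assumption here, and the suggestions should be reviewed by hand rather than created blindly.

```sql
-- Sketch: list recorded missing-index suggestions for the current database.
-- The data in these DMVs is reset when the SQL Server instance restarts.
SELECT TOP (20)
    mid.statement          AS table_name,
    mid.equality_columns,              -- columns used in equality predicates
    mid.inequality_columns,            -- columns used in range predicates
    mid.included_columns,              -- columns needed only to cover the query
    migs.user_seeks,
    migs.avg_total_user_cost,
    migs.avg_user_impact               -- estimated improvement, in percent
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
WHERE mid.database_id = DB_ID()
-- Assumed ranking heuristic: frequent, expensive, high-impact suggestions first.
ORDER BY migs.user_seeks * migs.avg_total_user_cost * migs.avg_user_impact DESC;
```

Several suggestions for the same table often differ only in their included columns, which is a hint that one merged index may serve them all.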
In addition, the usual fix for insufficient indexes is to create indexes, and index operations are best handled by a small number of experienced people. Otherwise anyone can create an index, and with poor communication, A creates an index for his own query, while B, who could have modified A's index to support his own query, adds a separate one instead. The result is too many indexes and unreasonable indexes, and unfortunately another vicious circle.
When indexes are insufficient, besides adding indexes you can also consider merging or modifying existing ones. Moreover, the performance problem is sometimes not caused by indexes at all, in which case you can focus on statistics or statement optimization. (In fact, statement optimization and database design optimization should be done first and are the most effective, but that topic is too broad to cover here. We assume that the statements and the database have little room for improvement, and focus on indexes.)
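To find candidates for merging, one rough approach is to look for nonclustered indexes on the same table that share the same leading key column. The query below is a sketch under that assumption only: sharing a leading column does not prove two indexes are redundant, so each pair still has to be compared by hand.

```sql
-- Sketch: tables with several nonclustered indexes that start with the same key column.
SELECT
    OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
    OBJECT_NAME(i.object_id)        AS table_name,
    c.name                          AS leading_column,
    COUNT(*)                        AS index_count
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id   = i.object_id
   AND ic.index_id    = i.index_id
   AND ic.key_ordinal = 1            -- first key column (included columns have 0)
JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
WHERE i.type = 2                     -- nonclustered indexes only
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
GROUP BY i.object_id, c.name
HAVING COUNT(*) > 1
ORDER BY index_count DESC;
```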
Too many indexes:
"Too many indexes" means that a single table has too many nonclustered indexes. As for how many is too many, there is no real standard. Some books say that a single table should not have more than 6 indexes, but those books are usually old (from about 10 years ago), when hardware resources were tighter and could not support many indexes. Even with today's hardware, however, resources easily run short as systems grow more complex and data volumes increase, and too many indexes bring considerable risk. The main risks are as follows:
- Slower insert, update, and delete operations. Every insert and delete must also maintain each nonclustered index on the table, and an update must maintain every index that contains a modified column. If the clustered index key changes, all nonclustered indexes must be updated too, because their leaf nodes store the clustered index key. Nonclustered indexes often do nothing to speed up these operations and only slow them down, so the loss outweighs the gain. Note, however, that some modifications carry a WHERE condition, and in those cases an index can improve performance. For example, when deleting one row from a table with millions of rows, if there is no index at all (not even a clustered index), the whole table has to be scanned to find the row. Try it on a production-sized table and you will find the speed simply unacceptable.
- A larger database "volume". In essence, database performance problems are almost always caused by volume. A clustered index is the table itself, so it is as large as the table; depending on the number of B-tree levels it may be slightly larger, but generally not by much. A nonclustered index also takes up space, because its leaf nodes store the index key data. Large volume brings storage pressure and maintenance overhead. SQL Server does not operate on data directly on disk; each operation needs the data loaded from disk into memory, and memory is small compared with disk. Once indexes are large, the memory pressure caused by queries has a non-negligible impact on the data cache and the plan cache. Some maintenance on the clustered index, such as dropping and re-creating it or changing its key, also forces all nonclustered indexes on the table to be rebuilt. The more indexes there are, the slower this is, and many systems do not have a maintenance window long enough to maintain indexes at will.
- Higher optimization overhead. Optimization usually takes little time compared with execution, so you may not notice it, but excessive optimization, compilation, and recompilation put pressure on the CPU. When there are many indexes, the optimizer must analyze the query and the statistics on those indexes, and if several indexes support the query, it must work out which one is cheapest. Frankly, the optimizer also suffers from "choice phobia": faced with too many choices it compromises and does not necessarily spend more resources to find the optimal plan. If other activity at that moment leads the optimizer to pick an unreasonable index, performance gets even worse.
- Statistics update overhead. By default, SQL Server creates statistics on the columns in each index definition. These statistics are per index, so the more indexes there are, the more there is to update and the longer it takes when data changes or other changes trigger a statistics update.
- Index "failure". This is what prompted me to write this series. Some functions in my database would stall from time to time; rebuilding the indexes restored stability for a while, but the problem came back after a period of time. On the surface this looks like an index "failing", but the official documentation does not describe any such thing, so the "failure" is probably only a symptom, and the more likely cause is confusion in the optimizer's index selection. I will find time to write a separate series devoted to this. After I reduced the number of indexes and increased the reusability of each index, the problem no longer occurred.
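The volume and write-overhead risks above can be measured rather than guessed. The following sketch lists each nonclustered index with its size and with how often it has been read versus written since the instance last started. It assumes the server has been running long enough for the usage counters to be representative; an index that is large, frequently written, and rarely read is only a candidate for review, not proof that it can be dropped.

```sql
-- Sketch: size and read/write counts per nonclustered index in the current database.
SELECT
    OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
    OBJECT_NAME(i.object_id)        AS table_name,
    i.name                          AS index_name,
    sz.size_mb,
    ISNULL(us.user_seeks + us.user_scans + us.user_lookups, 0) AS reads,
    ISNULL(us.user_updates, 0)                                 AS writes
FROM sys.indexes AS i
JOIN (
    -- One row per index; pages are 8 KB, summed over all partitions.
    SELECT object_id, index_id,
           SUM(used_page_count) * 8 / 1024.0 AS size_mb
    FROM sys.dm_db_partition_stats
    GROUP BY object_id, index_id
) AS sz
    ON sz.object_id = i.object_id
   AND sz.index_id  = i.index_id
LEFT JOIN sys.dm_db_index_usage_stats AS us   -- no row means unused since startup
    ON us.database_id = DB_ID()
   AND us.object_id   = i.object_id
   AND us.index_id    = i.index_id
WHERE i.type = 2                              -- nonclustered indexes only
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY writes DESC, reads ASC;
```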
Simply put, a single table in an OLTP database should not have too many indexes. In this era of fast-food culture, many people want a standard value without doing any actual analysis. If I must give numbers, then from personal experience:
- Core tables: no more than 7 indexes in total.
- Normal tables: no more than 5 indexes in total.
- Small tables: no more than 3 indexes in total.
When a single table has many indexes (such as the 20-plus mentioned above), you should consider whether that many are really necessary; a solution is given in the final part of this series. Of course, in extreme cases a table with more than 20 indexes does not necessarily perform badly. Remember: analyze each case on its own merits.
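To see which tables exceed these rough limits, the indexes per table can be counted from the catalog views. This is a minimal sketch; the threshold of 7 is simply the "core table" figure from the list above, used here as an assumption, and should be adjusted to suit each table.

```sql
-- Sketch: tables whose total number of indexes exceeds an assumed threshold.
SELECT
    OBJECT_SCHEMA_NAME(t.object_id) AS schema_name,
    t.name                          AS table_name,
    COUNT(*)                        AS index_count
FROM sys.tables AS t
JOIN sys.indexes AS i
    ON i.object_id = t.object_id
WHERE i.type > 0          -- type 0 is the heap itself, not an index
GROUP BY t.object_id, t.name
HAVING COUNT(*) > 7       -- assumed threshold, taken from the "core table" figure
ORDER BY index_count DESC;
```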
To sum up the two points above in one sentence: too many is as bad as too few!
Unreasonable indexes:
In fact, both situations above are often the result of unreasonable index design. Leaving the clustered index aside, a badly designed nonclustered index cannot play its expected role, and in most cases it is simply not used, which has the same effect as insufficient indexes. At this point many people create a new index. If they do not understand indexes well, the new index may not support the query either, so yet another index is created, and over time there are too many indexes.
Another situation: in the middle of a project, the index design turns out to be unreasonable and there are too many indexes, so many people (including me) choose to delete. Of course, since I am writing this article, I do not delete recklessly. My principle is: have a reason to create, a reason to delete, and a reason to change. If you cannot state the reason, do not touch it. Whatever exists has a reason for existing; the only question is whether that reason is sound. After indexes are dropped, indexes may become insufficient, and the circle described earlier starts again. To avoid confusion, the discussion of the circle ends here. Next, a look at unreasonable indexes.
Generally, unreasonable indexes arise from three situations:
1. Too many columns in the index. When a nonclustered index has a lot of columns, the optimizer may prefer the clustered index, because the clustered index stores the data itself, whereas a nonclustered index sometimes needs two steps to locate the data (a seek followed by a lookup), which is not efficient. Because the clustered index keeps rows in key order, it can also help queries that sort by that key. And frankly, if your nonclustered index contains most of the columns of the table, why build a second copy instead of using the clustered index directly? In many cases the point of a nonclustered index is its smaller size (it usually has fewer columns). An index with too many columns is not worth building: even if it improves performance slightly, the later maintenance overhead makes it hard to recommend.
2. An unreasonable first column in a multicolumn index. An experienced SQL Server practitioner knows that the first column of an index is very important. This has to do with statistics, which are not detailed here, except to note that the histogram in an index's statistics is built on its first key column, and statistics directly affect the optimizer's choice of index. If the first column has very low selectivity (such as gender, or any column with only a handful of distinct values), an index that could have been used for a seek to locate the data efficiently is scanned instead. The optimizer may even ignore the index altogether in favor of another nonclustered index or the clustered index, leaving it as dead weight. This is discussed later. In short, the first column of a multicolumn index should be as selective as possible, as "narrow" a data type as possible, and rarely changed once stored.
3. Indexes with included columns versus covering indexes. Before SQL Server 2005 there were only two types of index, clustered and nonclustered. Starting with 2005, nonclustered indexes gained several variants, one of which is the index with included columns (INCLUDE). It is widely used in the missing-index hints of graphical execution plans; so far, every missing-index hint I have seen in a graphical plan has suggested an index with included columns and never any other type of clustered or nonclustered index. Because indexes with included columns are so widely used, many people have gradually forgotten their predecessor, the covering index:
a) A covering index is a nonclustered index whose key contains all the columns required by the query.
b) An index with included columns is also a nonclustered index, but only some of the columns are defined as key columns (the column list after the table name and before the INCLUDE keyword); the remaining columns are listed after INCLUDE.
c) The main difference is that the values of the columns after INCLUDE (also called non-key columns) are stored only in the leaf nodes of the nonclustered index, whereas the key columns of a covering index are stored at every level of the index. An index with included columns is therefore somewhat smaller, the order of the columns after INCLUDE does not matter, and it avoids the limit of 16 key columns and 900 bytes that applies to a regular nonclustered index key in the SQL Server versions discussed here. In general, indexes with included columns have many advantages.
But precisely because of these advantages, many people create indexes indiscriminately and use INCLUDE everywhere. Clearly this feature should not be abused, because compared with a covering index, an index with included columns has several disadvantages:
a) Portability. Although the odds are minimal, if you ever need to move the database back to SQL Server 2000, an index with included columns will raise an error. This touches on another coding convention: T-SQL does not follow standard SQL exactly and has its own extensions. When the standard syntax meets the requirement, prefer it over T-SQL's own syntax, because it is more portable. Take <> and != as an example: the former is standard syntax, the latter is not, so <> is the recommended way to express "not equal".
b) Ability to cover the query. Using the AdventureWorks2008R2 sample database, run the following code:
1. Create a demo environment:
use AdventureWorks2008R2
go
-- To avoid affecting the sample data, create a new table dbo.Person for testing
select *
into dbo.Person
from Person.Person
GO
-- Create the primary key, following the original table:
ALTER TABLE dbo.Person
ADD CONSTRAINT
PK_Person
PRIMARY KEY CLUSTERED
(
BusinessEntityID
)
GO
2. Write the query. With no nonclustered index yet, the actual execution plan shows that the query can only use the clustered index:
-- A demo query; there is no nonclustered index yet, so it uses the clustered index
select Title,FirstName,MiddleName,LastName
from dbo.Person
where FirstName like 'o%'
3. Add a covering index that covers the columns in the query's WHERE condition and SELECT list:
-- Add a covering index for the columns in the WHERE condition and the SELECT list.
-- To keep the demo simple, FirstName is the leading column so that it satisfies the WHERE condition.
CREATE NONCLUSTERED INDEX IX_Person_FirstName
ON dbo.Person(FirstName,Title,MiddleName,LastName)
GO
select Title,FirstName,MiddleName,LastName
from dbo.Person
where FirstName like 'o%'
The execution plan shows that this index covers our query.
4. Drop the index above and add an index with included columns in which the FirstName column from the WHERE condition appears only in INCLUDE:
-- Drop the covering index created in step 3
DROP INDEX IX_Person_FirstName ON dbo.Person
GO
-- Create an index with included columns; FirstName appears only in INCLUDE
CREATE NONCLUSTERED INDEX IX_Person_FirstName
ON dbo.Person(Title)
INCLUDE(FirstName,MiddleName,LastName)
GO
select Title,FirstName,MiddleName,LastName
from dbo.Person
where FirstName like 'o%'
Looking at the execution plan, you can see an index scan. Although the set of columns is the same, columns that appear in INCLUDE generally help only the SELECT list and not the WHERE condition, so we do not get the desired effect.
As the demo shows, an unreasonable design also causes index problems. Readers can test an index with included columns in which the columns of the SELECT list appear in INCLUDE and the columns of the WHERE condition appear before INCLUDE, as key columns; you will see that such an index meets our expectations. You can also verify that the order of the columns in INCLUDE does not have to match the SELECT list, and that this does not affect the result.
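The test suggested above can be sketched as follows, continuing with the dbo.Person demo table. The first query is an addition for illustration: it estimates the selectivity of the two candidate leading columns, in line with the earlier advice that the first key column should be as selective as possible. The index name is reused from the demo; everything else is ordinary T-SQL.

```sql
USE AdventureWorks2008R2;
GO
-- Selectivity = distinct values / total rows; the closer to 1, the more selective.
-- Note that COUNT(DISTINCT ...) ignores NULL values.
SELECT
    COUNT(DISTINCT FirstName) * 1.0 / COUNT(*) AS firstname_selectivity,
    COUNT(DISTINCT Title)     * 1.0 / COUNT(*) AS title_selectivity
FROM dbo.Person;
GO
-- Remove the demo index if it exists, so the script can be run on its own.
IF EXISTS (SELECT 1 FROM sys.indexes
           WHERE object_id = OBJECT_ID('dbo.Person')
             AND name = 'IX_Person_FirstName')
    DROP INDEX IX_Person_FirstName ON dbo.Person;
GO
-- WHERE column as the key; columns needed only by the SELECT list in INCLUDE.
-- The INCLUDE order deliberately differs from the SELECT list.
CREATE NONCLUSTERED INDEX IX_Person_FirstName
ON dbo.Person (FirstName)
INCLUDE (LastName, Title, MiddleName);
GO
SELECT Title, FirstName, MiddleName, LastName
FROM dbo.Person
WHERE FirstName LIKE 'o%';
GO
```

Compare the actual execution plan of the last query with the plan from step 4 to see how the choice of key column changes the way the index is used.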
Summary:
Indexing is a very large and very advanced topic. I once read a 600-page English-language book on SQL Server indexes, and even those dense 600 pages did not fully answer my questions. So I have filtered out a lot of detail here and focused the discussion and demonstrations on the points I want to make. The next article will demonstrate how to handle these index problems.
In addition, a system's index problems are certainly not limited to these three categories. They are simply the problems I have found in my recent work and in practice, and I will keep improving this topic over time. Advertisement: for more information, see my book "The Art of SQL Server Performance Optimization and Management".
Next post: SQL Server Index Maintenance (1): How to Get Index Usage
SQL Server Index maintenance: Common indexing issues for systems