First, preface
Many database system performance is not ideal because the systems are not overall optimized, there are a lot of low-performing SQL statements. The primary reason for the poor performance of such SQL statements is the lack of efficient indexing. No index In addition to the statement itself to run slow, but also caused a large number of disk read and write operations, so that the overall system performance is affected by the poor. The first way to solve this type of system is to optimize these SQL statements that are not indexed or indexed well enough.
This article discusses the relevant content related to indexes, and describes how to apply indexing techniques to optimize SQL statements by analyzing the execution plan of statements. By analyzing the execution plan, readers can check whether the index is useful and how to create an efficient index. This article has a certain reference significance to the database management personnel and the database system development personnel.
If the reader does not know which SQL statements should be optimized for the database system, it is recommended that the reader refer to the author's other article, "Applying Profiler to optimize the SQL Server database system." The article shows you how to use Profiler and Read80Trace tools to find the key and frequently-running SQL statements in your database system, and you can devote your energies to these most optimized SQL statements.
Second, the key to create the index
Optimized the key to the SQL statement is to minimize the logical reads of the statement. the logical reads refers to the total number of data pages that need to be accessed at the time the statement executes, which is 8K. The less logical reads, the less memory and CPU time it needs, and the faster the statement executes. It goes without saying that the greatest benefit of an index is that it can greatly reduce the number of logical reads of SQL statements, thus greatly reducing the execution time of statements. The key to creating an index is to be able to significantly reduce the logical reads. an index is not good, mainly see it reduced logical reads more.
Run the SET STATISTICS IO command to get the logical reads information for the SQL statement. Examples are as follows:
Run the following command in query Analyzer:
/***** Script 1 *****************************/
SET STATISTICS IO on
Select Au_id,au_lname, au_fname
from pubs. Authors WHERE au_lname = ' Green '
SET STATISTICS IO on
/********************************************/
The output results are as follows:
au_id au_lname au_fname
----------- ---------------------------------------- --------------------
213-46-8915 Green Marjorie
(1 row (s) affected)
Table ' authors '. Scan count 1, logical reads 1, physical reads 0, Read-ahead reads 0.
The above logical reads 1 means that the total logical read of the SELECT statement is 1. Logical reads the less the better. if Logical Reads is very large, and the number of rows returned is very small, that is, the difference between the two is large, then often the meaning of the statement needs optimization . For example, the statement is not indexed, or the index is not good enough. Note the difference between logical reads and the physical reads behind. The Logical reads contains the number of pages that the statement accesses from the memory data buffer and the number of pages read from the physical disk. Instead, physical reads represents data pages that do not reside in memory buffers that need to be read from disk. Read-ahead reads is a pre-read that SQL Server generates to improve performance. Read-ahead may read more data. We focus on logical reads when it comes to optimization. Note If physical reads or read-ahead reads is large, it often means that the execution time of the statement (duration) is partially spent waiting on the physical disk IO.
Second, single-field index, combined index and overlay index
As the name implies, the single-field index refers to the index of only one of the fields, whereas a composite index refers to an index with multiple fields.
The following examples describe some of the techniques for creating these indexes, and how to use the execution plan to determine whether an SQL statement is using an index.
1. Index The fields that appear in the WHERE clause
Run the following statement to create the required tables for the sample:
/**************script 2************************************/
Use tempdb
Go
if exists (SELECT * from dbo.sysobjects WHERE id = object_id (n ' [Tbl1] ') and OBJECTPROPERTY (ID, n ' isusertable ') = 1)
drop table [TBL1]
GO
CREATE TABLE Tbl1
(Student number int, student name varchar (20), Sex char (2), age int, enrollment time datetime, remark char (500))
Go
DECLARE @i int
Set @i=0
DECLARE @j int
Set @j=0
While @i<5000
Begin
if (rand () *10>3) set @j=1 else set @j=0
INSERT into TBL1 values (@i,
Char (rand () *10+100) +char (rand () *5+50) +char (rand () *3+100) +char (rand () *6+80),
@j, 20+rand () *10,convert (varchar), GETDATE ()-rand () *3000,112),
Char (rand () *9+100) +char (rand () *4+50) +char (rand () *2+130) +char (rand () *5+70))
Set @[email protected]+1
End
/**************************************************/
Then we look at how the following statement should create the index:
/********script 3**********************************/
Set STATISTICS PROFILE on
SET STATISTICS IO on
Go
Select Student name, admission time from TBL1 where student number =972
Go
Set STATISTICS profile off
SET STATISTICS IO off
Go
/****************************************************/
Note that the above SET STATISTICS Profile command outputs the execution plan of the statement. You might ask, why not set SHOWPLAN_ALL? It is also possible to use SET SHOWPLAN_ALL. However, the set statistics profile outputs the execution plan that is actually used when the SQL statement is run, and the SET SHOWPLAN_ALL output is the expected (Estimate) execution plan. Using SET SHOWPLAN_ALL is the subsequent statement that does not actually run.
The result of the script output (section) is as follows:
Student Name Enrollment Time
-------------------- ------------------------------------------------------
G4eq 2005-05-29 00:00:00.000
(1 row (s) affected)
Table ' Tbl1 '. Scan count 1,logical reads 385, physical reads 0,read-ahead reads 0.
Rows executes Stmttext
------------------------------------------------------------------------------
1 1 SELECT [student name]=[student name],[enrollment time]=[admission time] from [TBL1]
1 1 |--Table Scan(OBJECT: ([tempdb].[ DBO]. [TBL1]), Where: ([TBL1].
(2 row (s) affected)
As you can see from the output above, this statement executes using the table scan, which is a full table scan of the entire table. The performance of a full-table scan is usually poor and should be avoided as much as possible. If the SELECT statement above is a critical statement that is frequently run by the database system, you should create an appropriate index on it. One of the techniques for creating an index is to create an index on a field that often appears in the Where condition . So for the SELECT statement above, you should create a single field index on the Student Number field IDX_ student number:
Create nonclustered index IDX_ student number on TBL1 (student number)
Then run script 3, and some of the results are as follows:
Table ' Tbl1 '. Scan count 1, logical reads 3, physical reads 0, Read-ahead reads 0.
Rows executes Stmttext
---------------------------------------------------------------------------------------------
1 1 SELECT [student name]=[student name],[enrollment time]=[admission time] from [TBL1] WHERE [student number][email protected]
1 1 |--bookmark Lookup (Bookmark: ([Bmk1000]), OBJECT: ([tempdb]. [dbo]. [TBL1]))
1 1 |--Index Seek(OBJECT: ([tempdb].[ DBO]. [TBL1]. [idx_ student number ]), SEEK: ([TBL1].
The results above show that the IDX_ student number we just created is actually used. The logical reads of the statement is greatly reduced, and the decrease from 385 to 3,table scan before the index becomes the index Seek, which greatly improves performance. As you can tell from the above example, if you see the index scan of the table scan or the clustered index in the execution plan (the index scan of the clustered index is equivalent to the table scan), and the corresponding logical reads is quite large, Then try to make it into index seek. try to avoid Table Scan or index scan is a common technique used to optimize SQL statements. index seek usually requires much less logical reads than the previous two.
2. Combined index
if There are multiple fields in the where statement, so you can consider creating a composite index . Examples are as follows:
/*****script 4******************************************/
Set STATISTICS PROFILE on
SET STATISTICS IO on
Go
Select Student name, admission time from TBL1
Where admission time >= ' 20050301 ' and admission time < ' 20050305 ' and age >24
Go
Set STATISTICS profile off
SET STATISTICS IO off
Go
/*******************************************************/
To improve the performance of this statement, you can set up a combined index on the time and age of enrollment as follows:
Create nonclustered index IDX_ enrollment time age on TBL1 (enrollment time, age)
You might ask, what if you put the entry time and age field in a different position to create the following combination index?
Create nonclustered index IDX_ age entry time on TBL1 (age, enrollment time)
This index is not as good as the previous one. Analyze the uniqueness of the two fields:
Select COUNT (*) from TBL1 GROUP by entry time
Select COUNT (*) from TBL1 GROUP by age
Some of the output results are as follows:
DISTINCT_VALUE_OF Admission Time
(2426 row (s) affected)
Distinct_value_of Age
(s) affected)
The results show that the admission Time field has 2,426 unique values, while the age field is only 10. This means that the entry Time field is more unique than the age field. For the above two indexes run script 4, you will find that the first index statement logical reads is 8 and the second index results in a logical reads of 16, a difference of one times. If the table is large then the performance difference is conceivable. Therefore, the order of the fields in the composite index is very important, and the more unique the fields the more forward . In addition, regardless of the index of a composite or individual column, try not to select those fields that are very low in uniqueness. For example, it doesn't make much sense to build an index on a field with only two values of 0 and 1.
Sometimes you have to decide whether to index each related field individually or to set up a composite index. For example, if the following statements are frequently performed:
Select C1, c2,c3 from tblname where c1= ' abc ' and C2=3
Select C1, C3 from Tblname where c1= ' B '
Select C1, C2 from Tblname where c2=10
How should the index be built? This depends on the proportions of each statement. If most statements are always based on C1 and C2 queries, then a combined index (C1+C2) or an overlay index is useful, and then add an index that is created separately for the C3. Conversely, if the first statement runs very few times, most of the statements are the following two, then of course, C1 and C2 are indexed. You might ask, is it possible to index C1 and C2, respectively, on the first statement? OK. For some statements, SQL Server may query the data using two indexes (that is, indexing cross-technology) and then get the results by the intersection. But sometimes SQL Server does not necessarily use all of the single-field indexes that you create. So if you are indexing a single field, we recommend using SET STATISTICS profile to verify that the index is actually fully used. Logical reads the fewer indexes the better.
3. Overwrite Index
Is there a better index for the SELECT statement in Script 4? Some. That is to use the overlay index (covered index). Overwriting an index makes it possible for the statement to have all the required data without accessing the table and accessing the index only. Because the clustered index leaf node is the data, it does not matter whether it is overwritten or not, so the overriding index is primarily for nonclustered indexes. I wonder if you have noticed that the implementation plan we discussed earlier has a bookmark lookup keyword in addition to index seek. The bookmark lookup indicates that the statement also needs additional bookmark lookup operations on the table to get the data after it accesses the index. That is, to get a row of data at least two Io, one access to the index, one access to the base table. If the statement returns a large number of rows, the cost of the bookmark lookup operation is significant. Overriding the index avoids expensive bookmark lookup operations, reduces the number of Io, and improves the performance of the statement.
The overwrite index needs to contain all the fields that appear in the SELECT clause and the WHERE clause. The fields in the where statement are in front, and select is behind. For the SELECT statement in Script 5, the overwrite index is as follows:
Create nonclustered index idx_covered on tbl1 (in?? Time, age, student name)
Then run script 4 with the following output:
Table ' Tbl1 '. Scan count 1, logical reads 2, physical reads 0, Read-ahead reads 0.
Rows executes Stmttext
------------------------------------------------------------------------------------------------------
6 1 SELECT [student name]=[student name],[entry time]=[admission time] from [TBL1] WHERE [Admission time]>[email protected] and [Admission time]<@2
6 1 |--index Seek (OBJECT: ([tempdb].[ DBO]. [TBL1]. [idx_covered]), SEEK: ([tbl1].[ Admission time], [TBL1]. [age])
Compare the above logical reads, is greatly reduced. The Bookmark lookup operation also disappears. Therefore, creating an overlay index is a very useful optimization technique to reduce the performance of logical reads elevation statements.
In fact, the creation principle of indexes is more complex. Sometimes you can't include all the fields in the WHERE clause in the index. When considering whether an index should contain a field, you should consider the role of the field in the statement. For example, if you often return very few rows with a field as a where condition for exact matches, then it is absolutely worth indexing the field. For example, for those very unique fields such as primary and foreign keys, the fields that often appear in the group By,order by are worth creating indexes. Because of the limited space, this is no longer unfolding. The online manual for SQL Server has good content, please refer to the reader for your own reference.
Application of index technology to optimize SQL statement one