SQL Server Index Optimization Practices

Source: Internet
Author: User

Importance of Indexes

Indexes are a heavyweight factor in database performance optimization. It is fair to say that if the indexes are wrong, other optimization measures will accomplish little.

Clustered Index and Non-clustered Index

The most common explanation is that the order of a clustered index is the physical storage order of the data, whereas the order of a non-clustered index is independent of the physical order of the data. Take the Xinhua Dictionary as an example. To find the Chinese character for "crawl" (pinyin "pa"), you can turn straight to the section that starts with P, because the entries themselves are stored in that order; this is a clustered index. Alternatively, you can look the character up in the directory, read off its page number, and then turn to that page; this is a non-clustered index.

The following table shows when to use the clustered index and non-clustered index:

Action                                   Use clustered index   Use non-clustered index
---------------------------------------  --------------------  -----------------------
Columns are often grouped or sorted      Should                Should
Returns data within a certain range      Should                Should not
One or very few distinct values          Should not            Should not
A small number of distinct values        Should                Should not
A large number of distinct values        Should not            Should
Frequently updated columns               Should not            Should
Foreign key columns                      Should                Should
Primary key columns                      Should                Should
Frequently modified index columns        Should not            Should

Uniqueness of clustered Indexes

Precisely because the order of a clustered index is the physical storage order of the data, a table can have at most one clustered index: physical storage can follow only one order. That makes the clustered index a scarce resource, and choosing it well is critical for the table.

The biggest misunderstanding for beginners: automatically setting the primary key as a clustered Index

Making the primary key a clustered index is SQL Server's default behavior: when you define a primary key, SQL Server creates a clustered index on it unless told otherwise. Since a table can have only one clustered index, many people then create every other index as non-clustered. This is the biggest misunderstanding. Some primary keys are even meaningless auto-increment fields, in which case spending the clustered index on them does little for query efficiency.
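To check whether a primary key has quietly taken the clustered slot, the catalog view sys.indexes can be queried. The following is a minimal sketch, assuming the table is the dbo.Table1 used in the example later in this article; substitute your own table name.

```sql
-- List every index on the table, its type, and whether it backs the primary key.
-- dbo.Table1 is the example table from this article (an assumption here).
SELECT i.name           AS index_name,
       i.type_desc      AS index_type,     -- e.g. HEAP, CLUSTERED, NONCLUSTERED
       i.is_primary_key AS is_primary_key,
       i.is_unique      AS is_unique
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.Table1')
ORDER BY i.index_id;
```

A row with index_type CLUSTERED and is_primary_key = 1 means the primary key is occupying the table's only clustered index.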

As noted above, the clustered index gives the best performance and there can be only one, so it must be chosen carefully. In general, pick the field according to the most common SQL queries against the table. Whether a single field or a combination of fields should form the clustered index depends on the actual situation.

In fact, when creating a table, do not set the primary key first. Add the clustered index you want, and only then set the primary key; SQL Server will automatically create the primary key as a non-clustered index (it chooses based on the situation). If the primary key has already been created as a clustered index, you must first drop the primary key, add the desired clustered index, and then restore the primary key.
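The order of operations described above could look like the following sketch. The table dbo.Orders, its columns, and the constraint and index names are all hypothetical, chosen only for illustration.

```sql
-- Hypothetical table, created without a primary key at first.
CREATE TABLE dbo.Orders (
    OrderID    int IDENTITY(1, 1) NOT NULL,
    CustomerID int      NOT NULL,
    OrderDate  datetime NOT NULL
);

-- 1. Give the clustered slot to the columns the common queries filter and sort on.
CREATE CLUSTERED INDEX cix_Orders_Customer_Date
    ON dbo.Orders (CustomerID, OrderDate);

-- 2. Add the primary key afterwards; NONCLUSTERED makes the intent explicit.
ALTER TABLE dbo.Orders
    ADD CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderID);

-- If the primary key already occupies the clustered index, move it instead:
--   ALTER TABLE dbo.Orders DROP CONSTRAINT PK_Orders;
--   CREATE CLUSTERED INDEX cix_Orders_Customer_Date
--       ON dbo.Orders (CustomerID, OrderDate);
--   ALTER TABLE dbo.Orders
--       ADD CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderID);
```

Dropping a primary key fails while foreign keys still reference it, so on a real schema those constraints would need to be dropped and recreated around the change.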

Remember, our ultimate goal is to minimize logical I/O for the same result set.

Let's start with a simple example.

A simple table:

CREATE TABLE [dbo].[Table1] (
    [ID] [int] IDENTITY(1, 1) NOT NULL,
    [Data1] [int] NOT NULL DEFAULT (0),
    [Data2] [int] NOT NULL DEFAULT (0),
    [Data3] [int] NOT NULL DEFAULT (0),
    [Name1] [nvarchar](50) NOT NULL DEFAULT (''),
    [Name2] [nvarchar](50) NOT NULL DEFAULT (''),
    [Name3] [nvarchar](50) DEFAULT (''),
    [Dtat] [datetime] NOT NULL DEFAULT (GETDATE())
);

Test data (about 100,000 records):

DECLARE @i int;
SET @i = 1;
WHILE @i < 100000
BEGIN
    INSERT INTO Table1 ([Data1], [Data2], [Data3], [Name1], [Name2], [Name3])
    VALUES (@i, 2 * @i, 3 * @i, CAST(@i AS nvarchar(50)), CAST(2 * @i AS nvarchar(50)), CAST(3 * @i AS nvarchar(50)));
    SET @i = @i + 1;
END

-- Spread the Dtat values out: shift each row's timestamp by Data1 seconds.
UPDATE Table1 SET Dtat = DATEADD(s, Data1, Dtat);

Turn on I/O statistics and time statistics in the query window:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

Display the actual execution plan as well.

The most common SQL query is as follows:

Select * From Table1 where data1 = 2 order by dtat DESC;

First, set the primary key ID in Table1, and the system automatically creates a clustered index for the primary key.

Then execute the statement and the result is:

Table 'table1'. Scan count 1, logical reads 911, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server execution times:

CPU time = 16 ms, elapsed time = 7 ms.

 

Then we create a non-clustered index on each of the data1 and dtat fields:

CREATE NONCLUSTERED INDEX [n_data1] ON [dbo].[Table1]
(
    [Data1] ASC
) WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY];

CREATE NONCLUSTERED INDEX [n_dtat] ON [dbo].[Table1]
(
    [Dtat] ASC
) WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY];

Run the statement again. The result is:

Table 'table1'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server execution times:

CPU time = 0 ms, elapsed time = 39 ms.

 

We can see that, although logical reads drop from 911 to 5, adding these indexes brings no improvement in elapsed time, which actually rises from 7 ms to 39 ms. So we continue to adjust the indexes.

Next we drop all the non-clustered indexes and the primary key, so that no indexes remain. We then create a composite clustered index on data1 and dtat, and add the primary key back:

CREATE CLUSTERED INDEX [c_data=dtat] ON [dbo].[Table1]
(
    [Data1] ASC,
    [Dtat] ASC
) WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY];

Execute the statement again:

Table 'table1'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server execution times:

CPU time = 0 ms, elapsed time = 1 ms.

 

We can see that the plan is now a single clustered index seek. The index scan and the nested loop are gone, and the execution time is only 1 ms, so the initial optimization goal is achieved.

Composite Index Summary

To summarize the optimization above: pay attention to the choice of the clustered index. First find the most frequently used SQL queries. When, as in this example, the typical query combines conditions on several fields, a composite clustered index is the better choice, and the most frequently used field should come first in it. Otherwise the index will not work well. See the following example:

When the query condition falls only on the second field of the composite index, the result is an index scan and performance is poor. The statistics are:

Table 'table1'. Scan count 1, logical reads 238, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server execution times:

CPU time = 16 ms, elapsed time = 22 ms.

If the query condition is only on the first field, there is no problem, thanks to the leftmost-prefix principle of composite indexes. The statistics are:

Table 'table1'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server execution times:

CPU time = 0 ms, elapsed time = 1 ms.

It can be seen that the most frequently used field should be placed first in the clustered index.
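As an illustration of the two cases, queries of roughly the following shape would exercise the composite clustered index on (Data1, Dtat). The literal values are arbitrary placeholders, not the author's original queries, and the plan actually chosen depends on the optimizer and the data.

```sql
SET STATISTICS IO ON;

-- Filters only on Dtat, the second key column: the index cannot be
-- navigated from its root for this predicate, so expect a clustered index scan.
SELECT *
FROM dbo.Table1
WHERE Dtat >= '20240101' AND Dtat < '20240102';

-- Filters on Data1, the leading key column: expect a clustered index seek.
SELECT *
FROM dbo.Table1
WHERE Data1 = 2;
```

Comparing the "logical reads" figures in the Messages output of the two statements shows the cost of missing the leading column.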

Why is index seek better than index scan?

An index scan traverses the B-tree, whereas an index seek navigates the B-tree directly to the rows it needs.

An index scan mostly occurs when the indexed column appears inside an expression. The database engine cannot determine directly which column values you want, so it has to scan the whole index and evaluate the expression for each entry. An index seek is much better: the engine only needs to visit a few branch nodes to locate the records you want. Because the leaf nodes of a clustered index are the data rows themselves, a clustered index scan is essentially equivalent to a full table scan.
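A minimal sketch of the difference, assuming the non-clustered indexes n_data1 and n_dtat from the earlier example exist on Table1. The literal values are arbitrary, and the plan actually chosen always depends on the optimizer and the data.

```sql
-- Column wrapped in an expression: Data1 * 2 must be evaluated for every
-- row, which typically produces a scan.
SELECT ID, Data1
FROM dbo.Table1
WHERE Data1 * 2 = 200;

-- Same logic with the arithmetic moved to the constant side: the bare
-- column can be matched against the index, which typically produces a seek.
SELECT ID, Data1
FROM dbo.Table1
WHERE Data1 = 200 / 2;

-- Function applied to the column: typically a scan.
SELECT ID, Dtat
FROM dbo.Table1
WHERE YEAR(Dtat) = 2024;

-- Equivalent range on the bare column: typically a seek on n_dtat.
SELECT ID, Dtat
FROM dbo.Table1
WHERE Dtat >= '20240101' AND Dtat < '20250101';
```

With the actual execution plan displayed, the first query of each pair should show a scan operator and the second a seek operator.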

Some Optimization Principles

    1. An index is created as non-clustered by default, but that is not always optimal. With only non-clustered indexes, rows are not stored on the data pages in index order. Reasonable index design should be based on analysis and prediction of the various queries. Generally speaking:
      A. For columns with many repeated values that are frequently used in range queries (>, <, >=, <=), ORDER BY, or GROUP BY, consider creating a clustered index;
      B. When several columns are frequently accessed together and each contains repeated values, consider creating a composite index;
      C. A composite index should, as far as possible, cover the key queries, and its leading column must be the most frequently used column. Indexes improve performance, but more indexes are not always better. On the contrary, too many indexes lower system efficiency, because every index added to a table must be updated whenever the table's data changes.
    2. ORDER BY and GROUP BY: for queries that use ORDER BY and GROUP BY clauses, any suitable index can help SELECT performance.
    3. Before a multi-table operation is executed, the query optimizer lists several candidate join plans based on the join conditions and picks the one with the lowest system cost. Join conditions should take full account of which tables have indexes and which tables have many rows. The choice of inner and outer table can be estimated with the formula: number of matching rows in the outer table × number of lookups per row in the inner table. The plan with the smallest product is best.
    4. Any operation on a column, including database functions and calculated expressions, will cause a table scan. When writing a query, move the operation to the right of the equals sign whenever possible.
    5. IN and OR clauses often use work tables, which defeats the index. If they do not produce a large number of duplicate values, consider splitting the clause; the split clauses should involve indexed columns.
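Point 1.C warns that every extra index has a maintenance cost. One way to look for indexes that are written far more often than they are read is the dynamic management view sys.dm_db_index_usage_stats. The query below is a sketch only; low read counts alone do not prove an index is safe to drop.

```sql
-- Reads versus writes per index in the current database.
-- The DMV's counters are reset when SQL Server restarts.
SELECT OBJECT_NAME(i.object_id)  AS table_name,
       i.name                    AS index_name,
       i.type_desc               AS index_type,
       ISNULL(s.user_seeks, 0)   AS user_seeks,
       ISNULL(s.user_scans, 0)   AS user_scans,
       ISNULL(s.user_lookups, 0) AS user_lookups,
       ISNULL(s.user_updates, 0) AS user_updates  -- index maintenance operations
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id   = i.object_id
      AND s.index_id    = i.index_id
      AND s.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY ISNULL(s.user_updates, 0) DESC;
```

An index with many user_updates but almost no seeks, scans, or lookups over a representative period is a candidate for review.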

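SQL Optimization Principle 2:
1. Use the smallest data type that meets your needs: for example, smallint or tinyint instead of int when the value range allows. (The mediumint type sometimes cited for this advice is a MySQL type; SQL Server does not have it.)
2. Define columns as NOT NULL wherever possible. If a column does need to store NULL, set it explicitly rather than making NULL the default value.
3. Use variable-length and large-object types (varchar, text, and binary large objects) as little as possible.
4. If a column can hold only a small, known set of values, constrain it to that set. (MySQL offers an enum type for this; in SQL Server a CHECK constraint or a small lookup table serves the same purpose.)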
 

Join principles

SQL Server has three types of Join Operations:

    • Nested loops joins
    • Merge joins
    • Hash joins

If one join input is small (for example, fewer than 10 rows) and the other join input is large and indexed on its join columns, a nested loops join is the fastest. (For the reason, see Understanding Nested Loops Joins.)

If neither join input is small but both are sorted on their join columns (for example, when they were obtained by scanning sorted indexes), a merge join is the fastest. (For the reason, see Understanding Merge Joins.)

Hash joins can efficiently process large, unsorted, non-indexed inputs. They are particularly useful for the intermediate results of complex queries. (For more information, see Understanding Hash Joins.)
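To see the three join operators side by side, the same query can be run with each join query hint and the resulting plans compared. This is a sketch for experimentation only; the temporary tables and their data are hypothetical, and in normal use the choice is best left to the optimizer.

```sql
-- Hypothetical temp tables, used only to make the example self-contained.
CREATE TABLE #Customers (CustomerID int NOT NULL PRIMARY KEY, Name nvarchar(50) NOT NULL);
CREATE TABLE #Orders    (OrderID int NOT NULL PRIMARY KEY, CustomerID int NOT NULL);

INSERT INTO #Customers (CustomerID, Name) VALUES (1, N'A'), (2, N'B');
INSERT INTO #Orders (OrderID, CustomerID) VALUES (10, 1), (11, 1), (12, 2);

-- Let the optimizer choose (the normal case).
SELECT c.Name, o.OrderID
FROM #Customers AS c
INNER JOIN #Orders AS o ON o.CustomerID = c.CustomerID;

-- Force a nested loops join.
SELECT c.Name, o.OrderID
FROM #Customers AS c
INNER JOIN #Orders AS o ON o.CustomerID = c.CustomerID
OPTION (LOOP JOIN);

-- Force a merge join (the optimizer adds sorts if an input is not ordered).
SELECT c.Name, o.OrderID
FROM #Customers AS c
INNER JOIN #Orders AS o ON o.CustomerID = c.CustomerID
OPTION (MERGE JOIN);

-- Force a hash join.
SELECT c.Name, o.OrderID
FROM #Customers AS c
INNER JOIN #Orders AS o ON o.CustomerID = c.CustomerID
OPTION (HASH JOIN);

DROP TABLE #Orders, #Customers;
```

All four queries return the same rows; only the physical join operator in the execution plan differs.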

 

How to analyze SQL statements

Microsoft's MSDN documentation gives the answer: http://msdn.microsoft.com/en-us/library/ms191227.aspx

 

Find the SQL statement with the worst performance in the database

Which table should be optimized, and where should you start? First locate the performance bottleneck by finding the slowest-running SQL. Follow these steps:

1. Run DBCC FREEPROCCACHE to clear the plan cache.

2. Run your program, SQL statements, or stored procedures so that they exercise the database.

3. Run the following SQL statement to find the slowest SQL statements:

-- total_elapsed_time and total_worker_time are reported in microseconds,
-- so dividing by 1000 yields milliseconds.
SELECT DB_NAME(DB.dbid) AS [database name]
, OBJECT_NAME(DB.objectid, DB.dbid) AS [object]
, QS.creation_time AS [plan compile time]
, QS.last_execution_time AS [last execution time]
, QS.execution_count AS [number of executions]
, QS.total_elapsed_time / 1000 AS [total elapsed time (ms)]
, QS.total_physical_reads AS [total physical reads]
, QS.total_worker_time / 1000 AS [total CPU time (ms)]
, QS.total_logical_writes AS [total logical writes]
, QS.total_logical_reads AS [total logical reads]
, SUBSTRING(ST.text, (QS.statement_start_offset / 2) + 1,
((CASE QS.statement_end_offset
WHEN -1 THEN DATALENGTH(ST.text)
ELSE QS.statement_end_offset
END - QS.statement_start_offset) / 2) + 1) AS [statement]
FROM sys.dm_exec_query_stats AS QS
CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) AS ST
INNER JOIN
(SELECT *
FROM sys.dm_exec_cached_plans AS CP
CROSS APPLY sys.dm_exec_query_plan(CP.plan_handle)
) AS DB
ON QS.plan_handle = DB.plan_handle
WHERE SUBSTRING(ST.text, (QS.statement_start_offset / 2) + 1,
((CASE QS.statement_end_offset
WHEN -1 THEN DATALENGTH(ST.text)
ELSE QS.statement_end_offset
END - QS.statement_start_offset) / 2) + 1) NOT LIKE '%fetch%'
ORDER BY QS.total_elapsed_time / 1000 DESC;

 

Use SQL Server Profiler to find the SQL statement with the worst performance in the database

First open SQL Server Profiler:

Click "New Trace" in the toolbar, use the default template, and click "Run".

An error may occur: "Only TrueType fonts are supported. This is not a TrueType font." Don't worry. Click Tools > Options and select a different font, such as Verdana. (This is a Microsoft bug.)

Once running, SQL Server Profiler monitors database activity, so exercise the database you want to monitor as much as possible, then click Stop. Save the trace result to a file or a table.

To save to a table: choose "File" > "Save As" > "Trace Table" from the menu and enter a new table name, for example profiletrace in the master database, then save.

Find the most time-consuming SQL statement:

USE master;

SELECT * FROM profiletrace ORDER BY duration DESC;
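A narrower variant of the query above can be easier to read. This sketch assumes the trace was saved to master.dbo.profiletrace with the columns of Profiler's Standard template (TextData, Duration, CPU, Reads, Writes, StartTime); if your trace captured different columns, adjust the list accordingly.

```sql
-- Top 20 slowest captured events.
-- Assumption: Duration in a saved trace table is stored in microseconds
-- (SQL Server 2005 and later), so dividing by 1000 gives milliseconds.
SELECT TOP (20)
       CAST(TextData AS nvarchar(4000)) AS statement_text,  -- TextData is stored as ntext
       Duration / 1000                  AS duration_ms,
       CPU,
       Reads,
       Writes,
       StartTime
FROM master.dbo.profiletrace
WHERE Duration IS NOT NULL
ORDER BY Duration DESC;
```

Filtering out rows with a NULL Duration drops trace events that carry no timing information.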

After finding the performance bottleneck, you can perform specific tuning.

For more information about using SQL Server Profiler, see:

http://www.codeproject.com/KB/database/DiagnoseProblemsSQLServer.aspx

 

Use the SQL Server Database Engine Tuning Advisor

SQL Server Profiler was used above to capture a trace. That trace can also be fed to the Database Engine Tuning Advisor, which returns optimization suggestions from SQL Server itself, for example that a certain table needs an additional index.

First open the Database Engine Tuning Advisor:

Then open the Profiler result (we saved it to the profiletrace table in the master database):

Click "Start Analysis". When the run completes, review the optimization suggestions (in this example, an index is recommended, with an estimated performance improvement of 72%).

Continue reading:

    • SQL Server INDEX OPTIMIZATION practice (2)-index coverage
