SQL Server INDEX OPTIMIZATION

Source: Internet
Author: User

Clustered index: the table's data rows are physically stored in index order. Retrieval is faster than with an ordinary index, but inserts, updates, and deletes are more expensive.

Non-clustered index: does not affect the physical order of the data in the table. Retrieval is slower than with a clustered index, but inserts, updates, and deletes are cheaper.

How to make your SQL run faster
---- People often fall into a trap when writing SQL: they care only about whether the result is correct and ignore the performance differences between different ways of obtaining it. Those differences can be significant in large or complex databases, especially in online transaction processing (OLTP) or decision support system (DSS) environments. In my experience, slow SQL statements usually come from inappropriate index design, incomplete join conditions, or unoptimizable WHERE clauses. Once these are properly fixed, running speed improves dramatically! I summarize the three cases below.
---- To make the comparisons concrete, the running time of every example was measured; times under 1 second are shown as (<1 second).
---- Test environment --
---- HOST: HP LH II
---- Clock speed: 330 MHz
---- Memory: 128 MB
---- Operating system: Openserver 5.0.4
---- Database: Sybase 11.0.3
1. Unreasonable index design
---- For example, take a table record with 620,000 rows, and look at the following SQL statements under different index designs:
---- 1. A non-clustered index on date
select count(*) from record
where date > '20140901' and date < '20141001' and amount > 2000 (25 seconds)
select date, sum(amount) from record group by date (55 seconds)
select count(*) from record
where date > '20140901' and place in ('BJ', 'SH') (27 seconds)
---- Analysis:
---- date contains a large number of duplicate values. With a non-clustered index, the data is physically stored at random across the data pages, so a range search must scan the entire table to find every row in the range.
---- 2. A clustered index on date
select count(*) from record
where date > '20140901' and date < '20141001' and amount > 2000 (14 seconds)
select date, sum(amount) from record group by date (28 seconds)
select count(*) from record
where date > '20140901' and place in ('BJ', 'SH') (14 seconds)
---- Analysis:
---- With the clustered index, data is physically stored on the data pages in order, and duplicate values sit next to each other. A range search can first locate the start and end points of the range and then scan only the data pages in between, avoiding a full table scan and speeding up the query.
---- 3. A composite index on (place, date, amount)
select count(*) from record
where date > '20140901' and date < '20141001' and amount > 2000 (26 seconds)
select date, sum(amount) from record group by date (27 seconds)
select count(*) from record
where date > '20140901' and place in ('BJ', 'SH') (<1 second)
---- Analysis:
---- This composite index is unreasonable: its leading column is place, and the first and second SQL statements do not reference place, so they cannot use the index. The third SQL does use place, and all the columns it references are contained in the composite index, forming index coverage, so it is very fast.
---- 4. A composite index on (date, place, amount)
select count(*) from record
where date > '20140901' and date < '20141001' and amount > 2000 (<1 second)
select date, sum(amount) from record group by date (11 seconds)
select count(*) from record
where date > '20140901' and place in ('BJ', 'SH') (<1 second)
---- Analysis:
---- This is a reasonable composite index: with date as the leading column, every SQL statement can use the index, and the first and third statements also get index coverage, so performance is optimal.
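The leading-column effect can be sketched with SQLite's EXPLAIN QUERY PLAN. SQLite's optimizer is not Sybase 11's, but it applies the same rule: a range predicate can seek into an index only when the predicate's column leads that index. Table and index names here are illustrative, not from the original test.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE record (date TEXT, place TEXT, amount INTEGER, card_no TEXT)")

# date leads the composite index: a range predicate on date can seek into it.
con.execute("CREATE INDEX ix_date_place_amount ON record (date, place, amount)")
good = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM record WHERE date > '20140901'"
).fetchone()[3]
print(good)  # e.g. SEARCH record USING INDEX ix_date_place_amount (date>?)

# Same columns with place leading: the date predicate cannot seek,
# so the plan falls back to a full table scan.
con.execute("DROP INDEX ix_date_place_amount")
con.execute("CREATE INDEX ix_place_date_amount ON record (place, date, amount)")
bad = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM record WHERE date > '20140901'"
).fetchone()[3]
print(bad)  # e.g. SCAN record
```

The same query plan degrades from an index search to a scan purely because of the column order inside the index.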
---- 5. Summary:
---- The index created by default is non-clustered, but it is not always the best choice. Reasonable index design must be based on analysis and prediction of the actual queries. In general:
---- ① For a column with many duplicate values that is frequently used in range queries (between, >, <, >=, <=) and in order by / group by, consider building a clustered index;
---- ② When multiple columns are frequently accessed together and each contains duplicate values, consider building a composite index;
---- ③ A composite index should try to cover the key queries, and its leading column must be the most frequently used column.

2. Incomplete join conditions
---- For example: table card has 7,896 rows with a non-clustered index on card_no; table account has 191,122 rows with a non-clustered index on account_no. Run the following two SQL statements under different join conditions:

select sum(a.amount) from account a, card b
where a.card_no = b.card_no (20 seconds)
---- Change the SQL to:
select sum(a.amount) from account a, card b
where a.card_no = b.card_no and a.account_no = b.account_no (<1 second)
---- Analysis:
---- Under the first join condition, the best query plan uses account as the outer table and card as the inner table, probing the index on card. The number of I/Os can be estimated by the following formula:
---- 22,541 pages of the outer table account + (191,122 rows of the outer table account × 3 pages searched on the inner table card for each outer row) = 595,907 I/Os
---- Under the second join condition, the best query plan uses card as the outer table and account as the inner table, probing the index on account. The number of I/Os can be estimated by:
---- 1,944 pages of the outer table card + (7,896 rows of the outer table card × 4 pages searched on the inner table account for each outer row) = 33,528 I/Os
---- Clearly, only with the complete join condition will the truly best plan be executed.
---- Conclusion:
---- 1. Before executing a multi-table operation, the query optimizer lists several candidate join plans based on the join conditions and picks the one with the lowest system cost. The join conditions must take full account of which tables have indexes and of each table's row count; the choice of inner and outer table can be judged by the formula: matched rows in the outer table × lookups performed in the inner table per outer row, where the smallest product is the best plan.
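The two estimates above can be checked with a few lines of arithmetic; the page and row counts are taken directly from the example:

```python
# Plan 1: account as the outer table, card as the inner table.
# Every outer row probes the index on card (3 pages per probe).
outer_pages_account = 22541
account_rows = 191122
pages_per_probe_card = 3
plan1_io = outer_pages_account + account_rows * pages_per_probe_card
print(plan1_io)  # 595907

# Plan 2: card as the outer table, account as the inner table.
# Every outer row probes the index on account (4 pages per probe).
outer_pages_card = 1944
card_rows = 7896
pages_per_probe_account = 4
plan2_io = outer_pages_card + card_rows * pages_per_probe_account
print(plan2_io)  # 33528
```

With the complete join condition the optimizer can pick plan 2, which costs roughly one eighteenth of plan 1's I/O.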
---- 2. To see how a statement executes, use set showplan on to turn the showplan option on; you will then see the join order and which indexes are used. For more detail, execute dbcc traceon(3604, 310, 302) with the sa role.
3. WHERE clauses that cannot be optimized
---- 1. The columns in the following SQL statements all have suitable indexes, yet execution is very slow:
select * from record where
substring(card_no, 5, 4) = '5378' (13 seconds)
select * from record where
amount/30 < 1000 (11 seconds)
select * from record where
convert(char(10), date, 112) = '20140101' (10 seconds)
---- Analysis:
---- Any operation applied to a column in the WHERE clause is evaluated row by row at SQL execution time, which forces a table scan and prevents the index on that column from being used. If the result can instead be computed at query compile time, the SQL optimizer can use the index and avoid the table scan. So rewrite the SQL as follows:
select * from record where card_no like
'____5378%' (<1 second)
select * from record where amount
< 1000*30 (<1 second)
select * from record where date = '20140101'
(<1 second)
---- You will find that the SQL runs noticeably faster!
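This "keep the column bare" rule can be illustrated with SQLite. Again, a different engine than the article's, but it applies the same rule: wrapping an indexed column in an expression defeats the index. Table and index names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE record (card_no TEXT, amount INTEGER, date TEXT)")
con.execute("CREATE INDEX ix_amount ON record (amount)")

# Computation on the column side: the optimizer cannot use ix_amount.
scan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM record WHERE amount / 30 < 1000"
).fetchone()[3]
print(scan)  # e.g. SCAN record

# Same condition with the computation moved to the constant side.
seek = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM record WHERE amount < 1000 * 30"
).fetchone()[3]
print(seek)  # e.g. SEARCH record USING INDEX ix_amount (amount<?)
```

The two predicates select the same rows, but only the second one is recognizable to the optimizer as an index range.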
---- 2. For example, table stuff has 200,000 rows and a non-clustered index on id_no. Look at the following SQL:
select count(*) from stuff where id_no in ('0', '1') (23 seconds)
---- Analysis:
---- in in the WHERE clause is logically equivalent to or, so the parser converts in ('0', '1') into id_no = '0' or id_no = '1'. We would expect it to search each or clause separately and add the results, so that the index on id_no is used; but in fact (as showplan shows) it adopts the "or strategy": it extracts the rows satisfying each or clause into a worksheet in the temporary database, builds a unique index on it to remove duplicate rows, and finally computes the result from that temporary table. So the index on id_no is never actually used, and elapsed time is also affected by tempdb performance.
---- Practice shows that the more rows the table has, the worse the worksheet performs: when stuff reaches 620,000 rows, the time grows to 220 seconds! It is better to split the or clauses:
select count(*) from stuff where id_no = '0'
select count(*) from stuff where id_no = '1'
---- Add the two results together and the job is done. Because each statement uses the index, the total execution time is only 3 seconds; even at 620,000 rows it is only 4 seconds. Or, better still, write a simple stored procedure:
create proc count_stuff
as
begin
declare @a int
declare @b int
declare @c int
declare @d char(10)
select @a = count(*) from stuff where id_no = '0'
select @b = count(*) from stuff where id_no = '1'
select @c = @a + @b
select @d = convert(char(10), @c)
print @d
end
---- The result is computed directly inside the procedure, and execution is just as fast as above!
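The split's correctness rests on the two single-value counts adding up to the in result, which can be sketched in SQLite (the rows inserted here are made up for the demo):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stuff (id_no TEXT)")
con.execute("CREATE INDEX ix_id_no ON stuff (id_no)")
con.executemany("INSERT INTO stuff VALUES (?)",
                [("0",), ("1",), ("0",), ("2",), ("1",), ("0",)])

# One IN query...
total_in = con.execute(
    "SELECT count(*) FROM stuff WHERE id_no IN ('0', '1')"
).fetchone()[0]

# ...versus two single-value queries, each a plain index lookup,
# summed afterwards -- what the stored procedure above does.
a = con.execute("SELECT count(*) FROM stuff WHERE id_no = '0'").fetchone()[0]
b = con.execute("SELECT count(*) FROM stuff WHERE id_no = '1'").fetchone()[0]
print(a + b, total_in)  # 5 5
```

Since '0' and '1' are distinct values, no row can satisfy both branches, so the sum never double-counts.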
---- Conclusion:
---- As these examples show, the essence of an unoptimizable WHERE clause is that it fails to use the index, causing a table scan or extra overhead.

---- 1. Any operation on a column causes a table scan, including database functions and computed expressions. Move the computation to the constant side of the comparison whenever possible.
---- 2. in and or clauses often use worksheets, which invalidates the index; if they do not produce large numbers of duplicate values, consider splitting them, and make sure the split statements can use the index.
---- 3. Make good use of stored procedures to make SQL more flexible and efficient.
---- From these examples we can see that the essence of SQL optimization is to write statements the optimizer can recognize, make full use of indexes, reduce the number of I/Os from table scans, and avoid table searches as much as possible. In reality, SQL performance optimization is a complex process; the above is only one aspect at the application level. It may also involve resource configuration at the database level, traffic control at the network level, and the overall design at the operating system level.
