SQL Server INDEX OPTIMIZATION
Clustered Index , The data stored in the table is stored in the order of indexes. , Higher retrieval efficiency than normal indexes , However, data is added / Modify / The deletion has a large impact.
Non-clustered Index , The data storage sequence in the table is not affected. , Low retrieval efficiency than clustered Indexes , Add data / Modify / The deletion has little impact.
How to make your SQL Faster running
---- People are using SQL Often fall into a misunderstanding, that is, too concerned with the result is correct, and ignore
Performance differences may exist between different implementation methods. This performance difference lies in large or complex databases.
Environment (such as online transaction processing OLTP Or Decision Support System DSS . My work practices
Found, bad SQL It often comes from inappropriate index design, inadequate connection conditions, and unoptimizable Whe
Re Clause. After they are properly optimized, their running speed is significantly improved! Below I will
Summary:
---- To more intuitively describe the problem SQL The running time is tested.
Indicates ( <1 Seconds ).
---- Test Environment --
---- Host: HP LH II
---- Clock speed: 330 MHz
---- Memory: 128 MB
---- Operating System: Operserver 5.0.4
---- Database: Sybase11.0.3
I. unreasonable index design
---- Example: Table Record Yes 620000 Rows and shards are under different indexes. SQL Running status:
---- 1. In Date A non-cluster index is created.
Select count (*) from record where date>
'20140901' and date <'20140901' and amount>
2000 (25 Seconds )
Select date, sum (amount) from record group by date
(55 Seconds )
Select count (*) from record where date>
'2013' and place in ('bj ', 'sh') (27 Seconds )
---- Analysis:
---- Date There are a large number of duplicate values. In a non-clustered index, data is physically stored on the data page at random
During range search, you must perform a table scan to find all rows in this range.
---- 2. In Date On a Cluster Index
Select count (*) from record where date>
'20140901' and date <'20140901' and amount>
2000 ( 14 Seconds)
Select date, sum (amount) from record group by date
( 28 Seconds)
Select count (*) from record where date>
'2013' and place in ('bj ', 'sh ') ( 14 Seconds)
---- Analysis:
---- Under the cluster index, data is physically stored on the data page in order, and duplicate values are arranged together.
When searching, you can first find the start and end points of this range, and only scan the data page within this range to avoid
Perimeter scanning improves the query speed.
---- 3. In Place , Date , Amount Composite Index on
Select count (*) from record where date>
'20140901' and date <'20140901' and amount>
2000 ( 26 Seconds)
Select date, sum (amount) from record group by date
( 27 Seconds)
Select count (*) from record where date>
'2013' and place in ('bj ', 'sh ') ( <1 Seconds)
---- Analysis:
---- This is an unreasonable composite index because its leading column is Place , Article 1 and Article 2 SQL No reference
Use Place Therefore, the index is not used. The third SQL Used Place And all referenced columns are included in the group.
The index overwrite is formed, so it is very fast.
---- 4. In Date , Place , Amount Composite Index on
Select count (*) from record where date>
'20140901' and date <'20140901' and amount>
2000 (<1 Seconds )
Select date, sum (amount) from record group by date
( 11 Seconds)
Select count (*) from record where date>
'2013' and place in ('bj ', 'sh ') ( <1 Seconds)
---- Analysis:
---- This is a reasonable composite index. It sets Date As the leading column SQL You can use the index and
And in the first and third SQL And thus achieve the optimal performance.
---- 5. Summary:
---- The index created by default is a non-clustered index, but sometimes it is not the best. A reasonable index design requires
It is based on the analysis and prediction of various queries. Generally speaking:
---- ① . There are a large number of repeated values and frequent range queries
( Between,>, < , >=, <= ) And Order
, Group Create a cluster index for the columns that occur;
---- ② . Multiple columns are frequently accessed at the same time, and each column contains duplicate values. You can consider creating a composite index;
---- ③ . Composite indexes should try to overwrite key queries. The leading column must be the most frequently used column.
2. Incomplete connection conditions:
----Example: Table Card Yes 7896 Line, in Card_no There is a non-clustered index, table Account Yes 191122 Line, in
Account_no There is a non-clustered index on the two tables. SQL Execution:
Select sum (A. Amount) from account,
Card B where a. card_no = B. card_no ( 20 Seconds)
---- Set SQL Changed:
Select sum (A. Amount) from account,
Card B where a. card_no = B. card_no and.
Account_no = B. account_no ( <1 Seconds)
---- Analysis:
---- In the first connection condition, the best query solution is Account As an outer table, Card As an inner table, using
Card On the index, its I/O The number of times can be estimated by the following formula:
---- Outer table Account On 22541 Page + (Outer table Account Of 191122 Line * Inner table Card Corresponds to outer layer
The first row of the table to be searched 3 Page) = 595907 Times I/O
---- In the second connection condition, the best query solution is Card As an outer table, Account As an inner table, using
Account On the index, its I/O The number of times can be estimated by the following formula:
---- Outer table Card On 1944 Page + (Outer table Card Of 7896 Line * Inner table Account Corresponds to each outer table
The 4 Page) = 33528 Times I/O
---- It can be seen that only a full set of connection conditions can be executed for the best solution.
---- Summary:
---- 1. Before a multi-table operation is executed, the query optimizer will list several groups of potential consumers based on the connection conditions.
And find the best solution with the minimum system overhead. The connection conditions must fully consider the tables with indexes and the number of rows
Table; the choice of the internal and external table can be determined by the formula: Number of matched rows in the outer table * The number of times each query is performed in the inner table.
The minimum product is the best solution.
---- 2. How to view the execution plan -- Use Set showplanon , Open Showplan You can see the connection
The order of access and the index information to use. For more detailed information, you need SA Role execution DBCC (3604,310, 30
2) .
Iii. unoptimizable Where Clause
---- 1. Example: SQL The column in the Condition Statement has an appropriate index, but the execution speed is very slow:
Select * from record where
Substring (card_no, 5378) = '123' (13 Seconds )
Select * from record where
Amount/30 <1000 ( 11 Seconds)
Select * from record where
Convert (char (10), date, 112) = '20140901' ( 10 Seconds)
---- Analysis:
---- Where The result of any operation on the column in the clause is SQL The column-by-column calculation result is required.
The index on the column is not used for table search. If these results are obtained during query compilation
You can SQL The optimizer optimizes and uses indexes to avoid table search. SQL Rewrite it as follows:
Select * from record where card_no like
'123' ( <1 Seconds)
Select * from record where amount
& Lt; 1000*30 ( <1 Seconds)
Select * from record where date =' 2017/12/01 '
( <1 Seconds)
---- You will find SQL Faster!
---- 2. Example: Table Stuff Yes 200000 Row, Id_no There is a non-Cluster Index, please refer to the following SQL :
Select count (*) from stuff where id_no in ('0', '1 ')
( 23 Seconds)
---- Analysis:
---- Where In the condition 'In' Logically equivalent 'Or' , So the syntax analyzer will In ('0', '1 ') Conversion
Is Id_no = '0' or id_no = '1' . We expect that it will Or Clause searches separately, and then returns
Add, so that you can use Id_no But (according Showplan ) , It uses "Or Policy "
That is, each Or Clause row, save it to the worksheet of the temporary database, and create a unique index to remove it.
Repeat the rows and finally calculate the results from this temporary table. Therefore, the actual process is not used. Id_no Index and complete
Time is subject Tempdb The impact of database performance.
---- Practice has proved that the more rows in a table, the worse the performance of the worksheet. Stuff Yes 620000 When running
Actually achieved 220 Seconds! It is better Or Separate clauses:
Select count (*) from stuff where id_no = '0'
Select count (*) from stuff where id_no = '1'
---- Two results are obtained, and the addition is more cost-effective. Because each sentence uses an index, the execution time is only 3 Seconds,
In 620000 The time is only 4 Seconds. Or, write a simple stored procedure in a better way:
Create proc count_stuff
Declare @ A int
Declare @ B INT
Declare @ C int
Declare @ d char (10)
Begin
Select @ A = count (*) from stuff where id_no = '0'
Select @ B = count (*) from stuff where id_no = '1'
End
Select @ C = @ A + @ B
Select @ d = convert (char (10), @ C)
Print @ d
---- Calculate the result directly, and the execution time is as fast as above!
---- Summary:
---- It can be seen that the so-called optimization is Where The clause uses the index. If the index is not optimized, table scanning or additional overhead occurs.
---- 1. Any operations on the column will cause the table to scan, including database functions, calculation expressions, and so on.
Move the operation to the right of the equal sign as much as possible.
---- 2.In , Or The clause usually uses a worksheet to invalidate the index. If a large number of duplicate values are not generated, consider
Separate the sub-statement. The split sub-statement should contain the index.
---- 3. Be good at using stored procedures. SQL It becomes more flexible and efficient.
----From the above examples, we can see that, SQL The essence of optimization is to use the optimizer
To identify the statement and use the index in full to reduce the number of table scans I/O Times to avoid table search. Actually S
QL Performance optimization is a complex process. The above is only a manifestation of the application layer.
Resource Configuration involving the database layer, traffic control at the network layer, and overall design of the operating system layer
Reprinted hereArticleAnd want to help some friends.