SQL Server 2008 R2: Correct Use of Indexes
T1 table with 1,000,000 rows (insert time 36 minutes, COUNT(*) query 19 seconds, space occupied about 670 MB).
1. Make truly full use of the index
For example, LIKE 'Zhang%' satisfies the SARG (searchable argument) criteria,
while LIKE '%Zhang' does not.
Using the wildcard % as the first character of the pattern makes the index unusable. Although this is sometimes hard to avoid in practice, you should at least understand that this usage performs poorly.
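A minimal sketch of the two forms (T1 is the table built in the script below; the Username column name is my own assumption, since the original does not name the columns):
-- leading characters known: the index on Username can be used for a seek
SELECT * FROM T1 WHERE Username LIKE 'Zhang%'
-- leading wildcard: the index cannot be seeked, so a scan is likely
SELECT * FROM T1 WHERE Username LIKE '%Zhang'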
**********************************************
2. The "non" operator does not satisfy the Sarg form, making the index unusable
Statements that do not satisfy the Sarg form typically include non-operator statements, such as: not,! =, <>,!<,!>, not EXISTS, not and not.
If you use not or <>, it is best to convert to another method, such as the following example:
The T1 table holds 1,000,000 rows and is constructed as follows (insert time 36 minutes, COUNT(*) query 19 seconds, space occupied about 670 MB):
DECLARE @i INT
SET @i = 1
WHILE @i < 1000000
BEGIN
    -- the CHAR length was omitted in the original; 10 is assumed here
    INSERT INTO T1 VALUES ('Zhang' + CONVERT(CHAR(10), @i), '3.2', 77);
    SET @i = @i + 1;
END
Three ways to query:
SELECT * FROM T1 WHERE ID <> 300000
SELECT * FROM T1 WHERE ID NOT IN (300000)
SELECT * FROM T1 WHERE ID > 299999 AND ID < 300001
The execution plan makes it clear that the last form should be used in place of the first two.
At least, that is the common claim online; but when I actually tested it with 1,000,000 rows, the estimated costs and execution plans of the three queries were the same.
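To compare the three forms yourself, one simple approach is to enable I/O and timing statistics before running them (SET STATISTICS IO/TIME are standard SQL Server options; the queries are the ones above and assume T1 has the ID column they reference):
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT * FROM T1 WHERE ID <> 300000;
SELECT * FROM T1 WHERE ID NOT IN (300000);
SELECT * FROM T1 WHERE ID > 299999 AND ID < 300001;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;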
*********************************************
3. Function operations on a column do not satisfy the SARG form and make the index unusable
Example: the columns in the WHERE clauses below have suitable indexes, yet the statements execute very slowly:
SELECT * FROM record WHERE SUBSTRING(card_no, 1, 4) = '5378' (13 seconds)
SELECT * FROM record WHERE amount/30 < 1000 (11 seconds)
SELECT * FROM record WHERE CONVERT(CHAR(10), date, 112) = '19991201' (10 seconds)
Analysis:
Any operation on a column in the WHERE clause has to be evaluated row by row at execution time, so the query is forced into a full table scan and cannot use the index on that column. If the expression's result is available at query compile time instead, the optimizer can use the index and avoid the table scan. So rewrite the SQL as follows:
SELECT * FROM record WHERE card_no LIKE '5378%' (< 1 second)
SELECT * FROM record WHERE amount < 1000*30 (< 1 second)
SELECT * FROM record WHERE date = '1999/12/01' (< 1 second)
You will find that the SQL runs significantly faster.
(Still to be verified by my own tests.)
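For the date example, another commonly used rewrite (my own addition, not from the original) keeps the filter SARGable even when the column stores a time-of-day component, by comparing against a half-open range:
-- matches the whole day of 1999-12-01, assuming date is a datetime column
SELECT * FROM record WHERE date >= '19991201' AND date < '19991202'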
**********************************************
4. Avoid applying any direct computation to indexed fields
SELECT * FROM employee WHERE first_name + last_name = 'Beill Cliton';
cannot use the index. Rewrite it instead as:
SELECT * FROM employee
WHERE first_name = SUBSTR('Beill Cliton', 1, INSTR('Beill Cliton', ' ') - 1)
  AND last_name = SUBSTR('Beill Cliton', INSTR('Beill Cliton', ' ') + 1)
This form can use the index.
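Note that SUBSTR and INSTR are Oracle functions. A rough SQL Server 2008 R2 equivalent, sketched with SUBSTRING and CHARINDEX against the same assumed employee table, would be:
SELECT * FROM employee
WHERE first_name = SUBSTRING('Beill Cliton', 1, CHARINDEX(' ', 'Beill Cliton') - 1)
  AND last_name = SUBSTRING('Beill Cliton', CHARINDEX(' ', 'Beill Cliton') + 1, LEN('Beill Cliton'))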
***********************************************
5. Different data types give different index performance; prefer the higher-performance types
For example, index lookups on numeric types are more efficient than on string types, and indexes on fixed-length strings (char, nchar) are more efficient than on variable-length strings (varchar, nvarchar).
So
WHERE Username = 'Zhang San' AND age > 20
should be improved to
WHERE age > 20 AND Username = 'Zhang San'
Note: in this case, SQL Server's query optimizer can rearrange the order of the conditions automatically, but it is still recommended to arrange them sensibly yourself in advance.
**************************************************
6. In some cases IN is equivalent to OR, and neither can make full use of the index
Example: table stuff has 200,000 rows, with a non-clustered index on id_no. Consider the following SQL:
SELECT COUNT(*) FROM stuff WHERE id_no IN ('0', '1') (23 seconds)
We might expect it to search for each OR clause separately and then add the results together, which would let it use the index on id_no. In fact it adopts an "OR strategy": it extracts the rows satisfying each OR clause into a worktable in tempdb, builds a unique index on that worktable to remove duplicate rows, and finally computes the result from this temporary table. So the actual plan does not use the index on id_no, and the completion time is also affected by tempdb performance.
Practice shows that the more rows the table has, the worse the worktable performs; when stuff reaches 620,000 rows, the execution time becomes very long. If you are sure the different conditions will not produce many duplicate values, it is better to split the OR clause:
SELECT COUNT(*) FROM stuff WHERE id_no = '0'
SELECT COUNT(*) FROM stuff WHERE id_no = '1'
This yields two results, which are then combined with UNION. Because each statement uses the index, the execution time is relatively short:
SELECT COUNT(*) FROM stuff WHERE id_no = '0'
UNION
SELECT COUNT(*) FROM stuff WHERE id_no = '1'
In practice, using UNION is generally more efficient than using OR. The EXISTS and IN keywords are similar in usage and performance, and both can result in a full table scan, which is inefficient; according to unverified claims, EXISTS may be faster than IN.
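If a single total is wanted rather than two separate rows, one way (my own sketch, not from the original) is to sum the two counts over a derived table; UNION ALL is used so that two equal counts are not collapsed into one row:
SELECT SUM(cnt) AS total
FROM (
    SELECT COUNT(*) AS cnt FROM stuff WHERE id_no = '0'
    UNION ALL
    SELECT COUNT(*) AS cnt FROM stuff WHERE id_no = '1'
) AS t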
***************************************************
7. Use workarounds to improve query efficiency
The LIKE keyword supports wildcard matching, but this kind of matching is particularly time-consuming. For example: SELECT * FROM customer WHERE zipcode LIKE '21___'. Even if an index has been created on the zipcode field, a full table scan may still be used in this case. If you change the statement to SELECT * FROM customer WHERE zipcode > '21000', the index is used when the query executes, greatly increasing the speed. This kind of flexibility is limited, however, and should not come at the cost of business meaning; for postal codes, zipcode LIKE '21___' and zipcode > '21000' are taken here to mean the same thing.
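Laid out as statements for comparison (same customer table and zipcode column as above):
-- wildcard pattern: may fall back to a full table scan
SELECT * FROM customer WHERE zipcode LIKE '21___'
-- range form: can seek on an index over zipcode
SELECT * FROM customer WHERE zipcode > '21000'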
8. ORDER BY on the clustered index column sorts most efficiently
Sorting is a time-consuming operation; try to simplify or avoid sorting large tables, for example by narrowing the set of sorted columns or sorting only on indexed columns.
Let's look at an example (GID is the primary key; Fariqi is the clustered index column):
SELECT TOP 10000 gid, fariqi, reader, title FROM Tgongwen
Elapsed time: 196 ms. Scan count 1, logical reads 289, physical reads 1, read-ahead reads 1527.
SELECT TOP 10000 gid, fariqi, reader, title FROM Tgongwen ORDER BY gid ASC
Elapsed time: 4720 ms. Scan count 1, logical reads 41,956, physical reads 0, read-ahead reads 1287.
SELECT TOP 10000 gid, fariqi, reader, title FROM Tgongwen ORDER BY gid DESC
Elapsed time: 4736 ms. Scan count 1, logical reads 55,350, physical reads 10, read-ahead reads 775.
SELECT TOP 10000 gid, fariqi, reader, title FROM Tgongwen ORDER BY fariqi ASC
Elapsed time: 173 ms. Scan count 1, logical reads 290, physical reads 0, read-ahead reads 0.
SELECT TOP 10000 gid, fariqi, reader, title FROM Tgongwen ORDER BY fariqi DESC
Elapsed time: 156 ms. Scan count 1, logical reads 289, physical reads 0, read-ahead reads 0.
Note also that when sorting by the same column, ascending and descending order are basically equivalent in speed.
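For reference, a table laid out this way could be set up as follows; this is only a sketch, and the constraint and index names are my own assumptions:
-- GID as a nonclustered primary key, so the clustered index can go on Fariqi
ALTER TABLE Tgongwen ADD CONSTRAINT PK_Tgongwen PRIMARY KEY NONCLUSTERED (gid);
CREATE CLUSTERED INDEX IX_Tgongwen_Fariqi ON Tgongwen (fariqi);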
********************************************************
9. Measures to reduce the overhead of data queries
(1) Use TOP to minimize the amount of data retrieved.
(2) Select only the fields you need and avoid SELECT *, following the principle of "ask only for as much as you need".
The larger the fields, the more resources the SELECT consumes; for example, retrieving an int field is much faster than retrieving a char field. Every field we leave out improves retrieval speed correspondingly, and the size of the improvement depends on the size of the field that is dropped.
(3) COUNT(*) compared with COUNT(field)
COUNT(*) and COUNT(primary key) are equivalent in speed, and COUNT(*) is faster than counting any field other than the primary key; the longer the field, the slower the aggregation. With COUNT(*), SQL Server automatically finds the smallest field to aggregate over. Of course, writing COUNT(primary key) directly is even more straightforward.
(4) In nested queries, filter the data in the inner layer as much as possible
If a column appears in both the outer query and the subquery's WHERE clause, the subquery will most likely have to be re-evaluated whenever that column's value changes in the outer query. The deeper the nesting, the lower the efficiency, so subqueries should be avoided where possible. If a subquery is unavoidable, filter out as many rows as possible inside it.
(5) In multi-table join queries, pay attention to table order and filter the data as early as possible
When joining multiple tables, choose the approach with the lowest system overhead. Join conditions should take into account which tables have indexes and how many rows each table has, and the table order should be optimized; put simply, reduce the amount of data to be joined downstream as early as possible.
In general, SQL Server automatically optimizes the join order of tables. For example:
SELECT name, no FROM A
JOIN B ON A.id = B.id
JOIN C ON C.id = A.id
WHERE name = 'Wang'
Although table A is listed first in the FROM clause, followed by B and finally C, SQL Server may well process table C first. Its selection principle is to start with the table whose restriction narrows the query down to one row or a few rows, so that the total amount of data looked up in the other tables is minimized. In most cases SQL Server makes the best choice, but if you find that a complex join query is slower than expected, you can use the SET FORCEPLAN statement to force SQL Server to access the tables in the order in which they appear. For the example above: SET FORCEPLAN ON ... SET FORCEPLAN OFF makes the tables be processed in the order you wrote them. Compare the two execution plans in Query Analyzer and choose the better join order. The disadvantage of SET FORCEPLAN noted here is that it can only be used in stored procedures.
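A minimal sketch of forcing the written join order for the example above (e.g. inside a stored procedure body, per the note above; wrapping the statement this way is my own illustration):
SET FORCEPLAN ON;
SELECT name, no FROM A
JOIN B ON A.id = B.id
JOIN C ON C.id = A.id
WHERE name = 'Wang';
SET FORCEPLAN OFF;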
Original: Http://hi.baidu.com/mayw1985/item/2092f0427fcdf5e6dc0f6cab