Correct use of indexes

Source: Internet
Author: User

The T1 table contains 1,000,000 rows (insertion time 36 minutes, COUNT(*) query 19 seconds, space occupied about 670 MB).

1. Make real, full use of indexes. A condition such as LIKE 'Zhang%' meets the SARG (searchable argument) standard, while LIKE '%Zhang' does not.

Using the wildcard % as the first character of the string makes the index unusable. Although this is sometimes hard to avoid in practice, you should at least understand that the performance of this usage is low.
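A minimal sketch of the difference, assuming a hypothetical person table with an indexed name column (the table and index names are illustrative, not from the original test):

-- Assumed test table; not part of the original article's setup
CREATE TABLE person (id INT IDENTITY PRIMARY KEY, name VARCHAR(50));
CREATE NONCLUSTERED INDEX ix_person_name ON person (name);

-- SARG-compliant: the optimizer can seek on ix_person_name
SELECT * FROM person WHERE name LIKE 'Zhang%';

-- Not SARG-compliant: the leading % forces an index or table scan
SELECT * FROM person WHERE name LIKE '%Zhang';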

**********************************************

2. The "non" operator does not satisfy the Sarg form, making the index unusable for statements that do not satisfy the Sarg form the most typical case is non-operator statements such as: not,! =, <>,!<,!>, not EXISTS, not, and not. If you use not or <>, it is best to convert to another method, such as the following example:

The T1 table contains 1,000,000 rows, constructed as follows (insertion time 36 minutes, COUNT(*) query 19 seconds, space occupied about 670 MB):

DECLARE @i INT
SET @i = 1
WHILE @i < 1000000
BEGIN
    INSERT INTO T1 VALUES ('Zhang' + CONVERT(VARCHAR(10), @i), '3.2', 77);
    SET @i = @i + 1;
END

Three ways to write the query:

SELECT * FROM T1 WHERE ID <> 300000
SELECT * FROM T1 WHERE ID NOT IN (300000)
SELECT * FROM T1 WHERE ID > 299999 AND ID < 300001

According to the execution plan, the last form should be preferred over the first two. At least that is what the internet says; however, in my own test with 1,000,000 rows, the estimated cost of all three plans was the same.
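One generic way to compare the three forms yourself (this is standard T-SQL instrumentation, not something prescribed in the source) is to turn on I/O and timing statistics before running them:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT * FROM T1 WHERE ID <> 300000;
SELECT * FROM T1 WHERE ID NOT IN (300000);
SELECT * FROM T1 WHERE ID > 299999 AND ID < 300001;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;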

*********************************************

3. Function operations on a column do not satisfy the SARG form, making the index unusable. Example: the columns in the following SQL conditions all have proper indexes, yet the statements execute very slowly:

SELECT * FROM record WHERE SUBSTRING(card_no, 1, 4) = '5378'           (13 seconds)
SELECT * FROM record WHERE amount / 30 < 1000                          (11 seconds)
SELECT * FROM record WHERE CONVERT(CHAR(10), date, 112) = '19991201'   (10 seconds)

Analysis:

Any operation on a column in the WHERE clause must be evaluated row by row at run time, which forces a full table scan and prevents the index on that column from being used. If the expression can instead be evaluated once at query compile time, the optimizer can use the index and avoid the table scan. So rewrite the SQL as follows:

SELECT * FROM record WHERE card_no LIKE '5378%'          (< 1 second)
SELECT * FROM record WHERE amount < 1000 * 30            (< 1 second)
SELECT * FROM record WHERE date = '1999/12/01'           (< 1 second)

You will find the SQL significantly faster.

Still to be tested...

**********************************************

4. Try not to apply any direct processing to indexed fields

SELECT * FROM employee WHERE first_name + last_name = 'Beill Cliton';

This form cannot use the index. Rewrite it instead as:

SELECT * FROM employee
WHERE first_name = SUBSTR('Beill Cliton', 1, INSTR('Beill Cliton', ' ') - 1)
  AND last_name = SUBSTR('Beill Cliton', INSTR('Beill Cliton', ' ') + 1)

so that the index can be used. (Note that SUBSTR and INSTR here are Oracle functions; the SQL Server equivalents are SUBSTRING and CHARINDEX.)

***********************************************

5. Different data types have different index performance; prefer the more efficient ones where possible: index lookups on numeric types are faster than on string types, and indexes on fixed-length strings (CHAR, NCHAR) are more efficient than indexes on variable-length strings (VARCHAR, NVARCHAR). For example, WHERE username = 'Zhang San' AND age > 20 should be improved to WHERE age > 20 AND username = 'Zhang San'. Note: SQL Server's query optimizer can automatically rearrange the order of conditions, but it is still recommended to arrange them manually in advance.
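A minimal sketch of the data type point, using a hypothetical table (the table, column, and index names below are illustrative and not from the original article):

-- Hypothetical table: per the guideline above, the INT column indexes most
-- efficiently, the fixed-length CHAR next, and the variable-length NVARCHAR least
CREATE TABLE users (
    age      INT,           -- numeric type: most efficient to index
    code     CHAR(10),      -- fixed-length string: next best
    username NVARCHAR(50)   -- variable-length string: least efficient to index
);
CREATE INDEX ix_users_age  ON users (age);
CREATE INDEX ix_users_name ON users (username);

-- Condition order as recommended above (the optimizer may reorder it anyway)
SELECT * FROM users WHERE age > 20 AND username = N'Zhang San';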

**************************************************

6. In some cases IN is equivalent to OR, and neither can make full use of the index. Example: the table stuff has 200,000 rows and id_no has a non-clustered index. Consider this SQL: SELECT COUNT(*) FROM stuff WHERE id_no IN ('0','1') (23 seconds). We might expect it to look up each OR branch separately and then add the results together, so that it could take advantage of the index on id_no. In fact it takes the "OR strategy": it first fetches the rows satisfying each OR branch into a worksheet in the temporary database, then uses a unique index to remove duplicate rows, and finally computes the result from that temporary table. The actual process therefore does not use the index on id_no, and the completion time is also affected by tempdb performance. Practice shows that the more rows the table has, the worse the worksheet performs; when stuff reaches 620,000 rows, the execution time becomes very long. If you are sure the different conditions will not produce a large number of duplicate values, you might as well split the OR branches:

SELECT COUNT(*) FROM stuff WHERE id_no = '0'
SELECT COUNT(*) FROM stuff WHERE id_no = '1'

Get the two results, then combine them with UNION. Because each statement uses the index, the execution time is relatively short:

SELECT COUNT(*) FROM stuff WHERE id_no = '0'
UNION
SELECT COUNT(*) FROM stuff WHERE id_no = '1'

From a practical point of view, using UNION is generally more efficient than using OR. The EXISTS and IN keywords are similar in usage and in performance: both can result in a full table scan and are inefficient, although, according to unverified claims, EXISTS may be faster than IN.
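As a hedged sketch of the EXISTS/IN comparison mentioned above (the other_stuff table is hypothetical; the original article only names stuff and id_no):

-- IN with a subquery against a hypothetical related table
SELECT COUNT(*) FROM stuff
WHERE id_no IN (SELECT id_no FROM other_stuff);

-- The equivalent EXISTS form, which unverified claims say may be faster
SELECT COUNT(*) FROM stuff s
WHERE EXISTS (SELECT 1 FROM other_stuff o WHERE o.id_no = s.id_no);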

***************************************************

7. Use workarounds to improve query efficiency

The LIKE keyword supports wildcard matching, but this kind of matching is particularly time-consuming. For example: SELECT * FROM customer WHERE zipcode LIKE '21___'. Even if an index has been created on the zipcode field, a full table scan may still be used in this case. If you change the statement to SELECT * FROM customer WHERE zipcode > '21000', the index is used when the query executes, greatly increasing the speed. This flexibility is limited, however, and should not cause any loss of business meaning; for postal codes, zipcode LIKE '21___' and zipcode > '21000' are intended to return the same rows here, though strictly speaking the range form also matches codes above '21999', so a bounded range preserves the exact meaning.
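Laid out as runnable statements, assuming a customer table with an indexed CHAR(5) zipcode column (the exact schema is not given in the source):

-- Wildcard match: may fall back to a full table scan
SELECT * FROM customer WHERE zipcode LIKE '21___';

-- Range workaround: can use the index on zipcode
SELECT * FROM customer WHERE zipcode > '21000';

-- A bounded range preserves the exact meaning of "21xxx" while still using the index
SELECT * FROM customer WHERE zipcode >= '21000' AND zipcode < '22000';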


8. ORDER BY on the clustered index column is the most efficient sort. Sorting is a time-consuming operation; try to simplify or avoid sorting large tables, for example by narrowing the range of sorted columns and sorting only on indexed columns. Consider the following (gid is the primary key, fariqi is the clustered index column):

SELECT TOP 10000 gid, fariqi, reader, title FROM Tgongwen: 196 ms. Scan count 1, logical reads 289, physical reads 1, read-ahead reads 1527.

SELECT TOP 10000 gid, fariqi, reader, title FROM Tgongwen ORDER BY gid ASC: 4720 ms. Scan count 1, logical reads 41,956, physical reads 0, read-ahead reads 1287.

SELECT TOP 10000 gid, fariqi, reader, title FROM Tgongwen ORDER BY gid DESC: 4736 ms. Scan count 1, logical reads 55,350, physical reads 10, read-ahead reads 775.

SELECT TOP 10000 gid, fariqi, reader, title FROM Tgongwen ORDER BY fariqi ASC: 173 ms. Scan count 1, logical reads 290, physical reads 0, read-ahead reads 0.

SELECT TOP 10000 gid, fariqi, reader, title FROM Tgongwen ORDER BY fariqi DESC: 156 ms. Scan count 1, logical reads 289, physical reads 0, read-ahead reads 0.

Note also that when sorting by the same field, ascending and descending order are basically equivalent in speed.

********************************************************

9. Measures to reduce the system overhead of data queries:

(1) Use TOP to minimize the amount of data fetched.

(2) Extract only the fields you need and avoid SELECT *. The larger and more numerous the fields, the more resources the SELECT consumes; for example, fetching an INT field is much faster than fetching a CHAR field. Each field left out of the select list improves extraction speed accordingly; the size of the improvement depends on the size of the discarded fields.

(3) COUNT(*) versus COUNT(field): COUNT(*) is as fast as COUNT(primary key) and faster than counting any field other than the primary key, and the longer the field, the slower the count. If you use COUNT(*), SQL Server automatically picks the smallest field to count; of course, writing COUNT(primary key) directly is even more straightforward.

(4) With nested queries, filter out as much data as possible in the inner layer. If a column appears in both the main query and the subquery's WHERE clause, the subquery will most likely have to be re-evaluated whenever that column's value changes in the main query. The deeper the nesting, the lower the efficiency, so try to avoid subqueries; if a subquery is unavoidable, filter out as many rows as possible inside it.

(5) With multi-table joins, pay attention to table order and filter data as early as possible. When using JOIN for multi-table queries, choose the plan with the least system overhead: take into account which tables have indexes and how many rows each table has, and optimize the table order. Put simply, reduce the amount of data that later joins must process as early as possible.
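A brief sketch of measures (1) through (3), reusing the Tgongwen table from point 8 purely as an illustration:

-- (1) and (2): fetch only the rows and columns actually needed
SELECT TOP 100 gid, fariqi FROM Tgongwen;

-- Avoid SELECT *: every extra (and wider) column increases the cost
-- SELECT * FROM Tgongwen;

-- (3): COUNT(*) lets SQL Server pick the cheapest way to count the rows
SELECT COUNT(*) FROM Tgongwen;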

In general, SQL Server automatically optimizes joins between tables. For example, in SELECT name, no FROM A JOIN B ON A.id = B.id JOIN C ON C.id = A.id WHERE name = 'Wang', table A is listed first in the FROM clause, then B, and finally C, yet SQL Server may still process table C first. Its selection principle is to reduce the total amount of data that must be searched in the other tables when the query can be restricted to a single row or a few rows. In most cases SQL Server makes the best choice, but if you find that a complex join query is slower than expected, you can use the SET FORCEPLAN statement to force SQL Server to join the tables in the order in which they appear. Continuing the example above: SET FORCEPLAN ON ... SET FORCEPLAN OFF. With FORCEPLAN on, the tables are processed in the order you wrote them. Compare the two execution plans in Query Analyzer to choose the join order. The disadvantage of SET FORCEPLAN is that it can only be used in stored procedures.
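A sketch of the SET FORCEPLAN usage described above, with the three-table query written out in full (the A, B, and C tables and their columns are only assumed from the example):

SET FORCEPLAN ON;

-- With FORCEPLAN on, the tables are joined in the order written: A, then B, then C
SELECT name, no
FROM A
JOIN B ON A.id = B.id
JOIN C ON C.id = A.id
WHERE name = 'Wang';

SET FORCEPLAN OFF;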
