SQL Server improves SQL statements with indexes

Source: Internet
Author: User

Many people don't know how SQL statements are executed in SQL Server, and they worry that SQL Server will misinterpret the statements they write. For example:

SELECT * FROM table1 WHERE name = 'zhangsan' AND tid > 10000

and

SELECT * FROM table1 WHERE tid > 10000 AND name = 'zhangsan'

Some people are unsure whether these two statements execute equally efficiently, because read literally they do look different. If tid is a clustered index, the second statement would appear to search only the rows with tid greater than 10000, while the first would appear to scan the whole table for rows with name = 'zhangsan' and only then apply the condition tid > 10000 to produce the result.

In fact, such worries are unnecessary. SQL Server has a query optimizer that evaluates the search conditions in the WHERE clause and determines which index can narrow the search space of a table scan. In other words, it optimizes the query automatically.
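One way to check this on your own server is to run both predicate orders with STATISTICS IO turned on and compare the reported reads. The following is a minimal sketch, not the author's test setup: the temporary table #table1, its columns, and the generated data are assumptions made up for illustration.

```sql
-- Hypothetical demo table; the names follow the article's table1 example.
CREATE TABLE #table1 (
    tid  INT          NOT NULL PRIMARY KEY CLUSTERED,
    name NVARCHAR(50) NOT NULL
);

-- Fill with 20,000 rows; every 100th row is named 'zhangsan'.
INSERT INTO #table1 (tid, name)
SELECT n, CASE WHEN n % 100 = 0 THEN N'zhangsan' ELSE N'other' END
FROM (
    SELECT TOP (20000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b
) AS numbers;

SET STATISTICS IO ON;

-- Same predicates in a different order: compare the logical reads
-- that each statement reports in the Messages output.
SELECT * FROM #table1 WHERE name = N'zhangsan' AND tid > 10000;
SELECT * FROM #table1 WHERE tid > 10000 AND name = N'zhangsan';

SET STATISTICS IO OFF;
DROP TABLE #table1;
```

If the optimizer treats the two statements identically, both should report the same scan count and logical reads.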

Although the query optimizer can optimize a query automatically based on the WHERE clause, it is still worth understanding how it works. Otherwise, the optimizer will sometimes not run the query as quickly as you intended.

During the query analysis phase, the query optimizer looks at each clause of the query and decides whether it is useful for limiting the amount of data that has to be scanned. If a clause can be used as a search argument (SARG), it is called optimizable, and an index can be used to obtain the required data quickly.

SARG definition: an operation that limits the search, because it usually refers to a specific match, a match within a range of values, or two or more such conditions joined by AND. It takes the following form:

column_name operator <constant or variable>

or

<constant or variable> operator column_name

The column name appears on one side of the operator, and the constant or variable appears on the other side. For example:

name = 'Zhang San'

price > 5000

5000 < price

name = 'Zhang San' AND price > 5000

If an expression does not meet the SARG form, it cannot limit the scope of the search, which means SQL Server must check every row against all the conditions in the WHERE clause. An index is therefore useless for an expression that does not satisfy the SARG form.

Having introduced SARGs, let's summarize some experience with using them, along with findings from practice that differ from the conclusions in some reference material:

1. Whether a LIKE expression is a SARG depends on where the wildcard is placed

For example, name LIKE 'Zhang%' is a SARG,

while name LIKE '%Zhang' is not.

The reason is that a wildcard % at the start of the string makes the index unusable.
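The difference between the two patterns can be tried out with a small script. This is only a sketch under assumptions: the temporary table #people, the index ix_people_name, and the sample names are invented for illustration, and on a table this small the optimizer may choose whichever plan it likes, so compare the execution plans on realistic data.

```sql
-- Hypothetical table and index names, used only for this illustration.
CREATE TABLE #people (
    id   INT IDENTITY(1,1) PRIMARY KEY,
    name NVARCHAR(50) NOT NULL
);
CREATE INDEX ix_people_name ON #people (name);

INSERT INTO #people (name)
VALUES (N'Zhang San'), (N'Zhang Wei'), (N'Li Zhang'), (N'Wang Wu');

-- Prefix pattern: the matching names form one contiguous range in the
-- index order, so an index seek on ix_people_name is possible.
-- Matches 'Zhang San' and 'Zhang Wei'.
SELECT name FROM #people WHERE name LIKE N'Zhang%';

-- Leading wildcard: matches can sit anywhere in the index order,
-- so every name has to be examined. Matches 'Li Zhang'.
SELECT name FROM #people WHERE name LIKE N'%Zhang';

DROP TABLE #people;
```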

2. OR causes a full table scan

name = 'Zhang San' AND price > 5000 conforms to the SARG form, while name = 'Zhang San' OR price > 5000 does not. Using OR causes a full table scan.

3. Negation operators and functions produce expressions that do not satisfy the SARG form

The most typical expressions that do not satisfy the SARG form are those containing negation operators, such as NOT, !=, <>, !<, !>, NOT EXISTS, NOT IN, and NOT LIKE, as well as those that apply a function to the column. Here are a few examples that do not satisfy the SARG form:

ABS(price) < 5000

name LIKE '%San'

Some expressions, such as:

WHERE price * 2 > 5000

are also treated as SARGs by SQL Server, which converts them to:

WHERE price > 5000 / 2

However, we do not recommend relying on this, because SQL Server cannot always guarantee that the conversion is exactly equivalent to the original expression.
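Rather than relying on an automatic conversion, the arithmetic can be moved to the constant side by hand so that the column stands alone. The sketch below assumes a made-up temporary table #products with an integer price column and an index named ix_products_price; none of these names come from the original article.

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE #products (
    id    INT IDENTITY(1,1) PRIMARY KEY,
    price INT NOT NULL
);
CREATE INDEX ix_products_price ON #products (price);

INSERT INTO #products (price) VALUES (1000), (2500), (2501), (6000);

-- Arithmetic applied to the column: the expression is evaluated per row.
SELECT id, price FROM #products WHERE price * 2 > 5000;

-- Same condition with the column left bare: price is compared
-- directly with a constant, which matches the SARG form.
SELECT id, price FROM #products WHERE price > 5000 / 2;

DROP TABLE #products;
```

Both statements return the rows priced 2501 and 6000 here. With integer columns the rewrite needs care, though: price * 2 >= 5001 is not the same as price >= 5001 / 2, because 5001 / 2 evaluates to 2500 in integer arithmetic and would wrongly admit a price of 2500.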

4. IN is equivalent to OR

The statements:

SELECT * FROM table1 WHERE tid IN (2, 3)

and

SELECT * FROM table1 WHERE tid = 2 OR tid = 3

are equivalent. Both cause a full table scan, and if there is an index on tid, it will not be used.

5. Use NOT as little as possible

6. EXISTS and IN have the same execution efficiency

Many sources claim that EXISTS is more efficient than IN, and that NOT EXISTS should be used in place of NOT IN wherever possible. In my own experiments, however, I found that the two perform the same, with or without NOT in front. Because subqueries are involved, the experiments use the pubs sample database that ships with SQL Server. Turn on SQL Server's STATISTICS IO setting before running them:

(1) SELECT title, price FROM titles WHERE title_id IN (SELECT title_id FROM sales WHERE qty > 30)

The result of this statement is:

Table 'sales'. Scan count 18, logical reads 56, physical reads 0, read-ahead reads 0.

Table 'titles'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

(2) SELECT title, price FROM titles WHERE EXISTS (SELECT * FROM sales WHERE sales.title_id = titles.title_id AND qty > 30)

The result of the second statement is:

Table 'sales'. Scan count 18, logical reads 56, physical reads 0, read-ahead reads 0.

Table 'titles'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

From this we can see that EXISTS and IN execute with the same efficiency.

7. LIKE with a leading wildcard % is as efficient as the CHARINDEX() function

As mentioned earlier, putting the wildcard % at the front of a LIKE pattern causes a full table scan, so it executes slowly. Some sources claim that replacing LIKE with the CHARINDEX() function gives a large speedup, but when I tried it, this claim also turned out to be wrong:

SELECT gid, title, fariqi, reader FROM Tgongwen WHERE CHARINDEX('forensic detachment', reader) > 0 AND fariqi > '2004-5-5'

Time: 7 seconds. Scan count 4, logical reads 7,155, physical reads 0, read-ahead reads 0.

SELECT gid, title, fariqi, reader FROM Tgongwen WHERE reader LIKE '%' + 'forensic detachment' + '%' AND fariqi > '2004-5-5'

Time: 7 seconds. Scan count 4, logical reads 7,155, physical reads 0, read-ahead reads 0.

8. UNION is not always more efficient than OR

As discussed above, using OR in the WHERE clause causes a full table scan, and most sources I have seen recommend using UNION instead of OR. This advice turns out to hold in most cases.

SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi = '2004-9-16' OR gid > 9990000

Time: 68 seconds. Scan count 1, logical reads 404,008, physical reads 283, read-ahead reads 392,163.

SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi = '2004-9-16'

UNION

SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE gid > 9990000

Time: 9 seconds. Scan count 8, logical reads 67,489, physical reads 216, read-ahead reads 7,499.

So in general UNION does appear to be more efficient than OR.

After further experiments, however, I found that when the column on both sides of the OR is the same, UNION is considerably slower than OR, even though UNION scans the index while OR scans the whole table.

SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi = '2004-9-16' OR fariqi = '2004-2-5'

Time: 6,423 milliseconds. Scan count 2, logical reads 14,726, physical reads 1, read-ahead reads 7,176.

SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi = '2004-9-16'

UNION

SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi = '2004-2-5'

Time: 11,640 milliseconds. Scan count 8, logical reads 14,806, physical reads 108, read-ahead reads 1,144.

9. Extract only the fields you need (the "take what you need" principle) and avoid SELECT *

Let's do an experiment:

SELECT TOP 10000 gid, fariqi, reader, title FROM Tgongwen ORDER BY gid DESC

Time: 4,673 milliseconds

SELECT TOP 10000 gid, fariqi, title FROM Tgongwen ORDER BY gid DESC

Time: 1,376 milliseconds

SELECT TOP 10000 gid, fariqi FROM Tgongwen ORDER BY gid DESC

Time: 80 milliseconds

As these results show, each field we leave out makes the extraction correspondingly faster. How much faster depends on the size of the field that is dropped.

10. COUNT(*) is not slower than COUNT(field)

Some sources say that using * makes the count cover all columns, which is obviously less efficient than naming a single column. This claim is in fact unfounded. Let's see:

SELECT COUNT(*) FROM Tgongwen

Time: 1,500 milliseconds

SELECT COUNT(gid) FROM Tgongwen

Time: 1,483 milliseconds

SELECT COUNT(fariqi) FROM Tgongwen

Time: 3,140 milliseconds

SELECT COUNT(title) FROM Tgongwen

Time: 52,050 milliseconds

As the results show, COUNT(*) is about as fast as COUNT(primary key), and COUNT(*) is faster than counting any field other than the primary key; the longer the field, the slower the count. My guess is that with COUNT(*), SQL Server may automatically pick the smallest field to count. Of course, writing COUNT(primary key) directly is more straightforward.
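Apart from speed, the forms of COUNT compared above are not interchangeable in general: COUNT(*) counts rows, while COUNT(column) counts only the rows where that column is not NULL. A minimal sketch, using a made-up temporary table #docs that is not part of the author's Tgongwen data:

```sql
-- Hypothetical table with a nullable title column, for illustration only.
CREATE TABLE #docs (
    gid   INT IDENTITY(1,1) PRIMARY KEY,
    title NVARCHAR(100) NULL
);

INSERT INTO #docs (title) VALUES (N'first'), (NULL), (N'third');

SELECT COUNT(*)     AS all_rows,        -- counts every row: 3
       COUNT(gid)   AS non_null_gids,   -- gid is never NULL: 3
       COUNT(title) AS non_null_titles  -- skips the NULL title: 2
FROM #docs;

DROP TABLE #docs;
```

COUNT(primary key) therefore matches COUNT(*) only because a primary key column cannot contain NULL.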

11. ORDER BY is most efficient on the clustered index column

Let's see (gid is the primary key, fariqi is the clustered index column):

SELECT TOP 10000 gid, fariqi, reader, title FROM Tgongwen

Time: 196 milliseconds. Scan count 1, logical reads 289, physical reads 1, read-ahead reads 1,527.

SELECT TOP 10000 gid, fariqi, reader, title FROM Tgongwen ORDER BY gid ASC

Time: 4,720 milliseconds. Scan count 1, logical reads 41,956, physical reads 0, read-ahead reads 1,287.

SELECT TOP 10000 gid, fariqi, reader, title FROM Tgongwen ORDER BY gid DESC

Time: 4,736 milliseconds. Scan count 1, logical reads 55,350, physical reads 10, read-ahead reads 775.

SELECT TOP 10000 gid, fariqi, reader, title FROM Tgongwen ORDER BY fariqi ASC

Time: 173 milliseconds. Scan count 1, logical reads 290, physical reads 0, read-ahead reads 0.

SELECT TOP 10000 gid, fariqi, reader, title FROM Tgongwen ORDER BY fariqi DESC

Time: 156 milliseconds. Scan count 1, logical reads 289, physical reads 0, read-ahead reads 0.

As we can see, the unordered query and the "ORDER BY clustered index column" queries are equivalent in speed and in number of logical reads, and both are much faster than the "ORDER BY nonclustered index column" queries.

Also, when sorting by a given field, ascending and descending order take basically the same time.

12. Efficient TOP

In fact, when querying and extracting very large data sets, the biggest factor affecting database response time is not the data lookup but physical I/O. For example:

SELECT TOP 10 * FROM (

SELECT TOP 10000 gid, fariqi, title FROM Tgongwen

WHERE neibuyonghu = 'office'

ORDER BY gid DESC) AS a

ORDER BY gid ASC

In theory, the whole statement should take longer to execute than its subquery alone, but the opposite is true. The subquery on its own returns 10,000 records, while the whole statement returns only 10, and the biggest factor affecting response time is physical I/O. One of the most effective ways to limit physical I/O here is the TOP keyword. TOP is a system-optimized keyword in SQL Server that extracts the first N rows or the first N percent of the rows. In my practical experience, TOP is very useful and very efficient. Oracle, another major database, has no such keyword, which is something of a pity, although the same result can be achieved there by other means (such as ROWNUM). A later discussion of a paging stored procedure for tens of millions of records will also use the TOP keyword.
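The nested TOP pattern above can be tried on a small scale to see which rows it selects. This is a sketch under assumptions, not the author's paging procedure: the temporary table #docs and its 100 generated rows are invented for illustration.

```sql
-- Hypothetical table standing in for Tgongwen.
CREATE TABLE #docs (
    gid   INT NOT NULL PRIMARY KEY,
    title NVARCHAR(100) NOT NULL
);

-- 100 rows with gid 1..100.
INSERT INTO #docs (gid, title)
SELECT n, N'title ' + CAST(n AS NVARCHAR(10))
FROM (
    SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects
) AS numbers;

-- Inner query: the 20 newest rows (gid 100 down to 81).
-- Outer query: the 10 lowest of those, that is gid 81 to 90.
SELECT TOP (10) *
FROM (
    SELECT TOP (20) gid, title
    FROM #docs
    ORDER BY gid DESC
) AS a
ORDER BY gid ASC;

DROP TABLE #docs;
```

Only the 10 rows of the outer query are sent to the client, which is the point the paragraph above makes about limiting the data returned.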

So far, we have discussed how to quickly query the data you need from a large database. These are all "soft" methods, of course; in practice we must also consider various "hard" factors, such as network performance, server performance, operating system performance, and even network cards and switches.
