SQL statement optimization and efficiency in SQL Server

Source: Internet
Author: User

Many people don't know how SQL statements are actually executed by SQL Server, and they worry that the statements they write will be misunderstood by the engine. For example:

select * from table1 where name = 'zhangsan' and tid > 10000

and:

select * from table1 where tid > 10000 and name = 'zhangsan'

Some people wonder whether these two statements execute with the same efficiency, because, read literally, they are indeed different: if tid is a clustered index column, the second statement simply reads the rows after record 10,000, while the first statement would first scan the whole table for rows with name = 'zhangsan' and then filter the result with the condition tid > 10000.
In fact, such worries are unnecessary. SQL Server has a query optimizer that evaluates the search conditions in the WHERE clause and decides which indexes can narrow the search space of a table scan; in other words, the statement is optimized automatically.
Although the query optimizer can optimize queries automatically based on the WHERE clause, it is still necessary to understand how it works; otherwise, the optimizer may not produce the fast query you intended.
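If you want to see what choice the optimizer actually made, you can ask SQL Server for the estimated plan instead of executing the statement. A minimal sketch, using the example table above (SET SHOWPLAN_TEXT must be the only statement in its batch):

set showplan_text on
go
select * from table1 where tid > 10000 and name = 'zhangsan'
go
set showplan_text off
go

The plan output shows whether an index seek is used or the whole table (clustered index) is scanned.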
During the query analysis phase, the query optimizer looks at each stage of the query and decides whether it can limit the amount of data that needs to be scanned. If a stage can be used as a search argument (SARG), it is said to be optimizable, and an index can then be used to quickly obtain the required data.
Definition of SARG: an operation used to limit a search, usually an exact match, a match against a range of values, or two or more such conditions joined by AND. It takes the form:

column name operator <constant or variable>  or  <constant or variable> operator column name

The column name appears on one side of the operator, while the constant or variable appears on the other side. For example:

name = 'Zhang San'
price > 5000
5000 < price
name = 'Zhang San' and price > 5000

If an expression does not have the SARG form, it cannot limit the scope of the search, which means SQL Server must test every row to decide whether it satisfies all the conditions in the WHERE clause. So an index is useless for an expression that is not in SARG form.
Having introduced SARGs, let us summarize practical experience in using them and examine some conclusions found in other materials:

1. Whether a LIKE predicate is a SARG depends on the type of wildcard used

name like 'Zhang%'    -- this is a SARG
name like '%Zhang'    -- this is not a SARG

The reason is that a leading wildcard % prevents the index from being used.
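To make the contrast concrete, here are the two forms as complete statements; the table and index are illustrative, assuming an index exists on name:

select * from table1 where name like 'Zhang%'   -- SARG: the index on name can be used for a range seek
select * from table1 where name like '%Zhang'   -- not a SARG: the leading % forces every row to be examined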

2. OR causes a full table scan
name = 'Zhang San' and price > 5000 conforms to the SARG form, while name = 'Zhang San' or price > 5000 does not. Using OR causes a full table scan.
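Written out in full, the two cases look like this (illustrative table, assuming indexes exist on name and price):

select * from table1 where name = 'Zhang San' and price > 5000   -- sargable: an index can narrow the search
select * from table1 where name = 'Zhang San' or price > 5000    -- the OR defeats the SARG and leads to a full table scan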

3. Statements containing negation operators or functions do not satisfy the SARG form
The most typical statements that do not satisfy the SARG form are those containing negation operators, such as NOT, !=, <>, !<, !>, NOT EXISTS, NOT IN and NOT LIKE, as well as those that apply a function to a column. Here are a few examples that do not satisfy the SARG form:

ABS(price) < 5000
name like '%san'
-- Some expressions, such as:
where price * 2 > 5000
-- are still treated by SQL Server as SARGs; SQL Server converts this form into:
where price > 2500 / 2

However, we do not recommend relying on this, because SQL Server does not guarantee that the conversion is completely equivalent to the original expression.
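For that reason it is usually safer to rewrite such predicates by hand so the column stands alone on one side of the operator. A sketch (table and column names are illustrative):

-- non-sargable: the function hides the column from the index
select * from table1 where abs(price) < 5000
-- hand-written sargable equivalent
select * from table1 where price > -5000 and price < 5000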

4. IN is equivalent to OR

The statements:

select * from table1 where tid in (2, 3)
-- and
select * from table1 where tid = 2 or tid = 3

are the same: both cause a full table scan, and if there is an index on tid, the index will not be used.

5. Use NOT as little as possible

6. EXISTS and IN have the same execution efficiency
Much material claims that EXISTS is more efficient than IN, and that NOT EXISTS should be used instead of NOT IN whenever possible. In fact, my experiments show that the two have the same execution efficiency, with or without NOT. Because a subquery is involved, we experiment with the pubs sample database that ships with SQL Server, turning on SQL Server's statistics I/O display before running:
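For reference, the statistics option mentioned above is turned on like this; it is standard T-SQL and reports scan counts and reads for every statement in the session:

set statistics io on
-- optionally, also report parse/compile and execution times
set statistics time on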

(1)

select title from titles where title_id in (select title_id from sales where qty > 30)

The result of this statement is:

Table 'sales'. Scan count 18, logical reads 56, physical reads 0, read-ahead reads 0.
Table 'titles'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

(2)

select title from titles where exists (select * from sales where sales.title_id = titles.title_id and qty > 30)

The result of the second statement is:

Table 'sales'. Scan count 18, logical reads 56, physical reads 0, read-ahead reads 0.
Table 'titles'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

From this we can see that EXISTS and IN have the same execution efficiency.

7. LIKE with a leading wildcard % and the function charindex() have the same execution efficiency
Earlier we noted that putting a wildcard in front of a LIKE pattern causes a full table scan, so its execution is inefficient. However, some material claims that replacing LIKE with the function charindex() gives a large speed increase. My experiments show that this claim is also wrong:

select gid, title, fariqi from Tgongwen where charindex('forensic detachment', reader) > 0 and fariqi > '2004-5-5'

Elapsed time: 7 seconds. Scan count 4, logical reads 7,155, physical reads 0, read-ahead reads 0.

select gid, title, fariqi from Tgongwen where reader like '%forensic detachment%' and fariqi > '2004-5-5'

Elapsed time: 7 seconds. Scan count 4, logical reads 7,155, physical reads 0, read-ahead reads 0.

8. UNION is not always more efficient than OR
We have already said that using OR in the WHERE clause causes a full table scan, and most material I have seen recommends using UNION instead of OR. It turns out that this advice holds in most, but not all, cases.

select gid, fariqi, neibuyonghu, reader from Tgongwen where fariqi = '2004-9-16' or gid > 9990000

Elapsed time: 68 seconds. Scan count 1, logical reads 404,008, physical reads 283, read-ahead reads 392,163.

select gid, fariqi, neibuyonghu, reader from Tgongwen where fariqi = '2004-9-16'
union
select gid, fariqi, neibuyonghu, reader from Tgongwen where gid > 9990000

Elapsed time: 9 seconds. Scan count 8, logical reads 67,489, physical reads 216, read-ahead reads 7,499.

It seems that, in general, UNION is indeed more efficient than OR.

However, further experiments show that if the column queried on both sides of the OR is the same, UNION executes much more slowly than the equivalent OR, even though the UNION query scans the index while the OR query scans the whole table.

select gid, fariqi, neibuyonghu, reader from Tgongwen where fariqi = '2004-9-16' or fariqi = '2004-2-5'

Elapsed time: 6,423 milliseconds. Scan count 2, logical reads 14,726, physical reads 1, read-ahead reads 7,176.

select gid, fariqi, neibuyonghu, reader from Tgongwen where fariqi = '2004-9-16'
union
select gid, fariqi, neibuyonghu, reader from Tgongwen where fariqi = '2004-2-5'

Elapsed time: 11,640 milliseconds. Scan count 8, logical reads 14,806, physical reads 108, read-ahead reads 1,144.

9. When selecting fields, follow the principle of "take only what you need" and avoid SELECT *
Let's do an experiment:

select top 10000 gid, fariqi, reader from Tgongwen order by gid desc

Elapsed time: 4,673 milliseconds

select top 10000 gid, fariqi from Tgongwen order by gid desc

Elapsed time: 1,376 milliseconds

select top 10000 gid from Tgongwen order by gid desc

Elapsed time: 80 milliseconds

Clearly, every field we drop improves the extraction speed correspondingly; how much it improves depends on the size of the field you drop.
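In other words, list exactly the columns the caller needs rather than writing SELECT *. A sketch using the same table:

select gid, fariqi from Tgongwen      -- returns only what is needed
-- rather than
select * from Tgongwen                -- drags every column through disk I/O and the network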

10. COUNT(*) is no slower than COUNT(field)
Some material says that using * counts all columns and is therefore obviously less efficient than counting a single column. This claim is in fact unfounded. Let's see:

select count(*) from Tgongwen

Elapsed time: 1,500 milliseconds

select count(gid) from Tgongwen

Elapsed time: 1,483 milliseconds

select count(fariqi) from Tgongwen

Elapsed time: 3,140 milliseconds

select count(title) from Tgongwen

Elapsed time: 52,050 milliseconds

As can be seen from the above, the speed of COUNT(*) and COUNT(primary key) is essentially equivalent, while COUNT(*) is faster than counting any field other than the primary key, and the longer the field, the slower the count. I suspect that with COUNT(*), SQL Server automatically picks the smallest field to count on. Of course, writing COUNT(primary key) directly is even more explicit.
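One caveat worth adding: COUNT(*) and COUNT(column) are only interchangeable when the column cannot be NULL, because COUNT(column) skips NULL values while COUNT(*) counts every row. A sketch, assuming title is nullable:

select count(*) from Tgongwen        -- counts all rows
select count(title) from Tgongwen    -- counts only rows where title is not null, so it may return a smaller number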

11. ORDER BY on the clustered index column is the most efficient way to sort
Let's see (gid is the primary key, fariqi is the clustered index column):

select top 10000 gid, fariqi, reader from Tgongwen

Elapsed time: 196 milliseconds. Scan count 1, logical reads 289, physical reads 1, read-ahead reads 1,527.

select top 10000 gid, fariqi, reader from Tgongwen order by gid asc

Elapsed time: 4,720 milliseconds. Scan count 1, logical reads 41,956, physical reads 0, read-ahead reads 1,287.

select top 10000 gid, fariqi, reader from Tgongwen order by gid desc

Elapsed time: 4,736 milliseconds. Scan count 1, logical reads 55,350, physical reads 10, read-ahead reads 775.

select top 10000 gid, fariqi, reader from Tgongwen order by fariqi asc

Elapsed time: 173 milliseconds. Scan count 1, logical reads 290, physical reads 0, read-ahead reads 0.

select top 10000 gid, fariqi, reader from Tgongwen order by fariqi desc

Elapsed time: 156 milliseconds. Scan count 1, logical reads 289, physical reads 0, read-ahead reads 0.

As we can see from the above, the speed and number of logical reads of the unsorted query are equivalent to those of "ORDER BY clustered index column", and both are much faster than "ORDER BY non-clustered index column".
At the same time, when sorting by a given field, ascending and descending order are basically equivalent in speed.
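For context, an index layout matching the description above ("gid is the primary key, fariqi is the clustered index column") could be declared like this; the article does not show its DDL, so the statements below are only an illustrative sketch:

create table Tgongwen (
    gid int not null,
    fariqi datetime,
    neibuyonghu varchar(50),
    reader varchar(900),
    title varchar(80),
    constraint pk_tgongwen primary key nonclustered (gid)
)
create clustered index ix_tgongwen_fariqi on Tgongwen (fariqi)

With such a layout, ORDER BY fariqi simply reads the clustered index in order, while ORDER BY gid has to sort the rows or follow the nonclustered index and look each row up.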

12. Efficient use of TOP
In fact, when querying and extracting from very large data sets, the biggest factor affecting database response time is not the data lookup but the physical I/O. For example:

select top 10 * from (select top 10000 gid, fariqi from Tgongwen where neibuyonghu = 'office' order by gid desc) as a order by gid asc

In theory, the execution time of the whole statement should be longer than that of the inner subquery, but the opposite is true, because the subquery returns 10,000 records while the whole statement returns only 10 rows, and the most important factor affecting the database response time is physical I/O. One of the most effective ways to limit physical I/O here is to use the TOP keyword. TOP is a system-optimized keyword in SQL Server for extracting the first N rows or the first N percent of rows. In my practical experience, TOP has proved very useful and very efficient. This keyword does not exist in another large database, Oracle, which is a pity, although it can be worked around there by other means (for example, rownumber). In a later discussion on a paged display stored procedure for tens of millions of rows, we will use the TOP keyword.
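As a preview of that technique, here is a minimal TOP-based paging sketch in the spirit of the article (page size 10, fetching the third page, ordered by gid; no stored procedure wrapper, so names and page numbers are illustrative):

select top 10 gid, fariqi, title
from Tgongwen
where gid > (select max(gid) from (select top 20 gid from Tgongwen order by gid) as prev)
order by gid

The inner query skips the first two pages, and the outer TOP 10 returns the next page, so only a small number of rows ever need to be read.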
So far, we have discussed how to quickly query the data you need from a large database. Of course, the methods introduced here are all "soft" methods; in practice we also have to consider various "hard" factors, such as network performance, server performance, operating system performance, and even network cards, switches, and so on.
