SQL SQL statement optimization and efficiency issues in SQL Server

SQL SQL statement optimization and efficiency issues in SQL Server _mssql

Last Update:2017-01-19 Source: Internet

Author: User

Tags constant

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Many people do not know how SQL statements are executed in SQL Server, fearing that their written SQL statements will be misunderstood by SQL Server. Like what:

SELECT * FROM table1 where name= ' Zhangsan ' and TID > 10000

and implementation:

SELECT * FROM table1 where TID > 10000 and name= ' Zhangsan '

Some people do not know the execution efficiency of the above two statements is the same, because if it is simple to read from the statement, the two statements are indeed different, if the TID is an aggregation index, then the following sentence only from the table after 10,000 records to find the line , and the previous sentence to look up from the whole table to see a few name= ' Zhangsan ', and then based on the constraints of the conditions tid>10000 to provide query results.
In fact, such fears are unnecessary. There is a query analysis optimizer in SQL Server that calculates the search criteria in the WHERE clause and determines which index reduces the search space for the table scan, that is, it enables automatic optimization.
Although the query optimizer can automatically query optimization based on the WHERE clause, it is still necessary to understand how the query optimizer works, and if not, sometimes the query optimizer does not follow your intent to make a quick query.
During the query analysis phase, the query optimizer looks at each stage of the query and determines whether it is useful to limit the amount of data that needs to be scanned. If a phase can be used as a scan parameter (SARG), it is called an optimization and can be used to quickly obtain the required data.
Sarg definition: An operation that restricts the search because it usually refers to a specific match, a match that is worth the range, or a connection of more than two conditions. The form is as follows:

Column name operator < constant or variable >

< constant or variable > action column name names can appear on one side of the operator, while constants or variables appear on the other side of the operator. Such as:

Name= ' John '

Price >5000

5000< Price

Name= ' John ' and Price >5000

If an expression does not satisfy the Sarg form, it cannot limit the scope of the search, that is, SQL Server must determine for each row whether it satisfies all the conditions in the WHERE clause. So an index is useless for expressions that do not satisfy the Sarg form.

After introducing the SARG, let's summarize the experience of using SARG and the different conclusions encountered in practice and some of the data:

1. Whether the like statement belongs to Sarg depends on the type of wildcard character used

--such as:
Name like ' Zhang% '--this belongs to Sarg

--and:
The name like '% Zhang '--, does not belong to Sarg.

The reason is that the wildcard% is not available for indexing when the string is opened.

2, or will cause full table scan
name= ' John ' and price >5000 symbol Sarg, and: Name= ' John ' or price >5000 does not conform to SARG. Using or can cause a full table scan.

3, non-operator, function caused by not satisfied with the Sarg form of statements
statements that do not satisfy the Sarg form are typically those that include non-operator statements, such as not,!=, <>,!<,!>, not EXISTS, no in, not, and so on, as well as functions. Here are a few examples that do not satisfy the Sarg form:

ABS (Price) <5000

Name like '% three '

--some expressions, such as:

WHERE Price *2>5000

The--sql server will also assume that Sarg,sql server converts this into:
WHERE Price >2500/2

This is not recommended, however, because sometimes SQL Server does not guarantee that the conversion is completely equivalent to the original expression.

4, in the effect is quite with or

Statement:

Select * FROM table1 where tid in (2,3)

--and

Select * FROM table1 where tid=2 or tid=3

Is the same, it will cause a full table scan, and its index will be invalidated if there is an index on the TID.

5, as little as possible with not

6, exists and in the implementation of the same efficiency is the same
Many of the data show that exists is more efficient than in, and should be used not to exists as much as possible instead of in. But in fact, I experimented with it and found that neither the front band nor not, the execution efficiency of the two is the same. Because of the subquery involved, we experimented with the pubs database with SQL Server. Before running, we can open the statistics I/O State of SQL Server:

(1)
Select Title,price from titles where title_id into (select title_id from sales where qty>30)

The result of the sentence is:

Table ' Sales '. Scan Count 18, logical read 56 times, physics read 0 times, pre-read 0 times.
Table ' titles '. Scan count 1, logical read 2 times, physics read 0 times, pre-read 0 times.

(2)
Select Title,price from titles
where exists (SELECT * FROM Sales
where sales.title_id=titles.title_id and qty>30)

The results of the second sentence are:

Table ' Sales '. Scan Count 18, logical read 56 times, physics read 0 times, pre-read 0 times.
Table ' titles '. Scan count 1, logical read 2 times, physics read 0 times, pre-read 0 times.

From this we can see that the execution efficiency of using exists and in is the same.

7, with the function charindex () and the preceding plus wildcard character% of like execution efficiency
Earlier, we talked about the fact that if you add a wildcard character to the like, it will cause a full table scan, so its execution efficiency is low. But some of the information said that the use of function charindex () to replace like speed will have a big upgrade, after my trial, found that this explanation is also wrong:

Select Gid,title,fariqi,reader from Tgongwen
where CHARINDEX (' Criminal Investigation Detachment ', reader) >0 and fariqi> ' 2004-5-5 '

Spents: 7 Seconds, in addition: Scan count 4, logical read 7,155 times, physics read 0 times, pre-read 0 times.

Select Gid,title,fariqi,reader from Tgongwen
Where reader like '% ' + ' criminal Investigation detachment ' + '% ' and fariqi> ' 2004-5-5 '

Spents: 7 Seconds, in addition: Scan count 4, logical read 7,155 times, physics read 0 times, pre-read 0 times.

8, Union is not absolutely more efficient than the execution of or
we've talked about the use of or in a WHERE clause that causes a full table scan, and, generally, the information I've seen is a recommendation to use Union instead of or. As it turns out, this is true for most of these claims.

Select Gid,fariqi,neibuyonghu,reader,title from Tgongwen
where fariqi= ' 2004-9-16 ' or gid>9990000

Spents: 68 seconds. Scan count 1, logical read 404,008 times, physics read 283 times, pre-read 392,163 times.

Select Gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi= ' 2004-9-16 '
Union
Select Gid,fariqi,neibuyonghu,reader,title from Tgongwen where gid>9990000

Spents: 9 seconds. Scan Count 8, logical read 67,489 times, physics read 216 times, pre-read 7,499 times.

It seems that using Union is much more efficient than using or in general.

But after the experiment, the author found that if or both of the query column is the same, then the Union and the execution speed with or is much worse, although here is the Union Scan index, and or scan is the whole table.

Select Gid,fariqi,neibuyonghu,reader,title from Tgongwen
where fariqi= ' 2004-9-16 ' or fariqi= ' 2004-2-5 '

Spents: 6423 Ms. Scan count 2, logical read 14,726 times, physics read 1 times, pre-read 7,176 times.

Select Gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi= ' 2004-9-16 '
Union
Select Gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi= ' 2004-2-5 '

Spents: 11640 Ms. Scan Count 8, logical read 14,806 times, Physics read 108 times, pre-read 1144 times.

9, the field extraction according to "How much, how much" principle, avoid "select *"

Let's do a test:

Select top 10000 gid,fariqi,reader,title to Tgongwen ORDER by gid Desc
Spents: 4673 ms
Select top 10000 gid,fariqi,title to Tgongwen ORDER by gid Desc
Spents: 1376 ms
Select top 10000 Gid,fariqi to Tgongwen ORDER by gid Desc
Spents: 80 ms

From this, we have to extract a field each less, the data extraction speed will have a corresponding upgrade. The speed of ascension depends on the size of the field you have discarded.

10, COUNT (*) is not slower than count (field)
Some sources say that it is obviously less efficient than the list of names in the world to use the * to count all the columns. Such a statement is in fact unfounded. We look at:

Select COUNT (*) from Tgongwen
Spents: 1500 ms

Select COUNT (GID) from Tgongwen
Spents: 1483 ms

Select COUNT (Fariqi) from Tgongwen
Spents: 3140 ms

Select COUNT (title) from Tgongwen
Spents: 52050 ms

As you can see from the above, if you use COUNT (*) and the speed of count (primary key) are equivalent, and count (*) is faster than any other field except the primary key, and the longer the field, the slower the rollup. I think that if you use COUNT (*), SQL Server may automatically find the smallest field to summarize. Of course, if you write the count (primary key) directly, it will come more directly.

11. Ordered by clustered indexed columns the most efficient
Let's see: (GID is a primary key, Fariqi is an aggregated indexed column):

Select top 10000 gid,fariqi,reader,title from Tgongwen

Spents: 196 Ms. Scan count 1, logical read 289 times, physics read 1 times, pre-read 1527 times.

Select top 10000 gid,fariqi,reader,title to Tgongwen ORDER by GID ASC

Spents: 4720 Ms. Scan count 1, logical read 41,956 times, physics read 0 times, pre-read 1287 times.

Select top 10000 gid,fariqi,reader,title to Tgongwen ORDER by gid Desc
Spents: 4736 Ms. Scan count 1, logical read 55,350 times, physics read 10 times, pre-read 775 times.

Select top 10000 gid,fariqi,reader,title to Tgongwen ORDER by Fariqi ASC
Spents: 173 Ms. Scan count 1, logical read 290 times, physics read 0 times, pre-read 0 times.

Select top 10000 gid,fariqi,reader,title from Tgongwen ORDER BY Fariqi Desc
Spents: 156 Ms. Scan count 1, logical read 289 times, physics read 0 times, pre-read 0 times.

As we can see from the above, the speed of the unordered and the number of logical reads are equivalent to the speed of the "Order by clustered index column", but these are much faster than the "ORDER by nonclustered index column" query speed.
At the same time, according to a certain field to sort, whether the positive sequence or reverse, the speed is basically equivalent.

12, high efficiency of top
in fact, when querying and extracting super large volume dataset, the biggest factor that affects the database response time is not the data lookup, but the physical i/0 operation. Such as:

Select Top * FROM (
Select top 10000 gid,fariqi,title from Tgongwen
where neibuyonghu= ' office '
ORDER BY gid Desc) as a
ORDER BY GID ASC

This statement, theoretically, the execution time of the entire statement should be longer than the execution time of the clause, but the opposite is true. Because a clause returns 10,000 records after execution, and the entire statement returns only 10 statements, the most important factor affecting database response time is physical I/O operations. One of the most effective ways to limit physical I/O operations here is to use top keywords. The top keyword is a system-optimized word used in SQL Server to extract the first or previous percentages of data. The application of the author in practice, found that top is really good and efficient. But this word is not in another large database Oracle, which cannot be said to be a pity, although it can be solved in Oracle in other ways, such as: RowNumber. We'll use the top keyword in a future discussion of "Implementing TENS data paging display stored procedures."

So far, we've discussed how to quickly query the data methods you need from a large-capacity database. Of course, we introduce these methods are "soft" method, in practice, we also have to consider a variety of "hard" factors, such as: Network performance, server performance, operating system performance, even network adapters, switches and so on.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More