Subquery Optimization
A good rule of thumb is to try to replace subqueries with joins. The optimizer can sometimes "flatten" subqueries automatically, replacing them with regular or outer joins, but it cannot always do so. Explicit joins give the optimizer more options for choosing the table access order and finding the best plan. When you tune a specific query, it is well worth finding out whether removing a subquery makes a difference.
Example
The following queries return the name of each store in the pubs database together with the total number of books it has sold. If a store has no sales, its name is still listed and the Books Sold column shows 0. The two queries return the same result set, but the first uses a subquery and the second uses a join. Compare the query plans generated by Microsoft SQL Server.
SUBQUERY SOLUTION
----------------------
SELECT st.stor_name AS 'Store',
       ISNULL((SELECT SUM(bs.qty)
               FROM big_sales AS bs
               WHERE bs.stor_id = st.stor_id), 0)
       AS 'Books Sold'
FROM stores AS st
WHERE st.stor_id IN
      (SELECT DISTINCT stor_id
       FROM big_sales)
JOIN SOLUTION
----------------------
SELECT st.stor_name AS 'Store',
       SUM(bs.qty) AS 'Books Sold'
FROM stores AS st
JOIN big_sales AS bs
  ON bs.stor_id = st.stor_id
WHERE st.stor_id IN
      (SELECT DISTINCT stor_id
       FROM big_sales)
GROUP BY st.stor_name
SUBQUERY SOLUTION
----------------------
SQL Server parse and compile time:
   CPU time = 28 ms, elapsed time = 28 ms.
SQL Server Execution Times:
   CPU time = 145 ms, elapsed time = 145 ms.
Table 'big_sales'. Scan count 14, logical reads 1884,
physical reads 0, read-ahead reads 0.
Table 'stores'. Scan count 12, logical reads 24,
physical reads 0, read-ahead reads 0.

JOIN SOLUTION
----------------------
SQL Server parse and compile time:
   CPU time = 50 ms, elapsed time = 54 ms.
SQL Server Execution Times:
   CPU time = 109 ms, elapsed time = 109 ms.
Table 'big_sales'. Scan count 14, logical reads 966,
physical reads 0, read-ahead reads 0.
Table 'stores'. Scan count 12, logical reads 24,
physical reads 0, read-ahead reads 0.
Even without digging deeper, we can see that the join is faster in both CPU time and total elapsed time, and it needs only about half the logical reads of the subquery solution. Both queries return the same result set, although the sort order differs: the join query has an implicit order because of its GROUP BY clause.
Store                                      Books Sold
-----------------------------------------------------
Barnum's                                       154125
Bookbeat                                       518080
Doc-U-Mat: Quality Laundry and Books           581130
Eric the Read Books                             76931
Fricative Bookshop                             259060
News & Brews                                   161090

(6 row(s) affected)

Store                                      Books Sold
-----------------------------------------------------
Eric the Read Books                             76931
Barnum's                                       154125
News & Brews                                   161090
Doc-U-Mat: Quality Laundry and Books           581130
Fricative Bookshop                             259060
Bookbeat                                       518080

(6 row(s) affected)
Here is the query plan for the subquery solution:
|--Compute Scalar (DEFINE:([Expr1006]=isnull([Expr1004], 0)))
     |--Nested Loops (Left Outer Join, OUTER REFERENCES:([st].[stor_id]))
          |--Nested Loops (Inner Join, OUTER REFERENCES:([big_sales].[stor_id]))
          |    |--Stream Aggregate (GROUP BY:([big_sales].[stor_id]))
          |    |    |--Clustered Index Scan (OBJECT:([pubs].[dbo].[big_sales].
          |    |         [UPKCL_big_sales]), ORDERED FORWARD)
          |    |--Clustered Index Seek (OBJECT:([pubs].[dbo].[stores].[UPK_storeid] AS [st]),
          |         SEEK:([st].[stor_id]=[big_sales].[stor_id]) ORDERED FORWARD)
          |--Stream Aggregate (DEFINE:([Expr1004]=SUM([bs].[qty])))
               |--Clustered Index Seek (OBJECT:([pubs].[dbo].[big_sales].
                    [UPKCL_big_sales] AS [bs]),
                    SEEK:([bs].[stor_id]=[st].[stor_id]) ORDERED FORWARD)
The join solution, on the other hand, produces the following plan:
|--Stream Aggregate (GROUP BY:([st].[stor_name])
     DEFINE:([Expr1004]=SUM([partialagg1005])))
     |--Sort (ORDER BY:([st].[stor_name] ASC))
          |--Nested Loops (Left Semi Join, OUTER REFERENCES:([st].[stor_id]))
               |--Nested Loops (Inner Join, OUTER REFERENCES:([bs].[stor_id]))
               |    |--Stream Aggregate (GROUP BY:([bs].[stor_id])
               |    |     DEFINE:([partialagg1005]=SUM([bs].[qty])))
               |    |    |--Clustered Index Scan (OBJECT:([pubs].[dbo].[big_sales].
               |    |         [UPKCL_big_sales] AS [bs]), ORDERED FORWARD)
               |    |--Clustered Index Seek (OBJECT:([pubs].[dbo].[stores].
               |         [UPK_storeid] AS [st]),
               |         SEEK:([st].[stor_id]=[bs].[stor_id]) ORDERED FORWARD)
               |--Clustered Index Seek (OBJECT:([pubs].[dbo].[big_sales].
                    [UPKCL_big_sales]),
                    SEEK:([big_sales].[stor_id]=[st].[stor_id]) ORDERED FORWARD)
The join is the more efficient solution: it does not require the extra Stream Aggregate step that the subquery version needs in order to sum the big_sales.qty values for each store.
UNION vs UNION ALL
Whenever possible, use UNION ALL instead of UNION. The difference is that UNION eliminates duplicate rows and sorts the result: it must build a temporary worktable, collect all rows into it, and sort them before producing output. (The query plan for a SELECT DISTINCT query shows the same thing: a stream aggregate that can consume more than 30% of the resources needed to process the query.) If you genuinely need duplicates removed, use UNION. But if you know, or can safely assume, that the result set contains no duplicate rows, use UNION ALL. It simply selects from one table or join, then selects from the next and appends those rows to the bottom of the first result set. UNION ALL requires no worktable and no sort (unless one is required by other clauses). In most cases it is the more efficient choice. A further danger of UNION is that it can flood tempdb with huge worktables; this may happen if a UNION query returns a large result set.
Example
The following queries return the IDs of all stores in the sales table of the pubs database, together with the IDs of all stores in the big_sales table, to which more than 70,000 rows have been added. The two solutions differ only in the use of UNION versus UNION ALL, yet that one keyword changes the plan substantially. The first solution requires stream aggregation and a sort before the result set is returned to the client. The second query is more efficient, especially for large tables. In this example, both queries return the same result set, although in a different order. Your results may differ slightly.
UNION SOLUTION
-----------------------
SELECT stor_id FROM big_sales
UNION
SELECT stor_id FROM sales

UNION ALL SOLUTION
-----------------------
SELECT stor_id FROM big_sales
UNION ALL
SELECT stor_id FROM sales
UNION SOLUTION
-----------------------
|--Merge Join (Union)
     |--Stream Aggregate (GROUP BY:([big_sales].[stor_id]))
     |    |--Clustered Index Scan (OBJECT:([pubs].[dbo].[big_sales].
     |         [UPKCL_big_sales]), ORDERED FORWARD)
     |--Stream Aggregate (GROUP BY:([sales].[stor_id]))
          |--Clustered Index Scan (OBJECT:([pubs].[dbo].[sales].
               [UPKCL_sales]), ORDERED FORWARD)

UNION ALL SOLUTION
-----------------------
|--Concatenation
     |--Index Scan (OBJECT:([pubs].[dbo].[big_sales].[ndx_sales_ttlID]))
     |--Index Scan (OBJECT:([pubs].[dbo].[sales].[titleidind]))
UNION SOLUTION
-----------------------
Table 'sales'. Scan count 1, logical reads 2,
physical reads 0, read-ahead reads 0.
Table 'big_sales'. Scan count 1, logical reads 463,
physical reads 0, read-ahead reads 0.

UNION ALL SOLUTION
-----------------------
Table 'sales'. Scan count 1, logical reads 1,
physical reads 0, read-ahead reads 0.
Table 'big_sales'. Scan count 1, logical reads 224,
physical reads 0, read-ahead reads 0.
Although the result sets in this example are interchangeable, you can see that the UNION ALL statement consumes about half the resources of the UNION statement. Use UNION ALL whenever you can be sure the result set contains no duplicates.
Functions and Expressions That Suppress Indexes
When you apply a built-in function or expression to an indexed column in a search condition, the optimizer cannot use that column's index. Rewrite such conditions whenever possible so that the indexed column stands alone on one side of the comparison.
Example
Help SQL Server by removing any expression wrapped around an indexed column. The following query selects a row from the jobs table by the unique key value of its unique clustered index. As long as the column is buried in an expression, that index cannot be used. But once you rewrite the condition '(job_id - 2) = 0' as 'job_id = 2', the optimizer performs a seek on the clustered index.
QUERY WITH SUPPRESSED INDEX
-----------------------
SELECT *
FROM jobs
WHERE (job_id - 2) = 0

|--Clustered Index Scan (OBJECT:([pubs].[dbo].[jobs].[PK__jobs__117F9D94]),
     WHERE:(Convert([jobs].[job_id])-2=0))

OPTIMIZED QUERY USING INDEX
-----------------------
SELECT *
FROM jobs
WHERE job_id = 2

|--Clustered Index Seek (OBJECT:([pubs].[dbo].[jobs].[PK__jobs__117F9D94]),
     SEEK:([jobs].[job_id]=Convert([@1])) ORDERED FORWARD)

Note that the seek in the second query is much better than the scan in the first.
The following examples show several kinds of queries whose conditions suppress the use of a column index, each paired with a rewrite that lets the index be used and achieves better performance.
QUERY WITH SUPPRESSED INDEX
---------------------------------------
DECLARE @job_id VARCHAR(5)
SELECT @job_id = '2'
SELECT *
FROM jobs
WHERE CONVERT(VARCHAR(5), job_id) = @job_id

OPTIMIZED QUERY USING INDEX
--------------------------------------
DECLARE @job_id VARCHAR(5)
SELECT @job_id = '2'
SELECT *
FROM jobs
WHERE job_id = CONVERT(SMALLINT, @job_id)

QUERY WITH SUPPRESSED INDEX
---------------------------------------
SELECT *
FROM authors
WHERE au_fname + ' ' + au_lname
      = 'Johnson White'

OPTIMIZED QUERY USING INDEX
--------------------------------------
SELECT *
FROM authors
WHERE au_fname = 'Johnson'
  AND au_lname = 'White'

QUERY WITH SUPPRESSED INDEX
---------------------------------------
SELECT *
FROM authors
WHERE SUBSTRING(au_lname, 1, 2) = 'Wh'

OPTIMIZED QUERY USING INDEX
--------------------------------------
SELECT *
FROM authors
WHERE au_lname LIKE 'Wh%'

QUERY WITH SUPPRESSED INDEX
---------------------------------------
CREATE INDEX employee_hire_date
ON employee (hire_date)
GO
-- Get all employees hired
-- in the 1st quarter of 1990:
SELECT *
FROM employee
WHERE DATEPART(year, hire_date) = 1990
  AND DATEPART(quarter, hire_date) = 1

OPTIMIZED QUERY USING INDEX
--------------------------------------
CREATE INDEX employee_hire_date
ON employee (hire_date)
GO
-- Get all employees hired
-- in the 1st quarter of 1990:
SELECT *
FROM employee
WHERE hire_date >= '1/1/1990'
  AND hire_date < '4/1/1990'

QUERY WITH SUPPRESSED INDEX
---------------------------------------
-- Suppose that hire_date may
-- contain times other than 12 AM.
-- Who was hired on 2/21/1990?
SELECT *
FROM employee
WHERE CONVERT(CHAR(10),
      hire_date, 101) = '02/21/1990'

OPTIMIZED QUERY USING INDEX
--------------------------------------
-- Suppose that hire_date may
-- contain times other than 12 AM.
-- Who was hired on 2/21/1990?
SELECT *
FROM employee
WHERE hire_date >= '2/21/1990'
  AND hire_date < '2/22/1990'
SET NOCOUNT ON
That SET NOCOUNT ON can speed up T-SQL code surprises many SQL Server developers and database administrators. You may have noticed that a successful query returns a system message reporting the number of rows affected. In many cases you do not need this information. The SET NOCOUNT ON command suppresses these messages for your session until you issue SET NOCOUNT OFF.
This option does more than tidy the output. It reduces the amount of information sent from server to client, which cuts network traffic and improves the overall response time of your transactions. The time to transmit a single message is negligible, but consider a script that executes queries in a loop and sends thousands of such useless messages to the user.
For example, consider a T-SQL batch that inserts 9,999 rows into the big_sales table:
-- Assumes the existence of a table called big_sales, a copy of pubs..sales
SET NOCOUNT ON
DECLARE @separator  VARCHAR(25),
        @message    VARCHAR(25),
        @counter    INT,
        @ord_nbr    VARCHAR(20),
        @order_date DATETIME,
        @store_nbr  INT,
        @qty_sold   INT,
        @terms      VARCHAR(12),
        @title      CHAR(6),
        @starttime  DATETIME
SET @starttime = GETDATE()
SELECT @counter = 0,
       @separator = REPLICATE('-', 25)
WHILE @counter < 9999
BEGIN
    SET @counter = @counter + 1
    SET @ord_nbr = 'Y' + CAST(@counter AS VARCHAR(5))
    SET @order_date = DATEADD(hour, (@counter * 8), 'Jan 01 1999')
    SET @store_nbr =
        CASE WHEN @counter < 999 THEN '000000'
             WHEN @counter BETWEEN 1000 AND 2999 THEN '123'
             WHEN @counter BETWEEN 3000 AND 3999 THEN '123'
             WHEN @counter BETWEEN 4000 AND 6999 THEN '123'
             WHEN @counter BETWEEN 7000 AND 7999 THEN '123'
             WHEN @counter BETWEEN 8000 AND 9999 THEN '123'
             ELSE '000000'
        END
    SET @qty_sold =
        CASE WHEN @counter BETWEEN 0 AND 2999 THEN 11
             WHEN @counter BETWEEN 3000 AND 5999 THEN 23
             ELSE 37
        END
    SET @terms =
        CASE WHEN @counter BETWEEN 0 AND 2999 THEN 'Net 30'
             WHEN @counter BETWEEN 3000 AND 5999 THEN 'Net 60'
             ELSE 'On invoice'
        END
    -- SET @title = (SELECT title_id FROM big_sales WHERE qty =
    --     (SELECT MAX(qty) FROM big_sales))
    SET @title =
        CASE WHEN @counter < 999 THEN 'mc2222'
             WHEN @counter BETWEEN 1000 AND 1999 THEN 'mc2222'
             WHEN @counter BETWEEN 2000 AND 3999 THEN 'mc3026'
             WHEN @counter BETWEEN 4000 AND 5999 THEN 'ps2106'
             WHEN @counter BETWEEN 6000 AND 6999 THEN 'ps7777'
             WHEN @counter BETWEEN 7000 AND 7999 THEN 'tc3218'
             ELSE 'ps1372'
        END
    -- PRINT @separator
    -- SELECT @message = STR(@counter, 10)
    --     -- + STR(SQRT(CONVERT(FLOAT, @counter)), 10, 4)
    -- PRINT @message
    BEGIN TRAN
        INSERT INTO [pubs].[dbo].[big_sales] ([stor_id], [ord_num],
            [ord_date], [qty], [payterms], [title_id])
        VALUES (@store_nbr, CAST(@ord_nbr AS CHAR(5)), @order_date,
            @qty_sold, @terms, @title)
    COMMIT TRAN
END
SET @message = CAST(DATEDIFF(ms, @starttime, GETDATE()) AS VARCHAR(20))
PRINT @message
/*
TRUNCATE TABLE big_sales
INSERT INTO big_sales
SELECT * FROM sales
SELECT title_id, SUM(qty)
FROM big_sales
GROUP BY title_id
ORDER BY SUM(qty)
SELECT * FROM big_sales
*/
With SET NOCOUNT OFF, this batch took 5,176 milliseconds of elapsed time; with SET NOCOUNT ON, it took 1,620 milliseconds. If you do not need the row-count output, add SET NOCOUNT ON at the beginning of every stored procedure and script.
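The same advice applies inside stored procedures. A minimal sketch of the recommended pattern (the procedure name is hypothetical, not part of pubs):

```sql
-- Hypothetical procedure illustrating SET NOCOUNT ON as the first statement,
-- so the client never receives "rows affected" messages from this procedure.
CREATE PROCEDURE usp_get_store_sales
    @stor_id CHAR(4)
AS
SET NOCOUNT ON          -- suppress row-count chatter for this procedure
SELECT ord_num, ord_date, qty
FROM big_sales
WHERE stor_id = @stor_id
GO
```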
TOP and SET ROWCOUNT
The TOP clause of the SELECT statement limits the number of rows returned by a single query, while SET ROWCOUNT limits the number of rows affected by all subsequent queries. Both can make many programming tasks far more efficient.
SET ROWCOUNT sets the maximum number of rows that can be affected by SELECT, INSERT, UPDATE, or DELETE statements. The setting takes effect immediately and applies only to the current session. To remove the limit, execute SET ROWCOUNT 0.
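For example, SET ROWCOUNT can cap how many rows each statement in a purge loop touches, keeping individual transactions and their log usage small. A sketch, assuming we want to purge pre-1999 orders from big_sales (the cutoff date and batch size are illustrative assumptions):

```sql
-- Delete at most 500 qualifying rows per statement, looping until done.
SET ROWCOUNT 500

DELETE FROM big_sales
WHERE ord_date < 'Jan 01 1999'

WHILE @@ROWCOUNT > 0          -- repeat while the last DELETE removed rows
BEGIN
    DELETE FROM big_sales
    WHERE ord_date < 'Jan 01 1999'
END

SET ROWCOUNT 0                -- restore the default (no limit)
```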
For some practical tasks, TOP or SET ROWCOUNT is a more efficient programming tool than standard SQL. A few examples demonstrate this.
TOP n
One of the most common queries in almost any database is the request for the first n items in a list. In the pubs database, let us find the five best-selling titles. Compare the pure ANSI SQL, SET ROWCOUNT, and TOP solutions.
Pure ANSI SQL:

SELECT title, ytd_sales
FROM titles AS a
WHERE (SELECT COUNT(*)
       FROM titles AS b
       WHERE b.ytd_sales > a.ytd_sales) < 5
ORDER BY ytd_sales DESC
This pure ANSI SQL solution executes an inefficient correlated subquery; in this case there is no supporting index on ytd_sales. Moreover, it does not filter out NULL values of ytd_sales, and it handles ties among titles ambiguously.
Using SET ROWCOUNT:

SET ROWCOUNT 5
SELECT title, ytd_sales
FROM titles
ORDER BY ytd_sales DESC
SET ROWCOUNT 0

Using TOP n:

SELECT TOP 5 title, ytd_sales
FROM titles
ORDER BY ytd_sales DESC
The second solution uses SET ROWCOUNT to stop the SELECT after five rows. The third uses TOP 5 to stop after the first five rows. In both cases, the ORDER BY clause forces a sort of the entire table before results are returned, and the two query plans are virtually identical. The advantage of TOP over SET ROWCOUNT is that SET ROWCOUNT must fully populate the worktable required by the ORDER BY clause, while TOP does not.
On a large table, we can create an index on ytd_sales to avoid the sort; the query then uses the index to locate the first five rows and stops. Compare this with the first solution, which scans the entire table and executes a correlated subquery for every row. On a small table the performance difference is tiny, but on a large table the first solution could run for hours while the last two finish in seconds.
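A sketch of that indexing step (the index name is an assumption, not part of pubs):

```sql
-- With a descending index on ytd_sales, the TOP query can read the first
-- five index entries and stop -- no full table scan or sort is needed.
CREATE INDEX ndx_titles_ytd_sales
ON titles (ytd_sales DESC)

SELECT TOP 5 title, ytd_sales
FROM titles
ORDER BY ytd_sales DESC
```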
When tuning a query, consider whether only the first few rows are actually needed. If so, using the TOP clause can save a great deal of time.