Understand the SQL query plan of SQL Server)

Source: Internet
Author: User

Getting started

Let's take a simple example to help you understand how to read the query plan. You can run the set showplan_text on command, you can also set the same options in the configuration attributes of SQL query analyzer to get the query plan.

Note: The pubs. big_sales table is used in this example. This table is identical to the pubs. Sales table, except for the record with more than 80000 rows, and serves as the main data in the simple explain Plan example.

As shown below, this simplest query will scan the entire clustered index if it exists. Note that the clustered key value is in the physical order and data is stored in this order. Therefore, if the clustered key value exists, you may avoid scanning the entire table. Even if the column you selected is not in the clustered key value, for example, ord_date, the query engine uses an index to scan and return the result set.

Select * From big_salesselect ord_datefrom big_salesstmttext success | -- clusteredindexscan (Object :( [pubs]. [DBO]. [big_sales]. [upkcl_big_sales])

The above query shows that the returned data volume is very different, so the query of small result set (ord_date) runs faster than other queries, because there are a large number of underlying I/O. However, the two query plans are actually the same. You can use other indexes to improve performance. For example, a non-clustered index exists in the title_id column:

Select title_idfrom big_salesstmttext success | -- index scan (Object :( [pubs]. [DBO]. [big_sales]. [ndx_sales_ttlid])

The execution time of the preceding query is very small compared with the select * query, because all results can be obtained from non-clustered indexes. This type of query is called covering query (overwrite query) because all result sets are overwritten by a non-clustered index.

Seek and scan

The first thing is that you need to differentiate the seek and scan operations in the query plan.

Note: A simple but useful rule is that seek operations are efficient, and scan operations are not very efficient even if they are not very poor. The seek operation is direct or at least fast, while the scan operation needs to read the entire object (table, clustered index or non-clustered index ). Therefore, scan usually consumes more resources than seek. If your query plan is only for scan operations, you should consider adjusting your query.

The where clause can significantly differ in query performance, as shown below:

Select * From big_saleswhere stor_id = '2013' stmttext ----------------------------------------------------------------------- | -- clusteredindex seek (Object: ([pubs]. [DBO]. [big_sales]. [upkcl_big_sales]), seek: ([big_sales]. [stor_id] = {@ 1} ordered forward)

The preceding query performs the seek operation on the clustered index instead of the scan operation. This showplan accurately describes the seek operation based on stor_id and the results are sorted in the order stored in the index. Because SQL Server supports the same performance of index forward and backward scrolling, You can see ordered forward or ordered backward in the query plan. This only tells you how to read the table or index. You can even use the ASC and desc keywords in the order by clause to perform these operations. The query plan returned by the range query is similar to the query plan directly queried previously. The following two range queries provide some information:

Select * From big_saleswhere stor_id> = '2013' stmttext success |-clusteredindex seek (Object: ([pubs]. [DBO]. [big_sales]. [upkcl_big_sales]), seek: ([big_sales]. [stor_id]> = '123') Order forward

The preceding query looks like the previous example. Except for the seek predicate, it is a bit different.

Select * From big_saleswhere stor_id between '000000' and '000000' stmttext success |-clusteredindex seek (Object: ([pubs]. [DBO]. [big_sales]. [upkcl_big_sales]), seek :( [big_sales]. [stor_id]> = '20140901' and ([big_sales]. [stor_id] <= '000000') Order forward)

This looks the same. Only the search predicates have changed. Because the search is very fast, this query is quite good.

Seek and scan can also contain where predicates. In this case, This predicate tells you which records the WHERE clause filters out from the result set. Because it is executed as a component of seek or scan, the where clause usually does not damage or improve the performance of this operation. The where clause will help the query optimizer find indexes that may have the best performance.

An important part of query optimization is to determine whether to perform the seek operation on an index. If so, an index with the best performance is found. In most cases, the query engine is able to find existing indexes. However, there are currently three common indexing problems:

◆ Database designers, usually application developers, do not create any indexes in the table.
◆ Database designers generally cannot guess the common query or transaction types, so the index or primary key built on the table is often inefficient.
◆ When an index table is created, even if the Database Designer guesses more accurately, the transaction load will change over time, making these indexes less efficient.

If you see a large number of scans instead of seek in your query plan, you should evaluate your index again. For example, look at the following query:

Select ord_numfrom saleswhere ord_date is not Nuland ord_date> 'Jan 01,2002 12:00:00 am 'stemttext success | -- clustered index scan (Object: ([pubs]. [DBO]. [sales]. [upkcl_sales]), where: ([sales]. [ord_date]> 'Jan 12:00:00 am '))

Now this query performs the seek index operation on the newly created sales_ord_date index.

Describes the branch steps by comparing connections and subqueries

A correct old rule is that, when the result set is the same, the connection has better performance than the subquery.

Select au_fname, au_lnamefrom authorswhere au_id in (select au_id from titleauther) stmttext limit | --- nested loops (inner join, outer references :( [titleauthor]. [au_id]) | -- stream aggregate (group by :( [titleauthor]. [au_id]) | -- index scan (Object :( [pubs]. [DBO]. [titleauthor]. [auidind]), ordered forward) | -- clusteredindex seek (Object :( [pubs]. [DBO]. [authors]. [upkcl_auidind]), seek :( [authors]. [au_id] = [titleauthor]. [au_id]) ordered forward) Table 'authors '. scan Count 38, logical reads 76, physical reads 0, read-ahead reads 0. table 'titleauthor '. scan count 2, logical reads 2, physical reads 1, read-ahead reads 0.

In this case, the query engine selects a nested loop operation. This query is forced to use clustered indexes to read the entire authors table and execute a large number of logical page reads during processing.

Note: In the query with a branch step, the contraction shows you the branches whose steps are other steps.

Select distinct au_fname, au_lnamefrom authors as ajoin titleauthor as t on. au_id = T. au_idstmttext ------------------------------------------------------------------------------- | -- stream aggregate (group by: ([a]. [au_lname]. [A]. [au_fname]) |-nested loops (inner join, outer references: ([a]. [au_id]) |-index scan (Object :( [pubs]. [DBO]. [authors]. [authord] as [a]), ordered forward) |-index seek (Object: [pubs]. [DBO]. [titleauthor]. [authord] as [T]), seek: ([T]. [au_id] = [A]. [au_id]) Order forward) Table 'titleauthor '. scan Count 23, logical reads 23, physical reads 0, read ahead reads 0. table 'authors '. scan count 1, logical reads 1, physical reads 0, read-ahead read 0.

In the preceding query, the number of logical reads in the titleauthors table is increased, while that in the authors table is decreased. Note that stream aggregation has a higher position in the query plan, that is, it occurs later.

Comparison query plan

You can use the query plan to compare the relative Effects of Two Different queries. For example, you may want to know whether a query adds additional overhead than another query or chooses different index policies.

In this example, we compare two queries. The first uses substring and the second uses like.

Select * From authorswhere substring (au_lname, 1, 2) = 'wh 'stmttext -------------------------------------------------------- | -- clusteredindex scan (Object: ([pubs]. [DBO]. [authors]. [upkcl_auidind]) where: (substring ([authors]. [au_lname], 1, 2) = 'wh '))

Compared with similar queries using like:

Select * From authorswhere au_lname like 'wh % 'stmttext success |-bookmark Lookup (Bookmark: ([bmk1000]), object: ([pubs]. [DBO]. [authors]) |-index seek (Object :( [pubs]. [DBO]. [authors]. [sunmind]), seek :( [authors]. [au_lname]> = 'wgp 'and [authors]. [au_lname] <'wi '), where :( like ([authors]. [au_lname], 'wh % ', null) ordered forward)

Obviously, the second query uses the index seek operation. The first query uses the clustered index scan operation, and the second query plan is simpler than the first one.

When we compare the best query plan, we prefer to use the graphic execution plan feature of set statistics profile on or SQL query analyzer instead of set showplan_text on. These tools will clearly show you the processing consumption of each query plan step in a percent way. This tells you more or less query consumption for each choice. You can run two or more queries at the same time to see which one is the best.

To obtain the most comprehensive performance measurement information, it is also important to use set statistics Io and set statistics time.

Understanding the impact of Connection

The different query steps above demonstrate how SQL Server 2000 uses a large number of operations to parse the join statement. Each join strategy has its strengths and weaknesses. However, in some rare cases, the query engine selects less efficient strategies, such as the hash (hash) or merge (merge) policies, A simple nested loop is enough to provide good performance.

SQL Server uses three join policies, which are listed from simple to complex:

Nested loop

Nested loops are the best strategy for small data tables that use simple inner join. It is best suited when the number of records in two tables is very different and there is an index on the connected columns. Nested loop connections require the least I/O and comparison.

Nested loops loop one record at a time in the External table (usually a table with a small amount of data), and then find and output matched records in the internal table. There are many names about nested loop policies. For example, querying the entire table or index is called a nested loop connection of naive (ignorant. When normal or temporary indexes are used, they are called index nested loop connections or temporary index nested loop connections.

Merge

Merge is the best strategy for tables with large data volumes and similar data volumes that use sort join columns. The merge operation first sorts all data and then cyclically generates output. Good join performance is based on index creation on the corresponding column, which is usually used in the join predicate equation.

The merge connection utilizes the advantages of pre-sorting. It obtains row data from each output and performs direct comparison. For example, an inner join returns a record with equal join predicates. If not, records with lower values will be discarded and compared with the next record. This process continues until all records are checked. Sometimes the join operation is used to compare tables with many-to-many relationships. In this case, SQL server uses a temporary table to store these rows.

If a where clause exists in the query using the merge join clause, the merge join predicate is calculated first. Then, each record of the combined join predicates is calculated again by other predicates in the where statement. Microsoft calls it residual predicate (residual predicate ).

Hash (hash)

Hash is the best strategy for tables with large data volumes and different capacities, as well as tables with no complex connection requirements for sorting or indexing of connected columns. The hash method is used for union, intersect, inner, left, right, and outer join operations, as well as set matching and difference operations. Hash is also used for connecting tables without useful indexes. The hash operation creates a temporary hash table, loops through all the data, and generates output.

Hash uses a build (constructed) input (usually a table with a small amount of data) and probe input. This hash key (that is, the column in the join predicate or in the group by list) is queried to process the connection. The remaining predicates are all other operations that are not used in the WHERE clause to connect itself. The remaining predicates are calculated after the join predicates. When constructing a hash connection, SQL Server can select different options in the following order of priority:

In-memory Hash (in-memory hash): The in-memory hash connection first scans the entire build input to the memory, and then creates a temporary hash table in the memory. Calculate the hash value and insert each record into the hash. Scan the test input one by one. Each test input is compared with the corresponding hash. If matched, it is returned in the result set.

Hybrid Hash (hybrid hash): If the hash is only slightly larger than the available memory, SQL Server may merge some aspects of the in-memory hash connection and Grace hash connection, which are called hybrid hash connections.

Grace hash: The Grace hash option is used when hash join is too large to be processed in memory. In that case, the entire build input and probe input will be read. They are then divided into multiple temporary worksheets. This step is called partition fan-out. The hash function on the hash key value ensures that all connection records are in the same partition worksheet. Partition fanout splits two time-consuming steps into many small steps, which can be processed concurrently. Then the hash connection will be used for each pair of worksheets, and all matching results will be returned in the result set.

Recursive Hash (recursive hash): Sometimes the partition fan-out table generated by Grace hash is too large to be further partitioned. This is called recursive hash.

Note that the hash and merge connections process each table once. If you use set statistics Io on to measure this type of query, you will see the illusion of low I/O. However, the low I/O does not mean that these connection policies must be faster than nested loop connections, because it requires a huge amount of computing.

Note that hash join requires a large amount of computing. If you find that some queries in production always use Hash connections, you should be prompted to optimize your queries or add indexes to the underlying table.

In the following example, we will show the standard nested loop (using the default query plan) and hash and merge connections (force use tips ).

Select. au_fname,. au_lname, T. titlefrom authors as ainner join titleauthor Taon. au_id = TA. au_idinner join titles ton T. title_id = TA. title_idorder by au_lname ASC, au_fname ascstmttext limit | -- nestedloop (inner join, outer references :( [Ta], [title_id]) | -- nested loops (inner join, outer references :( [a], [au_id]) | -- indexscan (Object :( [pubs]. [DBO]. [authors]. [aunmind] as [a], ordered forwad) | -- index seek (Object :( [pubs]. [DBO]. [titleauthor]. [auidind] as [Ta], seek: ([Ta]. [au_id] = [A]. [au_id]) ordered forwad) | -- clustered index seek (Object :( [pubs]. [DBO]. [titleas]. [upkcl_titleidind] as [T], seek: ([T]. [title_id] = [Ta]. [title_id]) ordered forwad)

The preceding plan shows the standard query plan generated through SQL Server. We can force SQL Server to use the prompt to show us how it handles merge and hash connections.

Select. au_fname,. au_lname, T. titlefrom authors as ainner merge join titleauthor Taon. au_id = TA. au_idinner hash join titles ton T. title_id = TA. title_idorder by au_lname ASC, au_fname ASC warning: the join order has been enforced because a local join hint is used. stmttext success | -- Sort (order by :( [A]. [au_lname] ASC, [a]. [au_fname] ASC) | -- hash match (inner join, hash :( [Ta]. [title_id]) = ([T]. [title_id]), residual :( [Ta]. [title_id] = [T]. [title_id]) | -- merge join (inner join, merge :( [a] [au_id] = [Ta]. [au_id]), residual :( [Ta]. [au_id] = [A]. [au_id]) | -- clustered index scan (Object :( [pubs]. [DBO]. [authors]. [upkcl_auidind] as [a], ordered forwad) | -- index scan object :( [pubs]. [DBO]. [titleauthor]. [auidind] as [Ta], ordered forwad) | -- index scan (Object :( [pubs]. [DBO]. [titles]. [titleind] as [T])

In this example, you can clearly see that each join uses the predicates of other joins as the remaining predicates. (You will also notice that SQL Server has issued a warning when prompted ). This query also forces the use of the sort operation to support hash and merge connections.

 

This article from the csdn blog, reproduced please indicate the source: http://blog.csdn.net/zhou__zhou/archive/2007/06/26/1666500.aspx

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.