SQL Execution Plan parsing (2)-graph execution plan for basic query (I)

Last Update:2018-12-05 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

To some extent, learning to read a graphic execution plan is similar to learning a new language. The difference is that the language is based on icons, and there are very few words (icons. Each icon represents an operator. In this chapter, "icons" and "operators" can be used interchangeably.

In the previous chapter, we met two operators (select and table scan). In fact, there are 79 operators in total. Fortunately, we don't need to learn all of them to start reading the execution plan, most queries only use a small part. If you encounter the icon we didn't mention, see http://msdn2.microsoft.com/en-us/library/ms175913.aspx

There are four operators of different types in the graphic execution plan:

Logical and physical operators (logical and physical operators): blue icon, indicating query execution or DML Declaration
Parallelism physical operators: it is also a blue icon that represents parallel operations. In a sense, it is a subset of logical and physical operators. It is considered separately because it undertakes analysis of execution plans of different levels.
Cursor operators: A yellow icon that represents the T-SQL cursor operation.
Language elements: a green icon that represents the T-SQL language elements, such as assign, declare, if, select, and while.

This chapter focuses on logical and physical operators, including parallel physical operators.

We need to understand the operator behavior. Some operators, such as sort, hash match, and hash join, need a certain amount of memory for execution. Therefore, if such operators exist in the query, they may need to wait for available memory, it has a negative impact on performance. Most operators run in blocking or non-blocking mode. The non-blocking operator creates output data when it receives input data. The blocking operator must wait until all input data arrives before generating output data.

Single Table query

1. Clustered index Scan)

SELECT*FROM Person.Contact

Below is the actual execution plan

Figure 2-1

We can see that a clustered index scan is executed to obtain data. The clustered index is pk_contact_contactid and 19972 rows of data are obtained.

Indexes in SQL Server are stored in a B-tree. Clustered indexes not only store the key structure as conventional indexes, but also store and sort data. This is also the main reason that a table can only have one clustered index.

Clustered index scanning is basically the same as full table scan. The entire index or most indexes are traversed row by row to determine which data is required for query.

As shown in this example, index scanning usually occurs in scenarios where the optimizer considers that the number of rows to be returned is too large. Instead of using keys in the index, it is better to simply scan all data.

2. Clustered index search (clustered index seek)

Add a where clause to the preceding query.

SELECT*FROM Person.Contact where ContactId =1

We will get the following execution plan

Figure 2-2

Index search is completely different from index scan. Index scan traverses all rows to find the desired data. index search, whether clustered index or non-clustered index, this occurs when the optimizer can locate the index and obtain the required data through the index. Therefore, it must tell the storage engine to find the value through the specified index key. The index search operation is similar to finding the correct number of pages in the book directory to quickly find words. Clustered index search also has other advantages. It is not only cheaper than index scanning, but also does not require additional steps to obtain data because the data is stored in the index. Note that the ordered attribute is true here.

3. Non-clustered index search (non-clustered index seek)

SELECT ContactID
FROM Person.Contact
WHERE EmailAddress LIKE'sab%'

For example, the ix_contact_emailaddress index is used.

Figure 2-3

Note: The icon name for non-clustered index search is incorrect and is written as index seek, but this does not have a big impact.

Like clustered index search, non-clustered index search also uses rows to be returned. What's different is that non-clustered index search uses non-clustered indexes, the Optimizer may find all the required data in a non-clustered index, or you may need to search for data from the clustered index. This extra Io operation will slightly reduce the performance, as described in the following section.

4. Key Lookup)

Let's slightly modify the above query and retrieve multiple columns.

SELECT ContactID,
    LastName,
    Phone
FROM Person.Contact
WHERE EmailAddress LIKE'sab%'

The execution plan is as follows:

PS: My database version is old, and there is no key lookup. Instead, it is a clustered index with the lookup attribute of true. The following figure shows a 2-6 index captured from the book, which looks a little fuzzy.

Figure 2-4

We finally saw the first plan with multiple operations. from right to sit, from top to bottom, the first is the index search operation for ix_contact_emailaddress. this is a non-clustered index with a unique value. It is not a overwriting index for our query ). The so-called non-overwriting means that the index does not contain all the columns required for the query, and data must be obtained from the clustered index. We can see from the output list of index seek that there are emailaddress and contactid columns.

Figure 2-5

Then key lookup uses the key value to find the corresponding row from the clustered index pk_contact_contactid. Its Output list is the lastname and phone columns. For example

Figure 2-6

A key lookup is to use a clustered index to perform bookmark lookup ). The emergence of key lookup indicates that the query performance can be improved by overwriting indexes. If the index is overwriting, the key lookup can be avoided. A nested loop join operation must appear along with key lookup, which is used to combine the results of the two operations.

Figure 2-7

Nested loop connection is a standard type of connection. Its appearance does not mean performance problems. In our example, because key lookup exists, nested loop connections are required to combine the rows of index seek and the rows of key lookup. If no key lookup exists, nested loop connections will not appear.

5. Full table Scan)

As the name suggests, full table scan is to scan the table row by row to obtain the required data.

SELECT*
FROM[dbo].[DatabaseLog]

Plan

Figure 2-8

Full table scan occurs for several reasons, usually because there is no available index, and the optimizer has to retrieve all rows. Another common cause is to return all rows of the table. As shown in this example, scanning all rows is usually faster than searching each row using indexes, no matter whether there is any index. There is also a small number of rows in the table, and the optimizer thinks that scanning all rows is faster than using indexes.

6. Rid Lookup)

If we specify a filter condition for the preceding query in the primary key column, an execution plan using the RID query is obtained.

SELECT*
FROM[dbo].[DatabaseLog]
WHERE DatabaseLogID =1

Execution Plan

Figure 2-9

To return results, the optimizer first performs index search (index seek) on the primary key to identify whether the row meets the where condition, but the index does not contain all the required data. Looking at the tool tips2-10 of index seek above, we will find the bmk1000 in the output list, which tells us that index search is actually part of the bookmarksearch. The optimizer then performs the RID query and uses the row identifier to locate the row to be returned. The RID query is a bookmarkdonefile query that occurs on the heap table (a table without clustered indexes. In other words, because the table has no clustered index, a row identifier linked to the heap index must be used. This increases disk I/O, because two different operations need to be performed to combine them through nested loops.

Figure 2-10

The tool hint searched by the RID is as follows:

Figure 2-11

We saw bmk1000 again. This time it was in seek predicates. This means that the query plan uses a bookmarked query (in our example, the RID query ). In this case, we only need to search for one row. The performance is not a big problem. If the RID query returns many rows, we need to carefully consider how to reduce disk Io, rewrite queries, add clustered indexes, or use overwriting indexes.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

SQL Execution Plan parsing (2)-graph execution plan for basic query (I)

Contact Us

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support