SQL Server execution engine entry Diagram

Source: Internet
Author: User

This article aims to classify information about each operation in the execution plan.

Data Access Operations

The first basic operation is to access data. This can be done either by directly accessing the table or by accessing the index. The data in a table is organized into Heap and B-tree. Data is organized through Heap when no clustered index is created in the table. This is unordered, after a clustered index is created in a table, the data of non-clustered indexes are organized in the B-tree mode. In this way, data is stored in an orderly manner. Generally, non-clustered indexes only contain some columns of the entire table. For filtering indexes, they only contain some rows.

Apart from the different data organization methods, data access is also divided into two methods: Scan and search, scanning is to Scan all the data in the entire structure, searching is only to find part of the data in the entire structure. Therefore, we can see that the heap is unordered, so it is impossible to perform the search (Seek) operation on the heap. The order relative to the B-tree makes it possible to perform the search in B-tree. A heap scan is performed to access data in a heap-organized table, as shown in figure 1.

Figure 1. Table Scan

It can be seen that the table scan icon clearly shows the nature of the table scan. It is scanned from start to end in an unordered organization table.

Clustering indexes and non-clustered indexes in the B-tree structure can also be scanned. Generally, clustered index scanning is performed to obtain all the data in the index table or to obtain the index row tree as the majority of the data so that the scanning cost is less than the search cost. 2.

Figure 2. Clustered index Scan

The clustered index scan icon also clearly shows the nature of clustered index scanning. After finding the leftmost leaf node, scan all leaf nodes in sequence to scan the entire structure. Of course, the same concept is true for non-clustered indexes, as shown in 3.

Figure 3. Non-clustered index Scan

For only part of data in the B-tree structure, index search (Seek) makes the B-tree meaningful. Based on the searched key value, you can go down a single path from the root of B to avoid unnecessary page scanning. Figure 4 shows an index search in the query plan.

Figure 4. Clustered index search

The index search icon is also fascinating. You can see that the root line of the icon goes down from the root node to the leaf node. That is, to find the page where the requested data is located, it is not difficult to see that if we need to find multiple data entries and are scattered on different pages, this search operation needs to be repeated and executed many times, when the number of queries reaches a certain level, SQL Server will select an index scan that consumes a relatively low level rather than repeat the index search. For non-clustered index search, the concept is the same, so it is no longer a piece.

Bookmark Lookup)

You may think that if a non-clustered index can quickly find the data you want, but unfortunately, a non-clustered index does not contain all the columns you want, what should you do? At this time, SQL Server will face two options: directly access the basic table to obtain data or find data in the non-clustered index, and then go to the basic table to obtain the columns not covered by the non-clustered index. This option depends on the estimated number of rows and other statistical information. The query analyzer selects the one that consumes less.

Search for a simple bookmarks 5.

Figure 5. A simple bookmarks search

From Figure 5, we can see that we first find the expected row through a non-clustered index, but this index does not contain all columns. Therefore, we need to find these columns in the basic table, therefore, you need to perform a Key search. If the basic table is organized by a heap, the Key Lookup will be changed to the RID Lookup ), key lookup and RID lookup are collectively referred to as bookmarks lookup.

However, sometimes the number of rows returned by the index search results in a much better performance than the direct scanning operation. Therefore, the SQL Server then chooses to scan instead of bookmarks. 6.

Figure 6. The StateProvinceID column has a non-clustered index, but because there are too many rows returned, the analyzer will select scan instead of bookmarked search.

This estimation is based on the statistical information. For the statistical information, see my previous blog post: Talking about the impact of statistics on queries in SQL Server.

Aggregation)

Aggregate functions may cause aggregation operations. An aggregate function is a process of aggregating data from a set into one data according to certain rules or grouping into multiple data according to rules. Some Aggregate functions such as avg, sum, min, and distinct keywords may lead to two types of Aggregation operations: Stream Aggregation and Hash Aggregation ).

Stream Aggregation)

Before stream aggregation needs to execute the aggregate function, the aggregated data set is ordered. This ordered data can be performed either through Sort in the execution plan, you can also directly obtain ordered data from clustered or non-clustered indexes. In addition, aggregation operations without Group by are converted into scalar aggregation. Such operations must be executed for stream aggregation.

For example, we perform scalar aggregation directly, as shown in 7.

Figure 7. Stream Aggregation

However, for clauses with the Group by clause, because data needs to be sorted according to the column after the group by clause, Sort is required to ensure sorting. Note that the Sort operation is a memory-consuming operation. When the memory is insufficient, it will also occupy tempdb. SQL Server Always selects the lowest cost among Sort operations and hash matching. Operation 8 that requires Sort is shown in.

Figure 8. Stream aggregation to be sorted

In figure 8, the sorting operations are sorted by ProductLine, and then aggregated by group.

Hash aggregation)

The above stream aggregation is suitable for a relatively small amount of data, but for a relatively large table. The cost of using a hash set is lower than that of sorting. A hash set is aggregated by creating a hash in the memory, so you do not need to sort the data set. The hash list created in the memory uses the column following Group by as the key value, as shown in 9.


Figure 9. Hash Aggregation

After a hash is created in the memory, the hash is processed as the key based on the values following group by. When the hash key does not exist, add an entry to the hash table. When the key already exists in the hash table, calculate the Value in the hash table according to the rule (The rule is an aggregate function, such as Sum or avg ).

Join)

When multiple tables are connected (bookmarked searches and the connections between indexes are counted), SQL Server uses three different Connection Methods: Nested Loops Join ), merge Join and Hash Join ). These connections are not better than the other, but each connection method will adapt to a specific scenario.

Nested loop Join)

Figure 10 shows a simple nested loop connection.

Figure 10. Example of a nested loop connection

The icon of the nested loop connection is also very expressive, and the external input (Outer input) is located above. Here, the clustered index scan is used. And Inner Input, which is the clustered index search. The external input is executed only once. The internal input is searched based on each row where the external input meets the Join condition. Because there are 290 rows, internal input is executed 290 times.

You can see in. 11 in the Properties window:

Figure 11. Number of internal input executions

According to the nested loop principle, it is not difficult to see that because external input is scanning and internal input is searching, when the external input result set of two Join tables is small, the query optimizer is more inclined to choose the nested loop mode when the tables searched by internal input are very large.

Merge Join)

Different from nested loops, the merge join only performs one access from each table. Based on this principle, the merged connections are much faster than nested loops. The following shows a typical merge connection, as shown in Figure 12.

Figure 12. Merge connections

From the principle of merging connections, it is hard to imagine that the first merge connection requires the order of both parties, and the Join condition is equal. Because the two input conditions are already ordered, a row is compared from each input set. equal results are returned, and unequal results are discarded, it is not difficult to see why Merge join only allows equal signs after Join. We can see this principle from the icon in Figure 11.

If the two sides of the input data are out of order, the query analyzer will not choose to merge connections. We can also use the index prompt to force the merge connections, the execution plan must be sorted by adding a sorting step, as shown in Figure 13.

Figure 13. Merge Join by sorting

Hash Join)

Hash connections only need to access the data of both parties once. Hash connections are implemented by creating a hash in the memory. This is a relatively high memory consumption. If the memory is insufficient, it will also occupy tempdb. However, they do not need to be sorted by both parties as they do in the merged connection. A typical hash connection 14 is shown.

Figure 14. Hash join

Here, I deleted the clustered index of Costomer. Otherwise, the two sorted SQL servers will choose a merge connection with a lower cost. SQL Server generates a hash table using the two preceding inputs. The following inputs are used for testing. You can see these information in the Properties window, as shown in Figure 15.

Figure 15. Hash key generation and hash key Detection

 

Generally, hash matching is used when two inputs are large and the requested data is not sorted by one or both parties.

Parallel

When multiple tables are connected, SQL Server allows parallel queries with multiple CPUs or multiple cores, which undoubtedly improves the efficiency, as shown in Example 16 of parallel queries.

Figure 16. Improve parallel efficiency

Summary

This article briefly introduces the common operating principles in the SQL Server execution plan. Understanding these steps and principles is the basic skill for optimizing queries.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.