"Reprint" SQL execution plan

Last Update:2018-02-08 Source: Internet

Author: User

Tags joins scalar

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

To understand the execution plan, you have to understand it, the various nouns. Since I don't know much about it. This article is intended as a write-only, do not understand the understand before writing.

At the beginning, it is important to note that the first time you look at the execution plan, theSQL Server execution plan is viewed from right to left.

Noun Analysis:

　　Scan: Iterate through the data row by line.

First set up a table and show you what it looks like.

CREATE TABLE person (    Id int IDENTITY () is not null,    Name nvarchar (+) NULL, age    int null,    Height i NT NULL, area    nvarchar (+) null,    marryhistory nvarchar (ten) null,    educationalbackground nvarchar (ten) null ,    Address nvarchar (+) NULL,    Insiteid int null) on [PRIMARY]

The data in the table is around 140,000, presumably similar to the following:

This table, temporarily without any indexes.

One, data access operation 1, table scan

Table scan occurs when a table scan occurs on a heap table and no available indexes are available, indicating that the entire table is scanned once.

Now, let's execute a simple query on this table:

SELECT * from person WHERE Name = ' Childe '

View the execution plan as follows:

The table scan, as the name implies, is the entire table scan, to find the data you need.

2. Clustered Index Scan

Clustered Index Scan: occurs in a clustered table and is equivalent to a full table scan operation, but is more efficient when the conditions for clustered columns such as (WHERE Id > 10) are used.

Below we add a clustered index to this table in the ID column

CREATE CLUSTERED INDEX ix_id on person (Id)

Execute the same query statement again:

SELECT * from person WHERE Name = ' Childe '

The implementation plan is as follows:

Why is the clustered index built in the ID column, which has an impact on the scan? Not to mention the name condition also does not matter ah?

In fact, after you add a clustered index, the table becomes a clustered table from the heap table. We know that the data for the clustered table exists at the leaf-level node of the clustered index. Therefore, the aggregation scan and the table scan actually the difference is not big, must say the difference is large, also must see what in the condition is what, later returns the data. In the case of this SQL statement, the efficiency difference is not very large.

You can look at the I/O statistics:

Table Scan:

Clustered Index Scan:

This is beyond the scope of this article, the efficiency is not in the context of this article, this article only consider the differences between the various scans, and why the resulting.

3. Clustered Index Lookup

Clustered Index Lookup: Scans a specific range of rows in a clustered index.

See execute the following SQL statement:

SELECT * from person WHERE Id = ' 73164 '

The implementation plan is as follows:

4. Index Scan

Index Scan: The overall scan for nonclustered indexes.

Let's add a clustered index and execute a query statement:

Create nonclustered index ix_name on person (name)    --Creating nonclustered indexes SELECT Name from person

View the execution plan as follows:

Why is an index scan (nonclustered index) selected here?

Because this nonclustered index can overwrite the required data. What if a nonclustered index cannot be overwritten? For example, we will change select to select * And then take a look.

It is clear that the returned results contain too many records and are not cost-effective with nonclustered indexes. Therefore, a clustered index is used.

If we delete the clustered index at this point, then execute SELECT * To see.

DROP INDEX person.ix_id

There is no clustered index at this time, so only table scans are used.

5. Bookmark Search

Previous learning about indexing we already know that when not overwriting and including all of the required columns in a nonclustered index, SQL Server chooses to directly perform a clustered index scan to get the data, or to go to a nonclustered index to find the clustered index key, and then use the clustered index to find the data.

Let's look at an example of a bookmark lookup:

SELECT * from person WHERE Name = ' chubby '--name column with nonclustered index

The implementation plan is as follows:

The above procedure can be understood as: first find the desired row through a nonclustered index, but this index does not contain all the columns, so also to find these columns in the base table, so to do key lookup, if the base table is organized in a heap, then this key lookup (key lookup) It becomes the RID lookup (RID lookup), the key lookup, and the RID lookup collectively known as the bookmark lookup. However, sometimes when the number of rows returned by a nonclustered index is too large, SQL Server may choose to perform a direct clustered index scan.

Stream aggregation operation 1, Flow aggregation

Flow aggregation: Calculates the summary values for multiple sets of rows in the corresponding sorted stream.

All aggregate functions (such as count (), MAX ()) will have a flow aggregation, but they will not consume Io, only consume the CPU.

For example, execute the following statement:

SELECT MAX (age) from person

View the execution plan as follows:

2. Calculate scalar

Calculate scalar: Calculates the new value based on the existing value in the row. such as the count () function, more than one row, the number of rows is added 1.

Aggregate functions other than Min and Max functions require that the stream aggregation operation be followed by a computed scalar.

SELECT COUNT (*) from person

View the execution plan as follows:

3. Hash aggregation (hash match)

For a GROUP BY clause, because the data needs to be ordered by the column following group by, sort is required to ensure ordering. Note that the sort operation is a memory-intensive operation that consumes tempdb when there is not enough memory. SQL Server always chooses the lowest cost in the sort operation and hash matching .

SELECT Height,count (Id) from person    --to find out the defeat of each height GROUP by height

The implementation plan is as follows:

For a large amount of data, SQL Server chooses a hash match.

After the hash list is established in memory, the value following the group by is used as the key, then each data in the collection is processed sequentially, and when the key does not exist in the hash list, an entry is added to the hash table, and when the key is already present in the hash table, the rule is the aggregate function, such as SUM, Avg something) calculates the value in the hash list (value).

4. Sorting

When the amount of data is less than that, for example, execute the following statement to create a new table with only dozens of records.

SELECT * into Person2 from Person2 WHERE Id < 100

Then execute the same query statement:

SELECT Height,count (Id) from Person2    --only the table is replaced by the table GROUP by Height with a smaller amount of data

The implementation plan is as follows:

Third, the connection

When multiple tables are connected (including bookmark lookups, connections between indexes), SQL Server takes three different types of connections: loop nested connections, merge connections, Hash joins. These kinds of connection formats have the right scenario for you, and there's no better way to say it.

The new two-sheet table is as follows

This is a simple news, column structure.

1. Nested loops

Let's look at a simple inner JOIN query statement

SELECT * from Nx_column as C INNER joins Nx_article as A on a.columnid = C.columnid

The implementation plan is as follows:

The icon for looping nested connections is also very image, in the above external input (Outer input), here is the clustered index scan. And in the following internal input (Inner input), here is the clustered index lookup. The external input is executed only once, and the internal input is looked up based on each row of the external input that satisfies the join condition. This is done 7 times for the internal input because it is 7 rows.

According to the principle of nested loops it is not difficult to see that because the external input is scanned, the internal input is lookup, when the two join table outside the input result set is relatively small, and the internal input to find the table is very large, the query optimizer prefers to choose the loop nesting method.

2. Merge connection

Unlike loop nesting, a merge connection is a single access from each table. From this point of view, merging connections is much faster than looping nesting.

From the principle of merging joins it is not difficult to imagine that merging connections first requires both parties to be orderly. And the condition of the join is equal to the number. Since two input conditions are already in order, it is not difficult to see why the merge join is only allowed to be equal to the join after a row is compared from each input set, equal returns, and unequal discards. We can see this principle from the icon in Figure 11.

SELECT * from Nx_column as C INNER joins    nx_article as A on a.columnid = C.columnid OPTION (MERGE JOIN)

The implementation plan is as follows:

If both sides of the input data are unordered, the Query Analyzer does not select the merge connection, and we can also force the merge connection with the index hint, in order to achieve this, the execution plan must be ordered with a sort step. This is why the SQL statement above is an option (MERGE join). The ColumnID column of the article table is sorted above.

3. Hash connection

Hash joins also only need to access data from both sides of the 1 times. Hash connections are implemented by establishing a hash table in memory. This consumes more memory and consumes tempdb if memory is low. But it's not like merging connections that requires both sides to be orderly.

For the following two implementations, the clustered index of the two columns should not be built in the ColumnID column, otherwise the hash join will not be used.

ALTER TABLE pk_nx_column Drop CONSTRAINT pk_nx_column    --Delete primary key DROP index Nx_column.pk_nx_column    --delete clustered index CREATE C Lustered index ix_columnname on Nx_column (ColumnName)-    -Create a clustered index--you can set it back to the primary key, and with the clustered index, it cannot be built by default with the primary key.

Also delete the clustered index of another table, article.

Then execute the following query:

SELECT * from Nx_column as C INNER joins    nx_article as A on a.columnid = C.columnid

The implementation plan is as follows:

To remove a clustered index, two sequential input SQL Server chooses a lower-cost merge connection. SQL Server uses two of the above inputs to generate a hash table, the following input to probe, which can be seen in the Properties window, as shown in 15.

Typically, a hash match is used when the data is reached when one or both of the conditions are not sorted.

Iv. Parallel

When multiple tables are connected, SQL Server also allows query parallelism in multi-CPU or multi-core scenarios, which undoubtedly improves efficiency.

Reproduced a great God's explanation, originally from: http://www.cnblogs.com/kissdodog/p/3160560.html

"Reprint" SQL execution plan

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More