To understand the execution plan, you have to understand it, the various nouns. Since I don't know much about it. This article is intended as a write-only, do not understand the understand before writing.
At the beginning, it is important to note that the first time you look at the execution plan, theSQL Server Execution plan is viewed from right to left.
Noun Analysis:
Scan: Iterate through the data row by line.
First set up a table and show you what it looks like.
CREATETABLEPerson (IdIntIDENTITY (1,1)NotNull, Namenvarchar50)Null, ageIntNull, HeightIntNull, areanvarchar50)null marryhistory null, Educationalbackground nvarchar ( null nvarchar (50) nullint nullon [primary
The data in the table is around 140,000, presumably similar to the following:
This table, temporarily without any indexes.
One, data access operations1. Table Scan
Table scan occurs when a table scan occurs on a heap table and no available indexes are available, indicating that the entire table is scanned once.
Now, let's execute a simple query on this table:
' childe '
View the execution plan as follows:
The table scan, as the name implies, is the entire table scan, to find the data you need.
2. Clustered Index Scan
Clustered Index Scan: occurs in a clustered table and is equivalent to a full table scan operation, but is more efficient when the conditions for clustered columns such as (WHERE Id > 10) are used.
Below we add a clustered index to this table in the ID column
On person (Id)
Execute the same query statement again:
' childe '
The implementation plan is as follows:
Why is the clustered index built in the ID column, which has an impact on the scan? Not to mention the name condition also does not matter ah?
In fact, after you add a clustered index, the table becomes a clustered table from the heap table. We know that the data for the clustered table exists at the leaf-level node of the clustered index. Therefore, the aggregation scan and the table scan actually the difference is not big, must say the difference is large, also must see what in the condition is what, later returns the data. In the case of this SQL statement, the efficiency difference is not very large.
You can look at the I/O statistics:
Table Scan:
Clustered Index Scan:
This is beyond the scope of this article, the efficiency is not in the context of this article, this article only consider the differences between the various scans, and why the resulting.
3. Clustered Index Lookup
Clustered Index Lookup: Scans a specific range of rows in a clustered index.
See execute the following SQL statement:
'73164'
The implementation plan is as follows:
4. Index Scan
Index Scan: The overall scan for nonclustered indexes.
Let's add a clustered index and execute a query statement:
On (Name) --Create a nonclustered index from the person
View the execution plan as follows:
Why is an index scan (nonclustered index) selected here?
Because this nonclustered index can overwrite the required data. What if a nonclustered index cannot be overwritten? For example, we will change select to select * And then take a look.
It is clear that the returned results contain too many records and are not cost-effective with nonclustered indexes. Therefore, a clustered index is used.
If we delete the clustered index at this point, then execute SELECT * To see.
INDEX person.ix_id
There is no clustered index at this time, so only table scans are used.
5. Bookmark Search
Previous learning about indexing we already know that when not overwriting and including all of the required columns in a nonclustered index, SQL Server chooses to directly perform a clustered index scan to get the data, or to go to a nonclustered index to find the clustered index key, and then use the clustered index to find the data.
Let's look at an example of a bookmark lookup:
' chubby '--name column with nonclustered index
The implementation plan is as follows:
The above procedure can be understood as: first find the desired row through a nonclustered index, but this index does not contain all the columns, so also to find these columns in the base table, so to do key lookup, if the base table is organized in a heap, then this key lookup (key lookup) It becomes the RID lookup (RID lookup), the key lookup, and the RID lookup collectively known as the bookmark lookup. However, sometimes when the number of rows returned by a nonclustered index is too large, SQL Server may choose to perform a direct clustered index scan.
Ii. Flow Aggregation Operations1. Stream Aggregation
Flow aggregation: Calculates the summary values for multiple sets of rows in the corresponding sorted stream.
All aggregate functions (such as count (), MAX ()) will have a flow aggregation, but they will not consume Io, only consume the CPU.
For example, execute the following statement:
From person
View the execution plan as follows:
2. Calculate scalar
Calculate scalar: Calculates the new value based on the existing value in the row. such as the count () function, more than one row, the number of rows is added 1.
Aggregate functions other than Min and Max functions require that the stream aggregation operation be followed by a computed scalar.
COUNT (from person
View the execution plan as follows:
3. Hash aggregation (hash match)
For a GROUP BY clause, because the data needs to be ordered by the column following group by, sort is required to ensure ordering. Note that the sort operation is a memory-intensive operation that consumes tempdb when there is not enough memory. SQL Server always chooses the lowest cost in the sort operation and hash matching .
SELECT height,from person --Find out the height of the defeat by height
The implementation plan is as follows:
For a large amount of data, SQL Server chooses a hash match.
After the hash list is established in memory, the value following the group by is used as the key, then each data in the collection is processed sequentially, and when the key does not exist in the hash list, an entry is added to the hash table, and when the key is already present in the hash table, the rule is the aggregate function, such as SUM, Avg something) calculates the value in the hash list (value).
4. Sorting
When the amount of data is less than that, for example, execute the following statement to create a new table with only dozens of records.
From Person2
Then execute the same query statement:
SELECT Height,from Person2 --only the table is replaced by a table by Height with a smaller amount of data
The implementation plan is as follows:
third, the connection
When multiple tables are connected (including bookmark lookups, connections between indexes), SQL Server takes three different types of connections: loop nested connections, merge connections, Hash joins. These kinds of connection formats have the right scenario for you, and there's no better way to say it.
The new two-sheet table is as follows
This is a simple news, column structure.
1. Nested loops
Let's look at a simple inner JOIN query statement
As C as A = C.columnid
The implementation plan is as follows:
The icon for looping nested connections is also very image, in the above external input (Outer input), here is the clustered index scan. And in the following internal input (Inner input), here is the clustered index lookup. The external input is executed only once, and the internal input is looked up based on each row of the external input that satisfies the join condition. This is done 7 times for the internal input because it is 7 rows.
According to the principle of nested loops it is not difficult to see that because the external input is scanned, the internal input is lookup, when the two join table outside the input result set is relatively small, and the internal input to find the table is very large, the query optimizer prefers to choose the loop nesting method.
2. Merge connection
Unlike loop nesting, a merge connection is a single access from each table. From this point of view, merging connections is much faster than looping nesting.
From the principle of merging joins it is not difficult to imagine that merging connections first requires both parties to be orderly. And the condition of the join is equal to the number. Since two input conditions are already in order, it is not difficult to see why the merge join is only allowed to be equal to the join after a row is compared from each input set, equal returns, and unequal discards. We can see this principle from the icon in Figure 11.
As C as A = c.columnid join)
The implementation plan is as follows:
If both sides of the input data are unordered, the Query Analyzer does not select the merge connection, and we can also force the merge connection with the index hint, in order to achieve this, the execution plan must be ordered with a sort step. This is why the SQL statement above is an option (MERGE join). The ColumnID column of the article table is sorted above.
3. Hash connection
Hash joins also only need to access data from both sides of the 1 times. Hash connections are implemented by establishing a hash table in memory. This consumes more memory and consumes tempdb if memory is low. But it's not like merging connections that requires both sides to be orderly.
For the following two implementations, the clustered index of the two columns should not be built in the ColumnID column, otherwise the hash join will not be used.
CONSTRAINT pk_nx_column --Delete primary key- -delete clustered index
Also delete the clustered index of another table, article.
Then execute the following query:
As C as A = C.columnid
The implementation plan is as follows:
To remove a clustered index, two sequential input SQL Server chooses a lower-cost merge connection. SQL Server uses two of the above inputs to generate a hash table, the following input to probe, which can be seen in the Properties window, as shown in 15.
Typically, a hash match is used when the data is reached when one or both of the conditions are not sorted.
Iv. Parallel
When multiple tables are connected, SQL Server also allows query parallelism in multi-CPU or multi-core scenarios, which undoubtedly improves efficiency.
Understanding of SQL Server Execution plan