Query Optimizer Kernel Analysis article 7: Data Access Operations for execution engines --- Scan
From the beginning of this articleArticleThis section describes the execution engine-related knowledge.
Series of article indexes:
Query Optimizer Kernel Analysis article 1 Query Optimizer Kernel Analysis article 2: generating candidate execution plans & execution plan cost estimation Query Optimizer Kernel Analysis article 3: query execution and plan cache & hint prompt Query Optimizer Kernel Analysis article 4: view the execution plan from an instance
Query Optimizer Kernel Analysis article 5: better understanding of the Execution Plan
Query Optimizer Kernel Analysis article 7: Data Access Operations for execution engines --- Scan Analysis of the query optimizer kernel Article 8: Data Access Operations for execution engines --- seek (I)
The execution engine is composed of a large number of Physical Operations (and these physical operations call storage-related methods). These operations are used by the query processor to efficiently execute our queries.
Do not confuse the query processor with the query optimizer here. They are not the same thing. To give you a clear understanding of related concepts, see the following figure:
Until the relationship between the processor and the query optimizer is queried:
- A database consists of two parts: a relational engine and a storage engine. (I suggest you read these articles carefully: SQL internal mechanism: the lifecycle of a query)
- The "query processor" we call here is the relational engine in the figure, and the query optimizer is only a part of it.
The "execution engine" we will introduce recently is "query executor" in ". In the process of introducing the execution engine, I will focus on several common operations in our queries: data access, aggregation, join, and parallel operations. Of course, there are more operations in the execution engine. If you are interested, you can refer to the SQL Server online series.
We will first introduce several data access operations: scan, seek, and lookup.
I believe you may have some knowledge about these operations.
Scan: This operation reads the entire data structure, which can be a heap table, clustered index, and non-clustered index.
Seek: This operation does not read the entire data structure, but directly locates the row to be read through the index. Therefore, seek operations can only occur on clustered and non-clustered indexes.
Heap table: No clustered index is created, and the data in the table is not stored in order. Once a table has a clustered index, the data in the table will be stored in the order of the clustered index.
In addition, non-clustered indexes can be created on heap tables or on tables with clustered indexes.
Let's make a summary through the following table (to see what physical operations can be supported by different data structures), and then quickly enter the scan operation details.
Data Structure |
Scan |
Seek |
Heap table |
Table Scan |
|
Clustered Index |
Clustered index Scan |
Clustered index seek |
Non-clustered Index |
Index Scan |
Index seek |
As before, we can start with an example. Here we still use the sample database adventureworks. First, we can see a table scan operation (that is, the whole table scan) and see the following query:
View the actual execution plan, for example:
By viewing the definition of the databaselog table, we find that this table is a heap table, that is, this table has no clustered index, as shown below:
Next, let's look at an example of clustered index scan (clustered index scan). We can see the following query:
The execution plan is as follows:
By viewing the definition of the address table, we can see that this table does have clustered indexes:
We can see from the above example:
- Although the scan operation is performed, the specific physical operation scan is also selected because of the different characteristics of the table.
- Whether it is table scan or clustered index scan, they all scan the corresponding data structure. The difference is that the former occurs on the heap table, the latter occurs on a table containing clustered indexes.
Next, let's look at an interesting query:
The execution plan is as follows:
The figure shows index scan, which means that this query does not scan all the data pages of the table, but instead scans the index, which is much lower. This operation is performed because there is a non-clustered index on the table, which is defined as follows:
This non-clustered index contains two fields in our query: [City] and [stateprovinceid]. Because addressid is the clustered index of this table, for a table with clustered indexes, all its non-clustered indexes will contain a reference to the clustered index, this is to speed up data retrieval (you can think about why ?). Therefore, the three operations in the preceding query can be found on the index page, so there is no need to scan the underlying table.