SQL Execution Plan parsing (1)-execution plan BASICS (I)

Source: Internet
Author: User

E-book URL

The execution plan is simply the result calculated by the query optimizer, which indicates the most efficient way to execute the submitted query. The execution plan tells you how the query is executed, so it is the basic means for DBA to diagnose low-performance queries.

1. query what happened after submission

When the query is submitted to SQL Server
After the database, many processes start to work, and the ultimate goal is to return the data to the user or store it as soon as possible, while maintaining data consistency. These processes are for each query server submitted to the server
So there are a lot of different actions happen on the server at the same time, we only focus on T-SQL-related things, roughly divided into two stages:

  • Processes in relational Engine
  • Stored engine Process

In the relational engine, the query optimizer parses and processes the query, generates the execution plan, and then sends the execution plan to the storage engine (in binary mode). The storage engine uses the execution plan to obtain or update data. Locking, index maintenance, and transactions all occur in the storage engine. Since the execution plan is generated by the relational engine, we will focus on the relational engine.

1.1 query resolution

The first place for a T-SQL query to go to the server is the relational engine, which is passed to a process to check the spelling and format. This processing process is the query parsing process. The output of the parsing process is a parsing tree. The parsing tree represents the logical steps for executing the query. If the T-SQL statement is not a data manipulation language (DML) statement, it will not be optimized. For example, the system has only one "correct" way to create a table, there is no chance.
To improve performance. If T-SQL is a DML declaration, the parsing tree is passed to a process named algebrizer. algebrizer parses and queries all referenced objects, tables, and columns.
And identifies the column type (varchar (50) vs nvarchar (25). In addition, an aggregate binding
To determine the aggregation position. The algebrizer process is very important because the query may contain packages or aliases, synonyms, or nonexistent names, which need to be resolved or the query references nonexistent objects. The output of algebrizer is the query processsor tree in binary format, which is then passed to the query optimizer.

1.2 Query Optimizer

The query optimizer determines whether the data can access the index and which connection to use. There are many other things. This kind of decision is based on overhead and requires CPU and Io. The query optimizer will generate and evaluate a lot
Plan (unless the cache already exists). Generally, select the one with the lowest overhead, for example, the one with the fastest running and the least resources, CPU, and I/O used. Execution speed is still the most important factor,
If results are returned faster, the optimizer selects CPU-intensive processes. Sometimes the optimizer also chooses a plan with lower efficiency. if it thinks that it takes time to evaluate a lot of execution plans, it is better to adopt lower efficiency.
. If you submit a very simple query, for example, a single table query, no index, no aggregation, or no computing, the optimizer will not spend time computing and optimization, instead, trival plan is simply used.

If the query is not trival, the optimizer calculates the overhead and selects a plan. Therefore, it depends on the statistics maintained by the SQL Server server. Statistical data is the data collected by the database about columns and indexes. It describes the distribution, uniqueness, and selectivity of data.
(Selecti.pdf ). The information that constitutes the statistical data is represented by a histogram (histogram) and a table (tabulation), which is distributed from the 200 average
Data points indicates the number of occurrences of specific data. This "data on Data" provides the optimizer with the necessary information required for computing.

If the statistics related to columns and indexes exist, the optimizer will use them for calculation. By default, the system displays all the indexes and one of the indexes used as the predicate and where clauses.
Minute and join
Create and update statistical data for columns that are part of the On clause. The table variable does not generate statistical data. The optimizer always assumes that it has only one row and ignores its real size. Temporary tables have statistical data, and permanent tables
The statistical data is stored in the same histogram for the optimizer.

The optimizer uses these statistics and query Processor
Tree together to determine the best execution plan. This means that it needs to test a series of plans, test different Join types, organize the join order, and try different indexes until it recognizes
The fastest execution plan. In this calculation, each step is assigned a value, representing the estimated time overhead (estimated) of the optimizer.
Cost), the overhead of each step is the overhead of the execution plan.

It is worth noting that the estimated overhead is estimated after all. If there is infinite time and complete latest statistics, the optimizer can find the perfect plan to execute the query, however, the optimizer tries
Find the best execution plan, and obviously, the quality of available statistical data is also limited. Therefore, although this overhead estimation is a very useful method, it cannot accurately reflect the reality.

After the optimizer determines the execution plan, the actual execution plan is created and stored in the memory space plan cache, unless the cache of the same execution plan already exists. The optimizer generates possible execution plans (potential plans) and compares them with existing ones in the cache. If they match, they use the ones in the cache.

1.3 query execution

After the execution plan is generated, the operation is transferred to the storage engine, where the query is executed based on the actual execution plan. We will not discuss it in detail here. Apart from one point, the implementation plans and design executions generated by great efforts may not be the same. Let's say the following scenario:

  • The execution plan exceeds the limit of parallel execution.

* Parallel execution improves execution efficiency with multi-processor

  • The statistical data has expired or has changed.

1.4 estimated and actual execution plans

As mentioned above, there are two different execution plans. The first is the estimated execution plan (estimated execution) generated by the optimizer.
Plan), operators and steps are labeled with the logical label, representing the optimizer's point of view, and the other is the actual execution plan (actual execution
Plan.

1.5 reuse execution plan

The execution plan overhead generated by the server is expensive. If possible, the SQL server will try its best to maintain and reuse the execution plan. After the execution plan is generated, it is stored in the memory plan cache.

The execution plan does not reside in the memory permanently. They gradually disappear from the system based on age changes. The age formula is the estimated overhead * the number of times it is used, for example, the overhead of a plan.
It is 10 and has been referenced for 5 times. The age value is 50. The lazywriter process is responsible for releasing all types of cache (including plan
It periodically scans the objects in the cache and deducts a certain age value each time. If the following conditions are met, the execution plan will be cleared from the inner:

1. the system needs more memory.

2. The age value reaches 0.

3. The execution plan is not referenced by any connection.

The execution plan is not unchangeable, and some events or actions will force the execution plan to be re-compiled. Remember this is important because the overhead for re-compiling the execution plan may be very large. The following actions will cause the execution plan to be re-compiled:

1. Change the structure or schema of the table referenced in the query.

2. Changed the index used in the query

3. The indexes used in the query are deleted.

4. Updated the statistical data used for query.

5. Call the sp_recompile function.

6. Perform a large number of insert or delete operations on the keys of the Tables Used for query.

7. Significant growth caused by inserted and deleted for tables with triggers

8. DDL and DML are mixed in a query.

9. The set option is changed during query execution.

10. Changed the structure or schema of the temporary table used for query.

11. Changed the dynamic attempt (Dynamic views) used in the query)

12. Changed the cursor option in the query.

13. Changed the remote row set, just like in the distributed partitioned View

14. The for browse option is changed when the client cursor is used.

1.6 why the estimation may be different from the actual execution plan

Generally, the estimated execution plan you see may be the same as the actual execution plan. However, when the environment changes, the two may be different.

  • Outdated statistics
    The difference between statistical data and actual data is the main cause of the difference between the two execution plans. This usually occurs when data is inserted or deleted, which changes the key value and distribution of the index.
    To reduce operation costs, atomic statistical data operations are sampled on a subset of data. This means that, over time, statistical data is increasingly unable to accurately reflect actual data.
    This will not only lead to the difference between the two execution plans, but also lead to a "bad" execution plan.
  • Illegal estimated execution plan
    In some cases, an estimated execution plan cannot work at all, for example, the following example:
    CREATE TABLE TempTable
    (
    Id INT IDENTITY(1, 1)
    ,Dsc NVARCHAR(50)
    );
    INSERT INTO TempTable ( Dsc )
    SELECT [Name]
    FROM [Sales].[Store];
    SELECT *
    FROM TempTable;
    DROP TABLE TempTable;

    You will get an error

    Msg 208, Level 16, State 1, Line 7
    Invalid object name 'TempTable'.

    The optimizer is used to generate an estimated execution plan and does not execute that T-SQL. When the Declaration is run through algebrizer, the temporary table does not exist because the query is not executed, which leads to errors.

  • Request parallel computing
    When a plan encounters a bottleneck in parallel computing, two plans will be created. The actual execution depends on the query engine. So you may see (or not) Parallel operators in the estimated execution plan. When the plan is actually executed,
    The query engine determines whether it does not support parallel computing or call parallel queries, you may see a completely different plan.

1.7 execution plan format

1. graphic mode

2. Text

Showplan_all

Showplan_text

Statistics Profile

3. xml

Showplan_xml

Statistics_xml

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.