SQL Server execution plan Cache

Source: Internet
Author: User

[Transfer] http://www.cnblogs.com/CareySon/archive/2013/05/04/PlanCacheInSQLServer.html

Introduction

The SQL statements we usually write are essentially the logic for obtaining data, rather than the physical path for obtaining data. When the SQL statements we write are passed to the SQL Server, the query analyzer parses, binds, and optimizes the statements in sequence), Execution (Execution ). In addition to the execution steps, the execution plan is generated after the first three steps, that is, the SQL Server obtains physical data according to the plan, and finally the execution steps are executed according to the execution plan to obtain the results. However, the query optimizer is not the focus of this article. This article mainly describes the mechanism and common problems related to the cache execution plan after the query optimizer generates the execution plan.

 

Why is execution plan cache required?

From the introduction, we know that the process steps used to generate the execution plan occupy a large proportion, which will consume various CPU and memory resources. In fact, the query optimizer generates an execution plan to do more, which is divided into three parts:

  • First, the table name, stored procedure name, and view name are parsed based on the input query statement text. Then, a tree representing the query text is generated based on logical data operations.
  • The second step is optimization and simplification. For example, the subquery is converted to a pair of connections, the filtering conditions are prioritized, and unnecessary connections are deleted (for example, the original table may not need to be referenced if an index exists).
  • Step 3: perform Cost-based evaluation based on the statistical information in the database.

 

After the preceding three steps are completed, multiple candidate execution plans are generated. Although our SQL statement has only one logic, there can be multiple physical data retrieval orders in this logical order. For example, you want to go from Beijing to Shanghai, that is, high-speed trains or planes can be used, but the descriptions from Beijing to Shanghai are logical descriptions. There are multiple implementation paths. Let's look at an example in SQL Server, such as the query in code list 1.

   1: SELECT * 
   2: FROM A INNER JOIN B ON a.a=b.b
   3: INNER JOIN C ON c.c=a.a

Code List 1.

 

For this query, no matter whether A first joins B or B first joins C, the results are the same. Therefore, multiple execution plans can be generated, however, the basic principle is that SQL Server does not necessarily select the best execution plan, but selects a good plan. This is because the cost of evaluating all execution plans should not be too high. In the end, SQL Server evaluates the cost of the execution plan based on the data base and the CPU and IO costs consumed by each step. Therefore, the selection of the execution plan is heavily dependent on the statistics, I will not elaborate on the statistics.

The previous process of querying analyzer to generate an execution plan is not difficult to see, the resource cost consumed by this step is also amazing. Therefore, when the same query is executed once, caching it will greatly reduce the compilation of the execution plan, thus improving the efficiency. This is the original intention of the Execution Plan cache.

 

Objects cached in the execution plan

The objects cached in the execution plan are divided into four categories:

  • Compiled plan: the relationship between the compiled execution plan and execution plan is the same as that between MSIL and C.
  • Execution context: There is a context environment when executing the compiled plan. Because the compiled plan can be shared by multiple users, but the Query Needs to store the SET information and the value of local variables, the context environment needs to be associated with the execution plan. The execution context is also called Executable Plan.
  • Cursor: the stored cursor state is similar to the relationship between execution context and compilation plan. The cursor itself can only be used by a connection, but the execution plan associated with the cursor can be shared by multiple users.
  • Algebra tree: an algebra tree (also known as the parsing tree) represents the query text. As we mentioned earlier, the query analyzer does not directly reference the query text, but rather an algebra tree. Here you may have doubts that the algebra tree is used to generate the execution plan, and the algebra trunk hair is also cached here? This is because views, defaults, and constraints may be used repeatedly by Different queries, and the algebra tree of these objects is cached to save the parsing process.

 

For example, you can use the DMV dm_exec_cached_plans to find the cached execution plan, as shown in 1.

Figure 1. cached execution plan

 

So what about the memory used by these types of object cache? We can see through the DMV dm_ OS _memory_cache_counters that the above several types of cached objects are shown in 2.

Figure 2. Memory occupied by these types of object cache in memory

 

In addition, the execution plan cache is a kind of cache. Objects in the cache will be replaced according to the algorithm. For Execution Plan caching, the replaced algorithm is mainly based on memory pressure. Memory pressure is divided into two types: internal pressure and external pressure. The external pressure is caused by the reduction of the available space of the Buffer Pool to a certain critical value (the critical value varies depending on the size of the physical memory. If the maximum memory is set, it depends on the maximum memory ). The internal pressure is because the objects in the execution plan cache exceed a certain threshold. For example, the 32-bit SQL Server has a threshold of 40000, And the 64-bit SQL Server has a threshold of 160000.

The cached identifier is the query statement, so select * from SchemaName. although TableName and Select * from TableName have the same effect, they need to cache two execution plans. Therefore, a Best Practice should include the schema name When referencing the table name and other object names.

Optimize statements based on the cached execution plan

Cached execution plans are rich in content, including not only the cached execution plans and statements, but also the statistics of cached execution plans, for example, CPU usage and waiting time. However, it is worth noting that the statistics here only count on the execution time, not the Compilation Time. For example, you can use the code in code list 2 to find the 20 longest query statements in the database based on the cached execution plan.

 
 
 
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED 
SELECT TOP 20 
  CAST(qs.total_elapsed_time / 1000000.0 AS DECIMAL(28, 2)) 
                                     AS [Total Duration (s)] 
  , CAST(qs.total_worker_time * 100.0 / qs.total_elapsed_time 
                               AS DECIMAL(28, 2)) AS [% CPU] 
  , CAST((qs.total_elapsed_time - qs.total_worker_time)* 100.0 / 
        qs.total_elapsed_time AS DECIMAL(28, 2)) AS [% Waiting] 
  , qs.execution_count 
  , CAST(qs.total_elapsed_time / 1000000.0 / qs.execution_count 
                AS DECIMAL(28, 2)) AS [Average Duration (s)] 
  , SUBSTRING (qt.text,(qs.statement_start_offset/2) + 1,      
    ((CASE WHEN qs.statement_end_offset = -1 
      THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2 
      ELSE qs.statement_end_offset 
      END - qs.statement_start_offset)/2) + 1) AS [Individual Query 
  , qt.text AS [Parent Query] 
  , DB_NAME(qt.dbid) AS DatabaseName 
  , qp.query_plan 
FROM sys.dm_exec_query_stats qs 
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as qt 
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp 
WHERE qs.total_elapsed_time > 0 
ORDER BY qs.total_elapsed_time DESC     

Code List 2. Find the 20 query statements with the longest total database time using the execution plan Cache

 

In the preceding statement, you can modify Order By to find the statement you want based on different conditions.

Compared with Trace on the server or Profiler on the client, this method has some advantages. If Trace capture is used for analysis, it is not only time-consuming and laborious, but also brings additional overhead to the server, using this method to find time-consuming query statements is much simpler. However, this statistics is only based on the previous instance restart or after DBCC FreeProcCache is not run. However, this method also has some drawbacks, such:

  • Statements such as index reconstruction and Statistics Update are not cached, and the cost of these statements is very high.
  • The cache may be replaced at any time, so this method cannot see statements that are no longer cached.
  • The statistics only show the execution cost and the compilation cost.
  • A cache without parameterization may present different execution plans for the same statement, so different caches may appear. In this case, the statistical information cannot be accumulated, which may result in inaccurate results.

 

Conflict between execution plan cache and Query Optimizer

Do you remember what we said before? The compilation and selection of execution plans are divided into three steps. The first two steps are only based on the metadata of the query statement, table, and other objects, in the stage of Execution Plan Selection, statistics are heavily dependent. Therefore, the query optimizer generates different execution plans when the same statement is just different parameters, for example, let's look at a simple example, as shown in 3.

Figure 3. the query optimizer selects different execution plans only because of different parameters

 

You may think that this is not quite good. Different execution plans are generated based on parameters. Let's consider another question. If we put the above query in a stored procedure, the parameter cannot be sniffed directly. When the first execution plan is cached, the second execution will reuse the first execution plan! Compilation Time is not required, but the cost of poor execution plans is higher! Let's take a look at this example, as shown in 4.

Figure 4. Different parameters are the same execution plan!

 

Let's take a look at the same example. After reversing the execution order, we will see Figure 5.

Figure 5. The execution plan has completely changed

 

We can see that the statement for the second execution completely reused the first execution plan. There will always be a query sacrifice. For example, if the parameter is 4, there will be more than 5000 entries. At this time, index scanning should be the most efficient, but figure 4 reused the previous execution plan and used more than 5000 searches !!! This is undoubtedly inefficient. In addition, this situation may confuse DBAs, because the execution plan in the cache is uncontrollable and the objects in the cache may be deleted at any time, the performance problems caused by the execution of the first executor may also cause DBA headaches.

From this example, we can see that the query optimizer wants to select an efficient execution plan as much as possible, while the execution plan cache wants to reuse the cache as much as possible. These two mechanisms may conflict in some situations.

In the next article, we will continue to look at the conflicts between the execution plan cache and the query analyzer, as well as the common problems and solutions brought about by compiling the execution plan.

 

Process for taking execution cache into account

The previous article mentioned the process of parsing statements by the query optimizer. when taking into account the plan cache, you must first check whether the plan cache contains the statement cache. If not, will execute the compilation process, if there is, directly use the compiled execution plan. Therefore, the complete process 1 is shown.

Figure 1. process of taking plan cache into account

 

As shown in figure 1, we can see that one step is to find the plan process in the cache. Therefore, it is difficult to guess that as long as this type of search is used, the Hash data structure cannot be run. You can find information about the Hash table through the DMV sys. dm_ OS _memory_cache_hash_tables, as shown in figure 2. It is worth noting that when there are too many execution plans that lead to too many objects in the Same Bucket after the hash, additional buckets are required, which may lead to low cache efficiency of the search plan. The solution is to minimize the number of plans in the Plan cache, which will be discussed later in this article.

Figure 2. HashTable information about the storage plan Cache

 

When this problem occurs, we can see the problem in the buckets_avg_scan_miss_length column. In this case, the Cache Hit rate (SQL Server: Plan Cache-Cache Hit Ratio) is relatively high, but the compilation time is too long.

 

Parameterization and non-parameterization

The only identifier of a query plan is the query statement, but assuming that the statement body is the same, but only the query condition predicates are different, is one execution plan or two execution plans in the execution plan? It's Depends.

Assume the following two statements, as shown in 3.

Figure 3. Only two statements with different predicates

 

Although the execution plan is the same, two execution plans are retained in the execution plan cache, as shown in figure 4.

Figure 4. cache of two different execution plans for the same statement and different conditions

 

We know that the execution plan cache depends on the query statement itself to identify the cache. Therefore, the preceding two statements are considered as two different statements in the execution plan cache. The solution to this problem is to make the query statements in the execution plan cache identical.

 

Parameterization

So that only some parameters are different, and the same query statement can be reused, It is the significance of parameterization. For example, if the statements in Figure 3 enable forced database parameterization or use stored procedures. SQL Server enforces parameters for these statements. For example, we modified the database-level options according to Figure 5.

Figure 5. Database-level options

 

Now let's execute the two statements in Figure 3. by querying the execution plan cache, we find that the variable is parameterized, so that the statements in the Plan cache become consistent, as shown in figure 6, so that they can be reused.

Figure 6. query statement after Parameter

 

However, mandatory parameters may cause some problems. The query optimizer often cannot optimize some specific queries based on statistical information. For example, it cannot apply some indexes or search during the scan. The negative effects have been mentioned in the previous article. I will not elaborate on them here.

Therefore, there are several solutions to the above problems.

 

Balanced parameterization and non-parameterization

Under specific circumstances, parameterization is sometimes good, but sometimes it is the culprit of performance problems. Let's look at several ways to balance the relationship between the two.

 

Use RECOMPILE

When the cost of inaccurate execution plans is higher than that of compilation during queries, if you use the RECOMPILE option in the stored procedure or use the RECOMPILE prompt in the ad hoc query, the execution plan will be regenerated for each query, this parameter prevents the generated execution plan from being inserted into the execution plan cache. For OLAP queries, inaccurate execution plans often cost much more than compilation costs. Therefore, you can consider this parameter or option, you can use Hint as shown in the query in code list 1.

SELECT * FROM Sales.Customer
WHERE CustomerID>20000 AND TerritoryID = 4
OPTION (recompile)

Code List 1. Use Recompile

 

In addition to manual prompts for SQL Server re-compilation, SQL Server will automatically re-compile under the following conditions:

  • Metadata change, for example, indicating that the name is changed, the column is deleted, and the data type is changed.
  • Statistics changes.
  • When the SET parameter of the connection changes, the SET ANSI_NULLS and other values are different, the cache execution plan cannot be reused and re-compiled. This is why we can see that the statements in the cache execution plan are identical, but they are not reused. We also need to have the same related parameters. These parameters can be viewed through sys. dm_exec_plan_attributes.

 

Use the Optimize For Parameter

The RECOMPILE method allows you to skip the cache schedule. However, sometimes the execution plan of the characteristic predicates is used more times. For example, if only those predicates generate a large number of returned result sets For parameter compilation, we can consider the Optimize For parameter. For example, let's look at code list 2.

DECLARE @vari INT
SET @vari=4
SELECT * FROM Sales.Customer
WHERE CustomerID>20000 AND TerritoryID = @vari
OPTION (OPTIMIZE FOR (@vari=4))

Code List 2. Use the optimize for prompt

 

When this parameter is used, the cache execution plan is generated and cached according to the predicates following optimize for. This may also cause inefficient queries that are not included in this parameter, however, this parameter is selected, so we usually know which predicates will be used more.

 

In addition, an optimize for unknown parameter is added since SQL Server 2008, which allows you to detect the value of a local parameter as a predicate condition during the optimization Query Process, instead of testing statistics based on the initial values of local variables.

 

Use local variables instead of stored procedure parameters in Stored Procedures

If you do not use process parameters in a stored procedure, using local variables is equivalent to directly disabling parameter sniffing. After all, the value of a local variable can only be known at runtime, and cannot be known when the execution plan is compiled by the query optimizer, therefore, the query analyzer is forced to use the average value of the condition column for estimation.

Although this method makes the parameter estimation very inaccurate, it will become very stable. After all, the statistical information will not change too frequently. This method is not recommended. If possible, use the Optimizer method as much as possible.

Code listing 3 shows this method.

CREATE PROC TestForLocalVari
@v INT
AS
 
DECLARE @vari INT
SET @vari=@v
SELECT * FROM Sales.Customer
WHERE CustomerID>20000 AND TerritoryID = @vari

Code List 3. Directly reference local variables instead of stored procedure parameters

 

Forced parameterization

We have mentioned forced parameterization before this article, so we will not mention it here.

 

Usage plan guide

In some cases, our environment does not allow us to directly modify SQL statements. For example, we do not want to break the logic of the Code or the application is developed by a third party, therefore, it is unrealistic to add HINT or parameters. In this case, we can use the plan guidance.

Plan guidance allows SQL Server to add a prompt or option to the query statement when the client application is thrown to SQL Server. For example, you can see an example of plan guidance in code list 4.

EXEC sp_create_plan_guide N'MyPlanGuide1',  
 
@stmt=N'SELECT * FROM Sales.Customer WHERE CustomerID>20000 AND TerritoryID=@vari', 
@type=N'sql',  
@module_or_batch=NULL, 
@params=N'@vari int',
@hints=N'OPTION (RECOMPILE)'

Code list 4. Guidance on the preceding query Setting Plan

 

When the plan guidance is added, when the batch processing arrives at SQL Server, the Plan cache will also find whether the plan guidance matches the plan guidance. If yes, a prompt or option appears in the application plan guide. Note that the @ stmt parameter must be exactly the same as the one in the query statement. A space difference is considered mismatched.

 

PARAMETERIZATION SIMPLE

When forced PARAMETERIZATION is enabled at the database level, we do not want to enable forced PARAMETERIZATION for specific statements. We can use the parameterization simple option, as shown in listing 5.

DECLARE @stmt NVARCHAR(MAX)
DECLARE @params NVARCHAR(MAX)
EXEC sp_get_query_template N'SELECT * FROM Sales.Customer WHERE CustomerID>20000 AND TerritoryID=2',
@stmt OUTPUT, @params OUTPUT
PRINT @stmt
PRINT @params
EXEC sp_create_plan_guide N'MyTemplatePlanGuide', @stmt, N'TEMPLATE', NULL,
    @params, N'OPTION(PARAMETERIZATION SIMPLE)'

Code List 5. apply simple parameterization to a single statement through the scheduler Guide

 

Summary

The execution plan cache should reuse the execution plan as much as possible, which will reduce the CPU consumption for compilation and the memory consumed by the execution cache. The query optimizer tries its best to generate more precise execution plans, which will inevitably lead to a large number of execution plans. This may not only cause heavy CPU consumption for recompilation, but also cause memory pressure, even if the execution plan is cached too much than the BUCKET limit, it takes more time to match the execution plan in the cache.

Therefore, it is very important to use the methods described in this article to balance the relationship between the two based on actual conditions.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.