Adjusting a non-optimal SQL Server execution plan

Source: Internet
Author: User

When we issue a SQL statement that has no cached plan, the optimizer creates an execution plan for it. For complicated SQL statements, however, the cost-based optimization process cannot consider every possible plan, so in some cases the plan it produces is only close to efficient rather than optimal. The optimization process may also be cut short by the load on the production environment. We should therefore keep statements simple enough that the optimizer does not have too many combinations to weigh.

When system performance suffers, even perfectly built indexes do not help if the predicate's selectivity is poor; you then have to consider how to avoid scanning the table when selectivity is low, otherwise queries queue up waiting like cars on a jammed highway. SQL Server 2005's INCLUDE option does break through the 16-column and 900-byte limits on a nonclustered index key, but the price of including too many columns is wasted disk space. We may not care about that overhead, since the customer pays for it, and for very large indexes SQL Server 2005 also added file and filegroup backup; however, backing up multiple files or filegroups means multiple backup baselines, which complicates management. We should therefore first try to optimize queries with the existing indexes, and only consider creating an index or adjusting index columns when there is no other way. There is no best index, only a more suitable one, and we should avoid creating too many indexes, because they burden data modification.
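
For reference only, and not part of the solution in this article: the INCLUDE syntax looks like the sketch below. The index name is hypothetical; included columns are stored only at the leaf level, so they do not count against the 16-column / 900-byte key limit.

Code
-- Hypothetical covering index, shown only to illustrate INCLUDE (not created in this article).
CREATE NONCLUSTERED INDEX IX_SalesOrderHeader_SalesPersonID_Covering
ON Sales.SalesOrderHeader (SalesPersonID)
INCLUDE (OrderDate);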

When tuning a statement, first be clear about the goal. Once appropriate indexes exist, the question is how to use them effectively and strike a balance between CPU, memory, and I/O. If memory is already tight, try to avoid operators that consume a lot of it. Every operator is very efficient in some particular scenario, and nothing stays optimal forever; only by experimenting can you find the best balance point.
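
If you suspect memory pressure, one general diagnostic (not required by the example that follows) is to look at the memory grants of currently running queries, which are typically driven by hash and sort operators:

Code
-- Requires VIEW SERVER STATE permission; lists memory currently granted to running queries.
SELECT session_id, requested_memory_kb, granted_memory_kb
FROM sys.dm_exec_query_memory_grants
ORDER BY granted_memory_kb DESC;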

The following example demonstrates how to adjust a SQL Server execution plan that is not very efficient. Comparing the cost before and after the adjustment shows how the rewrite frees resources and improves the system's capacity for concurrent operations.

 

Code
SELECT P.Name, P.Color, PSC.Name AS SubCategoryName, PC.Name AS CategoryName,
    S.SalesOrderID, S.OrderDate, OD.LineTotal
FROM Production.Product P
JOIN Production.ProductSubcategory PSC
    ON P.ProductSubcategoryID = PSC.ProductSubcategoryID
JOIN Production.ProductCategory PC
    ON PSC.ProductCategoryID = PC.ProductCategoryID
JOIN Sales.SalesOrderDetail OD
    ON OD.ProductID = P.ProductID
JOIN Sales.SalesOrderHeader S
    ON OD.SalesOrderID = S.SalesOrderID
WHERE S.SalesPersonID = 275 AND PSC.Name = N'Road Bikes'

This query finds, for one salesperson, the sales of products in one subcategory together with the related product information. Only part of everything this salesperson sold belongs to that subcategory, so the final result is the intersection of the two sets. Even as the size of that intersection changes, the overall query plan SQL Server produces changes very little, so we need to step in ourselves.

 

The following figure shows the execution plan that SQL Server generates for this query:

We can see that the plan scans Sales.SalesOrderHeader once, and this is a fast-growing table, so scanning it is very time-consuming. The scan happens because the query needs OrderDate, and that column is not covered by any index, so it can only be fetched at the leaf level of the clustered index, i.e., from the actual data pages. Sales.SalesOrderDetail, which holds the many order lines, is also a fast-growing table. The clustered index seek on it is needed because the query asks for LineTotal, a computed column with no index on it, and computing that value for every row costs a lot of CPU.
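
Both observations can be confirmed from the catalog views, assuming the standard AdventureWorks schema: list the index columns of Sales.SalesOrderHeader to see that OrderDate is not carried by any nonclustered index, and check that LineTotal on Sales.SalesOrderDetail is a computed column.

Code
-- Which columns belong to which indexes on Sales.SalesOrderHeader?
SELECT i.name AS index_name, i.type_desc, c.name AS column_name
FROM sys.indexes i
JOIN sys.index_columns ic
    ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns c
    ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'Sales.SalesOrderHeader');

-- Is LineTotal a computed column?
SELECT name, is_computed
FROM sys.columns
WHERE object_id = OBJECT_ID(N'Sales.SalesOrderDetail') AND name = N'LineTotal';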

The table that links products to orders is Sales.SalesOrderDetail. If we can first use only the two filter conditions to obtain their intersection at the index level, and only then look up the data that exists solely on the data pages, we avoid a lot of wasted resources. The adjusted query proceeds as follows:

  1. Declare a table variable to hold the intersection: the order details in which this salesperson sold a product of this subcategory. A table variable also avoids recompilation during the process: http://www.cnblogs.com/tom-fu/archive/2008/03/09/1096993.html

    Code
    DECLARE @udt_sales TABLE
    (
        SalesOrderID INT NOT NULL,
        SalesOrderDetailID INT NOT NULL
    )

  2. Declare a table variable to hold the product-related information.

    Code
    DECLARE @udt_products TABLE
    (
        ProductID INT NOT NULL,
        [Name] dbo.[Name] NOT NULL,        -- dbo.Name is the AdventureWorks alias type (nvarchar(50))
        Color NVARCHAR(15) NULL,
        SubCategoryName dbo.[Name] NOT NULL,
        CategoryName dbo.[Name] NOT NULL
    )

  3. Because Sales.SalesOrderHeader has a nonclustered index on SalesPersonID, the orders of salesperson 275 can be found directly in that index. Likewise, Sales.SalesOrderDetail has a nonclustered index on ProductID, and since SalesOrderID and SalesOrderDetailID are the keys of the clustered index, they are carried in that nonclustered index as well. Both columns can therefore be read entirely from the nonclustered index, which reduces I/O. The code is as follows:
    Code
    INSERT INTO @udt_sales
    SELECT OD.SalesOrderID, OD.SalesOrderDetailID
    FROM Sales.SalesOrderHeader S
    JOIN Sales.SalesOrderDetail OD
        ON OD.SalesOrderID = S.SalesOrderID AND S.SalesPersonID = 275
    JOIN (Production.ProductSubcategory PSC
          JOIN Production.Product P
              ON P.ProductSubcategoryID = PSC.ProductSubcategoryID AND PSC.Name = N'Road Bikes')
        ON P.ProductID = OD.ProductID

  4. Store the product-related information in the table variable, so the final join does not have to query these tables repeatedly.

    Code
    INSERT INTO @udt_products
    SELECT P.ProductID, P.Name, P.Color, PSC.Name, PC.Name
    FROM Production.Product P
    JOIN Production.ProductSubcategory PSC
        ON P.ProductSubcategoryID = PSC.ProductSubcategoryID
    JOIN Production.ProductCategory PC
        ON PSC.ProductCategoryID = PC.ProductCategoryID
    WHERE PSC.Name = N'Road Bikes'

  5. Finally, join the intersection with the order header and detail tables to fetch S.OrderDate and OD.LineTotal. Because the intersection is small, only a small number of lookups are needed, and the table scan of Sales.SalesOrderHeader is avoided. The code is as follows:
    Code
    SELECT UP.Name, UP.Color, UP.SubCategoryName, UP.CategoryName,
        S.SalesOrderID, S.OrderDate, OD.LineTotal
    FROM @udt_sales US
    JOIN Sales.SalesOrderHeader S
        ON US.SalesOrderID = S.SalesOrderID
    JOIN Sales.SalesOrderDetail OD
        ON US.SalesOrderID = OD.SalesOrderID AND US.SalesOrderDetailID = OD.SalesOrderDetailID
    JOIN @udt_products UP
        ON OD.ProductID = UP.ProductID

A high cost does not necessarily mean a slow execution time. When you run a statement on a machine, you cannot know how that machine's resources are being used at that moment. If you want the statement itself to finish faster, you can try to push it into a parallel plan, but that reduces the concurrency of the system; of course, when concurrency would not suffer, SQL Server chooses a parallel plan on its own. Run the statements before and after the optimization and you will get different execution times; what ultimately matters is the I/O performed and the operators involved. At the same time, a query that consumes fewer resources leaves more room for concurrency, so in general the overall system performance improves.
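
As a rough sketch of how to make that comparison, you can turn on the statistics switches below and compare logical reads and CPU/elapsed time for the original and rewritten statements; the placeholder comment marks where the queries would go, and the numbers will of course vary from machine to machine.

Code
SET STATISTICS IO ON;    -- reports logical/physical reads per table
SET STATISTICS TIME ON;  -- reports CPU and elapsed time

-- ... run the original query here, then the rewritten version ...

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;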

Of course, there may be even more efficient approaches if you combine query hints (one example is sketched below) or adjust the execution logic further. My example also only applies when the selectivity of the query conditions is low; if the selectivity is high, SQL Server will not execute the statement the way it does here. INSERT INTO is itself fairly expensive, so this approach is not appropriate when too much data has to be staged; try it yourself. The query overhead changes as the query conditions change, and we can only plan for the worst case to avoid the performance cost that comes with changes in selectivity. The basic principles are: avoid scanning large, fast-growing tables, and break complicated queries apart to reduce the number of combinations the optimizer must consider, because the optimizer does not know the logic behind your query. The final execution result is shown in the figure.
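
As one example of such a hint (an illustration only, not a recommendation; it may or may not help on your data, so measure before adopting it), the original query can be restricted to nested loops joins with OPTION (LOOP JOIN):

Code
-- Illustration only: OPTION (LOOP JOIN) limits the optimizer to nested loops joins.
SELECT P.Name, P.Color, PSC.Name AS SubCategoryName, PC.Name AS CategoryName,
    S.SalesOrderID, S.OrderDate, OD.LineTotal
FROM Production.Product P
JOIN Production.ProductSubcategory PSC
    ON P.ProductSubcategoryID = PSC.ProductSubcategoryID
JOIN Production.ProductCategory PC
    ON PSC.ProductCategoryID = PC.ProductCategoryID
JOIN Sales.SalesOrderDetail OD
    ON OD.ProductID = P.ProductID
JOIN Sales.SalesOrderHeader S
    ON OD.SalesOrderID = S.SalesOrderID
WHERE S.SalesPersonID = 275 AND PSC.Name = N'Road Bikes'
OPTION (LOOP JOIN);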
