Optimize SQL query performance cases by manually creating statistical information

Source: Internet
Author: User
Tags getdate

From a practical problem in the work,

This is where the combination column data is uneven causing the query to not estimate the number of rows of data, resulting in poor performance when a reasonable execution plan cannot be selected

I'm simplifying the problem here, mostly to illustrate the problem.

The following business table, mainly looking at two "status" Fields, BusinessStatus1 and BusinessStatus2Create Tablebusinesstable (Idint Identity(1,1), Col2varchar( -), Col3varchar( -), Col4varchar( -), BusinessStatus1tinyint, BusinessStatus2tinyint, CreateDateDatetime)GO--to write data to the test table:begin Tran    Declare @i int    Set @i=0     while @i<500000    begin        Insert  intoBusinesstableValues(NEWID(),NEWID(),NEWID(),1,Ten,GETDATE()-RAND()* +)        Insert  intoBusinesstableValues(NEWID(),NEWID(),NEWID(),1, -,GETDATE()-RAND()* +)        Insert  intoBusinesstableValues(NEWID(),NEWID(),NEWID(),1, -,GETDATE()-RAND()* +)                Insert  intoBusinesstableValues(NEWID(),NEWID(),NEWID(),2, -,GETDATE()-RAND()* +)        Insert  intoBusinesstableValues(NEWID(),NEWID(),NEWID(),2, -,GETDATE()-RAND()* +)        Insert  intoBusinesstableValues(NEWID(),NEWID(),NEWID(),2, +,GETDATE()-RAND()* +)        Insert  intoBusinesstableValues(NEWID(),NEWID(),NEWID(),3, -,GETDATE()-RAND()* +)        Insert  intoBusinesstableValues(NEWID(),NEWID(),NEWID(),3, +,GETDATE()-RAND()* +)        Insert  intoBusinesstableValues(NEWID(),NEWID(),NEWID(),3, -,GETDATE()-RAND()* +)        Set @i=@i+1    EndCommit--Insert a special piece of data, which is the actual business scenario:Insert  intoBusinesstableValues(NEWID(),NEWID(),NEWID(),3,Ten,GETDATE()-RAND()* +) 

--The test data is characterized by:--Distribution of BusinessStatus1: three-to -one,--BusinessStatus2 Distribution bit: 10,20,30,40,50--the correspondence between the current data,--note, however, that a particular piece of data is inserted:--the combination of BusinessStatus1 and BUSINESSSTATUS2 is: Businessstatus1=3 and businessstatus2=10, which is the only combination in 451W data--Create the following index:Create Clustered IndexIdx_createdate onbusinesstable (createdate)Create IndexIdx_status onBusinesstable (BUSINESSSTATUS1,BUSINESSSTATUS2)

The following query is to query the so-called special data

SELECT * from businesstable where businessstatus1=3 and businessstatus2=10

Find the execution plan as follows: The whole table scan, the IO cost is not small,

In this case, obviously there is only one data, but to go full-table scan

(There is not only one such coincidence in the actual business, but in the Tens table, there is little data that meets similar criteria,

Make an analogy to understand, like the order form, the order is unsubscribe status, and has not been refunded, this data distribution is very little

Just for example, don't be serious)

The IO information queried above

Again by forcing the index hint to find the same query, IO has a very large drop

Why doesn't the above SQL go through the index? Because after all, the criteria of the data only one, go full table scan cost is too big, especially in the actual situation, the business table is bigger, the logic is not so straightforward

This also starts with index statistics, where index statistics are only statistical leading columns, and for the distribution of the combined columns, SQL Server cannot be estimated, which can be found through the execution plan of the first query

SQL Server is only able to estimate the distribution of data in the case of BusinessStatus1 =3, but it is not possible to estimate the distribution of data in this combination of businessstatus1=3 and businessstatus2=10

Of course, statistics can also be seen, statistical information only records the BUSINESSSTATUS1 column of the data distribution, but the actual implementation of the process, it is not possible to predict the exact distribution of businessstatus1=3 and businessstatus2=10

Find the cause of the problem, it is easy to solve, since SQL Server can not be estimated to businessstatus1=3 and businessstatus2=10 This combination of data distribution,

Then create a filtering statistic that allows SQL Server to accurately understand the distribution of the data under this condition, making it easier to make a relatively accurate execution plan.

Create a statistic for the condition by using the following statement

Create Statistics   on businesstable (BUSINESSSTATUS1,BUSINESSSTATUS2)where BusinessStatus1=  3 and BusinessStatus2=ten-- after the statistics are created, be aware of the update  UPDATESTATISTICS with Fullscan

After the statistics are created, a statistic that you just created is added to the table.

Now look at the execution plan for this query, and find that it goes through the index as expected

While observing the IO situation, there is also a significant drop

Summarize:

The above by manually creating statistics, to promote SQL Server in the generation of execution plan, accurate knowledge of the distribution of data, to make a more optimized execution plan, in some special cases, can be considered as a direction of optimization

Postscript:

Perhaps some people think that this problem should be attributed to the parameter sniff problem, in fact, this problem and parameter sniff is not quite the same (of course, there is a bit like)

Generally speaking, the parameter sniff problem is that single-row data distribution is uneven, because the execution plan reuse leads to a phenomenon of underground performance, focusing on the unreasonable reuse of the execution plan

The problem here is that, because of the way statistics are calculated, SQL Server simply cannot predict the exact distribution of eligible data to make a reasonable execution plan.

Of course, this is also a special case, in addition to forcing the index hint, you can manually create statistics to achieve the purpose of optimization

Optimize SQL query performance cases by manually creating statistical information

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.