Optimize SQL query performance cases by manually creating statistical information

Last Update:2017-08-02 Source: Internet

Author: User

Tags rand

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

The essential reason is that SQL Server statistics contains only the information for the first column of the composite index, not the composite index data combination

From a practical problem in the work,

This is where the combination column data is uneven causing the query to not estimate the number of rows of data, resulting in poor performance when a reasonable execution plan cannot be selected

I'm simplifying the problem here, mostly to illustrate the problem.

As the next business table, the main two "status" Fields, BusinessStatus1 and BusinessStatus2CreateTableBusinesstable (IdIntIdentity1,1), Col2varchar50), Col3varchar50), Col4varchar50), BusinessStatus1tinyint, BusinessStatus2tinyint, CreateDateDatetime)GO--To write data to the test table:BeginTranDeclare@iIntSet@i=0While@i<500000BeginInsertInto businesstableVALUES (NEWID (),NEWID (),NEWID (),1,10,GETDATE ()-RAND ()*1000)InsertInto businesstableVALUES (NEWID (),NEWID (),NEWID (),1,20,GETDATE ()-RAND ()*1000)InsertInto businesstableVALUES (NEWID (),NEWID (),NEWID (),1,30,GETDATE ()-RAND ()*1000)InsertInto businesstableVALUES (NEWID (),NEWID (),NEWID (),2,20,GETDATE ()-RAND ()*1000)InsertInto businesstableVALUES (NEWID (),NEWID (),NEWID (),2,30,GETDATE ()-RAND ()*1000)InsertInto businesstableVALUES (NEWID (),NEWID (),NEWID (),2,40,GETDATE ()-RAND ()*1000)InsertInto businesstableVALUES (NEWID (),NEWID (),NEWID (),3,30,GETDATE ()-RAND ()*1000)InsertInto businesstableVALUES (NEWID (),NEWID (),NEWID (),3,40,GETDATE ()-RAND ()*1000)InsertInto businesstableVALUES (NEWID (),NEWID (),NEWID (),3,50,GETDATE ()-RAND ()*1000)Set@i=@i+ 1 endcommit-- insert a special data, which is the actual business scenario: insert into businesstable values (newid (), newid (), newid (), 3,10,getdate () -rand () *1000

--The test data is characterized by:--Distribution of BusinessStatus1: three-to-one,--businessstatus2 distribution bit: 10,20,30,40,50------businessstatus1 and BusinessStatus2 The combination of: Businessstatus1=3 and businessstatus2=10, is the only combination of the 451W data -- Create the following index: create clustered index idx_createdate on  Businesstable (createdate) create index idx_status on businesstable (BUSINESSSTATUS1,BUSINESSSTATUS2)

The following query is to query the so-called special data

123	`select*` `from` `BusinessTable` `where` `BusinessStatus1=3` `andBusinessStatus2=10`

Find the execution plan as follows: The whole table scan, the IO cost is not small,

In this case, obviously there is only one data, but to go full-table scan

(There is not only one such coincidence in the actual business, but in the Tens table, there is little data that meets similar criteria,

Make an analogy to understand, like the order form, the order is unsubscribe status, and has not been refunded, this data distribution is very little

Just for example, don't be serious)

The IO information queried above

Again by forcing the index hint to find the same query, IO has a very large drop

Why doesn't the above SQL go through the index? Because after all, the criteria of the data only one, go full table scan cost is too big, especially in the actual situation, the business table is bigger, the logic is not so straightforward

This also starts with index statistics, where index statistics are only statistical leading columns, and for the distribution of the combined columns, SQL Server cannot be estimated, which can be found through the execution plan of the first query

SQL Server is only able to estimate the distribution of data in the case of BusinessStatus1 =3, but it is not possible to estimate the distribution of data in this combination of businessstatus1=3 and businessstatus2=10

Of course, statistics can also be seen, statistical information only records the BUSINESSSTATUS1 column of the data distribution, but the actual implementation of the process, it is not possible to predict the exact distribution of businessstatus1=3 and businessstatus2=10

Find the cause of the problem, it is easy to solve, since SQL Server can not be estimated to businessstatus1=3 and businessstatus2=10 This combination of data distribution,

Then create a filtering statistic that allows SQL Server to accurately understand the distribution of the data under this condition, making it easier to make a relatively accurate execution plan.

Create a statistic for the condition by using the following statement

Statistics on businesstable (businessstatus1,businessstatus2)where BusinessStatus1=and BusinessStatus2=Ten-- after the statistics are created, be careful to make an update with fullscan

After the statistics are created, a statistic that you just created is added to the table.

Now look at the execution plan for this query, and find that it goes through the index as expected

While observing the IO situation, there is also a significant drop

Summarize:

The above by manually creating statistics, to promote SQL Server in the generation of execution plan, accurate knowledge of the distribution of data, to make a more optimized execution plan, in some special cases, can be considered as a direction of optimization

Postscript:

Perhaps some people think that this problem should be attributed to the parameter sniff problem, in fact, this problem and parameter sniff is not quite the same (of course, there is a bit like)

Generally speaking, the parameter sniff problem is that single-row data distribution is uneven, because the execution plan reuse leads to a phenomenon of underground performance, focusing on the unreasonable reuse of the execution plan

The problem here is that, because of the way statistics are calculated, SQL Server simply cannot predict the exact distribution of eligible data to make a reasonable execution plan.

Of course, this is also a special case, in addition to forcing the index hint, you can manually create statistics to achieve the purpose of optimization

Optimize SQL query performance cases by manually creating statistical information

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More