How SQL Server handles statistics on ascending key columns

Last Update:2017-01-18 Source: Internet

Author: User

This article uses example code to show how SQL Server handles statistics on an ascending key column. Every statistics object in SQL Server has an associated histogram, which describes the distribution of the column data in steps. SQL Server supports a maximum of 200 steps per histogram. A problem arises when you query a range of data that lies beyond the last step of the histogram. The following code reproduces the situation:

 --Create a simple Orders table
 CREATE TABLE Orders
 (
  OrderDate DATE NOT NULL,
  Col2 INT NOT NULL,
  Col3 INT NOT NULL
 )
 GO
 
 --Create a non-unique clustered index on the table
 CREATE CLUSTERED INDEX idx_ci ON Orders (OrderDate)
 GO
 
 --Insert 31465 rows from the AdventureWorks2008R2 database
 INSERT INTO Orders (OrderDate, Col2, Col3)
 SELECT OrderDate, CustomerID, TerritoryID FROM AdventureWorks2008R2.Sales.SalesOrderHeader
 GO
 
 --Rebuild the clustered index, so we get fresh statistics.
 --The last value in the histogram is 2008-07-31.
 ALTER INDEX idx_ci ON Orders REBUILD
 GO
 
 --Insert 200 additional rows *after* the last step in the histogram
 INSERT INTO Orders (OrderDate, Col2, Col3)
 VALUES ('20100101', 1, 1)
 GO 200

After the index rebuild, the histogram shows that the last step is 2008-07-31.

DBCC SHOW_STATISTICS ('dbo.Orders', 'idx_ci') WITH HISTOGRAM

As you can see, the 200 additional rows were inserted after the last step of the histogram. In this case the histogram does not reflect the actual data distribution, but SQL Server still has to produce a cardinality estimate. Let's look at how different versions of SQL Server handle this.
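The histogram is not the only place where this gap shows up. As a rough sketch, assuming an instance where the sys.dm_db_stats_properties function is available (it arrived in later service packs of SQL Server 2008 R2 and 2012, so it may be missing on older builds), the following query compares what the statistics object recorded with the number of changes made since the last update:

```sql
-- Sketch: inspect the statistics object behind the clustered index idx_ci.
-- Assumes sys.dm_db_stats_properties exists on this instance.
SELECT s.name AS stats_name,
       sp.last_updated,          -- when the statistics were last updated
       sp.rows,                  -- row count recorded at that time
       sp.steps,                 -- number of histogram steps (at most 200)
       sp.modification_counter   -- changes to the leading column since then
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.Orders')
  AND s.name = 'idx_ci';
```

If the modification counter is non-zero while the last histogram step is still 2008-07-31, the newest rows fall outside the histogram.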

SQL Server 2005 SP1 - SQL Server 2012

Before SQL Server 2014, the cardinality estimator handled this case very simply: SQL Server estimated the number of rows to be 1.

Enable "Include Actual Execution Plan" in the toolbar and execute the following query:

SELECT * FROM dbo.Orders WHERE OrderDate = '2010-01-01'

Since SQL Server 2005 SP1, the query optimizer can mark a column as ascending to overcome the limitation just described. If you update the statistics object three times with ascending column values, the column is marked as ascending. To see whether a column is marked as ascending, you can use trace flag 2388. When this trace flag is enabled, the output of DBCC SHOW_STATISTICS changes and additional columns are returned.

DBCC TRACEON (2388)
DBCC SHOW_STATISTICS ('dbo.Orders', 'idx_ci')

The following code updates the statistics three times and, between the updates, inserts rows with ascending key values at the end of the clustered index.

 --=> 1st update of the statistics on the table with a FULLSCAN
 UPDATE STATISTICS Orders WITH FULLSCAN
 GO
 
 --Insert 200 additional rows *after* the last step in the histogram
 INSERT INTO Orders (OrderDate, Col2, Col3)
 VALUES ('20100201', 1, 1)
 GO 200
 
 --=> 2nd update of the statistics on the table with a FULLSCAN
 UPDATE STATISTICS Orders WITH FULLSCAN
 GO
 
 --Insert 200 additional rows *after* the last step in the histogram
 INSERT INTO Orders (OrderDate, Col2, Col3)
 VALUES ('20100301', 1, 1)
 GO 200
 
 --=> 3rd update of the statistics on the table with a FULLSCAN
 UPDATE STATISTICS Orders WITH FULLSCAN
 GO

When you now execute the DBCC SHOW_STATISTICS command again, you will see that SQL Server has marked the column as Ascending.

DBCC TRACEON (2388)
DBCC SHOW_STATISTICS ('dbo.Orders', 'idx_ci')

If you now rerun the query for a value that lies outside the histogram range, nothing changes. To make use of the ascending marking, you have to enable another trace flag: 2389. With this trace flag enabled, the query optimizer uses the density vector for the cardinality estimate.

--Now we query the newly inserted range, which is currently not present in the histogram.
--With trace flag 2389, the query optimizer uses the density vector to make the cardinality estimation.
SELECT * FROM Orders
WHERE OrderDate = '20100401'
OPTION (RECOMPILE, QUERYTRACEON 2389)
GO

Look at the density of the table now:

DBCC TRACEOFF (2388)
DBCC SHOW_STATISTICS ('dbo.Orders', 'idx_ci')

The table density is now 0.0008873115, so the query optimizer estimates 28.4516 rows: 0.0008873115 * (32265 - 200).

This is not a perfect result, but it is a lot better than an estimate of 1 row!
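The arithmetic behind this estimate can be checked directly. The sketch below only reuses the two numbers quoted above. The WITH DENSITY_VECTOR option is an addition here, used to return just the density rows of the DBCC SHOW_STATISTICS output. The "All density" value is 1 divided by the number of distinct values, so 0.0008873115 corresponds to roughly 1127 distinct OrderDate values.

```sql
-- Show only the density vector of the statistics object
DBCC SHOW_STATISTICS ('dbo.Orders', 'idx_ci') WITH DENSITY_VECTOR;

-- Reproduce the estimate from the text: density * row count
SELECT 0.0008873115 * (32265 - 200) AS estimated_rows,        -- about 28.4516
       1.0 / 0.0008873115           AS approx_distinct_values; -- roughly 1127
```

Because the density vector is an average over all distinct values, every value outside the histogram gets the same estimate, regardless of how many rows it really has.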

(One problem: on my local SQL Server 2008 R2 instance, the estimated number of rows in this test was still 1. I do not know why. If anyone knows the reason, I would appreciate an explanation. Thank you!)

SQL Server 2014
One of the new features in SQL Server 2014 is the new cardinality estimator. It handles the ascending key problem very simply: by default, without any trace flag, it uses the density vector of the statistics object to estimate the cardinality. The following code runs the same query with the new cardinality estimator enabled through trace flag 2312.

--With the new cardinality estimator SQL Server estimates 28.4516 rows in the Clustered Index Seek operator.
SELECT * FROM Orders
WHERE OrderDate = '20100401'
OPTION (RECOMPILE, QUERYTRACEON 2312)
GO

If you look at the cardinality estimate here, you will see that the query optimizer again estimates 28.4516 rows, but this time without relying on the column being marked as ascending. This works out of the box in SQL Server 2014.

(My test on SQL Server 2014 did not reproduce this: the estimated number of rows was 1.)
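When comparing the two estimators, it can help to confirm which one a query actually runs under. The following sketch rests on two facts that the article does not cover, so treat it as an assumption-laden aid rather than part of the original walkthrough: a database at compatibility level 120 or higher uses the new cardinality estimator by default, and trace flag 9481 forces a single query back to the legacy estimator.

```sql
-- Check the compatibility level of the current database
-- (120 or higher means the new cardinality estimator is the default)
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- Run the same query under the legacy estimator for comparison
-- (trace flag 9481 is not mentioned in the original article)
SELECT * FROM Orders
WHERE OrderDate = '20100401'
OPTION (RECOMPILE, QUERYTRACEON 9481);
GO
```

Comparing the estimated row counts of the two execution plans shows which estimator produced which number.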

In this article, I showed how SQL Server's query optimizer handles the ascending key problem. Before SQL Server 2014, you need to enable trace flag 2389 to get a better cardinality estimate, and the column must already be marked as ascending. In SQL Server 2014, the query optimizer uses the density vector for the cardinality estimate by default, which is much more convenient. I hope this gives you a better idea of how to handle the ascending key column problem in SQL Server.

I hope this helps. Thank you.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
