Processing Methods for statistics of auto-increment key columns and statistical methods

Source: Internet
Author: User
Tags sql server query

Processing Methods for statistics of auto-increment key columns and statistical methods

This article explains how to handle the statistics of auto-incrementing key columns in the form of text code. We all know that each statistical object in SQL Server has an associated histogram. The histogram uses multiple steps to describe the distribution of specified column data. In a histogram, SQL Server supports a maximum step size of 200, but this is a problem when the data range you query is after the last step of the histogram. Let's look at the following code to reproduce this situation:

 -- Create a simple orders table CREATE TABLE Orders (  OrderDate DATE NOT NULL,  Col2 INT NOT NULL,  Col3 INT NOT NULL ) GO  -- Create a Non-Unique Clustered Index on the table CREATE CLUSTERED INDEX idx_CI ON Orders(OrderDate) GO  -- Insert 31465 rows from the AdventureWorks2008r2 database INSERT INTO Orders (OrderDate, Col2, Col3) SELECT OrderDate, CustomerID, TerritoryID FROM AdventureWorks2008R2.Sales.SalesOrderHeader GO  -- Rebuild the Clustered Index, so that we get fresh statistics. -- The last value in the Histogram is 2008-07-31. ALTER INDEX idx_CI ON Orders REBUILD GO  -- Insert 200 additional rows *after* the last step in the Histogram INSERT INTO Orders (OrderDate, Col2, Col3) VALUES ('20100101', 1, 1) GO 200

After the index is re-built, let's look at the histogram again. We find that the last step is 2008-07-31.

Copy codeThe Code is as follows:
DBCC SHOW_STATISTICS ('dbo. Orders ', 'idx _ cies') WITH HISTOGRAM

As you can see, we inserted 200 additional records after the last step into the table. In this case, the histogram does not actually reflect the actual data distribution, but the SQL Server still needs to perform base calculation. Let's take a look at how SQL Server handles this issue in different versions.

Copy codeThe Code is as follows:
SQL Server 2005 SP1-SQL Server 2012

Before SQL Server 2014, the base computing process is very simple: SQL Server estimates that the number of rows is 1. You can see the following picture.

Click the toolbar to display the actual execution plan and execute the following query:

Copy codeThe Code is as follows:
SELECT * FROM dbo. Orders WHERE OrderDate = '2017-01-01'

Since SQL Server 2005 SP1, the query optimizer can mark column 1 as Ascending to overcome the limitations just introduced. If you use the auto-increment column value to update the statistics object three times, the column will be marked as the auto-increment column. To see if any column is marked as auto-incrementing, you can use the trace tag 2388. When this trace flag is enabled, the output of DBCC SHOW_STATISTICS is changed and an additional column is returned.

Copy codeThe Code is as follows:
Dbcc traceon (2388)
DBCC SHOW_STATISTICS ('dbo. Orders ', 'idx _ cies ')

The following code updates the statistics three times. each time you use the auto-increment key column value to insert rows at the end of the clustered index.

 -- => 1st update the Statistics on the table with a FULLSCAN  UPDATE STATISTICS Orders WITH FULLSCAN  GO    -- Insert 200 additional rows *after* the last step in the Histogram  INSERT INTO Orders (OrderDate, Col2, Col3) VALUES ('20100201', 1, 1)  GO 200   -- => 2nd update the Statistics on the table with a FULLSCAN UPDATE STATISTICS Orders WITH FULLSCAN GO  -- Insert 200 additional rows *after* the last step in the Histogram INSERT INTO Orders (OrderDate, Col2, Col3) VALUES ('20100301', 1, 1) GO 200  -- => 3rd update the Statistics on the table with a FULLSCAN UPDATE STATISTICS Orders WITH FULLSCAN GO

Then, when we run the DBCC SHOW_STATISTICS command, you will see that SQL Server has marked the column as Ascending.

Copy codeThe Code is as follows:
Dbcc traceon (2388)
DBCC SHOW_STATISTICS ('dbo. Orders ', 'idx _ cies ')

Now, when you re-query the data that is not in the histogram range, there is no change. To use the flag as the auto-increment key column, you must enable another trace flag-2389. If you enable this trace tag, the query optimizer uses the Density Vector for base computing.

-- Now we query the newly inserted range which is currently not present in the Histogram.-- With Trace Flag 2389, the Query Optimizer uses the Density Vector to make the Cardinality Estimation.SELECT * FROM OrdersWHERE OrderDate = '20100401'OPTION (RECOMPILE, QUERYTRACEON 2389)GO

Let's take a look at the current table density:

Copy codeThe Code is as follows:
Dbcc traceoff (2388)
DBCC SHOW_STATISTICS ('dbo. Orders ', 'idx _ cies ')

The current table density is 0.0008873115, so the estimated number of rows in the query optimizer is 28.4516: 0.0008873115*(32265-200 ).

This is not the best result, but it is much better than the estimated number of rows 1!

(Here is a problem. I am using SQL Server 2008r2 to test whether the estimated number of lines is 1. I don't know why. Thank you for your explanation !)

SQL Server 2014
A new feature introduced in SQL Server 2014 is the new base computing. The new base number calculation process is very simple for the auto-growth key problem: by default, no trace tag is used to calculate the base using the density vector of the statistics object. The following query enables base calculation of 2312 trace flag to run the same query.

1 -- With the new Cardinality Estimator SQL Server estimates 28.4516 rows at the Clustered Index Seek operator.2 SELECT * FROM Orders3 WHERE OrderDate = '20100401'4 OPTION (RECOMPILE, QUERYTRACEON 2312)5 GO


Let's look at the base calculation here. You will see that the query optimizer estimated the number of rows to be 28.4516 again, but this time there was no table growth. This is the built-in function of SQL Server 2014.

(The SQL Server 2014 test failed. The estimated number of rows is also 1 ......)


In this article, I show you how the SQL Server Query Optimizer handles auto-increment keys. Before SQL Server 2014, You need to enable the 2389 trace tag for better base computing-in this case, the column will be marked as ascending ). In SQL Server 2014, the query optimizer uses the density vector for base calculation by default, which is much easier. I hope you have some gains in this regard. You will have a better idea about how to handle the auto-growth key column problem in SQL Server.

Thank you for your inspiration.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.