This article explains how SQL Server handles statistics on an ascending key column (a column whose values keep growing, such as a date or a sequential code). Every statistics object in SQL Server has an associated histogram. The histogram describes the distribution of the column data in a series of steps, and SQL Server supports a maximum of 200 steps per histogram. A problem arises when you query a range of data that lies beyond the last step of the histogram. The following code reproduces the situation:
--Create a simple Orders table
CREATE TABLE Orders
(
    OrderDate DATE NOT NULL,
    Col2 INT NOT NULL,
    Col3 INT NOT NULL
)
GO
--Create a non-unique clustered index on the table
CREATE CLUSTERED INDEX idx_ci ON Orders (OrderDate)
GO
--Insert 31465 rows from the AdventureWorks2008R2 database
INSERT INTO Orders (OrderDate, Col2, Col3)
SELECT OrderDate, CustomerID, TerritoryID FROM AdventureWorks2008R2.Sales.SalesOrderHeader
GO
--Rebuild the clustered index, so we get fresh statistics.
--The last value in the histogram is 2008-07-31.
ALTER INDEX idx_ci on Orders REBUILD
go
--Insert 200 additional rows *after* the last step in the histogram
INSERT INTO Orders (OrderDate, Col2, Col3)
VALUES ('20100101', 1, 1)
GO 200
After the index rebuild, a look at the histogram shows that the last step is 2008-07-31:
DBCC SHOW_STATISTICS ('dbo.Orders', 'idx_ci') WITH HISTOGRAM
As you can see, we inserted 200 additional records into the table after the last histogram step. In this case, the histogram no longer reflects the actual data distribution, but SQL Server still has to produce a cardinality estimate. Let's look at how different versions of SQL Server handle this issue.
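One way to see the gap between the statistics and the table is to compare the row count stored in the statistics object with the number of modifications made since it was last updated. The following is a minimal sketch; it assumes the `sys.dm_db_stats_properties` function is available on your build (it was added in service packs for SQL Server 2008 R2 and 2012), and the column aliases are my own.

```sql
-- Compare what the statistics object knows with what has changed since.
-- Assumption: sys.dm_db_stats_properties exists on this SQL Server build.
SELECT  s.name                  AS stats_name,
        sp.last_updated,
        sp.rows                 AS rows_when_stats_built,
        sp.steps                AS histogram_steps,
        sp.modification_counter AS changes_since_update
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   s.object_id = OBJECT_ID('dbo.Orders')
  AND   s.name = 'idx_ci';
```

If the steps above were followed, the modification counter would be expected to account for the 200 rows inserted after the rebuild, while the stored row count still reflects the table at rebuild time.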
SQL Server 2005 SP1 to SQL Server 2012
Before SQL Server 2014, the cardinality estimator dealt with this issue very simply: SQL Server estimated the number of rows to be 1, as the execution plan shows.
Enable "Include Actual Execution Plan" on the toolbar and execute the following query:
SELECT * FROM dbo.Orders WHERE OrderDate = '2010-01-01'
Since SQL Server 2005 SP1, the query optimizer can mark a column as ascending to overcome the limitation just described. If the statistics object is updated 3 times with ascending column values, the column is marked as ascending. To see whether a column has been marked as ascending, you can use trace flag 2388. When this trace flag is enabled, the output of DBCC SHOW_STATISTICS changes and additional columns are returned.
DBCC TRACEON (2388)
DBCC SHOW_STATISTICS ('dbo.Orders', 'idx_ci')
The following code updates the statistics 3 times, each time after inserting rows with ascending key values at the end of our clustered index.
--=> 1st update of the statistics on the table with a FULLSCAN
UPDATE STATISTICS Orders WITH FULLSCAN
GO
--Insert additional rows *after* the last step in the histogram
INSERT INTO Orders (OrderDate, Col2, Col3)
VALUES ('20100201', 1, 1)
GO 200
--=> 2nd update of the statistics on the table with a FULLSCAN
UPDATE STATISTICS Orders WITH FULLSCAN
GO
--Insert additional rows *after* the last step in the histogram
INSERT INTO Orders (OrderDate, Col2, Col3)
VALUES ('20100301', 1, 1)
GO 200
--=> 3rd update of the statistics on the table with a FULLSCAN
UPDATE STATISTICS Orders WITH FULLSCAN
GO
When we now execute the DBCC SHOW_STATISTICS command again, you will see that SQL Server has marked the column as Ascending.
DBCC TRACEON (2388)
DBCC SHOW_STATISTICS ('dbo.Orders', 'idx_ci')
If you now run the query again for data outside the histogram range, nothing changes. To make use of the column being marked as ascending, you have to enable another trace flag: 2389. With this trace flag enabled, the query optimizer uses the density vector for the cardinality estimate.
--Now we query a newly inserted range which is currently not present in the histogram.
--With trace flag 2389, the query optimizer uses the density vector to make the cardinality estimation.
SELECT * FROM Orders
WHERE OrderDate = '20100401'
OPTION (RECOMPILE, QUERYTRACEON 2389)
GO
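As an alternative to the per-query hint, the trace flag can also be switched on for the current session with DBCC TRACEON. This is only a sketch of that variant, not something the walkthrough itself uses; it assumes your login is allowed to run DBCC TRACEON, and the flag should be turned off again afterwards so it does not affect other queries in the session.

```sql
-- Session-scoped alternative to the QUERYTRACEON hint.
-- Assumption: the login has permission to run DBCC TRACEON.
DBCC TRACEON (2389);
GO
SELECT * FROM dbo.Orders
WHERE OrderDate = '20100401'
OPTION (RECOMPILE);
GO
-- Switch the flag off again for this session.
DBCC TRACEOFF (2389);
GO
```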
Look at the density of the table now:
DBCC TRACEOFF (2388)
DBCC SHOW_STATISTICS ('dbo.Orders', 'idx_ci')
The density is now 0.0008873115, so the query optimizer estimates 28.4516 rows: 0.0008873115 * (32265 - 200).
This is not a perfect result, but it is a lot better than an estimate of 1 row!
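The arithmetic behind that estimate can be checked directly in T-SQL. The snippet below only multiplies the numbers quoted in the text; the variable names are mine, and the text does not say what the subtracted 200 stands for, so it is named neutrally here.

```sql
-- Values copied from the text; variable names are illustrative only.
DECLARE @density    DECIMAL(18,10) = 0.0008873115;
DECLARE @total_rows INT = 32265;
DECLARE @subtracted INT = 200;

-- density * (32265 - 200), which is roughly 28.4516
SELECT @density * (@total_rows - @subtracted) AS estimated_rows;
```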
(Translator's note: on my local SQL Server 2008 R2 instance, the estimated number of rows in this test was still 1. I do not know why. If anyone knows the reason, please explain. Thank you!)
SQL Server 2014
A new feature introduced in SQL Server 2014 is the new cardinality estimator. It handles the ascending key problem very simply: by default, without any trace flag, it uses the density vector of the statistics object to estimate the cardinality. The following code runs the same query with trace flag 2312, which enables the new cardinality estimator.
--With the new cardinality estimator SQL Server estimates 28.4516 rows in the Clustered Index Seek operator.
SELECT * FROM Orders
WHERE OrderDate = '20100401'
OPTION (RECOMPILE, QUERYTRACEON 2312)
GO
If you look at the cardinality estimate now, you will see that the query optimizer again estimates 28.4516 rows, but this time without relying on the column being marked as ascending. This behavior is built into SQL Server 2014.
(Translator's note: my SQL Server 2014 test did not reproduce this; the estimated number of rows was still 1.)
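Which estimator a query gets in SQL Server 2014 depends on the database compatibility level: the new cardinality estimator is the default at level 120, while databases at a lower level keep the legacy one unless trace flag 2312 is used. The sketch below checks the level of the current database and then runs the same query under both estimators so the two plans can be compared. Trace flag 9481, which forces the legacy estimator, is not part of the walkthrough above and is included only for comparison.

```sql
-- Check the compatibility level of the current database.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();
GO
-- Same query under the new cardinality estimator (trace flag 2312).
SELECT * FROM dbo.Orders
WHERE OrderDate = '20100401'
OPTION (RECOMPILE, QUERYTRACEON 2312);
GO
-- Same query under the legacy cardinality estimator (trace flag 9481).
SELECT * FROM dbo.Orders
WHERE OrderDate = '20100401'
OPTION (RECOMPILE, QUERYTRACEON 9481);
GO
```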
In this article, I showed how SQL Server's query optimizer handles the ascending key problem. Before SQL Server 2014, you need to enable trace flag 2389 to get a better cardinality estimate once the column has been marked as ascending. In SQL Server 2014, the query optimizer uses the density vector for the cardinality estimate by default, which is much more convenient. I hope this gives you a better idea of how to handle the ascending key problem in SQL Server.
I hope you found this helpful. Thank you.