Source: Statistics using self-growing key column values
in today's article I would like to talk about the very common problem in SQL Server: How to handle statistics with self-growing key columns. As we all know, there is a histogram associated with each statistic object in SQL Server. The histogram specifies the distribution of the column data using multiple step descriptions. In a histogram, SQL Server supports up to 200 steps, but it is a problem when you query the data range in the last step of the histogram. Let's look at the following code to reproduce the situation:
1 --Create A simple Orders table2 CREATE TABLEOrders3 (4OrderDate DATE not NULL,5Col2INT not NULL,6Col3INT not NULL7 )8 GO9 Ten --Create a non-unique Clustered Index on the table One CREATE CLUSTERED INDEXIdx_ci onOrders (OrderDate) A GO - - --Insert 31465 rows from the ADVENTUREWORKS2008R2 database the INSERT intoOrders (OrderDate, Col2, Col3)SELECTOrderDate, CustomerID, TerritoryID fromAdventureWorks2008R2.Sales.SalesOrderHeader - GO - - --Rebuild the Clustered Index, so, we get fresh statistics. + --The last value of the histogram is 2008-07-31. - ALTER INDEXIdx_ci onOrders REBUILD + GO A at --Insert Additional rows *after* The last step in the histogram - INSERT intoOrders (OrderDate, Col2, Col3) - VALUES('20100101',1,1) - GO $
After the index is rebuilt, we look at the histogram and we find that the last stepping value is 2008-07-31.
1 DBCC Show_statistics ('dbo. Orders'idx_ci' with histogram
As you've seen, we've inserted 200 additional records after the last step into the table. In this case, the histogram has no real feedback on the actual data distribution, but SQL Server is still doing cardinality calculations. Let's look at how SQL Server handles this problem in different versions.
SQL Server 2005 Sp1-sql Server
Before SQL Server 2014, cardinality calculations handled this problem very simply: SQL Server estimates the number of rows is 1, which you can see from the image below.
Click the toolbar display to include the actual execution plan and execute the following query:
SELECT * from WHERE OrderDate='2010-01-01'
Since SQL Server 2005 SP1, the query optimizer can mark the 1 column as self-growth (ascending) to overcome the limitations just described. If you update the Statistics object 3 times with the self-growing column value, the column is marked as self-growing. To see if there are no columns marked as self-growing, you can use trace flag 2388. When you enable this trace flag, the output of theDBCC show_statistics is changed, with additional columns returned.
DBCC TRACEON (2388)DBCC show_statistics ('dbo. Orders'idx_ci')
now the following code updates the statistics 3 times, inserting rows at the end of our clustered index with the self-growth key column value each time.
1 --= 1st Update the Statistics on the table with a FULLSCAN2 UPDATE STATISTICSOrders withFULLSCAN3 GO4 5 --Insert Additional rows *after* The last step in the histogram6 INSERT intoOrders (OrderDate, Col2, Col3)7 VALUES('20100201',1,1)8 GO $9 Ten --= 2nd Update the Statistics on the table with a FULLSCAN One UPDATE STATISTICSOrders withFULLSCAN A GO - - --Insert Additional rows *after* The last step in the histogram the INSERT intoOrders (OrderDate, Col2, Col3) - VALUES('20100301',1,1) - GO $ - + --= 3rd Update the Statistics on the table with a FULLSCAN - UPDATE STATISTICSOrders withFULLSCAN + GO
Then, when we execute the DBCC SHOW_STATISTICS command, you will see that the SQL Server has said that the column is marked as ascending.
DBCC TRACEON (2388)DBCC show_statistics ('dbo. Orders'idx_ci')
2389 . If you enable this trace flag, the query optimizer is density vector (Density vector) for cardinality calculations.
-- Now we query the newly inserted range which is currently not present in the histogram. -- with Trace Flag 2389, the Query Optimizer uses the Density Vector to make the cardinality E Stimation. select * from Orders where OrderDate = " " option (RECOMPILE, Querytraceon ) go
Now look at the table density:
DBCC Traceoff (2388)DBCC show_statistics ('dbo. Orders'idx_ci')
The table density is now 0.0008873115, so the estimated number of rows for the query optimizer is 28.4516:0.0008873115* (32265-200).
This is not the best result, but it is much better than the estimated number of rows 1!
(Here is the problem, I am a local SQL Server 2008r2, test estimate number of rows or 1, I do not know why, hope to know the explanation of friends, thank you!
)
SQL Server
a new feature introduced in SQL Server 2014 is the new cardinality calculation . The new cardinality calculation is very simple to handle the self-growth key problem: By default, no trace flag is used to calculate cardinality using the density vectors of the statistics object. The following query enables cardinality calculations for 2312 trace flags to run the same query.
1 -- 2 select * from Orders 3 = " 20100401 '
let's look at the cardinality calculation here, and you'll see that the query optimizer estimates the number of rows again to be 28.4516, but this time there's no self-growth on the table. This is the self-bringing feature of SQL Server 2014.
(SQL Server 2014 test failed with estimated number of rows is also 1 ...) )
Summary
In this article, I show you how SQL Server's query optimizer handles self-growing key issues. Before SQL Server 2014, you would need to enable 2389 trace flags to get a better cardinality calculation-so that column would be marked as self-growing (ascending). SQL Server 2014, the query optimizer uses density vectors by default for cardinality calculations, which makes it much easier. I hope you've learned something about it, and you'll have a better idea of how to handle self-growing key columns in SQL Server.
Thanks for your attention!
Statistics using self-growing key column values