The statistical information is an important reference for the SQL Server optimizer to generate the execution plan, it needs the database developer, the database administrator has certain understanding to it, thus the reasonable and efficient application, management.
The first part of the concept
Statistics (Statistics): Describes the data distribution of an index in a (some) field. The optimizer generates an efficient execution plan with its evaluation.
Density (density): a ratio value that shows how many unique values are in a (group) column. (The smaller the value in the actual application, the better)
Density = 1/number of distinct values for column (s)
Histogram (histogram): divides the data into different segments (steps), which are used to describe and record the specifics of each data distribution (sample creation). Up to a maximum of £ steps
DBCC Show_statistics (Object_name,column_name)
Header (information header) contains a series of metadata about statistics
Density (density) contains the density information of columns (column groups) and the average column (group) length
The histogram (histogram) contains histograms describing information.
Histogram (histogram)
Range_hi_key: The upper value of the Histogram column (the multi-column case-first record) segment. Column values are key values
Range_rows: The number of rows whose corresponding column values are in this segment (with no upper bound) (estimated)
Eq_rows: The number of rows equal to the upper value of its column value
Number of distinct values in distinct_range_rows:range_rows
Avg_range_rows: The average number of rows in the histogram segment (not including the upper limit)
Distinct_range_rows > 0 for Range_rows/distinct_range_rows
The second part of the principle, application
Statistics update
Statistics can be manually maintained or updated by the optimizer when it confirms that the execution plan is valid: The recompile threshold (recompilation threshold/rt) to determine whether statistics are out of date.
Trigger conditions
When you create a table that is empty, adding a piece of data updates
Record update ID (modification Counters) greater than 500 update when table data is less than 500 rows
Record update identity is greater than 500 and 20% rows Change (rowcnt) When table data is greater than 500 rows
Note: The Temporary table table is small (0 rows or less than 6 rows). 6 changes trigger the update.
Table Variable No statistics
About record update identification (modification Counters)
Rowmodctr sql2000 and used before. Recorded in Sys.sysindexes.
Note: Although this parameter is highly dependent on its use, Microsoft still maintains this parameter change as a reference
colmodctr sql2005 and later use recorded in Sys.sysrscols. Rcmodified (DAC access required)
In addition the DMV Sys.system_internals_partition_columns's modified_count also records
COLMODCTR (DAC not required) but does not provide up-to-date compatibility. sql2012 still support!
COLMODCTR Record Rules:
Insert each row to add all columns colmodctr+1=colmodctr
Delete every row of all columns colmodctr+1=colmodctr
Update target column colmodctr+1=colmodctr (sql2008+) per modified row
Sql2005 Update Target columns
When you modify the column Nokey columns colmodctr+1=colmodctr
When the modification is listed as key columns colmodctr+2=colmodctr
Bulk Insert is similar to n Rows insert
Truncate table is similar to n Rows delete
Note: Modification counters non-transactional (such as when inserting 1000 data, and then rollback. COLMODCTR will be added)
The filter statistics (filtered statistics) trigger updates to the overall data range rather than the filter interval. Manual Maintenance once created
The optimizer applies statistics.
How the optimizer applies statistics is a more complex approach. SQL Server is applied even differently between versions. Here is a brief introduction.
Optimizer uses statistics "preferences" to prioritize the latest, full scan statistics
A brief description of the inferior single predicate prediction.
DBCC SHOW_STATISTICS (' votes ', ' ix_multicolumn1 ')
Select * from votes where topic_id=40
The estimated number of rows is eq_rows when the predicate hits the boundary value
Select * from votes where topic_id=10000
When the predicate value is within a range, the non-hit boundary value is avg_range_rows. That is, all predicate key values between 9042 and 16234 are estimated to be 2.482208
DECLARE @topic_id int
Set @topic_id =1000
Select * from votes where topic_id= @topic_id
When a predicate is a variable, the optimizer does not know the value of the parameter. Estimates are made in the form of density * rows.
namely topic_id= @topic_id for 0.000004936151*1943794=9.59486
Part III Maintenance
To view statistical information
Sp_autostats ' Votesbak '---view statistics update information
Sp_helpstats ' Votesbak ', ' all '---view statistics corresponding columns/Key values
DBCC SHOW_STATISTICS (' votes ', ' ix_multicolumn1 ')-View details of specific statistics
Statistical information Related Settings
Auto_create_statistics----Automatically create statistics
Auto_update_statistics---Automatically update statistics
Auto_update_statistics_async-Automatically asynchronously updates statistics (the optimizer uses the old statistics without recompiling and executes immediately.)
Example:
ALTER DATABASE [BitautoReview2] SET Auto_update_statistics_async on
Trace Flag 2371 elasticity Updates statistics based on conditions
Statistical information operations
Create Statistics--creating statistical information
Update Statistics--updating statistics
Drop Statistics--delete statistics
When maintaining statistics, you can select a sample size to create, update
Update statistics [votes] (IX_MULTICOLUMN1) with fullscan--manually update the statistics for the specified index (typically use a full scan for specific statistical information issues.) Longer update times may affect performance
Update STATISTICS votes with sample Percent;-----manual updating of full-table statistics, sample sampling 10% (typically used for updating an entire object, sample size and accuracy depending on the environment)
Update STATISTICS votes with ROWCOUNT =1000000,pagecount=100000---Specify the number of rows/pages updated (typically used to simulate what kind of execution plan the optimizer will take when the amount of table data becomes large)
Best Practices for statistical information
Auto_create_statistics, auto_update_statistics generally no special application automatically opens.
Auto_update_statistics_async when the table that triggers the update is huge, the statistics update will significantly affect the current performance, and the old statistics should be set to ON when there is no significant impact on the original update
Trace Flag 2371 According to the appropriate environment settings
A temporary table is used instead of a table variable to cause a bad execution plan due to no statistics
You should create the appropriate statistics before the read-only library (Readonly) configuration
Avoid variables as much as possible, using the proc parameter format when using the Exec sp_executesql form with dynamic SQL
Filter statistics should be maintained by people
After the expression (CTE) participates in the join operation, the exact statistics may not be obtained because of the execution logic, and when this problem occurs, it is replaced by a temporary table or a rewrite.
When the data is tilted so much that it causes problems such as parameter sniffing, filter statistics (filtered index) should be created, or processed by using query Hint, rewriting logical processing, and so on.