SQL Server Statistics (Statistics)-Concepts, principles, applications, maintenance

Source: Internet
Author: User
Tags bulk insert

The statistical information is an important reference for the SQL Server optimizer to generate the execution plan, it needs the database developer, the database administrator has certain understanding to it, thus the reasonable and efficient application, management.

The first part of the concept

Statistics (Statistics): Describes the data distribution of an index in a (some) field. The optimizer generates an efficient execution plan with its evaluation.

Density (density): a ratio value that shows how many unique values are in a (group) column. (The smaller the value in the actual application, the better)

Density = 1/number of distinct values for column (s)

Histogram (histogram): divides the data into different segments (steps), which are used to describe and record the specifics of each data distribution (sample creation). Up to a maximum of £ steps

DBCC Show_statistics (Object_name,column_name)

Header (information header) contains a series of metadata about statistics

Density (density) contains the density information of columns (column groups) and the average column (group) length

The histogram (histogram) contains histograms describing information.

Histogram (histogram)

Range_hi_key: The upper value of the Histogram column (the multi-column case-first record) segment. Column values are key values

Range_rows: The number of rows whose corresponding column values are in this segment (with no upper bound) (estimated)

Eq_rows: The number of rows equal to the upper value of its column value

Number of distinct values in distinct_range_rows:range_rows

Avg_range_rows: The average number of rows in the histogram segment (not including the upper limit)

Distinct_range_rows > 0 for Range_rows/distinct_range_rows

The second part of the principle, application

Statistics update

Statistics can be manually maintained or updated by the optimizer when it confirms that the execution plan is valid: The recompile threshold (recompilation threshold/rt) to determine whether statistics are out of date.

Trigger conditions

When you create a table that is empty, adding a piece of data updates

Record update ID (modification Counters) greater than 500 update when table data is less than 500 rows

Record update identity is greater than 500 and 20% rows Change (rowcnt) When table data is greater than 500 rows

Note: The Temporary table table is small (0 rows or less than 6 rows). 6 changes trigger the update.

Table Variable No statistics

About record update identification (modification Counters)

Rowmodctr sql2000 and used before. Recorded in Sys.sysindexes.

Note: Although this parameter is highly dependent on its use, Microsoft still maintains this parameter change as a reference

colmodctr sql2005 and later use recorded in Sys.sysrscols. Rcmodified (DAC access required)

In addition the DMV Sys.system_internals_partition_columns's modified_count also records

COLMODCTR (DAC not required) but does not provide up-to-date compatibility. sql2012 still support!

COLMODCTR Record Rules:

Insert each row to add all columns colmodctr+1=colmodctr

Delete every row of all columns colmodctr+1=colmodctr

Update target column colmodctr+1=colmodctr (sql2008+) per modified row

Sql2005 Update Target columns

When you modify the column Nokey columns colmodctr+1=colmodctr

When the modification is listed as key columns colmodctr+2=colmodctr

Bulk Insert is similar to n Rows insert

Truncate table is similar to n Rows delete

Note: Modification counters non-transactional (such as when inserting 1000 data, and then rollback. COLMODCTR will be added)

The filter statistics (filtered statistics) trigger updates to the overall data range rather than the filter interval. Manual Maintenance once created

The optimizer applies statistics.

How the optimizer applies statistics is a more complex approach. SQL Server is applied even differently between versions. Here is a brief introduction.

Optimizer uses statistics "preferences" to prioritize the latest, full scan statistics

A brief description of the inferior single predicate prediction.

DBCC SHOW_STATISTICS (' votes ', ' ix_multicolumn1 ')

Select * from votes where topic_id=40

The estimated number of rows is eq_rows when the predicate hits the boundary value

Select * from votes where topic_id=10000

When the predicate value is within a range, the non-hit boundary value is avg_range_rows. That is, all predicate key values between 9042 and 16234 are estimated to be 2.482208

DECLARE @topic_id int

Set @topic_id =1000

Select * from votes where topic_id= @topic_id

When a predicate is a variable, the optimizer does not know the value of the parameter. Estimates are made in the form of density * rows.

namely topic_id= @topic_id for 0.000004936151*1943794=9.59486

Part III Maintenance

To view statistical information

Sp_autostats ' Votesbak '---view statistics update information

Sp_helpstats ' Votesbak ', ' all '---view statistics corresponding columns/Key values

DBCC SHOW_STATISTICS (' votes ', ' ix_multicolumn1 ')-View details of specific statistics

Statistical information Related Settings

Auto_create_statistics----Automatically create statistics

Auto_update_statistics---Automatically update statistics

Auto_update_statistics_async-Automatically asynchronously updates statistics (the optimizer uses the old statistics without recompiling and executes immediately.)

Example:

ALTER DATABASE [BitautoReview2] SET Auto_update_statistics_async on

Trace Flag 2371 elasticity Updates statistics based on conditions

Statistical information operations

Create Statistics--creating statistical information

Update Statistics--updating statistics

Drop Statistics--delete statistics

When maintaining statistics, you can select a sample size to create, update

Update statistics [votes] (IX_MULTICOLUMN1) with fullscan--manually update the statistics for the specified index (typically use a full scan for specific statistical information issues.) Longer update times may affect performance

Update STATISTICS votes with sample Percent;-----manual updating of full-table statistics, sample sampling 10% (typically used for updating an entire object, sample size and accuracy depending on the environment)

Update STATISTICS votes with ROWCOUNT =1000000,pagecount=100000---Specify the number of rows/pages updated (typically used to simulate what kind of execution plan the optimizer will take when the amount of table data becomes large)

Best Practices for statistical information

Auto_create_statistics, auto_update_statistics generally no special application automatically opens.

Auto_update_statistics_async when the table that triggers the update is huge, the statistics update will significantly affect the current performance, and the old statistics should be set to ON when there is no significant impact on the original update

Trace Flag 2371 According to the appropriate environment settings

A temporary table is used instead of a table variable to cause a bad execution plan due to no statistics

You should create the appropriate statistics before the read-only library (Readonly) configuration

Avoid variables as much as possible, using the proc parameter format when using the Exec sp_executesql form with dynamic SQL

Filter statistics should be maintained by people

After the expression (CTE) participates in the join operation, the exact statistics may not be obtained because of the execution logic, and when this problem occurs, it is replaced by a temporary table or a rewrite.

When the data is tilted so much that it causes problems such as parameter sniffing, filter statistics (filtered index) should be created, or processed by using query Hint, rewriting logical processing, and so on.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.