Understanding SQL Server statistics and SQL statistics

Last Update:2015-01-18 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Preface

SQL Server queries are based on overhead queries. When an execution plan is generated for the first time, a better execution plan is determined based on multi-stage analysis optimization. The base estimation of these overhead is determined based on the statistical information. The statistical information is the segmentation distribution of the overall data of each field of the table. By default, the database automatically maintains the data.

Both tables and views have statistical information. Statistical information objects are created based on the index or table column list. When a column is queried for the first time with the most conditions, the statistical information of a single column is created. When an index is created, statistics with the same name will be created. In the index, statistics are only collected on the first column. Therefore, in addition to storing data in the first column, the statistics are calculated on the first column, therefore, the first column defined during index settings is very important. Each statistical information object is created on a list containing one or more columns and contains a histogram showing the distribution of values in the first column.

Next, let's take a look at the statistics ~~ ^

View statistics:

As you can see, statistical information is also an object of a table.

-- List all statistics in the table select * from sys. stats where object_id = OBJECT_ID (n' [Sales]. [SalesOrderDetail] ')

-- View the statistical information and its column SELECT s. name AS statistics_name, c. name AS column_name, SC. stats_column_idFROM sys. stats AS sINNER JOIN sys. stats_columns AS SC ON s. object_id = SC. object_id AND s. stats_id = SC. stats_idINNER JOIN sys. columns AS c ON SC. object_id = c. object_id AND c. column_id = SC. column_idWHERE s. object_id = OBJECT_ID (n' [Sales]. [SalesOrderDetail] '); -- view all statistics. Update Time: exec sp_helpstats n' [Sales]. [SalesOrderDetail] ', 'all'

Statistical information attributes:

Right-click statistics and select "properties" to view statistics settings and distribution.

You can also run the DBCC SHOW_STATISTICS command to view the information. The following is a column. DBCC SHOW_STATISTICS ('[Sales]. [SalesOrderDetail]', 'ix _ SalesOrderDetail_ProductID ')

It consists of three parts: "Statistical Information Header", "density vector", and "histogram"

1. Statistics Header

Column name	Description
Name	Name of the statistics object (IX_SalesOrderDetail_ProductID)
Updated	Date and time of the last update of statistics (Mar 14 2012)
Rows	The total number of rows in the table or index view during the last update of statistics (121317 ). If the filtering statistics or statistics match the filtering index, the number of rows may be smaller than the number of rows in the table.
Rows Sampled	The total number of samples used for statistical calculation (121317 ). If Rows Sampled <Rows, the displayed histogram and density are estimated based on the sample row.
Steps	Number of cascade operations in the histogram (200 ). Each cascade step spans a column value range followed by the upper limit column value. Histogram cascade is defined based on the first key column in the statistics. The maximum number of cascade operations is 200.
Density	Calculation formula: 1. distinct values of all values (excluding histogram boundary values) in the first key column of the statistical information object. (0.0078125) the query optimizer does not use this Density value. It is displayed to be backward compatible with versions earlier than SQL Server 2008.
Average Key Length	Average number of bytes for each value of all key columns in the statistics object (12: Three int types. ProductID, SalesOrderID, SalesOrderDetailID)
String Index	(NO) Yes indicates that the statistic object contains string summary statistics to improve the base estimation of the query predicates using the LIKE operator. For example, WHERE ProductName LIKE '% Bike '. String summary statistics are stored separately from histograms. If the statistical information objects are char, varchar, nchar, nvarchar, varchar (max), nvarchar (max), text, or ntext. type, then the string summary statistics are created based on the first key column.
Filter Expression	The predicate of the table row subset contained in the statistics object. NULL = unfiltered statistics.
Unfiltered Rows	The total number of rows in the table before the filter expression is applied (121317 ). If Filter Expression is NULL, Unfiltered Rows is equal to Rows.

2. density information

Column name	Description
All Density	The Density is 1/distinct values. The result shows the density of each prefix of each column in the statistics object, and each density shows a row. A non-repeated value is a non-repeated list of column values of each row prefix and column prefix. Selectichoice) "Selectivity" indicates the repeated data volume in the dataset, or conversely, the unique data volume. If the data of a field is rarely repeated, its selectivity is relatively high. For example, the ID card number cannot be repeated. Even if you want to query the identity records of China, only one record can be returned when an ID card number is entered, the result set that can effectively filter out a large amount of data returned is relatively small. For example, gender. There are only two types of people, not male or female. The repeatability of this field is very high and the selectivity is very low. One filter condition can only filter out at most half of the records. By calculating the "selectivity", you can predict how many records can be returned to the Density after a filter condition is completed: density = 1/cardinality of index keys if the value is smaller than 0.1, the index is generally highly selective. If it is greater than 0.1, its selectivity is not high. (See Microsoft sqlserver enterprise-level platform management practices.)
Average Length	The average length (in bytes) of the column Value List that stores the column prefix ).
Columns	Display the name of the column in the prefix of All density and Average length

Here, why are there three rows? Because [ProductID] is a non-clustered index, [SalesOrderID, SalesOrderDetailID] is a clustered index, and each non-clustered index contains the key value of the clustered index, therefore, three options are available for the statistics here.

Current statistics[All Density]Calculation method:

Select count (*) from (select count (*) a from [Sales]. [SalesOrderDetail] group by ProductID) as Tselect count (*) from (select count (*) a from [Sales]. [SalesOrderDetail] group by ProductID, SalesOrderID) as Tselect count (*) from (select count (*) a from [Sales]. [SalesOrderDetail] group by ProductID, SalesOrderID, SalesOrderDetailID) as T -- statistics by different groups: group by ProductID -- 266 rows group by ProductID, SalesOrderID -- 121317 rows group by ProductID, SalesOrderID, salesOrderDetailID -- 121317 rows select 1.0/266 as [all density] union allselect 1.0/121317 as [all density] union allselect 1.0/121317 as [all density]

2 Histogram

Column name	Description
RANGE_HI_KEY	The maximum column value of the histogram cascade. The column value is also called a key value. (Distribution by ProductID range)
RANGE_ROWS	The number of rows whose column values are within the histogram cascade (excluding the upper limit. (How many rows are there between two ProductID values)
EQ_ROWS	The number of rows whose column value is equal to the upper limit of the histogram cascade. (Number of rows equal to the ProductID value of the current row)
DISTINCT_RANGE_ROWS	The number of rows with non-repeated column values within the histogram cascade (excluding the upper limit. (How many duplicate key values are there between two ProductID values? ProductID)
AVG_RANGE_ROWS	The duplicate column value is within the histogram cascade (excluding the upper limit) of the average number of rows (if DISTINCT_RANGE_ROWS> 0, it is RANGE_ROWS/DISTINCT_RANGE_ROWS ).

Importance of statistical information:

In SQL Server, when executing a batch processing statement, the query optimizer in the relational engine will first estimate the generation of a better execution plan, and the execution executor will then install the request data according to the execution plan. That is, during the execution plan generation, sqlserver estimates the number of rows based on the statistical information in the table, determines the physical operation steps according to the script semantics to generate the execution plan, and then accesses the data according to the execution plan. For tables with large data, the number of rows estimated by statistical information is often inaccurate, which means that the query uses an inaccurate execution plan and is slow. For example, "parameter sniffing" may cause an estimation error because the passed parameter value cannot be determined. When using Table variables, no statistical information is provided or the number of rows is estimated.

-- Take the column statistics of this table as an example [Sales]. [SalesOrderDetail] (SpecialOfferID) DBCC SHOW_STATISTICS ('[Sales]. [SalesOrderDetail]', '_ WA_Sys_0000000B_44CA3770 ')

If the query date range is'2017-07-01 '<ModifiedDate <= '2017-08-01 ',Check that the estimated number of rows returned by the query is896.7728 (190.2021 + 706.5707)

SELECT COUNT( ModifiedDate )FROM [Sales].[SalesOrderDetail] WHERE ModifiedDate >'2005-07-01 00:00:00'AND ModifiedDate<='2005-08-01 00:00:00'

The estimated number of rows is 896.773, consistent with the histogram information of the statistical information. In fact, it is calculated based on the histogram. If the estimated number of rows is inaccurate, it must be because the statistical information does not have the correct histogram information. Therefore, the statistical information must be updated. Sometimes, even if the statistical information is updated, the results are the same. Because the data volume is too large and the estimated data is incomplete, You can see Rows Sampled. Therefore, you can also use the full table row count statistics when updating the statistical information, however, scanning table data also consumes performance. Even so, there may be some differences, because the maximum step size of the histogram is 200, the same and different data columns are too large, and the 200 segment distribution is also uneven, you cannot use more detailed data histograms.

Update settings of statistics:

Sqlserver automatically maintains statistics by default. You can set the options for automatically creating and updating statistics at the database level.

SET the script as follows: alter database [databaseName] SET AUTO_CREATE_STATISTICS on with NO_WAITALTER DATABASE [databaseName] SET AUTO_UPDATE_STATISTICS on with NO_WAITALTER DATABASE [databaseName] SET login on with NO_WAIT

AUTO_CREATE_STATISTICS:

When AUTO_CREATE_STATISTICS = ON is used as a condition query, the system automatically creates a single column of statistics for each condition column. When an index is created, corresponding statistics are automatically created. When the query optimizer uses the AUTO_CREATE_STATISTICS option to create statistics, the statistics name starts with _ WA.

AUTO_UPDATE_STATISTICS:

When AUTO_UPDATE_STATISTICS = ON, the query optimizer determines when the statistics may expire and then updates them when the query uses these statistics. Statistics will expire after the data distribution in the table or index view is changed by the insert, update, delete, or merge operations. The query optimizer calculates the number of data modifications after the last statistical information update and compares the number of modifications with a threshold to determine when the statistical information may expire. This threshold is based on the number of rows in the table or index view. The query optimizer checks for expired statistics before compiling and querying and executing cache query plans. Before compiling a query, the query optimizer uses the columns, tables, and index views in the query predicates to determine which statistics may expire. Before executing a cache query plan, the database engine confirms that the query plan references the latest statistics.

AUTO_UPDATE_STATISTICS_ASYNC:

The asynchronous Statistics Update option AUTO_UPDATE_STATISTICS_ASYNC determines whether the query optimizer uses synchronous statistics updates or asynchronous statistics updates. By default, the asynchronous Statistics Update option is disabled, and the query optimizer updates statistics synchronously. The AUTO_UPDATE_STATISTICS_ASYNC option is applicable to the statistic object created for the index, the single column in the query predicate, and the STATISTICS created using the create statistics statement. Statistics update can be synchronous (default) or asynchronous. For synchronous statistics updates, the query will always be compiled and executed with the latest statistics. When the statistics expire, the query optimizer will wait for the updated statistics before compiling and executing the query. For Asynchronous Statistics update, the query will be compiled with the existing statistics even if the existing statistics have expired. If the statistics expire during query compilation, the query optimizer can select a non-optimal query plan. After the asynchronous update is complete, compiled queries will benefit from the use of updated statistics.

Automatic maintenance and update of statistics:

Sqlserver sets AUTO_UPDATE_STATISTICS = ON to automatically maintain statistics. sqlserver will automatically update the statistics in the table when it meets certain conditions. We can see that the rowmodctr column in the system table sysindexes records the total number of inserted, deleted, and updated rows since the last update of statistics. The system automatically updates statistics that meet the requirements for updating statistics.

SELECT name, rows, rowmodctrFROMsys. sysindexes

Automatically update statistical rules:

• The row range in the table is rows = 0 rows increasing rows> 0 rows;

• The row range in the table is 0 <rows <500 rows, as long as the number of changes rowmodctr> 500;

• The row range in the table is rows> 500 rows, as long as the number of changes rowmodctr> 500 + 20% rows;

• Number of rows in the temporary table rows <6, as long as the number of changes rowmodctr> 6;

You need to manually update the statistics:

The query takes a long time to run.

The insert operation is performed on the ascending or descending key columns.

After the maintenance operation.

-- Create test table test (id int identity (), name char (20), value numeric (), meno varchar (50 )) create clustered index IX_test on test (name) alter table test add constraint PK_test primary key nonclustered (id) -- use [dbo]. the [test] Table is used as an example. First, check select I. name, rows, rowmodctr, stats_date (s. object_id, s. stats_id) AS update_datefrom sys. sysindexes I inner join sys. stats s on I. name = s. namewhere s. object_id = OBJECT_ID (' [Dbo]. [test] ') -- view the histograms of the two indexes at this time, and there is no DBCC SHOW_STATISTICS (' [dbo]. [test] ', 'ix _ test') DBCC SHOW_STATISTICS (' [dbo]. [test] ', 'pk _ test') -- insert one row of data. Is the statistics not updated? Insert into test (name, value, meno) select 'name', 0, 'meno'

-- 506*2 rows are added and deleted. No insert into test (name, value, meno) select 'name', 0, 'meno' go 500 delete from test is generated for statistics.

-- Retest: Delete the table again. Each field is searched once before data is inserted. Statistics are automatically generated for non-indexed fields. select * from test where id = 1 select * from test where name = ''select * from test where value = 0 select * from test where meno =''

-- Insert a row of data. Otherwise, the insert into test (name, value, meno) select 'name', 0, 'meno' go -- the preceding query does not automatically update statistics, update test set name = 'name' where name = 'name' update test set value = 0 where value = 0 update test set meno = 'meno' where id = 1 delete from test where meno = 'meno'

-- Re-insert the data, prepare the test to use insert into test (name, value, meno) select 'name', 0, 'meno' go -- view the statistics select I. name, rows, rowmodctr, stats_date (s. object_id, s. stats_id) AS update_datefrom sys. sysindexes I inner join sys. stats s on I. name = s. namewhere s. object_id = OBJECT_ID ('[dbo]. [test] ')

-- When I update the index statistics to row rowmodctr = 500, the statistics do not update test set name = 'name' where name = 'name' go 497.

-- Z is updated again to make the Index update accumulate rowmodctr = 501 rows update test set name = 'name' where name = 'name' go

The results showed that the index statistics were updated, and rowmodctr was reset to 1 row. The value is updated 501 times in the same way. Non-key columns are not updated! Update test set value = 0 where value = 0

That is, this condition is met: the row range in the table is 0 <rows <500 rows, as long as the number of changes rowmodctr> 500; -- insert data to 501 rows insert into test (name, value, meno) select 'name', 0, 'meno' go 500

When data exceeds 500 rows and reaches 501 rows, rowmodctr exceeds 500 rows and no index statistics are updated. -- Currently, 501 of the 20% rows of data is updated, and the statistics are not updated. With tab as (select top 20 percent * from test) update tab set name = 'name'

Currently, the number of rows is 602. Theoretically, more than 601.2 (501 + 501*0.2) rows will be updated. Currently, this is an update. If the statistics are automatically updated

As you can see, it is actually updated! Therefore, this condition is met: the row range in the table is rows> 500, as long as the number of changes rowmodctr> 500 + 20% rows;

The update of statistics is summarized as follows:

• Range of rows in the Table: rows = 0 rows increasing rows> 0 rows (no update during insertion, update and delete rows );

• The row range in the table is 0 <rows <500 rows, as long as the number of changes rowmodctr> 500;

• The row range in the table is rows> 500 rows, as long as the number of changes rowmodctr> 500 + 20% rows;

• Number of rows in the temporary table rows <6, as long as the number of changes rowmodctr> 6 (not tested );

The same name statistics are automatically generated when an index is created.

Non-index columns automatically generate statistics when the table has data for the first time as a condition Query

Statistics are not automatically updated for how much data is inserted to the table for the first time.

Rows without key columns are always not updated (because data is not stored)

Statistics related operations:

-- View index-only statistics. Update Time: SELECT name AS index_name, STATS_DATE (object_id, index_id) AS update_dateFROM sys. indexes WHERE object_id = OBJECT_ID ('[Sales]. [SalesOrderDetail] '); -- view all statistics. Update Time: select s. name, STATS_DATE (s. object_id, stats_id) AS update_datefrom sys. stats s WHERE s. object_id = OBJECT_ID ('[Sales]. [SalesOrderDetail] '); -- view all statistics. Update Time: exec sp_helpstats n' [Sales]. [SalesOrderDetail] ', 'all' GO -- CREATE statistical information create statistics [_ wa_user_000000000000001] ON [Sales]. [SalesOrderDetail] (ProductID, SalesOrderDetailID) -- view a statistical information DBCC SHOW_STATISTICS ('[Sales]. [SalesOrderDetail] ',' _ wa_user_0000000000000000001 ') -- updates one statistical item update statistics [Sales]. [SalesOrderDetail] [_ wa_user_0000000000000000001] with fullscan -- UPDATE all STATISTICS of a table update statistics [Sales]. [SalesOrderDetail] -- updates all available statistics in the database EXEC sys. sp_updatestats -- delete statistics drop statistics [Sales]. [SalesOrderDetail]. [_ wa_user_0000000000000000001]

Refer:

Statistics (SQL Server 2014 ):

Http://msdn.microsoft.com/zh-cn/library/ms190397 (v = SQL .120). aspx

DBCC SHOW_STATISTICS (Transact-SQL) (SQL Server 2014 ):

Http://msdn.microsoft.com/zh-cn/library/ms174384.aspx

Update statistics (Transact-SQL) (SQL Server 2014 ):

Http://msdn.microsoft.com/zh-cn/library/ms187348.aspx

Microsoft sqlserver enterprise platform management practices

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Understanding SQL Server statistics and SQL statistics

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Understanding SQL Server statistics and SQL statistics

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support