Selection of SQL Server statistics Maintenance policy
Problem Description:
After archiving a tens of millions of tables in an OLTP system, the business application receives a timeout alert when the source table data is deleted in batches, as follows:
V1.1.1.1: ****process-querytransactionfor****: 23075129Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. This failure occured and attempting to connect to the Principle server.
Query the current active process and find a very slow Statman query:
Select Statman ([SC0], [SC1], [SB0000]) from (select TOP of PERCENT [SC0], [SC1], step_direction ([SC0]) over (order by NUL L) as [SB0000] from (SELECT [TransactionID] as [Sc0],[id] as [SC1] from [dbo].[ Product] tablesample SYSTEM (8.340078e-001 PERCENT) with (readuncommitted)) as Ms_updstats_tbl_helper ORDER by [SC0], [SC 1], [SB0000]) asms_updstats_tbl OPTION (MAXDOP 1)
This is a statistical information maintenance task, to look at all the statistics of the table.
EXEC sp_autostats ' dbo. Product '; GO
The result set is displayed in descending order of time, as follows:
[ix_product_transactinid] on 2015-11-12 14:15:14 [ix_product_createtime] on 2015-11-12 10:17:50 [IX_Product_Number] ON 2015-10-23 12:10:51 [pk_product] on 2015-08-14 20:03:41 [_wa_sys_0000000e_693ca210] on 2015-07-03 10:39:36 [_wa_sys_00000025_693ca210] on 2014-12-05 16:22:20 [_WA_Sys_0000002A_693CA210] on 2014-12-05 14:54:53 [_wa_sys_0000000b_ 693ca210] on 2014-07-01 10:52:54 [_ Wa_sys_00000018_693ca210] &nBsp; on 2013-01-24 02:16:11 [_wa_sys_00000023_ 693ca210] on 2012-12-20 13:17:27 [_ wa_sys_00000026_693ca210] on 2012-12-20 13:17:26 [_wa_sys_00000004_693ca210] on 2012-12-20 13:17:25 [_WA_Sys_00000006_693CA210] ON 2012-12-20 13:17:24 [_wa_sys_00000022_693ca210] on 2012-12-20 13:17:23 [_WA_Sys_0000001B_693CA210] on 2012-12-20 13:17:22 [_wa_sys_0000001d_ 693ca210] on 2012-12-20 13:17:21 [_ Wa_sys_0000000f_693ca210] &nBsp; on 2012-12-20 13:17:20 [_wa_sys_00000013_ 693ca210] on 2012-12-20 13:17:18
See the index [Ix_product_transactinid] at the current point in time to update the statistics.
While monitoring the Statman process, we also see that a query that causes business alarms is extremely slow.
SELECT P.[id], P.[name], P.[price], P.[amount], P.[transactionid], t.[action] , T.[transactionstatus], T.[transactionresult] from [Product] as P left joins [Transactioninfo] as T on P.[TR Ansactionid] = T.[id] WHERE [TransactionID] = @transactionId
It is clear that the data for the Product table is deleted, so that the [TransactionID] column index [Ix_product_transactinid] reaches the threshold to update statistics, triggering index maintenance.
Problem handling:
In order for the problem query to be restored as soon as possible, I created a new [TransactionID] column index, disabled [Ix_product_transactinid], and the issue was temporarily resolved.
Principle Analysis:
Statistical Information Maintenance Strategy
When SQL Server needs to estimate the complexity of an operation, it must try to find the appropriate statistics to support it. The database administrator cannot predict what operations SQL Server will run, so it is not possible to estimate what kind of statistics SQL Server might need. It would be a very complicated project to build and maintain statistical information by manpower. Fortunately, SQL Server is not designed like this. In the vast majority of cases, SQL Server itself will maintain and update statistics very well, the user basically does not feel, the database administrator does not have the additional burden.
This is mainly because there are two default open settings in the SQL Server database properties: Auto Create Statistics and auto Update Statistics. They enable SQL Server to automatically create the statistics that are used when needed, as well as to
When the current statistics are out of date, automatically update it.
Under what circumstances does SQL Server create statistical information? There are 3 main types of cases:
1. When the index is created, SQL Server automatically creates statistics on the column where the index is located
So in some ways, the role of the index is twofold, and it can help SQL Server find the data quickly. And the statistics above it can also tell the distribution of SQL Server data.
2. Administrators can also manually create statistics that he deems necessary through statements such as create statistics
If you open auto create Statistics, you rarely need to create them manually.
3. When SQL Server wants to use statistics on some columns and finds no, "auto Create Statistics" causes SQL Server to automatically create statistics.
For example, when a statement is to filter on a (or some) field, or to make a connection (join) with another table, SQL Server estimates how many records will be returned from this table, requiring a statistical support. If not, SQL Server automatically creates one.
We can try on the salesorderheader_test.
Sp_helpstats salesorderheader_test Go--Returns the table without statistics (except on the index) SELECT COUNT (*) from dbo. salesorderheader_test where OrderDate = ' 2004-06-11 00:00:00.000 ' Go-run a query with filters on OrderDate sp_helpstats S Alesorderheader_test Go--Returns the table already has a new statistic Statistics_name Statistics_keys----------------------- ----------------------------------_wa_sys_00000003_1a34df26 OrderDate
Therefore, on the database where auto Create statistics is turned on, it is generally not necessary to worry that SQL Server does not have enough statistics to select the execution plan, which is entirely left to SQL Server management.
SQL Server not only to establish appropriate statistical information, but also to update them in a timely manner, so that they can reflect the data in the table changes, data insertion, deletion, modification may cause statistical information updates. However, updating the statistics itself is a resource-intensive thing, especially for larger tables. If there is a little bit of change in SQL Server to update the statistics, perhaps SQL Server will have to work this out, too late to do anything else. SQL Server is also a balance between the accuracy of statistics and the rational consumption of resources. The conditions for automatic update of the triggering statistics are:
1. If the statistic is defined on a normal table, the statistics are considered obsolete when one of the following changes occurs, and an update action is automatically triggered the next time it is used.
(1) The table has no data to be greater than or equal to 1 data.
(2) for a table with a data volume less than 500 rows, when the first field data of the statistics cumulative change is greater than 500.
(3) For a table with a data volume greater than 500 rows, when the first field data of the statistic has a cumulative change of more than + (20%x table data total).
So for larger tables, only 1/5 of the data has changed, and SQL Server will not be going to re-calculate the statistics.
2. Statistics can be found on temporary tables (temp table), and their maintenance policies are basically consistent with normal tables, but statistics cannot be built on table variables (tables Variable)
Such a maintenance strategy can guarantee a small cost and ensure that the statistics are basically correct. This case, reflecting this maintenance strategy in the data distribution of a special table, may also cause some negative effects, just a regular manual (or do a task) to update the statistics of the table.
After SQL Server 2005, the database properties are one more "Auto Update Statistics asynchronously". When SQL Server discovers that a statistic is out of date, it will continue to compile the query with the old statistics, but it will start a task in the background and update this statistic. The next time the statistic is used, it is already an updated version. The disadvantage of this is that there is no guarantee that the current execution plan of this query accuracy, everything has advantages and disadvantages, the database administrator can choose according to the actual situation.
Of course, there are some exceptions. Due to the particularity of the data, the SQL Server Auto Update statistics algorithm does not meet the need to ensure the accuracy of the execution plan, and in practice, the performance of the database may suddenly slow down.
Experienced administrators schedule an index rebuild task, which is often useful for performance. It is often explained that because index rebuilds eliminate data fragmentation and improve performance, index rebuilding is another important task, and it uses full scan to re-update the statistics on the table, making the statistics very accurate. This can be very useful for performance.
Trace flag 2371
SQL Server R2 SP1 Describes a very good feature that can modify the automatic statistics algorithm (auto stats algorithm), from the default 20%+500 line to a range value (sliding scale). This property can be opened only by turning on trace flag 2371, not by default.
If you press the automatic statistics algorithm everything works well, there is no performance problem to worry about. Now just take into account that the table cardinality is too large like a threshold value of 50000 rows or 1000000. In this case, for high-cardinality tables, these thresholds may not be good enough. For example, I have a table with a table cardinality of 50000. I inserted a new line of 20% (10000). Based on the previous standard thresholds, these 20% of newly inserted rows do not meet the triggering auto-update statistics. This may be one of the reasons for the performance problem in your situation. If you face the same problem, you don't need to worry because Microsoft provides trace flag 2371. By using this tag, SQL Server determines dynamic thresholds for automatically updating statistics on tables that have more than 25000 rows. For automatic Update statistics, a higher row cardinality will use a lower threshold value.
2371:sql automatically modifies the thresholds for statistics updates as needed, rather than following the default algorithm
http://www.sqlservergeeks.com/sql-server-trace-flag-2371/
Policy selection
The trace flag is turned on on a large database, obviously to use automatic statistics, so you need to turn on automatic statistics. In addition, we need to open "Auto Update Statistics asynchronously" for a database with a super large table. The reason for turning on the asynchronous Update Statistics feature is that you do not want to see the query time-out when automatic statistics is triggered.
We know that automatic statistics update STATISTICS when the actual number of rows in the table changes. When the Super large table triggers an automatic statistics run, if the update STATISTICS command is run for more than 30 seconds, the query statement that triggers the automatic statistics times out, causing the transaction to roll back, which means that the automatic statistics command is also rolled back. Therefore, the next query also triggers an automatic statistics update, and this process repeats the loop.
You'll see the query time-out randomly in SQL Server, even if the execution plan looks overall normal. You will also see a large amount of IO generated on the disk where the database is stored, because automatic statistics makes a large number of table queries, and automatic statistics are constantly circulating.
When auto-update statistics is turned on on the database, the query does not wait for the UPDATE STATISTICS command to complete when automatic statistics is triggered by SQL Server. The UPDATE STATISTICS command runs in the background to keep the query running. Now the query will run with the old and possibly available statistics, in which case they are available 2 seconds, so it's no big deal if they're used for a few seconds.
It is not recommended to turn on the asynchronous update STATISTICS setting on each database. All small databases will update statistics well in the timeout period.
For tables that need to turn on asynchronous update statistics, you can periodically update synchronization information manually.
Setting up Asynchronous Update statistics
SELECT name, is_auto_update_stats_on, is_auto_update_stats_async_on from Sys.databasesalter DATABASE yourdbname SET Auto_update_statistics_async onalter DATABASE yourdbname SET auto_update_statistics on
Turn on trace flag 2371
Use mastergo--The following example opens trace flag 2371 globally. DBCC TRACEON (2371, -1) godbcc tracestatus (2371,-1) GO
Other operation references are as follows:
UPDATE STATISTICS
Https://msdn.microsoft.com/zh-cn/library/ms187348%28v=sql.105%29.aspx
sp_updatestats
Https://msdn.microsoft.com/zh-cn/library/ms173804%28v=sql.105%29.aspx
sp_autostats
Https://msdn.microsoft.com/en-us/library/ms188775.aspx
DBCC show_statistics
Https://msdn.microsoft.com/en-us/library/ms174384.aspx
stats_date
Https://msdn.microsoft.com/zh-cn/library/ms190330%28v=sql.105%29.aspx
Subsequent processing:
Modify statistics update policy to Auto_update_statistics_async
Perform archive Delete
Perform Update archive table statistics
To modify the statistics update policy to synchronize updates
Turn on trace flag 2371
Given the real-time nature of the library and the high demand for business accuracy, after the archive deletion is complete, change the policy to synchronize the update and turn on trace flag 2371 to let SQL Server dynamically determine the smaller update threshold value.
If the library does not have this high requirement, consider setting up an asynchronous update for a library with a super large table, updating the table statistics after the archive is deleted, and creating a maintenance job to update the table statistics periodically.
This article is from the SQL Server deep dives blog, so be sure to keep this source http://ultrasql.blog.51cto.com/9591438/1712455
Selection of SQL Server statistics Maintenance policy