Welcome to the 4th month of the Performance Tuning Training. This month is all about statistics in SQL Server and how they help the query optimizer generate a good-enough execution plan. Statistics are used by the query optimizer to estimate how many rows a query will return. It is just an estimate, nothing more.
Statistical Information Overview
SQL Server uses a so-called histogram in the statistics object, which describes the data distribution of a given column in a limited number of steps. One of the biggest limitations is that a histogram in SQL Server is restricted to a maximum of 200 steps. You can work around this limit with filtered statistics, which were introduced in SQL Server 2008.
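If you want to look at such a histogram yourself, you can dump it with DBCC SHOW_STATISTICS. A minimal sketch; the table and statistics names (Table1, idx_Table1_Column2, matching the demo script used in this article) are placeholders you would replace with your own:

```sql
-- Dump the histogram of a statistics object
-- (object and statistics names are placeholders)
DBCC SHOW_STATISTICS ('Table1', 'idx_Table1_Column2') WITH HISTOGRAM
GO

-- List all statistics objects on a table, including filtered ones
SELECT name, auto_created, has_filter, filter_definition
FROM sys.stats
WHERE object_id = OBJECT_ID('Table1')
GO
```

Each histogram row (step) stores the upper boundary value of the step plus counters such as EQ_ROWS and RANGE_ROWS, from which the optimizer derives its row estimates.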
Another limitation is the automatic update mechanism for statistics: for tables with more than 500 rows, statistics are updated only after 20% of the rows plus 500 additional row changes have occurred. This means that the larger your table grows, the less frequently your statistics are updated automatically (each automatic update requires more record modifications to trigger).
Suppose you have a table with 100,000 records. In that case the statistics are updated automatically once you have modified 20,500 rows (20% + 500). With a table of 1 million records you need to change 200,500 rows (20% + 500) before the statistics are updated automatically. The threshold therefore grows linearly with the table size, so very large tables need an enormous number of modifications before an update is triggered. Trace flag 2371 in SQL Server changes this behavior by using a dynamic, much lower threshold for large tables.
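The thresholds mentioned above are easy to verify with a bit of T-SQL arithmetic, and enabling trace flag 2371 server-wide is a single DBCC call (a sketch, to be run at your own discretion):

```sql
-- Classic auto-update threshold: 20% of the table's rows + 500 changes
SELECT table_rows,
       table_rows * 0.20 + 500 AS changes_needed
FROM (VALUES (1500), (100000), (1000000)) AS t(table_rows)
-- 1,500     ->     800
-- 100,000   ->  20,500
-- 1,000,000 -> 200,500
GO

-- Trace flag 2371 (available since SQL Server 2008 R2 SP1) makes the
-- threshold dynamic, so large tables update statistics far earlier
DBCC TRACEON (2371, -1)
GO
```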
This behavior can become a huge problem when your execution plan contains a Bookmark Lookup operator. As you know, the query optimizer chooses a Bookmark Lookup when, based on the current statistics, the estimated number of rows for a query is very small. If your statistics are stale but the cached execution plan is still considered valid, SQL Server blindly reuses the cached plan and your page reads explode. Let's look at a concrete example of this problem.
Stale Statistics
The following script creates a table with 1500 records and an even data distribution in the column Column2. In addition, we define a nonclustered index on Column2.
```sql
CREATE TABLE Table1
(
    Column1 INT IDENTITY,
    Column2 INT
)
GO

-- Insert 1500 records into Table1
SELECT TOP 1500 IDENTITY(INT, 1, 1) AS N INTO #Nums
FROM master.dbo.syscolumns SC1

INSERT INTO Table1 (Column2)
SELECT N FROM #Nums

DROP TABLE #Nums
GO

CREATE NONCLUSTERED INDEX idx_Table1_Column2 ON Table1(Column2)
GO
```
When you run a simple SELECT * query against the table, you get an execution plan with a Bookmark Lookup operator:
```sql
SELECT * FROM Table1 WHERE Column2 = 2
```
From the Index Seek (NonClustered) operator you can see that SQL Server estimated 1 row (the Estimated Number of Rows property), and in fact SQL Server also processed 1 row (the Actual Number of Rows property). This means the statistics we are using here are accurate, and the query itself produces 3 logical reads.
Our table currently has 1500 records, so SQL Server updates the statistics on the nonclustered index automatically once 20% + 500 records have changed. Doing the math, we need to modify 800 records (1500 * 20% + 500).
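If you want to watch the modification counter approach that threshold, you can query sys.dm_db_stats_properties (available from SQL Server 2008 R2 SP2 and SQL Server 2012 SP1 onward). A sketch against the demo table:

```sql
-- How many modifications has SQL Server tracked since the
-- statistics on Table1 were last updated?
SELECT s.name,
       sp.rows,
       sp.modification_counter,
       sp.last_updated
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('Table1')
GO
```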
Next we play a little trick on SQL Server: we insert only 799 new records, just below the threshold. Every one of those 799 records, however, has a Column2 value of 2, which completely changes the data distribution in Column2. The statistics will still assume that only 1 record is returned for a Column2 value of 2, while in reality 800 records are returned (1 existing, 799 newly inserted):
```sql
SELECT TOP 799 IDENTITY(INT, 1, 1) AS N INTO #Nums
FROM master.dbo.syscolumns SC1

INSERT INTO Table1 (Column2)
SELECT 2 FROM #Nums

DROP TABLE #Nums
GO
```
Now let's execute the following query, which looks for the records with a Column2 value of 2, with the execution plan display and IO statistics turned on.
```sql
SET STATISTICS IO ON
GO

SELECT * FROM Table1 WHERE Column2 = 2
GO
```
SQL Server reuses the cached execution plan with the Bookmark Lookup. This means the Bookmark Lookup in the execution plan is executed 800 times, once for every returned record! That consumes a lot of logical reads: SQL Server reports 806 page reads here.
As you can see, the Actual Number of Rows is now far above the Estimated Number of Rows.
This is the kind of problem that stale statistics can cause in SQL Server.
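One straightforward way out of this situation is to update the statistics manually, which also triggers a recompilation of the affected cached plan on its next execution. A sketch:

```sql
-- Rebuild the statistics from every row instead of a sample
UPDATE STATISTICS Table1 WITH FULLSCAN
GO

-- The next execution compiles a fresh plan based on the corrected
-- estimate (800 rows instead of 1)
SELECT * FROM Table1 WHERE Column2 = 2
GO
```

With an accurate estimate of 800 rows, the optimizer abandons the Bookmark Lookup in favor of a scan, and the logical reads drop accordingly.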
Summary
In today's installment of the Performance Tuning Training I gave you a brief introduction to statistics in SQL Server. As you have seen, stale statistics in combination with cached, reused execution plans can lead to serious performance problems.
I hope you now have a good understanding of statistics in SQL Server and of the side effects stale statistics have on your execution plans. Next week I'll discuss statistics further and look at what they are like inside SQL Server. Stay tuned!
Week 13 of 24: Statistics