Introduction
The SQL Server Query analyzer is overhead-based. Generally, the query analyzer determines how to select an efficient query route based on the predicate, such as the index to be selected. When the analyzer looks for a path, it does not count the number of rows and ranges of values contained in the index every time, instead, the information is created and updated based on certain conditions and saved to the database. This is the so-called statistical information.
How to view statistics
It is very simple to view SQL server statistics. Run the following command:
DBCC show_statistics ('table name', 'index name ')
The result 1 is displayed.
Figure 1. Statistics
How statistics affect queries
The following is a simple example to show how Statistics affect the query analyzer. I have created a test table with two int-value columns, where ID is auto-incrementing, ref is used to create non-clustered indexes, and 100 pieces of data are inserted, from 1 to 100, insert 9900 data records equal to 100. The statistical information in Figure 1 is the statistical information of the sample data.
Now, I use the ref value after where as the query condition, but given different values, we can see that the query Analyzer made different choices based on the statistical information, as shown in figure 2.
Figure 2. The query optimizer makes different choices based on different predicates
In fact, for the query analyzer, the bar chart is very useful for directly identifiable predicates, such:
Where date = getdate ()
Where id = 12345.
Where monthly_sales: <10000/12
Where name like "careyson" + "%"
However,
Where price = @ VARI
Where total_sales> (select sum (qty) from sales)
Where a. ID = B. ref_id
Where col1 = 1 and col2 = 2
This type of query can only know the value at runtime, and the sampling step is obviously not so easy to use. In addition, if the predicate in the fourth row is two query conditions, the sampling step is not easy to use. No matter how many columns the index has, the sampling step only stores the first column of the index. When the bar chart is no longer used, SQL server uses density to determine the optimal query route.
The formula for density is: 1/The number of unique values in the table. The smaller the density, the easier the index to be selected. For the second table in ratio 1, we can use the following formula to calculate the density:
Figure 3. density of a column
According to the formula, when the data volume in the table increases gradually, the density will decrease.
For queries that cannot be selected based on the sampling step, the query analyzer uses density to estimate the number of rows. The formula is: Estimated number of rows = number of rows in the table * density.
Then, based on this formula, if I make a query, the estimated number of rows will be 4.
Figure 4. Estimated number of rows
Let's verify this conclusion, as shown in Figure 5.
Figure 5. Estimated number of rows
Therefore, we can see that the estimated number of rows is different from the actual number of rows. when the data is evenly distributed or the data volume is large, this error will be very small.
Update statistics
As shown in the preceding example, because the query analyzer depends on statistics for queries, outdated statistics may lead to inefficient queries. Statistics can be managed by SQL Server, updated manually, or updated manually when updated.
When automatic update is enabled, data changes in the SQL Server monitoring table are automatically updated when the critical value is reached. This standard is:
- When inserting data into an empty table
- The number of tables with less than 500 rows increases by 500 rows or more.
- When the number of rows in a table exceeds 500, the amount of data changes is greater than 20%.
If the preceding conditions are met, the statistics are updated.
Of course, you can also use the following statement to manually update statistics.
Update statistics table name [index name]
Column-level statistics
SQL Server can also create statistics for columns that do not belong to any index to help the query analyzer obtain the "Estimated number of rows". when the database-level option "auto create statistics" 6 is enabled.
Figure 6. automatically create statistics
When this option is set to true, when the where predicate specifies a column not on any index, the column statistics will be created, but there are two exceptions:
- The cost of creating statistics exceeds the cost of generating a query plan
- Statistics are not automatically generated when SQL Server is busy.
You can view these statistics in the System View SYS. Stats, as shown in figure 7.
Figure 7. View statistics in the System View
You can also use the following statement to manually create statistics:
Create Statistics statistical name on table name (column name [,... n])
Summary
This article briefly introduces the impact of statistics on the selection of query paths. Outdated statistics can easily reduce query performance. Therefore, regular updating of statistics is one of the important tasks of DBA.
From: http://www.cnblogs.com/CareySon/archive/2012/05/14/HowStatisticImpactQuery.html Introduction
The SQL Server Query analyzer is overhead-based. Generally, the query analyzer determines how to select an efficient query route based on the predicate, such as the index to be selected. When the analyzer looks for a path, it does not count the number of rows and ranges of values contained in the index every time, instead, the information is created and updated based on certain conditions and saved to the database. This is the so-called statistical information.
How to view statistics
It is very simple to view SQL server statistics. Run the following command:
DBCC show_statistics ('table name', 'index name ')
The result 1 is displayed.
Figure 1. Statistics
How statistics affect queries
The following is a simple example to show how Statistics affect the query analyzer. I have created a test table with two int-value columns, where ID is auto-incrementing, ref is used to create non-clustered indexes, and 100 pieces of data are inserted, from 1 to 100, insert 9900 data records equal to 100. The statistical information in Figure 1 is the statistical information of the sample data.
Now, I use the ref value after where as the query condition, but given different values, we can see that the query Analyzer made different choices based on the statistical information, as shown in figure 2.
Figure 2. The query optimizer makes different choices based on different predicates
In fact, for the query analyzer, the bar chart is very useful for directly identifiable predicates, such:
Where date = getdate ()
Where id = 12345.
Where monthly_sales: <10000/12
Where name like "careyson" + "%"
However,
Where price = @ VARI
Where total_sales> (select sum (qty) from sales)
Where a. ID = B. ref_id
Where col1 = 1 and col2 = 2
This type of query can only know the value at runtime, and the sampling step is obviously not so easy to use. In addition, if the predicate in the fourth row is two query conditions, the sampling step is not easy to use. No matter how many columns the index has, the sampling step only stores the first column of the index. When the bar chart is no longer used, SQL server uses density to determine the optimal query route.
The formula for density is: 1/The number of unique values in the table. The smaller the density, the easier the index to be selected. For the second table in ratio 1, we can use the following formula to calculate the density:
Figure 3. density of a column
According to the formula, when the data volume in the table increases gradually, the density will decrease.
For queries that cannot be selected based on the sampling step, the query analyzer uses density to estimate the number of rows. The formula is: Estimated number of rows = number of rows in the table * density.
Then, based on this formula, if I make a query, the estimated number of rows will be 4.
Figure 4. Estimated number of rows
Let's verify this conclusion, as shown in Figure 5.
Figure 5. Estimated number of rows
Therefore, we can see that the estimated number of rows is different from the actual number of rows. when the data is evenly distributed or the data volume is large, this error will be very small.
Update statistics
As shown in the preceding example, because the query analyzer depends on statistics for queries, outdated statistics may lead to inefficient queries. Statistics can be managed by SQL Server, updated manually, or updated manually when updated.
When automatic update is enabled, data changes in the SQL Server monitoring table are automatically updated when the critical value is reached. This standard is:
- When inserting data into an empty table
- The number of tables with less than 500 rows increases by 500 rows or more.
- When the number of rows in a table exceeds 500, the amount of data changes is greater than 20%.
If the preceding conditions are met, the statistics are updated.
Of course, you can also use the following statement to manually update statistics.
Update statistics table name [index name]
Column-level statistics
SQL Server can also create statistics for columns that do not belong to any index to help the query analyzer obtain the "Estimated number of rows". when the database-level option "auto create statistics" 6 is enabled.
Figure 6. automatically create statistics
When this option is set to true, when the where predicate specifies a column not on any index, the column statistics will be created, but there are two exceptions:
- The cost of creating statistics exceeds the cost of generating a query plan
- Statistics are not automatically generated when SQL Server is busy.
You can view these statistics in the System View SYS. Stats, as shown in figure 7.
Figure 7. View statistics in the System View
You can also use the following statement to manually create statistics:
Create Statistics statistical name on table name (column name [,... n])
Summary
This article briefly introduces the impact of statistics on the selection of query paths. Outdated statistics can easily reduce query performance. Therefore, regular updating of statistics is one of the important tasks of DBA.