SQL Server statistics impact on queries

Introduction

The SQL Server query optimizer is cost-based. In general, the optimizer chooses an efficient execution path based on the predicates in the query, for example which index to use. When evaluating candidate paths, it does not scan the index every time to count the rows and value ranges it contains; instead, this information is created and updated under certain conditions and saved in the database. This is what is known as statistics.

 

How to view statistics

Viewing SQL Server statistics is simple. Run the following command:

DBCC SHOW_STATISTICS ('table_name', 'index_name')

 

The result is shown in Figure 1.

Figure 1. Statistics

 

How statistics affect queries

The following simple example shows how statistics affect the query optimizer. I created a test table with two int columns: id, which is auto-incrementing, and ref, on which a non-clustered index is built. I first inserted 100 rows with ref values from 1 to 100, and then inserted another 9,900 rows with ref equal to 100. The statistics in Figure 1 were gathered from this sample data.
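The setup described above can be sketched as follows; the table and index names are illustrative, not taken from the original article.

```sql
-- Test table: id is auto-incrementing, ref carries a skewed distribution.
CREATE TABLE dbo.test
(
    id  INT IDENTITY(1,1) PRIMARY KEY,
    ref INT NOT NULL
);

CREATE NONCLUSTERED INDEX ix_test_ref ON dbo.test (ref);

-- 100 rows with ref = 1 .. 100.
DECLARE @i INT = 1;
WHILE @i <= 100
BEGIN
    INSERT INTO dbo.test (ref) VALUES (@i);
    SET @i = @i + 1;
END

-- 9,900 more rows, all with ref = 100, to skew the distribution.
INSERT INTO dbo.test (ref)
SELECT TOP (9900) 100
FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;
```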

Now I use ref in the WHERE clause as the query condition. Given different values, the query optimizer makes different choices based on the statistics, as shown in Figure 2.

Figure 2. The query optimizer makes different choices based on different predicates

 

In fact, the histogram is useful to the query optimizer only for predicates whose values it can evaluate directly, such as:

WHERE date = GETDATE()
WHERE id = 12345
WHERE monthly_sales < 10000 / 12
WHERE name LIKE 'careyson' + '%'

 

However, for predicates such as the following:

WHERE price = @vari
WHERE total_sales > (SELECT SUM(qty) FROM sales)
WHERE a.id = b.ref_id

WHERE col1 = 1 AND col2 = 2

the values are known only at runtime, so the histogram steps are of little use. The fourth predicate, which combines two columns, is a further case the histogram cannot handle: no matter how many columns an index contains, the histogram stores only the leading column of the index. When the histogram cannot be used, SQL Server falls back on density to choose the query path.

The formula for density is: 1 / the number of distinct values in the column. The smaller the density, the more selective the index and the more likely it is to be chosen. For the second result set in Figure 1, the density can be computed with the following formula:

Figure 3. density of a column

 

According to the formula, as the number of distinct values in the table grows, the density decreases.

For queries that cannot be estimated from the histogram steps, the query optimizer uses density to estimate the number of rows. The formula is: estimated number of rows = number of rows in the table × density.
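As a sketch, the same estimate can be reproduced by hand; dbo.test is the hypothetical test table from earlier, not a name from the article.

```sql
-- Density = 1 / number of distinct values in the column;
-- estimated rows = total rows * density.
SELECT
    1.0 / COUNT(DISTINCT ref)              AS density,
    COUNT(*) * (1.0 / COUNT(DISTINCT ref)) AS estimated_rows
FROM dbo.test;
```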

Then, based on this formula, the estimated number of rows for my query comes out to 4.

Figure 4. Estimated number of rows

 

Let's verify this conclusion, as shown in Figure 5.

Figure 5. Estimated number of rows

 

As we can see, the estimated number of rows differs from the actual number of rows. When the data is evenly distributed, or when the table is large, this error becomes very small.

 

Update statistics

As the preceding example shows, the query optimizer depends on statistics, so outdated statistics can lead to inefficient query plans. Statistics can be updated automatically by SQL Server or updated manually.

When automatic updating is enabled, SQL Server monitors data changes in the table and updates its statistics automatically once a threshold is reached. The thresholds are:

    • When data is inserted into an empty table
    • For a table with fewer than 500 rows, when 500 or more rows change
    • For a table with more than 500 rows, when the changed rows exceed 500 plus 20% of the table

If any of the preceding conditions is met, the statistics are updated.
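Automatic updating is controlled at the database level; a minimal sketch, with a placeholder database name:

```sql
-- Enable automatic statistics updates for the database.
ALTER DATABASE YourDatabase SET AUTO_UPDATE_STATISTICS ON;

-- Check the current setting.
SELECT name, is_auto_update_stats_on
FROM sys.databases
WHERE name = 'YourDatabase';
```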

Of course, you can also update statistics manually with the following statement:

UPDATE STATISTICS table_name [index_name]
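For example, against the hypothetical test table from earlier (WITH FULLSCAN scans every row instead of sampling):

```sql
-- Update all statistics on the table.
UPDATE STATISTICS dbo.test;

-- Update only the statistics of one index, reading the whole table.
UPDATE STATISTICS dbo.test ix_test_ref WITH FULLSCAN;
```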

 

Column-level statistics

SQL Server can also create statistics on columns that do not belong to any index, to help the query optimizer obtain the "estimated number of rows", when the database-level option "auto create statistics" is enabled (Figure 6).

Figure 6. automatically create statistics

 

When this option is set to True and a WHERE predicate references a column not covered by any index, statistics are created for that column, with two exceptions:

    • When the cost of creating the statistics would exceed the cost of generating the query plan
    • When SQL Server is too busy, statistics are not created automatically

 

You can view these statistics in the system view sys.stats, as shown in Figure 7.
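A sketch of such a query, using the hypothetical dbo.test table from earlier; the auto_created column flags statistics that SQL Server created on its own:

```sql
-- List the statistics objects on the table, including auto-created ones.
SELECT s.name, s.auto_created, s.user_created
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.test');
```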

Figure 7. View statistics in the System View

 

You can also use the following statement to manually create statistics:

CREATE STATISTICS statistics_name ON table_name (column_name [,...n])
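For instance, to create single-column statistics on the hypothetical test table (the statistics name is arbitrary):

```sql
-- Create column-level statistics manually.
CREATE STATISTICS st_test_ref ON dbo.test (ref);
```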

 

Summary

This article briefly introduced how statistics influence the choice of query paths. Outdated statistics can easily degrade query performance, so updating statistics regularly is one of the important routine tasks of a DBA.

From: http://www.cnblogs.com/CareySon/archive/2012/05/14/HowStatisticImpactQuery.html
