Analysis of the effect of statistics on query in SQL Server _mssql

Source: Internet
Author: User
And every time the Query Analyzer looking for the path, and not every time to statistics the number of rows included in the index, the range of values, but based on certain conditions to create and update the information after the save to the database, which is called statistical information.

How to view statistics
Viewing SQL Server statistics is simple, using the following directives:
DBCC show_statistics (' table name ', ' Index name ')

The resulting results are shown in Figure 1.

Figure 1. Statistical information

How statistical information affects queries

Let's use a simple example to see how the statistics affect the Query Analyzer. I set up a test table with two int values of columns, where ID is self increment, nonclustered index is set up on ref, insert 100 data, from 1 to 100, and then insert 9,900 data equal to 100. The statistics in Figure 1 are statistical information for the sample data.

At this point, where I use the ref value as the query condition, but given the different values, we can see that the Query Analyzer has made a different choice based on the statistics, as shown in Figure 2.

Figure 2. According to different predicates, the query optimizer makes different choices

In fact, for Query Analyzer, a histogram is useful for predicate predicates that are straightforward to determine, such as:

where date = GetDate ()
where Id= 12345
where Monthly_sales < 10000/12
Where name like "Careyson" + "%"

But for example

where Price = @vari
where Total_sales > (select SUM (qty) from Sales)
where a.ID =b.ref_id

where col1 =1 and col2=2

This kind of query at runtime to know the value, sampling step size is obviously not so useful. In addition, line fourth above if the predicate is two query criteria, using the sampling step size is also not good. Because no matter how many columns the index has, the sampling step only stores the first column of the index. When the histogram no longer works, SQL Server uses the density to determine the best query route.

The formula for density is: the number of unique values in the 1/table. When the density is more than an hour, the index is easier to select. For example, the second table in Figure 1, we can calculate the density by using the following formula:

Fig. 3. Density of a column

It can be inferred from the formula that when the amount of data in the table grows, the density becomes smaller.

For queries that cannot be selected according to the sampling step size, the Query Analyzer uses density to estimate the number of rows, which is: estimated number of rows = number of rows in the table * density

So, according to this formula, if I do a query, the estimated number of rows will be the number shown in Figure 4.

Figure 4. Estimated Number of rows

Let's verify this conclusion as shown in Figure 5.

Figure 5. Estimated Number of rows

Therefore, we can see that the estimated number of rows is not the actual number of rows, when the data distribution uniformity, or the large amount of data, this error will become very small.

Update of statistics

As you can see from the example above, the Query Analyzer relies on statistical information to query, so outdated statistics can lead to inefficient queries. Statistics can be managed either by SQL Server, by manual updating, or manually by SQL Server when updates are managed.

When Automatic Updates is turned on, the data in the SQL Server Monitor table changes and the data is automatically updated when the critical value is reached. This standard is:

When inserting data into an empty table, the table with less than 500 rows adds 500 rows or more when the rows in the table are more than 500 rows, the data changes more than 20%

The satisfaction of the above conditions will cause the statistics to be updated.

Of course, we can also manually update the statistics using the following statement.

UPDATE STATISTICS table name [index name]

Column-level statistical information

SQL Server can also create statistics for columns that do not belong to any index to help Query Analyzer get the estimated number of rows. When we open the database-level option, "CREATE statistics automatically" as shown in Figure 6.

Figure 6. Automatically create statistics

When this option is set to true, column statistics are created when our where predicate specifies a column that is not on any index, but there are two exceptions:

The cost of creating statistics exceeds the cost of generating query plans when SQL Server is busy, statistics are not generated automatically

We can view these statistics through the system view sys.stats, as shown in Figure 7.

Figure 7. View statistics from the system view

Of course, you can also create statistics manually by using the following statement:

CREATE STATISTICS statistic name on table name (column name [,... N])

Summary

This paper briefly discusses the influence of statistical information on query path selection. Outdated statistics can easily result in lower query performance. Therefore, updating statistics regularly is one of the important tasks for DBAs.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.