Analysis Statistics

Analysis Statistics < third >

Last Update:2015-05-08 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Statistics is a set of information that is stored as a histogram. A histogram is a statistical structure that shows the frequency of data falling into a non-pass classification. The SQL Server-stored histogram includes data distribution samples for 200 rows of columns and index keys (or the first column of multiple column index keys). The information on the range of index key values between two consecutive sampled values is called a step. These steps consist of a non-sizing interval between 200 stored values.

One step provides the following information:

The highest value of the given step (Range_hi_key);
The number of rows equal to Range_hi_key (eq_rows);
The range of rows between the previous highest value and the current highest value, these samples are not counted (range_rows);
The number of non-peers in the range (distinct_range_rows), if all values in the range are unique, then range_rows equals distinct_range_rows;
equals the average number of rows of a key value within the range (avg_range_rows);

As we build the index, statistics start counting. Let's create a table as follows: where the second column Column2 only the first row is 1, the remaining 10,000 rows are all 2. We create an index in the Column2 column and execute the statement:

CREATE nonclustered INDEX  on ta1 (Column2) DBCC Show_statistics (TA1,IX_COLUMN2)

We see the following statistics:

Here's how to run the query: We see two of the above queries, SQL ServerIt is found in statistics that the first returns only a small number of rows, so using an index is more economical. Instead, the second returns too many rows, rather than scanning the clustered index directly.

In addition to the information on the steps, other useful information in the statistics includes:

The last updated event statistics;
The number of rows in the table;
Average index key length;
The number of rows sampled in a histogram;
The density of the column combinations;

The last update time information can help you decide whether you should update statistics manually. The average keyword length represents the average data size in the index key column, which helps you understand the width of the index key, which is an important indicator when determining index validity.One, density

When you create an execution plan, the query optimizer parses statistics for the columns in the filter and join clauses. A highly selective filtering condition restricts the number of rows from a table to a small result set, and helps the optimizer maintain a low query overhead. A column with a unique index will have a very high selectivity because it can limit the number of matching rows to 1.

On the other hand, a low-selectivity filter will return a large result set from the table. Very low-selectivity filtering conditions make nonclustered indexes on columns invalid. Navigating to a base table for a large result set from a nonclustered index is typically more expensive than directly scanning the base table, because the bookmark lookup associated with nonclustered indexes is too expensive.

Statistics track the selectivity of a column in the form of a density (density) ratio. High selectivity (or uniqueness) of the column will have a very low density. Low-density columns (high selectivity) apply to nonclustered indexes because it helps the optimizer retrieve a small number of rows quickly. This is also the primary basis for filtering index operations, because the goal of the filter is to improve the selectivity or density of the index.

The density can be expressed as:

Density =1/ number of different values in the column

The density is a value between 0-1. The lower the column density, the more suitable for nonclustered indexes. You can calculate your own calculations to determine the density of the columns in the index and statistics.

For example, the density on the Column2 column in the TA1 table above

SELECT 1.0 / COUNT (DISTINCT from Ta1

Of course, you can also see the real data in the all density column in the output of DBCC SHOW_STATISTICS. The high density value on this column makes it unsuitable for indexing, even for filtered indexes. However, the index key-value statistic maintained in the step helps the query optimizer use the assertion c1=1 to work with the index.

statistics on multiple-column indexes

In the case where the index has only one column, the statistic contains a histogram and a density value for that column. The statistics for composite indexes with multiple columns consist of a histogram containing only the first column and several density values . This is why it is wise to place the column with the lowest density in the first place when establishing a qualifying index or a compliance statistic. The density value contains the density of the first column and the prefix combination of each index key column. When assertions in the WHERE and join clauses refer to multiple columns, multiple density values help the optimizer find selectivity that conforms to the index. Although the first column can help determine the histogram, the final density of the column itself will be the same regardless of the order of the columns.

Modify the previous index to add a column:

CREATE nonclustered INDEX  on  with Drop_existing

Take a look at the statistics:

As you can see, there are two density values under the all density column:

The density value of the 1th column;
(1th + 2nd) The density of the column;

For multi-column indexes with 3 columns, the index statistic also contains the density value of the column (1th + 3rd). This statistic does not contain density values for any other combination of columns. Therefore, this index (IX_COLUMN2) is not useful for filtering rows only in the 2nd column (column1: Assuming this column has no other index), because the Column1 column's individual density values are not maintained in the statistics.third, the filter index on the statistics

The purpose of the filtered index is to change the data that makes up the index, thereby altering the histogram and density to make the index performance better.

We make an index on the name column on a person table, 10,000 rows of data, 3000 rows name is null: In another way, the index is indexed by filtering out null:

We see a sharp drop in the number of rows that make up the statistics because there is a filter that filters 3000 rows and the number of rows is certainly down. The average keyword length is increased because it is not in the processing length 0 name.

The density measurements are very interesting and the two values are very close, but the filtered density is slightly lower, indicating fewer unique values. This is because the filtered data, although slightly less selective, is actually more accurate, eliminating the control that does not contribute to the search. The second value density represents the clustered index pointer, which is the same as the individual density value of name, because both represent the same number of unique values. The density of the attached clustered index in the previous column is a much smaller number, because the null value is eliminated and all the ID unique values that are not included in the filtered data are removed.

Another open option is to create a filtered index, which allows you to create a more finely tuned histogram on a partitioned table. This is necessary because the statistics are not automatically created on the partitioned table and cannot be created by using create statistics itself. You can create filtered indexes and get statistics or create specific filtering statistics by partitioning.

Analysis Statistics < third >

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Analysis Statistics < third >

Contact Us

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support