Translation: Index columns, column selectivity, and equality predicates

This article could also be titled "Which column should go first in an index?".

The usual advice on index column order is to put the most selective column first. (Note: selectivity describes how well a column narrows down the rows; the more distinct values a column has relative to the number of rows, the more selective it is.) I am not going to say that this advice is wrong, because it is not. The problem is that it is usually given without explaining why the most selective column should lead the index, or how the remaining columns should be ordered.

Without that explanation, the advice is easily misunderstood. In an extreme case, someone following it might make the clustered index key the leading column of every nonclustered index (because that column is highly selective), and then wonder why database performance became terrible.

The reasoning behind the advice is that SQL Server keeps statistics for each index, but the histogram in those statistics records the data distribution of the first index column only. In other words, SQL Server knows the actual distribution of values only for the leading column. If the leading column is not used in a predicate, the index may still be used, but that is not the whole story.

In addition to the histogram, SQL Server stores the density of every left-based subset of the index key columns. For a composite index key of three columns, SQL Server stores the density of the first column, the combined density of the first and second columns, and the combined density of all three columns. Density is a measure of how unique the values in a column (or set of columns) are, and is calculated as 1 / (number of distinct values). These values can be viewed for any index by running DBCC SHOW_STATISTICS with the DENSITY_VECTOR option.

This means that although SQL Server knows the actual data distribution only for the first column, it also knows, on average, how many rows an equality match on any left-based subset of the index keys will return.

So what is the rule for ordering index columns? Put the most selective column first, when all other considerations are equal.

The following example table illustrates what that means.

 
CREATE TABLE consideringindexorder (
    ID int IDENTITY,
    somestring varchar(100),
    somedate datetime DEFAULT GETDATE()
);

Assume that the table above has 10,000 rows and no clustered index, so it is stored as a heap. The somestring column contains 100 distinct values, the somedate column contains 5,000 distinct values, and the ID column is unique because it is an identity (auto-increment) column.
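To make these numbers concrete, the following sketch fills the table with data matching that description and computes the single-column densities by hand. The generated string and date values, and the use of sys.all_columns as a row source, are illustrative assumptions and not part of the original article.

```sql
-- Illustrative data only: 10,000 rows, 100 distinct strings, 5,000 distinct dates.
-- sys.all_columns cross-joined with itself is used purely as a convenient row source.
INSERT INTO consideringindexorder (somestring, somedate)
SELECT 'String' + CAST(n % 100 AS varchar(10)),
       DATEADD(DAY, CAST(n % 5000 AS int), CAST('20000101' AS datetime))
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_columns AS a
    CROSS JOIN sys.all_columns AS b
) AS numbers
WHERE n <= 10000;

-- Density = 1 / (number of distinct values).
-- Row count * density = average number of rows matched by an equality predicate.
SELECT 1.0 / COUNT(DISTINCT ID)                    AS id_density,
       1.0 / COUNT(DISTINCT somedate)              AS somedate_density,
       1.0 / COUNT(DISTINCT somestring)            AS somestring_density,
       COUNT(*) * 1.0 / COUNT(DISTINCT somedate)   AS avg_rows_per_somedate,
       COUNT(*) * 1.0 / COUNT(DISTINCT somestring) AS avg_rows_per_somestring
FROM consideringindexorder;
```

With the distribution described in the text, the densities work out to 1/10000 = 0.0001 for ID, 1/5000 = 0.0002 for somedate, and 1/100 = 0.01 for somestring, so an equality match returns on average 2 rows for somedate and 100 rows for somestring.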

Create a single nonclustered index on all three columns, in the order ID, somedate, somestring.

That index can be used for a seek only when the predicate has one of the following forms:
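A minimal sketch of that index, together with the statistics command mentioned earlier, might look like the following. The index name is a hypothetical choice for this example, and the output is only meaningful once the table contains data.

```sql
-- The index name is a hypothetical choice for this example.
CREATE NONCLUSTERED INDEX idx_id_somedate_somestring
    ON consideringindexorder (ID, somedate, somestring);

-- Show the density of each left-based subset of the index key:
-- (ID), (ID, somedate), (ID, somedate, somestring).
DBCC SHOW_STATISTICS ('consideringindexorder', idx_id_somedate_somestring)
    WITH DENSITY_VECTOR;
```

Because ID is unique, every subset that starts with ID is also unique, so with 10,000 rows each density in this vector should come out at about 0.0001.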

 
... WHERE ID = @ID AND somedate = @DT AND somestring = @STR
... WHERE ID = @ID AND somedate = @DT
... WHERE ID = @ID

In other words, a WHERE or JOIN clause can seek on this index only when it filters on a left-based subset of the index key columns, taken in order from left to right.

If a query filters only on the somedate column, the index cannot be used for a seek. It is like looking someone up in the phone book when you know only the first name and not the last name: the ordering of the directory does not help, and you have to read through the entire phone book.

Likewise, if the most selective column is placed leftmost but is rarely used in predicates, and most queries filter on the other columns, those queries can only scan the index, and scans are expensive.

The conclusion is that selectivity is not the only criterion for choosing the leading index column; the choice should be driven by the queries that run most often against the database. If those queries always filter on all of the index columns with equality, then put the most selective column first, so that SQL Server has a better chance of recognizing that the index is useful. If not, put the columns that are filtered on most frequently first, so that the index is useful to more queries.

The following scenarios are based on the table and index created earlier in the article.
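As a sketch of that rule, suppose (purely as an assumption for illustration) that most queries against the example table filter with equality on somedate and somestring and rarely on ID. An index ordered for that workload could look like this; the index name is hypothetical.

```sql
-- Hypothetical workload: most queries filter with equality on somedate and somestring.
-- somedate (5,000 distinct values) is more selective than somestring (100), so it leads.
CREATE NONCLUSTERED INDEX idx_somedate_somestring_id
    ON consideringindexorder (somedate, somestring, ID);
```

Here the frequently filtered columns come first, and selectivity only decides the order between them.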

 

Scenario 1: Use ID as the predicate and use the equal sign for filtering

This is the simplest scenario. The predicate directly matches the first column of the index, so SQL Server finds the data with an index seek.
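A query of this shape might look like the following; the variable value is illustrative.

```sql
DECLARE @ID int = 42;  -- illustrative value

-- Equality on the leading index column: eligible for an index seek.
SELECT ID, somedate, somestring
FROM consideringindexorder
WHERE ID = @ID;
```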

 

Scenario 2: Use the ID and somedate columns as the predicates and use the equal sign for filtering.

This scenario is also simple. The predicate columns form a left-based subset of the index key, in the same order as the index, so the data can be found with a seek.
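For example, with illustrative variable values:

```sql
DECLARE @ID int = 42;                 -- illustrative values
DECLARE @DT datetime = '20000212';

-- Equality on the first two index columns, in index order:
-- both predicates can be used for the seek.
SELECT ID, somedate, somestring
FROM consideringindexorder
WHERE ID = @ID
  AND somedate = @DT;
```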

Scenario 3: Use the ID and somestring columns as the predicates and use the equal sign for filtering.

This scenario is more interesting. Only ID can be used as the seek predicate, because somestring is not the second column of the index. The second column is somedate, and the query does not filter on it. The query is still satisfied with an index seek, but SQL Server seeks only on ID and then compares somestring in the rows returned by the seek against the given value to find the matching rows.
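A sketch of such a query, again with illustrative values:

```sql
DECLARE @ID int = 42;                      -- illustrative values
DECLARE @STR varchar(100) = 'String42';

-- ID is the seek predicate; somestring skips over somedate in the key,
-- so it can only be checked against the rows that the seek returns.
SELECT ID, somedate, somestring
FROM consideringindexorder
WHERE ID = @ID
  AND somestring = @STR;
```

In the execution plan, one would expect ID to appear as the seek predicate and somestring as a separate residual predicate on the same seek operator.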

 

Scenario 4: Use the somedate and somestring columns as predicates and use the equal sign for filtering

In this scenario, SQL Server cannot use a seek, because the first column of the index is not in the query's predicate. The query can only be satisfied by a scan: SQL Server has to compare both columns in every row against the given values to find the required rows.
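The query below illustrates the case; the values are illustrative.

```sql
DECLARE @DT datetime = '20000212';         -- illustrative values
DECLARE @STR varchar(100) = 'String42';

-- The leading index column (ID) is missing from the predicate,
-- so the index on (ID, somedate, somestring) cannot be used for a seek.
SELECT ID, somedate, somestring
FROM consideringindexorder
WHERE somedate = @DT
  AND somestring = @STR;
```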

 

I think that covers the basics of equality predicates and indexes. I hope this article has cleared up more confusion than it created, and that it leaves you with a better understanding of how indexes are matched against equality predicates.

 

 

Link: http://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/

 

Translated by careyson
