A more incisive index article

Source: Internet
Author: User
In fact, you can think of an index as a special kind of directory. Microsoft SQL Server provides two kinds of index: the clustered index and the nonclustered index. The following example illustrates the difference between them:

In fact, the body of a Chinese dictionary is itself a clustered index. For example, to look up the character "an", we naturally open the first few pages of the dictionary, because its pinyin is "an" and a dictionary ordered by pinyin starts with the letter "a" and ends with "z", so "an" falls near the front. If you go through the whole "a" section and still cannot find the character, it is not in your dictionary. Similarly, to look up the character "zhang", you turn to the back of the dictionary, because its pinyin is "zhang". In other words, the body of the dictionary is itself a directory, and you do not need to consult any other catalog to find what you want.

We call this kind of body text, which is itself a directory arranged according to a fixed rule, a "clustered index".

If you know how a character is pronounced, you can find it quickly in this way. But you may also meet a character you do not recognise and whose pronunciation you do not know. In that case you cannot use the method above; you have to look the character up by its radical, and then turn directly to the page number printed next to it. However, the order of characters that you get from the radical index and the character lookup table is not the order of the body text. For example, when we look up "zhang", the lookup table under its radical gives page 672 for "zhang"; the entry above "zhang" in that table is "chi", but its page number is 63, and the entry below it is "nu" (crossbow), on page 390. Clearly these characters do not sit directly above and below "zhang" in the body. The sequence "chi, zhang, nu" that you see is their order in the nonclustered index, which maps onto the characters in the body of the dictionary. We can find the character we need this way, but it takes two steps: first find the entry in the directory, then turn to the page it gives.

We call this arrangement, in which the directory is purely a directory and the body text is purely body text, a "nonclustered index".

From the example above, we can understand what a "clustered index" and a "nonclustered index" are.

It also follows that each table can have only one clustered index, because the body can be sorted in only one way.
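The dictionary analogy can be sketched in a few lines of Python. This is only a toy model and not how SQL Server stores data: the names BODY, RADICAL_TABLE, PAGES, find_by_pinyin and find_by_radical are invented for illustration, and only the three characters and page numbers come from the example above.

```python
from bisect import bisect_left

# Body text in pinyin order: the storage order itself is the "clustered index".
# Characters and page numbers are the ones used in the dictionary example.
BODY = [("chi", 63), ("nu", 390), ("zhang", 672)]
BODY_KEYS = [pinyin for pinyin, _ in BODY]

# Radical lookup table: a separate directory in its own (radical) order.
# It stores only a page number for each character.
RADICAL_TABLE = [("chi", 63), ("zhang", 672), ("nu", 390)]

# Turning to a page number yields the entry printed there.
PAGES = {page: pinyin for pinyin, page in BODY}


def find_by_pinyin(pinyin):
    """Clustered lookup: one binary search straight into the ordered body."""
    i = bisect_left(BODY_KEYS, pinyin)
    if i < len(BODY) and BODY[i][0] == pinyin:
        return BODY[i][1]
    return None


def find_by_radical(char):
    """Nonclustered lookup: read the directory first, then turn to the page."""
    for entry, page in RADICAL_TABLE:          # step 1: find the page number
        if entry == char:
            # step 2: open that page and confirm the entry is there
            return page if PAGES.get(page) == char else None
    return None


if __name__ == "__main__":
    print(find_by_pinyin("zhang"))
    print(find_by_radical("zhang"))
```

Both functions reach the same page, but the second one always goes through the separate directory before it touches the body, which is the two-step process described above.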

(ii) When to use clustered or nonclustered indexes

The following table summarizes when to use clustered or nonclustered indexes (very important).

Action description                          Use clustered index    Use nonclustered index
Columns that are often grouped and sorted   Should                 Should
Returns data in a range                     Should                 Should not
One or very few distinct values             Should not             Should not
A small number of distinct values           Should                 Should not
A large number of distinct values           Should not             Should
Frequently updated columns                  Should not             Should
Foreign key columns                         Should                 Should
Primary key columns                         Should                 Should
Frequently modified index columns           Should not             Should


In fact, we can understand the table above through the earlier definitions of clustered and nonclustered indexes. Take returning a range of data as an example. Suppose a table has a date column and you have built the clustered index on that column. A query for all data from January 1, 2004 to October 1, 2004 will then be very fast, because the body of your "dictionary" is sorted by date: the clustered index only needs to find the first and last rows of the range. A nonclustered index, by contrast, must first look up the page number of each item in the directory and then fetch the content from that page.
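The range case can be modelled roughly as follows. The data is hypothetical (one row per day of 2004) and the function names are made up for this sketch; it only illustrates why a range over the clustered key is one contiguous read while a nonclustered index needs one extra lookup per matching row.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta

# Hypothetical table: one row per day of 2004, stored as (issue_date, gid).
rows = [(date(2004, 1, 1) + timedelta(days=i), i + 1) for i in range(366)]

# Clustered on the date: the rows themselves are kept in date order.
clustered = sorted(rows)
dates = [d for d, _ in clustered]


def range_clustered(start, end):
    """Find both ends with binary search, then read one contiguous slice."""
    lo = bisect_left(dates, start)
    hi = bisect_right(dates, end)
    return clustered[lo:hi]


# Nonclustered on the date: rows are stored by gid, the index holds (date, gid).
by_gid = {gid: (d, gid) for d, gid in rows}
date_index = sorted(rows)


def range_nonclustered(start, end):
    """Find the matching index entries, then fetch each row individually."""
    lo = bisect_left(date_index, (start, 0))
    hi = bisect_right(date_index, (end, float("inf")))
    result, lookups = [], 0
    for _, gid in date_index[lo:hi]:
        result.append(by_gid[gid])  # one extra lookup per matching row
        lookups += 1
    return result, lookups


if __name__ == "__main__":
    hits = range_clustered(date(2004, 1, 1), date(2004, 10, 1))
    same_hits, lookups = range_nonclustered(date(2004, 1, 1), date(2004, 10, 1))
    print(len(hits), lookups)
```

In this model the clustered version performs two searches no matter how wide the range is, whereas the number of per-row lookups in the nonclustered version grows with the size of the result.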

(iii) Common misconceptions about index use, drawn from practice

The purpose of theory is application. Although we have just listed when a clustered or nonclustered index should be used, in practice these rules are easily overlooked or not weighed against the actual situation. Below, we discuss common misconceptions about index use, based on problems encountered in practice, to make it easier to see how indexes should be built.

1, The primary key is the clustered index

This idea is, in my view, badly mistaken and wastes the clustered index, even though SQL Server does create a clustered index on the primary key by default.

Typically, we create an ID column in every table to distinguish each row; the ID column is auto-incrementing, usually with a step of 1. The gid column in our office automation example is such a column. If we make this column the primary key, SQL Server will by default treat it as the clustered index. The benefit is that your data is physically ordered by ID in the database, but I do not think that is worth much.

The advantages of a clustered index are obvious, and the rule that each table can have only one makes the clustered index all the more valuable.

From the definition of the clustered index given earlier, we can see that its biggest benefit is that it quickly narrows the range to be searched according to the query conditions, avoiding a full table scan. In practice, because the ID is generated automatically, we do not know the ID of each record, so we rarely query by ID. That makes using the ID primary key as the clustered index a waste of a resource. Second, making a field in which every ID is different the clustered index also violates the rule that a clustered index should not be built on a column with a large number of distinct values. Admittedly, this only hurts when users frequently modify records, especially the indexed columns; it has no effect on query speed.

In an office automation system, whether the home page is showing the documents and meetings awaiting the user's signature or the user is searching for documents, every query relies on two fields: the "date" and the user's own "user name".

Usually, the home page of an office automation system shows each user the documents or meetings that the user has not yet signed for. Although the WHERE clause can restrict the query to items the current user has not signed for, if your system has been running for a long time and holds a large amount of data, it makes little sense to scan the whole table every time a user opens the home page: most users have already read everything older than a month, so the scan only adds load on the database. Instead, when a user opens the home page, we can have the database query only that user's unread documents from the last 3 months, using the "date" field to limit the scan and improve query speed. If your office automation system has been running for 2 years, the home page will then in theory display 8 times faster (24 months of data against 3), or even more.

I say "in theory" because if your clustered index is blindly built on the ID primary key, your query will not be that fast, even if you build a (nonclustered) index on the date field. Let's look at how the various queries perform on 10 million rows (250,000 of them from the last 3 months):
(1) Clustered index on the primary key only, with no restriction on the time period:

SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen

Time taken: 128470 ms (128 seconds)

(2) Clustered index on the primary key and a nonclustered index on fariqi:

SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen
WHERE fariqi > DATEADD(day, -90, GETDATE())

Time taken: 53763 ms (54 seconds)

(3) Clustered index on the date column (fariqi):

SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen
WHERE fariqi > DATEADD(day, -90, GETDATE())

Time taken: 2423 ms (2 seconds)

Although each statement extracts 250,000 rows, the differences between the cases are enormous, and the clustered index on the date column stands out in particular. In fact, if your database really holds 10 million rows and the clustered index sits on the ID primary key, as in cases 1 and 2 above, the web page will simply time out and display nothing. This is one of the most important reasons why I gave up on using the ID column as the clustered index.

The times above were obtained by adding the following before each SELECT statement:

DECLARE @d datetime
SET @d = GETDATE()

and adding the following after the SELECT statement:

SELECT [Statement execution time (ms)] = DATEDIFF(ms, @d, GETDATE())

2, Building any index will significantly improve query speed

In fact, in the example above, statements 2 and 3 are exactly the same and the indexed field is the same. The only difference is that the former has a nonclustered index on fariqi while the latter has a clustered index on it, yet the query speeds are worlds apart. So simply building an index on a field does not necessarily improve query speed.

From the statements used to build the table, we can see that the fariqi field has 5,003 distinct values in the table of 10 million rows. A clustered index is a very good fit for this field. In practice we issue several documents every day, and those documents share the same issue date, which matches the rule for building a clustered index exactly: the values should be neither mostly identical nor almost all different. So it is very important to build the "appropriate" clustered index if we want to improve query speed.
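As a rough check on the figures above, the arithmetic can be worked through in Python. The two constants are taken from the text; the variable names are only illustrative.

```python
TOTAL_ROWS = 10_000_000   # rows in the table, from the text
DISTINCT_DATES = 5_003    # distinct fariqi values, from the text

# Average number of rows that share one issue date.
rows_per_date = TOTAL_ROWS / DISTINCT_DATES

# Fraction of rows that carry a distinct value (1.0 would mean all different).
distinct_ratio = DISTINCT_DATES / TOTAL_ROWS

if __name__ == "__main__":
    print(f"average rows per date: {rows_per_date:.0f}")
    print(f"distinct ratio: {distinct_ratio:.6f}")
```

Each date is shared by roughly 2,000 rows on average, so the column is neither dominated by a single value nor unique per row.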

3, Adding every field whose queries need speeding up to the clustered index will improve query speed

As mentioned above, every query relies on the "date" field and the user's own "user name" field. Since both fields are so important, we can combine them into a composite index (compound index).

Many people think that adding any field to the clustered index will speed up queries, while others wonder: if the fields of a composite clustered index are queried separately, will the queries slow down? With this question in mind, let's look at the following query speeds (the result set is 250,000 rows in each case). The date column fariqi is the leading column of the composite clustered index, and the user name column neibuyonghu comes after it.

(1) SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen WHERE fariqi > '2004-5-5'

Query speed: 2513 ms

(2) SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen WHERE fariqi > '2004-5-5' AND neibuyonghu = 'office'

Query speed: 2516 ms

(3) SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen WHERE neibuyonghu = 'office'

Query speed: 60280 ms

From the experiment above, we can see that a query that uses only the leading column of the composite clustered index is almost as fast as one that uses all of its columns, and is even slightly faster (when the result sets are the same size). A query that uses only a non-leading column of the composite clustered index gets no benefit from the index at all. Of course, statements 1 and 2 run at the same speed because they return the same number of rows; if all columns of the composite index are used and the result set is small, the result is "index covering", which gives the best performance. Also keep in mind that, whether or not you often use the other columns of the clustered index, its leading column must be the most frequently used column.
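The leading-column effect can be illustrated with a small Python model. The rows below are hypothetical, the dates are written in ISO form so that string order matches date order (an assumption of this sketch, not the format used in the queries above), and the function names are invented. Each function also returns how many rows it had to examine.

```python
from bisect import bisect_right

# Hypothetical rows: (fariqi, neibuyonghu, gid).
rows = [
    ("2004-05-01", "office", 1),
    ("2004-05-04", "sales", 2),
    ("2004-05-06", "office", 3),
    ("2004-05-06", "sales", 4),
    ("2004-05-09", "office", 5),
]

# Composite clustered index on (fariqi, neibuyonghu): rows sorted by that pair.
table = sorted(rows)
leading_keys = [r[0] for r in table]


def by_date_after(day):
    """Leading column only: binary search, then one contiguous slice."""
    start = bisect_right(leading_keys, day)
    hits = table[start:]
    return hits, len(hits)


def by_date_and_user(day, user):
    """Both columns: same slice as above, filtered on the second column."""
    candidates, examined = by_date_after(day)
    return [r for r in candidates if r[1] == user], examined


def by_user(user):
    """Non-leading column only: the sort order does not help, scan everything."""
    return [r for r in table if r[1] == user], len(table)


if __name__ == "__main__":
    print(by_date_after("2004-05-05")[1])
    print(by_date_and_user("2004-05-05", "office")[1])
    print(by_user("office")[1])
```

In this model the first two queries examine the same slice of the table, while the third has to examine every row, which mirrors the pattern in the timings above.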

(iv) Summary of index usage experience not found in other books

1, Using a clustered index is faster than using a primary key that is not a clustered index

Here are the example statements (each extracts 250,000 rows):

SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi = '2004-9-16'

Time taken: 3326 ms

SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE gid <= 250000

Time taken: 4470 ms

Here, using the clustered index is faster than using the primary key that is not a clustered index: 3326 ms against 4470 ms.


