SQL Server 2000 Index Structure and How to Use It

Source: Internet
Author: User

I. Understanding the index structure

You can think of an index as a special kind of directory. Microsoft SQL Server provides two kinds of index: the clustered index and the nonclustered index. An example will show the difference between the two:

The body of a Chinese dictionary is itself a clustered index. To look up the character "an", for example, we naturally open the first few pages, because a dictionary ordered by pinyin runs from "a" to "z", so "an" falls near the front. If you have gone through the whole "a" section without finding the character, the dictionary does not contain it. Likewise, to look up "zhang" you turn to the back of the dictionary, because its pinyin begins with "z". In other words, the body of the dictionary is itself a directory, and you do not need to consult any other directory to find what you want. We call body content that is arranged according to a definite rule a "clustered index".

If you know a character's pronunciation, you can find it quickly in this way. But you may also meet a character you cannot pronounce. Then the method above does not work; you have to find the character by its radical and go straight to the page number listed beside it. The order you see when you combine the radical index with the character lookup table is not the order of the body text, however. Look up "zhang", for example: in the lookup table under its radical, "zhang" is on page 672, the entry above it, "chi", is on page 63, and the entry below it, "nu" (crossbow), is on page 390. Clearly these characters do not sit next to "zhang" in the body. The consecutive entries "chi, zhang, nu" that you see are their order in the nonclustered index, which is a mapping onto the characters in the dictionary body. You can find the character you need this way, but it takes two steps: first find the entry in the index, then turn to the page it gives. We call this arrangement, in which the directory is purely a directory and the body is purely the body, a "nonclustered index".

From this example we can see what a "clustered index" and a "nonclustered index" are. It also follows that each table can have only one clustered index, because the body can be sorted in only one way.
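As a rough illustration of the two kinds of index described above, the following T-SQL sketch creates one of each. The table name Tgongwen_demo, the index names, and the data types are assumptions made for illustration; only the column names are borrowed from the examples later in this article. GO is the batch separator understood by Query Analyzer and osql.

```sql
-- Hypothetical demo table: column names follow the article's later examples,
-- but the table name and data types are assumptions.
CREATE TABLE Tgongwen_demo (
    gid         int IDENTITY(1, 1) NOT NULL,
    fariqi      datetime           NOT NULL,
    neibuyonghu varchar(50)        NOT NULL,
    title       varchar(200)       NOT NULL
)
GO

-- The clustered index decides the physical order of the rows
-- (the "body of the dictionary"), so a table can have only one.
CREATE CLUSTERED INDEX IX_Tgongwen_demo_fariqi
    ON Tgongwen_demo (fariqi)
GO

-- A nonclustered index is a separate sorted structure that points back
-- to the rows (the "radical index"); a table can have several.
CREATE NONCLUSTERED INDEX IX_Tgongwen_demo_neibuyonghu
    ON Tgongwen_demo (neibuyonghu)
GO
```

Trying to create a second clustered index on the same table fails, which is the "only one clustered index per table" rule in practice.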

II. When to use clustered or nonclustered indexes

The following table summarizes when to use a clustered or a nonclustered index (very important):

Action                                Clustered index    Nonclustered index
Column is often grouped or sorted     Should             Should
Returning data within a range         Should             Should not
One or very few distinct values       Should not         Should not
A small number of distinct values     Should             Should not
A large number of distinct values     Should not         Should
Frequently updated columns            Should not         Should
Foreign key columns                   Should             Should
Primary key columns                   Should             Should
Frequently modified index columns     Should not         Should

We can understand the table above through the definitions of clustered and nonclustered indexes given earlier. Take returning data within a range. Suppose a table has a date column and the clustered index is built on it. A query for all rows from January 1, 2004 to October 1, 2004 will then be fast, because the "body of the dictionary" is sorted by date: the clustered index only has to locate the first and last rows of the range. With a nonclustered index, by contrast, the page number of every item must first be looked up in the directory, and only then can the item itself be fetched from that page.
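A range query of the kind just described might look like the sketch below. It assumes the Tgongwen table and fariqi date column used in the examples later in this article, and that the clustered index is on fariqi.

```sql
-- Assumes Tgongwen has its clustered index on the datetime column fariqi.
-- The half-open upper bound (< October 2) keeps every row dated October 1,
-- whatever its time of day.
SELECT gid, fariqi, neibuyonghu, title
FROM Tgongwen
WHERE fariqi >= '20040101'
  AND fariqi <  '20041002'
```

Because the rows are stored in fariqi order, the server can locate the first qualifying row and read forward until the upper bound is passed.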

III. Misconceptions about index use, seen in practice

Theory exists to be applied. Although we have just listed when a clustered or nonclustered index should be used, in practice these rules are easily overlooked or not weighed against the actual situation. Below we discuss common misconceptions about index use, based on problems met in practice, to make it easier to see how indexes should be built.

1. The primary key is the clustered index

This idea is badly mistaken and wastes the clustered index, even though SQL Server does create a clustered index on the primary key by default.

Typically we create an ID column in every table to tell rows apart; it auto-increments, usually in steps of 1. The gid column in our office automation example is such a column. If we make it the primary key, SQL Server will by default make it the clustered index. The benefit is that the rows are physically ordered by ID in the database, but I do not think that is worth much.

Clearly, the clustered index has obvious advantages, and the rule that each table can have only one makes it all the more valuable.

From the definition of the clustered index given earlier, its biggest benefit is that it quickly narrows the range a query has to examine and so avoids a full table scan. In practice, because ID numbers are generated automatically, we do not know the ID of any particular record and rarely query by ID. Making the ID primary key the clustered index therefore wastes the resource. Moreover, a column in which every value is different goes against the rule that a clustered index should not be built on a column with a large number of distinct values. Admittedly, this only hurts when users frequently modify records, especially the indexed columns; it does not affect query speed.
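One way to keep the ID column as the primary key without spending the clustered index on it is to declare the key NONCLUSTERED and create the clustered index separately. The sketch below uses a hypothetical table Tgongwen_demo2 with assumed data types; the constraint and index names are also illustrative.

```sql
-- Hypothetical table: the primary key is declared NONCLUSTERED so that
-- SQL Server does not create its default clustered index on gid.
CREATE TABLE Tgongwen_demo2 (
    gid         int IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_Tgongwen_demo2 PRIMARY KEY NONCLUSTERED,
    fariqi      datetime     NOT NULL,
    neibuyonghu varchar(50)  NOT NULL,
    title       varchar(200) NOT NULL
)
GO

-- The single clustered index is then free to go on the column
-- that range queries actually filter on.
CREATE CLUSTERED INDEX IX_Tgongwen_demo2_fariqi
    ON Tgongwen_demo2 (fariqi)
GO
```

The primary key still enforces uniqueness on gid through its own nonclustered index; only the physical ordering of the rows changes.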

In an office automation system, whether the home page is showing the documents and meetings awaiting a user's signature or a user is searching for documents, every query involves two columns: the "date" and the user's own "user name".

The home page usually shows the documents or meetings each user has not yet signed for. Our WHERE clause could restrict the query only to items the current user has not signed for, but if the system has been running for a long time and holds a lot of data, a full table scan every time a user opens the home page makes little sense: most users read anything older than a month long ago, so the scan only adds load to the database. Instead, when a user opens the home page we can have the database look only at that user's unread documents from the last 3 months, using the "date" column to limit the scan and speed up the query. If your office automation system has been running for 2 years, the home page will in theory display 8 times faster, or even more.

I say "in theory" because if the clustered index has been blindly placed on the ID primary key, the query will not be that fast, even with a (nonclustered) index on the date column. Let us look at how the various queries perform on a table of 10 million rows (250,000 of them from the last 3 months):

(1) Clustered index on the primary key, no date restriction:

SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen

Time taken: 128470 ms (128 seconds)

(2) Clustered index on the primary key, nonclustered index on fariqi:

SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen
WHERE fariqi > DATEADD(day, -90, GETDATE())

Time taken: 53763 ms (54 seconds)

(3) Clustered index on the date column (fariqi):

SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen
WHERE fariqi > DATEADD(day, -90, GETDATE())

Time taken: 2423 ms (2 seconds)

Although every statement extracts 250,000 rows, the differences are enormous, and the clustered index on the date column stands out. In fact, if your database really holds 10 million rows and the primary key is on the ID column, as in cases 1 and 2 above, the web page simply times out and cannot be displayed. This is one of the most important reasons I gave up using the ID column as the clustered index. The timings above were obtained by adding the following before each SELECT statement:

DECLARE @d datetime

SET @d = GETDATE()

and by adding the following after the SELECT statement:

SELECT [Statement execution time (ms)] = DATEDIFF(ms, @d, GETDATE())
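As an alternative to timing by hand, SQL Server can report the timings itself through SET STATISTICS TIME. The sketch below is not the method used for the figures in this article; it simply wraps one of the article's queries.

```sql
-- Ask the server to report parse/compile and execution times
-- (CPU time and elapsed time, in milliseconds) for each statement.
SET STATISTICS TIME ON
GO

SELECT gid, fariqi, neibuyonghu, title
FROM Tgongwen
WHERE fariqi > DATEADD(day, -90, GETDATE())
GO

-- Switch the reporting off again for the rest of the session.
SET STATISTICS TIME OFF
GO
```

The times appear in the messages output rather than in the result set. With either method, repeated runs can be faster than the first because the pages are already in the cache, so compare like with like.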

2. Simply creating an index will significantly improve query speed

In the example above, statements 2 and 3 are exactly the same, and the indexed column is the same too. The only difference is that the former has a nonclustered index on fariqi and the latter a clustered index, yet the query times are worlds apart. So building an index on just any column does not necessarily improve query speed.

From the statements used to build the table, we can see that the fariqi column of this 10-million-row table has 5,003 distinct values. A clustered index suits this column very well. In reality we issue several documents each day, and these share the same issue date, which fits the rule for a clustered index exactly: the values should be neither mostly identical nor almost all different. Seen this way, building an "appropriate" clustered index matters a great deal for query speed.
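If the statements that built a table are not to hand, the number of distinct values in a candidate column can be measured directly. The query below is a simple sketch of such a check against the article's Tgongwen table; the column aliases are illustrative.

```sql
-- Compare the number of distinct dates with the total number of rows.
-- A clustered index candidate should sit between the two extremes:
-- not almost all rows identical, and not almost every row different.
SELECT COUNT(*)               AS total_rows,
       COUNT(DISTINCT fariqi) AS distinct_dates
FROM Tgongwen
```

On a very large table this query itself scans the data, so it is best run once while designing the indexes rather than as part of the application.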

3. Adding every column that needs faster queries to the clustered index will improve query speed

As mentioned above, every query involves the "date" column and the user's own "user name". Since both columns are so important, we can combine them into a composite index.

Many people think that adding any column to the clustered index will speed up queries. Others wonder: if the columns of a composite clustered index are queried separately, will the query slow down? With this question in mind, let us look at the following query times (each result set is 250,000 rows; the date column fariqi is the leading column of the composite clustered index and the user name neibuyonghu comes second):

(1) SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen WHERE fariqi > '2004-5-5'

Query time: 2513 ms

(2) SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen
WHERE fariqi > '2004-5-5' AND neibuyonghu = 'office'

Query time: 2516 ms

(3) SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen WHERE neibuyonghu = 'office'

Query time: 60280 ms

From this experiment we can see that a query on only the leading column of the clustered index is almost as fast as a query on all columns of the composite clustered index, and even slightly faster (when the result sets are the same size). If only a non-leading column of the composite clustered index is used as the condition, the index is of no use at all. Statements 1 and 2 take the same time because they return the same number of rows. If all columns of a composite index are used and the result set is small, the query is "covered" by the index and performance is at its best. Also remember that, whether or not you often use the other columns of the clustered index, its leading column must be the one used most often.
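A composite clustered index of the kind tested above could be declared roughly as follows. The index name is hypothetical, and the sketch assumes Tgongwen does not already have a clustered index (an existing one would have to be dropped or replaced first).

```sql
-- fariqi is the leading column, neibuyonghu the second column.
CREATE CLUSTERED INDEX IX_Tgongwen_fariqi_neibuyonghu
    ON Tgongwen (fariqi, neibuyonghu)
GO

-- Can use the index: the condition includes the leading column.
SELECT gid, fariqi, neibuyonghu, title
FROM Tgongwen
WHERE fariqi > '2004-5-5' AND neibuyonghu = 'office'

-- Cannot seek on the index: only the second column is restricted,
-- so the rows for 'office' are scattered across every date.
SELECT gid, fariqi, neibuyonghu, title
FROM Tgongwen
WHERE neibuyonghu = 'office'
```

The column order in the CREATE INDEX statement is what makes fariqi the leading column, which is why the choice of order matters more than the choice of which columns to include.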

IV. Lessons on index use not covered in other books

1. Querying on the clustered index is faster than querying on a primary key that is not the clustered index

Here are the sample statements (each extracts 250,000 rows):

SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi = '2004-9-16'

Time taken: 3326 ms

SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE gid <= 250000

Time taken: 4470 ms

Here, using the clustered index is nearly 1/4 faster than using a primary key that is not the clustered index.

2. ORDER BY on the clustered index is faster than on an ordinary primary key, especially with small amounts of data

SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen ORDER BY fariqi

Time taken: 12936 ms

SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen ORDER BY gid

Time taken: 18843 ms

Here the clustered index is about 3/10 faster than the ordinary primary key for ORDER BY. In fact, with a small amount of data, sorting on the clustered index column is clearly faster than sorting on a nonclustered index column; with a large amount, say more than 100,000 rows, the difference between the two is not obvious.

3. With a date range on the clustered index, search time falls in proportion to the share of the table's rows selected, no matter how many of the clustered index's columns are used

SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi > '2004-1-1'

Time taken: 6343 ms (extracts 1 million rows)

SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi > '2004-6-6'

Time taken: 3170 ms (extracts 500,000 rows)

SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi = '2004-9-16'

Time taken: 3326 ms (much the same as the previous statement; when the number of rows retrieved is the same, the greater-than and equals operators perform alike)

SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi > '2004-1-1' AND fariqi < '2004-6-6'

Time taken: 3280 ms

4. A date column does not slow queries down just because it stores minutes and seconds

In the following example there are 1 million rows in total. The 500,000 rows dated after January 1, 2004 contain only two distinct dates, accurate to the day; the 500,000 rows dated before it contain 5,000 distinct dates, accurate to the second.

SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen
WHERE fariqi > '2004-1-1' ORDER BY fariqi

Time taken: 6390 ms

SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen
WHERE fariqi < '2004-1-1' ORDER BY fariqi

Time taken: 6453 ms

V. Other considerations

"Water can carry a boat, and can also overturn it"; the same is true of indexes. Indexes help retrieval performance, but too many indexes, or badly chosen ones, make the system inefficient, because every index added to a table gives the database more work to do. Too many indexes can even lead to index fragmentation.

So we should build an "appropriate" set of indexes, and take particular care over the clustered index, so that the database can deliver high performance.

Of course, in practice a conscientious database administrator will test several alternatives to find out which one is the most efficient and effective.
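The fragmentation concern raised in this section can be checked with tools that ship with SQL Server 2000. The sketch below lists the indexes on the article's Tgongwen table, reports their fragmentation, and rebuilds them; whether and when to rebuild is a judgment call that this article does not prescribe.

```sql
-- List every index defined on the table, with its key columns.
EXEC sp_helpindex 'Tgongwen'
GO

-- Report fragmentation figures (such as scan density and
-- logical scan fragmentation) for all indexes on the table.
DBCC SHOWCONTIG ('Tgongwen') WITH ALL_INDEXES
GO

-- Rebuild all indexes on the table. This is a heavy operation,
-- so it is better tried on a test server or in a quiet period.
DBCC DBREINDEX ('Tgongwen')
GO
```

Running sp_helpindex first is a cheap way to spot tables that have accumulated more indexes than their queries need.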


