Reposted from: Clustered and nonclustered indexes (compiled)
Official definition:
A clustered index is an index in which the logical order of key values in the index determines the physical order of the corresponding rows in the table.
The clustered index determines the physical order of the data in the table. A clustered index is similar to a phone book, which arranges data by last name. Because a clustered index specifies the order in which data is physically stored in a table, a table can contain only one clustered index. However, the index can contain multiple columns (a composite index), just as a phone book is organized by last name and then first name.
Clustered indexes are particularly effective on columns that are frequently searched for ranges of values. Once the clustered index has located the row that contains the first value, the rows that contain the subsequent index values are guaranteed to be physically adjacent. For example, if an application frequently retrieves records within a range of dates, a clustered index can quickly find the row that contains the start date and then read all adjacent rows in the table until the end date is reached. This improves the performance of such queries. Similarly, if a column is often used to sort the data retrieved from a table, clustering (physically ordering) the table on that column avoids the cost of a sort each time the column is queried.
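The range lookup described above can be illustrated with a toy model. This is only a sketch in Python, not how SQL Server is implemented: the list `rows`, the helper `range_scan`, and the sample data are all hypothetical, and a sorted list stands in for the physically ordered table.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Toy "table" kept in clustered-index order: rows are stored sorted by date.
rows = [
    (date(2004, 1, 5), "doc A"),
    (date(2004, 2, 10), "doc B"),
    (date(2004, 3, 15), "doc C"),
    (date(2004, 6, 1), "doc D"),
    (date(2004, 9, 16), "doc E"),
]
keys = [r[0] for r in rows]  # clustering key of each row, in storage order


def range_scan(start, end):
    """Return rows with start <= date <= end using two binary searches."""
    lo = bisect_left(keys, start)   # position of the first row >= start
    hi = bisect_right(keys, end)    # position just past the last row <= end
    return rows[lo:hi]              # matching rows are adjacent: one slice


result = range_scan(date(2004, 2, 1), date(2004, 6, 30))
print([title for _, title in result])
```

Because the matching rows sit next to each other, the scan only has to find the two ends of the range and read what lies between them.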
When the index value is unique, it is also efficient to use a clustered index to find a particular row. For example, the fastest way to find a specific employee with a unique employee ID column emp_id is to create a clustered index or PRIMARY KEY constraint on the emp_id column.
Nonclustered indexes
A nonclustered index is an index in which the logical order of the index entries differs from the physical storage order of the rows on disk.
Indexes are stored as B-trees (balanced trees), and we can understand the difference this way: the leaf nodes of a clustered index are the data nodes themselves, whereas the leaf nodes of a nonclustered index are still index nodes, each holding a pointer to the corresponding data block. For example:
(Figure: nonclustered index)
(Figure: clustered index)
I. Understanding index structure in plain terms
In fact, you can think of an index as a special kind of directory. Microsoft SQL Server provides two types of indexes: clustered indexes and nonclustered indexes. Let's look at the difference between them through an example:
The body of a Chinese dictionary is itself a clustered index. For example, to look up the character "An", we naturally open the first few pages of the dictionary, because its pinyin is "an" and a dictionary ordered by pinyin runs from the letter "a" to the letter "z", so "An" appears near the front. If you have gone through every entry beginning with "a" and still cannot find the character, then it is not in your dictionary. Likewise, to look up the character "Zhang", you turn to the last part of the dictionary, because its pinyin is "zhang". In other words, the body of the dictionary is itself a directory, and you do not need to consult any other directory to find what you are looking for. When the body content is itself arranged according to some rule and so serves as its own directory, we call it a "clustered index".
If you know how a character is pronounced, you can find it quickly this way. But you may also come across a character you do not recognize and cannot pronounce. In that case you cannot use the method above. Instead, you look the character up by its radical in the radical index, and then turn directly to the page number listed next to it. However, the order of characters in the radical index and character lookup table is not the order of the body. For example, when you look up "Zhang", the lookup table lists it on page 672; the entry above it is "Chi", on page 63, and the entry below it is "crossbow", on page 390. Obviously, these characters are not actually next to "Zhang" in the body. The consecutive entries "Chi, Zhang, crossbow" reflect their order in the nonclustered index, which is a mapping onto the characters in the body of the dictionary. You can find the character you need this way, but it takes two steps: first find the entry in the directory, then turn to the page it gives. When the directory is purely a directory and the body is purely the body, each with its own ordering, we call it a "nonclustered index".
From the example above, we can understand what is "clustered index" and "nonclustered index". Further, we can easily understand that each table can have only one clustered index, because the catalog can only be sorted in one way.
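The two-step lookup in the dictionary analogy can be sketched in a few lines of Python. This is a toy model only: the names `pages`, `radical_table`, and `lookup_via_radical_table` are hypothetical, the page numbers are the ones quoted in the example above, and a real index would search a B-tree rather than loop over a list.

```python
# Body of the dictionary: page number -> entry. The pages follow pinyin
# order, which plays the role of the clustered index.
pages = {63: "Chi", 390: "crossbow", 672: "Zhang"}

# Lookup table: entries listed in radical order, each with a page number.
# This plays the role of a nonclustered index.
radical_table = [("Chi", 63), ("Zhang", 672), ("crossbow", 390)]


def lookup_via_radical_table(word):
    """Two steps: find the entry in the index, then follow its page number."""
    for entry, page in radical_table:   # step 1: search the index
        if entry == word:
            return page, pages[page]    # step 2: jump to that page of the body
    return None                         # the word is not in the index


print(lookup_via_radical_table("Zhang"))
print([page for _, page in radical_table])  # index order is not page order
```

Neighbouring entries in the lookup table point to pages scattered across the body, which is why each hit costs an extra jump.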
II. When to use clustered or nonclustered indexes
The following table summarizes when to use a clustered or a nonclustered index (this is important):
Action description | Use a clustered index | Use a nonclustered index
Columns often used for grouping or sorting | Should | Should
Returning data in a range | Should | Should not
One or very few distinct values | Should not | Should not
A small number of distinct values | Should | Should not
A large number of distinct values | Should not | Should
Frequently updated columns | Should not | Should
Foreign key columns | Should | Should
Primary key columns | Should | Should
Frequently modified index columns | Should not | Should
In fact, the table above follows from the earlier examples and definitions of clustered and nonclustered indexes. Take returning data in a range. Suppose a table has a date column and the clustered index is on that column. A query for all data between January 1, 2004 and October 1, 2004 will be very fast, because the body of your "dictionary" is sorted by date: the clustered index only needs to locate the first and last rows of the range. With a nonclustered index, by contrast, you must first look up the page number of each item in the directory and then fetch the actual content from that page.
III. Putting theory into practice: common misconceptions about index use
The purpose of theory is application. Although we have just listed when clustered or nonclustered indexes should be used, in practice these rules are easily overlooked or not weighed against the actual situation. Below, we discuss misconceptions about index use based on real problems encountered in practice, so that you can learn how to design indexes.
1. The primary key is the clustered index
I think this idea is a serious mistake and a waste of the clustered index, even though SQL Server creates the clustered index on the primary key by default.
In general, we create an ID column in each table to distinguish each row; this ID column is auto-incremented, typically with a step of 1. The gid column in our office automation example is like this. If we make this column the primary key, SQL Server will by default make it the clustered index. The benefit is that your data is physically sorted in the database by ID, but I do not think that is of much use.
The advantages of a clustered index are obvious, and because each table can have only one clustering rule, the clustered index is all the more precious.
From the definition of the clustered index discussed above, we can see that its biggest benefit is the ability to quickly narrow the search range according to the query conditions and avoid a full table scan. In practice, because the ID is generated automatically, we do not know the ID of each record, so we rarely query by ID. This makes using the ID primary key as the clustered index a waste of resources. Furthermore, making a field whose values are all different the clustered index violates the rule that a clustered index should not be built on a column with a large number of different values. Of course, this only hurts when users frequently modify record contents, especially the indexed columns; it has no effect on query speed.
In an office automation system, whether the home page is displaying the documents and meetings awaiting the user's sign-off or the user is searching for files, every query relies on two fields: the date and the user's own user name.
Typically, the home page of an office automation system displays the documents or meetings that each user has not yet signed off on. Although a WHERE clause can restrict the result to items the current user has not yet signed, if your system has been running for a long time and holds a large amount of data, then a full table scan every time a user opens the home page makes little sense: the vast majority of users have already read the documents from more than a month ago, so scanning them only adds load to the database. Instead, when a user opens the home page, we can have the database query only the documents from the last 3 months that the user has not read, using the date field to limit the scan and improve query speed. If your office automation system has been running for 2 years, your home page will theoretically display 8 times faster than before, since 3 months is one eighth of 24 months.
The word "theoretically" is used here because if your clustered index is still blindly built on the ID primary key, your query will not be that fast, even if you create a (nonclustered) index on the date field. Let's look at how various queries perform on a table of 10 million rows (250,000 of which fall within the last 3 months):
(1) Clustered index on the primary key only, with no date restriction:
SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen
Time taken: 128470 milliseconds (128 seconds)
(2) Clustered index on the primary key and a nonclustered index on fariqi:
SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen
WHERE fariqi > DATEADD(day, -90, GETDATE())
Time taken: 53763 milliseconds (54 seconds)
(3) Clustered index on the date column (fariqi):
SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen
WHERE fariqi > DATEADD(day, -90, GETDATE())
Time taken: 2423 milliseconds (2 seconds)
Although each statement retrieves 250,000 rows, the differences between the cases are enormous, especially when the clustered index is on the date column. In fact, if your database really holds 10 million rows and the clustered index is built on the primary key ID column, as in cases 1 and 2 above, the web page simply times out and cannot be displayed at all. This is one of the most important reasons I abandoned the ID column as the clustered index. The timings above were measured as follows. Before each SELECT statement, add:
DECLARE @d datetime
SET @d = GETDATE()
and after the SELECT statement, add:
SELECT [Statement execution time (milliseconds)] = DATEDIFF(ms, @d, GETDATE())
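To put the three measurements side by side, the ratios can be worked out with a short Python snippet. It is only arithmetic on the timings quoted above; the dictionary `timings_ms` and its labels are introduced here for illustration.

```python
# Timings in milliseconds, copied from the three measurements above.
timings_ms = {
    "(1) clustered on primary key, no date filter": 128470,
    "(2) clustered on primary key, nonclustered on fariqi": 53763,
    "(3) clustered on fariqi": 2423,
}

fastest = min(timings_ms.values())

# How many times slower each case is than the fastest one.
slowdown = {case: ms / fastest for case, ms in timings_ms.items()}

for case, factor in slowdown.items():
    print(f"{case}: {timings_ms[case]} ms, {factor:.1f}x the fastest time")
```

On these figures, case 3 is roughly 53 times faster than case 1 and roughly 22 times faster than case 2.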
2. Any index will significantly improve query speed
In fact, in the example above, statements 2 and 3 are identical and the indexed field is the same. The only difference is that the former uses a nonclustered index on fariqi and the latter a clustered index on the same field, yet the query speeds differ enormously. So simply creating an index on any field does not necessarily improve query speed.
From the table creation statement, we can see that the fariqi field has 5,003 distinct values across the table's 10 million rows. A clustered index on this field is highly appropriate. In reality, we issue several documents every day, and documents issued on the same day share the same date, which fully satisfies the rule for building a clustered index: the values should be neither mostly identical nor almost entirely distinct. So building an "appropriate" clustered index is very important for improving query speed.
3. Adding every field that needs faster queries to the clustered index improves query speed
As mentioned above, the fields that data queries cannot do without are the date and the user's own user name. Since both fields are so important, we can combine them into a composite index (compound index).
Many people think that adding any field to the clustered index will improve query speed, and some wonder: if the fields of a composite clustered index are queried separately, will the query slow down? With this question in mind, let's look at the following query speeds (the result set is 250,000 rows in each case). The date column fariqi is the leading column of the composite clustered index, and the user name column neibuyonghu follows it:
(1) SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen WHERE fariqi > '2004-5-5'
Query speed: 2513 ms
(2) SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen
WHERE fariqi > '2004-5-5' AND neibuyonghu = 'office'
Query speed: 2516 ms
(3) SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen WHERE neibuyonghu = 'office'
Query speed: 60280 ms
From these experiments, we can see that using only the leading column of the composite clustered index as the query condition is almost as fast as using all of its columns, and even slightly faster (for the same number of result rows). If only a non-leading column of the composite clustered index is used as the query condition, however, the index has no effect at all. Of course, statements 1 and 2 run at the same speed because they return the same number of rows. If all columns of the composite index are used and the result set is small, the query is covered by the index ("index covering"), and performance is optimal. Also keep in mind that, regardless of how often you use the other columns of the clustered index, the leading column must be the most frequently used column.
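Why the leading column matters can be seen in a toy model of a composite index. The following Python sketch is an illustration under simplifying assumptions, not SQL Server's implementation: the sample rows and the helpers `by_date_after` and `by_user` are hypothetical, and a list sorted by (fariqi, neibuyonghu) stands in for the composite clustered index.

```python
from bisect import bisect_right

# Toy composite clustered index: rows kept sorted by (fariqi, neibuyonghu).
# Each row is (fariqi, neibuyonghu, gid); the data is made up.
rows = sorted([
    ("2004-03-01", "office", 1),
    ("2004-05-04", "sales", 2),
    ("2004-05-06", "office", 3),
    ("2004-05-06", "sales", 4),
    ("2004-07-20", "office", 5),
])
dates = [r[0] for r in rows]  # leading-column values, in storage order


def by_date_after(cutoff):
    """Leading column: binary search for the start, then read the tail."""
    start = bisect_right(dates, cutoff)
    return rows[start:], len(rows) - start   # (matching rows, rows examined)


def by_user(user):
    """Non-leading column only: the sort order does not help, so every
    row has to be examined."""
    hits = [r for r in rows if r[1] == user]
    return hits, len(rows)                    # (matching rows, rows examined)


print(by_date_after("2004-05-05"))
print(by_user("office"))
```

Both calls return three rows here, but the date search examines only those three, while the user-name search has to examine all five. On a large table that difference is what separates a range seek from a full scan.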
IV. Index usage lessons you will not find in other books
1. A clustered index is faster than a primary key that is not a clustered index
Here are the example statements (each retrieves 250,000 rows):
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi = '2004-9-16'
Time taken: 3326 ms
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE gid <= 250000
Time taken: 4470 ms
Here, the clustered index is about 1/4 faster than the primary key that is not a clustered index.
2. ORDER BY on a clustered index is faster than on an ordinary primary key, especially with small data volumes
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen ORDER BY fariqi
Time taken: 12936 milliseconds
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen ORDER BY gid
Time taken: 18843 milliseconds
Here, ORDER BY on the clustered index is about 3/10 faster than on the ordinary primary key. In fact, if the amount of data is small, sorting on the clustered index column is noticeably faster than sorting on a nonclustered index column; if the amount of data is large, say more than 100,000 rows, the speed difference between the two is not obvious.
3. When a date range on the clustered index is used, search time falls in proportion to the fraction of the table's data retrieved, regardless of how many conditions on the clustered index are used:
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi > '2004-1-1'
Time taken: 6343 milliseconds (1 million rows retrieved)
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi > '2004-6-6'
Time taken: 3170 milliseconds (500,000 rows retrieved)
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi = '2004-9-16'
Time taken: 3326 milliseconds (the same as the result of the previous statement; if the number of rows retrieved is the same, the greater-than and equals operators perform the same)
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen
WHERE fariqi > '2004-1-1' AND fariqi < '2004-6-6'
Time taken: 3280 milliseconds
4. A date column does not slow queries down just because its values include minutes and seconds
In the following example, there are 1 million rows in total. The 500,000 rows after January 1, 2004 contain only two distinct dates, precise to the day; the 500,000 rows before it contain 5,000 distinct dates, precise to the second.
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen
WHERE fariqi > '2004-1-1' ORDER BY fariqi
Time taken: 6390 milliseconds
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen
WHERE fariqi < '2004-1-1' ORDER BY fariqi
Time taken: 6453 milliseconds
V. Other matters of note
"Water can carry a boat, but it can also capsize it", and the same is true of indexes. Indexes help improve retrieval performance, but too many or poorly chosen indexes make the system inefficient, because every index added to a table means more work for the database. Too many indexes can even lead to index fragmentation.
So we need to build an "appropriate" index scheme, and the clustered index in particular deserves careful refinement, so that your database can deliver high performance.
Of course, in practice, as a conscientious database administrator, you should also test several alternatives to find out which is the most efficient and effective.