Analysis and explanation of indexing principle in MySQL database


Last Update:2017-01-19 Source: Internet

Author: User

For example, let's take a look at the difference between a clustered index and a nonclustered index:
In fact, the body text of a Chinese dictionary is itself a clustered index. For example, to look up the character "an", we naturally open the first few pages of the dictionary, because a dictionary ordered by pinyin starts with the letter "a" and ends with "z", so "an" sits near the front. If you go through the whole "a" section and still cannot find the character, the dictionary does not contain it. Similarly, to look up the character "zhang", you turn to the last part of the dictionary, because its pinyin is "zhang". That is, the body of the dictionary is itself a directory, and you do not need to consult any other directory to find what you need.
We call body text like this, which is itself arranged according to a certain rule, a "clustered index".
If you know a character, you can quickly find it in the body by its pronunciation. But you may also meet a character you do not know and cannot pronounce. In that case the method above does not work: you have to find the character by its radical, and then turn directly to the page number listed next to it. However, the order of characters you get from the radical index and the character lookup table is not the order of the body text. For example, when you look up "zhang", the character lookup table shows page 672 for it; the entry above it is "chi", but its page number is 63, and the entry below it is "crossbow", on page 390. Obviously these characters do not really sit directly above and below "zhang" in the body. The consecutive entries "chi, zhang, crossbow" that you see are their order in the nonclustered index, which maps the characters of the body into that index. You can find the character you need this way, but it takes two steps: first find the entry in the directory, then turn to the page number it gives.
We call this arrangement, in which the directory is purely a directory and the body is purely the body, a "nonclustered index".
From the above example, we can understand what a "clustered index" and a "nonclustered index" are.
Further, we can easily see why each table can have only one clustered index: the body text can only be sorted in one way.
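The dictionary analogy can be sketched as a toy model in Python. This is only an illustration under stated assumptions, not how SQL Server stores its indexes: the names `body`, `radical_directory`, `clustered_lookup` and `nonclustered_lookup` are hypothetical, the entry texts are placeholders, "nu" is the pinyin of the character translated above as "crossbow", and the directory keys 1, 2, 3 simply stand in for the order of the radical lookup table.

```python
from bisect import bisect_left

# Clustered case: the body itself is stored in key (pinyin) order.
body = [
    ("an", "entry for an"),
    ("chi", "entry for chi"),
    ("nu", "entry for nu"),        # "nu" = the character translated as "crossbow"
    ("zhang", "entry for zhang"),
]
body_keys = [key for key, _ in body]


def clustered_lookup(pinyin):
    """One step: binary-search the body directly."""
    i = bisect_left(body_keys, pinyin)
    if i < len(body) and body[i][0] == pinyin:
        return body[i][1]
    return None


# Nonclustered case: a separate directory in its own order (hypothetical keys),
# whose entries only point at positions in the body.
radical_directory = [(1, 1), (2, 3), (3, 2)]   # (directory_key, position in body)
directory_keys = [key for key, _ in radical_directory]


def nonclustered_lookup(directory_key):
    """Two steps: search the directory, then jump to the body position it names."""
    i = bisect_left(directory_keys, directory_key)
    if i < len(radical_directory) and radical_directory[i][0] == directory_key:
        position = radical_directory[i][1]
        return body[position][1]
    return None


# Neighbours in the directory are not neighbours in the body.
directory_order = [body[pos][0] for _, pos in radical_directory]
print(directory_order)
```

Because the body can only be stored in one order, only one such "clustered" arrangement can exist, while any number of separate directories could point into it.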
(ii) When to use clustered or nonclustered indexes
The following table summarizes when to use a clustered or nonclustered index (very important).

Scenario	Use clustered index	Use nonclustered index
Column is often grouped or sorted	Should	Should
Query returns a range of data	Should	Should not
One or very few distinct values	Should not	Should not
A small number of distinct values	Should not	Should not
A large number of distinct values	Should not	Should
Frequently updated column	Should not	Should
Foreign key column	Should	Should
Primary key column	Should	Should
Index column is frequently modified	Should not	Should

In fact, we can understand the table above through the earlier definitions of clustered and nonclustered indexes. Take "returns a range of data" as an example. Suppose a table has a date column and the clustered index happens to be built on that column. A query for all data between January 1, 2004 and October 1, 2004 will then be very fast, because the body of your dictionary is sorted by date: the clustered index only needs to locate the first and last rows of the range. A nonclustered index, by contrast, must first look up the page number of each item in the directory and then fetch the content from that page.
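A range query over the two layouts can be sketched in Python as follows. This is a simplified model, not SQL Server's storage engine: the sample rows, the helper names `range_clustered` and `range_nonclustered`, and the `lookups` counter are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Table rows in arbitrary insertion order: (gid, fariqi). Sample data is made up.
heap = [
    (1, date(2004, 11, 5)), (2, date(2004, 3, 9)), (3, date(2003, 12, 30)),
    (4, date(2004, 7, 21)), (5, date(2004, 1, 1)), (6, date(2004, 10, 1)),
]

# Clustered layout: the rows themselves are kept sorted by date.
clustered = sorted(heap, key=lambda row: row[1])
clustered_keys = [row[1] for row in clustered]


def range_clustered(lo, hi):
    """Find the first and last position, then read one contiguous slice."""
    start = bisect_left(clustered_keys, lo)
    end = bisect_right(clustered_keys, hi)
    return clustered[start:end]


# Nonclustered layout: sorted (date, heap position) entries beside the heap.
nonclustered = sorted((row[1], pos) for pos, row in enumerate(heap))
nonclustered_keys = [key for key, _ in nonclustered]


def range_nonclustered(lo, hi):
    """Find the range in the directory, then fetch every row individually."""
    start = bisect_left(nonclustered_keys, lo)
    end = bisect_right(nonclustered_keys, hi)
    rows, lookups = [], 0
    for _, pos in nonclustered[start:end]:
        rows.append(heap[pos])    # one extra hop per matching row
        lookups += 1
    return rows, lookups


lo, hi = date(2004, 1, 1), date(2004, 10, 1)
print(range_clustered(lo, hi))
print(range_nonclustered(lo, hi))
```

Both functions return the same rows; the difference is that the second one pays one separate fetch per row, which is the "look up the page number, then turn to the page" cost described above.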
(iii) Common misconceptions about index use, in light of practice
The purpose of theory is application. Although we have just listed when a clustered or nonclustered index should be used, in practice these rules are easily overlooked, or are not weighed against the actual situation. Below we discuss common misconceptions about index use, based on problems actually encountered, to make the approach to building indexes easier to grasp.
1. The primary key is the clustered index
This idea is badly mistaken and wastes the clustered index, even though SQL Server creates a clustered index on the primary key by default.
Typically, we create an ID column in each table to distinguish each row, and the ID column auto-increments, usually with a step of 1. The gid column in our office automation example is like this. If we then set this column as the primary key, SQL Server will by default make it the clustered index. The benefit is that your data is physically sorted by ID in the database, but I don't think that means much.
The advantages of a clustered index are obvious, and the rule that each table can have only one makes it all the more valuable.
From the definition of the clustered index discussed earlier, we can see that its biggest benefit is the ability to quickly narrow the scope of a query according to the query conditions and so avoid a full table scan. In practice, because the ID is generated automatically, we do not know the ID of each record, so we rarely query by ID. This makes using the ID primary key as the clustered index a waste of resources. Secondly, building the clustered index on a field whose values are all different violates the rule that a clustered index should not be built on a column with a large number of distinct values. Of course, this only has a negative effect when users frequently modify record content, especially the indexed columns; it has no effect on query speed.
In an office automation system, whether it is the home page showing the documents and meetings awaiting the user's sign-off, or a user's document search, every data query relies on two fields: the "date" and the user's own "user name".
Usually, the home page of an office automation system displays the documents or meetings each user has not yet signed off. Although a WHERE clause can restrict the query to items the current user has not signed off, if your system has been running for a long time and holds a large amount of data, it makes little sense to perform a full table scan every time a user opens the home page. The vast majority of users have already read the documents from more than a month ago, so doing this only adds load to the database. In fact, when a user opens the home page we can have the database query only that user's unread documents from the last 3 months, using the "date" field to limit the scan and improve query speed. If your office automation system has been running for 2 years, your home page will theoretically display 8 times faster (24 months of data narrowed to 3), or even more.
The word "theoretically" is used here because if your clustered index is blindly built on the ID primary key, your queries will not be nearly this fast, even if you build an index (a nonclustered one) on the "date" field. Let's look at how various queries perform with 10 million rows (250,000 of them within the last 3 months):
(1) Clustered index on the primary key, no date range filter:
SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen
Time taken: 128470 ms (about 128 seconds)
(2) Clustered index on the primary key, nonclustered index on fariqi:
SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen
WHERE fariqi > DATEADD(day, -90, GETDATE())
Time taken: 53763 ms (about 54 seconds)
(3) Clustered index on the date column (fariqi):
SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen
WHERE fariqi > DATEADD(day, -90, GETDATE())
Time taken: 2423 ms (about 2 seconds)
Although each statement extracts 250,000 rows, the differences between the cases are enormous, especially when the clustered index is built on the date column. In fact, if your database really holds 10 million rows and the ID primary key serves as the clustered index, as in cases 1 and 2 above, the web page simply times out and cannot be displayed. This is one of the most important reasons I gave up using the ID column as the clustered index.
The timings above were obtained by adding the following before each SELECT statement:
DECLARE @d datetime
SET @d = GETDATE()
and adding this after the SELECT statement:
SELECT [Statement execution time (ms)] = DATEDIFF(ms, @d, GETDATE())
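To put the three reported timings side by side, the ratios can be worked out with a few lines of Python. The figures are the ones quoted in the text; the helper `speedup` and the dictionary labels are hypothetical names chosen for this sketch.

```python
# Timings (ms) as reported for the three cases in the text.
timings_ms = {
    "clustered on gid, no date filter": 128470,
    "clustered on gid, nonclustered on fariqi": 53763,
    "clustered on fariqi": 2423,
}


def speedup(baseline_ms, improved_ms):
    """How many times faster the improved case is than the baseline."""
    return baseline_ms / improved_ms


best = timings_ms["clustered on fariqi"]
for label, elapsed in timings_ms.items():
    print(f"{label}: {elapsed} ms, {speedup(elapsed, best):.1f}x the best time")
```

On these figures the clustered index on the date column is roughly 53 times faster than the unfiltered query and roughly 22 times faster than the same filter served by a nonclustered index.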
2. Building any index significantly improves query speed
In fact, in the example above, statements 2 and 3 are exactly the same and the indexed field is the same. The only difference is that the former has a nonclustered index on fariqi while the latter has a clustered index on it, yet the query speeds are worlds apart. So simply building an index on any field does not necessarily improve query speed.
In this table of 10 million rows, the fariqi field has 5,003 distinct values, which makes it a good fit for a clustered index. In practice we issue several documents every day, and they share the same issue date, which matches the rule for building a clustered index exactly: the values should be neither mostly identical nor only rarely identical. From this perspective, building an "appropriate" clustered index is very important for improving query speed.
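The "neither mostly identical nor only rarely identical" rule can be made concrete by computing how many rows share each distinct value on average. The sketch below uses the figures from the text for fariqi; treating gid as unique per row is an assumption based on its description as an auto-incrementing ID, and `rows_per_distinct_value` is a hypothetical helper name.

```python
def rows_per_distinct_value(total_rows, distinct_values):
    """Average number of rows that share one value of the column."""
    return total_rows / distinct_values


TOTAL_ROWS = 10_000_000

# fariqi: 5,003 distinct dates over 10 million rows (figures from the text).
fariqi_avg = rows_per_distinct_value(TOTAL_ROWS, 5_003)

# gid: assumed unique, so every value identifies exactly one row.
gid_avg = rows_per_distinct_value(TOTAL_ROWS, TOTAL_ROWS)

print(f"fariqi: about {fariqi_avg:.0f} rows per value")
print(f"gid: {gid_avg:.0f} row per value")
```

A date value therefore selects a contiguous group of around two thousand rows, which is the kind of medium-sized range a clustered index serves well, whereas each gid value selects a single row.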
3. Adding every field that needs faster queries to the clustered index improves query speed
As mentioned above, every data query relies on the "date" field and the user's own "user name" field. Since both fields are so important, we can combine them into a composite index (compound index).
Many people think that adding any field to the clustered index will speed up queries, while others wonder: if the fields of a composite clustered index are queried separately, will the query slow down? With that question in mind, let's look at the following query speeds (each result set is 250,000 rows). The date column fariqi is the leading column of the composite clustered index, and the user name column neibuyonghu comes after it.
(1) SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen WHERE fariqi > '2004-5-5'
Query time: 2513 ms
(2) SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen WHERE fariqi > '2004-5-5' AND neibuyonghu = 'office'
Query time: 2516 ms
(3) SELECT gid, fariqi, neibuyonghu, title FROM Tgongwen WHERE neibuyonghu = 'office'
Query time: 60280 ms
From the above experiment we can see that using only the leading column of the clustered index as the query condition is almost as fast as using all the columns of the composite clustered index, and can even be slightly faster (when the result sets are the same size). If only a non-leading column of the composite clustered index is used as the query condition, the index has no effect at all. Of course, statements 1 and 2 are equally fast because they return the same number of rows. If all the columns of the composite index are used and the result set is small, the index acts as a covering index and performance is optimal. Also keep in mind that, whether or not you often use the other columns of the clustered index, its leading column must be the most frequently used column.
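Why the leading column matters can be seen in a small Python model of a composite key. This is a sketch under stated assumptions rather than SQL Server's actual B-tree: the sample rows are invented, dates are ISO strings so that string order equals date order, and the functions `by_date`, `by_date_and_user` and `by_user_only` are hypothetical. Each returns the matching rows together with the number of rows it had to examine.

```python
from bisect import bisect_right

# Rows kept in composite-key order: (fariqi, neibuyonghu, gid). Data is made up.
rows = sorted([
    ("2004-03-01", "office", 1),
    ("2004-04-10", "finance", 2),
    ("2004-05-06", "office", 3),
    ("2004-05-20", "finance", 4),
    ("2004-06-15", "office", 5),
])
leading = [row[0] for row in rows]   # the leading column is globally sorted


def by_date(after):
    """Leading column only: binary search, then read the tail."""
    start = bisect_right(leading, after)
    hits = rows[start:]
    return hits, len(hits)


def by_date_and_user(after, user):
    """Both columns: same seek on the date, then filter by user."""
    start = bisect_right(leading, after)
    candidates = rows[start:]
    return [r for r in candidates if r[1] == user], len(candidates)


def by_user_only(user):
    """Non-leading column only: user names are not globally sorted, so scan all."""
    return [r for r in rows if r[1] == user], len(rows)


print(by_date("2004-05-05"))
print(by_date_and_user("2004-05-05", "office"))
print(by_user_only("office"))
```

The first two queries examine the same slice of rows, which mirrors the nearly identical timings of statements 1 and 2, while the third has to examine every row, as statement 3 does.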
(iv) Summary of index usage experience not found in other books
1. A clustered index is faster than a primary key that is not a clustered index
Here are the example statements (each extracts 250,000 rows):
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi = '2004-9-16'
Time taken: 3326 ms
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE gid <= 250000
Time taken: 4470 ms
In this case, the clustered index is about 1/4 faster than the primary key that is not a clustered index.
2. ORDER BY on a clustered index is faster than on an ordinary primary key, especially with small amounts of data
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen ORDER BY fariqi
Time taken: 12936 ms
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen ORDER BY gid
Time taken: 18843 ms
In this case, the clustered index is about 3/10 faster than the ordinary primary key. In fact, if the amount of data is small, sorting on a clustered index is noticeably faster than sorting on a nonclustered index; if the amount of data is large, say over 100,000 rows, the speed difference between the two is not obvious.
3. When filtering by a time range on the clustered index, search time falls in proportion to the fraction of the table that is retrieved, regardless of how many clustered index values are used
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi > '2004-1-1'
Time taken: 6343 ms (1 million rows extracted)
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi > '2004-6-6'
Time taken: 3170 ms (500,000 rows extracted)
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi = '2004-9-16'
Time taken: 3326 ms (in line with the result of the previous statement). When the number of rows retrieved is the same, the greater-than and equals operators perform the same.
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi > '2004-1-1' AND fariqi < '2004-6-6'
Time taken: 3280 ms
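The proportionality claim can be checked against the two measurements that come with an explicit row count. The short Python sketch below uses only those figures from the text; the helper name `ms_per_thousand_rows` is an assumption made for illustration.

```python
# (rows extracted, elapsed ms) for the two range queries with stated row counts.
measurements = [(1_000_000, 6343), (500_000, 3170)]


def ms_per_thousand_rows(rows, elapsed_ms):
    """Normalise a timing to the cost of retrieving 1,000 rows."""
    return elapsed_ms / rows * 1000


costs = [ms_per_thousand_rows(rows, ms) for rows, ms in measurements]
time_ratio = measurements[0][1] / measurements[1][1]
row_ratio = measurements[0][0] / measurements[1][0]

print(costs)
print(time_ratio, row_ratio)
```

Halving the number of rows retrieved almost exactly halves the elapsed time, so the cost per row stays essentially constant at a little over 6 ms per thousand rows.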
4. Entering hours, minutes and seconds in a date column does not slow down queries
In the following example there are 1 million rows in total. The 500,000 rows after January 1, 2004 have only two distinct dates, accurate to the day; the 500,000 rows before that date have 5,000 distinct dates, accurate to the second.
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi > '2004-1-1' ORDER BY fariqi
Time taken: 6390 ms
SELECT gid, fariqi, neibuyonghu, reader, title FROM Tgongwen WHERE fariqi < '2004-1-1' ORDER BY fariqi
Time taken: 6453 ms

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
