SQL Server query performance optimization: index creation principles (I)

Last Update:2018-12-07 Source: Internet

Author: User

Tags sql server query


What is an index? An index is an important tool for improving query performance. It stores the small amount of data that a query needs on index pages, so that only a few index pages have to be read to find the data. However, indexes are not a cure-all for query performance, and more indexes are not always better. Without an index, the WHERE clause has to search the data inefficiently. With too many indexes, inserts, updates, and deletes suffer, because during these operations SQL Server must immediately update every related index in addition to the data table itself, and too many indexes also waste disk space. Building the right indexes therefore takes experience.

I. Basic purpose of indexes

The basic purpose of an index is to find a small amount of data within a large amount of data. Imagine a book with 700 pages, just as a data table might have 700 data pages, whose index takes up 600 pages. Would you use that index to look up the contents of the book?

The fewer duplicate values an indexed field has, the better. Suppose the book is indexed on very common words, such as "of", that appear many times on every page. You would look the word up in the index, flip to the page where it occurs and read it, then go back to the index for the next "of", only to find that it is at a different position on the same page, and flip back to the body of the book again.

So how can we understand that indexes are used to find a small amount of data from a large amount of data? The following is an example.

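If the average length of a record in a data table is 400 bytes, then 1,000,000 records require 50,000 data pages. With 8,060 usable bytes per page and only whole records stored on a page (8060/400 rounds down to 20 records per page), the calculation is as follows:

1000000/(8060/400) = 50000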

Suppose the data table has a clustered index whose key is 4 bytes long, and an index is built on the ID field, whose data length is 13 bytes. Each record in the index structure is then 20 bytes:

4 (clustered index key) + 13 (ID key) + 3 (management information) = 20

If the index on the ID field is built with a 100% fill factor, it takes approximately 2,482 pages. The calculation is as follows:

1000000/(8060/20) ≈ 2482

Even with an 80% fill factor, it takes only 3,106 pages. The calculation is as follows:

1000000/((8060*0.8)/20) ≈ 3106

From the above we can see that in the first case the index pages account for only about 5% of the data pages:

2482/50000 = 0.04964

Even in the second case, where each index page is filled to only 80%, the index pages account for only about 6% of the data pages:

3106/50000 = 0.06212
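The page estimates above can be reproduced with a few lines of Python. This is only a sketch of the article's arithmetic, assuming 8,060 usable bytes per page and whole rows per page; the function name `pages_needed` is made up for this illustration.

```python
import math

PAGE_BYTES = 8060      # usable bytes per page, as used in the text
RECORDS = 1_000_000    # number of records in the example table

def pages_needed(records, row_bytes, fill_factor=1.0):
    """Pages required when only whole rows are stored on each page."""
    rows_per_page = math.floor(PAGE_BYTES * fill_factor / row_bytes)
    return math.ceil(records / rows_per_page)

data_pages = pages_needed(RECORDS, 400)        # 400-byte table rows
index_full = pages_needed(RECORDS, 20)         # 20-byte index rows, 100% fill
index_80 = pages_needed(RECORDS, 20, 0.8)      # 20-byte index rows, 80% fill

print(data_pages, index_full, index_80)        # 50000 2482 3106
print(index_full / data_pages)                 # share of index pages, 100% fill
print(index_80 / data_pages)                   # share of index pages, 80% fill
```

Rounding the rows per page down and the page count up is what makes the results match the figures of 50,000, 2,482, and 3,106 pages quoted in the text.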

In addition, if an index is created on a field used in the query condition, the B-tree lets the data be found with a binary-search style of lookup. That is, if 2 to the power N is greater than the number of records, the data can be found in at most N comparisons. Since 2 to the power 20 is greater than 1,000,000, at most 20 comparisons are needed to find a record. Because the number of comparisons is small and the data structure is small, which saves disk access and memory, an index greatly improves the efficiency of data searching. To make access and comparison efficient in SQL Server, the key of the columns used for an index should be as small as possible, so that each page stores as many key-value records as possible.
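The "2 to the power N" argument can be checked with a small Python sketch. It only mirrors the simplified binary-search reasoning of the text (a real B-tree page holds many keys per level); the function names are hypothetical.

```python
def max_comparisons(record_count):
    """Smallest N such that 2**N is greater than record_count."""
    n = 0
    while 2 ** n <= record_count:
        n += 1
    return n

def binary_search_probes(sorted_keys, target):
    """Classic binary search that also counts how many keys it inspects."""
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return mid, probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes

keys = range(1_000_000)                 # one million sorted key values
print(max_comparisons(len(keys)))       # 20
print(binary_search_probes(keys, 999_999))
```

No lookup in the one-million-key range needs more probes than the bound returned by `max_comparisons`.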

Note:

If a clustered index is created without the UNIQUE property, the database engine automatically adds a 4-byte uniqueifier column to the table. When necessary, the database engine automatically adds a uniqueifier value to a row to make each key unique. This column and its values are used internally and cannot be viewed or accessed by users.
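The idea behind the uniqueifier can be illustrated with a toy model in Python. This is not the engine's actual algorithm: the numbering scheme and the function name `add_uniqueifiers` are assumptions made for this sketch, which only shows how an extra hidden value makes duplicate keys distinguishable.

```python
from collections import defaultdict

def add_uniqueifiers(keys):
    """Pair each key with a hidden value; only repeated keys get one (toy model)."""
    seen = defaultdict(int)
    rows = []
    for key in keys:
        count = seen[key]
        # First occurrence needs no uniqueifier; later duplicates get 1, 2, ...
        rows.append((key, None if count == 0 else count))
        seen[key] += 1
    return rows

rows = add_uniqueifiers(["Smith", "Jones", "Smith", "Smith"])
print(rows)
print(len(set(rows)) == len(rows))   # every (key, uniqueifier) pair is distinct
```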

II. What is an index?

In SQL Server, indexes are organized as a B-tree structure.

You can also think of an index as a special kind of directory. Microsoft SQL Server provides two types of indexes: the clustered index (Clustered Index, also known as a clustering index or cluster index) and the non-clustered index (Nonclustered Index, also known as a non-clustering index or non-cluster index). The following example illustrates the difference between a clustered index and a non-clustered index:

In fact, the body of the Xinhua Dictionary is itself a clustered index. For example, to look up the character translated here as "Press", you naturally open the first few pages of the dictionary, because its pinyin begins with "a". The Xinhua Dictionary is sorted by pinyin, from "a" at the start to "z" at the end, so this character sits near the front of the dictionary. If you have gone through the whole "a" section and still cannot find it, the character is not in the Xinhua Dictionary. Similarly, to look up "Recruitment", you turn to the last part of the dictionary, because its pinyin is "zhao". In other words, the body of the Xinhua Dictionary is itself a directory, and you do not need to consult any other directory to find the content you need. We call body text that is itself arranged according to a fixed rule a "clustered index".

If you come across a character you do not recognize and do not know how to pronounce, you cannot find it with the method just described. Instead, you look it up by its radical, and then turn to the page number printed after the character. However, the order of characters in the "radical directory" and the "word checklist" is not the order of the body text. For example, looking up "Zhang" in the word checklist shows that it is on page 672. The entry above "Zhang" in the checklist is "Chi", which is on page 63, and the entry below it is "Crossbow", which is on page 390. Clearly, these characters are not really located just before and after "Zhang" in the body. The consecutive entries "Chi, Zhang, Crossbow" reflect their order in the non-clustered index, which is a mapping onto the characters in the dictionary body. You can find the character you need this way, but it takes two steps: first find the entry in the directory, and then turn to the page number it gives. We call this arrangement, in which the directory is purely a directory and the body is purely the body, a "non-clustered index".
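The two-step lookup through the word checklist can be sketched in Python using the page numbers from the example. The `body` dictionary and the function name are hypothetical stand-ins for the dictionary's pages.

```python
# Body of the dictionary: page number -> entry printed on that page (illustrative).
body = {63: "Chi", 390: "Crossbow", 672: "Zhang"}

# Word checklist in its own (radical) order; each entry carries only a page number.
word_checklist = [("Chi", 63), ("Zhang", 672), ("Crossbow", 390)]

def lookup_via_checklist(word):
    """Non-clustered style lookup: directory first, then the page in the body."""
    for entry, page in word_checklist:      # step 1: search the separate directory
        if entry == word:
            return page, body[page]         # step 2: flip to that page in the body
    return None

print(lookup_via_checklist("Zhang"))        # (672, 'Zhang')

# Neighbours in the checklist are far apart in the body.
pages_in_checklist_order = [page for _, page in word_checklist]
print(pages_in_checklist_order)             # [63, 672, 390]
```

The checklist order (63, 672, 390) is not the page order of the body, which is exactly why the second step is needed.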

The example above shows what a "clustered index" and a "non-clustered index" are. The following sections go into more detail.

Clustered Index

A clustered index means that the data table itself is part of the index: the table is the leaf level of the clustered index, and the whole table is sorted in ascending order of the chosen key values. SQL Server 2000 and later versions also allow the data to be sorted in descending order.

The data table, sorted in ascending order of the key fields, is combined with the upper levels of the index, which consist of key values plus pointers (the root node and the non-leaf levels), to form the complete clustered index. Because the rows of a table can physically be stored in only one order, a table can have only one clustered index. The columns of a clustered index do not have to be unique, and the index does not have to be declared as a unique clustered index: for duplicate key values, SQL Server automatically creates a 4-byte uniqueifier internally.

Suppose your table has one column that is often used for sorting, another that is often used for range queries, and a third that contains many duplicate values. Which column should get the clustered index? The right answer depends on which query is most important and most often executed by users. For example, a query that your boss runs several times an hour matters more than one that is run once or twice a month.

When a clustered index is created on a table (heap), or an existing clustered index is dropped or rebuilt, the database needs additional workspace to hold the sort results and a temporary copy of the original table or the existing clustered index data.

When a heap or clustered table has multiple partitions, each partition has its own heap or B-tree structure, which contains the group of rows for that partition. For example, if a clustered table has 4 partitions, there are 4 B-trees, one per partition.

Clustered index (Clustered Index)

·The leaf node of the clustered index is the actual data page.

·On the data page, data is stored in index order.

·The physical location of the row is the same as that of the row in the index.

·Each table can have only one clustered index.

·The average size of a clustered index is about 5% of the table size.

To sort and query data efficiently through an index, the most direct way is to create a clustered index on the fields you want to sort by. After a clustered index is created, SQL Server reorganizes the data pages so that the rows are stored in the order of the clustered index key. SQL Server does not require that data on disk be physically stored in clustered index order; however, when a clustered index is created, the data is sorted logically and, as far as possible, physically as well. Each page at the leaf level of the index has pointers to the previous and the next page, forming a doubly linked list, and an internal system table holds the address of the first leaf-level page of each index. Because the data is logically stored in clustered index order, SQL Server only needs to start from the first page and follow the links from page to page to read the data in order.

Note: A clustered table is a table that has a clustered index.
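The doubly linked leaf level can be pictured with a short Python sketch. The class and function names are invented for this illustration, and the "first page" returned by `link_pages` stands in for the first-page address that the text says is kept in a system table.

```python
class LeafPage:
    """One leaf-level page: rows in key order plus links to its neighbours."""
    def __init__(self, rows):
        self.rows = rows
        self.prev = None
        self.next = None

def link_pages(pages):
    """Chain pages into a doubly linked list and return the first page."""
    for left, right in zip(pages, pages[1:]):
        left.next = right
        right.prev = left
    return pages[0]

def scan_in_key_order(first_page):
    """Start at the first page and follow the next-page links."""
    page = first_page
    while page is not None:
        yield from page.rows
        page = page.next

first = link_pages([LeafPage([1, 2, 3]), LeafPage([4, 5]), LeafPage([6, 7, 8])])
print(list(scan_in_key_order(first)))   # [1, 2, 3, 4, 5, 6, 7, 8]
```

Because the pages are linked logically, the scan returns rows in key order even if the pages themselves are not physically adjacent on disk.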

Non-clustered Index

Non-clustered indexes are completely separate from the data table, so they do not affect the order of the data rows. Their leaf level consists of index rows. Each index row contains the non-clustered key value, a row locator, and any included (non-key) columns. A row locator stores one of two things: a bookmark (Bookmark) or the key value of the clustered index. If the table has a clustered index, the row locator stores the clustered index key. If the table has no clustered index, the row locator stores a bookmark, that is, a RID that points to the exact location of the record in the data table: the file number, the page number, and the record number within the page (called the Slot Number). Using this RID to fetch the row from the data table is called a Bookmark Lookup. Therefore, finding a key value through a non-clustered index is normally followed by a bookmark lookup to fetch the row.
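The two kinds of row locator can be modelled in a few lines of Python. This is a simplified sketch: the `Rid` class, the sample keys, and the in-memory dictionaries are all hypothetical and only mirror the structure described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rid:
    """Row locator for a heap: file number, page number, slot number."""
    file_id: int
    page_id: int
    slot: int

# Heap table: (file, page) -> rows on that page, addressed by slot number.
heap = {(1, 120): ["row A", "row B"], (1, 121): ["row C"]}

def fetch_from_heap(rid):
    """Bookmark lookup on a heap: go straight to file, page, and slot."""
    return heap[(rid.file_id, rid.page_id)][rid.slot]

# Non-clustered index leaf on a heap: key -> RID.
index_on_heap = {"key-B": Rid(1, 120, 1)}
print(fetch_from_heap(index_on_heap["key-B"]))      # row B

# Table with a clustered index: the locator is the clustered key itself.
clustered_table = {42: "row with clustered key 42"}
index_on_clustered = {"key-X": 42}
print(clustered_table[index_on_clustered["key-X"]])
```

In the second case the dictionary access stands in for a search of the clustered index by its key, which is the second step of the lookup.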

When a non-clustered index finds matching entries, the key values at the leaf level are sorted in ascending order, so all key values that satisfy the query condition may sit on a single index page. However, the rows in the data table are not stored in that order (that is, not in the order of the non-clustered index key), so the rows that actually match may be scattered throughout the data file. SQL Server always reads data in units of a page. Therefore, after locating a record, it must first read the entire page that holds the record into memory and then pick the record out of that page.

Because a Bookmark Lookup is a random I/O operation, when many records match the query, access through a non-clustered index leads to frequent data page reads. Even if two records are on the same page, that page is read twice. Reading N records therefore requires N page reads from the data table. Although most of these reads hit the cache in memory, with too many matching records this becomes less efficient than a table scan, which reads every page once and picks out the qualifying rows.

Although SQL Server 2005 and later versions no longer use the term Bookmark Lookup (the change is in name only), many searches still follow the same process: first locate the key in the non-clustered index, and then look up the row in the clustered index.
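The trade-off between bookmark lookups and a table scan can be made concrete with a deliberately crude Python model. It assumes one page read per matching record for lookups and one read per data page for a scan, and it ignores index traversal and caching, so it illustrates the argument rather than the optimizer's real cost model. The 50,000-page figure is the example table from section I.

```python
DATA_PAGES = 50_000   # data pages of the example table from section I

def cheaper_access(matching_records, data_pages=DATA_PAGES):
    """Compare page reads: N bookmark lookups versus one full table scan."""
    lookup_reads = matching_records    # one page read per matching record
    scan_reads = data_pages            # a scan reads every data page once
    return "bookmark lookup" if lookup_reads < scan_reads else "table scan"

print(cheaper_access(100))        # bookmark lookup
print(cheaper_access(200_000))    # table scan
```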

Non-clustered index (Nonclustered Index)

·A non-clustered index page is not a data page, but a page pointing to a data page.

·If the index type is not specified, non-clustered indexes are used by default.

·The order of leaf node pages is different from the physical storage order of tables.

·Each table can have a maximum of 249 non-clustered indexes in SQL Server 2005 and earlier; later versions allow up to 999 (generally, a table should not have more than 10 indexes).

·Create the clustered index before creating non-clustered indexes (otherwise, the non-clustered indexes have to be rebuilt).

Usage of clustered and non-clustered indexes:

Action description	Use clustered index	Use non-clustered index
Foreign key columns	Should	Should
Primary key columns	Should	Should
Columns frequently used for grouping and sorting (ORDER BY)	Should	Should
Returning data within a range	Should	Should not
Small number of distinct values	Should	Should not
Large number of distinct values	Should not	Should
Frequently updated columns	Should not	Should
Frequently modified index columns	Should not	Should
One or very few distinct values	Should not	Should not

Today we covered some index basics. Tomorrow we will explain how to choose the columns for an index, under what conditions, and by which methods.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
