SQL Server Index Optimization Series, Part 1: How Indexes Work (Clustered and Non-Clustered Indexes)

Source: Internet
Author: User

From: http://crm.xingfa.com/HelpDesk/Archiver.asp?Threadid=8587 (thanks to the author)

 

 

2010-09-29 8:00:53

Let's take a brief look at how SQL Server indexes work. Basic index concepts are not covered here.

 

Clustered index:

(Figure A)

Figure A shows the structure of a clustered index.
The data pages are where the data is actually stored in the database; as the figure shows, the data is stored page by page.

Assume that the indexed column is "lastname".

Because this is a clustered index, the data is stored in index order. The index itself is a tree, so let's first look at how the tree is built.

Take page100 and page110 first: together they form page141. The first entry of page141 is the first entry of page100, and the last entry of page141 is the first entry of page110. In the same way, page120 and page130 form page145, and page141 and page145 form the root, page140.

Now let's see how the index is used to find data.

Let's look for the surname "Rudd".

The search starts at the root, page140. "Rudd" is greater than "Martin" (here the first letters already differ, and R comes after M in alphabetical order), so the search follows the "Martin" entry to page145. There "Rudd" is compared with "Smith"; "Rudd" is smaller than "Smith", so the search goes to the page on the left, page120, which is then scanned row by row until "Rudd" is found.
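The walk from the root down to the data page can be sketched in a few lines of Python. This is only an illustration of the search described above, not how SQL Server is implemented. The page numbers and the names "Martin", "Smith" and "Rudd" come from the walkthrough; all other names are made-up placeholders, because the figure is not reproduced here.

```python
from bisect import bisect_right

# Index pages hold (first key of child page, child page number), sorted by key.
# "Martin" and "Smith" come from the walkthrough; the other keys are
# hypothetical placeholders.
INDEX_PAGES = {
    140: [("Akhtar", 141), ("Martin", 145)],  # root page
    141: [("Akhtar", 100), ("Ganio", 110)],
    145: [("Martin", 120), ("Smith", 130)],
}

# Data pages of the clustered index: rows are stored in key order.
DATA_PAGES = {
    100: ["Akhtar", "Barr", "Con", "Funk"],
    110: ["Ganio", "Hall", "Jones"],
    120: ["Martin", "Phua", "Rudd"],
    130: ["Smith", "White"],
}

ROOT_PAGE = 140


def find(lastname):
    """Return (list of pages visited, whether the name was found)."""
    page = ROOT_PAGE
    visited = [page]
    while page in INDEX_PAGES:
        entries = INDEX_PAGES[page]
        keys = [key for key, _ in entries]
        # Follow the last entry whose key is <= the search value.
        position = max(bisect_right(keys, lastname) - 1, 0)
        page = entries[position][1]
        visited.append(page)
    # Data page reached: scan it row by row.
    return visited, lastname in DATA_PAGES[page]


print(find("Rudd"))  # ([140, 145, 120], True)
```

Only three pages are touched (page140, page145, page120), which matches the path described in the text.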

Without an index, SQL Server scans every page sequentially, starting from the first page, until "Rudd" is found. For a table with a million rows this is clearly very inefficient, whereas with an index the row can be found very quickly.

Non-clustered index:

(Figure B) 

Figure B shows the structure of a non-clustered index.

The difference between a clustered index and a non-clustered index is that with a clustered index the physical storage order of the data matches the index order, that is, the data rows themselves are stored sorted. With a non-clustered index the data rows are stored unordered, not in index order.

Figure B shows that the data pages are unordered. So how is the index built?

As Figure B shows, the values of the indexed column are copied and then sorted to build the index, and each index entry carries a pointer to its data row.

Let's look for "Rudd" again. The search starts at the root of the index pages and follows the same principle as for a clustered index. "Rudd" is found in index page61, where its pointer is 470501. That pointer is then looked up in data page5, which is where the "Rudd" row is actually stored, and so "Rudd" is found.
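The two-step lookup (sorted index entry first, then the pointer into an unordered data page) can be sketched in Python as follows. The pointer 470501, index page61 and data page5 come from the walkthrough; every other pointer, page number and name is made up, and the way a pointer is resolved to a page is replaced here by a simple dictionary, which is an assumption of this sketch.

```python
from bisect import bisect_left

# Data pages of the table: rows are stored in no particular order.
# Each row is addressed by a row pointer. Only 470501 / page 5 / "Rudd"
# come from the text; the rest is hypothetical.
DATA_PAGES = {
    4: {470401: "Smith", 470402: "Akhtar"},
    5: {470501: "Rudd", 470502: "Barr"},
    6: {470601: "Martin", 470602: "White"},
}

# Stand-in for decoding a pointer into its page number (an assumption).
PAGE_OF_POINTER = {
    pointer: page_no
    for page_no, rows in DATA_PAGES.items()
    for pointer in rows
}

# Leaf page of the non-clustered index (page61 in the text): a sorted copy
# of the indexed column, each entry carrying the pointer to its data row.
LEAF_PAGE_61 = [
    ("Akhtar", 470402),
    ("Barr", 470502),
    ("Martin", 470601),
    ("Rudd", 470501),
    ("Smith", 470401),
    ("White", 470602),
]


def lookup(lastname):
    """Return (pointer, data page number, row) or None if not indexed."""
    keys = [key for key, _ in LEAF_PAGE_61]
    position = bisect_left(keys, lastname)
    if position == len(keys) or keys[position] != lastname:
        return None
    pointer = LEAF_PAGE_61[position][1]
    page_no = PAGE_OF_POINTER[pointer]
    return pointer, page_no, DATA_PAGES[page_no][pointer]


print(lookup("Rudd"))  # (470501, 5, 'Rudd')
```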

That is the basic working principle of indexes; the actual implementation is more complicated.

Rokingphone 2010-09-29 8:01:16
SQL Server Index Optimization Series, Part 2: Index Performance Considerations

As mentioned above, indexes can greatly speed up data retrieval. So why not index every column? Beginners are often confused by this: they do not know which columns should or should not be indexed, and may even index columns that are only queried with fuzzy LIKE patterns. A LIKE pattern that starts with a wildcard (for example LIKE '%abc') cannot use an index seek; comparisons such as equal to, greater than, less than, BETWEEN and IN, and LIKE patterns with a fixed prefix (for example LIKE 'abc%'), can. SQL Server must also update the indexes whenever rows are inserted, updated, or deleted, which increases the time those operations take. In addition, if a row has to be inserted into a data page that is already full, the page is split, which produces fragmentation (discussed later) and hurts performance. Therefore, create an index only when query performance matters more than update performance.

Columns that should be indexed

1. Primary keys
2. Foreign keys
3. Columns that are searched frequently, and columns that are frequently used for sorting

Generally, a column referenced in a WHERE clause condition is a candidate for an index, except for columns searched only with fuzzy patterns that start with a wildcard (such as LIKE '%abc').

Columns that should not be indexed

1. Columns that are seldom or never referenced in queries
2. Columns with only a few distinct values (for example, a column that only holds the two values male and female)
3. Columns of small tables (tables with few rows; here SQL Server takes longer to go through the index than to scan the table directly)

SQL Server pays a maintenance cost for every indexed column. It also decides automatically whether the index on a column is worth using. For example, for a column with only the two values male and female, searching through the index is not efficient, because each value returns a large percentage of the rows. SQL Server records statistics about the column, and the next time the column is searched it decides from those statistics whether to use the index.

SQL Server may not use the index when the query returns a large percentage of the table (for example, if the table has 1 million rows and the query returns 0.5 million of them, a full table scan is used). You can test this yourself: insert 2000 rows in which 1999 have the same value. For example, 1999 rows have forumid 2 and only one row has forumid 3.

SET SHOWPLAN_TEXT ON -- display the execution plan to see which indexes the query statement uses
GO

SELECT * FROM posts WHERE forumid = 2
-- The index on the forumid column is not used.

SELECT * FROM posts WHERE forumid = 3
-- The index on the forumid column is used.
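The idea behind this test can be sketched in Python: keep statistics about how often each value occurs, and choose the index only when the searched value matches a small fraction of the rows. The threshold `max_fraction` and the function name are hypothetical; SQL Server's optimizer compares estimated costs and does not use a fixed cut-off like this.

```python
from collections import Counter


def choose_access_path(column_values, search_value, max_fraction=0.1):
    """Pick 'index seek' or 'table scan' from simple value statistics.

    max_fraction is a made-up threshold used only for illustration.
    """
    statistics = Counter(column_values)
    fraction = statistics[search_value] / len(column_values)
    return "index seek" if fraction <= max_fraction else "table scan"


# 2000 rows: 1999 with forumid 2 and a single row with forumid 3.
forum_ids = [2] * 1999 + [3]

print(choose_access_path(forum_ids, 2))  # table scan
print(choose_access_path(forum_ids, 3))  # index seek
```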

When inserting or updating a large volume of data, it is usually better to drop the index first and re-create it afterwards, so that the index is not updated for every inserted or updated row, which slows the operation down.

Composite index (an index made up of two or more columns; when the WHERE clause contains conditions on several columns, you can create a composite index on those columns)

1) The index can be used for a seek only when the first column of the index key is specified in the WHERE clause.
Example:
CREATE INDEX posts_index
ON posts (threadid, forumid)

The query SELECT * FROM posts WHERE forumid = 2 will not use posts_index for a seek,
while SELECT * FROM posts WHERE threadid = 10 will use posts_index.
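Why the leading column matters can be seen by modelling the composite index as a list sorted by (threadid, forumid). A condition on threadid narrows the search to one contiguous range, while a condition on forumid alone has to look at every entry. The sketch below uses made-up rows and hypothetical function names; it is an illustration of the principle, not SQL Server's implementation.

```python
from bisect import bisect_left

# Entries of posts_index, sorted by (threadid, forumid).
# The third item is a hypothetical row id; all values are made up.
POSTS_INDEX = sorted([
    (10, 2, "row-a"),
    (10, 3, "row-b"),
    (11, 2, "row-c"),
    (12, 1, "row-d"),
    (12, 2, "row-e"),
])


def seek_by_threadid(threadid):
    """Binary-search the contiguous range for one threadid.

    Returns (matching row ids, number of index entries read from the range).
    """
    start = bisect_left(POSTS_INDEX, (threadid,))
    end = bisect_left(POSTS_INDEX, (threadid + 1,))
    rows = [row_id for _, _, row_id in POSTS_INDEX[start:end]]
    return rows, end - start


def scan_by_forumid(forumid):
    """forumid is not the leading column, so every entry must be checked.

    Returns (matching row ids, number of index entries read).
    """
    rows = [row_id for _, forum, row_id in POSTS_INDEX if forum == forumid]
    return rows, len(POSTS_INDEX)


print(seek_by_threadid(10))  # (['row-a', 'row-b'], 2)
print(scan_by_forumid(2))    # (['row-a', 'row-c', 'row-e'], 5)
```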

2) The index key should not be too wide (8 bytes or less is best; an int takes 4 bytes and a smallint takes 2 bytes).
3) Define the most unique column first (a different column order gives a different index).
For example, if 30% of the values in column A are duplicates, 10% in column B, and 25% in column C, the columns should be indexed in the order B, C, A.
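The ordering rule in point 3 amounts to sorting the columns by their share of duplicate values, lowest first. A tiny Python sketch, using the percentages from the example and a hypothetical helper name:

```python
def key_column_order(duplicate_ratio):
    """Order columns from most unique (fewest duplicates) to least unique."""
    return sorted(duplicate_ratio, key=duplicate_ratio.get)


# Share of duplicate values per column, taken from the example above.
ratios = {"A": 0.30, "B": 0.10, "C": 0.25}

print(key_column_order(ratios))  # ['B', 'C', 'A']
```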

Another important option when creating an index is the fill factor, which is covered in the next article.

Rokingphone 2010-09-29 8:01:32
SQL Server Index Optimization Series, Part 3: Fill Factor

When you create a SQL Server index, there is an option called the fill factor (FILLFACTOR).

Few people pay attention to it, but it is important. You may already know about it and still be unsure how to use it. Even if you understand how it works, using it well is not easy, because you have to analyze things such as how frequently the indexed column is updated.

I remember learning about this option from a book, but I was confused about it and did not know what value to set, so I always left it at the default. In fact, once the data volume reaches hundreds of thousands or millions of rows, the effect of index tuning becomes obvious.

Fill factor definition: how full the leaf-level pages of the index are. Its role: when the system creates or rebuilds an index, it reserves part of the space on each index page, so that pages do not have to be split as soon as index entries are added. The aim is to minimize index page splits and fine-tune performance.

(Figure)

As Figure A shows, data is stored page by page. By definition, the fill factor is how full page100 (and page110, page120, page130) in the figure is. The default fill factor is 0 (0 and 100% mean the same thing), that is, the pages are filled completely. With a fill factor of 60%, only 60% of page100 is filled with data and the remaining 40% is left free.
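To make the effect on page count concrete, here is a small Python sketch that distributes sorted keys over pages at a given fill factor. It assumes a fixed number of rows per page (`capacity`), which is a simplification; real pages hold a fixed number of bytes, not rows. The function names are hypothetical.

```python
def rows_per_page(capacity, fill_factor):
    """Rows placed on each page at build time; 0 is treated like 100."""
    if not 0 <= fill_factor <= 100:
        raise ValueError("fill factor must be between 0 and 100")
    effective = 100 if fill_factor == 0 else fill_factor
    return max(1, capacity * effective // 100)


def build_leaf_pages(sorted_keys, capacity, fill_factor):
    """Split sorted keys into pages, leaving free space on each page."""
    per_page = rows_per_page(capacity, fill_factor)
    return [
        sorted_keys[start:start + per_page]
        for start in range(0, len(sorted_keys), per_page)
    ]


keys = list(range(20))
print(len(build_leaf_pages(keys, capacity=10, fill_factor=0)))    # 2
print(len(build_leaf_pages(keys, capacity=10, fill_factor=100)))  # 2
print(len(build_leaf_pages(keys, capacity=10, fill_factor=60)))   # 4
```

With room for 10 rows per page, a fill factor of 60 places 6 rows on each page, so the same 20 rows need 4 pages instead of 2.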

The fill factor is applied only when the index is created or rebuilt. Afterwards it is not maintained as rows are added, deleted, or updated in the table. Suppose that after the index is created a row such as "com" is added, which belongs before "con" in page100. Because the page is only 60% full, there is free space on it, so "com" is inserted directly between "Barr" and "con" without a page split, and the free space of page100 drops below 40%. If the page were filled to 100%, it would have no room: "con" and the rows after it would have to move to let "com" in before "con", and page100 would move about half of its rows (the second half) to a new page to free up space for the new row. This reorganization is called a page split; page splits reduce performance and fragment the table's data storage. All this moving slows the insert down noticeably and also produces fragmentation. After data has been added, the index must be rebuilt to bring the pages back to a fill factor of 60%.
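The difference between the two cases can be simulated with a short Python sketch. It assumes a page holds at most `capacity` rows (5 here) and that a full page moves its second half to a new page, as described above. "Barr", "Con" and "Com" follow the text; the other names, the capacity and the function name are assumptions of this sketch.

```python
from bisect import bisect_right, insort


def insert_row(pages, key, capacity):
    """Insert key into sorted pages; return True if a page split occurred."""
    first_keys = [page[0] for page in pages]
    # Target page: the last page whose first key is <= the new key.
    position = max(bisect_right(first_keys, key) - 1, 0)
    page = pages[position]
    if len(page) < capacity:
        insort(page, key)  # free space available: no split needed
        return False
    # Page is full: move the second half of its rows to a new page.
    half = len(page) // 2
    new_page = page[half:]
    del page[half:]
    pages.insert(position + 1, new_page)
    insort(new_page if key >= new_page[0] else page, key)
    return True


# page100 built with a 60% fill factor: 3 of 5 slots used.
partly_filled = [["Akhtar", "Barr", "Con"]]
print(insert_row(partly_filled, "Com", capacity=5), partly_filled)
# False [['Akhtar', 'Barr', 'Com', 'Con']]

# page100 built with a 100% fill factor: all 5 slots used.
full = [["Akhtar", "Barr", "Con", "Funk", "Ganio"]]
print(insert_row(full, "Com", capacity=5), full)
# True [['Akhtar', 'Barr', 'Com'], ['Con', 'Funk', 'Ganio']]
```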

Someone has compared the fill factor to the following example:

To record the ranking of the 10 students in a class, we can give each student a number, for example:
A. The numbers 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. In this case the fill factor is 100%.
If a new student now arrives whose rank is in the middle, we have to change the numbers of many students. For example, if the new student ranks 5th, we have to add 1 to the numbers of all students after the 4th and then give the new student the number 5.
B. Alternatively, we can give the students the numbers 1, 3, 5, 7, 9, 11, 13, 15, 17, and 19.
The ranking is still recorded, and we say the fill factor is 50%. If a new student arrives who ranks 5th, we only need to give that student the number 8; the other students' numbers do not change.
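The analogy can be checked with a few lines of Python that count how many existing students must be renumbered in each case. The function name and the simple "shift everything after the insert point by one" strategy are assumptions made for this illustration.

```python
def insert_at_rank(numbers, rank):
    """Insert a new student at a 1-based rank.

    Returns (new list of numbers, how many existing students were renumbered).
    """
    previous = numbers[rank - 2] if rank > 1 else 0
    following = numbers[rank - 1] if rank <= len(numbers) else None
    if following is None or following - previous > 1:
        # There is a gap: take the next free number, nobody else changes.
        return numbers[:rank - 1] + [previous + 1] + numbers[rank - 1:], 0
    # No gap: the new student takes the number at that rank and
    # everyone from that rank onwards moves up by one.
    shifted = [number + 1 for number in numbers[rank - 1:]]
    return numbers[:rank - 1] + [following] + shifted, len(shifted)


dense = list(range(1, 11))       # case A: 1, 2, ..., 10
sparse = list(range(1, 20, 2))   # case B: 1, 3, ..., 19

print(insert_at_rank(dense, 5))
# ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 6)
print(insert_at_rank(sparse, 5))
# ([1, 3, 5, 7, 8, 9, 11, 13, 15, 17, 19], 0)
```

In case A six students get new numbers, while in case B the new student simply takes the number 8.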

So with a large fill factor the numbers take up little space, which uses fewer resources. With a small fill factor more resources are used, but changes are convenient and fast.

With a large fill factor, inserting or modifying rows causes a lot of index reorganization, disk I/O increases, and performance inevitably drops, but little space is used. With a small fill factor, the index takes up more disk space and memory, but less I/O is needed to reorganize the index, so performance improves at the cost of some storage space. The choice depends on your own situation.

Generally, 100% is used only when the data does not change (for example, in a read-only table). Setting the fill factor to another value is useful only when a new index is created on existing data in the table and you can accurately predict how that data will change in the future. Setting the fill factor well is therefore not easy.

 
