Some small places that are easy to overlook in SQL Development (iv)

Source: Internet
Author: User

Original: Some small places that are easy to overlook in SQL Development (iv)

In this article I would like to make some corrections to some of the statements on the web for the use of nonclustered indexes . The following references the description of the nonclustered index structure under MSDN.

Nonclustered index structure:

1: Nonclustered indexes have the same B-tree structure as clustered indexes, and the significant difference between them is the following two points:

* The data rows of the underlying table are not sorted and stored in the order of the nonclustered keys.
* The leaf layer of a nonclustered index is made up of index pages rather than data pages.

2: A row locator in a nonclustered index row or a pointer to a row, or a clustered index key for a row, as described below:

* If the table is a heap (meaning that the table does not have a clustered index), the row locator is a pointer to the row. The pointer is generated by the file identifier (ID), the page number, and the row count on the page. The entire pointer is called the row ID (RID).

* If the table has a clustered index or a clustered index on the indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server adds internally generated values (called unique values) so that all duplicate keys are unique. This four-byte value is not visible to the user. This value is added only if you need to make the clustered key unique for use in a nonclustered index. SQL Server retrieves a data row by using the clustered index key stored in a leaf row of a nonclustered index to search for a clustered index.

Network view:the columns used in the ORDER BY clause can be built with nonclustered indexes to improve query speed.

Original address: http://gocom.primeton.com/blog10697_1221.htm

my opinion: in a word, the environment is different, the table structure is different, the data distribution is different, the final result is not necessarily the same.

Case:I recently made a project with two large table associated, are close to tens of millions. A table is Orders Form Order, the other is Membership Form Member, there is a field in the order Create_date: Type datatime, where the values are not the same and unique, and are not contiguous, here are some values:
Create_date
-----------------------
2008-10-05 04:00:56.000
2008-10-05 03:55:55.000
2008-10-05 03:55:42.000
2008-10-05 03:54:40.000
2008-10-05 03:54:32.000
2008-10-05 03:54:23.000
2008-10-05 03:47:16.000
2008-10-05 03:46:08.000
2008-10-05 03:42:28.000

2008-10-05 03:42:09.000

The order form and the membership table have an associated field Proxyid, each of which is indexed. The query statement is as follows:

SELECT * FROM order INNER JOIN member on Order.proxyid=member.proxyid
Where leavedate between ' Start time ' and ' End Time ' ORDER BY create_date Desc
Test:

Case one: Create a nonclustered index on create_date. The IO to execute and the consumed time consumption as follows: It can be seen that a large number of table scans are performed on the Memer table. 83,588 times.

Situation Two: Delete the index on the create_date, it should be more slower than the index, the following is the implementation of the IO and time consumption diagram:

For this I have the following findings:
The 1:order by field has been scanned only 9 times for the member table without creating an index. Much less than 83,588 times when the index was created.
2: Another phenomenon is that if you press all the data in the Query Analyzer, there is less time to create the index.

3: The query that created the index displays the data one step ahead of the query that did not create the index, but the final completion time is longer.

Test Unknown Challenge:

1: In terms of query speed, is the first step in the Query Analyzer display data query fast or to see the final complete time to judge. (Create_date will display the data earlier if the index is created, but it will be slower than not creating an index in total) park friend Zping told me not to look at the time to see the number of Io. I don't know how people analyze it.

2: Creating an index on a field why a multiple-table scan of a member table is raised.

Test Note: because SQL2005 has a cache function, all two queries are not the same time period, but the amount of data is similar.

According to Perfectdesign's point of view, when order by, if the field is a clustered index would be optimal, I personally as well as MSDN agree, oddly, the above statement, leave_date on the clustered index, and then order by leave_date Desc, however, will also produce more than 50,000 member table scans, as if the order by index field, whether clustered or nonclustered, will increase the scanning of the member table a lot. It's really a myth. The following is a detailed ID case:

(+ row (s) affected)
Table ' member '. Scan count 52796, logical reads 234885, physical reads 0, Read-ahead reads 3687, LOB logical reads 0, LOB physical reads 0 , LOB Read-ahead reads 0.
Table ' V_hotel '. Scan count 1, logical reads 3121, physical reads 0, Read-ahead reads, LOB logical reads 0, LOB physical reads 0, LOB re Ad-ahead reads 0.

Test Conclusion: This situation is sufficient to explain that the order by field to create an index does not necessarily play the advantage of nonclustered index, for which the reason I am not, there is no answer, if you have the answer also hope to teach one or two. Although there are some principles and guidelines for database tuning, But these so-called criteria must be all right. I feel that all are for reference only, or to be analyzed according to the actual situation.

The following is a description of MSDN for non-clustered indexing applications, which I think might refer to:

Before you create a nonclustered index, you should understand how the data is accessed. Consider using nonclustered indexes on queries that have the following properties:

    • Use the JOIN or GROUP by clause.
      You should create multiple nonclustered indexes for the columns involved in the join and grouping operations, and a clustered index for any foreign key columns.
    • Queries that do not return large result sets.
      Create a filtered index to overwrite a query that returns a well-defined subset of rows from a large table.
    • Contains columns that are frequently included in a query's search criteria, such as a WHERE clause that returns an exact match.

Note:

This article refers to: MSDN


Some small places that are easy to overlook in SQL Development (iv)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.