Basic SQL Server Index knowledge (2) -- clustered index, non-clustered index)

Source: Internet
Author: User

 

Clustered index is an index. The logical order of the key values in the index determines the physical order of the corresponding rows in the table. The clustered index determines the physical sequence of data in the table. The clustered index is similar to the phone book and sorts data by last name. Because clustered indexes specify the physical storage sequence of data in a table, a table can only contain one clustered index. However, this index can contain multiple columns (composite indexes), just as the phone book is organized by the last name and name. Clustered indexes are particularly effective for columns that frequently search for range values. When a clustered index is used to locate the row that contains the first value, you can ensure that the row that contains the subsequent index value is physically adjacent. For example, if the application Program If you perform a query to retrieve records within a certain date range, you can use the clustered index to quickly find the rows that contain the start date, and then retrieve all adjacent rows in the table until the end date is reached. This helps improve the performance of such queries. Similarly, if a column is frequently used to sort the data retrieved from a table, the table can be clustered (physically sorted) on the column ), this avoids sorting this column each time it is queried, thus saving costs. When the index value is unique, it is efficient to use clustered indexes to find specific rows. For example, the quickest way to use the unique employee ID column emp_id to find a specific employee is to create a clustered index or primary key constraint on the emp_id column. Applicable to clustered indexes:
1. A column containing a large number of non-repeated values.
2. Use between,>, >=, <or <= to return a range value column.
3. columns that are continuously accessed
4. Query of large result sets returned
5. columns frequently accessed by queries using join or group by clauses Clustered index. A non-clustered index is based on the B + tree structure. The difference between the value type and the reference type is very similar. The following isArticleLearn more

 

 

 

 

 

 

As I needed to train my colleagues on database indexing knowledge, I collected and sorted out this series of blogs. This is a summary of index knowledge. After summing up, I found that many vague concepts were much clearer in the past.

Both clustered indexes and non-clustered indexes are implemented using the B + tree. Before learning about these two indexes, we need to understand the B + tree first. If you do not know about Tree B, you are advised to refer to the following articles:

What are btree, B-tree, B + tree, and B * tree?
Http://blog.csdn.net/manesking/archive/2007/02/09/1505979.aspx

Structure of the B + tree:

Features of the B + tree:

    • All the keywords appear in the linked list of the leaf node (dense index), and the keywords in the linked list are exactly ordered;
    • It is impossible to hit non-leaf nodes;
    • Non-leaf nodes are equivalent to leaf node indexes (sparse indexes), and leaf nodes are equivalent to data layers that store (keywords) data;

Adding or deleting a data in the B + tree requires processing in multiple situations. This is complicated and will not be detailed here.

Clustered Index)

    • The leaf node of the clustered index is the actual data page.
    • On the data page, data is stored in index order.
    • The physical location of the row is the same as that of the row in the index.
    • Each table can have only one clustered index.
    • The average size of clustered indexes is about 5% of the table size.

The following are two simple descriptions of clustered indexes:

Execute the following statement in the clustered index:

 
Select*From Table WhereFirstname ='OTA'

 

The clustered index graph of a relatively abstract point:

 

Unclustered Index)

    • A non-clustered index page is not a data page, but a page pointing to a data page.
    • If the index type is not specified, non-clustered indexes are used by default.
    • The order of leaf node pages is different from the physical storage order of tables.
    • Each table can have a maximum of 249 non-clustered indexes.
    • Create a clustered index before creating a non-clustered index (otherwise, index reconstruction is triggered)

Execute the following statement in a non-clustered index:

 
Select*FromEmployeeWhereLname ='Green'

A non-clustered index graph of a relatively abstract point:

 

What is bookmark lookup?

Although bookmark lookup is no longer mentioned in SQL 2005 (changing the changes), many of our searches use such a search process as follows:
Search in non-clustering and then in clustered index.

 

In an example provided by the http://www.sqlskills.com/, we demonstrated that bookmark lookup is slower than table scan, and the example script is as follows:

 Use Credit Go  -- These samples use the credit database. You can download and restore  -- Credit database from here:  Http://www.sqlskills.com/resources/conferences/CreditBackup80.zip  -- Note: This is a SQL Server 2000 backup and merge examples will work on  -- SQL Server 2000 in addition to SQL Server 2005.  ------------------------------------------------------------------------------- -- (1) create two tables which are copies of charge:  -------------------------------------------------------------------------------  -- Create the heap  Select * Into Chargeheap From Charge Go  -- Create the CL table  Select * Into Chargecl From Charge Go  Create  Clustered   Index Chargecl_clind On Chargecl (member_no, charge_no) Go  -------------------------------------------------------------------------------  -- (2) Add the same non-clustered indexes to both of these tables:  -------------------------------------------------------------------------------  -- Create the NC index on the heap  Create   Index Chargeheap_ncind On Chargeheap (charge_no)Go  -- Create the NC index on the CL table  Create   Index Chargecl_ncind On Chargecl (charge_no) Go  -------------------------------------------------------------------------------  -- (3) begin to query these tables and see what kind of access and I/O returns  -------------------------------------------------------------------------------  -- Get ready for a bit of analysis:  Set  Statistics Io On  -- Turn graphical showplan on (CTRL + k)  -- First, a point query (also, see how a bookmark lookup looks in 2005)  Select * From Chargeheap Where Charge_no= 12345 Go  Select * From Chargecl Where Charge_no= 12345 Go -- What if our query is less selective?  -- 1000 is. 0625% of our data... (1,600,000 million rows)  Select * From Chargeheap Where Charge_no <1, 1000 Go  Select * From Chargecl Where Charge_no <1, 1000 Go  -- What if our query is less selective?  -- 16000 is 1% of our data... (1,600,000 million rows) Select * From Chargeheap Where Charge_no <1, 16000 Go  Select * From Chargecl Where Charge_no <1, 16000 Go  -------------------------------------------------------------------------------  -- (4) What's the exact percentage where the bookmark lookup isn't worth it?  ------------------------------------------------------------------------------- -- What happens here: Table scan or bookmark lookup?  Select * From Chargeheap Where Charge_no <1, 4000 Go  Select * From Chargecl Where Charge_no <1, 4000 Go  -- What happens here: Table scan or bookmark lookup?  Select * From ChargeheapWhere Charge_no <1, 3000 Go  Select * From Chargecl Where Charge_no <1, 3000 Go  -- And-you can narrow it down by trying the middle ground:  -- What happens here: Table scan or bookmark lookup?  Select * From Chargeheap Where Charge_no <1, 3500 Go  Select * From Chargecl Where Charge_no <1, 3500 Go  -- And again:  Select * From Chargeheap Where Charge_no <1, 3250 Go  Select * From Chargecl Where Charge_no <1, 3250 Go  -- And again: Select * From Chargeheap Where Charge_no <1, 3375 Go  Select * From Chargecl Where Charge_no <1, 3375 Go  -- Don't worry, I won't make you go through it all :)  -- For the heap table (in this case), the cutoff is: 0.21%  Select * From ChargeheapWhere Charge_no <1, 3383 Go  Select * From Chargeheap Where Charge_no <1, 3384 Go  -- For the clustered table (in this case), the cut-off is: 0.21%  Select * From Chargecl Where Charge_no <1, 3438 Select * From ChargeclWhere Charge_no <1, 3439 Go 

This example is the demo of Wu jiazhen at teched 2007.

Summary:

This blog simply uses several charts to introduce the index implementation methods: B + number, clustered index, non-clustered index, and bookmark lookup information.

References:

Table organization and index organization
Http://technet.microsoft.com/zh-cn/library/ms189051.aspx
Http://technet.microsoft.com/en-us/library/ms189051.aspx

How indexes work
Http://manuals.sybase.com/onlinebooks/group-asarc/asg1200e/aseperf/@Generic__BookTextView/3358

Bookmark Lookup
Http://blogs.msdn.com/craigfr/archive/2006/06/30/652639.aspx

Logical and physical operators reference
Http://msdn2.microsoft.com/en-us/library/ms191158.aspx

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.