Table Scan, Index Scan, and Index Seek in SQL Server

Source: Internet
Author: User
Tags types of tables

Table Scan, Index Scan, and Index Seek in SQL Server

0. References

Oracle Table Access Method

Differences between Index Seek and Index Scan and their applicability

1. Table Access in oracle

There is a table access method in oracle. There are three methods to access table data:

Full table scan directly accesses the data page to find data that meets the conditions

Through rowid scan (table access by rowid), if you know the rowid of the data, you can directly search through rowid

Index scan: if an index is created for a table, you can use the index to find the location where the expected data is stored in the table, that is, rowid, return rowid and then use rowid to access specific data.

Index scanning can be divided into index full scan, index range scan, and index unique scan.

2. clustered index scan, table scan, and index scan in SQL server

Similar content also exists in sqlserver. Here we will use table scan, index scan, and index seek.

A table scan is where the table is processed row by row from beginning to end.

An index scan is where the index is processed row by row from beginning to end.

If the index is a clustered index then an index scan is really a table scan.

Summary: in SQL server, a table scan is performed for a row starting from the beginning to the end of the table. The processing here can be understood as the condition judgment of the where clause in SQL. We need to traverse each row in the table to determine whether the where condition is met. The simplest table scan is select * from table.

Index scanning is the access to each node in the index from start to end. Assume that our index is in the B-tree structure, then index scan accesses every node in the B-tree.

If the index is a clustered index, the leaf node of the B-tree index stores the actual data on the data page. If the index is not a clustered index, the leaf node of Tree B stores the pointer to the data page.

(Ps: The following 2.1-2.6 is added in)

2.1 prepare lab data

After introducing clustered index scan, table scan, and index scan, we will use experiments to describe how to use these table scan methods. We will use the sample database AdventureWorks2008R2 for the experiment. First, we will prepare the experiment data, as shown in TSQL:

View Code

-- Prepare test data --------------------------------------------------

Use adventureworks2008R2

Go

-- If the table already exists, delete it.

Drop table dbo. SalesOrderHeader_test

Go

Drop table dbo. SalesOrderDetail_test

Go

-- Create a table

Select * into dbo. SalesOrderHeader_test

From Sales. SalesOrderHeader

Go

Select * into dbo. SalesOrderDetail_test

From Sales. SalesOrderDetail

Go

-- Create an index

Create clustered index SalesOrderHeader_test_CL

On dbo. SalesOrderHeader_test (SalesOrderID)

Go

Create index SalesOrderDetail_test_NCL

On dbo. SalesOrderDetail_test (SalesOrderID)

Go

-- Select * from dbo. SalesOrderDetail_test

-- Select * from dbo. SalesOrderHeader_test

Declare @ I int

Set @ I = 1

While @ I <= 9

Begin

Insert into dbo. SalesOrderHeader_test

(RevisionNumber, OrderDate, DueDate,

ShipDate, Status, OnlineOrderFlag, SalesOrderNumber, PurchaseOrderNumber,

AccountNumber, CustomerID, SalesPersonID, TerritoryID,

BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID,

CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt,

Freight, TotalDue, Comment, rowguid, ModifiedDate)

Select RevisionNumber, OrderDate, DueDate,

ShipDate, Status, OnlineOrderFlag, SalesOrderNumber, PurchaseOrderNumber,

AccountNumber, CustomerID, SalesPersonID, TerritoryID,

BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID,

CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt,

Freight, TotalDue, Comment, rowguid, ModifiedDate

From dbo. SalesOrderHeader_test

Where SalesOrderID = 75123

Insert into dbo. SalesOrderDetail_test

(SalesOrderID, CarrierTrackingNumber, OrderQty, ProductID,

SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal,

Rowguid, ModifiedDate)

Select 75123 + @ I, CarrierTrackingNumber, OrderQty, ProductID,

SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal,

Rowguid, getdate ()

From Sales. SalesOrderDetail

Set @ I = @ I + 1

End

Go

-- Data preparation completed --------------------------------

2.2 tutorial data description:

Dbo. SalesOrderHeader_test stores the header information of each order, including the order creation date, customer number, contract number, and salesperson number. Each order has a separate order number. There is a clustered index on the order number field.

Dbo. SalesOrderDetail_test stores the Order details. One order can sell multiple products to the same customer, so dbo. SalesOrderHeader_test and dbo. SalesOrderDetail_test are one-to-many relationships. The details of each entry include the order number to which it belongs, its own unique ID in the table (SalesOrderDetailID), product number, unit price, and sales quantity. Here, only a non-clustered index is created on SalesOrderID. By default, create index creates non-clustered indexes.

According to the original data in AdventureWorks, dbo. SalesOrderHeader_test contains more than 30 thousand orders. More than 0.12 million orders are recorded in dbo. SalesOrderDetail, and 3 ~ Five detailed records. This is a normal distribution. To make the data distribution uneven, we add nine order records in dbo. SalesOrderHeader_test, whose numbers are from 75124 to 75132. This is nine special orders, each with more than 0.12 million detailed records. That is to say, 90% of data in dbo. SalesOrderDetail_test belongs to the nine orders. It mainly uses "select 75123 + @ I..." to search for all records in Sales. SalesOrderDetail and insert them to dbo. SalesOrderDetail. A total of 9 executions are performed.

2.3 table scan

In SQL server, there are two types of tables. One is a clustered table with clustered indexes, and the other is a table without clustered indexes. Data in a clustered table is stored in an ordered manner according to the clustered index, while the table is stored in hash unordered. Taking dbo. SalesOrderDetail_test as an example, there is no clustered index on it, and there is only one non-clustered index on SalesOrderID. Therefore, records in each row of the table are not stored in any order, but randomly stored in the Hash. In this case, we need to find detailed sales records with a unit price greater than 200 and run the following statement:

View Code

Select maid ID, UnitPrice from dbo. SalesOrderDetail_test where UnitPrice> 200

Because the table has no index on UnitPrice, SQL Server has to scan the table from start to end and pick out all records whose UnitPrice value is greater than 200 one by one, as shown in the process.

The execution plan clearly shows that SQL Server performs a table scan, as shown in:

2.4 index scan and index seek

We have created a non-clustered index on SalesOrderID. If the query condition is SalesOrderID and only the column SalesOrderID is added, what query method will be used for execution? First, we query the record of SalesOrderID <43664 and execute the following TSQL statement:

Select SalesOrderID from SalesOrderDetail_test where SalesOrderID <43664

Shows the execution plan. We found that index seek is executed.

If we want to query all SalesOrderID records without the where condition,

Select SalesOrderID from SalesOrderDetail_test

As shown in the query plan, index scan is executed.

Then, if we want to query all records with SalesOrderID <80000, how is the query performed. Clear execution plan cache before query execution

View Code

Dbcc dropcleanbuffers -- clear the execution plan Cache

Dbcc freeproccache -- clear data cache

Select SalesOrderID from SalesOrderDetail_test where SalesOrderID <80000

The query plan is shown in. We found that index seek is used.

2.5 clustered index scan

What if this table has clustered indexes? Take the table as an example. First, create a clustered index on the SalesOrderDetailID field where the value is unique. In this way, all data is stored in the order of clustered indexes.

View Code

-- Create a clustered index for SalesOrderDetail_test

Create clustered index SalesOrderDetail_test_CL

On dbo. SalesOrderDetail_test (SalesOrderDetailID)

Go

Unfortunately, there is no index on the query condition UnitPrice, so SQL Server still needs to scan all records. The difference is that the table scan in the execution plan is changed to a clustered index scan ). As shown in:

Because the data in a table with clustered indexes is directly stored at the bottom of the index, to scan the data in the entire table, you need to scan the entire clustered index. Here, clustered index scanning is equivalent to a table scan. There is no difference between the time and resources used and the table scan. It doesn't mean that the Index indicates how much progress the execution plan has made than the table scan. Of course, if we see the word "Table Scan", it means that there is no clustered index on this Table.

Create a non-clustered index on UnitPrice to see how the situation changes.

View Code

-- Create a non-clustered index on UnitPrice

Create index SalesOrderDetail_test_NCL_Price

On dbo. SalesOrderDetail_test (UnitPrice)

Go

In a non-clustered index, a non-clustered index key value and a clustered index key value are stored for each record (in a table without clustered index, is the RID value ). Therefore, each record contains UnitPrice and SalesOrderDetailID, which are stored in the order of UnitPrice.

Run the query just now,

Select maid ID, UnitPrice from dbo. SalesOrderDetail_test where UnitPrice> 200

You will see that this SQL Server does not need to scan the entire table, as shown in. In this query, records with UnitPrice> 200 are directly found based on the index.

Based on the newly created index, it directly finds the value that matches the record, as shown in the query plan. We can see that the index seek operation is directly performed on nonclustered index.

However, indexes created on UnitPrice alone cannot tell us the values of other fields. If several fields are added to the query, the following TSQL query is performed:

View Code

Select SalesOrderID, SalesOrderDetailID, UnitPrice

From dbo. SalesOrderDetail_test with (index (SalesOrderDetail_test_NCL_Price ))

Where UnitPrice> 200

SQL Server first finds all records whose UnitPrice is greater than 200 on the non-clustered index, and then finds the detailed data stored on the clustered index based on the value of SalesOrderDetailID. This process can be called "Bookmark Lookup", as shown in.

After SQL Server 2005, the Bookmark Lookup action is completed in a nested loop. Therefore, in the execution plan, we can see that SQL Server first seek the non-Clustered Index SalesOrderDetail_test_NCL_Price, and then uses Clustered Index Seek to find the required rows. The nested loop here is actually Bookmark Lookup, as shown in:

The above Key Lookup is a kind of Bookmark Lookup, because there is a clustered index in our table. If we do not have a clustered index, here is the RID Lookup, as shown in:

The time consumed by the preceding key lookup is as follows:

SQL Server Execution Times:

CPU time = 2995 ms, elapsed time = 10694 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

In the preceding query, the with (index (SalesOrderDetail_test_NCL_Price) statement is used to force the non-clustered index SalesOrderDetail_test_NCL_Price, the clustered index key value is found through a non-clustered index and then queried in the clustered index. If not, SQL server may use clustered index scan or bookmark lookup, depending on the data volume returned by the query.

(1) For example, query the result of UnitPrice> 200:

Select SalesOrderID, SalesOrderDetailID, UnitPrice from dbo. SalesOrderDetail_test where UnitPrice> 200

The query plan is as follows. We can find that clustered index scan is used, and 481590 records are returned, which is very large.

More importantly, its cpu time is as follows:

SQL Server Execution Times:

CPU time = 515 ms, elapsed time = 10063 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

We found that the cpu time is only 515 ms, which is smaller than the 2995ms we saw earlier. This indicates that index seek is not necessarily better than index scan. SQL server selects a more method to perform the operation based on the statistics.

(2) If you query the result of UnitPrice <2:

Select SalesOrderID, SalesOrderDetailID, UnitPrice from dbo. SalesOrderDetail_test where UnitPrice <2

We found that the query plan no longer uses cluster index scan, but uses index seek + clustered index seek, as shown in. There are only 1630 records returned. The number of records is relatively small, so clustered index scan is not required.

2.6 conclusion

To sum up, there are several situations in SQL Server that vary with the data search targets and methods.

Construct ScanSeek

Heap (Table data page without clustered indexes) Table Scan none

Clustered Index ScanClustered Index Seek

Non-clustered Index ScanIndex Seek

If you see these actions in the execution plan, you should be able to know which object SQL Server is doing what kind of operation. Table scan indicates that the table being processed has no clustered index, and SQL Server is scanning the entire table. Clustered index scan indicates that SQL Server is scanning a table with clustered indexes, but it is also a full table scan. Index Scan indicates that SQL Server is scanning a non-clustered Index. Since non-clustered indexes generally only have a small part of fields, although this is also a scan, the cost is much lower than the whole table scan. Clustered Index Seek and Index Seek indicate that SQL Server is using Index results to retrieve target data. If the result set only accounts for a small portion of the total data volume of the table, Seek is much cheaper than Scan, and the index plays a role in improving performance. If there are many query result sets, table scan may be preferred.

3. Comparison of Index Scan and Index Seek

Index Seek is used by SQL to scan indexes during query. The Index node is scanned first, that is, the Index tree is traversed. After finding the leaf node of the index, if it is a clustered index, the value of the leaf node value is taken directly. If it is a non-clustered index, then, the corresponding row is searched based on the rowid in the leaf node (the leaf node of the clustered index is the data page, and the leaf node of the non-clustered index is the index page pointing to the data page, that is, the rowid of the data page, which occurs when the table does not have a clustered index. If the table itself contains a clustered index, the leaf nodes of non-clustered indexes store the non-clustered index key values and clustered index key values. After obtaining the clustered index key value, they are searched in the clustered index again .). For Index Scan, all rows on the entire Index page are traversed from the beginning to the end. Therefore, the efficiency is not very high when the data volume is large. In the case of clustered indexes, clustered index scan is table scan.

SQL has a Query optimization analyzer Query Optimizer which will analyze the Query before executing the Query. When there are available indexes in the Query, the efficiency of using Index Seek for Query will be analyzed first, if the query efficiency using Index Seek is not good, Index Scan is used for query. Under what circumstances will Index Seek be less efficient than Index Scan? This can be divided into the following situations:

1. when there are not many data in the table to be queried, the efficiency of using Index Seek is not necessarily high, because using Index seek must start from the Index tree first, then use the leaf node to find the corresponding row. If the number of rows is small, Index scan is not performed directly. Therefore, the data stored in the table cannot be too small.

2. when the returned data volume is large, for example, if the returned data volume accounts for 50% of the total data volume or exceeds 50%, the efficiency of using Index Seek is not fixed. When the returned data volume accounts for 10%-15% of the total data volume, use Index Seek to obtain the best performance. Therefore, if you want to use index seek, the returned data volume cannot be too large or too small.

3. When many columns with the same value are created, creating an index may not be very efficient. For example, we do not recommend that you create an index on the "gender" column. In fact, the reason is very simple. When the value of the column to which the index is created changes little, the index binary tree should be of the low fat type, and the tree hierarchy is not high, the information of many rows is contained on the leaves. Such queries obviously cannot be well utilized by indexes.

In the original words of MSDN: do not always equate the use of indexes with good performance, or equate good performance with the efficient use of indexes. If you only need to use the index to get the best performance, the query optimizer is easy to work on. However, in fact, incorrect index selection cannot achieve optimal performance. Therefore, the task of the query optimizer is to select the query optimizer only when the index or index combination can improve the performance, and avoid using the query optimizer when the index search performance is poor.

4. I/O in SQL server

The I/O from an instance of SQL Server is divided into logical and physical I/O. A logical read occurs every time the database engine requests a page from the buffer cache. if the page is not currently in the buffer cache, a physical read is then med to read the page into the buffer cache. if the page is currently in the cache, no physical read is generated; the buffer cache simply uses the page already in memory.

In sqlserver, I/O can be divided into logical IO and physical IO. Reading a page from the buffer cache is logical read. If the data page is not in the current cache, the data page must be read from the disk to the cache, which is a physical read.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.