Go: How tables are accessed in SQL Server table scan, index Scan, index Seek

Source: Internet
Author: User
Tags create index prepare types of tables

0. References

Table Scan, Index Scan, index Seek

SQL Server–index Seek vs. Index Scan–diffefence and Usage–a simple Note

How Oracle tables are accessed

Difference between index seek and index scan and where applicable

How to access tables in 1.oracle

In Oracle, where table access is described, the data in the Access table is accessed primarily in three ways:

    1. Full table scan, direct access to data pages to find data that meets your criteria
    2. Through the ROWID scan (table access by ROWID), if you know the rowid of the data, find it directly through ROWID
    3. Index Scan, if a table creates an index, it can be indexed to find where the data we want is stored in the table, that is, ROWID, by returning rowID and then using ROWID to access the specific data.
    4. In the index scan, it can be divided into index full scan, indexed range Scan (index range) and index unique Scan (index unique scan).
Clustered index scan,table scan,index scan in 2.sql server

There are similar things in SQL Server, which is going to be the table Scan,index Scan and the index seek.

    1. A table Scan is where the table was processed row by row from beginning to end.
    2. An index scan is a where the index is processed row by row from beginning to end.
    3. If The index is a clustered index then an index scan is really a table scan.
    4. Summary: In SQL Server, the table scan is the one row from the beginning to the end of the data in the table. The processing here can be understood as the conditional judgment of the WHERE clause in SQL. We need to traverse each row in the table to determine if the where condition is satisfied. The simplest table scan is the SELECT * from table.
    5. An index scan is an access to the beginning and end of each node in the index. Assuming our index is a B-tree structure, index scan accesses every node in the B-tree.
    6. If the index is a clustered index, the leaf node of the B-Tree index holds the actual data in the data page. If the index is a nonclustered index, the B-Tree leaf node holds a pointer to the data page.

(PS: The following 2.1-2.6 added in 2012-9-4)

2.1 Experimental Data preparation

After introducing clustered index scan,table Scan and Index Scan, we will use experiments to describe the circumstances under which these table scans will be used. We will experiment with ADVENTUREWORKS2008R2, the sample database, and first prepare the experimental data, TSQL as follows:

--Prepare the test data--------------------------------------------------use adventureworks2008r2go--if the table already exists, delete the drop table dbo. Salesorderheader_testgodrop table dbo. salesorderdetail_testgo--CREATE TABLE SELECT * INTO dbo. Salesorderheader_testfrom Sales.salesorderheadergoselect * into dbo. Salesorderdetail_testfrom sales.salesorderdetailgo--creates an index of Create clustered index SALESORDERHEADER_TEST_CL on dbo. Salesorderheader_test (SalesOrderID) gocreate index Salesorderdetail_test_nclon dbo. Salesorderdetail_test (SalesOrderID) Go--select * FROM dbo. Salesorderdetail_test--select * FROM dbo. Salesorderheader_test declare @i intset @i = 1while @i<=9begin INSERT INTO dbo. Salesorderheader_test (RevisionNumber, OrderDate, DueDate, Shipdate,status, Onlineorderflag, Salesordernumber,purcha Seordernumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, Billtoaddressid, Shiptoaddressid, Shipmethodi D, Creditcardid, Creditcardapprovalcode, Currencyrateid, Subtotal,taxamt, Freight,totaldue, Comment,rowguid,Modifieddate) Select RevisionNumber, OrderDate, DueDate, Shipdate,status, Onlineorderflag, Salesordernumber,purchaseor Dernumber, AccountNumber, Customerid,salespersonid, TerritoryID, Billtoaddressid, Shiptoaddressid, ShipMethodID, Cr     Editcardid, Creditcardapprovalcode, Currencyrateid, Subtotal,taxamt, Freight,totaldue, Comment,rowguid,ModifiedDate FROM dbo. salesorderheader_test where SalesOrderID = 75123 INSERT INTO dbo. Salesorderdetail_test (SalesOrderID, Carriertrackingnumber, OrderQty, ProductID, Specialofferid,unitprice,unitprice     Discount,linetotal, rowguid,modifieddate) SELECT [Email protected], Carriertrackingnumber, OrderQty, ProductID,  Specialofferid,unitprice,unitpricediscount,linetotal, Rowguid, GETDATE () from sales.salesorderdetail set @i = @i +1endgo--data is ready to complete--------------------------------
2.2 Experimental data show:
    1. Dbo. The salesorderheader_test contains the header information for each order, including the order creation date, customer number, contract number, salesperson number, etc., with a separate order number for each order. On the Order Number field, there is a clustered index .
    2. Dbo. The details of the order are stored in the salesorderdetail_test. A single order can sell multiple products to the same customer, so dbo. Salesorderheader_test and dbo. Salesorderdetail_test is a one-to-many relationship. Each detail includes the order number to which it belongs, its own unique number (Salesorderdetailid) in the form, the product number, the unit price, and the number of sales. In this case, only a nonclustered index is established on the SalesOrderID. Create index is a nonclustered index created by default.
    3. According to the original data in AdventureWorks, dbo. There are more than 30,000 order information in Salesorderheader_test, dbo. SalesOrderDetail has more than 120,000 orders detailed records, basically an order has a detailed record of the article. This is a normal distribution. To make the data unevenly distributed, we're in the dbo again. The salesorderheader_test adds 9 order records, which are numbered from 75124 to 75132. This is 9 special orders, each with more than 120,000 detailed records. In other words, dbo. 90% of the data in Salesorderdetail_test will be in these 9 orders. The main use of "select [email protected]" to search out Sales.SalesOrderDetail all records inserted into Dbo.salesorderdetail. Executed 9 times altogether.
2.3 Table Scan

There are two types of tables in SQL Server, one is a clustered table with a clustered index, and the other is a table with no clustered index. In a clustered table, the data is stored in order by the clustered index, while the table is stored in a hash. to dbo. Salesorderdetail_test , for example, does not have a clustered index above it, only a nonclustered index on the SalesOrderID. Therefore, each row of the table records, not in any order, but randomly stored in the hash . At this point we find all the sales price is greater than the detailed records, to run the following statement:

View Code

Because the table does not have an index on UnitPrice, SQL Server has to scan the table from start to finish, bringing all UnitPrice values greater than the records one by one, and the process as shown .

It's clear from the execution plan that SQL Server does a table scan, as shown in:

2.4 Index Scan and Index seek

We created a nonclustered index on the SalesOrderID, the join query condition is SalesOrderID, and only salesorderid this column, then what query method will be executed? First we query the salesorderid<43664 record and execute the following TSQL statement:

Select SalesOrderID from salesorderdetail_test where salesorderid< 43664

As shown in its execution plan, we found that the index seek was executed

If we want to query all SalesOrderID records without a Where condition,

Select SalesOrderID from Salesorderdetail_test

So, as shown in the query plan, we found that the index scanwas executed.

So if we ask for a query of all salesorderid<80000 records, how are we going to query them? Clear sky to execute plan cache before executing query

DBCC dropcleanbuffers--empty execution plan cache DBCC freeproccache--empty data cache Select SalesOrderID from Salesorderdetail_test where salesorderid< 80000

As shown in its query plan, we found that the index seek was used

2.5 clustered Index Scan

What happens if there is a clustered index on this table? Or do you just give it an example of a table that is the only one in which the value is the only field salesorderdetailid to create a clustered index. This allows all data to be stored in the order of the clustered index.

View Code

Unfortunately, there is no index on the query condition UnitPrice, so SQL Server still has to scan all the records again. And the difference is that the table scan in the execution plan becomes a clustered index scan (clusteredindex Scan). As shown in the following:

Because in a table with a clustered index, the data is stored directly at the bottom of the index, so to scan the entire table of data, the entire clustered index will be scanned once. Here, the clustered index scan is equivalent to a table scan. The time and resources to use are no different from the table scan. This is not to say that there is the word "Index", which shows how much progress has been made in the execution plan than the table scan. Of course, if you see the word "Table Scan", it means there is no clustered index on the table.

Now build a nonclustered index on the UnitPrice to see how things will change.

--Creating a nonclustered index on UnitPrice CREATE INDEX Salesorderdetail_test_ncl_priceon dbo. Salesorderdetail_test (UnitPrice) go

In a nonclustered index, a value for the index key of the nonclustered index is stored for each record and a value for the index key of the clustered index (in a table without a clustered index, the RID value). So here, each record will have a copy of the UnitPrice and Salesorderdetailid records, stored in the order of UnitPrice.

Run that query again,

Select Salesorderdetailid, UnitPrice from dbo. Salesorderdetail_test where UnitPrice > 200

You will see that this time SQL Server does not have to scan the entire table as shown in. This query will find the records of UnitPrice > 200 directly based on the index.

Based on the new index, it directly finds the value that matches the record, as shown in the query plan. We can see that the index seek operation is performed directly on the nonclustered index.

But the UnitPrice index on the light does not tell us the values of the other fields. If you add a few more fields back in that query, the following TSQL query:

View Code

SQL Server will first find all records on the nonclustered index that are greater than UnitPrice , and then find the detailed data stored on the clustered index based on the value of Salesorderdetailid. This process can be called "Bookmark Lookup" as shown in .

After SQL Server 2005 , the action of the Bookmark Lookup is done with a nested loop. So in the execution plan, you can see that SQL Server first seek the nonclustered index Salesorderdetail_test_ncl_price, and then use the clustered index seek Find the line you need. The nested loop here is actually the Bookmark Lookup, as shown in:

The key lookup above is one of the Bookmark lookup , because we have a clustered index in our table, and if we don't have a clustered index, this is the RID lookup, as shown in:

The above key lookup uses the following time:

SQL Server Execution Times:cpu time = 2995 ms, Elapsed time = 10694 Ms. SQL Server parse and compile time:cpu time = 0 ms, Elapsed time = 0 Ms.

In the above query, the reason for using the WITH (Index (SALESORDERDETAIL_TEST_NCL_PRICE)) statement is to force it to use Salesorderdetail_test_ncl_price, the nonclustered index. A clustered index key value was found through a nonclustered index and then queried in the clustered index. If not used, SQL Server may use clustered index Scan, or bookmark lookup, depending on the amount of data returned by the query.

(1) For example, the results of query UnitPrice > 200:

Select Salesorderid,salesorderdetailid,unitprice from dbo. Salesorderdetail_test where UnitPrice > 200

The query plan is as follows, we can find that using clustered index scan, the number of records returned is 481,590, very large.

More important is its CPU time, as follows:

SQL Server Execution Times:cpu time = 515 ms, Elapsed time = 10063 Ms. SQL Server parse and compile time:cpu time = 0 ms, Elapsed time = 0 Ms.

We found that the CPU time was only 515ms, smaller than the 2995ms we had seen before. This indicates that index seek is not necessarily better than index scan. SQL Server chooses the more ways to perform operations based on statistics.

(2) If the query UnitPrice <2 results:

2

We found that the query plan no longer uses the cluster index scan, but instead used the index seek+clustered index seek, as shown in the number of returned records is only 1630. Relatively small number of records, so there is no need to clustered index scan.

2.6 Summary

To summarize, there are several scenarios in which the different and different methods of finding targets in SQL Server are based on data.

  structure

scan

seek

table Scan

no

clustered index

clustered index Scan

clustered Index Seek

nonclustered index

index Scan

index Seek

If you see these actions in the execution plan, you should be able to know what kind of object SQL Server is doing. Table scan indicates that the table being processed does not have a clustered index, and SQL Server is scanning the entire table. Clustered index Scan indicates that SQL Server is scanning a table with a clustered index, but is also an entire table scan. The index scan indicates that SQL Server is scanning a nonclustered index. Because there are generally only a small number of fields on a nonclustered index, this is also a scan, but at a much smaller cost than an entire table scan. Clustered index seek and index seek indicate that SQL Server is using index results to retrieve target data. If the result set accounts for only a fraction of the total table data, seek is much cheaper than scan, and the index plays a role in performance. If you have a large number of query result sets, you may be more inclined to use table scan.

3.Index Scan, Index Seek the comparison

Index seek is the SQL at the time of the query using the established index to scan, first scan the index node, that is, traversing the index tree. After finding the leaf node of the index, if the clustered index takes the value of the leaf node directly, if it is a non-clustered index, it finds the corresponding row according to the ROWID in the leaf node (the leaf node of the clustered index is the data page, and the leaf node of the nonclustered index is the index page that points to the data page, which is the rowID, which occurs when the table does not have a clustered index, and if the table itself contains a clustered index, the nonclustered index key values and clustered index key values that are held in the leaf nodes of the nonclustered index are then searched in the clustered index after the clustered index key value is obtained. ). For the index scan, which traverses all the rows in the entire index page from the beginning, it is not very efficient when the volume of data is large, and in the case of a clustered index, the clustered index scan is Table Scan .

SQL has a query Optimization analyzer that queries Optimizer, which is analyzed first before executing the query, and when there are indexes available in the query, the efficiency of querying with index seek is prioritized, and if the query using index seek is not efficient, Then use the index scan to query. Under what circumstances would that result in index seek efficiency lower than index scan? You can split the concentration situation:

1. In cases where there is not much data in the table to be queried, it is not necessarily efficient to use index seek, since the index seek is used to start with the tree and then use the leaf node to find the corresponding row. If the number of rows is less, the index scan is not directly performed. Therefore, the data stored in the table cannot be too small.

2. When the amount of data returned is large, such as the amount of data returned is 50% or more than 50% of the total, using index seek efficiency is not necessarily good, and when the amount of data returned is 10%-15%, index seek is used to obtain the best performance. Therefore, if you want to use index seek, the amount of data returned is neither too much nor too small.

3. In cases where the indexed columns are in a consistent number of values, indexing is not necessarily a good efficiency to build. For example, it is not recommended to create an index on the "Gender" column. In fact, the reason is very simple, when the index of the column to change the value of the case, the establishment of the index binary tree should be chunky type, tree level is not high, many rows of information are contained in the leaves, such a query is obviously not very good use of the index

MSDN words: Do not always equate the use of indexes with good performance, or equate good performance with efficient use of indexes. If you can get the best performance with an index, the query optimizer will work just as easily. But in fact, incorrect index selection does not get the best performance. Therefore, the task of the query optimizer is to select only when the index or index combination improves performance, and avoids using index retrieval when it is detrimental to performance.

4.SQL Server in the I/O

The I/O from a instance of SQL Server is divided to logical and physical I/O. A logical read occurs every time the database engine requests a page from the buffer cache. If the page is not currently in the buffer cache, a physical read was then performed to read the page into the buffer cache . If the page is currently in the cache, no physical read is generated; The buffer cache simply uses the page already in memory.

In SQL Server I/O can be divided into logical io and physical IO, reading a page from the cache (buffer cache) is a logical read, if the data page is not in the current cache, then the data page must be read from disk to the cache, which is considered a physical read.

Ext.: http://www.cnblogs.com/xwdreamer/archive/2012/07/06/2579504.html

Go: How tables are accessed in SQL Server table scan, index Scan, index Seek

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.