What is bookmark lookup? In the books online in SQL Server, this is said: "bookmark lookup logical operators and physical operators use bookmarks (row IDs or clustering keys) to search for corresponding rows in tables or clustered indexes.Argument
A column contains a bookmarkmark used to search for rows in a table or clustered index.ArgumentThe column also contains the name of the table or clustered index of the row to be searched. If the with prefetch clause appears in
ArgumentIn the column, it indicates that the query processor has decided to use Asynchronous pre-extraction (pre-read) when searching bookmarks in a table or clustered index )." After reading this explanation, it is still unclear. Later, I found out what bookmark lookup is and when it will happen. What impact does bookmark lookup have on the query performance and how to avoid bookup lookup. Share with you. If you have any errors, please do not give me any further information. Unless otherwise stated, the SQL server mentioned in this Article refers to SQL Server 2000.
To understand bookmark lookup, you must start with the SQL Server Index and table Storage Architecture. SQL Server tables use the following two methods to organize their data pages:
Clustered table: a clustered table is a table with clustered indexes. It stores data rows in sequence based on the clustered index key, and indexes are implemented based on the B-tree index structure. Tree B performs a quick search of rows based on the clustered index key value. The page links of each index are in a two-way linked list, but the key value is used for navigation between different levels. Data rows constitute the lowest level of clustered indexes.
STACK: A table without clustered indexes does not store data rows in any special order. The data page is not linked in the linked list. A non-clustered index has a B-tree index structure similar to the clustered index, but it does not take effect on the order of data rows. Its lowest row contains the key value of a non-clustered index, each key-value item has a pointer pointing to a data row containing the key value. For a stack, this pointer is a pointer to a row, and for a clustered table, it is a clustered index key. This pointer is called a row positioner.
One of SQL Server's data files is IAM, that is, the index allocation ing table, which stores the extended disk information about tables and indexes. A heap has a row in sysindexes, whose indid is 0. The firstiam column points to the iam chain of the data page set of the table. The server uses the iam page to search for pages in the data page set. By scanning the iam page, you can perform table scans or serial reads on the stack to find the extended Disk Area of the page of the stack. Therefore, no matter what kind of query is performed for a heap without any indexes, the server must scan the table once. Even if only one row is returned, the I/O count is the same, that is, the number of rows in the table.
The clustered index of a table and view has a row in sysindexes, and its indid is 1. The root column points to the top of the clustered index B tree. The server uses the B tree to search for data pages. SQL Server browses the clustered index to find the row corresponding to the clustered index key. To locate the key range, SQL Server browses the index to find the starting value of the range, and then scans the data page from the forward or backward pages. To locate the top of the data page chain, SQL server scans the index's root node along the leftmost pointer. Therefore, if a clustered index is used to search for data and only one row is returned, the IO number is the depth from the top of the B tree to the data row where the key value is located, which is abbreviated as D. If multiple rows are returned, the number of pages that meet the conditions must be added. The short note is P. The total I/O count is D + P.
A non-clustered index of a table or view also has a row in the sysindexes index. Its indid value ranges from 2 to 250, and the root column points to the top of the non-clustered index B tree. When SQL Server looks for data, the server first uses the same search method as the clustered index to find the row locator-bookmark of the index, and then uses the row locator to find the required data, bookmark lookup is used to search for data through the row locator. If the table where the index is located is a heap, SQL server uses the row pointer to locate the data. Therefore, in this case, the I/O number of the returned line is the depth D + 1 of the B tree until the row positioner is found. If multiple rows are returned, the number of I/o is D + the number of pages on the index page of all Mazu conditions P + the number of rows returned H. If the table where the index is located is clustered, then the SQL
Server uses the clustered index key to locate data. Therefore, in this case, the number of I/O Records returned for one row is the depth D + of B tree that finds the key of the clustered index. The IO number of multiple rows returned is D + P + H * D1.
When looking for data based on a non-clustered index, there is another situation, that is, if the data column is included in the index's key value, or include the key value of the index + the key value of the clustered index, bookup lookup will not occur, because the required data has been found when the index item is found, there is no need to go to the data row again. In this case, it is called index coverage.
Now let's take an example.
There is a table like this:
Employees (employeeid, employeename, sex, birthday, photofile, enterdate, provinceid, cityid, address, postcode, idcardno ). Here, employeeid is the primary key and a clustered index pk_employeeid is created based on him. Non-clustered index values, ix_birthday, ix_birthday, ix_enterdate, and ix_postcode are created on employeename, birthday, enterdate, ix_idcardno.
If we use such a statement to query:
Select * from employees where employeeid = 'c054965'
Select employeeid from employees where employeename = 'Liu yonghong'
Bookmark lookup will not occur. If the following statement is used, bookupmark lookup will occur:
Select sex from employees where employeename = 'Liu yonghong'
Let's take a look at the explanations in the books online.
"Bookmark lookup logical operators and physical operators use bookmarks (row ID or clustering key) to find corresponding rows in a table or clustered index ."
For the select sex from employees where employeename = 'Liu yonghong 'statement, the server first finds the row locator corresponding to "Liu yonghong" on the non-clustered index ix_employeename -- "c054965 ", then, based on the value in the clustered index pk_employeeid, find the data row corresponding to "c054965" and return the value of sex-"male. When we use select employeeid from employees where employeename = 'Liu yonghong ', because the employeeid is included in the key value of the clustered index pk_employeeid, No bookmark is required.
Lookup, but you can return it directly.
However, the Select sex from employees where employeename = 'Liu yonghong 'is different because sex is not included in the pk_employeeid key value or in the key value of employeename, therefore, you must perform further search based on the row positioner-"c054965.
If we remove the clustered index pk_employeeid, when the server executes select sex from employees where employeename = 'Liu yonghong, first, find the row locator corresponding to "Liu yonghong" on the non-clustered index ix_employeename -- pointer to the data row corresponding to employeename = 'Liu yonghong, then return the sex of the row-"male ".
Of course, if we execute select * from employees where sex = 'male', there will be no bookmark lookup, but a direct table scan, whether or not the table employees has a clustered index.
Here, we can draw some interesting conclusions:
When a clustered table is queried using a non-clustered index, its performance is lower than when a non-clustered index is used on the stack.
Query performance comparison:
A large number of rows are returned: index coverage> clustered index> table scan> non-clustered index of the stack> clustered non-clustered Index
Few rows returned: Index overwrite = clustered index> non-clustered index of the stack> clustered non-clustered index> table Scan
Therefore, understanding the storage structure of tables is of great significance for writing efficient queries and building efficient indexes.