Introduction to SQL Server indexing: SQL Server index-level ladder
David Durant, 2014/11/05 (first published: 2011/02/17)
The series
This article is part of the Stair series: SQL Server index Ladder
Indexes are the basis of database design and tell developers to use the database to understand the designer's intentions very well. Unfortunately, when performance problems arise, indexes are often added to the post-mortem. This concludes with a simple series of articles that should enable any database professional to quickly "keep up" with their pace
The first level introduces SQL Server indexing: Enables SQL Server to find and/or modify database objects for request data in the least amount of time, using minimal system resources to achieve maximum performance. A good index also allows the SQL Server to achieve maximum concurrency, so queries run by one user have little impact on queries run by other users. Finally, the index provides an efficient way to enforce data integrity, guaranteeing the uniqueness of key values when creating a unique index. This level is an introduction; it covers concepts and usages, but leaves physical details at a later level.
For database developers, an in-depth understanding of the index is important for more than one reason: When a SQL Server request arrives from the client, SQL Server has only two possible ways to access the requested row:
It can scan each row in the table that contains the data, starting with the first line, and checking each row to see if it meets the request criteria, until the last line.
Or, if a useful index is available, it can use the index to locate the requested data.
The first option is always available to SQL Server. If you have instructed SQL Server to create a useful index, then the second option is available only, but it can lead to significant performance improvements that we will demonstrate later on at this level.
Because indexes have overhead associated with them (they occupy space, they must be kept in sync with tables), they do not require SQL Server. It is possible to have a database that is completely non-indexed. It may behave poorly and it will certainly have data integrity issues, but SQL Server will allow it.
However, this is not what we want. We all want a well-performing database with data integrity, while keeping the index overhead to a minimum. This level will enable us to move towards this goal.
Sample Database
Throughout the staircase, we will use examples to illustrate the key concepts. These examples are based on the Microsoft AdventureWorks sample database. Our focus is on the functionality of the sales order. 5 tables will provide a good mix of transactional and non-transactional data, customers, sales people, products, sales orders, and sales details. To stay focused, we used a subset of the columns.
AdventureWorks is standardized, so sales personnel information is broken down into three tables, salespeople, employees, and contacts. For some examples, we treat them as a single table. We will use the complete set of tables as well as the relationship between them as shown in 1.1.
Figure 1.1: AdventureWorks table to be used on this staircase
Note:
All TSQL code displayed at this step level can be downloaded along with the article (see the link at the bottom of this article)
What is an index?
We start with a short story of our index study, which uses an old, but proven technique, which we will refer to in this article to introduce the basic concepts of indexing.
You leave home to do the work. When you come back, you'll get a message from your daughter's softball coach. Three girls, Tracy, Rebecca and Amy have lost their team hats. Could you turn around at the sporting goods store and buy a hat for the girls? Their parents will reimburse you in the next game.
You know those girls, and you know their parents. But you don't know the size of their hats. There are three homes somewhere in your town, each containing the information you need. No problem, just call your parents and take out the size of your hat. You hold your phone and reach for an index-the white page of your phone book.
The first place you need to reach is Helen Mayer. It is estimated that "Meyer" will be in the middle of the population, you will jump to the middle of the white page, only to find you on the page, the title says "Kline-koerber". You jump forward and reach the "Nagle-nyeong" page. On the "Maldonado-nagle" page, you can see a smaller jump. When you realize that you're on the right page right now, you'll go down the page until you reach the "Meyer,helen" line and get the phone number. Using the phone number, you can reach Meyer residence and get the information you need.
You repeat this process two times to reach the other two residences, and then get two hat sizes.
You have just used an index and are using it the same way that SQL Server uses indexes, because there are a lot of similarities and differences between white pages and SQL Server indexes.
In fact, the index you just used represents the two SQL Server index types supported by SQL Server: clustered and non-clustered. White Pages best represent the concept of nonclustered indexes. Therefore, at this level, we introduce a nonclustered index. The subsequent levels are introduced into the cluster index, and the two types are dug in depth.
Nonclustered indexes
White pages are similar to nonclustered indexes because they are not an organization of the data itself, but rather a mechanism, or a map, to help you get the data. The data itself is the person we need to contact. The telephone company did not make the town's house a meaningful order, moving the house from one place to another so that all the girls lived next to the same softball team, and the house was not organized by the surname of the inhabitants. Instead, it will give you a book with the entrance to each house. These entries are sorted by the search key for the white page, the last name, first name, middle name, and street address. Each entry contains a search key and a block of data that allows you to access the residence;
As with white pages, each entry in the SQL Server nonclustered index consists of two parts:
Search key, such as last name-first name-middle name. In SQL Server terminology, this is the index key.
Bookmarks with the same purpose as phone numbers, allowing SQL Server to navigate directly to the row in the table corresponding to that index entry.
Additionally, SQL Server non-clustered index entries have some internal header information that may contain optional information. Both of these methods will be discussed at a later level, and at this point, the understanding of nonclustered indexes is not very important.
As with white pages, a SQL Server index is maintained in the search key sequence to access any particular entry in a small set of "jumps." Given a search key, SQL Server can quickly get the index entry for that key. Unlike white pages, SQL Server indexes are dynamic. That is, each time you add a row, delete, or modify a search key column value, the SQL Server updates the index.
As in white pages, the order of entries differs from the geographical order in towns, and the sequence of entries in a nonclustered index is not the same as the sequence of rows in a table. The first entry in the index might be the last row in the table, and the second entry in the index might be the first row in the table. If the fact is not like an index, its entries always have a meaningful sequence; the rows of a table can be completely unordered.
When you create an index, SQL Server generates and maintains an entry in the index of each row in the underlying table (the exception to this general rule is encountered at a later level when we overwrite the filtered index). You can create more than one nonclustered index on a table, but you cannot have an index that contains data from multiple tables.
The biggest difference is that SQL Server cannot use the phone. It must use the information in the bookmark section of the index entry to navigate to the corresponding row in the table. This is required whenever SQL Server needs any information in the data row, rather than the corresponding index entry, such as the size of the softball cap for Tracy Meyer. So, for a better analogy, a white page entry contains a set of GPS coordinates instead of a phone number. Then use GPS coordinates to navigate to the domicile represented by the white page entry.
Create and benefit from nonclustered indexes
We end this level by querying the sample database. Make sure that you are using the AdventureWorks version of SQL Server 2005, which can be used by SQL Server 2008. The AdventureWorks2008 database has a different table structure, and the following query will fail. We run the same query every time, but before we create an index on the table, the first execution will occur and the second execution will be after we create an index. Each time, SQL Server tells us how much work has been done to retrieve the requested information. We will look for the "Helen Meyer" row in our contact form (her row is in the middle of the table). Initially, the table has no index on either the FirstName column or the LastName column. To ensure that you can run this example multiple times, make sure that we will not present the index in the third batch, by running the following code:
EXISTS (sys.indexesobject_id(' person.contact 'FullName ') Person.Contact.FullName
Listing 1.1-ensuring The index does not exist
Our task would require four SQL command batches.
The first command batch:
OnGO
Listing 1.2-turning on statistics
The above batch informs SQL Server, we want our queries to return performance information as part of the output.
The second command batch:
* person.contact ' Helen ' Meyer '; GO
Listing 1.3-Retrieving some data
The second batch retrieves the "Helen Mayer" line:
584. Helen Mayer [email protected] risk work site 0-519-555-0112
Add the following performance information:
Table "Contact". Scan count 1, logical read 569.
Execution time: CPU time = 3 milliseconds.
This output tells us that our request executes 569 logical iOS and requires about 3 milliseconds of processor time. The value of the processor time may be different.
The third batch of commands:
FullName person.contact LastName); GO
Listing 1.4-Creating a nonclustered index
This batch creates a nonclustered composite index on the first and last name columns of the Contact table. A composite index is an index that contains multiple columns to determine the index row sequence.
Fourth batch of commands:
select * from person.contact where firstname = and LastName = ' Meyer ' ; GO
Listing 1.3 (once again)
This last batch is the re-execution of our original SELECT statement. We get the same line as before; but this time the performance statistics are different
Table "Contact". Scan count 1, logical read 4.
SQL Server execution time: CPU time = 0 Ms.
This output tells us that our request only requires 4 logical iOS, and requires an immeasurable amount of processor time to retrieve the "Helen Meyer" line.
Conclusion
Creating a good index can greatly improve database performance. On the next level, we'll start studying the physical structure of the index. We will look at why this nonclustered index is so beneficial to this query and why it may not always be the case. Future levels will include other types of indexes, additional benefits for indexes, index-related costs, monitoring and maintenance indexes, and best practices, all of which are intended to provide you with the knowledge necessary to create the best indexing scheme for tables in your own database.
Translation: SQL Server index-level ladder