Recognize SQL Server indexes and differences between single-column and multicolumn indexes

Last Update:2014-12-03 Source: Internet

Author: User

Tags filegroup

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

first, the concept of the index

The purpose of the index: our data query and processing speed has become a measure of the success of the application system standards, and the use of indexes to speed up data processing speed is often the most commonly used optimization method.

What the index is: the index in the database is similar to a book's directory, and the use of a catalog in a book allows you to quickly find the information you want without having to read through the books. In a database, a database program can use an index to weigh the data in a table without having to scan the entire table. The catalog in the book is a word and a list of the page numbers where the words are, and the indexes in the database are the values in the table and a list of where each value is stored.

The pros and cons of the index: most of the overhead of query execution is I/O, and one of the primary goals of using indexes to fetch high performance is to avoid full table scans, because full table scans need to read every data page of a table from disk, and if an index points to data values, then the query needs to read only a few disks at a time. Therefore, reasonable use of the index can speed up the data query. However, indexes do not always improve the performance of the system, and indexed tables require more storage space in the database, as well as the time required to delete and remove data, and to maintain the index for longer processing time. So we should use the index reasonably, update the sub-optimal index in time.

second, the basic structure of the data table

When a new table is created, the system allocates a contiguous space of 8K in the disk, and when the value of the field is written to disk from memory, it is stored randomly in this given space, and when a 8K runs out, the database pointer automatically allocates a 8K of space. Here, each 8K space is called a data page (page), aka pages or data pages, and assigns a page number from 0-7, the No. 0 page of each file records the boot information, called the header (file header), and every 8 data pages (64K) are combined to form an expansion area (Extent), called an extension. The combination of all the data pages forms the heap (heap).

SQLS Specifies that rows cannot span data pages, so the maximum amount of data recorded per row is only 8K. That's why char and varchar are two types of string type that are limited to 8K, and storing more than 8K of data should use the text type, in fact, the field value of the text type cannot be entered and saved directly, it simply stores a pointer to an extension that consists of several 8K of text data pages , the real data is placed in these data pages.

The page has a space page and the data page points.

When the 8 data pages in an extent contain both a spatial page and a data or index page, called a mixed extension (Mixed Extent), each table starts with a mixed extension, and, conversely, is called a consistent extension (Uniform Extent), which stores data and index information specifically.

When the table is created, Sqls assigns it at least one data page in the hybrid extension, and as the amount of data grows, the SQLS can instantly allocate 7 pages in a hybrid extension, and when the data exceeds 8 pages, the data page is allocated from the consistent extension.

The space page is dedicated to the allocation and management of data spaces, including: PFS pages (page free space): Record whether a page is assigned, in a mixed or consistent extension, and how much space is available on the page; Gam page (Global allocation MAP) and Sgam page (secodary Global allocation map): Used to record the location of an idle extension or a mixed extension containing free pages. Sqls Comprehensive utilization of these three types of paging files creates a new space for the data table, if necessary;

A data page or index page is dedicated to saving data and index information, and SQLS uses 4 types of data pages to manage tables or indexes: IAM pages, data pages, text/image pages, and index pages.

In Windows, every step we perform on a file is only known to the system on the physical location on the disk, and SQL Server inherits this way of working, not only that each field value is randomly saved in the data page during data insertion, but that each data page The arrangement position in the heap is only known by the system.

What is this for? It is well known that the OS manages disk because the file allocation table is loaded first when the system starts: FAT (file Allocation table), which manages the filesystem and records all operations on the file, so that the system can function properly; As with the management system-level SQL SERVER, there is also a fat-like table exists, which is the index distribution image page: IAM (Index Allocation Map).

The presence of IAM makes sqls possible for the physical management of data tables.

An IAM page is assigned from a hybrid extension, records the location of the 8 initial pages, and the location of the extents, each IAM page can manage 512,000 data pages, and if the amount of data is too large, sqls can add more IAM pages that can be anywhere in the file. The first IAM page is called FirstIAM, which records the location of future IAM pages.

Data pages and text/image pages are mutually reversed, saving data of non-text/image types because they are not more than 8K in size, while the latter holds only text or image type data that is more than 8K in size. The index page, as its name implies, holds the data information related to the index structure. Understanding the issue of the page helps us to understand the next step of how Sqls maintains the index, such as page splits, fill factors, and so on. three, page split

Half of the data will remain on the old page, and the other half will be placed on the new page, and the new page may be assigned to any available page. Therefore, frequent page splitting, the consequences are serious, will cause the physical table to produce a large number of data fragments, resulting in a sharp decline in the direct I/O efficiency, and finally, stop Sqls run and rebuild the index will be our only choice!

Iv. filling factor

An attribute of the index that defines the amount of free space on each page of the index. The FILLFACTOR (fill factor) adapts to the subsequent expansion of table data and reduces the likelihood of page splits. The fill factor is a percent value from 0 to 100, and when set to 100 means that the data page fills up. This setting is only used when no changes are made to the data, such as in a read-only table. The smaller the value, the greater the amount of free space on the data page, which reduces the need for page splitting during the index growth process, but requires more hard disk space to be consumed. Improper fill factor designation reduces the read performance of the database, which is inversely proportional to the fill factor setting value.

v. Classification of Indexes

There are multiple index types in SQL Server.

Differentiated by storage structure: Clustered index (also known as Cluster index, cluster index), clustered index (nonclustered index, non-clustered index)

Differentiated by Data uniqueness: "Unique index", "non-unique index"

The number of key columns is distinguished: "single column Index", "Multi-column index".

vi. Clustered Index

A clustered index is a sort of re-organization of actual data on disk to be sorted by a specified column or columns of values. Like the Chinese dictionary we used, is a clustered index, such as to look up "Zhang", we naturally turn to the back of the dictionary Baishi page. Then follow the alphabetical order to find out. Here, Microsoft's balanced binary tree algorithm, that is, the first to turn the book to about One-second of the position, if you want to find the page number is smaller than the number of pages, the book forward to One-fourth, otherwise, the book back to Three-fourths of the place, and so on, the pages continue to be divided into smaller parts, until the correct page number.

Because a clustered index is a sort of data, it is impossible to have multiple rows, so a table can only build one clustered index. Scientific statistics establishing such an index requires at least an additional space equivalent to that of table 120% to hold a copy of the table and an index intermediate page, but his performance is almost always faster than other indexes.

Because the data is physically arranged on the data page in the clustered index, the duplicate values are also sorted together, thus containing a range check (bentween,<,><=,>=) or a query using group by or order by, once the row for the first key value is found, The back will all be linked together, without further searching, avoiding a wide range of scans that can greatly improve query speed.

vii. Nonclustered Indexes

SQL Server creates an index that is not a clustered index by default, he does not reorganize the data in the table, but instead stores the index column values on each row and points to the page where the data resides. He looks like a Chinese dictionary based on the ' radicals ' to find the word, even if the data is not sorted, but he has a directory more like a directory, the efficiency of data extraction is also a room for improvement, without the need for a full table scan.

A table can have multiple nonclustered indexes, and each nonclustered index provides a different sort order based on the different indexed columns.

viii. use of indexes

1, the creation of the index

The syntax for creating an index in SQL Server is as follows:

CREATE [UNIQUE] [clustered| Nonclustered]INDEXIndex_name on{Table | View} (column [ASC | DESC] [,... N] )[With[pad_index][[,]FILLFACTOR=FillFactor][[,]Ignore_dup_key][[,]Drop_existing][[,]Statistics_norecompute][[,]Sort_in_tempdb]][On filegroup]

The CREATE INDEX command creates indexes for each parameter description as follows:

Unique: Used to specify that a unique index is created for a table or view, that is, two rows with the same index value are not allowed.

CLUSTERED: Used to specify that the index created is a clustered index.

Nonclustered: Used to specify that the index created is a nonclustered index.

Index_name: Used to specify the name of the index being created.

Table: Specifies the name of the table on which the index is created.

View: Specifies the name of the view that created the index.

asc| DESC: Used to specify an ascending or descending sort direction for a specific index column.

Column: Used to specify the columns to be indexed.

PAD_INDEX: Used to specify the space that remains open on each page (node) in the intermediate level of the index.

FILLFACTOR = FILLFACTOR: Used to specify that the data for each index page is a percentage of the index page size when the index is created, and the value of FILLFACTOR is 1 to 100.

Ignore_dup_key: Used to control the response of SQL Server when inserting duplicate data into a column that is contained in a unique clustered index.

Drop_existing: Used to specify that a named pre-existing clustered index or nonclustered index should be dropped and recreated.

Statistics_norecompute: Index statistics used to specify expiration are not automatically recalculated.

SORT_IN_TEMPDB: Used to specify that intermediate sorting results are stored in the TEMPDB database when the index is created.

ON filegroup: Used to specify the filegroup to hold the index.

2, the management of the index

ExecSp_helpindex Bigdata--View Index DefinitionsExecsp_rename'Bigdata.idx_mobiel','Idx_big_mobiel'  --Change index name from ' Idx_mobiel ' to ' Idx_big_mobiel 'Drop IndexBigdata.idx_big_mobiel--Remove the Idx_big_mobiel index from the Bigdata tableDBCCShowcontig (Bigdata,idx_mobiel)--Check the fragmentation information for index Idx_mobiel in the Bigdata tableDBCCIndexdefrag (Test,bigdata,idx_mobiel)--Defragment the index Idx_mobiel on the Bigdata table in the test databaseUpdate StatisticsBigdata--UPDATE STATISTICS for all indexes in the Bigdata table

3, the design principles of the index

For a table, it depends on the WHERE clause and the join expression to index the index.

The principles of indexing generally include the following:

The system typically automatically builds a clustered index for the gradual field.
Consider establishing a clustered index with a large number of duplicate values and frequently scoped queries and sorting, grouped columns, or frequently accessed columns.
In a table that often inserts operations, you should use FILLFACTOR (fill factor) to reduce page splits, while increasing the concurrency to reduce the occurrence of deadlocks. If the table is read-only, the fill factor can be set to 100.
When selecting an index key, use columns of the small data type as the key so that each index page can hold as many index keys and pointers as possible, which minimizes the index pages that a query must traverse, and, as much as possible, uses integers as key values because the integer accesses the fastest.

ix. Comparison of single-row and multi-column indexes

In SQL Server, indexes are divided into two, one-column indexes and one multi-column index. Multi-column index creation methods are similar to single-column indexes, but have multiple columns.

We created the Ix_stuid_stuname Multi-column index, which contains two columns of Stuid and stuname in the table.

Suppose there is such a people table:

CREATE TABLEpeople (PeopleidINT  not NULL, FirstNameNVARCHAR( -) not NULL, LastNameNVARCHAR( -) not NULL, AgeINT  not NULL,    PRIMARY KEY(Peopleid));

Here is the data we inserted into this people table:
The table contains four people named "Mikes" (two of them Sullivans, two surname McConnells), two are 17 years old, and Joe Smith, who has a distinctive name. The primary purpose of this table is to return the corresponding Peopleid based on the specified user name, first name, and age.

For example, we may need to find the Peopleid,sql statement for a user with the name Mikesullivan, age 17 years

 select  peopleid from  people where  firstname=   " mike   " and  lastname=   " sullivan   '

First, we can consider creating an index on a single column, such as FirstName, LastName, or the Age column. If we create an index of the FirstName column, the database will quickly limit the search to the records of those firstname= ' Mike ' through this index, and then search for other conditions on this "intermediate result set": It first excludes those LastName that are not equal to " Sullivan "Record, and then exclude records whose age is not equal to 17. The database returns the final search results when all the search criteria have been met for the record.
Because of the FirstName column index, MySQL is much more efficient than performing a full scan of the table, but we require that the number of logs scanned by MySQL still far exceeds what is actually needed. Although we can delete the index on the FirstName column, and then create an index of the LastName or age column, it seems that the efficiency of creating an index search is still similar, regardless of which column.
To improve search efficiency, we need to consider using multi-column indexes. If you create a multi-column index for the three columns of FirstName, LastName, and age, the database can find the correct results with a single retrieval!
　　So, if you create a single-column index on the three columns of FirstName, LastName, and age, will the effect be the same as creating a multicolumn index of FirstName, LastName, and age?

　　The answer is no, the two are totally different. When we execute the query, MySQL can use only one index. If you have three single-column indexes, MySQL will try to select one of the most restrictive indexes. However, even the most restrictive single-column index is limited in its ability to be significantly less than a multicolumn index on the three columns of FirstName, LastName, and age.

A multi-column index has another advantage, which is manifested by the concept of the leftmost prefix (leftmost prefixing). Continuing to consider the previous example, we now have a multi-column index on the FirstName, LastName, and age columns, which we call the index fname_lname_age. When the search condition is a combination of the following columns, the database uses the Fname_lname_age index:
Firstname,lastname,age
Firstname,lastname
FirstName
On the other hand, it is equivalent to the index we created (Firstname,lastname,age), (Firstname,lastname), and (FirstName) on these column combinations. The following queries all have the ability to use this Fname_lname_age index:

SELECTPeopleid fromPeopleWHEREFirstName='Mike'  andLastName='Sullivan'  andAge=' -'; SELECTPeopleid fromPeopleWHEREFirstName='Mike'  andLastName='Sullivan'; SELECTPeopleid fromPeopleWHEREFirstName='Mike';

The following queries are not able to use this Fname_lname_age index:

SELECTPeopleid fromPeopleWhereLastName='Sullivan'; SELECTPeopleid fromPeopleWhereAge=' -'; SELECTPeopleid fromPeopleWhereLastName='Sullivan'  andAge=' -';

Disclaimer: Most of this article comes from http://www.cnblogs.com/knowledgesea/p/3672099.html, single-row and multi-column index introductions from the network.

Recognize SQL Server indexes and differences between single-column and multicolumn indexes

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More