Principles and Applications of SQL Server Indexes

Last Update:2018-06-01 Source: Internet

Author: User

Tags filegroup

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

The usage of index: Data Query and processing speed has become a benchmark for measuring the success or failure of the application system. Using indexes to speed up data processing is usually the most widely used optimization method. What is an index: the index in a database is similar to a directory in a book. Using a directory in a book can quickly find the information you want without having to read completely

Index concept

The purpose of indexing: Data Query and processing speed has become a benchmark for measuring the success or failure of the application system. Using indexes to speed up data processing is usually the most widely used optimization method.

What is an index: indexes in a database are similar to directories in a book. Using directories in a book can quickly find the information you want without reading the book. In a database, the database program can use indexes to duplicate the data in the table without scanning the entire table. The directory in the book is the page list of a word and each word, and the index in the database is the list of values in the table and the storage location of each value.

Advantages and disadvantages of indexes: Most of the overhead of query execution is I/O. One of the main objectives of improving performance with indexes is to avoid full table scanning, because full table scan needs to read every data page of the table from the disk, if there is an index pointing to the data value, the query only needs to read a few disks. Therefore, the rational use of indexes can accelerate data queries. However, indexing does not always improve the system performance. Indexed tables need to occupy more storage space in the database, the Command run time for adding or deleting data and the processing time for maintaining indexes will be longer. Therefore, we need to properly use indexes and promptly update and remove sub-optimal indexes.

Basic Structure of a data table

When a new table is created, the system allocates a continuous space in 8 K units to the disk. When the field value is written to the disk from the memory, the database pointer is automatically allocated with an 8 K space when an 8 K space is used up. Here, each 8 K space is called a data Page, also known as a Page or data Page, and is allocated with a Page number ranging from 0 to 7. The Guide information is recorded on the 0th Page of each file, it is called a File header. Each 8 Data Pages (64 K) combine to form an extended area (Extent. A Heap is formed by a combination of all data pages ).

SQLS specifies that the row cannot span data pages. Therefore, the maximum data volume recorded in each row can only be 8 KB. This is the reason why the capacity of the char and varchar string types must be limited to 8 KB. The text type should be used to store data that exceeds 8 KB. In fact, text Field Values cannot be directly input or saved. It only stores a pointer and points to an extended area consisting of several 8 K text data pages, the real data is placed on these data pages.

Pages can be divided into space pages and data pages.

When the eight data pages in an extended area contain both the spatial page and the data or index page, it is called the Mixed extension (Mixed Extent). Each table starts with a Mixed extension; on the contrary, it is called the Uniform extension (Uniform Extent), which stores data and index information.

When a table is created, SQLS allocates at least one data page to it in the hybrid expansion. As the data volume increases, SQLS can allocate seven pages in the hybrid expansion, when the data exceeds 8 pages, the data page is allocated from the consistent expansion.

The space Page is responsible for data space allocation and management, including the PFS Page (Page free space ): record whether a page has been allocated, is located in the hybrid expansion or consistent expansion, and the available space on the page. GAM page (Global allocation map) and SGAM page (Secodary global allocation map ): used to record the location of idle extensions or hybrid extensions that contain idle pages. SQLS uses these three types of page files to create new spaces for data tables when necessary;

The data page or index page stores data and index information. SQLS uses four types of data pages to manage tables or indexes: these are IAM pages, data pages, text/image pages, and index pages.

In WINDOWS, the physical location of each step of operations on the file is only known by the system. SQL SERVER follows this approach, in the process of inserting data, not only are the values of each field randomly stored in the data page, but also the location of each data page in the heap is only known by the system.

Why? As we all know, the reason why the OS can manage disks is that the File Allocation Table FAT (File Allocation Table) is first loaded when the system is started. It manages the File system and records all operations on the File, the system can run normally. Similarly, as a management system-level SQL SERVER, there is also such a FAT-like table, which is the Index distribution image page: IAM (Index Allocation Map ).

The existence of IAM makes it possible for SQLS to physically manage data tables.

The IAM page is allocated from the hybrid expansion and records the location of the eight initial pages and the location of the expansion area. Each IAM page can manage 512,000 data pages. If the data volume is too large, SQLS can also add more IAM pages, which can be located anywhere in the file. The first IAM page is called FirstIAM, which records the location of the subsequent IAM page.

Data Pages and text/image pages are opposite each other. The former stores non-text/image data because they both do not exceed 8 K capacity, the latter only saves text or image data that exceeds 8 KB. As the name suggests, the index page stores data information related to the index structure. Understanding the page helps us to understand how SQLS maintains indexes, such as page splitting and fill factor.

Split pages

Half of the data will be retained on the old page, while the other half will be placed on the new page, and the new page may be allocated to any available page. Therefore, frequent page splitting has serious consequences. This will cause a large amount of data fragments to the physical table, resulting in a sharp drop in I/O efficiency. Finally, stopping SQLS and re-indexing is our only choice!

Fill Factor

An index feature that defines the amount of available space on each page of the index. FILLFACTOR can be used to expand table data and reduce the possibility of page splitting. Fill factor is a percentage value from 0 to 100. If it is set to 100, the data page is filled. This setting is used only when no changes are made to the data (such as read-only tables. The smaller the value, the larger the free space on the data page. This reduces the need for page splitting during index growth. However, this operation requires more disk space. Improper filling factor setting reduces the database read performance. The reduction is inversely proportional to the filling factor setting value.

Index category

SQL SERVER has multiple index types.

Differentiation by storage structure: "clustered index (also called clustering index and Cluster Index)" and "clustered index (non-clustered index and non-clustered index )"

Data uniqueness: "unique index" and "non-unique index"

The number of key columns is different: "Single Column Index" and "Multi-column Index ".

Clustered Index

Clustered index sorts the actual data on a disk by one or more specified columns. Like the Chinese dictionary we use, it is a clustered index. For example, if we want to query "sheets", we will naturally go to the next hundred or ten pages of the dictionary. Then, it is searched out in alphabetical order. Microsoft's balanced binary tree algorithm is used here, that is, to first flip the book to about 1/2. If the page number to be searched is smaller than the page number on this page, it will flip the book forward to 1/4. Otherwise, turn the book back to 3/4, and so on, and further divide the page into smaller parts until the correct page number is displayed.

Because clustered indexes sort data, there cannot be multiple sorting methods, so one table can only create one clustered index. The establishment of such an index in scientific statistics requires at least 120% additional space for the table to store copies of the table and the intermediate page of the index, but its performance is almost always faster than other indexes.

In clustered indexes, data is physically arranged on the data page in order, and duplicate values are also arranged together. Therefore, the range check (bentween, <, ><=,> =) or when you use a group by or order by query, once the row with the first key value is found, it will be connected together and you do not need to perform further search, A wide range of scans can be avoided, greatly improving the query speed.

Non-clustered Index

The index created by sqlserver by default is a non-clustered index. Instead of reorganizing the data in the table, stores the index column values for each row and points them to the page where the data is located with a pointer. He looks for the words in the Chinese Dictionary Based on the 'radicals '. Even if the data is not sorted, his directory is more like a directory, the efficiency of data query is also improved without the need for full table scanning.

A table can have multiple non-clustered indexes. Each non-clustered Index provides different sorting orders based on different index columns.

Create an index

Syntax

CREATE [UNIQUE] [CLUSTERED| NONCLUSTERED ]INDEX index_name ON { table | view } ( column [ ASC | DESC ] [ ,...n ] )[with[PAD_INDEX][[,]FILLFACTOR=fillfactor][[,]IGNORE_DUP_KEY][[,]DROP_EXISTING][[,]STATISTICS_NORECOMPUTE][[,]SORT_IN_TEMPDB]][ ON filegroup ]

Parameters for creating an INDEX using the create index Command are described as follows:

UNIQUE: used to create a UNIQUE index for a table or view. Two rows with the same index value are not allowed.

CLUSTERED: used to specify the created index as a CLUSTERED index.

NONCLUSTERED: used to specify the created index as a non-clustered index.

Index_name: used to specify the name of the created index.

Table: the name of the table used to create the index.

View: Specifies the name of the view for which an index is created.

ASC | DESC: used to specify the ascending or descending order direction of a specific index column.

Column: used to specify the Column to be indexed.

PAD_INDEX: used to specify the open space for each page (node) at the intermediate level of the index.

FILLFACTOR = fillfactor: used to specify the percentage of data on each index page during index creation. The value of fillfactor is 1 to 100.

IGNORE_DUP_KEY: used to control the SQL Server response when duplicate data is inserted into a column that contains a unique clustered index.

DROP_EXISTING: used to specify whether to delete and recreate the named existing clustered index or non-clustered index.

STATISTICS_NORECOMPUTE: used to specify the expired index statistics and will not be automatically recalculated.

SORT_IN_TEMPDB: used to specify the intermediate sorting results when an index is created. The intermediate sorting results are stored in the tempdb database.

ON filegroup: Specifies the file group for storing indexes.

Example:

-- Create a non-clustered index named idx_mobiel in the bigdata table, and the index field is mobielcreate index idx_mobielon bigdata (mobiel). -- create a unique clustered index named idx_id in the bigdata table, the index field is id. duplicate values must be ignored when data is inserted in batches. Statistical information is not recalculated. the fill factor is 40 create unique clustered index idx_idon bigdata (id) with pad_index, fillfactor = 40, ignore_dup_key, statistics_norecompute

Manage Indexes

Exec sp_helpindex BigData -- view the index definition Exec sp_rename 'bigdata. idx_mobiel ', 'idx _ big_mobiel' -- change the index name from 'idx _ El 'to 'idx _ big_mobiel' and drop index BigData. idx_big_mobiel -- delete idx_big_mobiel index dbcc showcontig (bigdata, idx_mobiel) in the bigdata table -- check the fragmentation information dbcc indexdefrag (Test, bigdata, idx_mobiel) in the index idx_mobiel of the bigdata table) -- sort out the fragmentation update statistics bigdata on the index idx_mobiel of the bigdata table in the test database -- update the statistics of all the indexes in the bigdata table.

Index Design Principles

For a table, whether or not the index has been created or not depends on the where clause and Join expression.

The indexing principles include:

Generally, the system automatically creates a clustered index for the progressive field.
Clustering indexes should be created for columns with a large number of duplicate values and with frequent range queries and sorting and grouping, or frequently accessed columns.
To create an index in a table that is frequently inserted, use fillfactor to reduce page split and improve concurrency to reduce the occurrence of deadlocks. If the table is read-only, the fill factor can be set to 100.
When selecting the index key, use a column of the small data type as the key to make each index page contain as many index keys and pointers as possible. In this way, you can minimize the number of index pages that must be traversed by a query. In addition, try to use an integer as the key value, because the integer access speed is the fastest.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More