First, the concept of the index
The purpose of the index: our data query and processing speed has become a measure of the success of the application system standards, and the use of indexes to speed up data processing speed is often the most commonly used optimization method.
What the index is: the index in the database is similar to a book's directory, and the use of a catalog in a book allows you to quickly find the information you want without having to read through the books. In a database, a database program can use an index to weigh the data in a table without having to scan the entire table. The catalog in the book is a word and a list of the page numbers where the words are, and the indexes in the database are the values in the table and a list of where each value is stored.
The pros and cons of the index: most of the overhead of query execution is I/O, and one of the primary goals of using indexes to fetch high performance is to avoid full table scans, because full table scans need to read every data page of a table from disk, and if an index points to data values, then the query needs to read only a few disks at a time. Therefore, reasonable use of the index can speed up the data query. However, indexes do not always improve the performance of the system, and indexed tables require more storage space in the database, as well as the time required to delete and remove data, and to maintain the index for longer processing time. So we should use the index reasonably, update the sub-optimal index in time.
second, the basic structure of the data table
When a new table is created, the system allocates a contiguous space of 8K in the disk, and when the value of the field is written to disk from memory, it is stored randomly in this given space, and when a 8K runs out, the database pointer automatically allocates a 8K of space. Here, each 8K space is called a data page (page), aka pages or data pages, and assigns a page number from 0-7, the No. 0 page of each file records the boot information, called the header (file header), and every 8 data pages (64K) are combined to form an expansion area (Extent), called an extension. The combination of all the data pages forms the heap (heap).
SQLS Specifies that rows cannot span data pages, so the maximum amount of data recorded per row is only 8K. That's why char and varchar are two types of string type that are limited to 8K, and storing more than 8K of data should use the text type, in fact, the field value of the text type cannot be entered and saved directly, it simply stores a pointer to an extension that consists of several 8K of text data pages , the real data is placed in these data pages.
The page has a space page and the data page points.
When the 8 data pages in an extent contain both a spatial page and a data or index page, called a mixed extension (Mixed Extent), each table starts with a mixed extension, and, conversely, is called a consistent extension (Uniform Extent), which stores data and index information specifically.
When the table is created, Sqls assigns it at least one data page in the hybrid extension, and as the amount of data grows, the SQLS can instantly allocate 7 pages in a hybrid extension, and when the data exceeds 8 pages, the data page is allocated from the consistent extension.
The space page is dedicated to the allocation and management of data spaces, including: PFS pages (page free space): Record whether a page is assigned, in a mixed or consistent extension, and how much space is available on the page; Gam page (Global allocation MAP) and Sgam page (secodary Global allocation map): Used to record the location of an idle extension or a mixed extension containing free pages. Sqls Comprehensive utilization of these three types of paging files creates a new space for the data table, if necessary;
A data page or index page is dedicated to saving data and index information, and SQLS uses 4 types of data pages to manage tables or indexes: IAM pages, data pages, text/image pages, and index pages.
In Windows, every step we perform on a file is only known to the system on the physical location on the disk, and SQL Server inherits this way of working, not only that each field value is randomly saved in the data page during data insertion, but that each data page The arrangement position in the heap is only known by the system.
What is this for? It is well known that the OS manages disk because the file allocation table is loaded first when the system starts: FAT (file Allocation table), which manages the filesystem and records all operations on the file, so that the system can function properly; As with the management system-level SQL SERVER, there is also a fat-like table exists, which is the index distribution image page: IAM (Index Allocation Map).
The presence of IAM makes sqls possible for the physical management of data tables.
An IAM page is assigned from a hybrid extension, records the location of the 8 initial pages, and the location of the extents, each IAM page can manage 512,000 data pages, and if the amount of data is too large, sqls can add more IAM pages that can be anywhere in the file. The first IAM page is called FirstIAM, which records the location of future IAM pages.
Data pages and text/image pages are mutually reversed, saving data of non-text/image types because they are not more than 8K in size, while the latter holds only text or image type data that is more than 8K in size. The index page, as its name implies, holds the data information related to the index structure. Understanding the issue of the page helps us to understand the next step of how Sqls maintains the index, such as page splits, fill factors, and so on.
three, page split
Half of the data will remain on the old page, and the other half will be placed on the new page, and the new page may be assigned to any available page. Therefore, frequent page splitting, the consequences are serious, will cause the physical table to produce a large number of data fragments, resulting in a sharp decline in the direct I/O efficiency, and finally, stop Sqls run and rebuild the index will be our only choice!
Iv. filling factor
An attribute of the index that defines the amount of free space on each page of the index. The FILLFACTOR (fill factor) adapts to the subsequent expansion of table data and reduces the likelihood of page splits. The fill factor is a percent value from 0 to 100, and when set to 100 means that the data page fills up. This setting is only used when no changes are made to the data, such as in a read-only table. The smaller the value, the greater the amount of free space on the data page, which reduces the need for page splitting during the index growth process, but requires more hard disk space to be consumed. Improper fill factor designation reduces the read performance of the database, which is inversely proportional to the fill factor setting value.
v. Classification of Indexes
There are multiple index types in SQL Server.
Differentiated by storage structure: Clustered index (also known as Cluster index, cluster index), clustered index (nonclustered index, non-clustered index)
Differentiated by Data uniqueness: "Unique index", "non-unique index"
The number of key columns is distinguished: "single column Index", "Multi-column index".
vi. Clustered Index
A clustered index is a sort of re-organization of actual data on disk to be sorted by a specified column or columns of values. Like the Chinese dictionary we used, is a clustered index, such as to look up "Zhang", we naturally turn to the back of the dictionary Baishi page. Then follow the alphabetical order to find out. Here, Microsoft's balanced binary tree algorithm, that is, the first to turn the book to about One-second of the position, if you want to find the page number is smaller than the number of pages, the book forward to One-fourth, otherwise, the book back to Three-fourths of the place, and so on, the pages continue to be divided into smaller parts, until the correct page number.
Because a clustered index is a sort of data, it is impossible to have multiple rows, so a table can only build one clustered index. Scientific statistics establishing such an index requires at least an additional space equivalent to that of table 120% to hold a copy of the table and an index intermediate page, but his performance is almost always faster than other indexes.
Because the data is physically arranged on the data page in the clustered index, the duplicate values are also sorted together, thus containing a range check (bentween,<,><=,>=) or a query using group by or order by, once the row for the first key value is found, The back will all be linked together, without further searching, avoiding a wide range of scans that can greatly improve query speed.
vii. Nonclustered Indexes
SQL Server creates an index that is not a clustered index by default, he does not reorganize the data in the table, but instead stores the index column values on each row and points to the page where the data resides. He looks like a Chinese dictionary based on the ' radicals ' to find the word, even if the data is not sorted, but he has a directory more like a directory, the efficiency of data extraction is also a room for improvement, without the need for a full table scan.
A table can have multiple nonclustered indexes, and each nonclustered index provides a different sort order based on the different indexed columns.
Recognize SQL Server indexes and differences between single-column and multicolumn indexes