SQL Server Topic One: Index (top)

Last Update:2015-09-16 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

written in front of the crap : The index problem has been a cliché, although it is often said, but as I have not used the index (very scary bar), I as a MS-BI implementation engineer incredibly useless index, speak naturally not emboldened. Prior to the understanding of the index to stay in the "know" point of view, with the passage of time feel not really to apply the index is simply slag! End of the crap, start indexing the topic, in order to thoroughly (I am not a DBA, can use the index to understand proficiency, so do not thoroughly) understand the index, learn from the data page! This article is the main source of MS online documentation!

Page and District system structure

Page: The basic unit of data storage in SQL Server is the page. The disk space allocated for data files in the database can be logically divided into pages (numbered from 0 to n consecutively). The I/O operations on the disk are performed at the page level. This means that SQL Server reads or writes all data pages.

How to understand it? As a storage unit, here's an inappropriate metaphor: a pig farm with many pigsty, each pigsty as a page for storing data. As a breeder, I suppose I'm buying and selling pig pigs, and I'm doing the pig right, but actually we're buying and selling on a pigsty, first of all, I'll number the pigsty, and then I'll buy the pigs in different numbers. So it could be understood that I kept the pig (data) in a pigsty (page) with different numbers (logically divided into 0-to-n sequential numbers), and I traded (I/O operations) The pigs, all from each pigsty.

The size of the page is 8K. The beginning of each page is a 96-byte header that stores information about the page. This information includes the page number, the page type, the available space for the page, and the allocation unit ID of the object that owns the page.

This sentence good understanding: Each pigsty can raise a few pigs (8k), each pigsty has a pigsty card, pigsty card write something? Of course to write: The number of pigsty, is the male pigsty, female pigsty, or Piggy Circle, has raised a few pigs and the head of the pigsty contact.

The following is a list of the types of data pages in MSSQL (male pigsty, female pigsty, or Piggy circle):

Page type	Content
Data	When the text in row is set to ON, contains data rows for all data except text, ntext, image, nvarchar (max), varchar (max), varbinary (max), and XML data.
Index	The index entry.
Text/image	Large Object Data type: · Text, ntext, image, nvarchar (max), varchar (max), varbinary (max), and XML data. A variable-length data type column when the data row exceeds 8 KB: · varchar, nvarchar, varbinary and sql_variant
Global Allocation map, Shared Global Allocation map	Information about whether the zone is assigned.
Page Free Space	Information about page assignments and the available space for the page.
Index Allocation Map	Information about the extents used by the table or index in each allocation unit.
Bulk Changed Map	Information about the extents modified by bulk operations in each allocation unit since the last BACKUP LOG statement.
Differential Changed Map	Information about the extents in each allocation unit that were changed since the last BACKUP DATABASE statement.

Come down and look at the pigsty:

The above is a sample diagram of a page: note that on the data page, the data row is placed in order immediately after the header, the end of the page is the row offset table, for each row in the page, each row offset table contains an entry, each entry records the distance of the first byte of the corresponding row from the top of the page (the line offset has not Assuming that each pig does not move in the pigsty, each pig is at the distance from the Pigsty gate is the offset, of course, this distance-the line offset is also like a pigsty card is recorded on another "brand", but it does not follow the order of 123 records, is flashbacks). The order of the entries in the row offset table is the opposite of the order of the Rows in the page (as for why it is the opposite order, I guess from the data structure, and so on a deep point of view, for the moment to write this line offset pig brand, standing in the end of the pigsty, first saw the last pig, so inverted record, not quasi- ）。

At this point, this pig sty seems to be able to raise pigs, but, an extreme situation? This pig, mutated, pig leg super big, a pigsty can't keep what to do! Attention, attention, look at Ms How to do: line can not cross the page, but the line of the part may move the page, so the actual line may be very large (see a pig without legs, do not abandon it small, the leg is thicker than the elephant!) ）。 Yes, cut off the legs of the pig and put it alone ... Here's how MS is done: the maximum amount of data and overhead in a single row of a page is 8060 bytes (8K), but does not include data stored with the Text/image page type (why not include these types of pages?). The reason is simple, some pig pens are specially designed to store super-large pigs, is the whole pig, not a pig leg. ), tables that include varchar, nvarchar,varbinary, or sql_variant columns are not constrained by this restriction. When the total size of all fixed and variable columns in a table exceeds the limit of 8k, SQL Server dynamically moves one or more variable-length columns to the page in the Row_overflow_data allocation unit, starting from the largest column. (Readme: Move a column in a table to a different page). This will be done whenever the insert or update operation increases the total size of the row to a limit of 8,060 bytes. When you move a column to a page in a Row_overflow_data allocation unit, a 24-byte pointer is maintained on the original page in the In_row_data allocation unit. If subsequent operations reduce the size of the rows, SQL Server dynamically moves the columns back to the original data page.

District: A district is the basic unit of management space. is a collection of eight physically contiguous pages that are used to manage pages efficiently, and that all pages exist in the zone. To make space allocations more efficient, SQL Server does not differentiate all of the tables that contain a small amount of data. This is a good understanding: 8 Pigsty is a big pigsty. One or two pigs, MS is not going to put it in a big pigsty. SQL Server has two types of zones: the uniform zone, which is owned by a single object. All 8 pages in the zone can only be used by the owning object. Mixed area, can be shared by up to eight objects. Each page of page eight in the zone can be owned by a different object. The uniform area is equivalent to a pig in a pigsty from the same farmer. The mixed area is the equivalent of pigs from different farmers.

Admin area allocation and free space
Most of the distribution information is not chained together. This simplifies the maintenance of distribution information. You can quickly perform allocation or deallocation of each page. This reduces contention between concurrent tasks that require the allocation of pages or the release of pages.

Allocation of the Management area SQL Server uses two types of allocation mapping tables to record the allocation of extents (below is a good understanding of how a pig farm manages a large number of pigsty, and is sure to try to record useful information to use the pigsty properly):

Global Allocation Mapping Table (GAM): GAM page records the allocated extents. Each GAM contains 64,000 extents, which is equivalent to nearly 4 GB of data. GAM uses a bit to represent the state of each area within the covered range. If the bit is 1, the zone is available, and if the bit is 0, the extents are assigned.

Shared Global Allocation Mapping table (SGAM): SGAM page records the extents that are currently used as mixed extents and have at least one unused page. Each SGAM contains 64,000 extents, which is equivalent to nearly 4 GB of data. SGAM uses a bit to represent the state of each area within the covered interval. If the bit is 1, the area is being used as a blending area and there are pages available. If 0, the zone is not used as a blending zone, or although it is used as a mixed area but all of its pages are in use.

The Trace free space page free space (PFS) page records the allocation status of each page, whether a single page has been assigned and the amount of free space per page. PFS has a byte for each page that records whether the page has been allocated. If assigned, the page is logged as empty, full 1% to 50%, full 51% to 80%, full 81% to 95%, or full 96% to 100%. After the assigned object is differentiated, the database engine uses the PFS page to record which pages in the zone are allocated or which pages are available. This information is used when the database engine must allocate a new page. The amount of free space in a reserved page is used only for heaps and text/image pages. The database engine must find a page with free space to hold the newly inserted row, using this information. The index does not require tracking free space on the page, because the point at which the new row is inserted is set by the index key value.

In a data file, the PFS page is the first page after the header page of the file (page 1). Next is the GAM page (page 2), followed by the SGAM page (Page 3). The first PFS page is followed by a PFS page that is approximately 8,000 pages in size. After the first GAM page on page 2nd there is another GAM page (with 64,000 extents) and another SGAM page (with 64,000 extents) after the first SGAM page on page 3rd. Displays the page order used by the database engine for allocation and management.

The following paragraphs are difficult to understand: my understanding, estimation is wrong!

(IAM: is used to record which pages in a zone come from the same object, such as pigs in a unified area are coming to a farmer, and that data is recorded in an IAM page.) ）

The Index allocation Mapping (IAM) page maps the extents in the 4 GB portion of the database file used by the allocation unit. There are three types of allocation units:

The in_row_data is used to store heap partitions or index partitions.

Lob_data is used to store large object (LOB) data types, such as XML, varbinary (max), and varchar (max).

The row_overflow_data is used to store variable-length data stored in a varchar, nvarchar, varbinary, or sql_variant column that exceeds the 8,060-byte row size limit.

Track modified extents
SQL Server uses two internal data structures to track the extents modified by the bulk copy operation and the extents that have been modified since the last full backup. These data structures greatly speed up differential backups. These data structures can also speed up the logging of bulk copy operations to the log when the database uses the bulk-logged recovery model. The same as the global Allocation Diagram (GAM) and shared Global allocation diagram (SGAM) pages, these are bitmaps, each of which represents a separate area.
Differential Change mapping table (DCM)

This allows you to track the extents that have changed since the last BACKUP DATABASE statement was executed. If the extents are bits 1, the extents have been modified since the last BACKUP DATABASE statement was executed. If the bit is 0, the extents are not modified. A differential backup reads only the DCM pages to determine which extents have been modified. This greatly reduces the number of pages that a differential backup must scan. The time required to run a differential backup is proportional to the number of extents modified since the last backup DATABASE statement, rather than to the size of the entire databases.

Large Capacity Change mapping table (BCM)
Tracks the extents modified by bulk-logged operations since the last BACKUP LOG statement was executed. If the bit of an extent is 1, it indicates that the extents have been modified by a logging bulk copy operation since the last BACKUP log statement was executed. If the bit is 0, the extents are not modified by the bulk copy operation with logging. Although BCM pages are displayed in all databases, the BCM pages are only relevant when the database uses the bulk-logged recovery model. In this recovery model, when you execute backup LOG, the backup process scans the BCM to find the area that has been modified. Those zones are then included in the log backup. Bulk-Logged operations can be resumed if the database is recovered from a database backup and a series of transaction log backups. In databases that use the simple recovery model, BCM pages are irrelevant because bulk-logged operations are not logged. In databases that use the full recovery model, BCM pages are also irrelevant because the recovery model treats bulk-logged operations as having full logging operations. The interval between the DCM page and the BCM page is 64,000 extents from the same interval as the GAM and Sgam pages. In physical files, the DCM and BCM pages are located after the GAM and SGAM pages.

Write some of the accompanying knowledge below!

File and Filegroup architecture

SQL Server maps the database to a set of operating system files. data and log information will never be mixed in the same file , and a file is used by only one database. Filegroups are named collections of files that are used to help with data layout and management tasks.

There are three types of files in the SQL Server database:

Master data file: Is the starting point of the database, pointing to other files in the database. Each database has a master data file (. Mdf

Secondary data files: Other than the primary data file, the other is the secondary data file. Some databases may not contain secondary data files (. ndf).

Log file: Used primarily to recover all information from a database. Each database must have a log file, and of course there can be multiple (. ldf).

In SQL Servcer, the locations of all the files in the database are recorded in the database master file and in the master database.

Most of the time, the SQL Server database uses the file location in the master database, but under the circumstances, the database engine initializes the file location entries in the master database with the file location information of the primary file:

1. When attaching a database using the Create DATADASE statement with the Forattache or for Attach_rebuild_log option.

2. When upgrading from version SQLSERVER2000 version 7.0.

3. Restore the master database.

data file page: the pages in the SQL Server data file are numbered sequentially and the first page of the file starts with 0. Each file in the database has a unique file ID number. To uniquely identify a page in a database, you need to use both the file ID and the page number.

Transaction log Architecture

1 Transaction log Logical architecture

The SQL Server transaction log runs logically as if the transaction log is a string of log records. Each log record is identified by a log sequence number (LSN). Each new log record is written to the logical end of the log and uses a higher LSN than the previously recorded LSN. Logging is stored as a serial sequence when it is created. Each log record contains the ID of the transaction to which it belongs. For each transaction, all the log records related to the transaction are linked in one chain by using a pointer to the speed at which the transaction is rolled back.

2 Transaction Log Physical architecture

Conceptually, a log file is a series of log records. Physically, the sequence of log records is stored in a physical set of files that implement the transaction log. The Sqlserer database engine internally divides each physical log file into multiple virtual log files. A transaction log is a wrapping file. For example: a database that contains a physical file divided into four virtual log files. When the database is created, the logical log file starts at the beginning of the physical log file. The new log record is added to the end of the logical log and then expands to the end of the physical log. Log truncation frees all virtual logs that were recorded before the minimum recovery log sequence number (MInLSN). "MInLSN" is the log sequence number of the oldest log record that is required for a successful database-wide rollback. When the end of the logical log reaches the end of the physical log file, the new log record wraps around to the beginning of the log file. This loop repeats repeatedly as long as the end of the logical log does not reach the beginning of the logical log.

* The checkpoint will flush the dirty data page from the current database's buffer cache to disk. This minimizes the portion of the activity log that must be processed when the database is fully restored.

SQL Server Topic One: Index (top)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

SQL Server Topic One: Index (top)

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

SQL Server Topic One: Index (top)

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support