About the management of database files, I often say, often mixed in the lake, which has not been the stick, with a long time, basically have encountered some database file management problems, such as:
1. SQL Server data file space full
2. log Files Skyrocket
3. File cannot be shrunk
4. How to automate file growth and auto-shrinking
This article revolves around these issues, of course, to be proficient in the management of database space, you need to first understand the relevant theoretical knowledge. First, let's take a look at the spatial management of data files.
Data File Space management
Each SQL Server database has a minimum of two operating system files: A data file and a log file. Data files contain data and objects, such as tables, indexes, stored procedures, and views. The log file contains the information required to recover all the transactions in the database. For ease of allocation and management, data files can be assembled and placed into filegroups. The basic unit of data storage in SQL Server is called a page. Each page is 8kb,sql server read or is the smallest unit of writing data is also the page, then 1MB has 128 pages. Rows cannot span pages (the maximum number of individual rows on a page is 8,060 bytes 8kb 1024*8), excluding page data of type Text/image, and for variable-length columns, if rows exceed 8060, move one or more variable-length columns to Row_, starting with columns of the maximum length Overflow_data the page in the allocation unit, maintaining a 24-byte pointer on the original page, and then moving back if the total size of the row is less than 8060. Processing time is prolonged when the query is executed, because the records are processed synchronously, not asynchronously.
As shown in the following:
But there is another concept, you know, called the District, the district refers to 8 physically contiguous pages of the collection, if the 8 physically contiguous pages belong to the same table, then this zone is called the Unified Zone, if the 8 pages belong to at least two different tables. This area is called a mixing zone.
Although each page is 8KB, it does not mean that 8KB is used to hold specific data, and each page has a 96-byte header at the beginning to store system information about the page, such as page numbers, page types, available space on the page, and the object ID that owns the page (that is, which object is used for this page). Different types of data, stored in different types of pages. As shown, the various page types in the data file and what is stored inside them are displayed:
In this case, we use more pages that are data and text/image types, and at the beginning of a data file there are many management pages such as: GM, SGAM, PFS. These pages are used by SQL Server to know which pages in this data file are already in use, which pages are not used, and so on.
When a table or an index requires more space, SQL Server needs to find the space to allocate. If the table or index as a whole is still less than 8 pages, SQL Server must find the space that can be used to allocate the mixed-type area. If a table or index has 8 pages or larger, SQL Server must find a free, uniform type of zone. Then SQL Server needs to know that the zone has been allocated, which zones can be used, which will use the Global Allocation Map page and the shared Global Allocation Map page, referred to as: Gam/sgam.
GAM records which areas have been allocated and for what purpose. A GAM page has a data bit for each area in the space it covers. If 1, then it is the idle area, which can be used to allocate, and if 0, the area is already in use. GAM can identify approximately 64,000 zones, or 4G of space. If it exceeds 4G, then a GAM page is enabled to identify the next 4G space.
Sgam records which zones are currently mixed and has at least one unused page. It can also identify 64,000 zones, approximately 4G of space. If the bit bit is identified as 1, it uses a mixed area and at least one page is available to allocate. If 0, the description is a uniform zone, or a mixed area, but there are no free pages.
Then SQL Server can easily find the desired page, if you need a new completely unused area, then you can use any one in the GAM page corresponding to the bit value of 1 of the area. If you need to find a zone with free space, such as a mixed type of one or more free pages, then it can look for a region with a value of 0 in the corresponding GAM and a value of 1 in Sgam. If there is no mixed-type zone with free space, SQL Server uses the GAM page to find a completely new area and assign it as a mixed-type zone, and then use a page in that area. If there is no free zone at all, then this file is full.
SQL Server is able to quickly lock the GAM page in a file because it is always on the third page of any database file (page 2). The Sgam page is on page fourth (Page 3). The next GAM page appears on each of the 511 230 pages after the first GAM page (page 2), and the next Sgam page appears on every 511 230 pages after the first SGAM page (Page 3). The page with page 0 for each database file is the file header page, and each file has only one page. Page 0 is the header file page, and Page 1 is the free Space page for pages Space,pfs. And the first eight pages of each database are fixed.
Below, we can view the page information of a database through the DBCC PAGE command, which has the following syntax:
DBCC page (database name | Database ID, file number, page number, output option)
Output Options: 0: Default, the title and page title of the output buffer, 1: the title of the output buffer, the page title (each row is output separately), and the row offset table, 2: the title of the output buffer, the page title (the overall output page), and the row offset table. 3: Displays the column values at the same time as the output header.
However, DBCC TRACEON (3604) must be enabled before this command is required.
A full page contains four sections, buffer, page header, data, offset table, which represent the cache, the header information of the page, the database, and the offset tables, respectively.
Buffer: The cache part that identifies the location of the page in memory.
Page Header: header information, including some important properties, such as: M_pageid for page numbering, m_headerversion for page header format, M_type for page type, such as: 1, indicates data page, 2 index page, 8 for GAM page, 9 for Sgam page, etc. ; M_typeflagebits: Type identification bit, basically not used; M_level is the level in the B-tree structure, the bottom level is 0;m_flagbits for page properties, such as 0x200 indicates that the page has checksum check; M_prevpage and m_ NextPage, the data pages of the same level in the B-tree structure are connected to each other through m_prevpage and m_nextpage; m_slotcnt indicates how many records are in the current page; m_freecnt indicates how much space is left in the current page, in bytes. M_LSN indicates that the last change in all records in the current page corresponds to the log record number.
The data section is generally divided into slot numbers (slots), which can be interpreted as a row of records if it is a page or index page, and SQL Server uniquely identifies each record in the table by its file number + page number + slot number. But in the GAM page we can interpret slot 0 as a reserved page for the GAM page, totaling 94 bytes.
The end of the page is the row offset table, and for each row in the page, each row offset table contains an entry. Each entry records the distance of the first byte of the corresponding row from the top of the page. The order of entries in the row offset table is reversed in the order of the rows in the page. As shown in the following:
We can see the first data row, that is, the last record in, the offset is 96, this is because the normal good front is 96 bytes of the head, followed by exactly the first record, the analysis of other records, and so on.
SQL Server database File management