As I needed to train my colleagues on database indexing knowledge, I collected and sorted out this series of blogs. This is a summary of index knowledge. After summing up, I found that many vague concepts were much clearer in the past.
Whether it is cached data information or physically saved information, their basic unit is data pages. Therefore, understanding data pages is the most basic knowledge point. This blog introduces some basic knowledge about data pages related to indexes.
Basic knowledge of data pages
The basic unit of data storage in SQL Server is page ). The disk space allocated by data files (. MDF or. NDF) in the database can be logically divided into pages (from 0 to N consecutive numbers ). Disk I/O operations are performed on pages. That is to say, the minimum data unit for SQL Server to read or write data each time is the data page.
Note: log files are not stored in this way, but a series of log records.
The database is divided into logical pages (8 KB for each page). In each file, all pages are consecutively numbered from 0 to X, where X is determined by the file size. You can specify a database ID, a file ID, and a page number to reference any data page. When we use the alter database command to expand a file, the new space will be added to the end of the file. That is to say, the page number of the first data page of the new space of the expanded file is x + 1. When we use the DBCC shrinkdatabase or DBCC shrinkfile command to contract a database, the page will be removed from the page with the highest page number (at the end of the file) in the database and moved to the page with a lower page number. This ensures that the page numbers in a file are always consecutive.
In SQL Server, the page size is 8 KB. This means that each MB of the SQL Server database has 128 pages. And so on. Based on the database file size, we can calculate the number of data pages in the database.
SQL Server 2005 has the following page types:
Page type |
Content |
Data |
When text in row is set to on, it contains the following options: Text, ntext, image, nvarchar (max), varchar (max), and varbinary (max) and all data rows other than XML data. |
Index |
Index entries. |
Text/Image |
Large Object Data Type: Text, ntext, image, nvarchar (max), varchar (max), varbinary (max), and XML data. Variable-length data type columns when the data row exceeds 8 KB: Varchar, nvarchar, varbinary, and SQL _variant |
Global allocation map, shared global allocation Map |
Information about whether a zone is allocated. |
Page free space |
Information about page allocation and available space of the page. |
Index allocation Map |
Information about the partitions used by tables or indexes in each allocation unit. |
Bulk changed Map |
Information about the region modified by the large-capacity operation after the last backup log statement in each allocation unit. |
Differential changed Map |
Information about the zones changed after the last backup database statement in each allocation unit. |
Structure of the data page:
Each page starts with a 96-byte header to store system information about the page. This information includes the page number, page type, available space of the page, and the allocation unit ID of the object that owns the page.
On the data page, data rows are placed in sequence followed by headers. The end of the page is a row offset table. For each row on the page, each row Offset Table contains an entry. The distance between the first byte of the corresponding row and the header of each entry record. The order of entries in the row Offset Table is the opposite to that of the row on the page.
For more information about data pages, see the following article:
Estimate the amount of space required to store data in the heap
Http://technet.microsoft.com/zh-cn/library/ms189124.aspx
You can also view the collected data: how to view the structure of the table data page
Http://blog.joycode.com/ghj/articles/113108.aspx
Support for large rows
In SQL Server 2005, rows cannot span pages, but some rows can be moved to the page where the trip is located. Therefore, the rows may be very large.
(For example, when one row contains multiple columns, some of the columns of this row are listed on data page A and some are listed on data page B)
The maximum data size and overhead of a single row of a page are 8,060 bytes (8 KB ). However, this does not include data stored in the text/image page type.
In SQL Server 2005Varchar,Nvarchar,VarbinaryOrSQL _variantColumns are not restricted.
When the total size of all rows of fixed and variable columns in the table exceeds the limit of 8,060 bytes, SQL Server dynamically moves one or more variable-length columns from the maximum-length column to the page in the row_overflow_data allocation unit.
This operation is performed whenever an insert or update operation increases the total size of a row to 8,060 bytes exceeding the limit.
After moving the column to the page in the row_overflow_data allocation unit, the 24-byte pointer is maintained on the original page in the in_row_data allocation unit.
If the row size is reduced in subsequent operations, SQL Server dynamically moves the column back to the original data page.
SQL Server Data Page Cache
SQL Server databases are mainly used to store and retrieve data. Therefore, intensive disk I/O is a major feature of database engines. In addition, completing disk I/O operations consumes a lot of resources and takes a long time. Therefore, SQL Server focuses on improving I/O efficiency. Buffer management is the key to achieving efficient I/O operations. The buffer management component of SQL Server 2005 consists of the following two mechanisms: the buffer manager used to access and update the database page and the buffer cache used to reduce the I/O of database files (also known as the "buffer pool ").
How buffer management works
A buffer zone is an 8 KB memory page with the same size as a data page or index page. Therefore, the buffer cache is divided into multiple 8 KB pages. The buffer manager reads data pages or index pages from database disk files into the buffer cache and writes the modified pages back to the disk. Pages are retained in the buffer cache until they are not referenced for a period of time, or the buffer manager needs the buffer to read more data. Data is rewritten to the disk only after being modified. You can modify the data in the buffer cache multiple times before writing it back to the disk.
Lab
Below is a simple experiment to see if you have mastered the above knowledge points:
Prepare the test environment
In an SQL 2005 database, execute the following script.
Simply put, two tables are created. Note that the two tables are stored in the nchar (2019) field and the nchar (2020) field. In the future, we will see the size of the storage space for these two tables under the same data. Because the basic unit of cache and physical storage is data pages, the physical storage size of this table is the same as that of all caches.
Then we fill in 20 data records for each table.
-- Create two test tables: Create Table [DBO]. [table_2019] ([data] [nchar] (2019) not null) Create Table [DBO]. [table_2020] ([data] [nchar] (2020) not null) Go -- fill data declare @ I intset @ I = 0 while (@ I <20) begin insert table_2019 (data) values ('') insert table_2020 (data) values ('') Select @ I = @ I + 1 endgo
Here we use the nchar data type because:
If the not null clause is specified, the nchar data type is a fixed-length data type.
If the length of the inserted value is smaller than that of the nchar not null column, spaces are filled on the right of the value until the column length is reached.
For example, if a column is defined as nchar (10) and the data to be stored is "Music", SQL server stores the data as "Music _____", "_" indicates a space.
Http://technet.microsoft.com/zh-cn/library/ms175055.aspx
In this way, the script for filling test data is very simple.
The space occupied by data rows is also very simple.
In addition, the two tables we created have no indexes, so they are all heaps. For estimated space required to store data in the heap, see the following article:
Http://technet.microsoft.com/zh-cn/library/ms189124.aspx
After completing the preparation, let's check the space occupied by the two items. In SQL Server Management studio, select test database, right-click the database, and select
Reports --> standard reports --> disk usage by top tables or disk usage by table, you can see the following statistics.
|
|
|
|
|
|
|
Disk usage by top tables: [ghj_demo] |
|
On GHJ1976-PC/sqlexpress at 9:21:33 |
|
|
|
|
|
|
|
|
|
|
This report provides detailed data on the utilization of disk space by Top 1000 tables within the database. |
|
|
|
|
|
|
|
|
Table Name |
# Records |
Reserved (KB) |
Data (KB) |
Indexes (KB) |
Unused (KB) |
DBO. table_2020 |
20 |
200 |
160 |
8 |
32 |
DBO. table_2019 |
20 |
136 |
80 |
8 |
48 |
The two tables have the same 20 records. Table_2020 table data occupies kb, that is, 20 data pages. Table_2019 table data accounts for 80 KB, that is, 10 data pages.
Why?
Only one data row can be put down on one data page of table_2020 table.
Only two data rows can be placed on one data page of The table_2019 table.
The field length of the two tables is only two bytes, but the physical storage is doubled.
References:
Deep Exploration of storage engines in SQL Server databases
Http://tech.ccidnet.com/art/1106/20070320/1040665_3.html
An electronic part of Microsoft SQL Server 2005 Technical Insider: storage engine
Http://book.csdn.net/bookfiles/504/10050417350.shtml
Description of "pages and zones" in msdn
Http://technet.microsoft.com/zh-cn/library/ms190969.aspx
Clustered Index Structure
Http://technet.microsoft.com/zh-cn/library/ms177443.aspx
Row overflow data exceeds 8 KB
Http://technet.microsoft.com/zh-cn/library/ms186981.aspx
Buffer Management
Http://technet.microsoft.com/zh-cn/library/aa337525.aspx
Estimate heap size
Http://technet.microsoft.com/zh-cn/library/ms189124.aspx
Nchar and nvarchar (TRANSACT-SQL)
Http://technet.microsoft.com/zh-cn/library/ms186939.aspx
"Alibaba SQL Server always-on tech-nologies: Advanced Indexing policy" recorded by Wu jiazhen at teched 2007:
Http://msevents.microsoft.com/CUI/EventDetail.aspx? Eventid = 1032364059 & Culture = ZH-CN
Note: This page is marked with "SharePoint 2007 website performance tuning", but it is actually an advanced index policy. Microsoft has mistaken files, which makes me look at them one by one, which is the required video.