SQL Server Enterprise Platform Management Practice reading notes--sql data file space usage and management in server
1. Table and index storage structure
Before SQL Server2005, a table was stored in a B-tree or a heap. Each B-tree or heap has a record corresponding to the sysindexes. After SQL Server2005, introduced the concept of partitioned tables (table Partition), in the storage organization, the existing partition basically replaced the original table concept, the original table concept became a logical concept. A partition is a B-tree or a heap. A table is a combination of one to multiple partitions.
1.1 Table data pages stored in a clustered index with a B-tree
If a table has a clustered index (Clustered index), the data rows are stored sequentially based on the clustered index key. The clustered index is implemented by the B-tree index structure, which supports fast retrieval of rows based on clustered index key values. The data pages are closely connected by a doubly linked list.
1.2 heap is a table with no clustered index
If there are no clustered indexes on the table, the data rows are not stored in any order, and the data pages do not have any special order. There are no link tables between data pages.
1.3 Nonclustered indexes
Nonclustered indexes have a similar B-tree index structure to the clustered index. The difference is that nonclustered indexes do not affect the order of data rows. The leaf level contains only index rows and no complete data. Each index row contains nonclustered index key values and row locators. A locator points to a data row (in another B-tree or heap) that contains a key value. Nonclustered indexes themselves also consume some data pages. These pages are linked in a doubly linked list.
Sys.partitions provides an object for tables and indexes, each of which occupies one line of information, divided into the following categories:
1, Index_id=0, this line record is the heap table information
The First_iam_page column in sys.system_internals_allocation_units points to the IAM chain that aggregates the heap data page I aggregations in the specified partition. Because these pages are not connected, it is not possible to find the next page from the first page, so SQL Server can only use IAM pages to find each page in the data page collection.
2. Index_id=1, which represents the clustered index of a table or view
The Root_page column in sys.system_internals_allocation_units points to the top of the clustered index B-tree within the specified partition. SQL Server uses the index B-tree list to find each data page in the page from the top page.
3. Index_id>1, nonclustered indexes created for a table or view
The Root_page column in sys.system_internals_allocation_units points to the top of the nonclustered index B-tree within the specified partition.
If there is another row in sys.partitions for each table that has LOB columns (image, varchar (max), text), and its index_id>250, it is used to manage the LOB pages.
The First_iam_page column points to the IAM page chain that manages the pages in the Lob_data allocation unit.
In short, from an object index_id can determine what type of storage, if it is 0, it is indicated that the table does not have a clustered index, if it is 1, is the clustered index page, if greater than 250, is the text or image field, if between 2--250, is the nonclustered index page.
Heap structure
A heap is a table that does not contain a clustered index. SQL Server uses the Index allocation Mapping (IAM) page to associate the pages of the heap. The heap features several of the following:
1. The data pages and rows in the heap do not have any particular order.
The data in a heap is completely stored randomly. and SQL Server assumes that there is no connection between the data.
2, the page is not connected together
The only logical connection between data pages is the information that is recorded within the IAM page. There is no close connection between the page and the page.
3, the rows in the heap are not returned in the order in which they were inserted
Because IAM labels the data pages in the order they exist in the data file, this means that the heap scan will follow each file. Instead of the order in which these lines are inserted, or in any logical order.
, showing how the SQL Server database engine uses an IAM page to retrieve data rows from a heap with a single partition.
As you can see, SQL Server is simpler for the manager of the heap. It is also more fragile in algorithm capability. Not talking about performance, light from the data storage management, it is difficult to manage a large table with heaps. So in SQL Server, a clustered index is built on all large, frequently used tables. Because clustered indexes can avoid many problems.
Clustered index Structure
In SQL Server, indexes are organized by the B-tree structure. Each page of the index B tree is called an index node. The top node of the B-tree is called the root node. The underlying node in the index is called a leaf node. Any index level between the root node and the leaf node is collectively referred to as the intermediate level. Each index row contains a key value and a pointer to a data row in an intermediate page or page-level index on the B-tree. Pages for each level of index are linked in a doubly linked list.
The pages and rows within the data chain are sorted by the clustered index key value. All insert operations are performed when the key value in the inserted row matches the sort order in the existing row. The B-Tree page collection is positioned by the page pointer in the sys.system_internals_allocation_units system view.
For a clustered index, the root_page column in sys.system_internals_allocation_units points to the top of a particular partition for that clustered index. SQL Server moves down in the index to find the row that corresponds to a clustered index key. To find the scope of a key, SQL Server moves through the index to find the starting key value for the range, and then scans the data page with a forward or backward pointer. To find the first page of a data page chain, SQL Server scans from the root node of the index along the leftmost pointer.
The clustered index has the following characteristics relative to the heap:
1. The data pages and rows in the heap are in strict order.
The clustered index ensures that the table's data is arranged in the order of the index rows. and SQL Server knows this order relationship.
2, the page links together. The page is closely linked to the page.
3. The rows of tree species can generally be returned in the order of the indexed columns.
So from this point of view, after the establishment of the B-tree, SQL Sever to the data page management can be more rapid and effective, some of the problems occurred on the heap is not easy to happen on the B-tree, performance is much higher
Nonclustered index structure
Nonclustered indexes have the same B-tree structure as clustered indexes, and the significant difference between them is the following two points:
1. Data rows of the underlying table are not sorted and stored in the order of the nonclustered keys
2. The page layer of a nonclustered index is made up of index pages that are not composed of data pages
3. A table that builds a nonclustered index can be a B-tree or a heap.
4. If the heap is represented (meaning that the table does not have a clustered index), the row locator is a pointer to the row. The pointer is generated by the file identifier (ID), the page number, and the row count on the page. The entire pointer is called the row ID (RID).
5. If the table has no clustered index or a clustered index on the indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server adds internally generated values (called unique values) so that all duplicate keys are unique. SQL Server retrieves a data row by using the clustered index key stored in a leaf row of a nonclustered index to search for a clustered index.
Therefore, nonclustered indexes do not change or improve the way data pages are stored. Its B-number structure is only for its own index pages. If the problem is caused by the property of the heap, adding a nonclustered index does not lead to a fundamental improvement.
Creating a clustered index on a table does not increase the size of the table, but increasing the nonclustered index increases the space, but if you create a clustered index on a frequently changing table, it is easy to run into page split, so trying to clustered the index can affect performance. With this in mind, many designers are reluctant to build clustered indexes on tables in SQL Server, but a non-indexed table performance is very poor, so the nonclustered indexes are added to expect good performance.
But this scheme is a waste of space, performance is not necessarily good design, this SQL Server2005 made a comparison, the conclusion is: There is a clustered index table in the SELECT, Update, delete these actions performance has a great increase, More importantly, on the INSERT, there is no difference between the two. There is no phenomenon that the clustered index affects the insert speed. Therefore, it is strongly recommended to build a clustered index on a large table.
The difference between delete and truncate
1. The delete command does not completely release the data structure of the table or index and the page they are requesting, especially on the heap table. In later versions of SQL Server2005, the data for the tree structure is a little better for page release, but it cannot be completely freed. and truncate can completely release it.
2. There is less transaction log space, because the DELETE statement deletes one row at a time and records an entry for each row that is deleted in the transaction log, TRUNCATE table deletes the data by releasing the data page used to store the data, and only the action is logged in the transaction log, not each row.
3. The locks used are usually less, and when the DELETE statement is executed with a row lock, the rows in the table are locked for deletion, TRUNCATE table always locks tables and pages, and is faster
4. The table will not retain any pages without exception, and after the DELETE statement is executed, the table still contains empty pages. For example, you must have at least one exclusive (lck_m_x) table lock to free empty pages in the heap. If you do not use a table lock when you perform a delete operation, many empty pages will be included in the table (heap). For indexes, deletions leave some empty pages, although these pages are clearly and quickly released through the background.
TRUNCATE table deletes all rows of rows in the table, but the table structure and its columns, constraints, indexes, and so on, remain unchanged. If you delete the table definition and its data, use the DROP TABLE statement.
So for a timely deletion of data, and then free up space, you can take the following methods:
1. Create a clustered index in a table
2. If all data is not available, use TRUNCATE table instead of delete
Delete does not completely free up space, will it cause space leakage? You don't have to worry about it, but these pages will be reused when the tables are inserted into the new data, although they are not released. So these pages are not "leaked" and will be left to SQL Server for reuse.
If you really want to use the DELETE statement, if the table has a clustered index, rebuild the index can be freed out of the space, but if not, you can rebuild a new table, the data from the old table, and then delete the old table, free space, or you can create a new clustered index on such a table, so a little toss, Of course, sometimes there is no need to release the space, you can wait for the new data to be inserted, the direct use of the line.
SQL Server Enterprise Platform Management Practice reading notes--sql data file space usage and management in server