How data pages in SQL Server affect database performance

Source: Internet
Author: User

Whatever the database, optimizing its performance requires understanding how it stores data internally; without that knowledge, much optimization work cannot even begin. For the database administrator, studying the database's storage structures may be monotonous, but it is a fortress that must be taken. In a SQL Server database, the data page is the most basic unit of storage: whether saving or reading data, the system always operates on whole pages.

  

  First, the basic composition of the data page.

As shown in the figure above, these are the main components of a page in a SQL Server database. A data page consists of three parts: the page header, the data rows, and the row offset table. The data rows hold the data itself; the header and the offset table are supporting structures. For the data page, the database administrator should understand the following points.

The first point is the size of the data page. In SQL Server the size of a data page is fixed: every page is 8 KB, or 8,192 bytes. Each page begins with a 96-byte header that stores system information about the page, such as the page number, the page type, the amount of free space on the page, and the allocation unit ID of the object that owns the page. Fortunately, this content is managed and updated automatically by the database, so the administrator does not need to maintain it. What the administrator does need to know is that a page has a maximum amount of space available for row data: of the 8,192 bytes, after subtracting the necessary overhead (the header and the row offset table), a single row can occupy at most 8,060 bytes. Keeping this figure in mind pays off greatly in later performance tuning; the details are explained in the row overflow section below.
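The header fields described above can be inspected directly. The following is a minimal sketch using the DBCC PAGE command (undocumented but widely used, and it requires elevated permissions); the database name, file ID, and page ID below are placeholders to replace with values from your own instance.

```sql
-- Send DBCC output to the client session instead of the error log.
DBCC TRACEON (3604);

-- Arguments: database name, file id, page id, print option (0 = header only).
DBCC PAGE ('MyDatabase', 1, 168, 0);

-- The header section of the output shows fields such as m_pageId (the page
-- number), m_type (1 = data page), m_freeCnt (free bytes on the page), and
-- the owning allocation unit id -- the bookkeeping described above.
```

Because DBCC PAGE is undocumented, its output format may vary between SQL Server versions; treat it as a learning tool rather than something to script against.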

The second point is the order in which rows are placed. On each data page, the data rows are stored one after another, starting immediately after the header. At the end of the page is the row offset table, which contains one entry for every row on the page: if a page holds 100 data rows, the row offset table holds 100 entries. Each entry records the distance, in bytes, from the top of the page to the first byte of the corresponding row; the second entry, for example, records the offset of the first byte of the second row from the top of the page. Because rows vary in size, the values recorded in the offset table follow no regular pattern. Note that the order of entries in the row offset table is the reverse of the order of the rows in the page; this is mainly to make it easier for the database to locate rows.
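The row offset table can be seen with the same DBCC PAGE command, using a more verbose print option. Again, the database name, file ID, and page ID are placeholders.

```sql
DBCC TRACEON (3604);

-- Print option 1 dumps each row plus an OFFSET TABLE section at the end.
DBCC PAGE ('MyDatabase', 1, 168, 1);

-- The OFFSET TABLE at the bottom of the output lists one entry per row,
-- in reverse row order, each giving the byte offset of that row from the
-- start of the page -- matching the layout described above.
```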

  Second, large data types and rows.

According to the rules of the SQL Server storage engine, a row cannot span pages. As shown, if the values in a row are very large and together exceed the 8,060-byte row limit, a single page can no longer hold the row. What does the database do then? Although a row cannot span pages, its contents can be split into parts that are stored separately. Large values are therefore not limited by the size of the page (or of the row). As analyzed above, a data page offers at most 8 KB of storage, and after the necessary overhead a single row may hold at most 8,060 bytes. When all the columns of a record, fixed-length and variable-length together, exceed this limit, the database splits the row and stores it on two different pages: it dynamically moves one or more variable-length columns to another page, starting with the widest column, until the row fits. Simply put, the part of the row that does not fit is stored separately on another page, and pointers are stored in the records on the different pages to link them together. In SQL Server this behavior has a name: row overflow.
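Row overflow is easy to demonstrate. In the sketch below (table and column names are made up for illustration), each varchar(5000) column is legal on its own, but one row filling both columns exceeds the 8,060-byte row limit, so the engine moves one column's data off-row:

```sql
-- Two columns that individually fit, but together exceed 8,060 bytes per row.
CREATE TABLE dbo.OverflowDemo (
    Id   int IDENTITY PRIMARY KEY,
    ColA varchar(5000),
    ColB varchar(5000)
);

-- Insert one row carrying roughly 10,000 bytes of variable-length data.
INSERT INTO dbo.OverflowDemo (ColA, ColB)
VALUES (REPLICATE('a', 5000), REPLICATE('b', 5000));

-- Check which allocation-unit types the table now uses.
SELECT au.type_desc, au.total_pages
FROM sys.partitions AS p
JOIN sys.allocation_units AS au ON au.container_id = p.partition_id
WHERE p.object_id = OBJECT_ID('dbo.OverflowDemo');
-- Expect an IN_ROW_DATA allocation unit plus a ROW_OVERFLOW_DATA
-- allocation unit with a non-zero page count.
```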

  Third, the negative impact of row overflow on database performance.

Having mastered the basic workings of the data page, the database administrator should next understand the adverse impact of row overflow on performance. As described above, when the cumulative length of all columns, fixed-length and variable-length, exceeds what a data page (that is, a single row) can hold, part of the row's content is moved to another page. Does this processing hurt database performance? If so, how can it be avoided?

In general, when a row exceeds the maximum capacity of a page, it will certainly hurt performance; there is no doubt about that. When the limit is exceeded, the database must split the row across pages, and this costs extra work in several ways. When saving the data, the database must write additional pointers. When querying, because of the split, reading one complete record may require reading multiple pages. When updating, if a variable-length value is shortened so that the whole row again fits within a single page, the database moves the off-row data back into the row. Each of these operations adds overhead, and when they happen together the impact on performance can be significant. Likewise, sorting, aggregation, and similar operations on the affected records take longer to execute because multiple pages are involved, which again reduces the performance of the database.
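The extra page reads described above can be quantified. A sketch using the documented function sys.dm_db_index_physical_stats (the table name is a placeholder for your own schema):

```sql
-- DETAILED mode reports page and record counts per allocation-unit type.
SELECT index_id, alloc_unit_type_desc, page_count, record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.MyWideTable'), NULL, NULL, 'DETAILED');

-- A non-zero page_count for ROW_OVERFLOW_DATA means that reading a full
-- row touches at least two pages instead of one, which is exactly the
-- extra I/O cost discussed above.
```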

The second thing to note is the length restriction on variable-length fields. SQL Server provides variable-length data types such as varchar, and these have a maximum declared length: a single varchar column cannot exceed 8,000 bytes. The total width of several such columns in one row, however, can exceed the 8 KB limit; that is when row overflow occurs. If the data in a single column can exceed the limit, the ordinary data types cannot be used: for data such as images or multimedia, a large object data type must be used, because only the large object types are free of this length restriction. The database has special handling for these large object values.
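The limits discussed above can be seen in column declarations. A sketch (table and column names are hypothetical):

```sql
CREATE TABLE dbo.LimitsDemo (
    ShortText varchar(8000),      -- largest fixed-length declaration allowed
    -- BadText varchar(9000),     -- would fail: n must be between 1 and 8000
    LongText  varchar(max),       -- large object type, up to 2 GB,
                                  -- stored off-row when the value is large
    Picture   varbinary(max)      -- binary large object for images/multimedia
);
```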

  Fourth, database design considerations.

When the database is running, frequent row overflow degrades performance considerably, so it needs to be considered at design time. Ordinary data types do not cause row overflow; only columns of types such as varchar, nvarchar, or CLR user-defined types are prone to it. When designing a database, therefore, the administrator should analyze sample data supplied by the users, estimate what percentage of rows may overflow, and evaluate how often overflow will occur. If the percentage or frequency is high, the administrator should consider restructuring (normalizing) the table to improve performance and reduce the adverse effects of overflow on the database.
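The percentage estimate described above can be made directly over sample data with DATALENGTH. In this sketch, the table, column names, and the 200-byte allowance for the remaining columns and row overhead are all assumptions to adapt to your own schema:

```sql
SELECT
    COUNT(*) AS total_rows,
    SUM(CASE WHEN DATALENGTH(ColA) + DATALENGTH(ColB) + 200 > 8060
             THEN 1 ELSE 0 END) AS likely_overflow_rows
FROM dbo.MyWideTable;

-- If likely_overflow_rows is a large fraction of total_rows, row overflow
-- will be frequent, and restructuring the table is worth considering.
```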

In general, there are two ways to significantly reduce the impact of row overflow on database performance. The first: if a column is defined with a type such as varchar or a user-defined type, and it is long enough to be likely to cause row overflow, use a large object data type instead. SQL Server manages large object data with a special mechanism, separately from normal row data, so the effect of row overflow can largely be avoided. Note, however, that the database needs more effort and resources to manage large object types, so the benefit of this approach must be weighed against its cost. The administrator should evaluate whether the gain compensates for the loss caused by row overflow; if it does, this scheme can be used, and if not, the cure is worse than the disease. For this reason the author does not particularly recommend this method, and instead recommends the approach described next.

The second method is simpler to carry out and much more practical. If a table has varchar or user-defined columns whose maximum length is long enough to easily cause row overflow, separate those columns from the other columns of the table: place them in a second table, and reassemble full records with a join statement. Because the page limit applies to the length of a single row, handling the columns this way means row overflow simply cannot occur. Querying a full record still requires access to multiple pages, but in practice one rarely needs all the columns: if an update or an aggregation does not touch the varchar columns, efficiency improves greatly. Even when the full record is needed, a join is cheaper than dealing with row overflow. And if a varchar value is shortened during an update, there is no row-merging work to do, because the data lives in two different tables, which saves considerable overhead. Clearly, this table-splitting method is simpler and easier to operate, so the author strongly recommends it to avoid the adverse impact of row overflow on SQL Server databases.
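The vertical split described above can be sketched as follows (all table and column names are hypothetical): the wide varchar column moves into its own table, linked back to the main table by its key.

```sql
-- Main table: only the narrow, frequently used columns.
CREATE TABLE dbo.Orders (
    OrderId  int PRIMARY KEY,
    Customer varchar(100),
    OrderDate date
);

-- Side table: the overflow-prone wide column, isolated by itself.
CREATE TABLE dbo.OrderNotes (
    OrderId int PRIMARY KEY
        REFERENCES dbo.Orders (OrderId),
    Notes   varchar(8000)
);

-- Most queries touch only dbo.Orders; the full record is reassembled
-- with a join only when the notes are actually needed.
SELECT o.OrderId, o.Customer, o.OrderDate, n.Notes
FROM dbo.Orders AS o
LEFT JOIN dbo.OrderNotes AS n ON n.OrderId = o.OrderId;
```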
