Thoughts on the selection of table column data types

Source: Internet
Author: User
Introduction

In SQL Server, the selection of data types in each column in each table is usually very simple, but the effects of different data types on performance are slightly different. This articleArticleExplore the selection of data types in SQL Server table columns.

 

Basic knowledge of data storage

In SQL Server, data is stored in pages. Eight pages are a zone. A page is 8 K and A zone is 64 K, which means that 1 MB of space can accommodate 16 zones. 1:

Figure 1.page and area in SQL Server

 

1 (PS: Windows built-in drawingProgramAs you can see, the allocation units in SQL Server are divided into three types: in_row_data for storing row data, lob_data for storing lob objects, and row_overflow_data for storing overflow data. The following uses a more specific example to understand the three allocation units.

Create the table shown in 2.

Figure 2. Test Table

 

As shown in the test table in Figure 2, data is inserted so that the length of each row exceeds the maximum length of 8060 bytes per page. This not only results in row overflow (row_overflow_data), but also needs to store the lob page. The test insert statement and the allocation situation 3 shown through DBCC ind.

Figure 3. Page allocated for rows exceeding 8060 bytes

 

Except for the iam page, three pages are required for storing this line of data. The first is the lob page, which is designed to store binary files in the database. When this type is listed, a 24-byte pointer is stored in the original column, the specific binary data is stored in the lob page. In addition to text, varbinary (max) also exists in the lob page. Then there is an overflow row. in SQL Server 2000, a row exceeding 8060 bytes is not allowed. This feature is improved in Versions later than SQL Server 2005, using varchar, for data types such as nvarchar, when the row size does not exceed 8060 bytes, all data in the row exists in-row data. When the amount of data stored in the varchar exceeds 8060 bytes, the additional part will be stored in the row-overflow data page. If the column update reduces the row size to less than 8060 bytes, all rows will return to the in-row data page.

 

Data Type Selection

After learning some basic knowledge. We know that SQL Server reads data in pages. fewer pages not only mean less Io, but also less memory and CPU resource consumption. Therefore, the main purpose of data selection is:

Minimize the size of each row

This sounds simple, but you still need to know more about the data types of SQL Server.

For example, to store int-type data, according to business rules, bigint is not used when int is used, smallint is not used when smallint is used, and smallint is not used when tinyint is used.

Therefore, to make the data in each row smaller, the data type that occupies the smallest byte is used.

 

1. For example, do not use the datetime type, but use a more precise type based on the business, as shown in the following table:

Type Bytes occupied
Date (date only) 3
Time (only time) 5
Datetime2 (time and date) 8
Datetimeoffset (plus time zone) 10

 

2. Replace text, ntext, and image types with varchar (max), nvarchar (max), and varbinary (max ).

Based on the previous basic knowledge, we can know that for the Text, ntext, and image types, as long as each column is not null, an additional lob page needs to be allocated even if it occupies a small amount of data, this undoubtedly occupies more pages. For data types such as varchar (max), when the data volume is small, the data in-row-data can meet the requirements without additional lob pages, lob pages are allocated only when data overflow occurs. In addition, varchar (max) and other types support string operation functions such:

    • Col_length
    • Charindex
    • Patindex
    • Len
    • Datalength
    • Substring

 

3. For columns that only store numbers, use the numeric type instead of varchar.

Because the numeric type occupies less storage space. For example, to store 123456789, only four bytes are needed to use the int type, and 9 bytes are required to use varchar (this does not include varchar, but also occupies four bytes of record length ).

 

4. if not necessary, do not use nvarchar, nchar, and other data types stored in units of words. This type of data requires more storage space than varchar or char.

 

5. Selection of char and varchar

There are actually some such comparisons. If you are too lazy to remember, varchar is the right option in most cases. We know that the storage space occupied by varchar is determined by its storage content, while the storage space occupied by char is determined by the length of its definition. Therefore, the length of a char occupies the defined space no matter how much data is stored. Therefore, if the column stores fixed-length data such as zip code, select Char. Otherwise, it would be better to select varchar. In addition, varchar occupies several bytes of storage length compared to Char. Let's make a simple experiment.

First, we create a table. There are only two columns in this table, one int type column, and the other type is defined as char (5). Insert two test data into the table, then, you can use DBCC page to view the structure of the page, as shown in figure 4.


Figure 4. Use the char (5) type. Each row occupies 16 bytes.

 

Next let's change it to varchar (5). The page information is shown in Figure 5.

Figure 5. varchar (5), each row occupies 20 bytes of space

 

Therefore, it can be seen that varchar requires an additional 4 bytes to record its content length. Therefore, when the actual column storage content is less than 5 bytes in length, using char instead of varchar will save more space.

 

Use of null

The use of null is also slightly controversial. Some people do not recommend that null be allowed. All values are not null + default. This is because SQL server does not use the three-value Logic (true, false, unknown) for comparison, but uses the binary logic (true, false ), in addition, the isnull function is no longer required to replace the null value during query.

However, this also raises some problems. For example, when an aggregate function is used, the null value is not involved in the operation, and the use of the not null + default value requires exclusion.

Therefore, the use of null must be based on specific services.

 

Use sparse)

Sparse Columns are common columns that use optimized storage for null values. Sparse Columns reduce the space requirement for null values, but the cost is that the overhead for retrieving non-null values increases. Sparse Columns should be considered only when at least 20% to 40% space is saved.

Set Sparse Columns in SSMs 6.

Figure 6. Sparse Columns

 

For more details about how to save space for sparse columns, see msdn.

 

Primary Key Selection

Primary key selection is the top priority of table design, because the primary key is not only related to the business model, but also to the efficiency of table data operations (because the primary key will be in the non-leaf node of the B tree, ). I have already published an article about primary key selection: From the Perspective of performance, I will not elaborate on the selection of SQL Server clustered index keys.

 

Summary

This article explores the selection of data columns when designing tables. A good table design can not only meet business needs, but also optimize performance.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.