Data table compression in SQL Server 2008 detailed introduction to _mssql2008

Source: Internet
Author: User
Tags bulk insert create index numeric

SQL Server SP2 brings us the vardecimal feature, when a new storage format--vardecimal was introduced for decimail and numeric data types. The vardecimal storage format allows storage of the decimal and numeric data types as a variable-length column. This feature allows the originally fixed-length decimal data to be stored in a variable-length format in the data file, which is said to save 30% of the space for a typical data warehouse, and SQL Server 2008 further enhances the data compression capability on this basis. SQL Server 2008 now supports two options for row compression and page compression, and data compression options can be enabled on the following objects:

1. Table with no clustered index created
2. Table to create a clustered index
3. Nonclustered indexes (setting the compression option on the table does not affect nonclustered indexes on the table, so the compression of the clustered index needs to be set separately)
4. Indexed view
5. A single partition in a partitioned table and a partitioned index

The meaning of using data compression:

Let's take a look at why Microsoft is also taking pains to implement and continuously improve data compression technologies in SQL Server today as storage costs continue to fall.

Although storage costs are no longer the primary consideration in the traditional sense, this does not mean that database size is not a problem, since database dimensions are significantly associated with management costs and performance issues, in addition to the storage costs.

Why is there a problem with managing costs? Because the database needs to be backed up, the larger the size of the database, the longer the backup time will be, and, of course, the additional cost of the backup hardware that will be consumed (including the cost of the backup media required and the purchase cost of a more advanced backup device to meet the backup window), Another management cost is the cost of maintaining the database, such as the DBCC tasks we often need to complete, and the larger the database size, the more time we need to complete these tasks.

Then we'll look at the performance problem. SQL Server reads data on a data page when it scans the disk, so if the more rows of data are contained in a single data page, the more data the SQL Server obtains in the data page IO, the higher the performance.

Finally, considering the cost of storage, as in the case of vardecimal compressed data from the original SQL Server SP2, 30% of space savings means 30% storage costs, and according to the test data currently released by SQL Server 2008, The use of new data compression technology to achieve 2x-7x storage rate, plus if the enterprise to consider the disaster and increased storage space, so the cost of storage hardware savings will be significant.

Using Data compression methods:

The compression options in SQL Server 2008 can be set by option when a table or index is created, for example:

Copy Code code as follows:
CREATE TABLE testtable (col1 int, col2 varchar) with (data_compression = ROW);

If you need to change the compression options for a partition, you can use the following statement:
Copy Code code as follows:
ALTER TABLE testtable REBUILD PARTITION = 1 with (DATA COMPRESSION = PAGE);

If you need to set different compression options for each partition of the partition table, you can use the following statement: (SQL Server 2008 can use different compression options for different partitions, which is important for data warehousing applications because the fact tables of the data warehouse usually have one or several hot partitions, The data in these partitions is often updated, and the compression options can be turned off to prevent data compression from causing additional processing load to the data updates on these partitions.
Copy Code code as follows:
CREATE TABLE partitionedtable (col1 int, col2 varchar (200))
On PS1 (col1)
With (
Data_compression = ROW on Partitions (1),
Data_compression = PAGE on PARTITION (2 to 4));

If you are setting compression options for an index, you can use:

Copy Code code as follows:

CREATE INDEX ix_testtable_col1 on TestTable (Col1) with (data_compression = ROW);

If you are modifying the compression options for an index, you can use:

Copy Code code as follows:
ALTER INDEX ix_testtable_col1 on TestTable REBUILD with (data_compression = ROW);

SQL Server 2008 also provides an object size named Sp_estimate_data_compression_savings stored procedure to help DBA estimate the activation compression option.

The principle of using data compression:

For row compression, SQL Server 2008 uses the following three methods to conserve storage space:
Reduces the metadata overhead associated with records. This metadata is information about the column, column length, and offset. In some cases, the metadata overhead may be greater than the old storage format.

It uses variable-length storage formats for numeric types (for example, Integer, Decimal, and float) and numeric based types (for example, DateTime and money).

It stores fixed-length strings by using variable-length formats that do not store empty characters.
For page compression, SQL Server 2008 is followed by one data page:

1. Line compression
2. Prefix compression
3. Dictionary compression

Using Data compression considerations:

Although the data compression capabilities of SQL Server 2008 are valuable, there are a few issues to be aware of:
The data compression feature is available only in enterprise and development editions

Data compression allows a single data page to store more rows of data, but does not change the limit of 8060 bytes for a single row of data.

When a clustered index is created on a table that has data compression set, the clustered index inherits the compression options on the original table by default

When you set up page compression on a table that does not set a clustered index, the actual effect of page compression is achieved only when:

1. Data is added to the table using the BULK INSERT syntax
2. Data using INSERT INTO ... With (TABLOCK) syntax added to the table
3. Execute alter TABLE with page compression options ... Rebuild command

Changing the compression option on a table that does not set a clustered index causes all nonclustered indexes on the table to be rebuilt because the data row addresses that the nonclustered indexes point to are already changed.

The amount of temporary space required to change the compression option is the same as the space required to create the index, so for partitioned tables, we can set the compression options individually to reduce the demand pressure on the temporary space.

Since data compression in SQL Server 2008 is actually a superset of the vardecimal technology in SQL Server SP2, there is no need to preserve vardecimal after data compression has been set. Of course SQL Server 2008 retains vardecimal in the current release in order to maintain backward compatibility, but the next version of SQL Server 2008 and the vardecimal option may be discarded. So the database that made these settings should be changed to data compression settings as soon as possible.

SQL Server 2008 's compression option is working on the storage Engine layer, which is transparent to other parts of SQL Server, so when we import the outside data into SQL Server using bulk load, the CPU workload is significantly increased. When you export a compressed datasheet to an external file, you may consume much more space than you originally would.

I believe here, my friends. There is a more comprehensive understanding of the data compression technology in SQL Server 2008.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.