SQL Server 2005 SP2 brought us the vardecimal function. At that time, a new storage format-vardecimal was introduced for the decimail and numeric data types. The vardecimal storage format allows storage of the decimal and numeric data types as a variable length column. This feature allows the original fixed-length decimal data to be stored in a variable-length format in the data file. It is said that this feature can save 30% of the space for a typical data warehouse, SQL Server 2008 further enhances the data compression function. SQL Server 2008 now supports row compression and page compression. The data compression option can be enabled on the following objects:
1. Tables with no clustered index created
2. Create a clustered index table
3. Non-clustered index (setting the compression option on the table does not affect the non-clustered index on the table, so the compression of the clustered index needs to be set separately)
4. Index View
5. Single partition in the partition table and partition index
Meaning of data compression:
First, let's take a look at why Microsoft is struggling to implement and continuously improve the data compression technology in SQL Server as storage costs continue to decrease?
Although the storage cost is no longer the primary consideration in the traditional sense, it does not mean that the size of the database is not a problem, because the size of the database will not only affect the storage cost, it is also greatly associated with management costs and performance issues.
Why is there a management cost? Because the database needs to be backed up, the larger the size of the database, the longer the backup time, of course, the other point is that the backup hardware cost will also increase accordingly (including the required backup media cost and the procurement cost brought by more advanced backup equipment to meet the backup window ), another management cost is the database maintenance cost. For example, we often need to complete DBCC tasks. The larger the database size, we need more time to complete these tasks.
Then let's look at the performance issues. SQL Server reads data from a disk in units of data pages. Therefore, if a data page contains more data lines, the more data SQL Server obtains in one data page IO, which improves the performance.
Finally, considering the storage cost, according to the original vardecimal compressed data in SQL Server 2005 SP2 as an example, 30% space saving means 30% storage cost, according to the test data currently released by SQL Server 2008, the new data compression technology can reach the storage rate of 2-7x. In addition, if enterprises need to consider the storage space increased by disaster tolerance, the storage hardware cost saved in this way will also be considerable.
Data compression method:
The compression options in SQL Server 2008 can be set by Option when creating a table or index. For example:
Create table TestTable (col1 int, col2 varchar (200) WITH (DATA_COMPRESSION = ROW );
To change the compression option of a partition, use the following statement:
Alter table TestTable rebuild partition = 1 WITH (data compression = PAGE );
To set different compression options for each partition in a partition table, use the following statement: (SQL Server 2008 can use different compression options for different partitions, this is very important for Data Warehouse applications, because fact tables in the data warehouse usually have one or more hot partitions, and the data in these partitions often needs to be updated, to avoid additional processing loads caused by data compression for data updates on these partitions, you can disable the compression option for these partitions)
Create table PartitionedTable (col1 int, col2 varchar (200 ))
ON PS1 (col1)
WITH (
DATA_COMPRESSION = row on partitions (1 ),
DATA_COMPRESSION = page on partition (2 TO 4 ));
If you want to set the compression option for an index, you can use:
Create index IX_TestTable_Col1 ON TestTable (Col1) WITH (DATA_COMPRESSION = ROW );
If you want to modify the compression option of an index, you can use:
Alter index IX_TestTable_Col1 ON TestTable rebuild with (DATA_COMPRESSION = ROW );
SQL Server 2008 also provides a stored procedure named sp_estimate_data_compression_savings to help the DBA estimate the size of the object after the compression option is activated.
How data compression works:
For row compression, SQL Server 2008 uses the following three methods to save storage space:
Reduces the metadata overhead associated with the record. This metadata contains information about the column, column length, and offset. In some cases, the metadata overhead may be greater than the old storage format.
It uses variable-length storage formats for numeric types (such as integer, decimal, and float) and numeric-based types (such as datetime and money.
It stores fixed-length strings by using a variable-length format that does not store null characters.
For page compression, SQL Server 2008 uses one data page in sequence:
1. Row compression
2. Prefix compression
3. Dictionary compression
Precautions for using data compression:
Although the data compression function of SQL Server 2008 is very valuable, you still need to pay attention to the following issues:
Data compression is only available for enterprise and development editions.
Data compression allows a data page to store more data rows, but does not change the limit that a single row can contain up to 8060 bytes.
When a clustered index is created on a table with data compression configured, the clustered index inherits the compression option of the original table by default.
When Page compression is set for a table without a clustered index, page compression can be achieved only in the following cases:
1. Add data to the table using the bulk insert syntax
2. Add data to the table using the insert into... WITH (TABLOCK) syntax
3. Run the alter table... REBUILD command with page compression options.
Changing the compression option on a table without a clustered index causes all non-clustered indexes on the table to be rebuilt, because the data row addresses pointed to by these non-clustered indexes have all changed.
The size of the temporary space required for changing the compression option is the same as that required for creating an index. Therefore, for partitioned tables, we can set compression options one by one to reduce the pressure on the temporary space.
Since data compression technology in SQL Server 2008 is actually a superset of vardecimal technology in SQL Server 2005 SP2, it is unnecessary to retain vardecimal after data compression is set. Of course, to maintain backward compatibility, SQL Server 2008 retains vardecimal in the current version. However, the next version of SQL Server 2008 may discard the vardecimal option, therefore, the database with these settings should be changed to the data compression settings as soon as possible.
SQL Server 2008's compression options work on the storage engine layer, which is transparent to other SQL Server components, therefore, when we use bulk load to import external data to SQL Server, it will significantly increase the CPU workload and export the compressed data table to an external file, it may consume a lot more space than the original one.
I believe that my friends have a comprehensive understanding of data compression technology in SQL Server 2008.