SQL SERVER->> Data Compression

Source: Internet
Author: User

I recently worked on a project to convert all the table objects on an entire SQL Server instance to page compression, so I'm taking this opportunity to study SQL Server's compression technology.

This article is almost a complete instruction manual.

https://technet.microsoft.com/en-us/library/dd894051(v=sql.100).aspx

Of course, there's also the official documentation page:

https://msdn.microsoft.com/en-us/library/cc280449.aspx

That's a lot of material, so here's a summary:

1) There are two types of compression in SQL Server: Row Compression and Page Compression

2) Row compression converts fixed-length data into a variable-length format, with each column taking 4 bits of metadata to record the length of its data; apart from those 4 bits, NULL and 0 values consume no storage space. That's the theory, though it's still not entirely intuitive to me. Page compression, however, is easy to understand: it uses prefix compression and dictionary compression. It first applies row compression, then prefix compression column by column, and finally dictionary compression across the entire page to achieve the maximum compression ratio.
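As a concrete illustration, both types can be specified when creating or rebuilding a table. This is only a sketch; the table and column names below are hypothetical:

```sql
-- Hypothetical table created with row compression
CREATE TABLE dbo.SalesHistory
(
    SaleID INT IDENTITY PRIMARY KEY,
    Region CHAR(20),         -- fixed-length; row compression stores only the bytes used
    Amount DECIMAL(18, 2)
)
WITH (DATA_COMPRESSION = ROW);

-- Switch the same table to page compression (prefix + dictionary on top of row)
ALTER TABLE dbo.SalesHistory
REBUILD WITH (DATA_COMPRESSION = PAGE);
```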

3) To find out whether a database already has compression applied, check this DMV: SELECT * FROM sys.dm_db_persisted_sku_features;
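That DMV only tells you whether the feature is in use somewhere in the database. To see exactly which tables and partitions are compressed, a query against the standard catalog views can list the compression setting per partition (a sketch using only sys.partitions and sys.objects):

```sql
-- Per-partition compression state of all user tables
SELECT o.name AS table_name,
       p.index_id,
       p.partition_number,
       p.data_compression_desc   -- NONE, ROW, or PAGE
FROM sys.partitions AS p
JOIN sys.objects AS o ON o.object_id = p.object_id
WHERE o.type = 'U'
ORDER BY o.name, p.index_id, p.partition_number;
```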

4) Estimating space savings

The stored procedure sp_estimate_data_compression_savings can be used to estimate how much storage space compressing a single table would save. The entire calculation works on a sample of the table placed in tempdb, not a full table scan.
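A sketch of calling it, estimating page compression for a hypothetical table; passing NULL for @index_id and @partition_number covers all indexes and partitions:

```sql
-- Estimate the space savings of page compression on dbo.SalesHistory (hypothetical name)
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'SalesHistory',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE';
```

The result set reports the current size and the estimated compressed size per index and partition.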

5) What data compresses most effectively? NULL values, fixed-length data (character or numeric) that leaves most of its allocated space unused, and data with a high repetition rate in its prefix values.

6) Compression does not apply to LOB data, row-overflow pages, or FILESTREAM data.

7) Compressed data pages read from disk into memory stay compressed; a page is decompressed only when its data participates in a sort, join, or filter, or is updated.

8) Decompression does not mean copying out a second, decompressed copy of the data in memory; rather, the CPU decompresses the data on the fly as it is read.

9) Compression benefits table scans far more than seeks, since the IO advantage of the smaller storage footprint only shows up when large amounts of data are read.

10) Logically, then, compression boosts IO and memory at the expense of CPU. The CPU cost of row compression is clearly lower than that of page compression, but page compression achieves a higher compression ratio. The article claims row compression adds only about ten percent of CPU overhead compared to no compression; I haven't tested this myself.

11) Page compression is recommended for large tables with few updates, on servers with low CPU utilization. And although data pages are decompressed before they participate in sorts, joins, filters, or updates, the article mentions that if the query itself has complex aggregation logic and joins, compression will not have much effect on it. In fact, I'd argue that even though compression consumes extra CPU, with a large compression ratio the trade-off is still worthwhile: the decompression needed when data participates in aggregations or joins all happens in memory, whereas uncompressed pages would need to be re-read from disk (generally, in a data warehouse, data pages rarely stay in memory for long), and memory read/write performance is simply not comparable to disk.

12) So how do you decide between row compression and page compression? The principle is U (update) versus S (scan): tables that are updated more frequently suit row compression, while tables that mostly participate in queries suit page compression. But I think data volume matters more. For example, for a very large table, say 10 GB or hundreds of millions of rows, the higher compression ratio of page compression is too big an advantage to pass up. The following statements detect the update and scan frequency of tables in the database:

-- Update frequency
SELECT o.name AS [table_name],
       x.name AS [index_name],
       i.partition_number AS [partition],
       i.index_id AS [index_id],
       x.type_desc AS [index_type],
       i.leaf_update_count * 100.0 /
           (i.range_scan_count + i.leaf_insert_count + i.leaf_delete_count
            + i.leaf_update_count + i.leaf_page_merge_count + i.singleton_lookup_count)
           AS [percent_update]
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS i
JOIN sys.objects AS o ON o.object_id = i.object_id
JOIN sys.indexes AS x ON x.object_id = i.object_id AND x.index_id = i.index_id
WHERE (i.range_scan_count + i.leaf_insert_count + i.leaf_delete_count
       + i.leaf_update_count + i.leaf_page_merge_count + i.singleton_lookup_count) != 0
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY [percent_update] ASC;

-- Scan frequency
SELECT o.name AS [table_name],
       x.name AS [index_name],
       i.partition_number AS [partition],
       i.index_id AS [index_id],
       x.type_desc AS [index_type],
       i.range_scan_count * 100.0 /
           (i.range_scan_count + i.leaf_insert_count + i.leaf_delete_count
            + i.leaf_update_count + i.leaf_page_merge_count + i.singleton_lookup_count)
           AS [percent_scan]
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS i
JOIN sys.objects AS o ON o.object_id = i.object_id
JOIN sys.indexes AS x ON x.object_id = i.object_id AND x.index_id = i.index_id
WHERE (i.range_scan_count + i.leaf_insert_count + i.leaf_delete_count
       + i.leaf_update_count + i.leaf_page_merge_count + i.singleton_lookup_count) != 0
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY [percent_scan] DESC;

So what do you need to be aware of before compressing? In other words, what is the impact on the system?

Compression is applied by rebuilding the index.

The SORT_IN_TEMPDB option affects space growth in tempdb.

The ONLINE option affects space growth in the user database and the growth of the transaction log file.

The recovery model affects the growth of the transaction log file.

While a table or index is being compressed, the old index and the new compressed index coexist in the user database until the swap completes and the old index's space is returned to the filegroup; so a single index temporarily requires roughly its existing space plus (its existing space minus the savings).

The ONLINE option also consumes more CPU.

As for the impact on tempdb space growth:

If the recovery model is SIMPLE, each check
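Putting the points above together, a rebuild that applies page compression might look like this sketch (the index and table names are hypothetical):

```sql
-- Rebuild a hypothetical clustered index with page compression.
-- ONLINE keeps the table available during the rebuild (Enterprise edition,
-- at the cost of extra CPU and log growth); SORT_IN_TEMPDB moves the sort
-- work, and its space growth, out of the user database into tempdb.
ALTER INDEX PK_SalesHistory ON dbo.SalesHistory
REBUILD WITH
(
    DATA_COMPRESSION = PAGE,
    SORT_IN_TEMPDB   = ON,
    ONLINE           = ON
);
```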
