New Feature of DB2 10: Adaptive Compression

The newly released DB2 LUW 10.1 introduces a new compression technology: adaptive compression. This scheme combines table compression and page compression. A user table's data is first compressed with a table-level dictionary, and it is then further compressed with a data-page-level dictionary.

This solution keeps the advantages of the original table compression, including its high compression ratio. It also adds dynamic page-level dictionaries to raise the compression ratio further. As a result, it largely relieves a weakness of the original table compression: because that scheme uses a static dictionary, its compression ratio degrades as the data changes.

Table compression in DB2

In an era of explosive data growth, data compression matters more and more. Compression reduces storage costs and improves data access efficiency. Compression is generally based on a data dictionary. The dictionary records long repeated patterns together with the short symbols that represent them. Compression replaces the patterns in the raw data with their symbols. Decompression replaces the symbols in the compressed data with the original patterns, which restores the original data.
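The pattern-for-symbol substitution can be illustrated with a few lines of Python. This is only a minimal sketch and not DB2's on-disk format. It assumes that whole values serve as patterns and that small integers serve as symbols, so a symbol can never be mistaken for raw string data. The dictionary contents and function names are hypothetical.

```python
def compress(values, dictionary):
    """Replace each value that appears in the dictionary with its symbol."""
    return [dictionary.get(v, v) for v in values]


def decompress(tokens, dictionary):
    """Replace each integer symbol with the pattern it stands for."""
    reverse = {symbol: pattern for pattern, symbol in dictionary.items()}
    return [reverse[t] if isinstance(t, int) else t for t in tokens]


# Hypothetical dictionary: long repeated pattern -> short symbol.
dictionary = {"San Francisco": 1, "California": 2}
row = ["Alice", "San Francisco", "California"]

packed = compress(row, dictionary)
print(packed)  # ['Alice', 1, 2]
assert decompress(packed, dictionary) == row
```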

Since V9.1, DB2 for Linux, UNIX, and Windows has supported row compression for user table data, also known as Deep Compression. Row compression builds on the existing value compression technology. Value compression stores only one copy of a duplicated value that appears in the same column of different rows. Row compression instead treats the row as the basic unit of compression, and rows are not split into columns during compression. With this technology, you create a table-level data dictionary for a table and use it to compress all the data in the table.

After you enable compression for a table, you must run table reorganization. This builds a dictionary from the existing data and compresses that data. Once the dictionary exists, new data inserted into the table is compressed. When data in the table is modified, the row is decompressed, modified, and then stored back in compressed form. When data is queried, it is decompressed before being returned to the user. If the data changes substantially after the dictionary is created, the original dictionary may no longer compress the changed data well. In that case, you can rebuild the table compression dictionary by running table reorganization with the RESETDICTIONARY option (REORG TABLE ... RESETDICTIONARY). The existing data is then recompressed with the new dictionary.
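The table-level workflow can be modeled roughly in Python. This is a minimal sketch, not DB2's actual algorithm. A single dictionary is built once from all existing rows, which stands in for what table reorganization does. Every row is compressed as a whole, later inserts reuse the same dictionary, and an update decompresses, modifies, and recompresses the row. The sketch makes several simplifying assumptions: whole column values act as patterns, a symbol costs 2 bytes, and the dictionary capacity is fixed. The article notes that DB2 does not split rows into columns, so real patterns are not limited to single column values. `CompressedTable` and its methods are hypothetical names.

```python
from collections import Counter

SYMBOL_BYTES = 2  # assumed storage cost of one dictionary symbol


def build_dictionary(rows, capacity=8):
    """Build a table-level dictionary from all rows (as a reorg would).

    Simplification: each whole column value is a candidate pattern, and
    the values whose replacement saves the most bytes are kept.
    """
    counts = Counter(v for row in rows for v in row)
    candidates = [(v, n) for v, n in counts.items()
                  if n >= 2 and len(v) > SYMBOL_BYTES]
    candidates.sort(key=lambda vn: vn[1] * (len(vn[0]) - SYMBOL_BYTES),
                    reverse=True)
    return {v: sym for sym, (v, _) in enumerate(candidates[:capacity])}


def compress_row(row, dictionary):
    """Compress a whole row; integer tokens are dictionary symbols."""
    return tuple(dictionary.get(v, v) for v in row)


def decompress_row(row, dictionary):
    reverse = {sym: v for v, sym in dictionary.items()}
    return tuple(reverse[t] if isinstance(t, int) else t for t in row)


class CompressedTable:
    def __init__(self, rows):
        self.dictionary = build_dictionary(rows)  # one dictionary per table
        self.rows = [compress_row(r, self.dictionary) for r in rows]

    def insert(self, row):
        # New rows are compressed with the existing (static) dictionary.
        self.rows.append(compress_row(row, self.dictionary))

    def update(self, index, column, value):
        # Decompress, modify, then recompress and store the row.
        row = list(decompress_row(self.rows[index], self.dictionary))
        row[column] = value
        self.rows[index] = compress_row(tuple(row), self.dictionary)

    def select(self):
        # Queries see decompressed rows.
        return [decompress_row(r, self.dictionary) for r in self.rows]


table = CompressedTable([("ORD-1", "SHIPPED", "WAREHOUSE-EAST"),
                         ("ORD-2", "SHIPPED", "WAREHOUSE-EAST"),
                         ("ORD-3", "PENDING", "WAREHOUSE-WEST"),
                         ("ORD-4", "PENDING", "WAREHOUSE-WEST")])
table.insert(("ORD-5", "SHIPPED", "WAREHOUSE-WEST"))
table.update(2, 1, "SHIPPED")
print(table.rows[4])
```

Because the dictionary is never rebuilt inside `CompressedTable`, a later shift in the data would leave it outdated. That is the situation in which the text recommends a reorganization with RESETDICTIONARY.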

In DB2 V9.1, table reorganization is the only way to create a dictionary. However, table reorganization is costly and inconvenient to use. A common practice is to load a small amount of data into the table, reorganize the table to create a dictionary, and then compress the data in the table. V9.5 automates this process with a new feature, Automatic Dictionary Creation (ADC). Once row compression is enabled, the compression dictionary is created automatically after a certain amount of data (2 MB by default) has been added to the table, and the table's data is then compressed with that dictionary. ADC simplifies the use of row compression because you no longer need to run table reorganization manually to create a dictionary and compress data. The dictionary is created automatically as data arrives, and data arriving once a dictionary exists is compressed automatically, with no user intervention.
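The ADC trigger can be sketched as a size check on insert. The 2 MB default is taken from the text, and everything else is a simplifying assumption. In this sketch, rows that arrived before the dictionary existed stay as they are, and only later inserts are compressed. Byte counts ignore row overhead. `AdcTable` is a hypothetical name, and the demo uses a tiny threshold so that the trigger fires quickly.

```python
from collections import Counter

ADC_THRESHOLD_BYTES = 2 * 1024 * 1024  # default stated in the article (2 MB)


class AdcTable:
    """Hypothetical model of automatic dictionary creation (ADC)."""

    def __init__(self, threshold=ADC_THRESHOLD_BYTES, capacity=8):
        self.threshold = threshold
        self.capacity = capacity
        self.dictionary = None  # no dictionary until the threshold is reached
        self.raw_bytes = 0
        self.rows = []

    def insert(self, row):
        if self.dictionary is None:
            # Store uncompressed and check whether ADC should fire.
            self.rows.append(tuple(row))
            self.raw_bytes += sum(len(v) for v in row)
            if self.raw_bytes >= self.threshold:
                self.dictionary = self._build_dictionary()
        else:
            # A dictionary exists: compress the new row automatically.
            self.rows.append(tuple(self.dictionary.get(v, v) for v in row))

    def _build_dictionary(self):
        # Built only from the (small) amount of data seen so far.
        counts = Counter(v for row in self.rows for v in row)
        frequent = [v for v, n in counts.most_common() if n >= 2]
        return {v: i for i, v in enumerate(frequent[:self.capacity])}


table = AdcTable(threshold=40)  # tiny threshold so the demo triggers
for i in range(6):
    table.insert((f"ID{i}", "ACTIVE", "REGION-NORTH"))
print(table.dictionary)
print(table.rows[-1])
```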

However, a dictionary created by ADC is based on only a small amount of data. Compared with a dictionary created by offline table reorganization over all the data, it therefore yields a lower compression ratio. If you want a high compression ratio, table reorganization with the RESETDICTIONARY option is still necessary.
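The gap between the two kinds of dictionary can be demonstrated with a toy data set in which the early rows are not representative of the rest of the table. In this minimal sketch, the "ADC-style" dictionary is built from the first 10 rows only, and the "reorg-style" dictionary is built from all 100. The data, the frequency-based dictionary builder, and the 2-byte symbol cost are all illustrative assumptions.

```python
from collections import Counter

SYMBOL_BYTES = 2  # assumed storage cost of one dictionary symbol


def build_dictionary(rows, capacity):
    """Keep the most frequent repeated values (a crude stand-in for DB2's)."""
    counts = Counter(v for row in rows for v in row)
    frequent = [v for v, n in counts.most_common()
                if n >= 2 and len(v) > SYMBOL_BYTES]
    return set(frequent[:capacity])


def savings(rows, dictionary):
    """Fraction of bytes saved when rows are compressed with the dictionary."""
    raw = sum(len(v) for row in rows for v in row)
    packed = sum(SYMBOL_BYTES if v in dictionary else len(v)
                 for row in rows for v in row)
    return 1 - packed / raw


regions = ["ASIA-PACIFIC", "NORTH-AMERICA", "SOUTH-AMERICA"]
rows = [(f"C{i:03d}", "EUROPE-CENTRAL") for i in range(10)]
rows += [(f"C{i:03d}", regions[i % 3]) for i in range(10, 100)]

adc_dict = build_dictionary(rows[:10], capacity=4)  # small early sample
reorg_dict = build_dictionary(rows, capacity=4)     # all rows, as a reorg sees

print(f"ADC-style dictionary:   {savings(rows, adc_dict):.0%} saved")
print(f"reorg-style dictionary: {savings(rows, reorg_dict):.0%} saved")
```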

Note that a user table has a separate table compression dictionary for each data partition within each database partition. Here, a database partition is a partition in a Database Partitioning Feature (DPF) environment, and a data partition is a data partition of a range-partitioned table. The data of each such partition is physically stored independently, so each one has its own table compression dictionary. Unless stated otherwise, "table-level dictionary" in this article refers to the table-level dictionary within a single database partition, or of a table that is not range-partitioned.
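The "one dictionary per partition" rule works like a map keyed by (database partition, data partition). In this sketch, rows are assigned to a database partition with a modulo on the customer number, which is a hypothetical stand-in for DB2's hash distribution. They are assigned to a data partition by year, with one range per year. Each group then gets its own dictionary. All function names and data are illustrative.

```python
from collections import Counter, defaultdict


def build_dictionary(rows, capacity=4):
    counts = Counter(v for row in rows for v in row)
    frequent = [v for v, n in counts.most_common() if n >= 2]
    return {v: sym for sym, v in enumerate(frequent[:capacity])}


def dictionaries_per_partition(rows, db_partition_of, data_partition_of):
    """Build one dictionary per (database partition, data partition) pair."""
    groups = defaultdict(list)
    for row in rows:
        groups[(db_partition_of(row), data_partition_of(row))].append(row)
    return {key: build_dictionary(group) for key, group in groups.items()}


rows = [("C1", "2011", "OPEN"), ("C3", "2011", "OPEN"),
        ("C2", "2012", "CLOSED"), ("C4", "2012", "CLOSED"),
        ("C5", "2012", "ARCHIVED"), ("C7", "2012", "ARCHIVED")]

dictionaries = dictionaries_per_partition(
    rows,
    db_partition_of=lambda r: int(r[0][1:]) % 2,  # hypothetical distribution
    data_partition_of=lambda r: r[1],             # one range per year
)
for key, d in sorted(dictionaries.items()):
    print(key, d)
```

Each dictionary reflects only the data stored in its own partition, which matches the reason the text gives for keeping them separate.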

The preceding sections briefly introduced the table compression feature available up to DB2 V9.7 and how to use it to create a dictionary and compress a table's data. Consider a dictionary created by table reorganization. It is built from the data in the table at the time of creation, so it represents the table's data characteristics at that moment. As the data changes through inserts of new data and updates of existing data, its redundancy characteristics may shift. The table compression dictionary cannot adapt to these changes automatically. As a result, the compression ratio of table compression sometimes degrades as the data changes. To alleviate this problem, the latest release, DB2 LUW V10.1, introduces a new compression feature: adaptive compression.
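The degradation described above can be reproduced with a toy static dictionary. In this minimal sketch, the dictionary is built once from the initial rows, and later inserts follow a pattern it has never seen. Rebuilding the dictionary from the current data, which is what a RESETDICTIONARY reorganization accomplishes in DB2, restores the savings. The data, the frequency-based builder, and the 2-byte symbol cost are illustrative assumptions.

```python
from collections import Counter

SYMBOL_BYTES = 2  # assumed storage cost of one dictionary symbol


def build_dictionary(rows, capacity=4):
    counts = Counter(v for row in rows for v in row)
    frequent = [v for v, n in counts.most_common()
                if n >= 2 and len(v) > SYMBOL_BYTES]
    return set(frequent[:capacity])


def savings(rows, dictionary):
    """Fraction of bytes saved when rows are compressed with the dictionary."""
    raw = sum(len(v) for row in rows for v in row)
    packed = sum(SYMBOL_BYTES if v in dictionary else len(v)
                 for row in rows for v in row)
    return 1 - packed / raw


rows = [("PRODUCT-LINE-A", "STATUS-ACTIVE")] * 50
static_dict = build_dictionary(rows)  # built once, then never changes
before = savings(rows, static_dict)

# The data changes: new rows follow patterns the dictionary never saw.
rows += [("PRODUCT-LINE-B", "STATUS-RETIRED")] * 150
after = savings(rows, static_dict)

# Rebuilding from the current data (a reset-dictionary reorg) recovers.
after_reset = savings(rows, build_dictionary(rows))

print(f"{before:.0%} -> {after:.0%} -> {after_reset:.0%}")
```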

Adaptive Compression

As mentioned earlier, the table compression dictionary is static: once created, it does not change as the data changes. The only operation that can change it is rebuilding it by running offline table reorganization with the RESETDICTIONARY option. The table compression dictionary is also global to the entire table (setting aside DPF and range-partitioned tables). Its patterns are sampled from the whole table and represent the data characteristics of the whole table, so when the dictionary is created, it can be used to compress all of the table's data.

These same two properties also cause problems. Because the table compression dictionary is static, it does not change when the table's data changes after the dictionary is created, so the table's compression ratio may decline as the data changes. Because the dictionary is global, it represents the data characteristics of the whole table but may miss some local ones. For example, several adjacent records in a table may be highly similar to one another. This redundancy is only local and does not reflect the data of the whole table, so the table compression dictionary may not capture it well, and the data may not achieve the best possible compression.
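The "global dictionary misses local redundancy" argument can be made concrete with a toy layout of 4 pages with 5 rows each. Every page holds one clustered batch. The global dictionary, limited here to one entry, picks the value that is most frequent across the whole table. Per-page dictionaries built from whatever the table dictionary left uncompressed pick up each page's batch value. This is a minimal sketch that uses only byte counting. The capacities, the 2-byte symbol cost, and the data are assumptions chosen to make the effect visible.

```python
from collections import Counter

SYMBOL_BYTES = 2  # assumed cost of a table- or page-level symbol


def build_dictionary(values, capacity):
    counts = Counter(values)
    frequent = [v for v, n in counts.most_common()
                if n >= 2 and len(v) > SYMBOL_BYTES]
    return set(frequent[:capacity])


def stored_bytes(pages, table_dict, page_dicts=None):
    """Bytes stored after table-level (and optionally page-level) compression."""
    total = 0
    for p, page in enumerate(pages):
        local = page_dicts[p] if page_dicts else set()
        for row in page:
            for v in row:
                total += SYMBOL_BYTES if (v in table_dict or v in local) else len(v)
    return total


# 4 pages x 5 rows; each page holds one clustered batch of similar rows.
pages = [[(f"BATCH-{p:04d}", "STATUS-OK", f"ITEM-{p}-{i}") for i in range(5)]
         for p in range(4)]

all_values = [v for page in pages for row in page for v in row]
table_dict = build_dictionary(all_values, capacity=1)  # global, small capacity

# Page dictionaries only look at what the table dictionary left uncompressed.
page_dicts = [build_dictionary([v for row in page for v in row
                                if v not in table_dict], capacity=1)
              for page in pages]

table_only = stored_bytes(pages, table_dict)
adaptive = stored_bytes(pages, table_dict, page_dicts)
print(table_only, adaptive)
```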

Adaptive compression was designed to address these two problems. It adds data-page-level compression on top of the original table compression, so the adaptive compression scheme is a combination of traditional table compression and the new page compression. Page compression is dynamic. When the data on a page reaches a certain amount, a page dictionary is created automatically and the existing data on the page is compressed. As the page's data changes and the compression ratio degrades past a certain point, the page dictionary is rebuilt automatically. Page compression is also local: a page dictionary covers only the data on its own page. Clustered data therefore compresses especially well.
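The page dictionary lifecycle can be sketched as follows: create the dictionary at a fill level, and rebuild it when savings fall too far. The text does not give the actual thresholds, so the 70% fill level and the 20% savings floor below are hypothetical. The sketch models only the page layer, without the table dictionary underneath. It also does not model page overflow, and it uses a tiny 200-byte page so that the demo stays short (the class default is 4 KB, DB2's default page size). `Page` and its methods are illustrative names.

```python
from collections import Counter

SYMBOL_BYTES = 2      # assumed cost of one page-dictionary symbol
CREATE_AT_FILL = 0.7  # hypothetical: build the dictionary at 70% page fill
REBUILD_BELOW = 0.2   # hypothetical: rebuild when savings fall below 20%


class Page:
    def __init__(self, page_size=4096):  # 4 KB, DB2's default page size
        self.page_size = page_size
        self.values = []         # values stored on this page
        self.dictionary = set()  # page dictionary; empty until first created
        self.rebuilds = 0

    def stored_bytes(self):
        return sum(SYMBOL_BYTES if v in self.dictionary else len(v)
                   for v in self.values)

    def savings(self):
        raw = sum(len(v) for v in self.values)
        return 1 - self.stored_bytes() / raw if raw else 0.0

    def _build_dictionary(self, capacity=4):
        # The page dictionary is built only from this page's data.
        counts = Counter(self.values)
        frequent = [v for v, n in counts.most_common()
                    if n >= 2 and len(v) > SYMBOL_BYTES]
        self.dictionary = set(frequent[:capacity])

    def insert(self, value):
        self.values.append(value)
        if not self.dictionary:
            # Create the page dictionary once the page is filled enough.
            if self.stored_bytes() >= CREATE_AT_FILL * self.page_size:
                self._build_dictionary()
        elif self.savings() < REBUILD_BELOW:
            # Compression has degraded: rebuild from the current page data.
            self._build_dictionary()
            self.rebuilds += 1


page = Page(page_size=200)         # tiny page so the demo stays short
for _ in range(11):
    page.insert("CUSTOMER-GOLD")   # dictionary is created on the 11th insert
for _ in range(40):
    page.insert("CUSTOMER-BRONZE") # new pattern: savings degrade, then rebuild
print(page.dictionary, page.rebuilds, round(page.savings(), 2))
```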

Page compression is also a form of row compression: the row is the basic unit of compression, and rows are not split into columns during compression. To distinguish the two, this article calls the table-level row compression introduced in V9.1 "table compression" and the page-level row compression introduced in V10.1 "page compression". Adaptive compression is the combination of the two.

As with pages in an operating system, the data page is the smallest unit of I/O in DB2. User data and some system control information are stored on data pages as records. The DB2 data page size is configurable: the default is 4 KB, and 8 KB, 16 KB, and 32 KB pages are also available. Page compression operates on the data within a single data page. The page compression dictionary is stored on the data page as a system record, and because different pages hold different data, each data page has its own page compression dictionary.

DB2 LUW V10.1 adds a new keyword, ADAPTIVE, to the table definition SQL statement to enable adaptive compression. The syntax is shown in Listing 1:

Listing 1. Syntax of the new ADAPTIVE keyword

      .-COMPRESS NO---------------.
>-----+---------------------------+--
      |              .-ADAPTIVE-. |
      '-COMPRESS YES-+----------+-'
                     '-STATIC---'

ADAPTIVE enables the new adaptive compression scheme for the table, while the STATIC keyword selects the original table compression scheme. If compression is enabled (COMPRESS YES) but neither ADAPTIVE nor STATIC is specified, adaptive compression, that is, table compression plus page compression, is used implicitly. If you want only the original table compression, you must specify the STATIC keyword explicitly.
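The default rule in Listing 1 can be expressed as a small decision function. It covers an omitted clause, COMPRESS NO, COMPRESS YES, COMPRESS YES ADAPTIVE, and COMPRESS YES STATIC. This is a hypothetical helper that mirrors the syntax diagram and the paragraph above. It is not a DB2 API, and it does not parse SQL.

```python
from typing import Optional


def resolve_compression(compress: Optional[str], mode: Optional[str] = None) -> str:
    """Return the effective scheme: "NONE", "STATIC", or "ADAPTIVE".

    compress: "YES", "NO", or None when the COMPRESS clause is omitted
    mode:     "ADAPTIVE", "STATIC", or None when neither keyword is given
    """
    if compress is None or compress.upper() == "NO":
        # COMPRESS NO is the default branch in the syntax diagram.
        if mode is not None:
            raise ValueError("ADAPTIVE/STATIC is only valid with COMPRESS YES")
        return "NONE"
    if compress.upper() != "YES":
        raise ValueError(f"unknown COMPRESS value: {compress}")
    if mode is None or mode.upper() == "ADAPTIVE":
        # Omitting the keyword implicitly selects adaptive compression.
        return "ADAPTIVE"
    if mode.upper() == "STATIC":
        # Table compression only; must be requested explicitly.
        return "STATIC"
    raise ValueError(f"unknown compression mode: {mode}")


print(resolve_compression("YES"))            # ADAPTIVE
print(resolve_compression("YES", "STATIC"))  # STATIC
print(resolve_compression(None))             # NONE
```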
