Originally from: http://outofmemory.cn/mysql/database-compression-tech
Yesterday I gave the team a survey of database compression technology; here is the part that can be shared publicly. Database compression is now a standard feature of all kinds of databases, including the three major commercial databases, various specialized databases, and the many open-source and NoSQL databases.
Today, database compression is used not simply to save storage cost, but more often to provide higher compute density (for example, on capacity-constrained SSDs) and higher query performance (OLAP). As for the factors favorable to compression, the common consensus is: column storage compresses better, larger inputs compress better, and sorted inputs compress better.
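To make these factors concrete, here is a minimal Python sketch (toy data and illustrative ratios, not taken from the survey) that zlib-compresses the same records laid out by row, by column, and as a sorted column:

```python
# Toy illustration of the compression-friendly factors above; the table,
# column names, and ratios are hypothetical examples, not survey results.
import random
import zlib

random.seed(0)

# A toy table of (user_id, country_code, score) rows.
rows = [(random.randint(1, 10_000),
         random.choice(["CN", "US", "DE", "JP"]),
         random.randint(0, 100))
        for _ in range(50_000)]

def ratio(blob: bytes) -> float:
    """Compressed size / original size (smaller is better)."""
    return len(zlib.compress(blob)) / len(blob)

# Row layout: the fields of one record stay adjacent.
row_bytes = "\n".join(f"{u},{c},{s}" for u, c, s in rows).encode()

# Column layout: the values of one column stay adjacent.
col_bytes = ("\n".join(str(u) for u, _, _ in rows) + "\n" +
             "\n".join(c for _, c, _ in rows) + "\n" +
             "\n".join(str(s) for _, _, s in rows)).encode()

# Sorted column: ordered input exposes more redundancy to the compressor.
sorted_bytes = "\n".join(str(u) for u in sorted(u for u, _, _ in rows)).encode()

print("row layout   :", ratio(row_bytes))
print("column layout:", ratio(col_bytes))
print("sorted column:", ratio(sorted_bytes))
```

On toy data like this, the column layout and the sorted column should compress noticeably better than the row layout, which is exactly the point of the consensus above.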
Databases also differ widely in their choice of compression granularity. Most compress at the block level, a few offer field-level compression, and some compress an entire table or even the entire database as one unit. Obviously, the coarser the compression granularity, the greater the impact on the system's usability; compression at the table level or above is usually no longer considered compression supported by the database itself.
Besides granularity, the choice of storage format also matters for the workload: row storage suits wide queries (accessing most columns of a few rows) and benefits projection optimization (OLTP), while column storage suits narrow queries (accessing a few columns of most rows) and benefits filter optimization (OLAP). Mixing the two, that is, compressing column by column within a block while organizing blocks by row, gives the hybrid row-column (PAX) layout, typified by Oracle Exadata HCC.
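To illustrate the PAX idea, here is a minimal sketch of a hypothetical PAX-style block; the layout and function names are my own and are not Exadata HCC's actual on-disk format:

```python
# Hypothetical PAX-style block: a batch of rows is assigned to one block,
# but inside the block each column is stored and compressed contiguously.
import zlib
from typing import List, Tuple

Row = Tuple[int, str, int]

def build_pax_block(rows: List[Row]) -> List[bytes]:
    """Pivot a batch of rows into per-column buffers and compress each one."""
    columns = list(zip(*rows))                       # column-wise pivot
    buffers = ["\n".join(map(str, col)).encode() for col in columns]
    return [zlib.compress(buf) for buf in buffers]   # one mini-stream per column

def read_column(block: List[bytes], col_idx: int) -> List[str]:
    """A narrow query only touches the one column it needs."""
    return zlib.decompress(block[col_idx]).decode().split("\n")

block = build_pax_block([(1, "CN", 42), (2, "US", 17), (3, "DE", 99)])
print(read_column(block, 1))   # ['CN', 'US', 'DE']
```

Because every block still contains complete rows, wide queries stay block-local, while narrow queries only decompress the columns they touch.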
Although compression is a standard feature, its implementation differs from database to database. Overall the techniques fall into three levels: 1) packing, such as eliminating the leading zeros of small integers or the trailing spaces of CHAR fields; this kind of compression usually appears in OLTP systems, works at field granularity, and the system typically offers both a normal and a compact storage format; 2) encoding, i.e. pattern-based compression; typical methods include dictionary, RLE, prefix, and delta encoding, discussed in detail in the references; 3) compression, i.e. back-end compression, which applies a general-purpose compression algorithm such as snappy, zlib, or bzip2 directly.
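The three levels can be sketched in a few lines of Python; the routines below are hypothetical examples of packing, RLE encoding, and back-end compression, not the format of any particular database:

```python
# Hypothetical examples of the three levels: packing, encoding, compression.
import zlib
from itertools import groupby

# 1) Packing: drop a CHAR(n) field's trailing spaces, restoring them on read.
def pack_char(value: str) -> str:
    return value.rstrip(" ")

def unpack_char(value: str, width: int) -> str:
    return value.ljust(width, " ")

# 2) Encoding: run-length encode a column with long runs of repeated values.
def rle_encode(values):
    return [(v, sum(1 for _ in g)) for v, g in groupby(values)]

def rle_decode(pairs):
    return [v for v, n in pairs for _ in range(n)]

# 3) Compression: hand the already-encoded block to a general-purpose compressor.
column = ["shipped"] * 900 + ["pending"] * 100
encoded = rle_encode(column)                  # [('shipped', 900), ('pending', 100)]
compressed = zlib.compress(repr(encoded).encode())

assert rle_decode(encoded) == column
assert unpack_char(pack_char("US" + " " * 8), 10) == "US" + " " * 8
print(len(repr(column)), "->", len(repr(encoded)), "->", len(compressed))
```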
All commercial databases and professional analytic databases introduce various encoding methods rather than relying on compression alone. The reason is simple. First, encoding understands the data better than compression: a compressor always treats the data as a continuous byte stream, while an encoder knows the boundary, type, and value range of each field, so encoding plus compression yields a higher compression ratio than compression alone. Second, encoding provides higher decoding speed: even with the fastest compressor, snappy, the data must be fully decompressed before it can be queried, whereas most encoding methods can be queried without decoding. Finally, encoding provides reasonable encoding speed: not as fast as snappy, but far faster than the likes of zlib and bzip2.
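The "query without decoding" point can be illustrated with dictionary encoding; in the sketch below (all names hypothetical, zlib standing in for snappy or bzip2) an equality filter is evaluated directly on the dictionary codes, while the general-purpose compressed block has to be fully decompressed first:

```python
# Hypothetical comparison: dictionary-encoded column vs. a zlib-compressed block.
import zlib

countries = ["CN", "US", "DE", "JP"] * 25_000

# Dictionary encoding: a small dictionary plus one byte-sized code per row.
dictionary = {v: i for i, v in enumerate(sorted(set(countries)))}
codes = bytes(dictionary[v] for v in countries)

# Filter "country == 'US'" evaluated directly on the codes, with no decoding.
hits_encoded = codes.count(dictionary["US"])

# The same filter on a zlib block requires full decompression first.
blob = zlib.compress("\n".join(countries).encode())
hits_compressed = zlib.decompress(blob).decode().split("\n").count("US")

assert hits_encoded == hits_compressed
print(hits_encoded, "matching rows;",
      "encoded column:", len(codes), "bytes;",
      "zlib block:", len(blob), "bytes")
```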
Below is a survey of database compression technology:
Here are some useful links to database compression techniques:
Repost: A survey of database compression technology