Comparison of compression algorithms in HBase: GZIP, LZO, Zippy, Snappy [repost]

Source: Internet
Author: User

Website: http://www.cnblogs.com/panfeng412/archive/2012/12/24/applications-scenario-summary-of-compression-algorithms.html

GZIP, LZO, and Zippy/Snappy are among the most commonly used compression algorithms. Each has its own characteristics, so each suits different application scenarios. This article summarizes them in light of related engineering practice.

Comparison of compression algorithms

The following test data was published by Google a few years ago. The figures are somewhat dated; readers who have run more recent tests are welcome to share their results.

Algorithm      % remaining   Encoding   Decoding
GZIP           13.4%         MB/s       118 MB/s
LZO            20.5%         135 MB/s   410 MB/s
Zippy/Snappy   22.2%         172 MB/s   409 MB/s

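Note: taken from HBase: The Definitive Guide. "% remaining" is the compressed size as a percentage of the original size, so a lower value means better compression.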

In summary:

1) GZIP compresses best (13.4% remaining), but it is CPU-intensive: it consumes more CPU than the other algorithms, and both compression and decompression are slow;

2) LZO sits in the middle for compression (20.5% remaining, worse than GZIP), but compresses and decompresses significantly faster than GZIP, with decompression (410 MB/s) being especially fast;

3) Zippy/Snappy compresses the least (22.2% remaining); it compresses faster than LZO (172 MB/s versus 135 MB/s) and decompresses at essentially the same speed (409 MB/s versus 410 MB/s).
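
The three columns in the table can be measured for any codec with a small harness. The following is a minimal sketch rather than the benchmark Google used: LZO and Snappy are not in the Python standard library, so two zlib levels (zlib implements DEFLATE, the algorithm behind GZIP) stand in as a "fast" and a "small" setting. The function name `benchmark` and the sample data are assumptions made for illustration.

```python
import time
import zlib


def benchmark(name, compress, decompress, data, repeats=3):
    """Measure % remaining and encode/decode throughput (MB/s) for one codec."""
    packed = compress(data)
    assert decompress(packed) == data  # sanity check: the round trip is lossless

    def best_seconds(fn, arg):
        # Take the fastest of several runs to reduce timing noise.
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(arg)
            best = min(best, time.perf_counter() - start)
        return max(best, 1e-9)  # guard against division by zero

    megabytes = len(data) / 1e6
    return {
        "name": name,
        "remaining_pct": 100.0 * len(packed) / len(data),
        "encode_mb_s": megabytes / best_seconds(compress, data),
        "decode_mb_s": megabytes / best_seconds(decompress, packed),
    }


# Hypothetical sample data: repetitive, log-like rows.
sample = b"".join(
    b"row-%06d,user-%04d,GET /index.html\n" % (i, i % 500) for i in range(50000)
)

# Each codec is a (name, compress, decompress) triple.
codecs = [
    ("zlib level 1 (fast)", lambda d: zlib.compress(d, 1), zlib.decompress),
    ("zlib level 9 (small)", lambda d: zlib.compress(d, 9), zlib.decompress),
]

results = [benchmark(n, c, d, sample) for n, c, d in codecs]
for r in results:
    print(
        f"{r['name']:<22} {r['remaining_pct']:5.1f}% remaining  "
        f"{r['encode_mb_s']:8.1f} MB/s enc  {r['decode_mb_s']:8.1f} MB/s dec"
    )
```

If third-party LZO or Snappy bindings are installed, their compress and decompress functions can be added to `codecs` in the same way. Absolute numbers depend heavily on the hardware and on the data, so results from this sketch should be compared with each other rather than with the table above.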

Selection of compression algorithms in BigTable and HBase

BigTable uses the Zippy algorithm to get the fastest possible compression and decompression while keeping CPU consumption low.

Before Snappy was available (Google released Snappy in 2011), HBase used the LZO algorithm, with goals similar to BigTable's. Since the release of Snappy, the Snappy algorithm is the recommended choice (see HBase: The Definitive Guide). In practice, run a more detailed comparison test of LZO and Snappy against your actual workload before making a choice.

Practical experience in the actual project

The project uses stream-lib, Clearspring's open-source library of probabilistic cardinality estimators, to solve distinct-counting (deduplicated counting) problems such as unique visitor (UV) computation. Its characteristics are:

1) A UV calculation can be completed within a bitmap of fixed size, such as 8k or 64k (different sizes correspond to different error rates);

2) Different bitmaps can be merged to obtain the combined UV count.
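
The two properties above can be illustrated with linear counting, one of the simplest bitmap-based estimators. This is a minimal Python sketch of the idea only, not stream-lib's API (stream-lib is a Java library); the class name `LinearCounter`, the MD5-based hash, and the 8 KB default size are assumptions chosen for the example.

```python
import hashlib
import math


class LinearCounter:
    """Fixed-size bitmap cardinality estimator (linear counting)."""

    def __init__(self, size_bytes=8 * 1024):
        self.bits = size_bytes * 8
        self.bitmap = bytearray(size_bytes)

    def add(self, item):
        # Hash the item to one bit position and set that bit.
        digest = hashlib.md5(str(item).encode("utf-8")).digest()
        pos = int.from_bytes(digest[:8], "big") % self.bits
        self.bitmap[pos // 8] |= 1 << (pos % 8)

    def estimate(self):
        # Linear counting: n is approximately -m * ln(zero_bits / m).
        set_bits = sum(bin(b).count("1") for b in self.bitmap)
        zeros = self.bits - set_bits
        if zeros == 0:
            raise ValueError("bitmap is saturated; use a larger size")
        return round(-self.bits * math.log(zeros / self.bits))

    def merge(self, other):
        # The union of two sets is the bitwise OR of their bitmaps.
        if self.bits != other.bits:
            raise ValueError("bitmaps must have the same size")
        merged = LinearCounter(self.bits // 8)
        merged.bitmap = bytearray(a | b for a, b in zip(self.bitmap, other.bitmap))
        return merged


# Hypothetical data: two days of visitors with 1000 users in common.
day1, day2 = LinearCounter(), LinearCounter()
for u in range(0, 3000):
    day1.add(f"user-{u}")
for u in range(2000, 5000):
    day2.add(f"user-{u}")

merged = day1.merge(day2)
print("day 1 UV estimate:", day1.estimate())   # true value: 3000
print("day 2 UV estimate:", day2.estimate())   # true value: 3000
print("merged UV estimate:", merged.estimate())  # true value: 5000
```

Each counter occupies the same 8 KB regardless of how many visitors it has seen, and merging never needs the original user IDs. A larger bitmap lowers the error rate at the cost of more space, which is the trade-off the first point describes.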

The more bitmaps the system maintains, the more storage space is consumed, whether in memory or in a storage system (MySQL, HBase, etc.). It is therefore worth choosing an appropriate algorithm to compress the bitmaps. There are two cases:

1) When the bitmap is held in memory, the compression algorithm must compress and decompress as fast as possible without consuming too much CPU, so LZO or Snappy is a suitable choice;

2) When the bitmap is stored in the DB, saving storage space matters more and the highest possible compression ratio is wanted, so GZIP is used. A smaller payload also reduces the network IO overhead of dumping from memory to the DB.
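
The two cases can be sketched as a pair of helpers that pick a codec by destination. This is an illustration under stated assumptions, not the project's actual code: the helper names are hypothetical, and because LZO and Snappy are not in the Python standard library, zlib at level 1 stands in for the fast in-memory codec, while the DB path uses GZIP at its highest level.

```python
import gzip
import zlib


def pack_for_memory(bitmap: bytes) -> bytes:
    # Fast path. zlib level 1 is only a stand-in here; the article
    # recommends LZO or Snappy for this case.
    return zlib.compress(bitmap, 1)


def unpack_from_memory(blob: bytes) -> bytes:
    return zlib.decompress(blob)


def pack_for_db(bitmap: bytes) -> bytes:
    # Space-saving path: GZIP at the highest compression level.
    return gzip.compress(bitmap, compresslevel=9)


def unpack_from_db(blob: bytes) -> bytes:
    return gzip.decompress(blob)


# Hypothetical sparse 8 KB bitmap: one bit set every 997 positions.
raw = bytearray(8 * 1024)
for pos in range(0, len(raw) * 8, 997):
    raw[pos // 8] |= 1 << (pos % 8)
bitmap = bytes(raw)

in_memory = pack_for_memory(bitmap)
in_db = pack_for_db(bitmap)
print("raw bytes:      ", len(bitmap))
print("in-memory bytes:", len(in_memory))
print("DB bytes:       ", len(in_db))
```

How much a real bitmap shrinks depends on how full it is: a sparse bitmap like the one above compresses well, while a nearly random one barely compresses at all.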

Summary

The above compares the characteristics of the GZIP, LZO, and Zippy/Snappy compression algorithms and describes how they were applied in practice. If anything is wrong, corrections and discussion are welcome.
