Original post: http://www.cnblogs.com/panfeng412/archive/2012/12/24/applications-scenario-summary-of-compression-algorithms.html
GZIP, LZO, and zippy/snappy are commonly used compression algorithms. Each has its own characteristics, so their application scenarios differ. This post summarizes them in light of related engineering practice.
Comparison of compression algorithms
Here is a set of test data that Google released a few years ago (the data is somewhat dated, and someone has recently re-run the tests; it is shared here for reference):
| Algorithm | % remaining | Encoding | Decoding |
| --- | --- | --- | --- |
| GZIP | 13.4% | MB/s | 118 MB/s |
| LZO | 20.5% | 135 MB/s | 410 MB/s |
| Zippy/Snappy | 22.2% | 172 MB/s | 409 MB/s |
Note: from HBase: The Definitive Guide.
From these numbers:
1) GZIP has the highest compression ratio, but it is CPU-intensive, consuming more CPU than the other algorithms, and its compression and decompression speeds are also the slowest;
2) LZO's compression ratio is in the middle, lower than GZIP's, but its compression and decompression are significantly faster than GZIP's, with decompression especially fast;
3) Zippy/snappy has the lowest compression ratio, and its compression and decompression speeds are slightly faster than LZO's.
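The ratio-versus-speed tradeoff behind these points can be seen even inside a single codec. The JDK ships only the DEFLATE/gzip family (LZO and Snappy require third-party libraries), so the following sketch is an illustrative micro-comparison rather than a reproduction of the table above: it contrasts zlib's fastest and strongest compression levels on repetitive input.

```java
import java.util.zip.Deflater;

public class RatioVsSpeed {
    // Compress input at the given zlib level and return the compressed size.
    static int compressedSize(byte[] input, int level) {
        Deflater d = new Deflater(level);
        d.setInput(input);
        d.finish();
        byte[] buf = new byte[input.length + 64];
        int total = 0;
        while (!d.finished()) {
            total += d.deflate(buf);
        }
        d.end();
        return total;
    }

    public static void main(String[] args) {
        // Repetitive text compresses well, much like log or bitmap data.
        byte[] input = "the quick brown fox jumps over the lazy dog; "
                .repeat(2000).getBytes();
        int fast = compressedSize(input, Deflater.BEST_SPEED);
        int best = compressedSize(input, Deflater.BEST_COMPRESSION);
        System.out.println("original:         " + input.length + " bytes");
        System.out.println("BEST_SPEED:       " + fast + " bytes");
        System.out.println("BEST_COMPRESSION: " + best + " bytes");
        // Higher levels spend more CPU time for a smaller output,
        // the same tradeoff gzip makes against LZO and snappy.
    }
}
```

The same experiment, swapping in an LZO or Snappy binding, is the kind of workload-specific comparison worth doing before picking a codec.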
Selection of compression algorithms in BigTable and HBase
BigTable uses the zippy algorithm, aiming for the fastest possible compression and decompression speed while keeping CPU consumption low.
HBase used the LZO algorithm before Snappy was released (Google open-sourced Snappy in 2011), with a goal similar to BigTable's. After its release, the Snappy algorithm is recommended (see HBase: The Definitive Guide); in particular, it is worth running a more detailed comparison of LZO and Snappy on your actual workload before making a choice.
Practical experience from a real project
The project uses stream-lib, Clearspring's open-source cardinality-estimation library, whose probabilistic algorithms solve deduplicated counting problems such as UV computation. Its characteristics:
1) A UV count can be completed within a bitmap of fixed size (different sizes correspond to different error rates), such as 8k or 64k;
2) Different bitmaps can be merged to obtain the UV of their union.
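As an illustration of point 2), merging bitmaps amounts to a bitwise OR. The sketch below uses the JDK's exact `BitSet` rather than stream-lib's probabilistic structures (whose API differs and which are far more compact), simply to show why the union of two bitmaps yields the UV of the combined traffic:

```java
import java.util.BitSet;

public class BitmapMergeSketch {
    // Record a visit: map each user id to one bit position.
    // (Exact counting; stream-lib's sketches are probabilistic.)
    static void visit(BitSet bitmap, int userId) {
        bitmap.set(userId);
    }

    public static void main(String[] args) {
        BitSet day1 = new BitSet();
        BitSet day2 = new BitSet();
        visit(day1, 1); visit(day1, 2); visit(day1, 3);
        visit(day2, 3); visit(day2, 4);

        System.out.println("day1 UV: " + day1.cardinality());   // 3
        System.out.println("day2 UV: " + day2.cardinality());   // 2

        // Merging is a bitwise OR: a user seen on both days counts once.
        BitSet merged = (BitSet) day1.clone();
        merged.or(day2);
        System.out.println("merged UV: " + merged.cardinality()); // 4
    }
}
```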
The more bitmaps the system maintains, the more storage space is consumed, whether in memory or in a storage system (MySQL, HBase, etc.). It is therefore necessary to choose an appropriate algorithm to compress the bitmaps. Two situations arise:
1) When the bitmaps live in memory, the chosen compression algorithm must compress and decompress as fast as possible without consuming too much CPU; LZO or snappy are suitable here, enabling rapid compression and decompression;
2) When the bitmaps are stored in the DB, saving storage space matters more, so the highest possible compression ratio is wanted; gzip is used here, which also reduces the network I/O overhead when dumping from memory to the DB.
Summary
The above is a comparison of the characteristics of the GZIP, LZO, and zippy/snappy compression algorithms, together with some practical usage patterns. Corrections and discussion are welcome.