Environment: centos6.5_x64
InfluxDB version: 1.1.0
Data compression in InfluxDB is documented here:
https://docs.influxdata.com/influxdb/v1.1/concepts/storage_engine/#compression
InfluxDB uses a different compression algorithm for each data type:
Integers: values are first ZigZag encoded; if the largest encoded value is less than (1 << 60) - 1, the values are packed with the simple8b algorithm, otherwise they are stored uncompressed.
Timestamps: the sorted timestamps are delta (differential) encoded and then compressed with the simple8b algorithm (a minimal delta-encoding sketch follows this list).
Floats: compressed with the XOR-based algorithm from Facebook's Gorilla paper.
Booleans: each value is a single bit, stored with a simple bit-packing strategy.
Strings: compressed with the Snappy algorithm.
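For illustration, here is a minimal sketch of the delta (differential) encoding step applied to timestamps; the sample timestamps and the helper name deltaEncode are illustrative, not InfluxDB code:

    package main

    import "fmt"

    // deltaEncode turns a sorted slice of timestamps into a first value
    // plus the differences between consecutive values.
    func deltaEncode(ts []int64) (first int64, deltas []int64) {
        first = ts[0]
        for i := 1; i < len(ts); i++ {
            deltas = append(deltas, ts[i]-ts[i-1])
        }
        return
    }

    func main() {
        ts := []int64{1434055562000000000, 1434055572000000000, 1434055582000000000}
        first, deltas := deltaEncode(ts)
        // The deltas are much smaller than the raw timestamps, so they pack
        // well with simple8b.
        fmt.Println(first, deltas)
    }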
ZigZag algorithm
The premise of this algorithm is that in most cases the numbers we deal with are small: a small integer maps to a short ZigZag code word, while a large integer maps to a long one. In some scenarios, however, for example when the integers to be transmitted are mostly large, the compression efficiency of ZigZag encoding is not ideal.
The implementation code is as follows:
    // ZigZagEncode converts an int64 to a uint64 by zig zagging negative and positive values
    // across even and odd numbers. Eg. [0,-1,1,-2] becomes [0, 1, 2, 3].
    func ZigZagEncode(x int64) uint64 {
        return uint64(uint64(x<<1) ^ uint64(int64(x)>>63))
    }

    // ZigZagDecode converts a previously zigzag encoded uint64 back to an int64.
    func ZigZagDecode(v uint64) int64 {
        return int64((v >> 1) ^ uint64((int64(v&1)<<63)>>63))
    }
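For example, a quick round-trip check (assuming the two functions above are compiled into the same main package; the sample values are illustrative):

    package main

    import "fmt"

    func main() {
        for _, x := range []int64{0, -1, 1, -2, 123456789} {
            e := ZigZagEncode(x)
            // Each value encodes to a small unsigned number and decodes back unchanged.
            fmt.Printf("%d -> %d -> %d\n", x, e, ZigZagDecode(e))
        }
    }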
SIMPLE8B algorithm
Simple8b is a 64-bit word-oriented algorithm that packs multiple integers into a single 64-bit storage structure: the first 4 bits of the word hold a selector value and the remaining 60 bits hold the data. It can pack integers with values from 0 to (1 << 60) - 1 (a minimal packing sketch follows the table).
Encoding uses the following selector table:
    ┌──────────────┬────────────────────────────────────────────────────────────────┐
    │   Selector   │   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15│
    ├──────────────┼────────────────────────────────────────────────────────────────┤
    │     Bits     │   0   0   1   2   3   4   5   6   7   8  10  12  15  20  30  60│
    ├──────────────┼────────────────────────────────────────────────────────────────┤
    │      N       │ 240 120  60  30  20  15  12  10   8   7   6   5   4   3   2   1│
    ├──────────────┼────────────────────────────────────────────────────────────────┤
    │  Wasted Bits │  60  60   0   0   0   0  12   0   4   4   0   0   0   0   0   0│
    └──────────────┴────────────────────────────────────────────────────────────────┘
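As a rough illustration of the word layout (not the actual InfluxDB implementation), here is a minimal sketch that packs 15 values of 4 bits each, i.e. selector 5 in the table above; the helpers pack4 and unpack4 are hypothetical:

    package main

    import "fmt"

    // pack4 packs up to 15 values, each smaller than 16, into one 64-bit word:
    // the top 4 bits hold the selector, the low 60 bits hold the data.
    func pack4(selector uint64, vals []uint64) uint64 {
        w := selector << 60
        for i, v := range vals {
            w |= (v & 0xF) << (uint(i) * 4)
        }
        return w
    }

    // unpack4 reverses pack4, returning the selector and the 15 packed values.
    func unpack4(w uint64) (selector uint64, vals [15]uint64) {
        selector = w >> 60
        for i := 0; i < 15; i++ {
            vals[i] = (w >> (uint(i) * 4)) & 0xF
        }
        return
    }

    func main() {
        vals := []uint64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
        w := pack4(5, vals)
        sel, out := unpack4(w)
        fmt.Println(sel, out) // 15 small integers stored in a single 64-bit word
    }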
Facebook Gorilla XOR algorithm
The first value is stored uncompressed; each subsequent value is XORed with the previous value. If the result is zero (the value is identical to the previous one), only a single 0 bit is stored; if the result is non-zero, the XOR result is stored.
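A minimal sketch of the XOR step (the bit-level packing of the leading/trailing zero counts described in the Gorilla paper is omitted; the sample values are illustrative):

    package main

    import (
        "fmt"
        "math"
        "math/bits"
    )

    func main() {
        values := []float64{12.0, 12.0, 24.0, 15.5}
        prev := math.Float64bits(values[0]) // the first value is stored as-is
        fmt.Printf("first: %016x\n", prev)
        for _, v := range values[1:] {
            cur := math.Float64bits(v)
            x := cur ^ prev
            if x == 0 {
                fmt.Println("identical value: store a single 0 bit")
            } else {
                // Similar values share sign/exponent bits, so the XOR result has
                // many leading and trailing zeros that can be encoded compactly.
                fmt.Printf("xor: %016x (leading zeros %d, trailing zeros %d)\n",
                    x, bits.LeadingZeros64(x), bits.TrailingZeros64(x))
            }
            prev = cur
        }
    }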
Snappy algorithm
Here is a set of benchmark results published by Google a few years ago (quoted in HBase: The Definitive Guide):
    Algorithm        % remaining   Encoding    Decoding
    GZIP             13.4%         21 MB/s     118 MB/s
    LZO              20.5%         135 MB/s    410 MB/s
    Zippy/Snappy     22.2%         172 MB/s    409 MB/s
From these results:
1) GZIP has the highest compression ratio, but it is CPU-intensive, consuming more CPU than the other algorithms, and its compression and decompression speeds are the slowest;
2) LZO's compression ratio is in the middle, lower than GZIP's, but its compression and decompression are significantly faster than GZIP's, with decompression faster still;
3) Zippy/Snappy has the lowest compression ratio, while its compression speed is somewhat faster than LZO's and its decompression speed is about the same.
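For reference, a minimal Snappy round-trip in Go using the github.com/golang/snappy package (the sample line-protocol string is illustrative; InfluxDB itself applies Snappy to its string blocks internally):

    package main

    import (
        "fmt"

        "github.com/golang/snappy"
    )

    func main() {
        src := []byte("measurement,host=server01 value=0.64 1434055562000000000")
        compressed := snappy.Encode(nil, src)
        decompressed, err := snappy.Decode(nil, compressed)
        if err != nil {
            panic(err)
        }
        fmt.Printf("original %d bytes, compressed %d bytes, round-trip ok: %v\n",
            len(src), len(compressed), string(decompressed) == string(src))
    }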
All right, that's it, I hope it helps you.
GitHub address of this article:
https://github.com/mike-zhang/mikeBlogEssays/blob/master/2017/20170423_influxdb Data compression description. RST
Additions and corrections are welcome.