Environment: centos6.5_x64
InfluxDB version: 1.1.0
Data compression in InfluxDB is documented here:
https://docs.influxdata.com/influxdb/v1.1/concepts/storage_engine/#compression
InfluxDB uses a different compression algorithm for each data type:
Integers: values are first ZigZag encoded; if the largest encoded value is less than (1 << 60) - 1, the values are packed with the simple8b algorithm, otherwise they are stored uncompressed.
Timestamps: the sorted timestamps are delta (differential) encoded and then compressed with the simple8b algorithm (a minimal delta-encoding sketch follows this list).
Floats: compressed with the XOR-based algorithm from Facebook's Gorilla paper.
Booleans: each value is a single bit, stored with a simple bit-packing strategy.
Strings: compressed with the Snappy algorithm.
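For illustration, here is a minimal sketch of the delta (differential) encoding step applied to timestamps; the sample timestamps and the helper name deltaEncode are illustrative, not InfluxDB code:

    package main

    import "fmt"

    // deltaEncode turns a sorted slice of timestamps into a first value
    // plus the differences between consecutive values.
    func deltaEncode(ts []int64) (first int64, deltas []int64) {
        first = ts[0]
        for i := 1; i < len(ts); i++ {
            deltas = append(deltas, ts[i]-ts[i-1])
        }
        return
    }

    func main() {
        ts := []int64{1434055562000000000, 1434055572000000000, 1434055582000000000}
        first, deltas := deltaEncode(ts)
        // The deltas are much smaller than the raw timestamps, so they pack
        // well with simple8b.
        fmt.Println(first, deltas)
    }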
ZigZag algorithm
The premise of this algorithm is that in most cases the numbers we deal with are small: a small integer maps to a short ZigZag code word, while a large integer maps to a long one. In some scenarios, however, for example when the integers to be transmitted are mostly large, the compression efficiency of ZigZag encoding is not ideal.
The implementation code is as follows:
    // ZigZagEncode converts an int64 to a uint64 by zig zagging negative and positive values
    // across even and odd numbers. Eg. [0,-1,1,-2] becomes [0, 1, 2, 3].
    func ZigZagEncode(x int64) uint64 {
        return uint64(uint64(x<<1) ^ uint64(int64(x)>>63))
    }

    // ZigZagDecode converts a previously zigzag encoded uint64 back to an int64.
    func ZigZagDecode(v uint64) int64 {
        return int64((v >> 1) ^ uint64((int64(v&1)<<63)>>63))
    }
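For example, a quick round-trip check (assuming the two functions above are compiled into the same main package; the sample values are illustrative):

    package main

    import "fmt"

    func main() {
        for _, x := range []int64{0, -1, 1, -2, 123456789} {
            e := ZigZagEncode(x)
            // Each value encodes to a small unsigned number and decodes back unchanged.
            fmt.Printf("%d -> %d -> %d\n", x, e, ZigZagDecode(e))
        }
    }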
SIMPLE8B algorithm
Simple8b is a 64-bit word-oriented algorithm that packs multiple integers into a single 64-bit storage structure: the first 4 bits of the word hold a selector value and the remaining 60 bits hold the data. It can pack integers with values from 0 to (1 << 60) - 1 (a minimal packing sketch follows the table).
Encoding uses the following selector table:
    ┌──────────────┬────────────────────────────────────────────────────────────────┐
    │   Selector   │   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15│
    ├──────────────┼────────────────────────────────────────────────────────────────┤
    │     Bits     │   0   0   1   2   3   4   5   6   7   8  10  12  15  20  30  60│
    ├──────────────┼────────────────────────────────────────────────────────────────┤
    │      N       │ 240 120  60  30  20  15  12  10   8   7   6   5   4   3   2   1│
    ├──────────────┼────────────────────────────────────────────────────────────────┤
    │  Wasted Bits │  60  60   0   0   0   0  12   0   4   4   0   0   0   0   0   0│
    └──────────────┴────────────────────────────────────────────────────────────────┘
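As a rough illustration of the word layout (not the actual InfluxDB implementation), here is a minimal sketch that packs 15 values of 4 bits each, i.e. selector 5 in the table above; the helpers pack4 and unpack4 are hypothetical:

    package main

    import "fmt"

    // pack4 packs up to 15 values, each smaller than 16, into one 64-bit word:
    // the top 4 bits hold the selector, the low 60 bits hold the data.
    func pack4(selector uint64, vals []uint64) uint64 {
        w := selector << 60
        for i, v := range vals {
            w |= (v & 0xF) << (uint(i) * 4)
        }
        return w
    }

    // unpack4 reverses pack4, returning the selector and the 15 packed values.
    func unpack4(w uint64) (selector uint64, vals [15]uint64) {
        selector = w >> 60
        for i := 0; i < 15; i++ {
            vals[i] = (w >> (uint(i) * 4)) & 0xF
        }
        return
    }

    func main() {
        vals := []uint64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
        w := pack4(5, vals)
        sel, out := unpack4(w)
        fmt.Println(sel, out) // 15 small integers stored in a single 64-bit word
    }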
Facebook Gorilla XOR algorithm
The first value is stored uncompressed; each subsequent value is XORed with the previous value. If the result is zero (the value is identical to the previous one), only a single 0 bit is stored; if the result is non-zero, the XOR result is stored.
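A minimal sketch of the XOR step (the bit-level packing of the leading/trailing zero counts described in the Gorilla paper is omitted; the sample values are illustrative):

    package main

    import (
        "fmt"
        "math"
        "math/bits"
    )

    func main() {
        values := []float64{12.0, 12.0, 24.0, 15.5}
        prev := math.Float64bits(values[0]) // the first value is stored as-is
        fmt.Printf("first: %016x\n", prev)
        for _, v := range values[1:] {
            cur := math.Float64bits(v)
            x := cur ^ prev
            if x == 0 {
                fmt.Println("identical value: store a single 0 bit")
            } else {
                // Similar values share sign/exponent bits, so the XOR result has
                // many leading and trailing zeros that can be encoded compactly.
                fmt.Printf("xor: %016x (leading zeros %d, trailing zeros %d)\n",
                    x, bits.LeadingZeros64(x), bits.TrailingZeros64(x))
            }
            prev = cur
        }
    }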
Snappy algorithm
Here is a set of benchmark results published by Google a few years ago (quoted in HBase: The Definitive Guide):
    Algorithm        % remaining   Encoding    Decoding
    GZIP             13.4%         21 MB/s     118 MB/s
    LZO              20.5%         135 MB/s    410 MB/s
    Zippy/Snappy     22.2%         172 MB/s    409 MB/s
From these results:
1) GZIP has the highest compression ratio, but it is CPU-intensive, consuming more CPU than the other algorithms, and its compression and decompression speeds are the slowest;
2) LZO's compression ratio is in the middle, lower than GZIP's, but its compression and decompression are significantly faster than GZIP's, with decompression faster still;
3) Zippy/Snappy has the lowest compression ratio, while its compression speed is somewhat faster than LZO's and its decompression speed is about the same.
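For reference, a minimal Snappy round-trip in Go using the github.com/golang/snappy package (the sample line-protocol string is illustrative; InfluxDB itself applies Snappy to its string blocks internally):

    package main

    import (
        "fmt"

        "github.com/golang/snappy"
    )

    func main() {
        src := []byte("measurement,host=server01 value=0.64 1434055562000000000")
        compressed := snappy.Encode(nil, src)
        decompressed, err := snappy.Decode(nil, compressed)
        if err != nil {
            panic(err)
        }
        fmt.Printf("original %d bytes, compressed %d bytes, round-trip ok: %v\n",
            len(src), len(compressed), string(decompressed) == string(src))
    }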
All right, that's it, I hope it helps you.
GitHub address of this article:
https://github.com/mike-zhang/mikeBlogEssays/blob/master/2017/20170423_influxdb Data compression description. RST
Additions and corrections are welcome.