InfluxDB data compression

Source: Internet
Author: User
Tags influxdb

Environment: centos6.5_x64
Influxdb version: 1.1.0

For details on data compression, see:

https://docs.influxdata.com/influxdb/v1.1/concepts/storage_engine/#compression

InfluxDB uses different compression algorithms depending on the data type.

    • Int

Values are first encoded with the zigzag algorithm. If the encoded values are less than (1 << 60)-1, they are compressed with the simple8b algorithm; if a value exceeds that limit, the data is stored uncompressed.

    • Timestamp

The sorted timestamps are delta-encoded (each value is stored as its difference from the previous one), and the deltas are then compressed with the simple8b algorithm.

    • Float

Uses the floating-point compression algorithm described in the Facebook Gorilla paper.

    • bool

Each value needs only 1 bit, so a simple bit-packing strategy is used.

    • String

Uses the Snappy algorithm.
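
The delta encoding step used for timestamps can be illustrated with a minimal sketch. This is not InfluxDB's implementation; the function names and the sample timestamps are assumptions made for the example. It only shows why regularly spaced timestamps turn into small, repetitive numbers that are easy to pack afterwards.

```go
package main

import "fmt"

// deltaEncode keeps the first timestamp as-is and replaces every later
// timestamp with its difference from the previous one.
func deltaEncode(ts []int64) []int64 {
	out := make([]int64, len(ts))
	for i, t := range ts {
		if i == 0 {
			out[i] = t
		} else {
			out[i] = t - ts[i-1]
		}
	}
	return out
}

// deltaDecode reverses deltaEncode by accumulating the differences.
func deltaDecode(deltas []int64) []int64 {
	out := make([]int64, len(deltas))
	var prev int64
	for i, d := range deltas {
		prev += d
		out[i] = prev
	}
	return out
}

func main() {
	// Hypothetical timestamps in seconds, sampled every 10 seconds.
	ts := []int64{1492905600, 1492905610, 1492905620, 1492905630}
	deltas := deltaEncode(ts)
	fmt.Println(deltas)              // [1492905600 10 10 10]
	fmt.Println(deltaDecode(deltas)) // [1492905600 1492905610 1492905620 1492905630]
}
```

After the first value, every delta fits in a few bits, which is what makes the subsequent integer packing effective.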

Introduction to the zigzag algorithm

The algorithm rests on the observation that in most cases the numbers we work with are small. An integer with a small absolute value maps to a short zigzag code word, and an integer with a large absolute value maps to a long one. Consequently, in scenarios where the integers to be transmitted are large, zigzag encoding does not compress well.

The implementation code is as follows:

    // ZigZagEncode converts a int64 to a uint64 by zig zagging negative and positive values
    // across even and odd numbers. Eg. [0,-1,1,-2] becomes [0, 1, 2, 3]
    func ZigZagEncode(x int64) uint64 {
        return uint64(uint64(x<<1) ^ uint64((int64(x) >> 63)))
    }

    // ZigZagDecode converts a previously zigzag encoded uint64 back to a int64
    func ZigZagDecode(v uint64) int64 {
        return int64((v >> 1) ^ uint64((int64(v&1)<<63)>>63))
    }

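
To make the relationship between magnitude and code length concrete, the following sketch prints the zigzag code and its bit length for a few sample inputs. The helper names `zigZag` and `codeBits` are hypothetical; `zigZag` simply restates the encoding function so that the example is self-contained.

```go
package main

import (
	"fmt"
	"math/bits"
)

// zigZag restates the zigzag encoding so this example compiles on its own.
func zigZag(x int64) uint64 {
	return uint64(x<<1) ^ uint64(x>>63)
}

// codeBits is a hypothetical helper: the number of significant bits
// in the zigzag code of x.
func codeBits(x int64) int {
	return bits.Len64(zigZag(x))
}

func main() {
	for _, x := range []int64{0, -1, 1, -2, 1000, -1000, 1 << 62} {
		fmt.Printf("%d -> code %d (%d bits)\n", x, zigZag(x), codeBits(x))
	}
}
```

Values near zero, whether positive or negative, produce codes of only a few bits, while 1 << 62 produces a 64-bit code, which is beyond the 60 data bits that simple8b can hold.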
Simple8b algorithm

Simple8b packs multiple integers into a single 64-bit word. The first 4 bits of the word hold the selector and the remaining 60 bits hold the data, so it can compress values in the range 0 to (1<<60)-1.

The selector determines how many values are stored in the word and how many bits each one uses, according to the following table:

┌─────────────┬─────────────────────────────────────────────────────────────────────────────────┐
│ Selector    │    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 │
├─────────────┼─────────────────────────────────────────────────────────────────────────────────┤
│ Bits        │    0    0    1    2    3    4    5    6    7    8   10   12   15   20   30   60 │
├─────────────┼─────────────────────────────────────────────────────────────────────────────────┤
│ N           │  240  120   60   30   20   15   12   10    8    7    6    5    4    3    2    1 │
├─────────────┼─────────────────────────────────────────────────────────────────────────────────┤
│ Wasted bits │   60   60    0    0    0    0    0    0    4    4    0    0    0    0    0    0 │
└─────────────┴─────────────────────────────────────────────────────────────────────────────────┘

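
The packing idea can be sketched as follows. This is a simplified illustration, not the library code used by InfluxDB: the names `layout`, `packOne` and `unpackOne` are hypothetical, selectors 0 and 1 are omitted, and the bit order inside the word (first value in the lowest bits) is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// layout describes one selector: n values of `bits` bits each.
type layout struct{ bits, n uint }

// layouts covers selectors 2..15 of the table; index i maps to selector i+2.
var layouts = []layout{
	{1, 60}, {2, 30}, {3, 20}, {4, 15}, {5, 12}, {6, 10}, {7, 8},
	{8, 7}, {10, 6}, {12, 5}, {15, 4}, {20, 3}, {30, 2}, {60, 1},
}

// packOne packs as many leading values of src as possible into one
// 64-bit word and returns the word and the number of values consumed.
func packOne(src []uint64) (uint64, int, error) {
	if len(src) == 0 {
		return 0, 0, errors.New("no values to pack")
	}
	for i, l := range layouts {
		if uint(len(src)) < l.n {
			continue // not enough values to fill this layout
		}
		limit := uint64(1) << l.bits
		fits := true
		for _, v := range src[:l.n] {
			if v >= limit {
				fits = false
				break
			}
		}
		if !fits {
			continue
		}
		word := uint64(i+2) << 60 // selector in the top 4 bits
		for j, v := range src[:l.n] {
			word |= v << (uint(j) * l.bits)
		}
		return word, int(l.n), nil
	}
	return 0, 0, errors.New("value does not fit in 60 bits")
}

// unpackOne reads the selector and extracts the packed values.
func unpackOne(word uint64) ([]uint64, error) {
	sel := word >> 60
	if sel < 2 {
		return nil, errors.New("selectors 0 and 1 are not handled in this sketch")
	}
	l := layouts[sel-2]
	mask := uint64(1)<<l.bits - 1
	out := make([]uint64, l.n)
	for j := range out {
		out[j] = (word >> (uint(j) * l.bits)) & mask
	}
	return out, nil
}

func main() {
	src := []uint64{3, 1, 4, 1, 5, 9, 2, 6}
	word, n, err := packOne(src)
	if err != nil {
		panic(err)
	}
	fmt.Printf("selector=%d packed=%d values\n", word>>60, n) // selector=8 packed=8 values
	vals, _ := unpackOne(word)
	fmt.Println(vals) // [3 1 4 1 5 9 2 6]
}
```

With only eight input values, the first layout that can be filled completely is selector 8 (8 values of 7 bits), so 4 of the 60 data bits go unused, matching the table.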
Facebook Gorilla XOR algorithm

The first value is stored uncompressed. Each subsequent value is XORed with the previous value: if the two values are identical (the XOR result is 0), only a single 0 bit is stored; if they differ, the XOR result is stored, and only its meaningful bits (without the leading and trailing zeros) need to be kept.
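
The XOR step can be sketched as below. This is only an illustration of the idea, not the bit-level encoder from the paper or from InfluxDB; the function name `xorWithPrev` and the sample values are assumptions.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// xorWithPrev returns, for every value after the first, the XOR of its
// IEEE 754 bit pattern with the bit pattern of the previous value.
func xorWithPrev(vals []float64) []uint64 {
	if len(vals) < 2 {
		return nil
	}
	out := make([]uint64, 0, len(vals)-1)
	prev := math.Float64bits(vals[0])
	for _, v := range vals[1:] {
		cur := math.Float64bits(v)
		out = append(out, cur^prev)
		prev = cur
	}
	return out
}

func main() {
	// Hypothetical series of slowly changing measurements.
	vals := []float64{12.0, 12.0, 24.0, 15.5}
	for _, x := range xorWithPrev(vals) {
		if x == 0 {
			fmt.Println("same as previous: store a single 0 bit")
			continue
		}
		lead, trail := bits.LeadingZeros64(x), bits.TrailingZeros64(x)
		fmt.Printf("xor=%016x leading zeros=%d trailing zeros=%d meaningful bits=%d\n",
			x, lead, trail, 64-lead-trail)
	}
}
```

Neighbouring values in a time series often share sign, exponent and most mantissa bits, so the XOR result tends to have long runs of zeros on both sides, and those runs are what the encoder avoids storing.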

Snappy algorithm

Here is a set of benchmark data published by Google a few years ago (from HBase: The Definitive Guide):

    Algorithm       % remaining   Encoding    Decoding
    GZIP            13.4%         21 MB/s     118 MB/s
    LZO             20.5%         135 MB/s    410 MB/s
    Zippy/Snappy    22.2%         172 MB/s    409 MB/s

From this data:

1) GZIP achieves the best compression (the smallest remaining size), but it is CPU-intensive: it consumes more CPU than the other algorithms, and both compression and decompression are slow;

2) LZO sits in the middle: it compresses less than GZIP, but compression and decompression are significantly faster than GZIP, with decompression being especially fast;

3) Zippy/Snappy compresses the least, but its compression is faster than LZO, and its decompression speed is about the same as LZO.

All right, that's it, I hope it helps you.

This article GitHub address:

https://github.com/mike-zhang/mikeBlogEssays/blob/master/2017/20170423_influxdb Data compression description. RST

Additions are welcome.
