Detailed description of the compression ratio of the Infobright Database

Source: Internet
Author: User

Infobright claims a data compression ratio of 10:1 to 40:1. As mentioned earlier, Infobright compresses data according to the data type in each Data Pack (DP). The system selects the compression algorithm automatically and tunes the algorithm's parameters to achieve the best compression ratio.

Take a look at the compression ratio in my experiment environment, as shown in:

Readers can see that the overall compression ratio is 20.302. However, this figure is easy to misread: the compression ratio here is the size of the original data as stored in a database divided by the size of the compressed data, not the physical size of the text file divided by the size of the compressed data. The former is clearly much larger than the latter. In my experiment environment, the latter is around. Data stored in a database generally takes much more space than the original text, because some fields are fixed-length and occupy more space than their actual content. A database also holds a lot of statistical data, including indexes, which occupy a large amount of space. Infobright has no indexes, but it does have Knowledge Node (KN) data, which generally accounts for about 1% of the total data size.
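The difference between the two definitions can be made concrete with a small calculation. The following Python sketch uses three sizes that are hypothetical placeholders chosen only for illustration; they are not measurements from the experiment described here.

```python
def compression_ratio(uncompressed_bytes: int, compressed_bytes: int) -> float:
    """Return the uncompressed size divided by the compressed size."""
    if compressed_bytes <= 0:
        raise ValueError("compressed size must be positive")
    return uncompressed_bytes / compressed_bytes


MB = 1024 * 1024

# Hypothetical sizes, for illustration only (not taken from the experiment).
db_raw_size = 2000 * MB      # data as it would be stored, uncompressed, in a database
text_file_size = 1200 * MB   # the source text file on disk
compressed_size = 100 * MB   # the data after compression

ratio_vs_db = compression_ratio(db_raw_size, compressed_size)
ratio_vs_text = compression_ratio(text_file_size, compressed_size)

print(f"ratio vs. database size:  {ratio_vs_db:.1f}:1")
print(f"ratio vs. text file size: {ratio_vs_text:.1f}:1")
```

With the same compressed size, the ratio measured against the database size is always the larger of the two whenever the database representation is bigger than the text file.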

Since Infobright compresses data according to its type, let's look at the compression ratios of different data types. See the following table:

First, look at the compression ratios of the integer types. The result is int < mediumint < smallint. Careful readers will notice that the compression ratio of tinyint is lower than that of int. This is because the compression ratio depends not only on the data type but also heavily on how much the data values differ. PosFlag has only three possible values: 0, 1, and -1. Such data obviously cannot achieve a good compression ratio.
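One way to see why a three-valued tinyint column has a limited compression ratio is to compute an entropy-based upper bound. The Python sketch below is an illustration under stated assumptions, not a description of Infobright's algorithms: it assumes each value is stored in 8 bits, that values are independent of one another, and that the three PosFlag values occur equally often. The function name and the sample data are hypothetical.

```python
import math
from collections import Counter


def max_ratio_from_entropy(values, stored_bits_per_value):
    """Upper bound on the compression ratio for independent values.

    Shannon entropy is the minimum average number of bits per value,
    so stored bits divided by entropy bounds the achievable ratio.
    Only value frequencies are considered, not ordering or runs.
    """
    counts = Counter(values)
    total = len(values)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    if entropy == 0:
        return math.inf  # a constant column is not bounded this way
    return stored_bits_per_value / entropy


# Hypothetical PosFlag column in which 0, 1, and -1 occur equally often.
pos_flag = [0, 1, -1] * 1000
bound = max_ratio_from_entropy(pos_flag, stored_bits_per_value=8)
print(f"best possible ratio: {bound:.2f}:1")
```

Under these assumptions the bound is roughly 5:1, because each value already occupies only one byte. Real columns with long runs or skewed frequencies can compress better than this frequency-only bound suggests.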

Next, look at the act field. The act field uses comment lookup, which gives a better compression ratio and better query performance than the plain char type. The principle of comment lookup is similar to that of a bitmap index. The usage of comment lookup will be detailed in the next chapter.
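The general idea behind a lookup-style column can be sketched as dictionary encoding: each distinct string is stored once and the column itself holds small integer codes. The Python sketch below illustrates that general technique only; it is not Infobright's internal implementation, and the function names and sample act values are hypothetical.

```python
def dictionary_encode(values):
    """Replace each distinct string with a small integer code."""
    codes = {}    # string -> code, assigned in order of first appearance
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return codes, encoded


def dictionary_decode(codes, encoded):
    """Rebuild the original strings from the codes."""
    reverse = {code: value for value, code in codes.items()}
    return [reverse[code] for code in encoded]


# Hypothetical values for the act column.
act = ["login", "view", "view", "logout", "login", "view"]
codes, encoded = dictionary_encode(act)

print(codes)
print(encoded)
print(dictionary_decode(codes, encoded) == act)
```

This approach pays off when a column has few distinct values relative to its row count, since the dictionary stays small while every row shrinks to an integer.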

Among all the fields, the date field has the highest compression ratio, with a final data size of only 0.1 MB. The compression ratio of varchar is relatively poor, so we do not recommend using varchar unless it is necessary.

The data above clearly shows Infobright's powerful compression. To emphasize once more: the compression ratio depends not only on the data type but also, to a large extent, on how much the data values differ. When choosing a field's data type, I personally think performance should be considered first. For example, some of the field choices in the preceding table can be optimized. The ip field can be changed to the bigint type, and the date field can even be split into separate year, month, and day columns as needed.
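The two suggested conversions can be sketched in Python as a preprocessing step before loading. This is a minimal illustration that assumes the ip field holds IPv4 addresses in dotted notation; the helper names and sample values are hypothetical.

```python
import ipaddress
from datetime import date


def ip_to_int(ip: str) -> int:
    """Convert a dotted IPv4 string to its integer value."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(number: int) -> str:
    """Convert an integer back to a dotted IPv4 string."""
    return str(ipaddress.IPv4Address(number))


def split_date(d: date):
    """Split a date into (year, month, day) for separate columns."""
    return d.year, d.month, d.day


print(ip_to_int("192.168.1.1"))
print(int_to_ip(3232235777))
print(split_date(date(2012, 5, 17)))
```

IPv4 values range up to 4294967295, which exceeds the range of a signed 32-bit integer, so a 64-bit integer column is the safe choice for the converted value.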
