Enter the world of the column database Infobright

Source: Internet
Author: User

Sybase is a pioneer in column-based databases. Sybase IQ 15 is Sybase's latest column-based database. It offers fast data loading, high-speed analytical query performance, intelligent business analytics, and strong data modeling capabilities. Infobright is a MySQL-based data warehouse system; I have introduced it in detail elsewhere on this blog.

Infobright is also a columnar database, but it differs considerably from the Sybase IQ series. Infobright uses the Knowledge Grid to organize data. There are no indexes in Infobright, which saves a lot of space. The Sybase IQ series still uses indexes; as I understand it, these are a newer form of bitmap index. According to the white paper, Infobright's data compression ratio can reach 40:1. I ran a small experiment with a large log database, and the compression I saw was not that dramatic. Following the idea behind bitmap indexes, the more similar the values within a column, the higher the compression ratio. Infobright should behave the same way, but I am still unclear about how the Knowledge Grid is implemented.
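To make the point about column similarity concrete, here is a minimal sketch that compares the compression ratio of a repetitive column with that of a nearly unique one. It uses Python's zlib purely as a stand-in, not Infobright's own algorithms, and the two log columns (a status code and a request id) are invented for illustration.

```python
import random
import zlib


def compression_ratio(values):
    """Return raw size divided by compressed size for a column of strings."""
    raw = "\n".join(values).encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


random.seed(0)  # fixed seed so repeated runs use the same data

# Hypothetical log columns: a status code with only three distinct values,
# and a request id that differs on almost every row.
status_column = [random.choice(["200", "404", "500"]) for _ in range(65536)]
request_id_column = ["%016x" % random.getrandbits(64) for _ in range(65536)]

low_cardinality_ratio = compression_ratio(status_column)
high_cardinality_ratio = compression_ratio(request_id_column)

print(f"status column:     {low_cardinality_ratio:.1f}:1")
print(f"request id column: {high_cardinality_ratio:.1f}:1")
```

The repetitive column compresses far better than the unique one, which may explain why a log database full of distinct values falls short of the ratios quoted in the white paper.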

Advantages of Infobright:

1) High compression ratio

2) Quick response to complex analytical queries

3) As the database grows, query and loading performance remain stable

4) No special data warehouse models, such as the star or snowflake model, are required

5) No materialized views, complex data partitioning policies, or indexes are required

6) Implementation and management are simple and require minimal administration

7) Compatible with many BI suites, such as Pentaho, Cognos, and Jaspersoft

Infobright has two versions: ICE (Infobright Community Edition) and IEE (Infobright Enterprise Edition). Currently, ICE is at version 3.3.1 and supports 64-bit Linux and 32-bit Windows. ICE does not support DML; that is, it does not support INSERT, UPDATE, and similar operations.

Rough Set theory is one of the core technologies of Infobright. During query execution, Infobright uses the Knowledge Grid to divide Data Packs (DPs) into three types:

Relevant DPs (Relevant Packs): all of the data in the DP meets the query conditions

Irrelevant DPs (Irrelevant Packs): none of the data in the DP meets the query conditions

Suspect DPs (Suspect Packs): some of the data in the DP meets the query conditions
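The three categories can be sketched in a few lines. This is only an illustration under the assumption that the metadata kept for each DP includes the minimum and maximum value; the PackStats class and the function name are hypothetical, not part of Infobright.

```python
from dataclasses import dataclass


@dataclass
class PackStats:
    """Per-DP metadata; a stand-in for what the Knowledge Grid keeps."""
    minimum: int
    maximum: int


def classify_greater_than(stats, threshold):
    """Classify one DP for the condition `value > threshold`."""
    if stats.minimum > threshold:
        return "relevant"      # every value satisfies the condition
    if stats.maximum <= threshold:
        return "irrelevant"    # no value can satisfy the condition
    return "suspect"           # only some values may satisfy it


packs = [PackStats(0, 5), PackStats(7, 20), PackStats(3, 9)]
labels = [classify_greater_than(p, 6) for p in packs]
print(labels)
```

Only the suspect DP needs to be opened to decide which of its rows qualify; the other two are settled from metadata alone.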

Consider the following case. Each column has a total of five DPs, and the query sums column B under the condition A > 6. A1, A2, and A4 are irrelevant DPs, A3 is a relevant DP, and A5 is a suspect DP. To execute the query, you only need to sum the records in B5 that meet the condition and then add Sum(B3). Sum(B3) is already known, so only the DP B5 has to be decompressed.

From this analysis, we can see that Infobright can execute some queries very efficiently, and the more selective the WHERE clause, the better. A highly selective WHERE clause lets Infobright determine more precisely whether each DP is relevant, irrelevant, or suspect, which minimizes the number of DPs that must be decompressed and reduces the performance cost of decompression. The Histogram and CMAP mentioned in the previous chapter are generally used to evaluate conditions, and they can effectively improve query performance.
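The case above can be traced with a toy table. This sketch assumes that the minimum and maximum of each A pack and the sum of each B pack are available without decompression; the data values and the function name are invented so that the five DPs fall into the same categories as in the example.

```python
# Toy table split into five DPs per column (real DPs hold 64K values).
# Values are invented so that A1, A2, A4 are irrelevant, A3 is relevant,
# and A5 is suspect for the condition A > 6.
column_a = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0, 1, 2], [5, 6, 7]]
column_b = [[10, 20, 30], [40, 50, 60], [70, 80, 90], [1, 2, 3], [100, 200, 300]]

# Metadata assumed to be readable without decompressing any DP.
a_min_max = [(min(p), max(p)) for p in column_a]
b_sums = [sum(p) for p in column_b]


def sum_b_where_a_greater_than(threshold):
    """Return (sum of B where A > threshold, indices of DPs decompressed)."""
    total = 0
    decompressed = []
    for i, (lo, hi) in enumerate(a_min_max):
        if hi <= threshold:
            continue                  # irrelevant DP: skip entirely
        if lo > threshold:
            total += b_sums[i]        # relevant DP: use the known sum
            continue
        decompressed.append(i)        # suspect DP: must look at the rows
        total += sum(b for a, b in zip(column_a[i], column_b[i]) if a > threshold)
    return total, decompressed


total, decompressed = sum_b_where_a_greater_than(6)
print(total, decompressed)
```

Only the fifth DP (index 4, that is, A5 and B5) is opened; the contribution of the third comes straight from its stored sum.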

The principle behind multi-table joins is similar. First, Pack-to-Pack is used to generate the relationship between the DPs of the two join columns.

For example, take SELECT MAX(X.D) FROM T JOIN X ON T.B = X.C WHERE T.A > 6. Pack-to-Pack generates the relation matrix M between the DPs of T.B and the DPs of X.C. If the first DP of T.B and the first DP of X.C have at least one value in common, the corresponding element of M is 1; otherwise it is 0. This effectively reduces the number of DPs involved in the join.
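A toy version of the relation matrix can be built as follows. This sketch computes M directly from the raw values, which is only meant to show what the matrix records; how Infobright actually builds and stores it is not covered here. The function names and the data are hypothetical.

```python
def pack_to_pack(left_packs, right_packs):
    """Build M where M[i][j] is 1 if left DP i and right DP j share a value."""
    return [
        [1 if set(lp) & set(rp) else 0 for rp in right_packs]
        for lp in left_packs
    ]


def join_candidate_pairs(matrix, left_candidates):
    """List the (i, j) DP pairs that the join still has to examine."""
    return [
        (i, j)
        for i in left_candidates
        for j, flag in enumerate(matrix[i])
        if flag
    ]


# Invented toy DPs for the join columns T.B and X.C.
t_b = [[1, 2, 3], [10, 11, 12], [20, 21, 22]]
x_c = [[3, 4, 5], [30, 31, 32], [12, 20, 40]]

m = pack_to_pack(t_b, x_c)

# Suppose the filter T.A > 6 has already ruled out DP 0 of T.
pairs = join_candidate_pairs(m, left_candidates=[1, 2])
print(m)
print(pairs)
```

Out of nine possible DP pairs, only two remain after combining the matrix with the filter on T.A.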

Compression was covered earlier; here we briefly look at the compression of DPs. The 64K elements in each DP are treated as a sequence: the positions of all null values are stored separately, and the remaining non-null data is compressed. Compression depends on the data type: Infobright selects a compression algorithm based on the data type and automatically tunes the algorithm's parameters to achieve the best compression ratio.
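The idea of storing null positions separately from the non-null data can be sketched like this. It is a simplification under stated assumptions: the values are 64-bit integers, the null mask uses one byte per value, and zlib stands in for whatever type-specific algorithm Infobright would choose. The function names are hypothetical.

```python
import struct
import zlib


def compress_pack(values):
    """Split a pack into a null mask and a compressed run of non-null ints."""
    null_mask = bytes(1 if v is None else 0 for v in values)
    non_null = [v for v in values if v is not None]
    payload = struct.pack(f"<{len(non_null)}q", *non_null)
    return zlib.compress(null_mask), zlib.compress(payload)


def decompress_pack(packed_mask, packed_payload):
    """Rebuild the original sequence, putting nulls back in place."""
    null_mask = zlib.decompress(packed_mask)
    payload = zlib.decompress(packed_payload)
    non_null = iter(struct.unpack(f"<{len(payload) // 8}q", payload))
    return [None if is_null else next(non_null) for is_null in null_mask]


pack = [5, None, 7, 7, None, 9]
restored = decompress_pack(*compress_pack(pack))
print(restored == pack)
```

Keeping nulls out of the value stream means the compressor for the non-null data only ever sees values of a single type.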
