Infobright Architecture Analysis

Source: Internet
Author: User

I have been using Infobright for a long time, but I still did not know much about its real architecture. The following article explains it very well.

The overall architecture of Infobright is as follows:

  

As the architecture figure shows, Infobright adopts the same architecture as MySQL and is divided into two layers: the upper layer handles service and application management, and the lower layer is the storage engine. The default storage engine of Infobright is BRIGHTHOUSE, but Infobright also supports other storage engines such as MyISAM, MRG_MyISAM, MEMORY, and CSV. Infobright organizes data in three layers: the DP (Data Pack), the DPN (Data Pack Node), and the KN (Knowledge Node). Above these three layers sits the powerful Knowledge Grid.

The Data Pack (DP) is the lowest layer of storage. Every 64K (65,536) values in a column form one DP. A DP is smaller than a whole column, which gives a better compression ratio, and larger than a single value, which gives better query performance.

Data Pack Nodes (DPNs) correspond one-to-one with DPs. Each DPN records statistics about the data stored and compressed in its DP: the maximum value, minimum value, number of null values, total number of values, and sum.
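The relationship between a column, its DPs, and their DPNs can be illustrated with a minimal Python sketch. This is not Infobright's implementation: the names `DataPackNode`, `split_into_packs`, and `build_dpn` are hypothetical, the column is assumed to hold integers with `None` standing for null, and compression is left out entirely.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

PACK_SIZE = 65536  # 64K values per Data Pack, as described above


@dataclass
class DataPackNode:
    """Hypothetical summary record for one Data Pack (illustrative names)."""
    minimum: Optional[int]
    maximum: Optional[int]
    null_count: int
    row_count: int
    total: int


def split_into_packs(column: Sequence[Optional[int]],
                     pack_size: int = PACK_SIZE) -> List[Sequence[Optional[int]]]:
    """Cut one column into consecutive packs of at most pack_size values."""
    return [column[i:i + pack_size] for i in range(0, len(column), pack_size)]


def build_dpn(pack: Sequence[Optional[int]]) -> DataPackNode:
    """Compute the statistics listed in the text for a single pack."""
    non_null = [v for v in pack if v is not None]
    return DataPackNode(
        minimum=min(non_null) if non_null else None,
        maximum=max(non_null) if non_null else None,
        null_count=len(pack) - len(non_null),
        row_count=len(pack),
        total=sum(non_null),
    )


if __name__ == "__main__":
    column = list(range(100_000)) + [None] * 5
    for dpn in (build_dpn(p) for p in split_into_packs(column)):
        print(dpn)
```

Because each DPN is tiny compared with its pack, a query such as a `MAX` or `COUNT` over whole packs could in principle be answered from these summaries alone, without decompressing the data.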

KNs store sets of metadata that describe DPs and the relationships between columns, such as value ranges (min-max) and associations between column data. Most KN data is generated when data is loaded; the rest is generated at query time.

On top of these layers is the Knowledge Grid, and the Knowledge Grid architecture is an important reason for the high performance of Infobright.

  

The Knowledge Grid can be divided into four parts: DPN, histogram, CMAP, and pack-to-pack (P-2-P).

The DPN is described above. The histogram improves query performance on numeric types (such as date, time, and decimal) and is generated when data is loaded. A DPN holds the min and max of its DP; the histogram divides this min-max range into 1024 segments. If the min-max range is smaller than 1024, each segment corresponds to a single value. The KN then stores one bit per segment, indicating whether any value of the DP falls in that segment.
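The segment rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pack is a non-empty list of non-null integers, the helper names `segment_index` and `build_histogram` are hypothetical, and wide ranges are split into 1024 roughly equal integer segments, which may differ from how Infobright actually draws the segment boundaries.

```python
from typing import List, Sequence, Tuple

SEGMENTS = 1024  # number of histogram segments, as described above


def segment_index(value: int, lo: int, hi: int, segments: int = SEGMENTS) -> int:
    """Map a value in [lo, hi] to the index of its histogram segment."""
    span = hi - lo + 1
    if span <= segments:
        # Narrow range: every distinct value gets its own segment.
        return value - lo
    # Wide range: proportional split into `segments` buckets (assumption).
    return (value - lo) * segments // span


def build_histogram(pack: Sequence[int]) -> Tuple[int, int, List[int]]:
    """Return (min, max, bits), where bits[i] == 1 if segment i is occupied."""
    lo, hi = min(pack), max(pack)
    bits = [0] * SEGMENTS
    for value in pack:
        bits[segment_index(value, lo, hi)] = 1
    return lo, hi, bits


if __name__ == "__main__":
    lo, hi, bits = build_histogram([10, 20, 30])
    # Range 10..30 spans 21 values, so each value has its own segment.
    print([i for i, b in enumerate(bits) if b])  # [0, 10, 20]
```

The whole histogram is 1024 bits, or 128 bytes, per pack, which is why it is cheap to keep alongside the DPN.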

  

The histogram is used to decide quickly whether the current DP can satisfy the query conditions. Take, for example, the query select ID from customerinfo where ID > 50 and ID < 70. From the histogram of the DP in the figure, it is easy to see that the DP contains no value in that range, so it does not need to be read. The histogram can therefore effectively reduce the number of DPs that a numeric query has to scan.
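The pruning decision for a range predicate such as `ID > 50 and ID < 70` can be sketched as follows. The pack contents are invented for illustration (the original figure is not available), the function names are hypothetical, and the segment layout is the same assumed proportional split of the min-max range into 1024 segments; it is not taken from Infobright's source.

```python
from typing import List, Sequence

SEGMENTS = 1024


def segment_index(value: int, lo: int, hi: int) -> int:
    """Map a value in [lo, hi] to its histogram segment (assumed layout)."""
    span = hi - lo + 1
    return value - lo if span <= SEGMENTS else (value - lo) * SEGMENTS // span


def build_bits(values: Sequence[int], lo: int, hi: int) -> List[int]:
    """One bit per segment: 1 if any value of the pack falls into it."""
    bits = [0] * SEGMENTS
    for v in values:
        bits[segment_index(v, lo, hi)] = 1
    return bits


def pack_may_match(lo: int, hi: int, bits: Sequence[int],
                   q_lo: int, q_hi: int) -> bool:
    """Return False only when no value in the pack can lie in [q_lo, q_hi]."""
    start, end = max(lo, q_lo), min(hi, q_hi)
    if start > end:
        return False  # the min-max range alone already rules the pack out
    first, last = segment_index(start, lo, hi), segment_index(end, lo, hi)
    return any(bits[first:last + 1])


if __name__ == "__main__":
    ids = [1, 5, 20, 45, 50, 70, 88, 100]  # made-up pack contents
    lo, hi = min(ids), max(ids)
    bits = build_bits(ids, lo, hi)
    # ID > 50 and ID < 70 is the inclusive integer range 51..69.
    print(pack_may_match(lo, hi, bits, 51, 69))  # False: pack can be skipped
    print(pack_may_match(lo, hi, bits, 40, 50))  # True: pack must be read
```

Note that the min-max check alone would not skip this pack, since 51..69 lies inside 1..100; only the histogram bits reveal that the range is empty. A `True` result means "possibly", so the pack still has to be decompressed and checked.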

CMAP is used for text queries and is generated when data is loaded. It records which ASCII characters occur at each of positions 1 to 64 of the text values in the current DP.

  

For example, the figure shows that the character A never appears in the second, third, or fourth position of the text; 0 means the character does not occur at that position, and 1 means it does. Because text comparison in a query is done byte by byte, CMAP can rule out DPs early and so improve the performance of text queries.
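A character map of this kind can be sketched in Python as a 128 x 64 matrix of bits. The sketch makes several assumptions that are not from the source: values are Python strings, only ASCII characters are tracked, positions are counted from 0 in the code (position 1 in the text is index 0), and `build_cmap` and `may_contain_prefix` are hypothetical names.

```python
from typing import List, Sequence

POSITIONS = 64    # positions 1..64 of each text value, as described above
ASCII_CODES = 128


def build_cmap(pack: Sequence[str]) -> List[List[int]]:
    """cmap[code][pos] == 1 if character `code` occurs at index `pos`."""
    cmap = [[0] * POSITIONS for _ in range(ASCII_CODES)]
    for text in pack:
        for pos, ch in enumerate(text[:POSITIONS]):
            code = ord(ch)
            if code < ASCII_CODES:
                cmap[code][pos] = 1
    return cmap


def may_contain_prefix(cmap: Sequence[Sequence[int]], prefix: str) -> bool:
    """Return False only if no value in the pack can start with `prefix`."""
    for pos, ch in enumerate(prefix[:POSITIONS]):
        code = ord(ch)
        # Characters outside ASCII are not tracked, so they cannot rule out.
        if code < ASCII_CODES and not cmap[code][pos]:
            return False
    return True


if __name__ == "__main__":
    cmap = build_cmap(["apple", "avocado", "almond"])
    # 'a' occurs at the first position but not the second, third, or fourth.
    print(cmap[ord("a")][:5])               # [1, 0, 0, 0, 1]
    print(may_contain_prefix(cmap, "ba"))   # False: no value starts with 'b'
    print(may_contain_prefix(cmap, "al"))   # True: pack must be inspected
```

As with the histogram, a `True` answer only means the pack cannot be excluded; the map can produce false positives but never false negatives for the characters it tracks.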

Pack-to-pack is generated during join operations. It is a bitmap, that is, a binary matrix, that records for each pair of DPs from the two joined columns whether the two DPs are related.
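The binary matrix can be pictured with a small Python sketch. It assumes an equality join and treats two DPs as related when they share at least one value; how Infobright actually derives the matrix is not described in the source, and `build_pack_to_pack` is a hypothetical name.

```python
from typing import List, Sequence


def build_pack_to_pack(left_packs: Sequence[Sequence[int]],
                       right_packs: Sequence[Sequence[int]]) -> List[List[int]]:
    """matrix[i][j] == 1 if left pack i and right pack j share a value."""
    right_sets = [set(pack) for pack in right_packs]
    matrix = []
    for pack in left_packs:
        left_set = set(pack)
        matrix.append([0 if left_set.isdisjoint(rs) else 1 for rs in right_sets])
    return matrix


if __name__ == "__main__":
    left = [[1, 2, 3], [10, 11]]         # packs of the left join column
    right = [[3, 4], [20, 21], [11, 12]]  # packs of the right join column
    for row in build_pack_to_pack(left, right):
        print(row)
    # [1, 0, 0]
    # [0, 0, 1]
```

With such a matrix, a join only needs to compare the pack pairs marked 1; here that is two pairs out of six.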

The Knowledge Grid is more complex than this summary suggests, and many details are left out here. For more information, refer to the official white paper and to "Brighthouse: An Analytic Data Warehouse for Ad-hoc Queries".
