The BKJIA database channel has previously covered Infobright, a column-oriented database. The overall architecture of Infobright is shown below:
As the architecture diagram shows, Infobright adopts the same architecture as MySQL and is divided into two layers: the upper layer handles service and application management, and the lower layer is the storage engine. The default storage engine of Infobright is Brighthouse, but Infobright also supports other storage engines such as MyISAM, MRG_MyISAM, Memory, and CSV. Infobright organizes data through three layers: DP (Data Pack), DPN (Data Pack Node), and KN (Knowledge Node). Above these layers sits the very powerful Knowledge Grid.
The Data Pack (DP) is the lowest layer of storage. Every 64K values in a column form one DP. A DP is finer-grained than a whole column, which gives a better compression ratio, and coarser-grained than a single value, which gives better query performance.
The Data Pack Node (DPN) has a one-to-one relationship with the DP. Each DPN records statistics about the data stored and compressed in its DP, including the maximum value, minimum value, number of null values, total number of values, and sum.
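The relationship between a column, its DPs, and their DPNs can be illustrated with a small Python sketch. This is not Infobright's implementation: the names DataPackNode, split_into_packs, and build_dpn are hypothetical, and the column is assumed to hold integers with None standing in for NULL.

```python
from dataclasses import dataclass
from typing import List, Optional

PACK_SIZE = 64 * 1024  # values per Data Pack, as stated in the text


@dataclass
class DataPackNode:
    """Statistics kept for one Data Pack (field names are illustrative)."""
    min_value: Optional[int]
    max_value: Optional[int]
    null_count: int
    row_count: int
    total: int


def split_into_packs(column: List[Optional[int]],
                     pack_size: int = PACK_SIZE) -> List[List[Optional[int]]]:
    """Cut a column into consecutive packs of at most pack_size values."""
    return [column[i:i + pack_size] for i in range(0, len(column), pack_size)]


def build_dpn(pack: List[Optional[int]]) -> DataPackNode:
    """Compute the per-pack statistics listed in the text."""
    present = [v for v in pack if v is not None]
    return DataPackNode(
        min_value=min(present) if present else None,
        max_value=max(present) if present else None,
        null_count=len(pack) - len(present),
        row_count=len(pack),
        total=sum(present),
    )


# A tiny pack size keeps the example readable.
column = [5, None, 12, 7, None, 30]
packs = split_into_packs(column, pack_size=4)
dpns = [build_dpn(p) for p in packs]  # one DPN per DP
```

Because each DPN already holds the minimum, maximum, count, and sum, an aggregate over a whole pack can in principle be answered from the DPN without decompressing the pack itself.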
The KN stores metadata describing the relationships between DPs and columns, such as the value range (Min_Max) and associations between column data. Most KN data is generated when data is loaded; the rest is generated at query time.
On top of these layers is the Knowledge Grid. The Knowledge Grid architecture is an important reason for Infobright's high performance.
The Knowledge Grid can be divided into four parts: DPN, Histogram, CMAP, and P-2-P.
The DPN was described above. The Histogram improves query performance for numeric types such as date, time, and decimal, and is generated when data is loaded. The DPN holds the min and max values; the Histogram divides the min-max range into 1024 segments. If the min-max range is smaller than 1024, each segment corresponds to a single value. In this case, the KN is a binary representation of whether a value occurs in each segment.
The Histogram makes it possible to quickly determine whether the current DP can satisfy the query conditions. For example, as shown in the figure, for the query select id from customerInfo where id > 50 and id < 70, it is easy to see that the current DP does not satisfy the conditions. The Histogram can therefore effectively reduce the number of DPs that must be read for numeric queries.
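The pruning described above can be sketched as follows. This is a simplified model rather than Infobright's actual code: the function names and the dict layout are assumptions, and only integer values are handled. The sketch keeps one presence bit per segment and answers a range predicate such as id > 50 and id < 70 with either "cannot match" or "may match".

```python
from typing import List

SEGMENTS = 1024  # number of histogram segments, as stated in the text


def _segment_of(value: int, lo: int, hi: int) -> int:
    """Map a value in [lo, hi] to its segment index."""
    span = hi - lo + 1
    if span <= SEGMENTS:
        return value - lo  # narrow range: one segment per distinct value
    return (value - lo) * SEGMENTS // span


def build_histogram(values: List[int]) -> dict:
    """Build a presence bitmap over the pack's min-max range."""
    lo, hi = min(values), max(values)
    bits = [0] * min(SEGMENTS, hi - lo + 1)
    for v in values:
        bits[_segment_of(v, lo, hi)] = 1
    return {"min": lo, "max": hi, "bits": bits}


def may_match_open_range(hist: dict, low: int, high: int) -> bool:
    """Return False only when no value v with low < v < high can be in the pack."""
    first = max(low + 1, hist["min"])
    last = min(high - 1, hist["max"])
    if first > last:
        return False  # the predicate lies entirely outside min..max
    s0 = _segment_of(first, hist["min"], hist["max"])
    s1 = _segment_of(last, hist["min"], hist["max"])
    return any(hist["bits"][s0:s1 + 1])


# Min and max alone (10..90) cannot rule out 50 < id < 70; the bitmap can.
hist = build_histogram([10, 20, 45, 75, 90])
skip_pack = not may_match_open_range(hist, 50, 70)
```

A False result is definitive, so the pack can be skipped. A True result only means the pack has to be opened and checked, because a set bit in a wide segment does not say which exact values are present.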
CMAP is used for text queries and is also generated when data is loaded. CMAP records which ASCII characters occur at each of positions 1 through 64 of the text values in the current DP, as shown in the figure.
For example, the figure above shows that A never appears in the second, third, or fourth position of the text; 0 indicates absence and 1 indicates presence. Text comparisons in queries are performed byte by byte, so CMAP can improve the performance of text queries.
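A character map of this kind can be modeled as a table of bits indexed by character and position. The Python sketch below rests on assumptions: build_cmap and may_contain are hypothetical names, the alphabet is limited to the 128 ASCII code points, and only an exact-match predicate is shown.

```python
from typing import Iterable, List

MAX_POSITIONS = 64   # positions tracked per character, as stated in the text
ALPHABET_SIZE = 128  # ASCII code points (an assumption of this sketch)


def build_cmap(strings: Iterable[str]) -> List[List[int]]:
    """cmap[code][pos] is 1 if character `code` occurs at index `pos` in any string."""
    cmap = [[0] * MAX_POSITIONS for _ in range(ALPHABET_SIZE)]
    for s in strings:
        for pos, ch in enumerate(s[:MAX_POSITIONS]):
            code = ord(ch)
            if code < ALPHABET_SIZE:
                cmap[code][pos] = 1
    return cmap


def may_contain(cmap: List[List[int]], candidate: str) -> bool:
    """Return False only when the pack cannot hold `candidate` exactly."""
    for pos, ch in enumerate(candidate[:MAX_POSITIONS]):
        code = ord(ch)
        # Non-ASCII characters are not tracked, so they never exclude a pack.
        if code < ALPHABET_SIZE and cmap[code][pos] == 0:
            return False
    return True


# "A" occurs only in the first position of these values.
cmap = build_cmap(["ACE", "ANT", "BOX"])
a_row = cmap[ord("A")][:4]             # bits for positions 1 through 4
can_skip = not may_contain(cmap, "BAT")  # "A" never occurs in position 2
```

As with the Histogram, the map is only a filter: may_contain(cmap, "ACT") returns True even though "ACT" is not stored, because each of its characters occurs at the matching position in some other value, so the pack would still have to be read.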
Pack-to-Pack (P-2-P) is generated during join operations. It is a bitmap, that is, a binary matrix, that represents the relationship between the DPs of the two columns involved in the join.
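One way to picture such a binary matrix is sketched below. The text does not spell out what a set bit means, so the sketch assumes an equality join in which entry [i][j] is 1 when pack i of the left column and pack j of the right column share at least one value. The function names are hypothetical, and the matrix is built by brute force purely for illustration.

```python
from typing import List, Tuple


def build_pack_to_pack(left_packs: List[List[int]],
                       right_packs: List[List[int]]) -> List[List[int]]:
    """matrix[i][j] is 1 when left pack i and right pack j share a join value."""
    right_sets = [set(p) for p in right_packs]
    matrix = []
    for lp in left_packs:
        left_set = set(lp)
        matrix.append([0 if left_set.isdisjoint(rs) else 1 for rs in right_sets])
    return matrix


def pack_pairs_to_join(matrix: List[List[int]]) -> List[Tuple[int, int]]:
    """List the (left, right) pack index pairs a later join still has to visit."""
    return [(i, j)
            for i, row in enumerate(matrix)
            for j, bit in enumerate(row) if bit]


left = [[1, 2, 3], [10, 11]]
right = [[3, 4], [20, 21], [11, 12]]
p2p = build_pack_to_pack(left, right)
pairs = pack_pairs_to_join(p2p)
```

In this example only two of the six possible pack pairs have a set bit, so a later join over the same columns would need to examine just those two pairs.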
The Knowledge Grid is fairly complex and has many more details than are covered here. For those, refer to the official white paper and to Brighthouse: an analytic data warehouse for ad-hoc queries.