Kylin: configuration optimizations for hierarchies and derived dimensions

Source: Internet
Author: User

http://blog.csdn.net/jiangshouzhuang/article/details/51286150

Hierarchies:

In theory, n dimensions give 2^n possible dimension combinations. Some of these combinations, however, are not needed. For example, suppose we have three dimensions, continent, country, and city, arranged as a hierarchy with the coarsest level first. For drill-down analysis we only need the following three combinations:
GROUP BY continent
GROUP BY continent, country
GROUP BY continent, country, city

In this example the number of dimension combinations drops from 2^3 = 8 to 3, which is a significant optimization. The same approach applies to hierarchies such as year, quarter, month, date.
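The counting above can be illustrated with a small Python sketch. It only illustrates the arithmetic and is not Kylin's actual cuboid planner; the function names are hypothetical. It counts non-empty groupings, so the full set has 2^3 - 1 = 7 members (the 8 in the text also counts the empty grouping), while a hierarchy keeps only its prefixes.

```python
from itertools import combinations


def all_combinations(dims):
    """Every non-empty subset of dims: 2^n - 1 groupings (2^n with the empty one)."""
    return [c for k in range(1, len(dims) + 1) for c in combinations(dims, k)]


def hierarchy_combinations(hierarchy):
    """Only the prefixes of an ordered hierarchy (coarsest level first)."""
    return [tuple(hierarchy[:k]) for k in range(1, len(hierarchy) + 1)]


dims = ["continent", "country", "city"]
print(len(all_combinations(dims)))  # 7 non-empty groupings
print(hierarchy_combinations(dims))
# [('continent',), ('continent', 'country'), ('continent', 'country', 'city')]
```

For an n-level hierarchy the number of groupings grows linearly (n) instead of exponentially, which is why hierarchies like year, quarter, month, date benefit so much.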

If the hierarchy is H1, H2, H3 (from coarsest to finest), there are two typical scenarios:
A. Hierarchy on the lookup table
Fact table                  (joins)    Lookup table

column1, column2, ..., FK              PK, ..., H1, H2, H3, ...

B. Hierarchy on the fact table
Fact table
column1, column2, ..., H1, H2, H3, ...

Scenario A has a special case in which the lookup table's PK happens to be part of the hierarchy. For example, consider a calendar lookup table whose primary key is Cal_dt:
A*. Hierarchy on the lookup table, including its primary key
Lookup table (calendar)
Cal_dt (PK), Week_beg_dt, Month_beg_dt, Quarter_beg_dt, ...

For case A*, use the "Derived Columns" optimization instead.

Derived Columns:
A derived column can be used when one or more dimensions (which must be columns of a lookup table and are called "derived") can be deduced from another column (usually the associated FK, called the "host column").
For example, suppose a lookup table is joined to the fact table with the condition "WHERE dimA = dimX". Note that in Kylin, if you choose the FK as a dimension, the corresponding PK becomes queryable automatically at no extra cost. Because the FK and PK values are always identical, Kylin can apply filters and GROUP BY on the FK first and then transparently substitute the PK. This means that if we want dimA (FK), dimX (PK), dimB, and dimC in the cube, we can safely select only dimA, dimB, and dimC.
Fact table                        (joins)    Lookup table

column1, column2, ..., dimA (FK)             dimX (PK), dimB, dimC
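A minimal Python sketch of why selecting the FK is enough, using made-up sample rows (the data and the helper inner_join are hypothetical, not part of Kylin): after an inner join on dimA = dimX the two keys are equal on every row, so an aggregate built on dimA alone also answers grouping or filtering on dimX.

```python
from collections import Counter

# Hypothetical sample data: the fact table holds the FK dimA;
# the lookup table is keyed by its PK dimX and carries dimB.
fact_rows = [{"dimA": 1}, {"dimA": 2}, {"dimA": 3}, {"dimA": 4}, {"dimA": 1}]
lookup = {1: "A", 2: "B", 3: "C", 4: "A"}  # dimX (PK) -> dimB


def inner_join(fact_rows, lookup):
    """Join on dimA = dimX; every joined row carries both keys."""
    return [
        {"dimA": row["dimA"], "dimX": row["dimA"], "dimB": lookup[row["dimA"]]}
        for row in fact_rows
        if row["dimA"] in lookup
    ]


joined = inner_join(fact_rows, lookup)

# Pre-aggregate on the FK only, as a cube built with dimA (but not dimX) would.
cuboid_on_fk = Counter(r["dimA"] for r in joined)

# Grouping by the PK gives the same result, because dimX == dimA on every joined row.
group_by_pk = Counter(r["dimX"] for r in joined)
print(group_by_pk == cuboid_on_fk)  # True

# A filter on the PK (dimX = 1) can likewise be answered from the FK aggregate.
print(cuboid_on_fk[1], sum(1 for r in joined if r["dimX"] == 1))  # 2 2
```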

Here dimA (which stands for the FK/PK pair) has a special mapping to dimB:
dimA   dimB   dimC
1      A      ?
2      B      ?
3      C      ?
4      A      ?
Given a value of dimA, the value of dimB is determined, so we say dimB can be derived from dimA. When building a cube that contains dimA and dimB, we only need to include dimA as a normal dimension and mark dimB as derived. The derived column (dimB) does not take part in cuboid generation:
Original combinations:

ABC, AB, AC, BC, A, B, C

Combinations when deriving B from A:

AC, A, C
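The pruning can be sketched in Python as follows. The helper names are hypothetical and the letters A, B, C stand for dimA, dimB, dimC; is_derivable checks the condition from the text (each dimA value maps to exactly one dimB value in the lookup table), and cuboids lists the non-empty combinations in the order used above. This illustrates the idea only and is not how Kylin itself plans cuboids.

```python
from itertools import combinations


def cuboids(dims):
    """All non-empty dimension combinations, largest first, as strings like 'AB'."""
    return ["".join(c) for k in range(len(dims), 0, -1) for c in combinations(dims, k)]


def is_derivable(lookup_rows, host, derived):
    """True if every host value maps to exactly one derived value."""
    seen = {}
    for row in lookup_rows:
        # setdefault stores the first derived value seen for this host value
        # and returns it; any later mismatch breaks the dependency.
        if seen.setdefault(row[host], row[derived]) != row[derived]:
            return False
    return True


rows = [
    {"dimA": 1, "dimB": "A"},
    {"dimA": 2, "dimB": "B"},
    {"dimA": 3, "dimB": "C"},
    {"dimA": 4, "dimB": "A"},
]
dims = ["A", "B", "C"]
derived = {"B"} if is_derivable(rows, "dimA", "dimB") else set()
reduced = [d for d in dims if d not in derived]

print(cuboids(dims))     # ['ABC', 'AB', 'AC', 'BC', 'A', 'B', 'C']
print(cuboids(reduced))  # ['AC', 'A', 'C']
```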

At runtime, consider a query such as "SELECT COUNT(*) FROM fact_table INNER JOIN lookup1 GROUP BY lookup1.dimB" (join condition omitted). It would normally be answered from a cuboid that contains dimB, but because of the derived optimization no cuboid contains dimB. Kylin therefore modifies the execution plan and first groups by dimA (the host column), which yields an intermediate result such as:
dimA   COUNT(*)
1      1
2      1
3      1
4      1
Kylin then replaces each dimA value with the corresponding dimB value. Because both columns live in the lookup table, Kylin can load the entire lookup table into memory and build the mapping. The intermediate result becomes:
dimB   COUNT(*)
A      1
B      1
C      1
A      1
Finally, the SQL engine (Apache Calcite) aggregates the intermediate result further to produce the final result:
dimB   COUNT(*)
A      2
B      1
C      1
This step happens during query execution, which is why the derived-column optimization comes "at the cost of extra runtime aggregation".
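The three runtime steps can be simulated in a few lines of Python. This is a sketch of the idea only, not Kylin's query engine; the variable and function names are hypothetical and the numbers are the example values above. Re-aggregating by summing works here because COUNT(*) is additive.

```python
from collections import Counter

# Cuboid on the host column dimA: pre-aggregated COUNT(*) per dimA value.
cuboid_dimA = {1: 1, 2: 1, 3: 1, 4: 1}
# Lookup table loaded into memory as a mapping: dimA (host) -> dimB (derived).
lookup_dimA_to_dimB = {1: "A", 2: "B", 3: "C", 4: "A"}


def query_count_by_derived(cuboid, mapping):
    # Steps 1-2: read the host-column rows and swap each dimA for its dimB value.
    intermediate = [(mapping[a], cnt) for a, cnt in cuboid.items()]
    # Step 3: re-aggregate, since several dimA values can map to the same dimB.
    final = Counter()
    for b, cnt in intermediate:
        final[b] += cnt
    return intermediate, dict(final)


intermediate, final = query_count_by_derived(cuboid_dimA, lookup_dimA_to_dimB)
print(intermediate)  # [('A', 1), ('B', 1), ('C', 1), ('A', 1)]
print(final)         # {'A': 2, 'B': 1, 'C': 1}
```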
