Data Warehouse practice based on Hadoop ecosystem-advanced Technology (17)

Source: Internet
Author: User
Tags hadoop ecosystem

17. Segmented Dimensions
This section describes the implementation techniques for segmented dimensions. A segmented dimension contains segments of successive values. For example, the annual sales order segment dimension may contain three files called "Low", "medium", "High", and each file definition is 0.01 to 15000, 15000.01 to 30000.00, 30000.01 to 99999999.99. If a customer's annual sales order amount is 10000, it is classified as a "low" file.
A segmented dimension can store multiple segments of a collection. For example, there might be a segmented collection for promotional analysis, another for market segmentation, and possibly one for sales regional planning. Segments are typically user-defined and are rarely available directly from the transaction source data.

1. Annual Sales order star mode
This section describes how to implement an annual order segmentation dimension. Two new star patterns are required, as shown in. The fact table of the star pattern uses (correlates to) the existing Customer_dim and a new Year_dim table. The year dimension is a subset of the date dimension. Annual_customer_segment_fact is the only table that uses the Annual_order_segment_dim table. Annual_order_segement_dim is a segmented dimension table.

The Annual_order_segment_dim table stores multiple segmented collections. In the following example, the two segmented collection "Project Alpha" and "grid" are imported into the Annual_order_segment_dim table. Both of these staging sets are categorized according to the user's annual sales order amount. Project Alpha is divided into six segments and the grid is divided into three segments. The following table shows an example of this segment.

Segment Name

Band Name

Start Value

End Value

PROJECT ALPHA

Bottom

0.01

2500.00

PROJECT ALPHA

Low

2500.01

3000.00

PROJECT ALPHA

Mid-Low

3000.01

4000.00

PROJECT ALPHA

Mid

4000.01

5500.00

PROJECT ALPHA

Mid-High

5500.01

6500.00

PROJECT ALPHA

Top

6500.01

99999999.99

Grid

Low

0.01

3000.00

Grid

MED

3000.01

6000.00

Grid

High

6000.01

99999999.99


Each fragment has a start value and an end value. The granularity of the segment is the gap between this paragraph and the next segment. The granularity must be the smallest possible value for the measure, and in the example of the sales order amount is 0.01. The end value of the last fragment is the maximum possible value for the sales order amount. The following script is used to establish a segmented dimension Data Warehouse pattern.
Use DW;      CREATE TABLE Annual_order_segment_dim (segment_sk int, segment_name varchar (), Band_name varchar (50), Band_start_amount Decimal (10,2), Band_end_amount decimal (10,2), version int, effective_date date, Expiry_ Date date) clustered by (Segment_sk) into 8 buckets stored as ORC tblproperties (' transactional ' = ' true '); INSERT into Annual_order_segment_dim values (1, ' Project Alpha ', ' bottom ', 0.01, 2500.00, 1, ' 1900-01-01 ', ' 2200-01-01 '); INSERT into Annual_order_segment_dim values (2, ' Project Alpha ', ' low ', 2500.01, 3000.00, 1, ' 1900-01-01 ', ' 2200-01-01 '); INSERT into Annual_order_segment_dim values (3, ' Project Alpha ', ' mid-low ', 3000.01, 4000.00, 1, ' 1900-01-01 ', ' 2200-01-0 1 '); INSERT into Annual_order_segment_dim values (4, ' Project Alpha ', ' mid ', 4000.01, 5500.00, 1, ' 1900-01-01 ', ' 2200-01-0 1 '); INSERT into Annual_order_segment_dim values (5, ' Project Alpha ', ' Mid_high ', 5500.01, 6500.00, 1, ' 1900-01-01 ', ' 2200 -01-01 '); INSERT intoAnnual_order_segment_dim VALUES (6, ' Project Alpha ', ' top ', 6500.01, 99999999.99, 1, ' 1900-01-01 ', ' 2200-01-01 ');  INSERT into Annual_order_segment_dim values (7, ' grids ', ' low ', 0.01, 3000, 1, ' 1900-01-01 ', ' 2200-01-01 '); INSERT into Annual_order_segment_dim values (8, ' grid ', ' med ', 3000.01, 6000.00, 1, ' 1900-01-01 ', ' 2200-01-01 '); Insert I  Nto Annual_order_segment_dim VALUES (9, ' grid ', ' High ', 6000.01, 99999999.99, 1, ' 1900-01-01 ', ' 2200-01-01 ');    CREATE TABLE Year_dim (Year_sk int, year int);    CREATE TABLE annual_sales_order_fact (customer_sk int, Year_sk int, annual_order_amount decimal (10, 2)); CREATE TABLE annual_customer_segment_fact (segment_sk int, Customer_sk int, Year_sk int);

2. Initial load
This section describes the initial load and tests. The following initial load script imports the data from the Order_date dimension table (a view of the Date_dim table) into the Year_dim table, importing the data from the Sales_order_fact table into annual_sales_order_fact tables. Import the data from the Annual_sales_order_fact table into the annual_customer_segment_fact tables. This script loads all historical data.
Use DW;    INSERT INTO Year_dim  Select Row_number () over (order by T1.year) + T2.sk_max, year from  (select distinct year From Order_date_dim) T1 cross joins (select COALESCE (max (Year_sk), 0) Sk_max from Year_dim) T2; INSERT INTO Annual_sales_order_fact  select A.customer_sk,        year_sk,        sum (order_amount) from    Sales_ Order_fact A,       year_dim C,       Order_date_dim d   where A.order_date_sk = D.order_date_sk and     c.year = D.year     and D.year <   GROUP by A.customer_sk, C.year_sk;    INSERT INTO Annual_customer_segment_fact  select D.segment_sk,        a.customer_sk,        A.year_sk    from Annual_sales_order_fact A,        Annual_order_segment_dim D   where Annual_order_amount >= band_start_amount and     Annual_order_amount <= band_end_amount;  
Executes the initial load script, querying the Annual_customer_segment_fact table to confirm that the initial load was successful.
Select A.customer_sk CSK,       a.year_sk Ysk,       Annual_order_amount amt,       segment_name sn,       band_name bn  From Annual_customer_segment_fact A,       Annual_order_segment_dim B,       Year_dim C,       annual_sales_order_fact D where A.segment_sk = B.segment_sk and   A.year_sk = C.year_sk and   A.customer_sk = D.customer_sk   and A.year_sk = D.year_skcluster by CSK, Ysk, Sn, BN;
The query results are as shown.

Note that this is done by the Client Agent key Customer_sk grouping sum to determine the segmentation, the actual situation may be grouped by customer_number, because no matter how the customer's SCD attribute changes, is generally considered to be a customer.

3. Regular loading
This section explains how to load the script regularly and how to test it. The periodic load is similar to the initial load, except that you do not need to load the Year_dim table. The data in the Annual_sales_order_fact table is imported into the annual_customer_segment_fact tables. The following periodic load script is scheduled to be executed each year, which loads sales data for the previous year.
Use DW;  INSERT INTO Annual_sales_order_fact  select A.customer_sk,        year_sk,        sum (order_amount) from    Sales_ Order_fact A,       year_dim C,       Order_date_dim d   where A.order_date_sk = D.order_date_sk and     c.year = d.year< C9/>and d.year = year (current_date) – 1   GROUP by A.customer_sk, C.year_sk;    INSERT INTO Annual_customer_segment_fact  select D.segment_sk,        a.customer_sk,        C.year_sk    from Annual_sales_order_fact A,        year_dim C,       Annual_order_segment_dim d   where A.year_sk = C.year_sk   and c.year = year (current_date)-1 and   Annual_order_amount >= band_start_amount and     Annual_order_amount & Lt;= Band_end_amount;


Data Warehouse practice based on Hadoop ecosystem-advanced Technology (17)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.