Slow change dimension (slowly changing dimension)

Source: Internet
Author: User

Dimensional modeling of the Data Warehouse, there is a concept called slowly changing Dimensions, Chinese is generally translated into "slow change dimension", often abbreviated to SCD. The slow change dimension is proposed because in the real world, the attributes of the dimension are not static, and it changes slowly over time. This time-varying dimension is commonly referred to as a slow-change dimension, and the problem of dealing with the historical change information of dimension tables is referred to as the problem of processing slow-change dimensions, sometimes referred to as the problem of dealing with SCD.

Personal feeling the definition of slow change dimension on the wiki is more detailed, so translate this article and add personal views. For the original, please refer to: https://en.wikipedia.org/wiki/Slowly_changing_dimension

Type 0:

Type 0 is passive and does not have any action if the dimension value changes. Dimension values retain only the values that are inserted for the first time. It is seldom used now.

Type 1:

This approach overwrites old and new data, and therefore does not track historical data.

For example, table supplier

Supplier_key

Supplier_code

Supplier_name

Supplier_state

123

Abc

ACME Supply Co

Ca

In the example above, Supplier_code is the default primary key, and Supplier_key is the surrogate key. Technically, surrogate keys are not necessary because there is a unique primary key (Supplier_code). However, to optimize the performance of a join, use an integer instead of a character key (unless the number of bytes in the key of the medium character is less than the number of bytes in the integer key).

Assuming Supplier_state is adjusted from CA to IL, this record will be overwritten:

Supplier_key

Supplier_code

Supplier_name

Supplier_state

123

Abc

ACME Supply Co

IL

The disadvantage of the type 1 approach is that there is no history in the Data Warehouse. However it also has an advantage, it is easy to maintain.

If you have an aggregate calculation of state in your fact table, you need to recalculate it when the status changes.

Type 2:

Assuming Supplier_state is adjusted from CA to IL, a new incremental version is created.

Supplier_key

Supplier_code

Supplier_name

Supplier_state

Version.

123

Abc

ACME Supply Co

Ca

0

124

Abc

ACME Supply Co

IL

1

Another way to achieve this is to add an effective date segment.

Supplier_key

Supplier_code

Supplier_name

Supplier_state

Start_date

End_date

123

Abc

ACME Supply Co

Ca

2001-12-1

2015-4-2

124

Abc

ACME Supply Co

IL

2015-4-3

Null

The end date in the second row is null, which is used to represent the current version. But it is more recommended to use an unreachable date as the end date, such as 9999-12-31. This allows you to add indexes to the field and improve query performance, not to replace null values during the query.

Transactions that use a specific surrogate key (Supplier_key) can be permanently bound to a special time period in the dimension table. If you have an aggregate calculation of state in your fact table, you do not need to update it again.

Type 3

This method tracks changes that use separate columns to preserve a limited history. Type 3 retains a limited history because it is limited to specifying the number of columns used to store historical data. The original table structure in Model 1 and type 2 is the same, but Type 3 adds additional columns. In the following example, an additional column is added to the table to record the original state of the supplier_state.

Supplier_key

Supplier_code

Supplier_name

Previous_supplier_state

Effective_date

Current_supplier_state

123

Abc

ACME Supply Co

Ca

2015-4-3

IL

This is used to record the current and most recent history and cannot record earlier values.

Type 4

Type4 is to use history tables to store history. A table is used to store existing data and add a history table to record the change data. They use the same surrogate key.

For example above, the original table is named supplier, and the history table is supplier_history.

Supplier table

Supplier_key

Supplier_code

Supplier_name

Supplier_state

123

Abc

ACME Supply Co

IL

Supplier_history table

Supplier_key

Supplier_code

Supplier_name

Supplier_state

Create Date

123

Abc

ACME Supply Co

Ca

2015-4-3

This method is similar to the database audit table and the capture change data technology feature.

Slow change dimension (slowly changing dimension)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.