Dimensional modeling of the Data Warehouse, there is a concept called slowly changing Dimensions, Chinese is generally translated into "slow change dimension", often abbreviated to SCD. The slow change dimension is proposed because in the real world, the attributes of the dimension are not static, and it changes slowly over time. This time-varying dimension is commonly referred to as a slow-change dimension, and the problem of dealing with the historical change information of dimension tables is referred to as the problem of processing slow-change dimensions, sometimes referred to as the problem of dealing with SCD.
Personal feeling the definition of slow change dimension on the wiki is more detailed, so translate this article and add personal views. For the original, please refer to: https://en.wikipedia.org/wiki/Slowly_changing_dimension
Type 0:
Type 0 is passive and does not have any action if the dimension value changes. Dimension values retain only the values that are inserted for the first time. It is seldom used now.
Type 1:
This approach overwrites old and new data, and therefore does not track historical data.
For example, table supplier
Supplier_key |
Supplier_code |
Supplier_name |
Supplier_state |
123 |
Abc |
ACME Supply Co |
Ca |
In the example above, Supplier_code is the default primary key, and Supplier_key is the surrogate key. Technically, surrogate keys are not necessary because there is a unique primary key (Supplier_code). However, to optimize the performance of a join, use an integer instead of a character key (unless the number of bytes in the key of the medium character is less than the number of bytes in the integer key).
Assuming Supplier_state is adjusted from CA to IL, this record will be overwritten:
Supplier_key |
Supplier_code |
Supplier_name |
Supplier_state |
123 |
Abc |
ACME Supply Co |
IL |
The disadvantage of the type 1 approach is that there is no history in the Data Warehouse. However it also has an advantage, it is easy to maintain.
If you have an aggregate calculation of state in your fact table, you need to recalculate it when the status changes.
Type 2:
Assuming Supplier_state is adjusted from CA to IL, a new incremental version is created.
Supplier_key |
Supplier_code |
Supplier_name |
Supplier_state |
Version. |
123 |
Abc |
ACME Supply Co |
Ca |
0 |
124 |
Abc |
ACME Supply Co |
IL |
1 |
Another way to achieve this is to add an effective date segment.
Supplier_key |
Supplier_code |
Supplier_name |
Supplier_state |
Start_date |
End_date |
123 |
Abc |
ACME Supply Co |
Ca |
2001-12-1 |
2015-4-2 |
124 |
Abc |
ACME Supply Co |
IL |
2015-4-3 |
Null |
The end date in the second row is null, which is used to represent the current version. But it is more recommended to use an unreachable date as the end date, such as 9999-12-31. This allows you to add indexes to the field and improve query performance, not to replace null values during the query.
Transactions that use a specific surrogate key (Supplier_key) can be permanently bound to a special time period in the dimension table. If you have an aggregate calculation of state in your fact table, you do not need to update it again.
Type 3
This method tracks changes that use separate columns to preserve a limited history. Type 3 retains a limited history because it is limited to specifying the number of columns used to store historical data. The original table structure in Model 1 and type 2 is the same, but Type 3 adds additional columns. In the following example, an additional column is added to the table to record the original state of the supplier_state.
Supplier_key |
Supplier_code |
Supplier_name |
Previous_supplier_state |
Effective_date |
Current_supplier_state |
123 |
Abc |
ACME Supply Co |
Ca |
2015-4-3 |
IL |
This is used to record the current and most recent history and cannot record earlier values.
Type 4
Type4 is to use history tables to store history. A table is used to store existing data and add a history table to record the change data. They use the same surrogate key.
For example above, the original table is named supplier, and the history table is supplier_history.
Supplier table
Supplier_key |
Supplier_code |
Supplier_name |
Supplier_state |
123 |
Abc |
ACME Supply Co |
IL |
Supplier_history table
Supplier_key |
Supplier_code |
Supplier_name |
Supplier_state |
Create Date |
123 |
Abc |
ACME Supply Co |
Ca |
2015-4-3 |
This method is similar to the database audit table and the capture change data technology feature.
Slow change dimension (slowly changing dimension)