A concept in a data warehouse for dimensional modeling is slowly changing dimensions, which is often abbreviated as SCD. The slow change dimension is proposed because in the real world, dimension attributes are not static and will change slowly with the loss of time. This time-varying dimension is generally called a slow-changing dimension, and the problem of processing the historical changes of the dimension table is called the problem of processing the slow-Changing Dimension, it is also referred to as SCD.
There are usually three ways to process slow-changing dimensions.
The first method is to directly overwrite the original value. This process is the easiest way to implement. It does not retain historical data and cannot analyze historical changes. The first method is generally referred to as "Type
1 ".
The second method is to add dimension rows. In this way, the support of the proxy key is required. The method is to generate a new dimension record when the dimension attributes change. The primary key is the newly assigned proxy key, and can be associated with the original dimension record through the natural key. The second method is generally referred to as "Type
2 ".
The third method is to add attribute columns. This method adds a column to the attribute for which the historical information needs to be analyzed to record the value before the attribute change. This attribute field uses the type
1. The advantage of this method is that the attribute values of the current and previous changes can be analyzed at the same time. The disadvantage is that only the last change information is retained. The third method is generally referred to as "Type 3 ".
In actual modeling, three methods can be used together, or different methods can be used for different attributes in a dimension table. These methods must be determined based on the actual situation, the purpose is the same, that is, to easily analyze historical changes.
The above data comes from the Kimball data warehouse tool book.
In the real environment, the second method is the most used. In the ETL process, we need to judge and compare the dimension values cyclically. If there are many dimension columns, it takes a lot of time. So here we can use the check sum method. A checksum value is stored before each dimension data load. You only need to compare the checksum value each time. If the value data for the two days is different, a new data entry is inserted in the dimension table, including the effective date of the data. If the data for the two days is the same, it is not processed. This method can greatly reduce the processing time of the system.