Data Warehouse Series: The three types and prototype design of the common slow-changing dimension

Source: Internet
Author: User

In the process of extracting data from an OLTP business database to a DW data warehouse, especially after the first import, the problem is that some data in the business database has been changed to reflect these changes to the Data warehouse. In the Data warehouse, which data should be changed, which can not be changed? With these changes in mind, the dimension tables in the Data warehouse should be designed to meet these needs.

It is clear that the changes in the data in the business database are very natural and normal, such as customer contact, mobile phone number and other information may change with the location of the customer changes, such as the price of goods in different periods of increase and decline in the change. Then in the business database, it will naturally be modified and immediately reflected in the actual business. However, in the Data Warehouse, the main characteristics of its data is static historical data, the second is less change does not delete, the third is regular growth, its role is mainly used for data analysis. Therefore, the analysis of the historical data on the request, there are some data need to be able to reflect the history of the change in the cycle, some data is not needed, then how the data should be controlled.

Suppose that the first time a batch of data was loaded from the business database into the Data Warehouse, then the business database had such a customer's information.

Biwork, a customer who lives in Beijing, is currently a BI development engineer. Suppose Biwork because Beijing air quality PM2.5 and so on reason moved from Beijing to Sanya. So this message should be updated in the business database-

So what should the data warehouse do when the next time you extract such information from a business database? We assume that in the data warehouse to achieve synchronization with the business database, the data Warehouse also directly modify the entry data update. Then we created the report to do some simple statistical analysis, when in the Data Warehouse all the customer biwork sales point to the biwork new location-the city of Sanya, but in fact biwork all previous purchases occurred when Biwork lived in Beijing. This is a very simple example that describes the problems that may arise from data induction and analysis due to changes in some basic information. But sometimes, this kind of scene is certainly possible to exist.

To solve a problem like this, you need to understand a very important concept in the Data Warehouse- Slowly changing the dimension .

Slow gradient Type one (Type 1 SCD)

In the Data warehouse, we can keep the data in the business data and the data warehouse consistent consistently. The Business Key-customerid from the business database can be used in the Customer dimension to track changes in business data, overwriting old business data as soon as changes occur.

The records in the DW get the latest city information from the CustomerID in the business database and are updated directly into the DW.

Slow gradient type two (Type 2 SCD)

Of course, in the Data Warehouse is more of the relative static historical data data collection and analysis, so as possible to maintain the business system from the historical data, can really capture this historical data changes. In the above example, it may be necessary to analyze the results of the biwork in 2012 when the overall level of purchase, but the purchase amount has been reduced since 2013, the reason may be related to the city, the stores in Beijing may be more than in Sanya stores. In a situation like this, it's not easy to update the biwork in the data warehouse directly, but a new piece of data should be added to show that Biwork is now located in Sanya.

But if you just add a new piece of data to the DW, there will still be a new problem, because identifying the customer in the DW is done through CustomerID, the CustomerID from the business database, which is unique. However, a new data in the DW to save the historical information in the business database, there is no guarantee that the data in the DW is unique, other DW data tables associated with this table can not know how to reference this Customer information. In fact, if CustomerID uniquely identifies the Customer as a primary key in the DW, a failure will occur when inserting new data.

So we need to keep the Business key business key because it's the only link to the business database. The part that makes the change is the new addition of a key, a data warehouse keys. In the terminology of the Data Warehouse, the key that uniquely identifies the Data Warehouse table record is called the surrogate key surrogate key, which is usually set to the primary key of the DW table.

In this table above, which--

Customerid-business key business keys, which are used to connect the key to the business database and Data Warehouse, and note that no changes should be made at any time in the business database or Data warehouse. The Dwid-surrogate key surrogate key, typically set to the primary key of the DW dimension table, is used to correlate dimension tables and fact tables within the Data warehouse.

More Wonderful content: http://www.bianceng.cnhttp://www.bianceng.cn/database/basis/

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.