Brief introduction
Data integration is a key concept in the Data warehouse. The design and implementation of the ETL (data extraction, transformation and loading) process is an extremely important part of the Data Warehouse solution. ETL processes are used to extract business data from multiple sources, clean up data, then integrate the data, and load them into the Data Warehouse database to prepare for data analysis.
ETL Process Design
Although the actual ETL design and implementation depends to a great extent on the ETL tools selected for the Data Warehouse project, the advanced systematic ETL design will help to build an efficient and flexible ETL process.
Before delving into the design of the ETL process in the Data Warehouse, remember the rule of ETL: "The ETL process should not modify the data, but should optimize the data." "If you find that you need to make changes to your business data, but you're not sure if these changes will change the meaning of the data itself, consult your customer before starting the ETL process."
Modulation of ETL process design
Because of its procedural characteristics and the possibility of hundreds of or thousands of operations, it is extremely important to design the ETL process in an accurate manner so that they become efficient, scalable, and maintainable. ETL Data conversion operations can be roughly divided into 6 groups or modules: data extraction, verification, cleaning, integration, aggregation and loading. To schedule these groups, follow the steps to get the most simplified, best performance, and easy to modify logical order for this process. The order of execution is shown in the following figure.
Figure 1. Functional module design of ETL data conversion process
In the business requirements and data analysis phases of the project, we created the data mapping information. There are many ways to record data maps; ETL Data mapping table is the best way to guide ETL process design. You can also use this table as a way to communicate with business customers about data mapping and ETL process issues. ETL data map tables have different levels, such as entity level and attribute level. Detailed data mapping information at different levels is in each level. The following table is a simplified example of an entity-level ETL data mapping table. Each "X" in the table represents a link to the operation details or the lower-level data map document.
Table 1. ETL Entity Mapping table
Source |
Verify |
Clean |
Transformation |
Integration |
Gathered |
Goal |
Account Customer |
X |
X |
? |
X |
X |
Customer |
Credit Clients |
X |
X |
X |
Loan Clients |
X |
? |
X |
Checking Account |
X |
X |
? |
X |
X |
Account |
Savings Account |
X |
? |
X |
Credit Accounts |
X |
? |
X |
Loan Account |
X |
X |
? |
Implementation of ETL process in DB2 Data Warehouse
Db2®universal Database™data Warehouse Editions provides improved performance and availability for data warehouse functionality. DB2 Data Warehouse Center (DWC) is a visual ETL design and implementation tool that is part of DB2 UDB. This section will see how to design and implement the warehouse ETL process using DB2 UDB (Version 8.2.1) Data Warehouse Center.