Flexible and effective Data Warehouse solution, part 3rd: Design and implement warehouse ETL process

Source: Internet
Author: User
Tags db2 implement

Brief introduction

Data integration is a key concept in the Data warehouse. The design and implementation of the ETL (data extraction, transformation and loading) process is an extremely important part of the Data Warehouse solution. ETL processes are used to extract business data from multiple sources, clean up data, then integrate the data, and load them into the Data Warehouse database to prepare for data analysis.

ETL Process Design

Although the actual ETL design and implementation depends to a great extent on the ETL tools selected for the Data Warehouse project, the advanced systematic ETL design will help to build an efficient and flexible ETL process.

Before delving into the design of the ETL process in the Data Warehouse, remember the rule of ETL: "The ETL process should not modify the data, but should optimize the data." "If you find that you need to make changes to your business data, but you're not sure if these changes will change the meaning of the data itself, consult your customer before starting the ETL process."

Modulation of ETL process design

Because of its procedural characteristics and the possibility of hundreds of or thousands of operations, it is extremely important to design the ETL process in an accurate manner so that they become efficient, scalable, and maintainable. ETL Data conversion operations can be roughly divided into 6 groups or modules: data extraction, verification, cleaning, integration, aggregation and loading. To schedule these groups, follow the steps to get the most simplified, best performance, and easy to modify logical order for this process. The order of execution is shown in the following figure.

Figure 1. Functional module design of ETL data conversion process

In the business requirements and data analysis phases of the project, we created the data mapping information. There are many ways to record data maps; ETL Data mapping table is the best way to guide ETL process design. You can also use this table as a way to communicate with business customers about data mapping and ETL process issues. ETL data map tables have different levels, such as entity level and attribute level. Detailed data mapping information at different levels is in each level. The following table is a simplified example of an entity-level ETL data mapping table. Each "X" in the table represents a link to the operation details or the lower-level data map document.

Table 1. ETL Entity Mapping table

Source Verify Clean Transformation Integration Gathered Goal
Account Customer X X ? X X Customer
Credit Clients X X X
Loan Clients X ? X
Checking Account X X ? X X Account
Savings Account X ? X
Credit Accounts X ? X
Loan Account X X ?

Implementation of ETL process in DB2 Data Warehouse

Db2®universal Database™data Warehouse Editions provides improved performance and availability for data warehouse functionality. DB2 Data Warehouse Center (DWC) is a visual ETL design and implementation tool that is part of DB2 UDB. This section will see how to design and implement the warehouse ETL process using DB2 UDB (Version 8.2.1) Data Warehouse Center.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.