Data warehouse-ETL

Source: Internet
Author: User
Tags ssis

ETL concepts

The three ETL letters represent extract, transform, and load, namely, extraction, conversion, and loading.

(1) Data Extraction: extract the data required by the target data source system from the source data source system;

(2) Data Conversion: Convert the data obtained from the source data source into the form required by the target data source according to business requirements, and clean and process the error and inconsistency data.

(3) data loading: load the converted data to the target data source.

Originally, ETL is a part of building a data warehouse. It extracts data from distributed and heterogeneous data sources, such as relational data and flat data files, to a temporary middle layer for cleaning, conversion, and integration, finally, it is loaded into a data warehouse or a data set to form the foundation of online analysis and processing and data mining. Nowadays, ETL is increasingly applied to data migration, exchange and synchronization in general information systems.

2. Key Technologies in ETL

The main process of ETL is data extraction, data conversion and processing, and data loading. To implement these functions, various ETL tools generally expand functions, such as workflows, scheduling engines, rule engines, script support, and statistics.

 

3. Mainstream ETL tools

There are two types of ETL tools from the vendor perspective. One is the ETL tools provided by the database vendor, such as Oracle warehouse builder and ORACLE data integrator. Third-party tool providers, such as kettle. The open-source world also has many ETL tools with different functions and different strengths and weaknesses.

(1) ORACLE data integrator (ODI)

Odi, formerly sunopsis active integration platform, was acquired by Oracle at the end of 2006 and renamed ORACLE data integrator. It is mainly used in ETL and data integration scenarios. ODI and Oracle's original ETL Tool owb have some notable features, such as the same as owb's ELT architecture, but it supports more heterogeneous data sources than owb, odi provides a call web service mechanism, and ODI interfaces can also be exposed as web services to interact with the SOA environment. Odi can detect events. An event can trigger an interface process of Odi to complete almost real-time data integration.

Odi has the following features:

A. Use CDC as the way to Change Data Capture.

B. The proxy supports parallel processing and Server Load balancer.

C. Complete permission control and version management functions.

D. Support data quality check, cleaning and recycling of dirty data.

E. Support integration with JMS message-oriented middleware.

F. supports Web Services.

(2) SQL server integration services (SSIS)

SSIS is a new member of SQL Server 2005. In the early versions of SQL Server, it has already taken shape. The name at that time is data conversion Service (DTS ). In the first two versions of SQL Server 2005, SQL Server 7.0 and SQL Server 2000, DTS mainly focuses on extraction and loading. By using DTS, you can extract data from any data source and load the data to any data source. In SQL Server 2005, DTS was redesigned and improved to form SSIs. SSIS provides data-related infrastructure such as control flow, data flow, logs, variables, events, and connection management. A control flow is also called a workflow or a task flow. It is more like a workflow. In a workflow, each component is a task. These tasks are executed in a predefined order. You can have branches in the task flow. The execution result of the current task determines which branch to move forward. Data flow is a new concept. Data streams are also called pipelines to solve data conversion problems. A data stream consists of a set of predefined conversion operations. The starting point of a data stream is usually a data source (source table). The ending point of a data stream is usually the destination (target table) of the data ). The execution of data streams can be considered as a pipeline process. In this process, each line of data is a part to be processed in the assembly line, and each transformation is a processing unit in the assembly line.

 

Data warehouse-ETL

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.