OLAP--ODS Project Summary--process in ETL

Source: Internet
Author: User

Ods-bi in the construction of ETL to occupy 1/3 of the time, deep feelings. The modeling of BI, from the physical data layer, the logical data layer, the business logic layer at all levels, there are many automated tools to handle.

However, the process in ETL must be designed according to the performance. Summarize the next few parts.

1. Data source/Data target management

To determine the table, file, or restful url,odbc of the data source, it is worth noting that when the data source is associated with an extra primary key relationship, the validity of the ETL data is greatly helpful.

Target data, possibly different topics using different databases, the addresses of individual databases, ODBC,JDBC, and other information. Target data does not need to be indexed in most cases, primary key, because

Bulk storage is used when the target database is large, which is the bulk mode, which causes the library to fail.

2.MAPPING Management

The bulk of the workload, the proposed naming method theme _ Dimension (fact) _source_target, wherein the components are also materialized eight doors, which also involves external calls (SHELL, stored procedures, etc.), with filter filter,

Type converter (Express), Listener (Lookup), dispatcher, stored procedure call, connector component (join), and so on. The use of variables and parameters in ETL is more difficult.

Some principles: try to use the WHERE clause in SQL instead of the filter. Do not use the Lookup,join component for large tables. The data source uses the bulk method, and so on. Can look at the principles of online ETL optimization.

MAPPING is the key to ETL performance and consistency.

Management of 3.WORKFLOW

Mapping completed the process of data ETL, but workflow completed is integrated into the system of the sharp weapon.

In workflow, a topic is usually a workflow, and the entire process can be set to complete the incremental extraction through timestamp settings. At the same time the process of statistical data collection, ETL status monitoring information collection. To the last operation of the message sent, or other shell calls. are in the process.

To advanced points, workflow rollback, breakpoint recovery are to be considered, the implementation is very depressed.

4. Management of the Scheduler

Time Window Control tool, the control of the time stamp of the ODS through the scheduler. Time windows are designed and adjusted to meet business peaks and business characteristics.

Some ETL workers have their own scheduler, also can be developed in other languages package scheduler, so more flexible. You can also use the job in the database to complete. Even the refresh of materialized views on the bi side can be placed in the scheduler.

5. Condition monitoring--operation and maintenance system

Ext.: http://www.cnblogs.com/jerryxing/archive/2013/02/20/2918250.html

OLAP--ODS Project Summary--process in ETL

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.