Ods-bi in the construction of ETL to occupy 1/3 of the time, deep feelings. The modeling of BI, from the physical data layer, the logical data layer, the business logic layer at all levels, there are many automated tools to handle.
However, the process in ETL must be designed according to the performance. Summarize the next few parts.
1. Data source/Data target management
To determine the table, file, or restful url,odbc of the data source, it is worth noting that when the data source is associated with an extra primary key relationship, the validity of the ETL data is greatly helpful.
Target data, possibly different topics using different databases, the addresses of individual databases, ODBC,JDBC, and other information. Target data does not need to be indexed in most cases, primary key, because
Bulk storage is used when the target database is large, which is the bulk mode, which causes the library to fail.
2.MAPPING Management
The bulk of the workload, the proposed naming method theme _ Dimension (fact) _source_target, wherein the components are also materialized eight doors, which also involves external calls (SHELL, stored procedures, etc.), with filter filter,
Type converter (Express), Listener (Lookup), dispatcher, stored procedure call, connector component (join), and so on. The use of variables and parameters in ETL is more difficult.
Some principles: try to use the WHERE clause in SQL instead of the filter. Do not use the Lookup,join component for large tables. The data source uses the bulk method, and so on. Can look at the principles of online ETL optimization.
MAPPING is the key to ETL performance and consistency.
Management of 3.WORKFLOW
Mapping completed the process of data ETL, but workflow completed is integrated into the system of the sharp weapon.
In workflow, a topic is usually a workflow, and the entire process can be set to complete the incremental extraction through timestamp settings. At the same time the process of statistical data collection, ETL status monitoring information collection. To the last operation of the message sent, or other shell calls. are in the process.
To advanced points, workflow rollback, breakpoint recovery are to be considered, the implementation is very depressed.
4. Management of the Scheduler
Time Window Control tool, the control of the time stamp of the ODS through the scheduler. Time windows are designed and adjusted to meet business peaks and business characteristics.
Some ETL workers have their own scheduler, also can be developed in other languages package scheduler, so more flexible. You can also use the job in the database to complete. Even the refresh of materialized views on the bi side can be placed in the scheduler.
5. Condition monitoring--operation and maintenance system
Ext.: http://www.cnblogs.com/jerryxing/archive/2013/02/20/2918250.html
OLAP--ODS Project Summary--process in ETL