Technical Architecture Design of the Data Warehouse System
Author: Cheng Xiaoxu
[Statement]
Because the technical solutions described in this article are derived from online production systems, the article does not give complete, detailed technical solutions or source code. You are welcome to discuss the technical solutions and share design experience.
The main function of the data warehouse system is to collect business data from many external systems and store it in the system database. After a series of processing and conversion steps, the system stores all raw data in the base database of the data warehouse. The data is then converted, according to business needs, into the corresponding data marts, where upper-layer data application components can run specialized analyses.
Therefore, from the perspectives of data collection, conversion, storage, analysis, and application, the internal technical components of the system are abstracted into four logical layers: the collection layer, the processing layer, the application layer, and the configuration layer. The technical architecture of the data analysis system is as follows:
[Figure: technical architecture diagram, http://www.bkjia.com/uploads/allimg/131229/1U5413446-0.gif]
The collection layer mainly uses interface adapters, a network file collector and converter, an ESB bus, and related technologies.
The interface adapter technology adapts to the data collection interfaces of the various specialized systems and hides the differences between those interfaces behind the adapter;
The network file collector and converter collects file data from the various specialized systems, mainly network data files in specific formats, and handles formatting, standardization, and other preprocessing of the data;
The ESB bus collects data from standard WebService/JMS interfaces.
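The adapter idea described above can be illustrated with a minimal Python sketch. It is not taken from the production system: the `Record` shape, the two adapter classes, and the field names (`acct_id`, `amount`, the pipe-delimited file format) are all hypothetical and serve only to show how source-specific differences can be hidden behind one interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Common record shape handed to the processing layer (hypothetical)."""
    source: str
    key: str
    value: float


class SourceAdapter(ABC):
    """One adapter per source system; hides that system's own interface."""

    @abstractmethod
    def collect(self) -> list[Record]:
        ...


class BillingAdapter(SourceAdapter):
    """Hypothetical source that returns dicts with its own field names."""

    def __init__(self, rows: list[dict]):
        self._rows = rows

    def collect(self) -> list[Record]:
        return [Record("billing", r["acct_id"], float(r["amount"])) for r in self._rows]


class NetworkFileAdapter(SourceAdapter):
    """Hypothetical source that delivers pipe-delimited text lines."""

    def __init__(self, lines: list[str]):
        self._lines = lines

    def collect(self) -> list[Record]:
        records = []
        for line in self._lines:
            line = line.strip()
            if not line:
                continue  # preprocessing: skip blank lines
            key, value = line.split("|")
            records.append(Record("netfile", key.strip(), float(value)))
        return records


def collect_all(adapters: list[SourceAdapter]) -> list[Record]:
    """The caller sees only Record objects, whatever the source format was."""
    collected = []
    for adapter in adapters:
        collected.extend(adapter.collect())
    return collected
```

Adding a new source system then means writing one more adapter class; the code that consumes `Record` objects does not change.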
The processing layer mainly uses data persistence, J2EE, BI, and related technologies.
Data persistence mainly uses relational database technology and file database technology. Relational database technology stores the business data that the system collects, processes, and computes, as well as the operating parameter configurations and other runtime information of the system. File database technology, that is, a file server, provides storage and access services for all file-type data;
J2EE technology is the container for all B/S logic components, applications, and Web Services in the system. It provides the runtime environment for the various B/S components and controls their lifecycles;
BI metadata management provides configuration management for data aggregation, multi-dimensional analysis, data loading, data production, and the data items, calculation rules, and display methods of the various dynamic statistical analysis reports.
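To make the role of metadata concrete, here is a small sketch of a metadata-driven aggregation in Python. The report names, the metadata keys (`dimension`, `measure`, `rule`), and the rule table are assumptions made for illustration; a real BI product defines its own metadata model.

```python
from collections import defaultdict

# Hypothetical report metadata: which dimension to group by, which measure
# to aggregate, and which calculation rule to apply.
REPORT_METADATA = {
    "sales_by_region": {"dimension": "region", "measure": "amount", "rule": "sum"},
    "avg_order_by_region": {"dimension": "region", "measure": "amount", "rule": "avg"},
}

# Calculation rules are looked up by name, so a report changes by editing
# metadata rather than code.
CALCULATION_RULES = {
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "max": max,
}


def run_report(name, rows, metadata=REPORT_METADATA):
    """Group rows by the configured dimension and apply the configured rule."""
    spec = metadata[name]
    groups = defaultdict(list)
    for row in rows:
        groups[row[spec["dimension"]]].append(row[spec["measure"]])
    rule = CALCULATION_RULES[spec["rule"]]
    return {dim: rule(values) for dim, values in groups.items()}
```

The same `run_report` function serves every report defined in the metadata, which is the point of managing data items and calculation rules as configuration.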
For human-computer interaction, the application layer mainly uses BI, Web, and related technologies.
Web technology provides the system's B/S interface applications, including HTML, CSS, JS, Ajax, and so on;
BI technology provides a wide range of presentation methods for the various dynamic statistical analysis results, such as reports, graphics, and dashboards. It is implemented with third-party standard middleware products, such as SAP BOE (BusinessObjects Enterprise), IBM Cognos, MSTR (MicroStrategy), and other mainstream BI middleware products. For system-to-system interaction, ESB bus technology publishes WebService calls or JMS data in the form of a data bus, and shares service data with external systems through a unified data publishing system.
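The data bus style of publishing can be sketched with an in-memory publish/subscribe class. This is only a stand-in used to show the pattern: it is not an ESB product or a JMS client, and the `DataBus` class and topic names are hypothetical.

```python
from collections import defaultdict
from typing import Callable


class DataBus:
    """In-memory stand-in for a bus topic; not a real ESB or JMS client."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        """Register an external system's handler for one topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message to every subscriber; return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(dict(message))  # each subscriber gets its own copy
        return len(handlers)
```

The publishing side only knows the topic name, so external systems can be added or removed without changes to the data warehouse itself.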
The configuration layer mainly uses a visual, configurable ETL platform, a rule engine, a workflow engine, and related technologies.
ETL visual configuration technology provides visual, configurable ETL processing (data extraction, conversion, and loading). It mainly targets database data, standard-format file data delivered over FTP, and intermediate data collected from network files and preprocessed by the converter;
The rule engine is integrated with the various adapters and collectors to provide centralized management of configuration data, including the interface file collection directories, file name rules, data item conventions, and ETL data extraction rules;
The workflow engine provides a wizard-style configuration process covering the whole flow, including data collection interface configuration, data collection items, ETL parameter configuration, metric association mapping, and BI parameter association mapping.
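As an illustration of centrally managed collection rules, the sketch below keeps collection directories, file name rules, and data item conventions in one table and matches incoming files against it. The directories, file name patterns, and field names are invented for the example and do not come from the production system.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CollectionRule:
    """One centrally managed rule for a source system (hypothetical shape)."""
    system: str
    directory: str          # interface file collection directory
    filename_pattern: str   # file name rule, as a regular expression
    fields: tuple           # data item convention for the file's columns


RULES = [
    CollectionRule("billing", "/data/in/billing", r"bill_\d{8}\.csv", ("acct_id", "amount")),
    CollectionRule("network", "/data/in/network", r"perf_\d{8}_\d{2}\.txt", ("cell_id", "kpi", "value")),
]


def match_rule(path: str, rules=RULES) -> Optional[CollectionRule]:
    """Return the rule whose directory and file name rule match the path."""
    directory, _, filename = path.rpartition("/")
    for rule in rules:
        if directory == rule.directory and re.fullmatch(rule.filename_pattern, filename):
            return rule
    return None  # no rule configured for this file
```

A collector can call `match_rule` for each arriving file and use the returned `fields` to parse it, so changing a file name rule is a configuration change rather than a code change.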
This article is from the "IT bystanders" blog. Please be sure to keep this source: http://cxxsoft.blog.51cto.com/1350418/1083571