Analysis on the relationship between ODS and Edw
Liu Zhijong
(Guangzhou Institute of China Telecom Group Guangzhou 510630)
Summary
This paper focuses on the concept of enterprise operational Data Warehousing (ODS) and enterprise Data Warehouse (EDW), and the relationship between ODS and EDW, including the similarities and differences between the two, the article also describes the industry recognized ODS and EDW two different construction methods are also described separately, and gives the author thinks reasonable construction method.
1 Preface
ODS (Operational Data warehousing) and EDW (Enterprise data warehousing) are important components of China Telecom Enterprise Data architecture, which together constitute the enterprise unified data platform. In the 2007, most provincial telecom companies started the construction of ODS and Edw. The investigation found that the provincial telecom companies in the two systems in the construction process of the two systems in the enterprise data structure of the respective functions and division of the existence of certain doubts and confusion, in order to help you clarify these doubts and confusion, this article on the ODS and Edw in the entire enterprise data structure of the relationship between the detailed elaboration, This includes an analysis of the similarities between the two, and a comparison of different points. Make readers understand the relationship and difference between them in the process of comparison and analysis. At the same time, the paper also expounds and analyses the two viewpoints of how to build ODS and EDW, and gives some suggestions.
2 Enterprise Data Architecture
Edw mainly provides analytical decision service for enterprises. ODS is a useful complement and extension of EDW, which mainly implements the functions of enterprise data integration, sharing and quasi-real-time operation monitoring. The data relationship between production systems, ODS and Edw is shown in 1,
which ADB is the application database; A, B, c indicate different types of data flow: A represents the direct data exchange between application databases in the operating environment, and B represents data exchange between the application databases in the operating environment via ODS; c indicates that data is extracted from the operating environment to the analysis environment.
Operational data in the operating environment of the production system through the ETL (extraction, conversion, loading) process into the ODS, the production system between the quasi-real-time exchange of data by the ODS system, ODS system at the same time will be integrated operating environment of operating data through ETL and other means to Edw. Complete the process of operating data from the operational environment into the analysis environment.
The application database, ODS and EDW of each production system constitute the main body of the whole enterprise data architecture. The following highlights the concepts and roles of the two entities, ODS and EDW, in the enterprise data architecture.
2. 1 The concept and role of ODS
ODS stores detailed data in near real time for operational systems such as OLTP (online transaction processing) systems. The concept of ODS was first proposed by--bill Inmon, the "father of the Data Warehouse". ODS was first introduced in order to find performance requirements that meet fast loading and data consolidation, and to reduce the need for analytics to change and expand the impact of production systems, the solution is to add a data integration layer (also known as the data buffer layer) between production systems and EDW as ODS. Having the function of data integration layer is the main starting point of the concept of ODS. With the development of technology, the function and function of ODS has been extended in recent years, and the prevailing view of the industry is that ODS provides an integrated platform for the storage of the original operational data of the enterprise, and its information comes from different operational application systems. Through the data interface, the information that enters the ODS is reliable and credible under the action of data integration business rules. The integration of data in ODS and the real-time characteristics determine the following 3 main functions of ODS.
A major source of data for Edw is the clean consolidation of operational data and the improved quality of operational data. ODS provides initial cleaning, filtering, and integration of data produced by production systems, storing more detailed and comprehensive enterprise operational data, and the data in the ODS is not only of high data quality, but also more conducive to EDW data acquisition and further conversion, integration and other processing than OLTP systems. is one of the main data sources of Edw.
· Achieve near real-time reporting and querying statistical applications across systems. ODS collects data from different operational application systems. Consolidate shared transaction data across systems to form a holistic view of enterprise-class data. The largest value of ODS is the integration of cross-system data, enabling a number of cross-system reporting and query statistics applications. In addition, ODS can obtain the data it needs from Edw. After EDW statistical analysis of some of the results of data, can be provided to statistical analysts and business personnel for real-time invocation and reference.
· As a data synchronization source for other production systems. ODS captures current and recent transaction data. Data is real-time or quasi-real-time, and data in the ODS can be regularly synchronized with operational system data as needed. Because the data in the ODS is "new". It allows the data to be synchronized with the data in other production systems.
2. 2 The concept and function of EDW
Based on the unified standards and rules of the enterprise, Edw integrates and standardizes the data from different systems from inside and outside the enterprise to eliminate the non-conformance integration and standardization processing (ie, ETL processing), and forms a comprehensive and unified view of enterprise data.
EDW uses multidimensional analysis and data mining methods. Segment the market and customers, support the market management analysis, accurate decision-making and rapid response ability. Provide departmental and enterprise-based, comprehensive data and analytics services for all levels of departments and analytical decision makers. Through the EDW, it solves the problems of data scattered duplication, sharing difficulty and information island, fully exerting the value of data resources, and comprehensively improves the supporting ability of enterprises in the aspects of business decision-making, Operation Management, business development and customer service. The EDW data is subject-oriented,
Integration and non-volatile characteristics determine that EDW has the following two main functions.
· Provide timely, accurate and comprehensive data support and scientific, convenient and systematic analysis tools and methods for business decision-making and marketing at all levels, and provide data support for management, analysis and other requirements in addition to production systems, and realize the separation of business data and analytical data.
To solve the problem that the information acquisition ability and analysis decision-making means of the market and other departments can not adapt to the enterprise environment change and precise management requirements, and through various forms of themes, thematic analysis, support targeted marketing, listing information disclosure, accurate management. Effectively reduce marketing costs, reduce customer churn, look for business opportunities, to enhance the value of enterprises.
3 common points and differences between ODS and EDW
3. 1 The same point as ODS and EDW
It can be seen from the respective concepts and roles of ODS and Edw. ODS and Edw have the following in common.
· Both ODS and EDW are independent systems in the enterprise data architecture, both of which are not directly generating operational data, and the data in the two systems is extracted, transformed, Loaded (ETL) by the data of the operating environment, and further cleanup, integration and other work ( Edw data can be loaded by ODS loading).
· ODS, like Edw, has both fine-grained data. There are summarized data that are summarized according to different dimensions.
· Report class applications based on cross-system consolidation data are available on both ODS and Edw.
3. 2 The difference between ODS and EDW
Although there are some similarities between ODS and Edw. But the two are completely different entities, and the differences between the two are compared from multiple angles.
(1) Using roles
· ODS is mainly for business, channel and other frontline production personnel and frontline managers, in order to achieve quasi-real-time, cross-system operational details of data query, to obtain a fine-grained operational data presentation, such as channel personnel query customer's full view information provided by the ODS data support.
· Edw is mainly for professional analysts, auxiliary decision support staff, etc., in order to achieve statistical analysis based on historical data and data mining
To obtain deep-seated customer characteristics and market development rules, such as the analysis of the business situation of professional analysts by Edw
For support.
(2) Data sources
· Most of the operational data required by ODS is directly sourced from production systems. Some of the analysis results in ODS are derived from EDW, such as customer
Insights, and more.
· The operational data required by EDW, if already present in the ODS, is EDW directly from the ODS. Edw Required operational data, if not available on ODS, EDW directly from the production system to obtain this part of the data.
(3) Data acquisition performance and timeliness
· ODS support OLTP type of data update, data update time is short, data can achieve quasi-real-time update, performance and timeliness are higher than Edw.
· The data in the Edw is generally accessed through bulk loading, the data is slow to update, cannot achieve quasi-real-time update, the data update time is not enough to support real-time reporting and event monitoring requirements.
(4) Data architecture
The ODS organizes the data by focusing on the production-operation process, which focuses on statistics and monitoring.
ODS provides operational data statistics that provide granular operational data that is required by the application. Some coarse-grained aggregate data is also present in the ODS, but the summarized dimensions are small and simple.
Edw is concerned with the deep analysis and mining of historical data. Summarize and organize data according to different topic dimensions from the need of analysis and mining.
EDW provides historical data display and analysis, mainly provides multi-level coarse-grained summary data. The summarized dimensions are many and complex.
(5) Data sharing capability
ODS provides quasi-real-time data sharing services for operational data for other production systems.
Edw generally does not provide such quasi-real-time data sharing services for production systems. The data in the system is only used for the analysis and mining application of this system.
(6) The system provides application data query.
ODS provides data query in production environment, the transaction volume of the query is small, does not consume too much resources, has the definite completion speed. and EDW provides the analysis environment query, the query unit quantity is large, consumes the resources many, the completion speed also is uncertain.
Fixed report.
ODS provides a high-real-time production and Operation report in production environment, while EDW provides analysis and mining reports under analytical environment. Dynamic reports.
ODS provides statistics on fine-grained data for a small number of dimensions, while EDW provides the topic statistics, analysis, and deep-level mining of multi-layered coarse-grained data for multiple dimensions.
ODS provides functions such as performance management and statistics, data quality audits and monitoring management.
EDW provides features such as trend analysis, customer consumer behavior analysis and evaluation.
(7) Data storage
Key Entity data such as customers. ODS preserves current data for a long time, Edw the current and historical data.
Detailed data. The ODS is kept for 1 months to 3 months; Edw is kept for 2 years.
Summarize the data. The ODS is kept for 3 years, and the Edw is kept for 5 years.
Other data. ODS kept for L3 months; Edw kept for 3 years.
(8) Technical characteristics of the system
ODS mainly face the large number of concurrent users, small data access, Edw mainly in the face of small concurrent users, large data volume access.
ODS database optimization focuses on indexing and partitioning technology; Edw database optimization mainly focuses on partitioning technology.
ODS supports data operations of the OLTP type and OLAP (online analytical Processing) type, and EDW supports OLAP type data operations.
(9) System reliability
ODS in operation. Reliability must be ensured.
Relative to ODS. Edw can allow for more offline time.
(1O) System openness
Because of the need to exchange data with a large number of different hardware, database configuration systems. ODS requires a relatively high level of system openness.
Edw generally only gets data. Instead of providing other applications with direct access to multiple modes, a relatively closed database, hardware and software platform can also be used on the solution.
4 ODS and EDW construction programme
From the analysis and comparison of the above-mentioned ODS and Edw, the ODS and Edw are two entities which are completely different from each other. However, in the actual construction of ODS and EDW, the industry has two different voices, one of which is represented by Bill Inmon as an independent system for the construction of ODS. The other is the view that the ODS should be included in the Edw, as represented by Ralph Kimball. As part of the EDW, the building is unified in an independent system. Each of the two options is described below.
The concept of ODS was formally introduced in the book "Building Operational Data Warehousing", written by Bi11 Inmon in 1996. Inmon that analytic decisions need to be based on increasingly real-time and detailed operational data. At the same time, the data must be integrated and subject-oriented. Neither the operational system nor the data Warehouse can meet the corresponding information requirements, so the concept of ODS is put forward, and the independent ODS components are added in the whole IT support system (that is, Inmon Enterprise Information Factory). Bill Inmon proposed the construction of both structures as shown in 2.
As you can see from Figure 2. ODS data is derived from decentralized operational systems, which are integrated in separate ODS. Create subject-oriented, integrated, variable, current-value, detailed operational data in the ODS. Organize storage according to business requirements and performance requirements. and to establish appropriate applications on ODS to meet the requirements of the business. Integrated operational data from ODS is entered into EDW through the ETL process. ODS and Edw were built as two separate systems.
Another point of view, Ralph Kimball, is that in the case of technological developments. Bill Inmon considers the reasons for the existence of ODS alone (ETL limitations do not enable real-time data loading, storage of large amounts of fine-grained data, high-performance queries, and 7x24 h reliability requirements.) Increases the load on the Data warehouse. May even cause a data warehouse crash) to fail. Kimball believes that the software and hardware technologies that support EDW have been developed. Data warehousing technology for Big data storage is no longer a problem. In other words, the Data Warehouse system storage of fine-grained data is also no problem, ETL processing speed is faster, through the high-speed ETL tools can be achieved by any frequency required to extract data into the EDW, and with the development of EDW itself, Edw more and more large. The need to analyze more detailed customer behavior and more specific operational data is also growing. In most cases. Analytical mining must be based on fine-grained data, and fine-grained operational data is increasingly being used in Edw. Thus Kimball thinks in such a case. ODS is no longer necessary as a separate system. Can be seen as the "front edge" of the Data Warehouse system. He redefined ODS as a theme-oriented, integrated, EDW
The storage area of the detail data that is often extended. At the same time, Kimball that the integration of ODS into the environment of the data warehouse would provide greater convenience and benefits to the maintainers and users, including the creation of a separate extraction system. Reduce ETL development and maintenance workloads: Operational detail data is stored in a unified system. Reduce redundant storage of data, and more.
Kimball proposed the construction of both structures as shown in 3.
In the actual construction process of ODS and Edw. Both of these views have different followers. In the system architecture design has been adopted. The authors also once tended to Kimball that ODS should be considered as part of the construction of Edw, but with further research on ODS and Edw. The authors found that Kimball suggested that ODS be built as a part of the EDW, more to be considered. The two aspects of data acquisition frequency and large data volume data storage ability of single system can meet the needs of both ODS and Edw. But whether or not the ODS is built alone requires not only a single system to capture and store a large amount of operational detail data in real time. It is more to consider whether a single system can efficiently support two different types of front-end applications on both ODS and Edw. ODS and EDW need to be carried on the application is very different, in order to more effectively support two different types of applications, the system should be used by the hardware and software technology characteristics are different. If the two are based on the theory of Kimball, they are built in a system. No, absolutely not. But compared with their separate construction. Mixing two different types of work to the same system requires more resources and costs, and more difficult to guarantee service levels, so from the system's stability, performance, cost and other aspects of comprehensive consideration, in principle, the author does not recommend ODS and EDW construction in a system, two entities should be separated as two independent systems construction. However, for the small size of the data, EDW has been completed in the individual provinces, in the EDW database products, hardware equipment, data real-time and application support capabilities and other aspects can better meet the requirements of the function and performance of the ODS application support, the author believes that the ODS and EDW together in a system is also feasible.
Reference documents
1 Lnmon W H. Wang Zhihai and other translations. Data Warehouse (Original book 4th edition). Beijing:
Mechanical industry press. 2006
2 China Telecom Group. China Telecom Ctg-mboss EDA sub-total V1. o specification. 2oo5
3 Baragoin C,marini M,morgan C. Building the operational data store. DB2 UDB IBM redbook,2001
4 Kimball R. Relocating the ODS. DBMS magazine,1997 (10)
5 Lnmon B. The operational data store. 1nfoDB magazine,1995 (2)
Analysis on the relationship between ODS and EDW (reprint)