1. Concept of Data Warehouse
W. H. Inmon, a recognized authority in the data warehouse field, gives a brief but comprehensive definition: a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making process. According to this definition, a data warehouse has the following four key features:
1.1 Subject-oriented data set
Data warehouses are generally organized around major subjects such as products, suppliers, and customers. They focus on modeling and analyzing data for decision makers rather than on an organization's day-to-day operations and transaction processing. A data warehouse therefore provides a concise view of a specific subject, excluding data that is not useful for decision making.
1.2 Integrated data set
A data warehouse is usually built by integrating multiple heterogeneous data sources, which may include relational databases, object-oriented databases, text databases, Web databases, and flat files.
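As an illustration of what integration involves, the sketch below maps records from two differently shaped sources onto one consistent schema. The source layouts, field names, and sample values are all hypothetical, chosen only to show the idea of reconciling naming, date formats, and units.

```python
from datetime import date

# Hypothetical records from two heterogeneous sources. They differ in
# field names, date format, and the unit used for the amount.
crm_rows = [{"cust": "C001", "sale_date": "2023-01-15", "amount_usd": 120.0}]
legacy_lines = ["C002|15/02/2023|8550"]  # day/month/year, amount in cents


def from_crm(row):
    """Map a CRM-style dict onto the common warehouse schema."""
    return {
        "customer_id": row["cust"],
        "sale_date": date.fromisoformat(row["sale_date"]),
        "amount": row["amount_usd"],
    }


def from_legacy(line):
    """Map a pipe-delimited flat-file line onto the common warehouse schema."""
    cust, dmy, cents = line.split("|")
    day, month, year = (int(part) for part in dmy.split("/"))
    return {
        "customer_id": cust,
        "sale_date": date(year, month, day),
        "amount": int(cents) / 100,  # convert cents to dollars
    }


# After conversion, every row has the same keys, types, and units.
warehouse_sales = [from_crm(r) for r in crm_rows] + [
    from_legacy(line) for line in legacy_lines
]
```

Each source gets its own small adapter function, so adding another source means adding another adapter rather than changing the common schema.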
1.3 Time-variant data set
A data warehouse provides information from a historical perspective. It contains a time element, and the information it provides is always associated with time. A data warehouse stores data covering a period of time, not just data at a single point in time.
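One simple way to picture the time element is to attach a date to every stored row and to query "as of" a given date. The following is a minimal sketch with made-up data; the names price_history and price_as_of are illustrative and not part of any product.

```python
from datetime import date

# Each row records the date of the snapshot it belongs to (hypothetical data).
price_history = [
    {"product": "P1", "snapshot_date": date(2023, 1, 1), "price": 10.0},
    {"product": "P1", "snapshot_date": date(2023, 6, 1), "price": 12.0},
    {"product": "P1", "snapshot_date": date(2024, 1, 1), "price": 11.5},
]


def price_as_of(history, product, when):
    """Return the most recent recorded price on or before `when`, or None."""
    candidates = [
        row
        for row in history
        if row["product"] == product and row["snapshot_date"] <= when
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row["snapshot_date"])["price"]
```

Because older rows are kept rather than overwritten, the same question can be answered for any date in the stored period, for example price_as_of(price_history, "P1", date(2023, 7, 15)).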
1.4 Nonvolatile data set
A data warehouse is physically separated from the live application data in the operational environment. It therefore does not require transaction processing, recovery, or concurrency control mechanisms. Data in a data warehouse usually requires only two kinds of operations: initial loading and data access. The data is therefore relatively stable and is rarely or never updated.
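The "load and access only" idea can be sketched as a toy store that offers just those two operations and no update or delete. The class name NonvolatileStore and its methods are hypothetical, intended only to make the property concrete.

```python
class NonvolatileStore:
    """Toy store exposing only the two operations named above: load and access."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Freeze each row into a tuple of (key, value) pairs so that
        # stored data cannot be mutated through the original dicts.
        self._rows.extend(tuple(sorted(row.items())) for row in rows)

    def query(self, predicate=lambda row: True):
        # Hand back fresh dict copies; changing them leaves the store untouched.
        return [dict(row) for row in self._rows if predicate(dict(row))]


store = NonvolatileStore()
store.load([{"id": 1, "qty": 5}, {"id": 2, "qty": 8}])

result = store.query(lambda row: row["qty"] > 6)
```

Since callers receive copies, modifying a query result has no effect on what the store returns next time.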
To sum up, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision-support data model and holds the information required for strategic decision making. A data warehouse is also often viewed as an architecture: by integrating data from different sources, it supports structured and ad hoc queries, analytical reporting, and decision making.
2. Types of data warehouse
Based on the kind of data they manage and the scope of the enterprise problems they address, data warehouses can be divided into three types: the enterprise data warehouse (EDW), the operational data store (ODS), and the data mart.
① An enterprise data warehouse is a general-purpose data warehouse. It contains large amounts of both detailed data and summarized or aggregated data, and that data is stable and historical in orientation. This type of data warehouse is used to support strategic or tactical decisions across multiple areas of the enterprise.
② An operational data store supports decision making on operational data and can also serve as a staging area when data is loaded into a data warehouse. Compared with an EDW, an ODS has the following features: it is subject-oriented and integrated; it is volatile; and it contains only current, detailed data, with no accumulated or historical data.
③ A data mart is a type of data warehouse. It can contain lightly summarized and historical departmental data, and it serves the needs of a particular department within an enterprise. Several data marts can together form an EDW (this is discussed later).
As data warehouses develop, software tools are updated rapidly and new products keep emerging. To keep track of these developments and choose tools well, those building a data warehouse should gather extensive documentation and material on this area.
3. Comparison between data warehouses and traditional databases
Traditional relational databases (RDBs) follow a common relational model: data (records) is stored in tables and can be accessed through a unified Structured Query Language (SQL). Their typical application is online transaction processing (OLTP), where the key is to complete business transactions and respond to customers promptly. Relational databases can handle large volumes of data, but they cannot simply be stacked together and used directly as a data warehouse.
A data warehouse works mainly with multidimensional data, so it is also called a multidimensional database. Data in a multidimensional database is stored in arrays; there is no uniform rule or uniform multidimensional model, and the data can only be organized by category. In terms of use, a multidimensional database should have strong query capabilities. It stores a wide range of information, but because its task is online analytical processing (OLAP), it does not aim for instantaneous response; a response within a bounded time is acceptable. In practice, OLAP includes interactive data queries accompanied by multiple analysis methods, such as drilling down step by step to the underlying detail. Even though it is multidimensional, the information in a data warehouse can still be expressed in tables.
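To make the drill-down idea concrete, the following sketch stores multidimensional data as ordinary table rows and aggregates it at two levels of detail. The dimensions, the sample figures, and the aggregate function are assumptions made for illustration; real OLAP engines are far more elaborate.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions (year, region, product)
# followed by one numeric measure (sales).
DIMS = ("year", "region", "product")
facts = [
    (2023, "East", "A", 100),
    (2023, "East", "B", 50),
    (2023, "West", "A", 70),
    (2024, "East", "A", 120),
]


def aggregate(rows, by):
    """Sum the measure, grouped by the dimensions named in `by`."""
    positions = [DIMS.index(name) for name in by]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]
    return dict(totals)


by_year = aggregate(facts, ["year"])                    # coarse view
by_year_region = aggregate(facts, ["year", "region"])   # drilled down one level
```

Here by_year holds one total per year, and by_year_region splits each of those totals by region. Adding "product" to the grouping list would drill down once more, to the individual fact rows.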
Although data warehouses differ considerably from traditional databases, designing a data warehouse does not mean starting from scratch. Existing operational data processing can be used as the source, and its information integrated to build data warehouses that meet different requirements. In other words, data flows from dynamic, event-driven operational data into a static, historical data warehouse. In theory, expired data could be moved over strategically from the operational data; in practice, storage capacity and technical limitations make such a direct transfer infeasible, so data must be extracted from the operational data and filtered before it enters the data warehouse. For these reasons, and to ensure OLAP performance, a data warehouse must be kept separate from traditional operational data.
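The flow from operational data into the warehouse can be pictured as an extract-and-filter step. The sketch below assumes, purely for illustration, that only orders already closed by a cutoff date are copied and that only the fields useful for analysis are kept; the record layout and the function name extract_for_warehouse are hypothetical.

```python
from datetime import date

# Hypothetical operational orders. Open orders are still changing,
# so they stay in the operational system for now.
operational_orders = [
    {"order_id": 1, "status": "closed", "closed_on": date(2023, 3, 1), "total": 40.0},
    {"order_id": 2, "status": "open", "closed_on": None, "total": 15.0},
    {"order_id": 3, "status": "closed", "closed_on": date(2023, 3, 5), "total": 60.0},
]


def extract_for_warehouse(orders, cutoff):
    """Copy orders closed on or before `cutoff`, keeping only analytic fields."""
    return [
        {"order_id": o["order_id"], "closed_on": o["closed_on"], "total": o["total"]}
        for o in orders
        if o["status"] == "closed" and o["closed_on"] <= cutoff
    ]


# The operational list is left unchanged; the warehouse receives a filtered copy.
warehouse_orders = extract_for_warehouse(operational_orders, date(2023, 3, 3))
```

Keeping the extraction as a copy, rather than a view over live data, reflects the physical separation described above: analytical queries run against warehouse_orders and never touch the operational records.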