What is a data warehouse

Source: Internet
Author: User
There is not yet a single agreed definition of the term data warehouse. W. H. Inmon, a well-known data warehouse expert, gives the following description in his book "Building the Data Warehouse": a data warehouse is a subject-oriented, integrated, non-volatile (relatively stable), time-variant (reflecting historical change) collection of data used to support management decisions.

This concept can be understood at two levels. First, a data warehouse supports decision-making and is oriented toward analytical data processing, which makes it different from an enterprise's existing operational databases. Second, a data warehouse is an effective integration of multiple heterogeneous data sources: after integration the data is reorganized by subject, it includes historical data, and once stored in the data warehouse it is generally no longer modified.

From this concept, a data warehouse has the following four features:
1. Subject-oriented. Data in an operational database is organized around transaction-processing tasks, and each business system is separate from the others. Data in a data warehouse is instead organized by subject area. A subject is an abstract concept: it is the area of focus that users care about when they use the data warehouse to make decisions, and one subject usually relates to several operational information systems.

2. Integrated. Transaction-oriented operational databases are usually tied to specific applications; they are independent of one another and often heterogeneous. Data in the data warehouse is obtained by extracting and cleaning data from these scattered source databases and then systematically processing, summarizing, and consolidating it. Inconsistencies in the source data must be eliminated so that the information in the data warehouse is consistent, global information about the whole enterprise.

3. Relatively stable (non-volatile). Data in an operational database is usually updated in real time and changes as needed. Data in a data warehouse is mainly used for enterprise decision analysis, so the main operation is querying. Once a piece of data enters the data warehouse, it is normally retained for a long time. In other words, a data warehouse typically sees a large number of queries but very few modifications or deletions, and usually only needs periodic loading and refreshing.

4. Reflects historical change (time-variant). An operational database is mainly concerned with data within a particular time period, whereas data in a data warehouse usually contains historical information. The system records the enterprise's information from some point in the past (for example, the time the data warehouse was first put into use) up to the present stage. This information can be used for quantitative analysis and forecasting of the enterprise's development and future trends.
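
The four features can be illustrated with a minimal Python sketch. The two source systems, their field names, and the `sales_fact` list are hypothetical and exist only for illustration; a real data warehouse would use a database and dedicated loading tools rather than in-memory lists.

```python
from datetime import date

# Two hypothetical operational systems that describe the same customers
# with different field names, key formats, and units (heterogeneous sources).
orders_system = [
    {"cust": "C001", "amount_cents": 125000, "day": date(2023, 1, 5)},
    {"cust": "C002", "amount_cents": 40000, "day": date(2023, 1, 6)},
]
retail_system = [
    {"customer_id": "c001", "amount": 300.0, "sold_on": date(2023, 2, 1)},
]

# Subject-oriented: one table for the "sales" subject,
# independent of either source application.
sales_fact = []


def from_orders(rec):
    # Integrated: unify the key format and convert cents to whole units.
    return {"customer_id": rec["cust"].upper(),
            "amount": rec["amount_cents"] / 100,
            "sale_date": rec["day"]}


def from_retail(rec):
    return {"customer_id": rec["customer_id"].upper(),
            "amount": rec["amount"],
            "sale_date": rec["sold_on"]}


def load(rows, loaded_on):
    # Non-volatile: rows are only appended, never updated or deleted.
    for row in rows:
        sales_fact.append({**row, "loaded_on": loaded_on})


# Periodic loads, one per source in this toy example.
load([from_orders(r) for r in orders_system], date(2023, 1, 31))
load([from_retail(r) for r in retail_system], date(2023, 2, 28))


def total_by_month(customer_id):
    # Time-variant: history is kept, so totals per month can be compared.
    totals = {}
    for row in sales_fact:
        if row["customer_id"] == customer_id:
            key = (row["sale_date"].year, row["sale_date"].month)
            totals[key] = totals.get(key, 0) + row["amount"]
    return totals
```

Calling `total_by_month("C001")` combines rows that came from both source systems, which is only possible because the customer key and the amount unit were made consistent before loading.
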
An enterprise data warehouse is built on top of the enterprise's existing business systems and the business data they have accumulated. A data warehouse is not a static concept: information is only useful and meaningful when it reaches, in a timely manner, the users who need it to make better business management decisions. The basic task of a data warehouse is therefore to collect and reorganize information and deliver it to the relevant decision-makers in time. From an industry perspective, building a data warehouse is a project and a process.
The data warehouse system as a whole has a four-level architecture, consisting of the following parts.

[Figure: Data Warehouse System Architecture]
• Data source: The foundation of the data warehouse system and the source of data for the whole system. It typically includes the enterprise's internal and external information. Internal information includes the various kinds of business-processing data and document data stored in RDBMSs. External information includes laws and regulations, market information, competitor information, and so on.

• Data storage and management: The core of the entire data warehouse system. The real key to a data warehouse is how its data is stored and managed. The way a data warehouse organizes and manages data is what distinguishes it from a traditional database, and it also determines how the data is presented externally. Deciding which products and technologies to use to build this core requires analyzing the technical characteristics of a data warehouse. Data from the existing business systems is extracted, cleaned, effectively integrated, and organized by subject. By the scope of data they cover, data warehouses can be divided into enterprise-level data warehouses and department-level data warehouses (often called data marts).

• OLAP server: Effectively integrates the data needed for analysis and organizes it according to a multidimensional model, so that it can be analyzed from multiple angles and at multiple levels and trends can be found. Implementations fall into three kinds: ROLAP, MOLAP, and HOLAP. In ROLAP, both the base data and the aggregated data are stored in an RDBMS. In MOLAP, both the base data and the aggregated data are stored in a multidimensional database. In HOLAP, the base data is stored in an RDBMS and the aggregated data is stored in a multidimensional database.

• Front-end tools: These include reporting tools, query tools, data analysis tools, data mining tools, and application development tools built on data warehouses or data marts. Data analysis tools mainly work against the OLAP server, while reporting tools and data mining tools mainly work against the data warehouse itself.
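
The difference between the OLAP storage approaches described above can be sketched in Python. This is a toy illustration under stated assumptions, not how any particular OLAP product works: the `sales` table, the sample rows, and the `cube` dictionary are all hypothetical, an in-memory SQLite database stands in for the RDBMS, and a plain dictionary stands in for a multidimensional database. As a simplification, the relational side computes aggregates at query time instead of storing them in summary tables.

```python
import sqlite3

# Hypothetical base data with three dimensions: (region, product, year, amount)
rows = [
    ("North", "Laptop", 2022, 100),
    ("North", "Phone", 2022, 50),
    ("South", "Laptop", 2022, 70),
    ("North", "Laptop", 2023, 120),
]

# ROLAP-style: the data lives in relational tables and is aggregated with SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount INTEGER)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)


def rolap_total(region):
    """Sum sales for one region using a relational query."""
    cur = conn.execute("SELECT SUM(amount) FROM sales WHERE region = ?", (region,))
    return cur.fetchone()[0]


# MOLAP-style: aggregates are precomputed for every combination of
# dimension values; "*" stands for "all values" of that dimension.
cube = {}
for region, product, year, amount in rows:
    for r in (region, "*"):
        for p in (product, "*"):
            for y in (year, "*"):
                cube[(r, p, y)] = cube.get((r, p, y), 0) + amount


def molap_total(region="*", product="*", year="*"):
    """Look up a precomputed aggregate; no scan of the base rows is needed."""
    return cube.get((region, product, year), 0)
```

Here `rolap_total("North")` and `molap_total(region="North")` return the same total; the first computes it from the relational table, the second reads it from the precomputed structure. A HOLAP-style arrangement would keep the `sales` table for the base data and something like `cube` for the aggregates.
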
