My view of the data warehouse (concept article)

Source: Internet
Author: User
1. What is a data warehouse
In his book "Building the Data Warehouse", W. H. Inmon defines a data warehouse as a collection of data that is subject-oriented, integrated, stable (non-volatile), and time-variant, built to support management decisions. Put more plainly: data warehouse technology is commonly understood as a distributed database plus a set of constraints, which together form a new way of storing and processing data.

Those constraints are the main subject of the book's formal discussion.
2. The transition from database to data warehouse
Why, with so many database products already available, do people still need data warehouse technology, and why are the industry's big players all flocking to it? The answer is that demand drives how computing technology gets used. The great expansion of database technology in earlier years was driven by the needs of OLTP (On-Line Transaction Processing), whose most pressing technical requirement is fast response. Database technology, especially that built on E. F. Codd's relational theory, organizes data into entities with very little redundancy and then weaves them into an organic whole through relationships, which satisfies the requirements of OLTP very well. Ideally each business operation involves only one entity, an insert or update touches as little of the stored data as possible (record-level locking, for example), and related updates to other entities keep the data consistent and complete through the relationships. Because this theory and technology matched the real needs of OLTP applications at the time, relational database products went on to dominate the market.

Didn't quite follow? No matter, read it a few more times. What it means is this: data is stored classified by purpose and by how often it is used, and different applications access different classes of data. Still don't understand? Then you really are being slow!
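To make the OLTP picture concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and the add_call helper are made up for illustration; they are assumptions, not taken from any real billing system. It shows two low-redundancy entities tied together by a relationship, and a business operation that writes a single row to a single entity.

```python
import sqlite3

def make_oltp_db():
    """Create an in-memory database with two normalized entities (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # let the relationship enforce integrity
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE call_record (
            call_id     INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            started_at  TEXT NOT NULL,
            seconds     INTEGER NOT NULL
        );
    """)
    return conn

def add_call(conn, customer_id, started_at, seconds):
    """One business operation touches one entity: insert a single row."""
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT INTO call_record (customer_id, started_at, seconds) "
            "VALUES (?, ?, ?)",
            (customer_id, started_at, seconds),
        )

conn = make_oltp_db()
with conn:
    conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Example Customer')")
add_call(conn, 1, "2000-11-11T09:30:00", 125)
```

Calling add_call for a customer that does not exist raises sqlite3.IntegrityError, which is the relationship keeping the data consistent.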
3. The data warehouse is an important link in a distributed system
This is a typical distributed database design diagram:

[Diagram not reproduced; the only surviving label is "Operational data".]

Note that the data warehouse is not itself a distributed system; it is one part of one. Once you understand where the data warehouse sits, though, you will see why it is called a data warehouse rather than a distributed application system. The reason is simple: the data warehouse is the core and the other parts are arranged around it, so the end result looks something like the solar system.

Operational data is the data we collect from all kinds of data sources, and it is the rawest form of data in the whole system. In it I can see my call record for November 11, the details of the bill I ran up on the 15th, and even the takeoff and arrival times of a flight. If you understand what is in it, you should be able to follow what comes next.

Characteristics of Operational Data:

1. Real-time: the data almost always holds the current value.

2. The data sources are extremely rich, covering data generated both inside and outside the enterprise.

3. Response time requirements are demanding. (You can't wait an hour just to add a billing record.)

So you can see that even I could design an operational database without much difficulty :-) For the record, the design of an operational system follows the order: requirements → architecture → code → load data.

The defining feature of the data warehouse is the word "stable". Leaving aside how it extracts data from the operational systems, its data is refreshed on a cycle of at least 24 hours, so it should be clear that you cannot do anything real-time with it. And that is not its purpose: it is designed so that you can take data extracted from the operational databases and use it for analysis and statistics. Got that? It matters. It is also the important work that the analysts of every DSS (formerly known as MIS) pursue; otherwise it would not be called a decision analysis system :P

Want to know its benefits? Then listen to an expert's view: using the data in an online transaction processing system directly for decision-support analysis is a lot of trouble, and sometimes cannot be done at all. At that point people ask: the system clearly has the data I need, so why can't I use it? This does not mean relational databases are bad; it means an old product has met a new task. An E-R style data structure handles online transaction processing very well, but it is poorly suited to large-scale decision-support analysis, especially at the enterprise level. Data warehouse technology emerged to meet that need.

The goal of the data warehouse is to provide information that supports management decisions, which is markedly different from the fast-response requirement of OLTP (online transaction processing) systems. Just as an enterprise reorganizes its business in order to grow, supporting management decisions means reorganizing the data in the OLTP systems around the subjects those decisions are concerned with, and arranging it by the kind of decision and analysis so that it is easy to use. This subject-based model is a reorganization of the data along multiple lines, seen from the user's point of view.
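As a rough illustration of reorganizing transaction data around a subject, the sketch below rolls detailed call records up into per-customer, per-month totals. The record fields and the summarize_by_subject function are hypothetical names chosen for this example, and a real warehouse would use far richer subjects than this one.

```python
from collections import defaultdict

def summarize_by_subject(call_records):
    """Roll up detailed call records into per-customer, per-month totals."""
    totals = defaultdict(lambda: {"calls": 0, "seconds": 0})
    for rec in call_records:
        month = rec["started_at"][:7]  # "YYYY-MM" prefix of an ISO timestamp
        key = (rec["customer_id"], month)
        totals[key]["calls"] += 1
        totals[key]["seconds"] += rec["seconds"]
    return dict(totals)

# Made-up sample rows standing in for data extracted from an OLTP system.
sample_calls = [
    {"customer_id": 1, "started_at": "2000-11-11T09:30:00", "seconds": 125},
    {"customer_id": 1, "started_at": "2000-11-15T18:02:00", "seconds": 60},
    {"customer_id": 2, "started_at": "2000-11-11T12:00:00", "seconds": 30},
]
summary = summarize_by_subject(sample_calls)
```

The OLTP rows answer "what happened in this call?", while the summary answers "how much did this customer call this month?", which is closer to the question an analyst asks.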

Before data is loaded into the data warehouse, it must first be transformed, or "consolidated". This process includes several essential steps that make the data complete and uniform, which guarantees the quality of the data used in the warehouse; the steps are described in detail later. In short, integration ensures that the data is accurate, in the right place, within its expected range of values, free of duplicates, and so on.
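The checks named above (completeness, expected value range, no duplicates) can be sketched in a few lines. This is only an illustration under assumed field names and limits; the consolidate function, the required fields, and the 0 to 86,400 second range are all invented for the example.

```python
def consolidate(rows, min_seconds=0, max_seconds=86_400):
    """Split rows into clean ones and rejected ones with a reason.

    The field names and the allowed range are assumptions made for this sketch.
    """
    required = ("call_id", "customer_id", "started_at", "seconds")
    seen_ids = set()
    clean, rejected = [], []
    for row in rows:
        if any(row.get(field) is None for field in required):
            rejected.append((row, "missing field"))      # completeness check
        elif not (min_seconds <= row["seconds"] <= max_seconds):
            rejected.append((row, "out of range"))       # expected value range
        elif row["call_id"] in seen_ids:
            rejected.append((row, "duplicate"))          # no duplication
        else:
            seen_ids.add(row["call_id"])
            clean.append(row)
    return clean, rejected
```

Returning the rejected rows along with a reason, rather than silently dropping them, makes it possible to trace quality problems back to the operational source.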

OK, are you with me? A little complicated? Never mind, just remember one thing: the data warehouse is data support designed specifically for statistical analysis and similar work. It is that simple. To sum up:

Data Warehouse:

1. The data is not real-time; the update cycle is long.

2. The data source is operational data, extracted according to a defined pattern.

3. Response-time requirements during processing are more relaxed.
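The three points above can be sketched as a tiny append-only store that receives one batch per load date and never rewrites an old batch. The Warehouse class and its methods are hypothetical names for this illustration; a real warehouse would sit on a database, not a dictionary.

```python
from datetime import date

class Warehouse:
    """Append-only store: each periodic batch is kept under its load date."""

    def __init__(self):
        self._snapshots = {}

    def load_batch(self, load_date, rows):
        """Store one extracted batch; an existing batch is never overwritten."""
        if load_date in self._snapshots:
            raise ValueError(f"batch for {load_date} is already loaded")
        self._snapshots[load_date] = tuple(rows)

    def as_of(self, query_date):
        """Return the most recent batch loaded on or before query_date."""
        eligible = [d for d in self._snapshots if d <= query_date]
        if not eligible:
            return ()
        return self._snapshots[max(eligible)]

wh = Warehouse()
wh.load_batch(date(2000, 11, 28), [{"customer_id": 1, "calls": 2}])
wh.load_batch(date(2000, 11, 29), [{"customer_id": 1, "calls": 3}])
```

Because old batches are kept, an analyst can ask what the data looked like on an earlier date, which a constantly updated operational table cannot answer.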

Its design is relatively complex, but one thing is certain: a data warehouse is designed in the order data → requirements. That is easy to understand too: if all you have prepared is a big pile of carrots, you can hardly serve up more than a "radish meeting"! So the idea behind DSS design is: you give me what I need, and I'll tell you what I want. Sounds rather awkward, doesn't it? :)

By now you should have a broad, if superficial, understanding of the data warehouse. Hehe, the next step is to dig carefully into that great book, "Building the Data Warehouse".

Ma Lei Wednesday, November 29, 2000

