Reprinted: "boiled" Data Warehouse

Source: Internet
Author: User

Just a few days ago, a user participated in a job interview for an enterprise. He applied for the DBA of the company and was responsible for data analysis. This company successfully completed the process. Until the person in charge of the company. The owner only gave him an interview question: Let's talk about the differences between the database and the data warehouse you understand.

This public suddenly dumbfounded. As far as technology is concerned, no matter how difficult the technology is, it is hard to find him. This open-ended problem is his weakness. After Yi Yu returned, he asked questions on the forum. It seems that our DBA is highly technical but lacks theoretical literacy. Although there are some imaginary things in the theory, it can guide the practice and constantly refine, generalize, and supplement its own theory. Theory cannot exist without practice, and practice cannot do without theory. DBAs, imagine how painful it would be if one day you find that the bottleneck restricting your progress is not your practical experience, but your theoretical accumulation!

Next, we will "boil" a data warehouse together.

Warehouse
Bytes
C ā ng
<Name>
(Pictogram. The Oracle font is like a Geer, and the middle is like a door. below is the entrance and exit, which together represents the concept of a warehouse. Meaning: granary)
Same as the original [granary; barn]
Warehouse, warehouse, and warehouse. -- Text
Warehouse knows etiquette. -- Han Jia Yi
The Tibetan Emperor collects data from the godwarehouse. -- Liji-monthly order. Note: "The Valley of Tibetan sacrifices is a godwarehouse ."
Another example is the barn, the grain storage, and the grain storage)
A building [warehouse; storehouse] for storing materials. Such as: salt warehouse; add warehouse; fill warehouse; warehouse (where money and food are stored)

In short, databases are designed for transactions and data warehouses are designed for topics.

Databases generally store online transaction data, while data warehouses generally store historical data.

Database Design is designed to avoid redundancy as much as possible. Generally, it is designed to comply with the rules of the paradigm. Data Warehouse design is intended to introduce redundancy and adopt an anti-paradigm design.

A database is designed to capture data. A data warehouse is designed to analyze data. Its two basic elements are dimension tables and fact tables. Dimensions are the definitions of these things, such as time, department, and dimension tables. The fact table contains the data to be queried and the dimension ID.

In terms of concept, it is a bit obscure. Any technology serves applications, which can be easily understood in combination with applications. Take banking as an example. The database is the data platform of the transaction system. Every transaction made by the customer in the bank will be written into the database and recorded. Here, we can simply understand it as using database accounting. A data warehouse is a data platform for analysis systems. It obtains data from the transaction system and summarizes and processes the data to provide decision-making basis for decision makers. For example, the current deposit balance of a bank's branch is what happens in a month. If there are more deposits and more consumption transactions, it is necessary to set up an ATM in the region.

Apparently, the transaction volume of a bank is huge, usually measured in millions or even tens of millions of times. The transaction system is real-time, which requires timeliness. It takes tens of seconds for the customer to save a sum of money, which requires the database to store data for a short period of time. The analysis system is post-event. It must provide all valid data within the specified time period. The data is massive, and the aggregation and calculation are slower. However, as long as the data can be effectively analyzed, the goal is achieved.

A data warehouse is generated in order to further explore data resources and make decisions when a large number of databases exist. It is by no means a "large database ". What are the differences between data warehouses and traditional databases? Let's take a look at the definition of data warehouse by W. H. inmon: a topic-oriented, integrated, time-related, and unchangeable data set.

"Theme-oriented": traditional databases mainly process data for applications and may not store data based on the same topic. Data Warehouses focus on data analysis and are stored Based on topics. This is similar to the difference between a traditional farmer's market and a supermarket-cabbage, radish, and coriander are sold at a stall if they are sold at a small price, cabbage, radish, and coriander are separated. That is to say, the food (data) in the market is collected (stored) by vendors (applications), while the supermarket stores food by type (with the same subject.

"Time-related": When the database saves information, it does not emphasize that there must be time information. The data warehouse is different. for decision-making purposes, the data in the data warehouse must indicate the time attribute. In decision making, the time attribute is very important. They are also customers who have purchased Nine-car products. One is that they have bought nine-car products in the last three months, and the other is that they have never bought nine-car products in the last year. This is different for decision makers.

"Unchangeable": the data in the data warehouse is not up-to-date, but comes from other data sources. The data warehouse reflects historical information, rather than the daily transaction data processed by many databases (some databases, such as the telecom billing database and even real-time information processing ). Therefore, the data in the data warehouse is rarely or never modified. Of course, adding data to the Data Warehouse is allowed.

The emergence of data warehouses is not to replace databases. Currently, most data warehouses are managed by relational database management systems. Databases and Data Warehouses complement each other.

In addition, the purpose of the data warehouse solution is to serve as the basis for front-end query and analysis. Due to the large redundancy, the storage required is also large. To better serve front-end applications, the data warehouse must have the following advantages; otherwise, it is a failed data warehouse solution.

1. High Efficiency. The analysis data requested by the customer is generally divided into days, weeks, months, quarters, and years. It can be seen that the daily data for the cycle requires the highest efficiency and requires 24 hours or even 12 hours, the customer can see yesterday's data analysis. Some enterprises often encounter problems with poorly designed data warehouses because of their large daily data volume. data can only be provided 1-3 days later. Obviously, this is not acceptable.

2. Data quality. The customer must view all kinds of information and obtain accurate data. However, because the data warehouse process is divided into at least three steps and two ETL operations, the complex architecture will be more hierarchical, if the data source contains dirty data or the code is not rigorous, the data may be distorted. If the customer sees the wrong information, it may cause the wrong decision analysis, resulting in losses, rather than benefits.

3. scalability. The architecture design of some large data warehouse systems is complicated because of the scalability in the next 3-5 years. In this way, customers do not have to spend too much money to reconstruct the data warehouse system, so they can run stably. It is mainly reflected in the rationality of data modeling. There are more middle layers in the data warehouse solution, so that there is enough buffer for massive data streams, so that the data cannot run as long as the data volume is large.

 

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.