Databases and Data Warehouses

Source: Internet
Author: User

It cannot be simply said that the data warehouse is an extended version of the database, which is generated for different services. The traditional database is used for online transaction processing (OLTP), mainly for application and business processing. Data Warehouses are mainly used for Online Analytical Processing (OLAP. I personally feel that there is no difference in size, but usually the Data Warehouse is larger than the database because the data warehouse is formed by integrating different Heterogeneous Databases.

In short,The database is a transaction-oriented design (ProgramData Warehouse is designed for topics (statistical applications).

Databases generally store online transaction data, while data warehouses generally store historical data.

Database Design is designed to avoid redundancy as much as possible. Generally, it is designed to comply with the rules of the paradigm. Data Warehouse design is intended to introduce redundancy and adopt an anti-paradigm design.

A database is designed to capture data. A data warehouse is designed to analyze data. Its two basic elements are dimension tables and fact tables. Dimensions are the definitions of these things, such as time, department, and dimension tables. The fact table contains the data to be queried and the dimension ID.

In terms of concept, it is a bit obscure. Any technology serves applications, which can be easily understood in combination with applications. Take banking as an example. The database is the data platform of the transaction system. Every transaction made by the customer in the bank will be written into the database and recorded. Here, we can simply understand it as using database accounting. A data warehouse is a data platform for analysis systems. It obtains data from the transaction system and summarizes and processes the data to provide decision-making basis for decision makers. For example, the current deposit balance of a bank's branch is what happens in a month. If there are more deposits and more consumption transactions, it is necessary to set up an ATM in the region.

Apparently, the transaction volume of a bank is huge, usually measured in millions or even tens of millions of times. The transaction system is real-time, which requires timeliness. It takes tens of seconds for the customer to save a sum of money, which requires the database to store data for a short period of time. The analysis system is post-event. It must provide all valid data within the specified time period. The data is massive, and the aggregation and calculation are slower. However, as long as the data can be effectively analyzed, the goal is achieved.

A data warehouse is generated in order to further explore data resources and make decisions when a large number of databases exist. It is by no means a "large database ". What are the differences between data warehouses and traditional databases? Let's take a look at the definition of data warehouse by W. H. inmon: a topic-oriented, integrated, time-related, and unchangeable data set.

"Theme-oriented": traditional databases mainly process data for applications and may not store data based on the same topic. Data Warehouses focus on data analysis and are stored Based on topics. This,Similar to the difference between a traditional farmer's market and a supermarket, cabbage, radish, and coriander are sold at a stall, cabbage, radish, and coriander are separated. That is to say, the food (data) in the market is collected (stored) by vendors (applications), while the supermarket stores food by type (with the same subject.

"Time-related": When the database saves information, it does not emphasize that there must be time information. The data warehouse is different. for decision-making purposes, the data in the data warehouse must indicate the time attribute. In decision making, the time attribute is very important. They are also customers who have purchased Nine-car products. One is that they have bought nine-car products in the last three months, and the other is that they have never bought nine-car products in the last year. This is different for decision makers.

"Unchangeable": the data in the data warehouse is not up-to-date, but comes from other data sources. The data warehouse reflects historical information, rather than the daily transaction data processed by many databases (some databases, such as the telecom billing database and even real-time information processing ). Therefore, the data in the data warehouse is rarely or never modified. Of course, adding data to the Data Warehouse is allowed.

The emergence of data warehouses is not to replace databases. Currently, most data warehouses are managed by relational database management systems. Databases and Data Warehouses complement each other.

In addition, the purpose of the data warehouse solution is to serve as the basis for front-end query and analysis. Due to the large redundancy, the storage required is also large. To better serve front-end applications, the data warehouse must have the following advantages; otherwise, it is a failed data warehouse solution.

1. High Efficiency. The analysis data requested by the customer is generally divided into days, weeks, months, quarters, and years. It can be seen that the daily data for the cycle requires the highest efficiency and requires 24 hours or even 12 hours, the customer can see yesterday's data analysis. Some enterprises often encounter problems with poorly designed data warehouses because of their large daily data volume. data can only be provided 1-3 days later. Obviously, this is not acceptable.

2. Data quality. The customer needs to view all kinds of information and must obtain accurate data. However, because the data warehouse process is divided into at least three steps and two ETL operations, the complex architecture will be more hierarchical, so the data source has dirty data orCodeNot rigorous, it can lead to data distortion. If the customer sees the wrong information, it may cause analysis of wrong decisions, resulting in losses, rather than benefits.

3. scalability. The architecture design of some large data warehouse systems is complicated because of the scalability in the next 3-5 years. In this way, customers do not have to spend too much money to reconstruct the data warehouse system, so they can run stably. It is mainly reflected in the rationality of data modeling. There are more middle layers in the data warehouse solution, so that there is enough buffer for massive data streams, so that the data cannot run as long as the data volume is large.

This article from the csdn blog, reproduced please indicate the source:

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.