Differences between databases and Data Warehouses

Source: Internet
Author: User

Databases are designed for transactions and data warehouses are designed for topics. Databases generally store online transaction data, while data warehouses generally store historical data.

Database Design is designed to avoid redundancy as much as possible. Generally, it is designed to comply with the rules of the paradigm. Data Warehouse design is intended to introduce redundancy and adopt an anti-paradigm design.

A database is designed to capture data. A data warehouse is designed to analyze data. Its two basic elements are dimension tables and fact tables. Dimensions are the definitions of these things, such as time, department, and dimension tables. The fact table contains the data to be queried and the dimension ID)
A data warehouse is generated in order to further explore data resources and make decisions when a large number of databases exist. It is by no means a "large database ". What are the differences between data warehouses and traditional databases? Let's take a look at the definition of data warehouse by W. H. Inmon: a topic-oriented, integrated, time-related, and unchangeable data set.

"Theme-oriented": traditional databases mainly process data for applications and may not store data based on the same topic. Data Warehouses focus on data analysis and are stored Based on topics. This is similar to the difference between a traditional farmer's market and a supermarket-cabbage, radish, and coriander are sold at a stall if they are sold at a small price, cabbage, radish, and coriander are separated. That is to say, the Food Data in the market is stored in the heap according to the hawker application), while the supermarket stores the food in the same category as the topic.

"Time-related": When the database saves information, it does not emphasize that there must be time information. The data warehouse is different. for decision-making purposes, the data in the data warehouse must indicate the time attribute. In decision making, the time attribute is very important. They are also customers who have purchased Nine-car products. One is that they have bought nine-car products in the last three months, and the other is that they have never bought nine-car products in the last year. This is different for decision makers.

"Unchangeable": the data in the data warehouse is not up-to-date, but comes from other data sources. The data warehouse reflects historical information, which is not the type of daily transaction data processed by many databases, such as the telecom billing database or even real-time information ). Therefore, the data in the data warehouse is rarely or never modified. Of course, adding data to the Data Warehouse is allowed.

The emergence of data warehouses is not to replace databases. Currently, most data warehouses are managed by relational database management systems. It can be said that databases and data warehouses complement each other and each has its own merits

In addition, the purpose of the data warehouse solution is to serve as the basis for front-end query and analysis. Due to the large redundancy, the storage required is also large. To better serve front-end applications, the data warehouse must have the following advantages; otherwise, it is a failed data warehouse solution.

1. High Efficiency. The analysis data requested by the customer is generally divided into days, weeks, months, quarters, and years. It can be seen that the daily data for the cycle requires the highest efficiency and requires 24 hours or even 12 hours, the customer can see yesterday's data analysis. Some enterprises often encounter problems with poorly designed data warehouses because of their large daily data volume. data can only be provided 1-3 days later. Obviously, this is not acceptable.

2. Data quality. The customer must view all kinds of information and obtain accurate data. However, because the data warehouse process is divided into at least three steps and two ETL operations, the complex architecture will be more hierarchical, if the data source contains dirty data or the code is not rigorous, the data may be distorted. If the customer sees the wrong information, it may cause the wrong decision analysis, resulting in losses, rather than benefits.

3. scalability. The architecture design of some large data warehouse systems is complicated because of the scalability in the next 3-5 years. In this way, customers do not have to spend too much money to reconstruct the data warehouse system, so they can run stably. It is mainly reflected in the rationality of data modeling. There are more middle layers in the data warehouse solution, so that there is enough buffer for massive data streams, so that the data cannot run as long as the data volume is large.

(

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.