Comparative analysis---Relational model and multidimensional model of Data Warehouse database design method

Source: Internet
Author: User

There are two kinds of database design models widely used in Data warehouse: Relational type and multidimensional type. It is generally believed that the relational model in the design method of Data Warehouse is "Inmon" method and multidimensional model is "Kimball" method.

First look at the relational model, where relational data exists in a form called "normalization". Data normalization refers to the fact that database design decomposes data into very low granularity, and that standardized data exists in an isolated pattern, which requires strict data relations in the data table. Generally follows the 3NF paradigm. A database with relational design generally has a strong flexibility and versatility (can support multiple views of the data).

Looking at multidimensional models, multi-dimensional models typically have star, Snowflake, and promiscuous (also called galaxy) modes. The greatest advantage of multidimensional model design is the efficiency of access.

The difference between the two types of models

As the basis of data warehouse design, there are many differences between star-shaped connection and relational structure. The most important difference is in terms of flexibility and performance. The relational model is highly flexible, but it is not ideal for users in terms of performance. Multidimensional models are highly efficient in meeting user needs, but are not flexible.

Another important difference is that the scope of the design is different. Inevitably, multidimensional design can only be done within a limited range, meaning that database design can only be optimized under a set of requests. If all the different group requests are added to the design, optimization becomes meaningless.

There is no specific optimization approach to performance when using the relational model. Since the relational model requires data to be stored at the lowest level of granularity, new data can be added indefinitely. Obviously, adding data to a relational model will never stop. Because of this, relational schemas are suitable for large-scale data (such as an enterprise model), whereas multidimensional models are suitable for small-scale data such as a department or even a sub-department.

origin of the differences :

The relational environment is designed by the origin data model. Multidimensional models are modeled on the request of the end user. In other words, the relational model is designed through a pure data model and other patterns, while multidimensional models are shaped by processing requests.

In terms of applicability: Because the relational model is formed through abstract data, the model itself is very flexible. However, this flexibility is not optimized for direct data access execution. If you want a high-performance relational model, the best way to do this is to extract the data from the model and reconstruct a pattern that is appropriate for fast access.

Multidimensional models are fast and efficient in accessing data directly. From an architectural point of view, the relational model in the Data Warehouse Design Foundation is a better model for supporting the Data warehouse, because the data warehouse needs to support many different groups of users based on different agendas and multiple observational data. In other words, the data warehouse is not optimal for accessing a given user. Instead, the data warehouse can support multiple different users in a number of ways.

Relational patterns, data is stored in the lowest-granularity and standardized form; Relationships between relational tables are already defined and contain a key table with foreign keys, and new tables can define new summary and filter criteria for the basic datasets in the relational table; that is, it is simple to create a relational table in one form. Re-shape these tables in another form, which is ideal for a data warehouse environment.

In addition, the relationship model supports the future of unknown requirements, supporting the needs of moderate change with the multi-dimensional model incomparable advantages.

So as you can see from the reasons discussed above , relational models are ideal for data warehouses, and star joins are best for data marts.  

the difference between the independent bazaar and the subordinate bazaar :

A stand-alone bazaar is a data mart created directly from a historical application. Establishing an independent data mart does not require a "holistic thinking" consideration.

Corresponding to the stand-alone data mart is the subordinate data mart. Dependent data marts are built using data from the Data warehouse. Its data source is not dependent on historical data or operational data, but only on data warehouses. In summary, the Dependent Data mart requires pre-planning, long-term observation, global analysis, and collaboration and coordination of demand analysis across different sectors of the enterprise.

After establishing multiple independent data marts, users will soon find that the information between the data marts is not uniform and not synchronized, and that each additional data mart will have an increasing amount of detail data redundancy, requiring a lot of resources to establish the interface program, and maintaining these programs becomes a burden. Therefore, the Independent data mart is not suitable for solving information problems in the enterprise.

Of course, if the enterprise adopts a subordinate data mart and creates a data warehouse before any data mart is established, then what architectural aspects inherent in the independent data mart will not appear.

In other words, an independent data mart represents a short-term, limited-scope solution that does not need to take into account global and panorama. On the other hand, the Subordinate data Mart requires a long-term and global outlook. But independent data marts do not provide a solid foundation for enterprise information, and subordinate data marts can provide a real long-term basis for information decisions.

Summary: The database design in the Data Warehouse is recommended to adopt the relational pattern design method, and the data Mart recommends adopting the multidimensional model design method, in which the Data Mart recommends adopting the subordinate Type Data mart construction method.

Comparative analysis---Relational model and multidimensional model of Data Warehouse database design method

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.