Data Warehouse Introduction (VII)-Star model and snowflake model

Source: Internet
Author: User
Tags advantage


Multidimensional data modeling organizes data in an intuitive way and supports high-performance data access. Each multidimensional data model is represented by multiple multidimensional data patterns, and each multidimensional data pattern consists of a fact table and a set of dimension tables. The most common form of multidimensional models is the star pattern. In star mode, the fact table is centered, and multiple dimension tables are radiated in their four weeks and connected to the fact table. On the basis of star type, the snowflake pattern is developed, and the characteristics of the two are compared.

One, star-shaped model

The entity in the Star Center is the indicator entity, which is the center of the basic entity and query activity that the user is most concerned about, and provides quantitative data for the query activity of the Data Warehouse. Each indicator entity represents a series of related facts and completes a specified function. The entity on the star corner of the star is a dimension entity that restricts the scope of access by restricting the user's query results and filtering the data so that fewer rows are returned from the Indicator entity query. Each dimension table has its own attributes, and the dimension tables and fact tables are associated with keywords.

Although a star pattern is a relational model, it is not a normalized model. In Star mode, the dimension table is deliberately non-normalized, which is the basic difference between the star pattern and the model of the OLTP system.

There are two main reasons for using star mode: To improve the efficiency of queries. The advantage of using star pattern design Data Warehouse is that the data organization has been preprocessed, the main data are in the large fact table, so as long as the fact table is scanned, it can be queried without having to join a large number of tables and query access efficiency is higher. At the same time, because the dimension table is generally very small, even can be placed in the cache, and the fact table to connect with the faster speed; For non-computer professional users, star mode is more intuitive, through the analysis of Star mode, it is easy to combine a variety of queries.

Conclusion:

1> of non-formal;
2> each dimension in a cube is connected to the fact table (via primary key and foreign key);
3> does not exist in the gradient dimension;
4> has redundant data;
5> query efficiency may be relatively high;
6> does not need to consider the regularization factor too much, design maintenance is relatively simple.

second, snowflake model

In practice, with the addition and change of the fact table and the dimension table, the star pattern produces many derivative modes, including Galaxy pattern, constellation pattern, two-level dimension table and snowflake pattern.

Snowflake mode is a further hierarchy of Star schema dimension table, and some dimension tables are expanded into fact tables, which can not only cope with the queries of different levels of users, but also can integrate the source data through the hierarchy to minimize the amount of data storage, thus improving the query function.

The dimension table of snowflake pattern is based on the paradigm theory, so it is a design pattern between the third and star patterns, usually the canonical structure of the third paradigm in some data organization, and some data organization adopt the fact table and the dimension table structure of the star pattern. In some cases, the snowflake pattern is formed when the star pattern organizes the data to reduce the dimension table hierarchy and handle many-to-many relationships by normalizing the data tables.

The advantage of Snowflake mode is that it reduces storage space to some extent, and the normalized structure is easier to update and maintain. The same snowflake pattern also has many drawbacks: the snowflake pattern is more complex, the user is not easy to understand, the browsing content is relatively difficult, and the additional connection will degrade the query performance. In a data warehouse, "snowflake" is not usually recommended. Because in the Data Warehouse, query performance is more important than OLTP systems, and snowflake mode degrades the performance of the Data Warehouse system.

Conclusion:

1> regularization;
2> less data redundancy;
3> Some data need to be connected to obtain, may be less efficient;
4> Standardized operation is more complex, resulting in complex design and later maintenance;

In practical applications, a mixture of the above two models can be taken: 5>

Such as: the middle layer using snowflake structure to reduce data redundancy, data mart part of the use of star to facilitate data extraction and analysis

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.