I. Overview
In multidimensional analysis for business intelligence solutions, the common models fall into two types, the star model and the snowflake model, according to the relationship between the fact table and the dimension tables. When designing a logical data model, you should decide whether the data is organized according to the star model or the snowflake model.
When all the dimension tables are connected directly to the fact table, the whole diagram looks like a star, so the model is called the star model, as shown in Figure 1.
A star architecture is a denormalized structure in which each dimension of a cube is connected directly to the fact table and no dimension is split into hierarchical levels, so the data contains some redundancy. For example, if the geography dimension table holds two records, city C in province B of country A and city D in province B of country A, then the information for country A and province B is stored twice; that is the redundancy.
Figure 1. Star model in the sales data warehouse
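The redundancy described above can be reproduced with a small sketch using Python's built-in sqlite3 module. The table and column names (dim_geography, fact_sales, geo_id, amount) and the sales figures are assumptions made up for illustration; they are not taken from Figure 1.

```python
import sqlite3

# Hypothetical star schema: one flat geography dimension and one fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_geography (
    geo_id   INTEGER PRIMARY KEY,
    country  TEXT,
    province TEXT,
    city     TEXT
);
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    geo_id  INTEGER REFERENCES dim_geography(geo_id),
    amount  REAL
);
INSERT INTO dim_geography VALUES (1, 'A', 'B', 'C'), (2, 'A', 'B', 'D');
INSERT INTO fact_sales VALUES (1, 1, 100.0), (2, 2, 50.0), (3, 2, 25.0);
""")

# Country A and province B appear once per city row: this is the redundancy.
redundant_rows = conn.execute(
    "SELECT COUNT(*) FROM dim_geography WHERE country = 'A' AND province = 'B'"
).fetchone()[0]

# A per-province total needs only one join, from the fact table to the dimension.
province_totals = conn.execute("""
    SELECT g.province, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_geography AS g ON g.geo_id = f.geo_id
    GROUP BY g.province
""").fetchall()

print(redundant_rows, province_totals)
```

The flat dimension stores the country and province values on every city row, and in exchange the aggregate query reaches them with a single join.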
When one or more dimension tables are not connected directly to the fact table but reach it through other dimension tables, the diagram looks like several snowflakes joined together, so the model is called the snowflake model. The snowflake model is an extension of the star model. It splits the dimension tables of the star model further into hierarchies: an original dimension table may be expanded into smaller tables that form local "hierarchy" regions, and these decomposed tables are connected to the main dimension table rather than to the fact table. As shown in Figure 2, the geography dimension table is decomposed into country, province, city, and other dimension tables. The advantage is that it minimizes data storage and can improve query performance by joining smaller dimension tables. The snowflake structure removes the data redundancy.
Figure 2. Snowflake model in the sales data warehouse
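The decomposition of the geography dimension can be sketched in the same way. This is only an illustration under assumed names (dim_country, dim_province, dim_city, fact_sales) and made-up sales figures, not the schema from Figure 2.

```python
import sqlite3

# Hypothetical snowflake schema: geography is split into country, province, city.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_country  (country_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_province (province_id INTEGER PRIMARY KEY, name TEXT,
                           country_id  INTEGER REFERENCES dim_country(country_id));
CREATE TABLE dim_city     (city_id     INTEGER PRIMARY KEY, name TEXT,
                           province_id INTEGER REFERENCES dim_province(province_id));
CREATE TABLE fact_sales   (sale_id     INTEGER PRIMARY KEY,
                           city_id     INTEGER REFERENCES dim_city(city_id),
                           amount      REAL);
INSERT INTO dim_country  VALUES (1, 'A');
INSERT INTO dim_province VALUES (1, 'B', 1);
INSERT INTO dim_city     VALUES (1, 'C', 1), (2, 'D', 1);
INSERT INTO fact_sales   VALUES (1, 1, 100.0), (2, 2, 50.0), (3, 2, 25.0);
""")

# Country A is now stored exactly once, however many cities belong to it.
country_rows = conn.execute(
    "SELECT COUNT(*) FROM dim_country WHERE name = 'A'"
).fetchone()[0]

# A per-country total has to walk the chain city -> province -> country.
country_totals = conn.execute("""
    SELECT co.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_city     AS ci ON ci.city_id    = f.city_id
    JOIN dim_province AS p  ON p.province_id = ci.province_id
    JOIN dim_country  AS co ON co.country_id = p.country_id
    GROUP BY co.name
""").fetchall()

print(country_rows, country_totals)
```

Only the city table is connected to the fact table; the province and country tables hang off other dimension tables, which is why the query needs three joins to reach the country name.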
Because the star model keeps redundant data, many statistical queries do not need joins to additional tables, so in general it is more efficient than the snowflake model. The star structure also does not have to account for many normalization factors, so its design and implementation are relatively simple. Because the snowflake model removes the redundancy, some statistics can only be produced by joining tables, so it is not necessarily as efficient as the star model. Normalization is also a fairly complex process, and the corresponding database design, ETL, and later maintenance are all more complex. Therefore, where the redundancy is acceptable, the star model is used more often in practice and is also more efficient.
II. Choosing Between the Two Models
The star model (star schema) and the snowflake model (snowflake schema) are the two schemas commonly used in data warehouses. They are compared below from four perspectives.
1. Data Optimization
The snowflake model uses normalized data, which means the data is organized inside the database to eliminate redundancy, so it can effectively reduce the amount of data. Through referential integrity, both the business hierarchy and the dimensions are stored in the data model.
▲ Figure 1 Snowflake model
By comparison, the star model uses denormalized data. In the star model, the dimensions refer directly to the fact table, and the business hierarchy is not implemented through referential integrity between dimensions.
▲ Figure 2 Star model
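The difference between the normalized and denormalized forms can also be shown as a transformation. The following is a minimal sketch in plain Python, assuming a flat dimension given as (country, province, city) tuples; the function name snowflake_geography and the surrogate-key scheme are hypothetical.

```python
def snowflake_geography(flat_rows):
    """Split flat (country, province, city) rows into three keyed lookup tables."""
    countries, provinces, cities = {}, {}, {}
    for country, province, city in flat_rows:
        # setdefault returns the existing surrogate key or assigns the next one.
        country_id = countries.setdefault(country, len(countries) + 1)
        province_id = provinces.setdefault((province, country_id), len(provinces) + 1)
        cities.setdefault((city, province_id), len(cities) + 1)
    return countries, provinces, cities


# Denormalized (star) form: country and province repeat on every row.
flat = [("A", "B", "C"), ("A", "B", "D")]
countries, provinces, cities = snowflake_geography(flat)

times_country_stored_flat = sum(1 for row in flat if row[0] == "A")
times_country_stored_normalized = len(countries)
print(times_country_stored_flat, times_country_stored_normalized)
```

In the normalized result each country and province name is kept once and referenced by key, at the cost of extra key columns and extra tables.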
2. Business model
A primary key is a single unique key (data attribute) chosen to identify a particular piece of data. In the example above, advertiser_id would be a primary key. A foreign key (referencing attribute) is simply a field in one table that matches the primary key of another dimension table. In our example, advertiser_id could be a foreign key in account_dimension.
In the snowflake model, the business hierarchy of the data model is represented by primary key-foreign key relationships between the different dimension tables. In the star model, all the required dimension tables have foreign keys only in the fact table.
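A primary key-foreign key relationship between two dimension tables can be sketched with sqlite3 as follows. The names advertiser_id and account_dimension come from the text; advertiser_dimension, the other columns, and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE advertiser_dimension (
    advertiser_id   INTEGER PRIMARY KEY,   -- primary key
    advertiser_name TEXT
);
CREATE TABLE account_dimension (
    account_id    INTEGER PRIMARY KEY,
    advertiser_id INTEGER NOT NULL
        REFERENCES advertiser_dimension(advertiser_id)   -- foreign key
);
INSERT INTO advertiser_dimension VALUES (1, 'Acme');
INSERT INTO account_dimension VALUES (10, 1);
""")

# An account that points at a non-existent advertiser violates referential integrity.
try:
    conn.execute("INSERT INTO account_dimension VALUES (11, 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

account_count = conn.execute("SELECT COUNT(*) FROM account_dimension").fetchone()[0]
print(rejected, account_count)
```

Here the hierarchy (each account belongs to an advertiser) lives in the relationship between the two dimension tables, which is the snowflake arrangement. In a star model both keys would instead appear only as foreign keys in the fact table.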
3. Performance
The third difference is performance. The snowflake model requires many joins between dimension tables and the fact table, so performance is lower. For example, if you want the details of an advertiser, the snowflake model has to collect a lot of information, such as advertiser name, ID, and address, for which the advertiser and customer tables must be joined to each other and then joined to the fact table.
The star model needs fewer joins. In this model, if you need the information above, you only have to join the advertiser dimension table to the fact table.
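The number of joins needed to reach a dimension can be counted by walking the foreign-key graph of a schema. The sketch below assumes each schema is described as a dictionary mapping a table to the tables it references; the function joins_needed and the table names are hypothetical.

```python
from collections import deque


def joins_needed(foreign_keys, start, target):
    """Return the number of joins on the shortest foreign-key path, or None."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        table, hops = queue.popleft()
        if table == target:
            return hops
        for referenced in foreign_keys.get(table, ()):
            if referenced not in seen:
                seen.add(referenced)
                queue.append((referenced, hops + 1))
    return None


# Star: the fact table references every dimension directly.
star = {"fact": ["advertiser_dimension", "customer_dimension"]}
# Snowflake: the advertiser table is reached through the customer table.
snowflake = {
    "fact": ["customer_dimension"],
    "customer_dimension": ["advertiser_dimension"],
}

star_joins = joins_needed(star, "fact", "advertiser_dimension")
snowflake_joins = joins_needed(snowflake, "fact", "advertiser_dimension")
print(star_joins, snowflake_joins)
```

The join count is only a rough proxy for cost; actual query performance also depends on table sizes, indexes, and the database engine.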
4. ETL
Loading a data mart built on the snowflake model makes the ETL job more complex to design, and the dependency model between tables restricts how far the load can be parallelized.
The star model loads the dimension tables without requiring a dependency model between dimensions, so its ETL is relatively simple and can achieve high parallelism.
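The effect of table dependencies on load order can be sketched with a topological sort. This example uses graphlib.TopologicalSorter from the Python standard library (Python 3.9 or later); the function load_waves and all table names are assumptions for illustration, and a real ETL tool would schedule the loads differently.

```python
from graphlib import TopologicalSorter


def load_waves(depends_on):
    """Group tables into waves; tables in the same wave can load in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every table whose parents are loaded
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Star: the dimensions do not depend on each other, only the fact table waits.
star = {"fact_sales": {"dim_geography", "dim_product", "dim_date"}}

# Snowflake: the geography chain has to be loaded one level at a time.
snowflake = {
    "dim_province": {"dim_country"},
    "dim_city": {"dim_province"},
    "fact_sales": {"dim_city", "dim_product", "dim_date"},
}

star_waves = load_waves(star)
snowflake_waves = load_waves(snowflake)
print(len(star_waves), len(snowflake_waves))
```

In this sketch the star load finishes in two waves, all dimensions and then the fact table, while the snowflake load needs four because each level of the geography hierarchy must wait for its parent.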
Summary
The snowflake model makes dimension analysis easier, for example, "For a specific advertiser, which customers or companies are online?" The star model is better suited to metrics analysis, for example, "What is the revenue of a given customer?"
Reproduced from: http://blog.csdn.net/nisjlvhudy/article/details/7889422