When building a data warehouse, we will certainly mention the dimension modeling method. This method is first proposed by Kimball, and its simplest description is to build the Data Warehouse, data mart according to the fact table and dimension table. In the dimension modeling method system, the dimension is the angle that describes the fact, such as date, commodity, address, etc., the fact is to measure the indicator, such as the number of users, sales and so on. According to the general book introduction, dimensional modeling will also be divided into star model, snowflake model, each has advantages and disadvantages, but rarely directly answer a question, that is, why the Data Warehouse to use dimensional modeling ?
The basic judgment of this question is whether the data should be open to business people to use ? The database structure table constructed by dimension modeling is more consistent with the intuition of ordinary people, and is easy to be understood by ordinary people, which facilitates the popularization and use of data. The following is an example of a supermarket cashier small ticket to illustrate the normal three-paradigm model and dimension model.
The three-paradigm data model is shown below:
The dimension model is illustrated as follows:
The minimum data granularity of the above two models are small ticket items, which can be easily seen, and the dimension model is to expand the hierarchical structure of the relational model. From the above example, the basic reason for adopting the dimension modeling method is to:
The data structure is simple . Before deciding whether to use dimensional modeling, you must answer the question,"is the data model open to business people directly ", and if the answer is yes, you should adopt a method of dimension modeling. The concept of dimensional models is somewhat scholarly, but essentially, it is the hierarchical data structure that is expanded to a single level, a bit similar to the fact that a business process is aggregated into an Excel sheet page.
However, the cost of dimensional modeling is also obvious, is that its flexibility is poor, data redundancy is more, so, in many books put forward a compromise approach, that is, "snowflake model", while also the seriousness of the comparison between the Snowflake model and star model (i.e. the above example) of the pros and cons, so that many beginners heart floated a lingering doubt , "Since the snowflake model has the advantages of the relational model and the advantages of the dimension model, why should there be a star model?" stealing thought, this is completely superfluous the practice, the dimension modelling pursues is uses simple, increases the first level association, increases the use complexity to increase more than one point, will give the most does not understand the technical business personnel to bring the enormous obstacle, is one kind of trifles practice .
Based on this consideration, in the process of building the data Warehouse, the detail layer and the market layer adopt different modeling methods respectively, namely:
The detail layer adopts the traditional three-paradigm relation model . This level of data model to describe the business process clearly, the source data (that is, business systems) implicit, ambiguous concepts, such as active users, VIP users and so on. The goal of this level of data model is to express business processes flexibly, to ensure data consistency, uniqueness, correctness, to keep data synchronization with the source data at the least cost, and that the data model of this level is not recommended to be used directly by the business people who do not know the technology, therefore, the use of the relational three-paradigm model is the best choice.
The market tier uses the dimension model . A market layer is a collection of data for a specific department or person based on business topics, sub-topics, and data models that are open to business people for data mining and business analysis. Because most of the salesman does not understand the database technology, the lack of the business requirements to transform the relational data structure of logical thinking, more difficult to write a complex SQL statement, therefore, the simpler the data model, the more they can be accepted, therefore, the level of the data model built, to follow the business process to organize, Each fact table represents an independent business process, and there is no direct dependency between the fact tables, so that the business can easily map the analysis requirements to the fact table, use tools or hand-write simple SQL, and extract the statistical data for analysis.
Above, it is the basic judgment that the data Warehouse adopts dimension modeling and relational modeling.
Data Warehouse Series-Why dimensions are modeled