The author from 98 into the field of database and data warehousing has been nearly eight years of time, the data modeling work contact more, innovative dare not talk about, this article will be working experience summed up for everyone to discuss and correct.
To bring up data modeling, one thing is to emphasize that data Modeler and DBA are quite different, for the data modeler, a deep understanding of the business is the first, different modeling methods and techniques for the business needs to serve. But this article temporarily throws open the business not to discuss, mainly focuses on the modelling method and the skill experience summary.
From the current database and data warehousing modeling methods, mainly divided into four categories.
The first class is the most familiar three-paradigm modeling of relational databases, and we usually use the three-normal modeling method to build various operational database systems.
The second type is the three-paradigm Data warehouse model advocated by Inmon, which is different from the three-paradigm modeling of operational database system. Inmon Data Warehouse Modeling method is divided into three layers, the first layer is the entity relationship layer, also is the enterprise's business data Model layer, in this layer and enterprise's Operation database system modeling method is the same; the second layer is the data item set layer, The modeling method in this layer is different from the modeling method of operation database system based on the factors such as the frequency of data generation and the frequency of access. The third layer is the concrete realization of the second layer.
The third category is the dimension modeling of the Data warehouse advocated by Kimball, which we generally call star structure modeling, and sometimes adding some snowflake models in it. Dimension modeling is a kind of user-oriented, easy to understand, access-efficient modeling method, but also the author prefers a modeling approach.
The fourth class is a more flexible mode of modeling, usually used in the background of the data preparation area, modeling the way eclectic, to meet the needs of the purpose, built tables do not provide interface to users, mostly temporary tables.
The following is a brief discussion of some of the experiences of class fourth modeling methods.
The data preparation area has one of the biggest characteristics, is not directly to the user, so the table in the data preparation area operation of the people only ETL engineers. ETL engineers can determine the scope of the data in the table and the lifecycle of the data themselves. Here are two examples:
1 temporary table with small data range
When the amount of data that needs to be consolidated or cleaned is too large, we can create a temporary table of the same structure, leaving only some of the data we need to process in the temp table. This makes it much more efficient to compute for some items in the table, whether they are updated or not. The processed data is sent into a table that is ready to be loaded into the data warehouse, and is last loaded into the Data warehouse.
2 temporary table with redundant fields
Because tables in the data preparation area are used only by themselves, building redundant fields can be very useful without taking risks.
For example, the author has encountered such a demand in the project, Customer table {Customer ID, customer net deduction}, debt table {Debt ID, customer ID, debt balance, debt net deduction value}, that is, customer and debt is a one-to-many relationship. Among them, the customer net deduction value and the debt balance is known, needs to calculate the debt net deduction value. The calculated rule is to distribute the net deduction value of the customer by proportion of the balance of the debt. At this time, we can add a few redundant fields to two tables, such as Customer table {Customer ID, customer net deduction value, customer balance}, debt table {Debt ID, customer ID, debt balance, debt net deduction value, customer balance, customer net deduction value}. This way, through three of SQL, you can complete the entire calculation process directly. The balance of the debt is aggregated to the customer's balance, the customer balance and the customer net deduction are redundant to the debt table, the debt net deduction value can be calculated directly by the formula of the debt balance x customer net deduction/customer balance.