First, preface
Recently in the design of data warehouse data logical model, taking into account the large-volume data storage in the Distributed Data Warehouse technology architecture model, the traditional face-relational data warehouse data storage model needs to be reformed. A set of data storage models that are truly suitable for distributed data warehouses are designed.
Second, the basis of the fact table design
The fact table records the operational events that occur in the real world, and the resulting degrees of value . The fact table is designed to be completely dependent on physical activity and is not affected by the resulting report. In fact tables, in addition to numeric measures, fact tables always contain foreign keys , which are used to correlate dimensions associated with them, or to include degenerate dimension keys and date/time stamps .
Iii. Traditional Models
In the case of the FS-LDM data storage Model event topic domain data storage model design, its event topic domain data logical model structure is as follows:
The main problems of the traditional model are as follows: if there is a small amount of data, it is not a problem to perform multi-table associations, but when a single table stores massive amounts of data in a distributed data warehouse, it is obvious that the pattern will face challenges.
Iii. a new principle of distributed model-dimension modeling
(1) By the value of the key: The key value of the unique dimension table, unless necessary, otherwise not introduce a dimension table, such as the IP Address dimension table, using IP as the primary key of the dimension table, the fact table stores the IP value;
(2) Reasonable sub-table: Traditional relational data Warehouse has multiple table integration impulse, such as event fact table, various acount ind,finance Ind, etc., used to extend the universality of the table, and try to store all the data in a single table. Distributed Data Warehouse design, on the contrary, because of the single-table data size problem, if you want to meet the performance of analysis and processing, reasonable data in accordance with the business of the table storage. such as financial related events, account-related events, separate into a table. More conducive to data calculation and analysis.
Iv. examples of distributed dimension models
| serial number |
field English |
field Chinese |
field explanation |
|
|
|
| 1 |
event_id |
Event ID |
Record identification |
|
|
Identity |
| 2 |
Tm |
Time |
Time stamp |
|
|
Dimension of |
| 3 |
Domain |
Domain |
|
|
|
Dimension-Value |
| 4 |
Ip |
IP Address |
IP Address |
|
|
Dimension-Value |
| 5 |
Os |
Operating system |
Operating system |
|
|
Dimension-Value |
| 6 |
user_id |
User ID |
User ID |
|
|
Dimension-FK |
| 7 |
Date |
Date |
Date (new) |
|
|
Dimension-Value |
| 8 |
From |
Source |
Accessing the source system |
|
|
Dimension-Value |
V. To be continued before the end
Distributed Data Warehouse data storage model design, follow-up will continue to update, please pay attention to QQ Group: Distributed Data Warehouse modeling 398419457.
Data Warehousing Special Topic (3)-thinking on the design of the fact table of distributed data Warehouse