Three basic systematic approaches can be used in data mart design: data-driven, requirement-driven, and hybrid. They differ in the relative weight given to the source database analysis phase and the end-user requirement analysis phase. The choice of approach strongly affects how conceptual design is carried out.
Data-driven approaches include design based on entity-relationship schemata, design based on relational schemata, and design based on XML schemata. The conceptual entity-relationship model is more expressive than the relational logical model, so the former is generally considered the better design source. In practice, however, companies often cannot provide an accurate and complete entity-relationship schema (the documentation is lost, incomplete, or unavailable for other reasons), and design then has to start from the logical schema of the database. On the other hand, a large share of Web data is in XML format. XML-based design derives a data mart conceptual schema from the XML source schema.
1. Data-driven method design
1.1 Design based on entity-relationship schemata
Starting from an entity-relationship schema, the technique for designing a data mart conceptual schema that complies with the Dimensional Fact Model (DFM) consists of the following steps:
(1) Define facts.
(2) For each fact:
A. Build an attribute tree.
B. Prune and graft the attribute tree.
C. Define dimensions.
D. Define measures.
E. Generate the fact schema.
First, the relevant facts are selected from the source schema. Then, for each fact, an attribute tree is built semi-automatically. This is a transitional structure used to delimit the boundary of the fact schema, to remove irrelevant attributes, and to modify the dependencies linked to those attributes (step (2).B). The attribute tree links the data mart to the source schema, and this link is key to the data preparation process. Converting the attribute tree into a fact schema (step (2).E) is then relatively easy. Step A is algorithm-based; steps C, D, and E are attribute-based. Steps (1) and B require a deep understanding of the company's business model.
1.1.1 define facts
Facts usually correspond to dynamic events in the company. In an entity-relationship schema, a fact may correspond either to an entity F or to an n-ary relationship R among the entities E1, E2, ..., En. In the latter case, R can be converted into an entity (a process called reification). To do this, add a new entity F and replace each branch of R with a binary relationship Ri between F and Ei. Let min(E, A) and max(E, A) denote the minimum and maximum cardinality with which entity E participates in relationship A (generally, min(E, A) ∈ {0, 1} and max(E, A) ∈ {1, N}). Then: min(F, Ri) = max(F, Ri) = 1, min(Ei, Ri) = min(Ei, R), max(Ei, Ri) = max(Ei, R).
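The reification step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Branch` class, the `reify` function, and the SALE example schema are hypothetical names introduced here, and maximum cardinalities are written as the strings "1" and "N".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """One side of a relationship: an entity and its (min, max) cardinality."""
    entity: str
    min_card: int   # minimum cardinality, usually 0 or 1
    max_card: str   # maximum cardinality, "1" or "N"


def reify(rel_name, branches):
    """Replace an n-ary relationship R with a new entity F and one
    binary relationship per original branch.

    Returns the name of F and a dict mapping each new binary relationship
    to a pair (branch on the F side, branch on the Ei side).
    """
    fact = rel_name.upper()  # assumption: F is named after R
    binaries = {}
    for i, branch in enumerate(branches, start=1):
        binaries[f"{rel_name}_{i}"] = (
            # F takes part in each new relationship exactly once: min = max = 1
            Branch(fact, 1, "1"),
            # Ei keeps the cardinalities it had in the original relationship R
            Branch(branch.entity, branch.min_card, branch.max_card),
        )
    return fact, binaries


# Hypothetical ternary relationship "sale" among PRODUCT, STORE and DATE
fact, rels = reify("sale", [
    Branch("PRODUCT", 0, "N"),
    Branch("STORE", 1, "N"),
    Branch("DATE", 0, "N"),
])
for name, (f_side, e_side) in rels.items():
    print(name, f_side, e_side)
```

Each resulting binary relationship is many-to-one from F toward the original entity, which is the shape the later steps expect when they navigate from the fact outward.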
Note: sometimes several entities are candidates for expressing the same fact. In that case, it is recommended to choose as the fact the entity whose attribute tree contains as many attributes as possible.
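As a rough illustration of this recommendation, the sketch below compares two candidate entities by counting the attributes that can be reached from each candidate's identifier through functional dependencies. The dependency table, the attribute names, and the `reachable` helper are all hypothetical, and counting reachable attributes is only an approximation of the size of the final tree.

```python
# Hypothetical functional dependencies: attribute -> attributes it directly determines
DETERMINES = {
    "ticketId": ["date", "store"],
    "ticketLineId": ["ticketId", "product", "quantity"],
    "product": ["type"],
    "store": ["city"],
}


def reachable(root, determines):
    """Return every attribute reachable from root, including root itself."""
    seen, stack = set(), [root]
    while stack:
        attr = stack.pop()
        if attr in seen:
            continue  # already visited; also protects against cycles
        seen.add(attr)
        stack.extend(determines.get(attr, []))
    return seen


candidates = ["ticketId", "ticketLineId"]
sizes = {c: len(reachable(c, DETERMINES)) for c in candidates}
best = max(candidates, key=sizes.get)
print(sizes, best)
```

In this made-up schema the ticket line reaches more attributes than the ticket, so it would be the preferred fact candidate.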
1.1.2 build an attribute tree
Attribute tree
Given a relevant portion of an entity-relationship source schema and an entity F classified as a fact, the attribute tree satisfies the following requirements:
- Each node corresponds to an attribute (simple or composite) of the source schema.
- The root corresponds to the identifier of F.
- For each node v, the attribute corresponding to v functionally determines all the attributes corresponding to the descendants of v.
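The requirements above can be illustrated with a small recursive sketch. It assumes a toy source schema in which each entity lists its identifier, its other attributes, and the entities it reaches through to-one relationships; the `SCHEMA` dictionary, the `Node` class, and the function names are hypothetical, and the cycle check is a simplification added here rather than part of the definition.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of the attribute tree, labelled with a source schema attribute."""
    name: str
    children: list = field(default_factory=list)


# Hypothetical source schema: identifier, other attributes, to-one links
SCHEMA = {
    "SALE":    {"id": "saleId",  "attrs": ["quantity", "unitPrice"], "to_one": ["PRODUCT", "STORE"]},
    "PRODUCT": {"id": "product", "attrs": ["weight"],                "to_one": ["TYPE"]},
    "TYPE":    {"id": "type",    "attrs": [],                        "to_one": []},
    "STORE":   {"id": "store",   "attrs": ["address"],               "to_one": ["CITY"]},
    "CITY":    {"id": "city",    "attrs": [],                        "to_one": []},
}


def build_attribute_tree(schema, entity, path=()):
    """Build the subtree rooted at the identifier of the given entity."""
    path = path + (entity,)
    info = schema[entity]
    node = Node(info["id"])  # the identifier determines everything below it
    for attr in info["attrs"]:
        node.children.append(Node(attr))
    for target in info["to_one"]:
        if target in path:
            continue  # assumption: stop when a to-one chain loops back
        node.children.append(build_attribute_tree(schema, target, path))
    return node


def render(node, depth=0):
    """Return the tree as indented lines, one attribute per line."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


tree = build_attribute_tree(SCHEMA, "SALE")
print("\n".join(render(tree)))
```

Only to-one relationships are followed, because those are the ones along which the identifier of one entity functionally determines the identifier of the next.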
1.1.3 prune and graft the attribute tree
1.1.4 define dimensions
1.1.5 define measures
1.1.6 generate the fact schema
1.2 Design based on relational schemata
1.3 Design based on XML schemata
2. Hybrid method design
3. Requirement-driven method design
References:
Matteo Golfarelli, Stefano Rizzi. Data Warehouse Design: Modern Principles and Methodologies.