Original Address http://blog.itpub.net/23659908/viewspace-1118762/
Thank you.
model design of Data Warehouse A. Data Modeling Methodology
The design of Data warehouse model follows the design principle of "top-down and gradual refinement".
Model design is divided into three stages:
1, conceptual model
The scope and use of the business, from the height of the abstract summary, that is, the division of the subject area.
Generally divided into 8 subject fields:
Customers, services, service use, accounts, settlement, resources, customer service, marketing
Why should I divide the subject field?
The Division of subject domain is based on the application and needs of the business, and is used to achieve tight coupling between data and business.
2, logical model
The topics in the conceptual model are refined to define the relationships between entities and entities, and the attributes of entities.
That is, define the role of specific tables, table and table constraints, table fields. Form ER diagram.
The design of these entities is based on business rules, it can be said that the main face of this phase is the business. Which is " business-driven modeling ."
3, physical model
According to the logical model, the table, index and so on are built in the database. Data Warehouse, in order to meet high-performance requirements, can increase redundancy, hidden table constraints, such as anti-third paradigm Operation .
This phase, mainly for the database, hardware, performance.
Paradigm :
First paradigm: The fields of a database table are single properties and cannot be divided.
Second paradigm: There is no partial function dependency of a non-critical field on any candidate key field in the database table.
(Partial function dependencies refer to the existence of certain fields in a combination keyword that determine non-critical fields.) This requires that all properties depend on the primary key.
third normal form: No non-critical field in the database table has a transitive function dependency on any candidate key field.
Paradigm is backward-compatible.
For example:
Student ID |
Student Name |
Student Department |
Course ID |
Course Name |
Results |
60100 |
Tom |
School of Education, Department of psychology, Class 1 |
English_1 |
English 1 |
80 |
1) violate the first paradigm. Because: The student department can be divided into: College, Department, class
2) violate the second normal form. Because: The key field is the student ID and course ID, but there is a "course ID" that determines the course name and course credits.
3) violate the third normal form. Because: The key field is the student ID, but there is a possible name and credit dependency "course ID".
star model and snowflake model
First, they all consist of a fact table and a set of dimension tables.
Star model, also known as dimension modeling .
The difference is:
Star Model : the dimension table is directly connected to the fact table, and the figure is like a star.
such as counties and cities as the same dimension are in the city table.
* Dimension preprocessing, the dimension will be in advance classification, sorting and other preprocessing.
Snowflake Model : Some dimension tables are not directly connected to the fact table, but are relayed through the dimension table, like snowflakes.
For example:
Figure 1: Star model
Figure 2 Snowflake Model
In terms of performance, Star Model query performance is good.
To improve performance, you can allow violations of the third paradigm, appropriate redundancy, and hidden constraints between tables.
Dimension Modeling
The business dimension is merged into the data model, hence the name dimension is modeled.
Or, in order to analyze the convenience (commercial application requirements), the different levels of the same dimension (such as the city ID, county ID) are merged into the fact table (such as user wide table).
The dimension model is also a star model.
It emphasizes that the dimension is preprocessed first, and multiple dimensions are assembled into a fact table to form a wide table , such as the user consolidated view above. Contains more than 20 dimensions. This can combine each dimension to form a flexible report query.
B. Layered design Principles
The principle of layered design is adopted in the data warehouse of telecom industry.
In general, it is divided into three layers: interface layer, intermediate summary layer and application layer.
Application layer |
data mart |
City Data Mart, data mining |
application layer |
KPI report, Cagnos, theme analysis, index gallery |
Middle tier |
depth Rollup level |
Information aggregation: User Unified view, 3G user Unified view, fixed-line user Consolidated view |
Business expansion: User behavior, value-added services, group business, International business |
Mild summary Layer |
List totals, user attribute aggregations, fee totals, group customer totals, and so on |
Interface layer |
Save Storage Layer |
Interface Backup, incremental total amount, reduce I/O (Common data and historical data) |
interface Layer |
Day interface, monthly interface, incremental interface, full-volume interface |
Particular emphasis is placed on:
The middle tier is the most important layer of the Data Warehouse. Directly determines the performance of the Data Warehouse.
The general practice is:
1) Data summary. Small granularity summary of the underlying data by dimension
2) Information aggregation. Aggregates information from multiple tables into a single table. The advantage of this is that you avoid using table associations to improve query performance. c. Thematic domain design methodology
If the layered design, is the horizontal design principle, then the topic domain is the longitudinal processing method.
The specific approach is from the business, a high degree of abstraction and induction, the data into a different subject area.
Sub-domain Benefits: Business tight coupling, easy to expand data, easy to use.
The domain is to have a distinct table naming rule, such as:
User Information Domain--user
Communication Behavior--call
Data Service--gprs
Accounting--bill
Customer Service--serv
Data structure diagram of XX sub-system: