Model design of Data Warehouse

Source: Internet
Author: User

Model design of Data Warehouse A. Data Modeling Methodology

The design of the Data warehouse model follows the design principle of "top-down and gradual refinement".

The design of the model is divided into three stages:

1 , the conceptual model

The scope and use of the business, from the height of the abstract summary, that is, to divide the subject domain.

Generally divided into 8 subject fields:

Customer, service, service use, accounts, settlement, resources, customer service, marketing

Why divide the Subject field?

The topic domain is divided according to the application and needs of the business, and it is used to achieve the purpose of data and business tight coupling.

2 , the logical model

Refine the topics in the conceptual model to define the relationship between the entity and the entity, and the attributes of the entity.

That is, the table is defined as the table's role, the table's fields are constrained. Form ER diagram.

These entities are designed based on business rules, so to speak, this phase is primarily about business. " Business-driven modeling "

3 , the physical model

In accordance with the logical model, in the database to build tables, indexes and so on. Data Warehouse, in order to meet the requirements of high performance, can increase the redundant, hidden table constraints and other anti-third paradigm Operation .

This stage, mainly for the database, hardware, performance.

Paradigm :

First paradigm: The fields of a database table are single attributes and cannot be divided.

Second paradigm: A partial function dependency of a non-critical field on any of the candidate key fields does not exist in the database table.

(Part of a function dependency is when some of the fields in a composite keyword determine non-critical fields). That is, all properties are required to depend on the primary key.

Third Paradigm : a non-critical field in a database table does not have a transfer function dependency on any of the candidate key fields.

The paradigm is backwards compatible.

For example:

Student ID

Student Name

Student Departments

Course ID

Course Name

Results

60100

Tom

School of Education, Department of psychology, Class 1

English_1

English 1

80

1) violate the first paradigm. Because: The student department can be divided into: College, Department, class

2) violate the second paradigm. Because: The key field is the student ID and course ID, but there is a "course ID" that determines the course name and course credits.

3) violate the third paradigm. Because: The key field is the student ID, but there may be name and credit dependent "course ID".

Star model and Snowflake model

First, they are all made up of a fact table and a set of dimension tables.

Star model, also known as dimension modeling .

The difference is:

Star Model : The dimension table is connected directly to the fact table, and the pattern is like a star.

such as counties and cities as the same dimension are in the city table.

* Dimension preprocessing, the dimension will be pre-classification, sorting and other preprocessing.

Snowflake Model : Some dimension tables are not directly connected to the fact table, but are passed through the dimension table, and the shapes are like snowflakes.

For example:

Figure 1: Star-shaped model

Figure 2 Snowflake Model

From the performance point of view, Star Model query performance is good.

To improve performance, you can allow for violations of the third paradigm, appropriate redundancy, and hiding the constraints between tables.

Dimension modeling

The business dimension is fused into the data model, and the name dimension is modeled.

Or, for analysis convenience (commercial application requirements), the different levels of the same dimension (such as the city ID, county ID) are fused into the fact table (such as the User wide table).

The dimension model is also a star model.

It emphasizes the preprocessing of dimensions, the collection of multiple dimensions into a fact table, and the formation of a wide table , such as the user unified view above. Contains more than 20 dimensions. This makes it possible to combine the dimensions to form a flexible report query.

B. Layered design principles

The Data Warehouse in telecom industry adopts the principle of layered design.

In general, it is divided into three layers: interface layer, intermediate summary layer and application layer.

Application layer

Data mart

City data mart, data mining

Application layer

KPI report, Cagnos, topic analysis, Indicator library

Middle layer

depth Rollup layer

Information Aggregation: User Unified View , 3G user Unified view, fixed phone user Unified view

Business development: User behavior, Value-added business, group business, International business

Mild summary layer

List summary, user attribute aggregation, cost summary, group customer summary, etc.

Interface layer

storage layer

Interface Backup, incremental transfer total, reduced I/O ( Common data and historical data)

interface Layer

Day interface, monthly interface, incremental interface, Full-volume interface

Special emphasis is placed on:

The middle tier is the most important layer of data warehousing. Directly determines the performance of the Data Warehouse.

The general practice is to:

1) Data summary. Summarize the underlying data in small granularity by dimension

2) Information aggregation. Aggregates information from multiple tables in a single table. The benefit of this is to avoid using table associations to improve query performance.

C. Topic Domain design methodology

If layered design is a horizontal design principle, then the subject domain is the vertical processing method.

The specific approach is to divide the data into different subject domains from the perspective of business, high abstraction and generalization.

The benefits of a domain: tight coupling, easy data development, and ease of use.

The domain is to have obvious table naming rules, such as:

User Information Domain--user

Communication Behavior--call

Data Service--gprs

Accounting--bill

Customer Service--serv

Data architecture diagram of XX sub-system:

Model design of Data Warehouse (GO)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.