Model design of Data Warehouse

Last Update:2016-05-29 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Model design of Data Warehouse A. Data Modeling Methodology

The design of the Data warehouse model follows the design principle of "top-down and gradual refinement".

The design of the model is divided into three stages:

1 , the conceptual model

The scope and use of the business, from the height of the abstract summary, that is, to divide the subject domain.

Generally divided into 8 subject fields:

Customer, service, service use, accounts, settlement, resources, customer service, marketing

Why divide the Subject field?

The topic domain is divided according to the application and needs of the business, and it is used to achieve the purpose of data and business tight coupling.

2 , the logical model

Refine the topics in the conceptual model to define the relationship between the entity and the entity, and the attributes of the entity.

That is, the table is defined as the table's role, the table's fields are constrained. Form ER diagram.

These entities are designed based on business rules, so to speak, this phase is primarily about business. " Business-driven modeling "

3 , the physical model

In accordance with the logical model, in the database to build tables, indexes and so on. Data Warehouse, in order to meet the requirements of high performance, can increase the redundant, hidden table constraints and other anti-third paradigm Operation .

This stage, mainly for the database, hardware, performance.

Paradigm :

First paradigm: The fields of a database table are single attributes and cannot be divided.

Second paradigm: A partial function dependency of a non-critical field on any of the candidate key fields does not exist in the database table.

(Part of a function dependency is when some of the fields in a composite keyword determine non-critical fields). That is, all properties are required to depend on the primary key.

Third Paradigm : a non-critical field in a database table does not have a transfer function dependency on any of the candidate key fields.

The paradigm is backwards compatible.

For example:

Student ID	Student Name	Student Departments	Course ID	Course Name	Results
60100	Tom	School of Education, Department of psychology, Class 1	English_1	English 1	80

1) violate the first paradigm. Because: The student department can be divided into: College, Department, class

2) violate the second paradigm. Because: The key field is the student ID and course ID, but there is a "course ID" that determines the course name and course credits.

3) violate the third paradigm. Because: The key field is the student ID, but there may be name and credit dependent "course ID".

Star model and Snowflake model

First, they are all made up of a fact table and a set of dimension tables.

Star model, also known as dimension modeling .

The difference is:

Star Model : The dimension table is connected directly to the fact table, and the pattern is like a star.

such as counties and cities as the same dimension are in the city table.

* Dimension preprocessing, the dimension will be pre-classification, sorting and other preprocessing.

Snowflake Model : Some dimension tables are not directly connected to the fact table, but are passed through the dimension table, and the shapes are like snowflakes.

For example:

Figure 1: Star-shaped model

Figure 2 Snowflake Model

From the performance point of view, Star Model query performance is good.

To improve performance, you can allow for violations of the third paradigm, appropriate redundancy, and hiding the constraints between tables.

Dimension modeling

The business dimension is fused into the data model, and the name dimension is modeled.

Or, for analysis convenience (commercial application requirements), the different levels of the same dimension (such as the city ID, county ID) are fused into the fact table (such as the User wide table).

The dimension model is also a star model.

It emphasizes the preprocessing of dimensions, the collection of multiple dimensions into a fact table, and the formation of a wide table , such as the user unified view above. Contains more than 20 dimensions. This makes it possible to combine the dimensions to form a flexible report query.

B. Layered design principles

The Data Warehouse in telecom industry adopts the principle of layered design.

In general, it is divided into three layers: interface layer, intermediate summary layer and application layer.

Application layer	Data mart	City data mart, data mining
Application layer	KPI report, Cagnos, topic analysis, Indicator library
Middle layer	depth Rollup layer	Information Aggregation: User Unified View , 3G user Unified view, fixed phone user Unified view
Business development: User behavior, Value-added business, group business, International business
Mild summary layer	List summary, user attribute aggregation, cost summary, group customer summary, etc.
Interface layer	storage layer	Interface Backup, incremental transfer total, reduced I/O ( Common data and historical data)
interface Layer	Day interface, monthly interface, incremental interface, Full-volume interface

Special emphasis is placed on:

The middle tier is the most important layer of data warehousing. Directly determines the performance of the Data Warehouse.

The general practice is to:

1) Data summary. Summarize the underlying data in small granularity by dimension

2) Information aggregation. Aggregates information from multiple tables in a single table. The benefit of this is to avoid using table associations to improve query performance.

C. Topic Domain design methodology

If layered design is a horizontal design principle, then the subject domain is the vertical processing method.

The specific approach is to divide the data into different subject domains from the perspective of business, high abstraction and generalization.

The benefits of a domain: tight coupling, easy data development, and ease of use.

The domain is to have obvious table naming rules, such as:

User Information Domain--user

Communication Behavior--call

Data Service--gprs

Accounting--bill

Customer Service--serv

Data architecture diagram of XX sub-system:

Model design of Data Warehouse (GO)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Model design of Data Warehouse

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Model design of Data Warehouse

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support