Model design __DW of Data Warehouse

Source: Internet
Author: User
Original Address http://blog.itpub.net/23659908/viewspace-1118762/

Thank you.


model design of Data Warehouse A. Data Modeling Methodology

The design of Data warehouse model follows the design principle of "top-down and gradual refinement".

Model design is divided into three stages:

1, conceptual model

The scope and use of the business, from the height of the abstract summary, that is, the division of the subject area.

Generally divided into 8 subject fields:

Customers, services, service use, accounts, settlement, resources, customer service, marketing

Why should I divide the subject field?

The Division of subject domain is based on the application and needs of the business, and is used to achieve tight coupling between data and business.

2, logical model

The topics in the conceptual model are refined to define the relationships between entities and entities, and the attributes of entities.

That is, define the role of specific tables, table and table constraints, table fields. Form ER diagram.

The design of these entities is based on business rules, it can be said that the main face of this phase is the business. Which is " business-driven modeling ."

3, physical model

According to the logical model, the table, index and so on are built in the database. Data Warehouse, in order to meet high-performance requirements, can increase redundancy, hidden table constraints, such as anti-third paradigm Operation .

This phase, mainly for the database, hardware, performance.

Paradigm :

First paradigm: The fields of a database table are single properties and cannot be divided.

Second paradigm: There is no partial function dependency of a non-critical field on any candidate key field in the database table.

(Partial function dependencies refer to the existence of certain fields in a combination keyword that determine non-critical fields.) This requires that all properties depend on the primary key.

third normal form: No non-critical field in the database table has a transitive function dependency on any candidate key field.

Paradigm is backward-compatible.

For example:

Student ID

Student Name

Student Department

Course ID

Course Name

Results

60100

Tom

School of Education, Department of psychology, Class 1

English_1

English 1

80

1) violate the first paradigm. Because: The student department can be divided into: College, Department, class

2) violate the second normal form. Because: The key field is the student ID and course ID, but there is a "course ID" that determines the course name and course credits.

3) violate the third normal form. Because: The key field is the student ID, but there is a possible name and credit dependency "course ID".

star model and snowflake model

First, they all consist of a fact table and a set of dimension tables.

Star model, also known as dimension modeling .

The difference is:

Star Model : the dimension table is directly connected to the fact table, and the figure is like a star.

such as counties and cities as the same dimension are in the city table.

* Dimension preprocessing, the dimension will be in advance classification, sorting and other preprocessing.

Snowflake Model : Some dimension tables are not directly connected to the fact table, but are relayed through the dimension table, like snowflakes.

For example:

Figure 1: Star model

Figure 2 Snowflake Model

In terms of performance, Star Model query performance is good.

To improve performance, you can allow violations of the third paradigm, appropriate redundancy, and hidden constraints between tables.

Dimension Modeling

The business dimension is merged into the data model, hence the name dimension is modeled.

Or, in order to analyze the convenience (commercial application requirements), the different levels of the same dimension (such as the city ID, county ID) are merged into the fact table (such as user wide table).

The dimension model is also a star model.

It emphasizes that the dimension is preprocessed first, and multiple dimensions are assembled into a fact table to form a wide table , such as the user consolidated view above. Contains more than 20 dimensions. This can combine each dimension to form a flexible report query.

B. Layered design Principles

The principle of layered design is adopted in the data warehouse of telecom industry.

In general, it is divided into three layers: interface layer, intermediate summary layer and application layer.

Application layer

data mart

City Data Mart, data mining

application layer

KPI report, Cagnos, theme analysis, index gallery

Middle tier

depth Rollup level

Information aggregation: User Unified view, 3G user Unified view, fixed-line user Consolidated view

Business expansion: User behavior, value-added services, group business, International business

Mild summary Layer

List totals, user attribute aggregations, fee totals, group customer totals, and so on

Interface layer

Save Storage Layer

Interface Backup, incremental total amount, reduce I/O (Common data and historical data)

interface Layer

Day interface, monthly interface, incremental interface, full-volume interface

Particular emphasis is placed on:

The middle tier is the most important layer of the Data Warehouse. Directly determines the performance of the Data Warehouse.

The general practice is:

1) Data summary. Small granularity summary of the underlying data by dimension

2) Information aggregation. Aggregates information from multiple tables into a single table. The advantage of this is that you avoid using table associations to improve query performance. c. Thematic domain design methodology

If the layered design, is the horizontal design principle, then the topic domain is the longitudinal processing method.

The specific approach is from the business, a high degree of abstraction and induction, the data into a different subject area.

Sub-domain Benefits: Business tight coupling, easy to expand data, easy to use.

The domain is to have a distinct table naming rule, such as:

User Information Domain--user

Communication Behavior--call

Data Service--gprs

Accounting--bill

Customer Service--serv

Data structure diagram of XX sub-system:

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.