Chapter 2
Data Models and Data Organization of Data Warehouses
Key points of this chapter
-Data model of the data warehouse
  Conceptual model
  Logical model
  Physical model
-Basic concepts of data warehouse data organization
  Granularity
  Dimension
  Metadata
  Data segmentation
-Data organization of a data warehouse
  How data is organized in a data warehouse
  Data storage organization of data warehouses
Data Organization in a data warehouse
Multi-level data, from most to least summarized:
-Highly summarized level
-Lightly summarized level
-Current detail level
-Early detail level
Data Model of Data Warehouse
Differences from database system data models:
-The data warehouse data model contains no purely operational data.
-The data warehouse data model extends the key structure by adding a time attribute as part of the key.
-Some subject-oriented derived data is added to the data warehouse data model.
Data Model of Data Warehouse
The model has three levels, each refining the one before it:
-Conceptual model: information package diagram (oriented toward user requirements)
-Logical model: star schema (more detailed)
-Physical model: physical data model (adds technical details)
Information Package Diagram (Conceptual Model)
Information package diagram: the first, or highest, level of the data warehouse data model. Most business data is multidimensional, but traditional data models have difficulty representing more than three dimensions. The information package diagram simplifies this: it lets users design multidimensional information packages and communicate them to developers and other users. The model centers on the information package, which gives a visual representation of the analyst's mental model.
Tasks:
-Determine the system boundaries: the types of decisions, the information required, and the original (source) information
-Determine the subject areas and their content: the common key, relationships, and attribute groups of each subject area
-Determine the dimensions: for example the time, sales location, product, and group dimensions
-Determine the categories: the detailed categories of each dimension
-Determine the indicators and facts: the numerical information used for analysis
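Information Package Diagram
Layout of a blank information package diagram:
-Information package: the name of the package
-Dimensions: one column per dimension
-Categories: the detailed categories listed under each dimension
-Indicators and facts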
Information Package Diagram
[Example] Draw the information package diagram for sales analysis.
Solution: Determine the dimensions, categories, and indicators and facts of the information package from the actual needs of sales analysis:
(1) Dimensions: date, sales location, sales product, age group, and gender.
(2) Categories: determine the detailed categories of each dimension. For example, the date dimension includes year (10), quarter (40), and month (120); the number in parentheses is the number of members in that category. The sales location dimension includes country (15), region (45), city (280), district (880), and store (2000), where the numbers again give the number of members. The detailed categories of sales product, age group, and gender are determined the same way.
(3) Indicators and facts: determine the numerical information used for analysis: forecast sales volume, actual sales volume, and forecast deviation.
Sales Analysis Information Package Diagram
Information package: Sales Analysis

Dimension:   Date          Sales location   Sales product        Age group       Gender
Category:    Year (10)     Country (15)     Product (6)          Age group (8)   Gender group (2)
             Quarter (40)  Region (45)      Product group (48)
             Month (120)   City (280)       Product (240)
                           District (880)
                           Store (2000)

Indicators and facts: forecast sales volume, actual sales volume, and forecast deviation
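To see what an information package records, it can help to write it down as a plain data structure. The following is only a sketch in Python: the dictionary layout and the helper name `finest_level_cells` are assumptions made for illustration, while the dimension names and category counts are those of the sales analysis example.

```python
from math import prod

# Dimensions and their categories (with member counts), coarsest level first.
# Names and counts come from the sales analysis example; the layout is an assumption.
sales_package = {
    "name": "Sales Analysis",
    "dimensions": {
        "Date": [("Year", 10), ("Quarter", 40), ("Month", 120)],
        "Sales location": [("Country", 15), ("Region", 45), ("City", 280),
                           ("District", 880), ("Store", 2000)],
        "Sales product": [("Product", 6), ("Product group", 48), ("Product", 240)],
        "Age group": [("Age group", 8)],
        "Gender": [("Gender group", 2)],
    },
    "indicators": ["forecast sales volume", "actual sales volume", "forecast deviation"],
}

def finest_level_cells(package):
    """Upper bound on combinations at the most detailed category of every dimension."""
    return prod(levels[-1][1] for levels in package["dimensions"].values())

print(finest_level_cells(sales_package))
```

The product of the finest-level counts gives a rough upper bound on how many distinct cells the package could hold, which is one reason the choice of categories matters early in the design.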
Star schema model (Logical Model)
Star schema: the second level of the data warehouse data model. It adds detail to the information package, moving it toward the final data structure. Compared with the traditional relational model, the star schema simplifies the model and defines data entities from the perspective of decision support, which makes it better suited to large numbers of complex queries.
A star schema consists of three kinds of logical entities:
-Indicator (measure) entities
-Dimension entities
-Detailed category entities
Star schema model (Logical Model)
[Example] Sales analysis star schema.
Center of the star (indicator entity): Sales Analysis
-Actual sales
-Forecast sales
-Forecast deviation
Points of the star (dimension entities):
-Time dimension
-Product dimension
-Region dimension
-Group dimension
-Other dimensions
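The sales analysis star can be sketched as relational tables, with one central table for the indicators and one table per dimension. This is a minimal illustration using Python's built-in `sqlite3` module; the table and column names (`fact_sales`, `dim_time`, and so on), the sample rows, and the definition of deviation as actual minus forecast are all assumptions, not part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER, month INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT, product_group TEXT);
CREATE TABLE dim_region  (region_id INTEGER PRIMARY KEY, country TEXT, city TEXT);
-- Indicator (fact) entity at the center of the star
CREATE TABLE fact_sales (
    time_id        INTEGER REFERENCES dim_time(time_id),
    product_id     INTEGER REFERENCES dim_product(product_id),
    region_id      INTEGER REFERENCES dim_region(region_id),
    actual_sales   REAL,
    forecast_sales REAL
);
""")
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?, ?)",
                 [(1, 2023, 1, 1), (2, 2023, 2, 4)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Widget", "Hardware")])
conn.executemany("INSERT INTO dim_region VALUES (?, ?, ?)",
                 [(1, "China", "Shanghai")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 120.0, 100.0), (2, 1, 1, 90.0, 110.0)])

# Every analysis query joins the central table to the dimensions it needs.
rows = conn.execute("""
    SELECT t.quarter, r.city,
           SUM(f.actual_sales), SUM(f.forecast_sales),
           SUM(f.actual_sales) - SUM(f.forecast_sales) AS deviation
    FROM fact_sales f
    JOIN dim_time t   ON t.time_id = f.time_id
    JOIN dim_region r ON r.region_id = f.region_id
    GROUP BY t.quarter, r.city
    ORDER BY t.quarter
""").fetchall()
print(rows)
```

Because each dimension is one join away from the central table, queries keep the same simple shape no matter which dimensions they group by.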
Physical Data Model
Physical data model: the third level of the data model. It is the implementation of the star schema in the data warehouse and covers details such as physical access methods and data storage structures.
During physical design, data is usually classified by its importance, frequency of use, and response-time requirements, and each class is stored on a different kind of storage device. Data that is important, frequently accessed, and needs fast response is stored on high-speed storage devices, such as hard disks; data with a low access frequency or relaxed response-time requirements can be stored on low-speed storage devices.
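The classification rule can be sketched as a small function. The thresholds, the 1 to 5 importance scale, and the names `DataClass` and `choose_tier` are hypothetical; a real design would take them from measured workloads.

```python
from dataclasses import dataclass

@dataclass
class DataClass:
    name: str
    importance: int        # 1 (low) to 5 (high); the scale is an assumption
    reads_per_day: int
    max_response_ms: int   # response time the users can tolerate

def choose_tier(d, min_importance=4, min_reads=1000, fast_response_ms=500):
    """Return 'high-speed' only when all three criteria call for it (hypothetical thresholds)."""
    if (d.importance >= min_importance
            and d.reads_per_day >= min_reads
            and d.max_response_ms <= fast_response_ms):
        return "high-speed"
    return "low-speed"

print(choose_tier(DataClass("monthly summary", 5, 5000, 200)))
print(choose_tier(DataClass("early detail", 5, 10, 200)))
```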
Granularity: first form
Granularity: the level of summarization of the data in a data warehouse. It affects both the volume of data in the warehouse and the types of questions the warehouse can answer.
The finer the granularity, the lower the level of summarization and the more kinds of queries can be answered; the higher the level of summarization, the higher the query efficiency.
In a data warehouse, fine-grained data can be kept on low-speed storage, while coarse-grained data is kept on high-speed storage.
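The trade-off can be illustrated with a few made-up detail records summarized to one total per month. The record layout and the function name `summarize_by_month` are assumptions for this sketch.

```python
from collections import defaultdict

# Detail records: (date, store, amount). Values are made up for illustration.
detail = [
    ("2023-01-05", "S1", 10.0),
    ("2023-01-17", "S1", 15.0),
    ("2023-01-17", "S2", 7.0),
    ("2023-02-02", "S1", 12.0),
]

def summarize_by_month(records):
    """Coarser granularity: one total per month; the store detail is lost."""
    totals = defaultdict(float)
    for date, _store, amount in records:
        totals[date[:7]] += amount
    return dict(totals)

monthly = summarize_by_month(detail)
print(len(detail), len(monthly))
```

The monthly table is smaller and faster to scan, but it can no longer answer a question such as "what did store S2 sell on 17 January", which the detail records can.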
Granularity: second form (sample database)
Sample database: analysis often involves a lot of exploration, and sometimes the goal does not require exact results, only reasonably accurate ones that reflect the trend. In such cases a sample database can be extracted from the data.
Sample database granularity: the granularity is determined by the sampling rate. Sample databases with different sampling granularity can have the same summarization level. A sample database is a subset extracted at a given sampling rate from the detail database or the lightly summarized database.
The sample database is extracted according to the importance of the data.
Analyzing the important data in a sample database improves analysis efficiency and also helps the analyst grasp the main factors and main contradictions.
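As a rough sketch of extraction at a given sampling rate, the code below draws a simple random sample. Note the assumption: uniform random sampling is used here for brevity, whereas the text describes extraction weighted by the importance of the data. The function name `build_sample` and the generated rows are hypothetical.

```python
import random

def build_sample(records, sampling_rate, seed=0):
    """Draw a simple random sample of about sampling_rate * len(records) rows."""
    if not 0 < sampling_rate <= 1:
        raise ValueError("sampling_rate must be in (0, 1]")
    k = max(1, round(len(records) * sampling_rate))
    return random.Random(seed).sample(records, k)

# Made-up detail rows standing in for the detail database.
detail = [{"order_id": i, "amount": float(i % 50)} for i in range(10_000)]
sample = build_sample(detail, 0.01)

# Estimate the mean from the sample instead of scanning every detail row.
estimate = sum(r["amount"] for r in sample) / len(sample)
print(len(sample), estimate)
```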
Dimension
Dimension: a physical characteristic, such as time, location, or product. It is a basic way of expressing information in a data warehouse and can serve as an index to the data. A report generally has only rows and columns, but most data stored in a data warehouse is presented in multidimensional views (three or more dimensions).
For example:
-Data in a sales system can be divided by time dimension, product dimension, and geographic location dimension;
-Data in a financial system can be divided by time dimension, expenditure dimension, income dimension, and so on;
-Data in an enterprise decision support system can be divided by cost, expense, sales revenue, profit, and stock value.
Aggregation
In data warehouse technology, each dimension can contain multiple levels, and each level gives users data at a particular degree of detail. For example, in the geographic location dimension, all blocks make up a district, and all districts make up a city. Aggregation means rolling data up through the levels of a dimension to form data sets at different levels, so that users can observe data not only along a single dimension but also at different levels of that dimension.
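A minimal sketch of aggregating one indicator to different levels of the location dimension follows. The rows, level names, and the function name `roll_up` are made up for illustration.

```python
from collections import defaultdict

# Each store row carries its ancestors in the location hierarchy (made-up data).
store_sales = [
    {"store": "S1", "city": "Shanghai", "region": "East China", "sales": 100},
    {"store": "S2", "city": "Shanghai", "region": "East China", "sales": 50},
    {"store": "S3", "city": "Nanjing", "region": "East China", "sales": 70},
    {"store": "S4", "city": "Beijing", "region": "North China", "sales": 30},
]

def roll_up(rows, level):
    """Aggregate the sales indicator to the given level of the dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[level]] += row["sales"]
    return dict(totals)

print(roll_up(store_sales, "city"))
print(roll_up(store_sales, "region"))
```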
Decomposition and Synthesis
Decomposition and synthesis are the processes of subdividing data further along a dimension, or of recombining it by another criterion. For example, when observing data by geographic location, a user can first look at the data for a country (such as China), then choose a region (such as East China), and then a province or city (such as Shanghai). This is data decomposition. Synthesis is the inverse of decomposition: the user starts with a province or city as the object of observation and then moves up to the region and the country. This is data synthesis.
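The country, region, city walk can be sketched as navigation along a path in the location hierarchy: lengthening the path decomposes the view, shortening it synthesizes. The figures and the function names `observe` and `children` are assumptions for illustration.

```python
# Sales keyed by a path in the location hierarchy: (country, region, city). Made-up figures.
sales = {
    ("China", "East China", "Shanghai"): 150,
    ("China", "East China", "Nanjing"): 70,
    ("China", "North China", "Beijing"): 30,
}

def observe(data, path=()):
    """Total of every entry under `path`; a longer path means a more decomposed view."""
    path = tuple(path)
    return sum(v for k, v in data.items() if k[:len(path)] == path)

def children(data, path=()):
    """Members one level below `path`, i.e. the choices for the next decomposition step."""
    path = tuple(path)
    depth = len(path)
    return sorted({k[depth] for k in data if k[:depth] == path and len(k) > depth})

# Decomposition: country -> region -> city. Reading the calls bottom-up is synthesis.
print(observe(sales, ("China",)))
print(observe(sales, ("China", "East China")))
print(observe(sales, ("China", "East China", "Shanghai")))
```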
Segmentation and its criteria
Segmentation: data is dispersed into separate physical units so that each can be processed independently, which improves data processing efficiency. Each unit produced by segmentation is called a partition (shard).
Segmentation criteria: by date, by region, by business area, or by a combination of several criteria.
Purpose of segmentation: to make data easier to restructure, index, reorganize, recover, monitor, and scan.
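A short sketch of segmentation by a single criterion and by a combined criterion is given below. The rows and the helper name `partition` are hypothetical; the criterion is passed in as a function so that date, region, or any combination can be used.

```python
from collections import defaultdict

def partition(rows, key_func):
    """Group rows into partitions; key_func encodes the segmentation criterion."""
    parts = defaultdict(list)
    for row in rows:
        parts[key_func(row)].append(row)
    return dict(parts)

rows = [
    {"date": "2023-01-05", "region": "East", "amount": 10},
    {"date": "2023-01-20", "region": "North", "amount": 20},
    {"date": "2024-03-01", "region": "East", "amount": 30},
]

by_year = partition(rows, lambda r: r["date"][:4])
by_year_and_region = partition(rows, lambda r: (r["date"][:4], r["region"]))
print(sorted(by_year), sorted(by_year_and_region))
```

Each partition can then be indexed, backed up, or scanned on its own, which is the purpose described above.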
Data Segmentation Methods
Vertical segmentation: a table is divided by column into two parts. This helps split a table with a large number of columns into two independent tables, which are linked by a key field.
Horizontal segmentation: a table is divided by row into two parts. This is used to store the important data that matters to local users close to them, which reduces queries over the network.
Schema segmentation: a schema is divided across multiple distributed systems. A table can be fetched from a specified server, or all the required data can be obtained by connecting to several servers. This is used to separate small, static tables from volatile tables that keep growing.
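Vertical segmentation depends on both halves keeping the key field so they can be joined again. A minimal sketch, with made-up rows and hypothetical helper names `vertical_split` and `rejoin`:

```python
def vertical_split(rows, key, left_columns):
    """Split by column; both halves keep `key` so they can be joined back."""
    left = [{c: r[c] for c in (key, *left_columns)} for r in rows]
    right = [{c: v for c, v in r.items() if c == key or c not in left_columns}
             for r in rows]
    return left, right

def rejoin(left, right, key):
    """Recombine the two halves through the shared key field."""
    right_by_key = {r[key]: r for r in right}
    return [{**l, **right_by_key[l[key]]} for l in left]

customers = [
    {"id": 1, "name": "Li", "city": "Shanghai", "notes": "long text"},
    {"id": 2, "name": "Wang", "city": "Beijing", "notes": "more text"},
]
core, extra = vertical_split(customers, "id", ["name", "city"])
print(core)
print(extra)
```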
Metadata
Metadata: data that describes data. It describes and locates the data components, their origins, and their movement as they enter the data warehouse, and it documents the data and the operations on it (input, computation, and output). Metadata can be stored as files in a metadata repository.
To manage a data warehouse effectively, you must design metadata with strong descriptive power and complete content.
Types of metadata
Transformation metadata: created for moving data from the transaction processing environment into the data warehouse. It contains information about all source data: business descriptions, data structure definitions, algorithms for data extraction and transfer, rules for data integration and cleansing, and records of data access and transfer.
DSS metadata: establishes the mapping between the data warehouse and the end users' multidimensional business models and front-end tools. It is often used to develop more advanced decision support tools.
Metadata content in the data warehouse
-Metadata about source data: all physical data structures in the data sources; the business definition of every data item; how often each data item is updated, and by whom or by which process; the valid values of each data item; a list of data items in other systems that have the same business meaning.
-Metadata about data warehouse mapping.
-Metadata about system security.
-Information similar to the data dictionary of a traditional database system.
-Subject descriptions of the data warehouse.
-Descriptions of external and unstructured data.
-Definition of the system of record.
-Logical model definition.
-Transformation rules for data entering the data warehouse.
-History of data extraction.
-Granularity definitions.
-Data segmentation definitions.
-Generalized indexes.
-Descriptions of storage paths and structures.
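The items recorded about each source data item (physical structure, business definition, update frequency and updater, valid values, equivalents in other systems) could be held in a record like the one below. This is a sketch only; the class name, field names, and sample values are hypothetical and do not describe any particular metadata repository.

```python
from dataclasses import dataclass, field

@dataclass
class SourceItemMetadata:
    """Metadata about one source data item (field names are hypothetical)."""
    name: str
    physical_type: str            # physical structure in the source system
    business_definition: str
    update_frequency: str
    updated_by: str               # person or process that updates the item
    valid_values: list = field(default_factory=list)
    synonyms_in_other_systems: list = field(default_factory=list)
    warehouse_target: str = ""    # mapping into the data warehouse

    def is_valid(self, value):
        """Check a value against the recorded valid values (empty list = unrestricted)."""
        return not self.valid_values or value in self.valid_values

gender = SourceItemMetadata(
    name="gender",
    physical_type="CHAR(1)",
    business_definition="Customer gender",
    update_frequency="on change",
    updated_by="crm_sync",
    valid_values=["M", "F"],
)
print(gender.is_valid("M"), gender.is_valid("X"))
```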
Data Organization of a data warehouse
A data warehouse is a data storage and organization technology designed for a new kind of analytical processing environment.
Its data structure differs from that of a general database system: the base data obtained from the original business databases and the summarized data derived from it are organized into different levels. In the data warehouse, data can be divided by granularity into four levels: the early detail level, the current detail level, the lightly summarized level, and the highly summarized level.
How data is organized in a data warehouse
Relational table-based storage: the main problem with this method is that, once the model is defined, extracting data from the database often requires writing separate, complex programs, so the approach generalizes poorly and is hard to maintain.
Multidimensional database storage: a multidimensional database is a data organization form built directly for OLAP analysis operations. There are quite a few such database products, and their implementations differ. Data is organized in a multidimensional array structure, with dimension indexes and corresponding metadata.
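A multidimensional array with a dimension index can be sketched in a few lines. This is an illustrative toy and not how any specific product stores data; the class name `Cube`, the dense row-major layout, and the sample members are all assumptions.

```python
class Cube:
    """Dense multidimensional array with one index per dimension."""

    def __init__(self, dimensions):
        # dimensions: {"time": ["Q1", ...], ...}; insertion order fixes the axis order
        self.members = {name: list(values) for name, values in dimensions.items()}
        # Dimension index: member name -> position along that axis
        self.index = {name: {m: i for i, m in enumerate(values)}
                      for name, values in self.members.items()}
        self.shape = [len(v) for v in self.members.values()]
        size = 1
        for n in self.shape:
            size *= n
        self.cells = [0.0] * size

    def _offset(self, coords):
        # Row-major offset computed from the per-dimension indexes.
        offset = 0
        for (name, idx), n in zip(self.index.items(), self.shape):
            offset = offset * n + idx[coords[name]]
        return offset

    def set(self, value, **coords):
        self.cells[self._offset(coords)] = value

    def get(self, **coords):
        return self.cells[self._offset(coords)]

cube = Cube({"time": ["Q1", "Q2"], "product": ["A", "B", "C"], "city": ["Shanghai", "Beijing"]})
cube.set(120.0, time="Q2", product="B", city="Shanghai")
print(cube.get(time="Q2", product="B", city="Shanghai"))
```

Looking up a cell is a matter of index arithmetic rather than a join, which is why this organization suits OLAP operations; the cost is that every combination of members reserves a cell, whether or not it holds data.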