Discussion on data modeling methods in data warehouse construction

Source: Internet
Author: User

Introduction:The main content of this article is not to introduce some data models of the existing popular main industries, but to share some of my experiences in data warehouse construction projects. We hope to help you summarize a set of methods that meet the current industry standards and meet the data warehouse construction standards in most industries during the data warehouse project construction.

The so-called "water" is impermanence. Different industries have different characteristics. Therefore, from a business perspective, their data models vary widely. Currently, the most popular data warehouse vendors in the industry are IBM and NCR. In addition to providing a powerful data warehouse platform, there are also data models for specific industries.

For example,In the banking industry, IBM has its own bdwm (banking data warehouse model), while NCR has its own FS-LDM model. In the telecommunications industry, IBM has tdwm (telecom data warehouse model), while NCR has its own TS-LDM model.Therefore, we can see that different companies have their own understanding of a specific industry, so there will be different companies targeting a specific industry model. For different industries, the same company also has different models, depending on the different business characteristics of different industries.

For example, the tdwm model of IBM contains the following nine concepts:

Figure 1. tdwm conceptual model of IBM

Many may ask why your model has nine concepts, not ten, and 11? What is the basis of your data warehouse model? In fact, this is a frequently asked question when we introduce our data models to our customers. I hope readers can find their answers when reading this article.

Although each industry has its own model, we find that data models in different industries share common basic characteristics in terms of data modeling methods.

One of the main purposes of this article is to hope that readers can read this article and combine their own experience in data warehouse construction, when building a data warehouse, we can summarize a set of modeling methods suitable for ourselves, which can help customers better play the role of data warehouse.

The main line of this article is to answer the following three questions:

  • What is a data model?
  • Why do we need a data model?
  • How to build a data model

Finally, we will introduce a specific data warehouse modeling example at the end of this article to help you understand the entire data modeling process.

I. What is a data model?

A Data Model is a tool and method used to abstract and describe the real world. It is a ing of transactions in the real world through abstract entities and links between entities.Here, the data model abstracts the relationships between entities. By defining and describing the relationships between entities, you can express the specific business relationships in the actual business.

The data warehouse model is a specific data model in the data model for a specific data warehouse application system. Generally, our data warehouse model is divided into several levels, as shown in 2.

Figure 2. Data Warehouse Model

Through the above figure, we can easily see that during the modeling process of the entire data warehouse, we need to go through four steps:

  • Business ModelingTo generate a business model, which mainly solves the business-layer decomposition and programming.
  • Domain ModelingGenerate a domain model, which is mainly used to abstract the business model and generate a domain conceptual model.
  • Logical ModelingTo generate a logical model. It mainly refers to the logic of the conceptual entities of the domain model and the relationships between entities at the database level.
  • Physical ModelingThe physical model is mainly used to solve physical and performance problems of different relational databases.

Therefore,The model design and architecture of the entire data warehouse involve both business knowledge and specific technologies. We need to understand a wealth of industry experience. At the same time, we also need certain information technologies to help us implement our data models. The most important thing is that we also need a very suitable methodology to guide us to abstract our own business, process and generate models for each stage.

Ii. Why do we need a data model?

In the construction of a data warehouse, we have repeatedly stressed the need for a data model. Why is the data model so important? First, we need to understand the history of Data Warehouse Construction.

The development of data warehouse has gone through three steps:

  • Simple report stage:At this stage, the main goal of the system is to solve the reports required by business personnel in daily work and generate some simple summary data that can help leaders make decisions. Most of the forms of this stage are database and front-end report tools.
  • Data mart stage:At this stage, data is collected and sorted according to the needs of a business department, and multi-dimensional reports are presented according to the needs of business personnel, data that provides guidance for specific businesses and specific leadership decision-making data.
  • Data Warehouse stage:At this stage, data of the entire enterprise is collected and organized based on a certain data model, and cross-Department data can be provided according to the needs of various business departments, completely consistent business report data can be generated through the data warehouse to provide guidance for the business, while providing comprehensive data support for leadership decision-making.

Through the development phase of Data Warehouse Construction, we can see that the important difference between data warehouse construction and data mart construction lies in the support of data models. Therefore, the construction of data models is of decisive significance for the construction of our data warehouses.

Generally, Data Model Construction helps us solve the following problems:

  • To improve the business process.In the business model construction phase, we can help our enterprises or management agencies comprehensively sort out their own businesses. Through the construction of the business model, we should be able to fully understand the business architecture diagram of the organization and the running status of the entire business, and be able to classify and program the business according to specific rules. At the same time, it helps us further improve business processes, improve business efficiency, and guide the production of our business departments.
  • Establishes a comprehensive data perspective to eliminate information islands and data differences.The data warehouse model construction can provide an overall data perspective for enterprises. No longer, each department only pays attention to its own data, outlines internal links between departments to help eliminate information islands between departments. More importantly, data model construction can ensure data consistency of the entire enterprise, the data differences between different departments will be effectively resolved.
  • Solve business changes and data warehouse flexibility.Through the construction of data models, the implementation of underlying technologies and the presentation of upper-layer services can be well separated. When the upper-layer business changes, the underlying technical implementation through the data model can easily complete business changes, so as to achieve the flexibility of the entire data warehouse system.
  • Help build the data warehouse system.Through data warehouse model construction, developers and business personnel can easily define the scope of system construction and plan long-term objectives, so that the entire project team can clarify the current task, accelerate the construction of the entire system.
Iii. How to build a data model

Since building a data model is a very important part of the entire data warehouse construction, how to build our data warehouse model is a problem we need to solve. Here we will detail how to create a suitable data model.

1) Data Warehouse data model architecture

The data model architecture of the Data Warehouse is closely related to the overall architecture of the Data Warehouse. First, let's take a look at the data model of the entire data warehouse. We can clearly see that the entire data model architecture is divided into five parts, each of which has its own unique functions.

Figure 3. Data Warehouse data model architecture

As we can see,The data model of the entire data warehouse can be divided into about five parts:

  • System record ):This is the main data warehouse business data storage area, where the data model ensures data consistency.
  • Internal management domain (housekeeping ):This part mainly stores the metadata used by the data warehouse for internal management. The data model can help you manage the metadata in a unified manner.
  • Summary of area ):This part of data comes from the summary of the system record domain. The data model ensures the performance of topic Analysis in the analysis domain and meets the requirements of some report queries.
  • Analysis area ):This data model is mainly used for the analysis of specific theme businesses in each business section. These data models can be separately stored in the corresponding dataset.
  • Feedback field (FEEDBACK area ):Optional. This part of the data model is mainly used for the feedback data of the corresponding front-end. The data warehouse can set this area as needed.

By dividing the data regions of the entire data warehouse model, we can understand that a good data model not only abstracts the business, but also provides specific guidance on the implementation technology, it should cover all aspects from business to implementation technology.

2) data warehouse modeling Stage Division

We have introduced several layers of the data warehouse model. The following describes the main work of Data Modeling at different stages of these layers:

Figure 4. Data warehouse modeling phase division

We can clearly see that the data modeling of a data warehouse is roughly divided into four stages:

1.Business ModelingThis part of modeling mainly includes the following parts:

  • Divide the business of the entire Organization. Generally, define the business work between each part according to the division of the business department, and clarify the relationship between different business departments.
  • Gain an in-depth understanding of the specific business processes of each business department and program them.
  • Propose methods and procedures for modifying and improving the workflow of the business department.
  • Define the scope of data modeling, and define the objectives and stages of the entire data warehouse project.

2.Domain Concept ModelingThis part of modeling mainly includes the following parts:

  • Extract key business concepts and abstract them.
  • Groups business concepts and aggregates similar grouping concepts based on the business main line.
  • Refine the grouping concept, clarify and abstract the business processes in the grouping concept.
  • Clarify the associations between grouping concepts to form a complete domain conceptual model.

3.Logical ModelingThis part of modeling mainly includes the following parts:

  • Business concepts are materialized, and specific attributes are taken into account.
  • The event is materialized, and its attribute content is considered
  • Describe the entity and consider its attribute content

4.Physical ModelingThis part of modeling mainly includes the following parts:

  • Make technical adjustments for specific physical platforms
  • Make adjustments to specific platforms for model performance considerations
  • Make corresponding adjustments based on specific platforms for management needs
  • Generate and complete the final execution script.

From the perspective of the division of various stages in the Data Modeling phase of the data warehouse, we can understand the main work and workload of modeling the entire data warehouse, we hope it will be helpful for our actual project construction.

3) data warehouse modeling method

In fact, everything follows its own law. There are also many data warehouse modeling methods. Each modeling method represents a philosophical point of view and a method of generalization and generalization. Currently, there are many popular data warehouse modeling methods in the industry. Here we mainly introduce several methods, such as the paradigm modeling method, Dimension Modeling Method, and entity modeling method, in essence, each method is to look at the problems in our business from different perspectives. either at the technical or business level, it actually represents a philosophical world view. The following describes the modeling methods in detail.

1. paradigm modeling (third normal form, 3nf)

The paradigm modeling method is a commonly used method for building data models. This method is primarily promoted by inmon and mainly used to store data in relational databases, A technical method used. Currently, most of our modeling methods in relational databases use the three-paradigm modeling method.

The paradigm is the basic theory of database logic model design. A relational model can be decomposed without loss from the first paradigm to the fifth paradigm. This process can also be called standardization. In the model design of data warehouse, the third paradigm is generally used, which has strict mathematical definitions. According to its meaning,A relationship that conforms to the third paradigm must have the following three conditions:

  • Each attribute value is unique and does not have ambiguity;
  • Each non-primary attribute must be completely dependent on the entire primary key, rather than a portion of the primary key;
  • Each non-primary attribute cannot depend on the attributes in other links, because in this case, such attributes should be included in other links.

Because the paradigm is developed based on the entire relational database theory, I will not introduce it here, interested readers can obtain this knowledge by reading relevant materials.

According to inmon, the data warehouse model construction method is similar to the enterprise data model of the business system. In the business system, the enterprise data model determines the data source, and the enterprise data model is also divided into two layers: The topic domain model and the logic model. Similarly, the topic domain model can be seen as a conceptual model of the business model, while the logical model is an instance of the domain model on the relational database.

Figure 5. paradigm Modeling

When switching from a business data model to a data warehouse model, the domain model of the data warehouse, that is, the conceptual model, and the logic model of the domain model also exist. Here,The data model in the business model is slightly different from the data warehouse model. The main difference is:

  • The domain model of the Data Warehouse should contain the relationship between the Domain Models of the enterprise data model and the definition of each topic domain. The concept of the domain model of the data warehouse should be wider than that of the subject domain model of the business system.
  • The logical model of a data warehouse needs to abstract entities, object attributes, object subclasses, and object relationships from the logical model of the Business System Data Model.

From the author's point of view, the biggest advantage of inmon's paradigm modeling method is that from the perspective of relational databases, combined with the data model of the business system, data warehouse modeling can be easily realized. However, its disadvantages are obvious. Because the modeling method is limited to relational databases, the flexibility and performance of the entire data warehouse model are limited in some cases, especially when the underlying data of the Data Warehouse is summarized to the data in the data mart, some modifications are required to meet the corresponding requirements. Therefore, I suggest readers refer to this mode for actual use.

2. dimensional modeling

Kimball first proposed this concept in dimensional modeling. The simplest description is to build a data warehouse and a data mart based on fact tables and dimension tables. The most widely known name for this method is star-schema ).

Figure 6. dimensional modeling

Is a typical star architecture. The star mode is widely used in many preprocessing operations on each dimension, such as pre-statistics, classification, and sorting by dimension. Through these preprocessing, the Data Warehouse processing capability can be greatly improved. EspeciallyFor the 3nf modeling method, star mode has obvious performance advantages.

At the same time, another advantage of the dimensional modeling method is that dimensional modeling is very intuitive and closely centered around the business model, which can intuitively reflect the business problems in the business model. Dimensional modeling can be completed without special abstraction. This is also the advantage of dimensional modeling.

However, the disadvantages of the dimensional modeling method are also very obvious. Because a large amount of data preprocessing is required before the star schema is built, a large amount of data processing work will be performed. In addition, when the business changes and you need to re-define the dimension, you often need to re-pre-process the Dimension Data. In these processes, a large amount of data redundancy is often caused.

Another disadvantage of the dimensional modeling method is that, if you only rely on dimensional modeling, data source consistency and accuracy cannot be guaranteed, and at the bottom of the data warehouse, it is not particularly suitable for dimensional modeling.

Therefore, from the perspective of the author, the field of dimensional modeling is mainly applicable to the data mart layer. Its biggest role is to solve the performance problems in data warehouse modeling. It is difficult for dimensional modeling to provide an abstract method to fully describe the complex relationships between real business entities.

3. entity modeling

The entity modeling method is not a common method in data warehouse modeling. It comes from a genre of philosophy. In a philosophical sense, the objective world should be segmented, and the objective world should be divided into entities and relations between entities. This abstract method can be introduced in the modeling process of the data warehouse, and the entire business can also be divided into entities, and the relationship between each entity is, the description of these relationships is what we need to do for data modeling.

Although the substantive law seems to be abstract, it is easy to understand. That is, we can divide any business process into three parts: entity, event, and description, as shown in:

Figure 7. entity modeling

It represents an abstract meaning. If we describe a simple fact: "James drove to school ". Taking this business fact as an example, we can regard "Xiao Ming" and "school" as an entity, and "school" describes a business process, here we can be abstracted as a specific "Event", while "Driving" can be seen as a description of "going to school.

From the above example, we can understand that the abstract induction method we use is actually very simple,Any business can be considered as three parts:

  • EntityIt mainly refers to the specific conceptual subject in the domain model and the object that has a business relationship.
  • EventIt mainly refers to the process of completing a business process between conceptual subjects, specifically a specific business process.
  • DescriptionIt is a special description of objects and events.

Entity modeling can easily divide business models. Therefore, entity modeling is widely used in business modeling and domain conceptual modeling. From the author's experience, if there is no ready-made industry model, we can use the entity modeling method to clarify the entire business model with the customer and divide the domain conceptual model, abstract A specific business concept. Based on the customer's usage characteristics, you can create a data warehouse model that meets your needs.

However, the entity modeling method has its own inherent defects. Since the Entity description method is only a method that abstracts the objective world, it is doomed that this modeling method can only be confined to the business modeling and domain conceptual modeling stages. Therefore, in the logical and physical modeling stages, it is the phase in which paradigm modeling and dimensional modeling give full play to their strengths.

Therefore, I suggest that you refer to the above three data warehouse modeling methods when creating your own data warehouse model and adopt different methods at different stages, this ensures the modeling quality of the entire data warehouse.

Iv. Data Warehouse creation examples

The above describes some abstract modeling methods and theories, which may be difficult to understand. Therefore, I will give an example here. Readers can follow our example, to get a preliminary understanding of the general process of data warehouse modeling.

Readers familiar with the Social Security Industry know that social security in our country is mainly divided into pension, unemployment, work injury, childbirth, medical insurance and labor market. In these six business fields, the current status of elder care and career systems have been basically improved, and some data has started online testing. However, for work-related injuries, childbirth, medical care, and labor market, the development of some areas is relatively mature, while some areas are not yet mature.

1. Business modeling stage

Based on the above background, we can easily divide the corresponding business in the business modeling stage. Therefore, in the business modeling stage, we basically determine our data warehouse construction objectives, construction methods, and long-term planning. For example:

Figure 8. Business modeling stage


Here, we clearly divide the entire business into several main business lines, such as elder care, unemployment, work injury, childbirth, medical care, and labor, then, based on these big modules, we can consider the business theme to be analyzed in the specific business main line within each business main line.

Therefore, the Business modeling stage is actually a process of organizing the business with the business personnel. In this process, not only can our technicians better understand the business, but also, it can also find some unreasonable links in the business process to improve and improve.

At the same time, another important task in the business modeling stage is to determine the scope of our data modeling. For example, in some business modules with insufficient data preparation, we can consider not building the corresponding data model first. When the conditions are mature, we can consider the problem of data modeling.

2. domain concept modeling stage

The domain concept modeling stage is an important stage of data warehouse data modeling. As we have fully clarified the corresponding business scope and processes in the business modeling stage, our main task in the conceptual modeling stage in this field is to abstract concepts. The working layers of Conceptual modeling in the entire field are shown in:

Figure 9. Domain conceptual modeling stage

We can clearly see that domain conceptual modeling uses the entity modeling method to abstract entities, such as entities, events, and descriptions behind the numerous business representations through the entity modeling method, in this way, we can find out the associations between abstract entities after the business representation, ensuring the consistency and relevance of our data warehouse data according to the data model.

From the graph, we can divide the entire abstract process into four layers:

  • The abstract method layer is the core method of the entire data model, and the entity division of domain conceptual modeling is implemented through this abstract method.
  • The domain concept layer is the core part of our entire data model. Different Levels of abstract methods determine different domain concepts. For example, here, we can use the "participant" concept. At the same time, you can divide it into three concepts: "individual", "company ", and "agency. When building our own models, we can refer to the business situation and the needs of our own models, and select a concept with a high degree of abstraction or a low degree of abstraction. Relatively speaking, concepts with a high degree of abstraction are complicated to understand and need to be understood by professional modeling experts. concepts with a low degree of abstraction are more suitable for the understanding of general business personnel, easy to use. I suggest that you use entities with low abstract concepts to facilitate communication between business personnel and technical personnel.
  • The specific business layer is mainly used to solve specific business problems. From this figure, we can see that the specific business layer is actually only a combination of different entities in the domain concept model. Therefore, the complete data model of the data warehouse should be able to meet the needs of flexible and variable front-end services, and its own model architecture is highly flexible. This is also a feature of the data warehouse model.
  • The business main line layer, which is mainly divided into large business areas. Generally, this layer has been completed in the business modeling stage. We generally use this big business main line to divide the entire big business model framework.

Through domain conceptual modeling, data warehouse models have been abstracted into entities, and the model frameworks have been built. The following work is to inject effective muscles into these frameworks.

3. Logic modeling stage

After the domain concept modeling, although the model framework has been completed, there is still a lot of meticulous work to complete. At this stage, we also need to do a lot of work, including:

  • For example, after the above conceptual model, we need to instantiate the abstract entities such as "person" and "company. Mainly, we need to consider the attributes of "person", which of the following attributes are used in the business module related to "person, we all need to attach these attributes to the "person" entity of our data model, such as the "person"'s age, gender, educational level, and so on. Similarly, we also need to do this for other attributes.
  • Find the relationships between abstract entities and use their examples. Here, we mainly consider the example of the abstract concept of "Event". For example, we must consider the property of the "event" of pension collection, the attributes of the "event" for unemployed workers to be trained must be considered.
  • Find and describe the relationship between abstract events. Here we mainly want to provide a complete "Description" for "events ". For example, consider the regions, events, and other factors in "events.

All in all, in the logical modeling stage, we mainly consider some meticulous attributes of abstract entities. In the logical modeling phase, we can concatenate the entire conceptual model into an organic entity to fully express the relevance between businesses.

At this stage, I suggest you refer to the 3nf modeling method to express the attributes of an object and the relationship between the object and the object. For example, at this stage, we can use modeling tools such as Erwin to create a 3nf-compliant relational data model.

4. Physical Modeling stage

The physical modeling stage is the last process of data modeling. This process is actually a process of putting the previous Logical Data Model into practice. Considering the differences in the data warehouse platform, the physical modeling process of the data model may be slightly different. At this stage, our main work is:

  • Generate a table creation script. Different data warehouse platforms may generate different scripts.
  • Optimize different data warehouse platforms, such as creating MQT tables for DB2 data warehouses to accelerate report generation.
  • To meet the needs of data mart, generate some fact tables and dimension tables based on the dimensional modeling method.
  • To meet the ETL vehicle and metadata management needs of data warehouses, generate tables maintained by data warehouses, such as log tables.

After the physical modeling stage, the entire data warehouse model has been completed. We can create a data model to meet our own needs for the current industry according to our own design.

Here, I use a Data Modeling example to give readers a perceptual knowledge about data warehouse modeling. It is hoped that when using these data warehouse modeling methods to create your own data model, you can create a suitable data model based on your actual business needs and your understanding of the abstract capabilities.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.