Discussion on data modeling method in Data Warehouse construction

Source: Internet
Author: User

The so-called water no fixed, the soldier impermanence method. Different industries, have different characteristics of the industry, therefore, from a business perspective, its corresponding data model is very diverse. At present, the mainstream of the industry is the data Warehouse manufacturers are mainly IBM and NCR, these two companies in addition to provide a more powerful data warehouse platform, but also have their own industry-specific data model.
For example, in banking, IBM has its own BDWM (Banking Data Warehouse model), and NCR has its own FS-LDM models. In the telecommunications industry, IBM has TDWM (Telecom Data Warehouse model), and NCR has its own TS-LDM models. Therefore, we see that different companies have their own understanding of an industry, so there will be different companies targeting an industry model. And for different industries, the same company will have different models, which depends on the different business characteristics of different industries.
For example, IBM's TDWM model contains the following 9 concepts, such as:

Figure 1. IBM's TDWM conceptual model

Many people may ask, why are your models 9 concepts instead of 10 and 11? What is the basis of your Data warehouse model? In fact, this is a question we often ask when we introduce our data model to our customers, and I hope readers will find their answers when they finish reading this article.
Although each industry has its own model, but we found that the data model of the different industry, in the method of data modeling, but all have common basic characteristics.
One of the main purposes of this article is to hope that readers can read this article, at the same time, combined with their own experience in the construction of Data Warehouse, in the construction of the data Warehouse can be summed up a set of suitable for their own modeling methods, can better help customers to play the role of Data Warehouse.
The main thread of this paper is to answer the following three questions:
· What is a data model
· Why a data model is needed
· How to build a data model
Finally, at the end of this article, we introduce a concrete example of data Warehouse modeling to help you understand the whole data modeling process.

What is a data model
The data model is a kind of tool and method of abstract describing the real world, it is a kind of mapping that represents the interrelationship of the real world transaction through the form of the relation between entity and entity. Here, the data model shows the abstraction of the relationship between entities and entities, and expresses the actual business relationships by defining and describing the relationships between entities and entities.
The Data warehouse model is a specific data model for a specific Data Warehouse application system in the data model, in general, our data Warehouse model is divided into several levels, as shown in 1.
Figure 2. Data Warehouse Model

From the graph above, we can easily see that throughout the Data Warehouse modeling process, we need to go through a general four processes:
· Business modeling, generating business models, primarily addressing business-level decomposition and sequencing.
· Domain modeling, the generation of domain models, mainly to the business model to abstract processing, the generation of domain conceptual model.
· Logical modeling, the generation of logical models, is mainly the domain model of the concept of entities and the relationship between the database hierarchy of logic.
· Physical modeling, the generation of physical models, the main solution, the logical model for different relational database physics and performance of some specific technical issues.
Therefore, in the entire data warehouse model design and architecture, both business knowledge and specific technology, we need to understand the rich industry experience, but also need some information technology to help us to achieve our data model, most importantly, we also need a very applicable methodology, To guide ourselves to the abstraction, processing, and generation of models for our business.

Why do I need a data model
     in the construction of the Data Warehouse, We have repeatedly emphasized the need for a data model, so why is the data model so important? First, we need to understand the history of building the entire data warehouse.
     The development of the data Warehouse has gone through three processes:
     ·  Simple reporting phase: this stage, The main goal of the system is to address the reporting needs of business people in daily work and to generate some simple summary data that will help leaders make decisions. Most of this phase is in the form of database and front-end reporting tools.
    ·  Data mart stage: This stage, mainly according to the needs of a business unit, a certain amount of data collection, collation, according to the needs of business personnel, the presentation of multidimensional reports, to provide specific business guidance data , and can provide specific leadership decision-making data.
    ·  Data Warehouse phase: This stage, mainly in accordance with a certain data model, the entire enterprise data collection, collation, and can according to the needs of various business units, provide cross-departmental, fully consistent business reporting data , the ability to generate data that is instructive to the business through a data warehouse, while providing comprehensive data support for leadership decisions.
     through the development stage of data warehouse construction, we can see that the key difference between the construction of data warehouse and the construction of data mart lies in the support of data model. Therefore, the construction of data model, for our data warehouse construction, has a decisive significance.


In general, the construction of the data model can help us solve some of the following problems:
· conduct comprehensive business grooming and improve business processes . At the stage of business model construction, we can help our enterprise or the management organization to comb the business of this unit comprehensively. Through the construction of the business model, we should be able to fully understand the business frame composition of the unit and the operation of the entire business, to be able to classify and program the business according to specific rules, while helping us to further improve the business process, improve business efficiency and guide the production of our business units.
· establish a full range of data perspectives to eliminate information silos and data differences . Through the model construction of Data Warehouse, it can provide an overall view of the data of the enterprise, it is no longer the individual departments just pay attention to their own data, and through the construction of the model, outlines the internal relations between departments, help eliminate the problem of information silos between various departments, more importantly, through the construction of data model, Can ensure the consistency of the entire enterprise data, the differences between the various departments will be effectively resolved.
· address business changes and data warehousing flexibility. through the construction of the data model, the realization of the bottom technology and the presentation of the upper layer business can be separated well. When the upper business changes, through the data model, the underlying technology implementation can be very easy to complete the business changes, so as to achieve the flexibility of the entire data Warehouse system.
· help build the Data Warehouse system itself. through the model building of Data Warehouse, it is easy for developers and business staff to define the scope of the system construction and plan the long-term goal, so that the whole project team can clear the current task and speed up the whole system construction.

How to build a data model
Since the construction of data model is a very important part of the whole data warehouse construction, how to build our Data warehouse model is a problem we need to solve. Here we are going to detail how to create a data model that works for you.


Data Warehouse Data Model architecture
The architecture of the data model for the Data warehouse and the overall architecture of the data warehouse are closely related, so let's start with a look at a few parts of the data model for the entire data warehouse. From what we can clearly see, the architecture of the entire data model is divided into 5 parts, each of which has its own unique functionality.
Figure 3. Data Warehouse Data Model architecture

From what we can see, the data model of the entire data warehouse can be divided into roughly 5 parts:
· System record fields (Systems of records): This section is the primary Data Warehouse business data store, where the data model guarantees data consistency.
· Internal admin domain (housekeeping): This part mainly stores the metadata for internal management of the Data Warehouse, where the data model can help to manage the unified metadata.
· Rollup Fields (Summary of area): This part of the data from the System record field summary, the data model here to ensure the analysis of the topic analysis domain performance, to meet the partial report query.
· Analysis domain: This part of the data model is primarily used for specific thematic business analysis across business segments. This part of the data model can be stored separately in the appropriate data mart.
· Feedback field (Feedback area): Optional, this part of the data model is mainly used for the corresponding front-end feedback data, the Data warehouse can be based on the needs of the business to set up this region.
By dividing the data region of the entire data warehouse model, we can see that a good data model is not only an abstraction of the business, but also a concrete guide to the implementation technology, which should cover the various parts from the business to the implementation technology.

Data Warehouse Modeling Phase Partitioning
We've covered several levels of the Data warehouse model before, so let's talk about the main content of the data modeling work for different stages of these levels:
Figure 4. Data Warehouse Modeling Phase Partitioning

As we can clearly see, data modeling of data warehouses is broadly divided into four phases:
Business modeling, this part of the modeling work, mainly consists of the following parts:
· Division of the entire unit of business, generally in accordance with the Division of Business Division, the various parts of the business work of the definition, clarify the relationship between the business units.
· Gain insight into the specific business processes within each business unit and programmatically.
· Propose methods and procedures for modifying and improving the business unit workflow.
· The scope of data modeling is defined, and the objectives and phases of the entire Data Warehouse project are divided.
Domain concept modeling, this part of the modeling work, mainly consists of the following parts:
· Extract key business concepts and abstract them.
· Group Business concepts and aggregate similar grouping concepts according to the business line.
· Refine the grouping concept, clarify the business process within the grouping concept and abstract it.
· Clarify the association between grouping concepts and form a complete domain conceptual model.
Logical modeling, this part of the modeling work, mainly consists of the following parts:
· The business concept is manifested and its specific attributes are considered
· Events are manifested and their property contents are considered
· The description is manifested, and the content of its attributes is considered
Physical modeling, this part of the modeling work, mainly consists of the following parts:
· Make appropriate technical adjustments for a specific physical platform
· Adapt the specific platform to the performance considerations of the Model
· Adapt to the needs of management, combined with a specific platform to make appropriate adjustments
· Generate the final execution script and refine it.
From the different stages of the data modeling phase of the data warehouse, we can understand the main work and workload of the whole data Warehouse modeling, and hope to be able to help us in the actual project construction.

modeling method of Data Warehouse
The universe, the surface of the colorful, in essence, all things follow its own laws. There are also many ways to model data warehouses, each of which represents a philosophical perspective and represents a way to generalize and generalize the world. At present, the industry more popular Data Warehouse modeling method is very much, here mainly introduce the paradigm modeling method, dimension modeling method, solid modeling method and several other methods, each method is essentially from a different perspective of our business problems, whether from the technical level or business level, in fact, represents a philosophical worldview. Let us give you a detailed introduction to these modeling methods.
1. Paradigm Modeling Method (third normal form,3nf)
The paradigm modeling method is a common method in constructing data model, which is mainly advocated by Inmon, which mainly solves the data storage of relational database and a technical method of using it. At present, the modeling method in relational database is mostly based on the three-paradigm modeling method.
Paradigm is the basic theory of database logical model design, a relational model can be decomposed from the first paradigm to the fifth paradigm, and this process can also be called normalization. In the model design of Data Warehouse, the third paradigm is generally adopted, which has strict mathematical definition. According to the meaning of its expression, a relationship conforming to the third paradigm must have the following three conditions:
· Each attribute value is unique and does not have ambiguity;
· Each non-primary attribute must be completely dependent on the entire primary key, not part of the primary key;
· Each non-primary property cannot depend on attributes in other relationships, because this attribute should be attributed to other relationships.
Because the paradigm is based on the whole relational database theory based on the development, therefore, I do not introduce more here, interested readers can read the corresponding materials to obtain this knowledge.
According to Inmon's point of view, the Data warehouse model is similar to the enterprise data model of the business system in terms of building methods. In the business system, the enterprise data Model determines the source of the data, and the enterprise data model is divided into two levels, that is, the subject domain model and the logical model. Similarly, the subject domain model can be regarded as the conceptual model of the business model, while the logical model is the instance of the domain model on the relational database.
Figure 5. Paradigm Modeling Method

When we move from the business data model to the Data warehouse model, we also need the domain model of the Data Warehouse, the conceptual model, and the logical model of the domain model. Here, the data model in the business model is slightly different from the model of the Data Warehouse. The main difference is:
· The domain model of the Data warehouse should include the relationship between the domain models of the enterprise data model, as well as the subject domain definitions. The concept of a data warehouse domain model should be more extensive than the subject domain model of a business system.
· The logical model of a data warehouse needs to abstract entities from the logical model in the data model of the business system, attributes of entities, subclasses of entities, relationships of entities, and so on.
From the author's point of view, the biggest advantage of Inmon's paradigm modeling method is that it combines the data model of the business system and makes it easier to model the data warehouse from the point of view of relational database. But its disadvantage is also obvious, because the modeling method is limited to the relational database, in some cases it restricts the flexibility of the entire data warehouse model, performance, especially considering that the data warehouse of the underlying data to the data mart data aggregation, need to make certain modifications to meet the corresponding needs. Therefore, the author suggests that readers use this modeling approach in their actual use.

2. Dimension Modeling Method
Dimension modeling method, Kimball first proposed this concept. The simplest description is to build the data warehouse and data mart according to the fact table and the dimension table. The most widely known name of this method is the star pattern (Star-schema).
Figure 6. Dimension Modeling Method

is a typical star schema in this architecture. The star pattern is widely used, it is a lot of preprocessing for each dimension, such as pre-counting, classifying and sorting according to the dimension. Through these preprocessing, it can greatly improve the processing ability of the Data warehouse. In particular, for the 3NF modeling approach, star mode has a significant performance advantage.
At the same time, the other advantage of dimension modeling method is that the dimension modeling is very intuitive and closely surrounds the business model, which can directly reflect the business problems in the business model. You do not need to undergo a special abstraction, that is, dimension modeling can be done. This is also the advantage of dimensional modeling.
However, the disadvantage of the dimensional modeling method is also very obvious, due to the need for a large number of data preprocessing before building the star pattern, it will result in a lot of data processing work. Furthermore, when the business changes and the definition of the dimension needs to be re-processed, it is often necessary to re-preprocess the dimension data. In these and processing, it often leads to a large amount of data redundancy.
The disadvantage of another dimension modeling method is that if we just rely on the simple dimension modeling, it can't guarantee the consistency and accuracy of the data source, and at the bottom of the data warehouse, it is not particularly applicable to the dimension modeling method.
Therefore, from the author's point of view, the domain of dimension modeling is mainly applicable to the Data Mart layer, and its biggest function is to solve the performance problem in Data Warehouse modeling. Dimensional modeling is difficult to provide an abstract method that completely describes the complex relationships between real business entities.


3. Solid Modeling method
Entity modeling is not a common method in Data Warehouse modeling, it originates from a genre of philosophy. In the sense of philosophy, the objective world should be subdivided, and the objective world should be divided into one entity, and the relationship between entity and entity. Then we can fully introduce this abstract method in the modeling process of the data warehouse, the whole business can also be divided into a single entity, and the relationship between each entity and the description of these relationships is the work we need to do with data modeling.
Although the substantive law seems to have some abstraction, it is easy to understand. That is, we can divide any business process into 3 parts, entities, events and descriptions, as shown in:
Figure 7. Solid Modeling Method

     expresses an abstract meaning, if we describe a simple fact: "Xiao Ming drives to school". Take this business fact as an example, we can think of "Xiao Ming", "school" as an entity, "school" describes a business process, we can be abstracted here as a specific "event", and "drive" can be regarded as the event "school" a description.
From the above example we can understand that the abstract induction method we use is actually very simple, any business can be seen as 3 parts:
       entity, which refers to a particular conceptual subject in a domain model, refers to the object in which the business relation occurs.
       event, which mainly refers to the process of completing a business process between the conceptual bodies, especially a particular business process.
       description, which is primarily a special description of entities and events.
     because of the entity modeling method, it can easily realize the division of business model, so the entity modeling method has a wide application in the business modeling stage and the domain concept modeling stage. From the author's experience, there is no ready-made industry model, we can adopt the method of solid modeling, together with customers to clear the whole business model, the division of the domain concept model, abstract the specific business concept, combined with the customer's use characteristics, can create a matching their own needs of the Data warehouse model.
     However, the entity modeling method also has its own congenital defects, because the entity description method is only an abstract objective world method, so it is doomed that the modeling method can only be confined to the business modeling and domain concept modeling phase. Therefore, in the logical modeling phase and the physical modeling phase, it is the stage of paradigm modeling and dimension modeling to exert their strengths.
      Therefore, the author suggests that when creating your own data warehouse model, you can refer to the three Data Warehouse modeling methods described above, using different methods at various stages, This enables the quality of the entire data warehouse model to be guaranteed.

Data Warehouse Building Example
Described above are some of the abstract modeling methods and theories, may be relatively difficult to understand, so, I give an example here, the reader can follow our example, to get an initial understanding of the entire Data Warehouse modeling process.
Background introduction
Readers familiar with the social security industry can know that at present, our country's social security is mainly divided into old age, unemployment, work injury, fertility, medical insurance and labor market 6 major areas of business. In these 6 business areas, the current situation of the old-age and career system has been basically improved, a part of the data began to network detection. And, for work injury, fertility, medical and labor market this piece of business, some local development is more mature, and some places are not mature enough.
1. Business Modeling Phase
Based on the above background, we can easily divide the corresponding business in the business modeling phase. Therefore, in the business modeling phase, we basically determine the objectives of our data warehouse construction, construction methods, and long-term planning and so on. Such as:
Figure 8. Business Modeling Phase

Here, we have clearly divided the entire business into a number of major business lines, such as: old age, unemployment, work injury, fertility, medical, labor and so on a few large parts, and then we can according to these large modules, in each business main line, to consider the specific business thread within the business topic needs to be analyzed.
Therefore, the business modeling phase is actually a process of combing business with the business people, in this process, not only can help our technical staff to better understand the business, on the other hand, can also find some unreasonable links in the business process, to improve and improve.
At the same time, another important part of the business modeling phase is to determine the scope of our data modeling, for example: Within a business module where some data is not well-prepared, we can consider not building a corresponding data model. When conditions are fully ripe, we can consider the problem of data modeling again.

2. Domain Concept modeling phase
Domain Concept modeling phase is an important stage of data warehouse data modeling, because we have fully cleared the corresponding business scope and process in the business modeling stage, therefore, our most important work in the concept modeling phase of this domain is to carry on the abstraction of concept, the working level of the whole domain concept modeling is as follows:
Figure 9. Domain Concept Modeling phase

     from what we can clearly see, domain concept modeling is the use of solid modeling method, from the complex business representation behind the entity modeling method, abstract entities, events, descriptions and other abstract entities, In order to find out the correlation between abstract entities after business representation, we can ensure the consistency and relevance of data warehouse data according to the data model.
     from the graph, we can divide the whole abstract process into four levels, namely:
    ·  abstract method layer, the core method of the whole data model, The Division of Domain conceptual modeling entities is achieved through this abstract approach. The
    ·  domain concept layer, which is a central part of our entire data model, because different degrees of abstraction determine the difference in our domain concepts. For example: Here we can use the concept of "party", and at the same time, you can divide him into three concepts: "Personal", "Company", and "agency" three concepts. When we build our own models, we can refer to the business situation and the needs of our own models, choose a high degree of abstraction or the concept of a low level of abstraction. Relatively speaking, the concept of high abstraction, understanding is more complex, requires professional modeling experts to understand, and the concept of low level of abstraction, more suitable for general business people's understanding, the use of more convenient. It is suggested here that readers can choose entities with lower abstract concepts to facilitate communication and communication between business people and technicians.
    ·  specific business layer, mainly to solve specific business problems, from this diagram we can see that the specific business layer, in fact, is only the domain conceptual model of the entity between some of the different combinations. Therefore, the data model of the complete data warehouse should be able to the needs of the corresponding flexible front-end business, and its own model architecture has very strong flexibility. This is also one of the functions of the Data Warehouse model.
     ·  Business Main Line layer, this level is mainly divided into large business areas, generally in the business modeling phase has been completed this division. We typically use this large business line to divide the entire business model into a large framework.
      through domain concept modeling, Data Warehouse model has been abstracted into one entity, the framework of the model has been built, the following work is to give these frameworks to inject an effective body.


3. Logical modeling Phase
     after domain concept modeling, although the framework of the model has been completed, there are still a lot of meticulous work to be done. Generally at this stage, we also need to do a lot of work, mainly including:
    ·  instance of each abstract entity, for example: After the conceptual model above, we need to "people" and "Company" Are instantiated by these abstract entities. The main point is that we need to consider the attributes of "people" including those that are used in the business module, which are all related to "people", and we all need to attach these attributes to the "person" entity of our data model, such as "people" Age, gender, education level, etc. Similarly, we need to do this work for other properties as well.
    ·  finds links between abstract entities and adds them to the instance session. Here, we mainly consider an instance of the abstract concept of "event", for example: the attribute of the "event" of the pension levy is taken into account, and the attributes of the "event" for the Unemployed worker training are taken into account.
     ·  Find out the relationship of the abstract event and explain it. Here we are mainly to the "event" to complete the "description." For example, the "events" in the region, events and other factors such as considerations.
      In conclusion, in the logical modeling phase, we mainly consider some fine attributes of abstract entities. Through the logical modeling phase, we are able to fully concatenate the entire conceptual model into an organic entity in order to fully express the relationship between the business.
      At this stage, I suggest that you can refer to the 3NF modeling method to express the attributes of the entity and the relationship between the entity and the entity. For example: At this stage, we can use modeling tools such as ERWIN to make a relational data model that conforms to 3NF.


4. Physical modeling Phase
The physical modeling phase is the last process of modeling the entire data, and this process is actually a process of landing the previous logical data model. Considering the differences in the Data Warehouse platform, the physical modeling process of the data model may be slightly different, at this stage our main work is:
· Generates a script that creates a table. Different Data warehouse platforms may generate different scripts.
· For different data warehouse platform, to do some corresponding optimization work, for example, for DB2 Data Warehouse, create some of the table, to speed up the generation of reports and so on.
· For the needs of data mart, according to the method of dimension modeling, some facts table, dimension table and so on are generated.
· For the Data Warehouse ETL vehicle and metadata management needs, to generate some data warehouse maintenance tables, such as: Log table.
Through the physical modeling phase, the entire data warehouse model has been completely completed, we can according to our own design to the current industry to create a data model to meet their needs.
Here, the author through a data modeling sample, I hope to give the reader a data warehouse modeling perceptual understanding. It is hoped that readers can use these data warehouse modeling methods to create their own data model, according to the actual needs of business and their own grasp of abstraction ability to create a suitable data model.

Original address: http://www.ibm.com/developerworks/cn/data/library/techarticles/dm-0803zhousb/

Discussion on data modeling method in Data Warehouse construction

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.