DW 2.0: Roles of the Process Model and the Data Model During Development

Source: Internet
Author: User

The content of DW 2.0 is very broad. It includes all of the enterprise's data as well as external data, and it includes yesterday's data as well as today's. In large multinational enterprises, the scope of DW 2.0 is enormous.

In addition, the data in DW 2.0 is stored at a very fine granularity.

As a result, designing and building a DW 2.0 environment that meets the varied needs of an enterprise is very complex and requires advanced technology.

Although DW 2.0 design faces these difficulties and challenges, it can in fact be done well. Doing so requires balancing the broad enterprise perspective against the fine details throughout the build.

So how is a DW 2.0 environment modeled and built? When building such a large model, where should the designer start?

The starting point for building DW 2.0 is the modeling process. Two basic models are involved: the process model and the data model. The process model includes HIPO charts, functional decomposition charts, and data flow diagrams. The data model includes the ERD, the physical data design, the data item set (DIS), and so on.

The process model applies to the data mart environment. The data model applies to the integrated, near-line, and archival sectors.

The design of the interactive sector must draw on both the process model and the data model.

Data Model

The data model is the core of the integrated, near-line, and archival sectors of DW 2.0. It has three levels: the ERD level (entity relationship diagram), the DIS level (data item set), and the physical model level. Each level has its own characteristics, and each is a necessary part of the data model.

A key feature of this multi-level modeling is that, although the levels differ significantly, they are closely related to one another. Each level describes the data at a different degree of detail.

ERD-level modeling is the highest level. It reflects the characteristics of the data from the highest vantage point and is highly abstract. By analogy, the ERD level is like a globe. Just as a globe shows all the continents and countries, their positions, and how they relate to one another, the ERD shows the relationships between sets of data. In essence, the ERD is a highly abstract model of the complete set of data.

The DIS level is mid-level modeling. A DIS model is like a map of one country or state on the globe, such as a map of Texas. Nobody expects this map to provide information about Singapore or Africa. It is much more detailed than the globe and shows cities that cannot be seen on the globe. The DIS focuses on a subset of the globe and contains detail that the globe does not.

The physical model is the lowest and most detailed level of modeling. A physical model is like a city map. Its scope is smaller than that of the DIS or the ERD, but it is more detailed than either: it can record the locations of post offices, schools, and parks. A city map covers a smaller area than a state map but contains more detail.

The three levels of data modeling are closely related.

ERD Model

The ERD model consists of two main parts: the entities, which are the enterprise's subject areas, and the relationships between those subject areas.

An entity is the highest level of abstraction of data. Take the customer entity as an example. A customer entity can include corporate customers and individual customers, as well as former customers and potential customers. In short, customers of every type and state are abstracted into a single customer entity.

The relationships between entities are also part of the ERD. A relationship identifies the association between entities, and it is usually labeled with a brief description.

For example, a customer may place an order, a consignor may accept a package, and an order may include a product.

A relationship directly connects two entities. It cannot connect three or more entities.

A special kind of relationship is the recursive relationship, which connects an entity to itself. A simple example is the relationship between father and daughter. Both father and daughter belong to the same entity, person, so the recursive relationship is defined on the person entity.

Because the ERD is built at such a high level of abstraction, it is not very large. The ERD model should be established before any other data model.
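
To make the ERD level concrete, here is a minimal Python sketch. It is not Inmon's notation. The class names `Entity` and `Relationship`, the `is_recursive` property, and the relationship labels are assumptions chosen for illustration; the entities come from the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A subject area of the enterprise, e.g. customer or order."""
    name: str


@dataclass(frozen=True)
class Relationship:
    """A labeled link between exactly two entities (never three or more)."""
    source: Entity
    target: Entity
    description: str

    @property
    def is_recursive(self) -> bool:
        # A recursive relationship connects an entity to itself.
        return self.source == self.target


customer = Entity("customer")
order = Entity("order")
product = Entity("product")
person = Entity("person")

erd = [
    Relationship(customer, order, "places"),
    Relationship(order, product, "includes"),
    Relationship(person, person, "is father of"),  # father and daughter are both persons
]

for rel in erd:
    kind = "recursive" if rel.is_recursive else "binary"
    print(f"{rel.source.name} {rel.description} {rel.target.name} ({kind})")
```

Because a `Relationship` has exactly two entity fields, the rule that a relationship never spans three or more entities is enforced by the structure itself.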

DIS Model

The DIS model is the middle level of the data model. In the DIS model, keys, attributes, and other data and relationships are identified.

Each entity (subject area) in the ERD model corresponds to one DIS model. The DIS model refines each entity of the ERD model.

Each part of the DIS in turn corresponds to its own physical data model.

Every DW 2.0 developer faces the same question: should all the detailed data models be fully built before DW 2.0 development begins? The answer is no. If development waits until every data model is complete, the data warehouse will never be built. A data warehouse is built incrementally: first one part is created, then another.

For example, first create the complete ERD model, then the DIS model for one entity in the ERD, and then the physical data model for that DIS. At this point, the data models for the other entities in the ERD have not yet been created. Even though most of the data model does not exist yet, development can begin with the models that do.

Then the DIS model and physical data model for another entity are established.

Then the next one, and so on.

With this gradual iteration process, the entire data warehouse is established and the complete data model is also established.

This iterative way of building a data warehouse may go through many iterations, carried out by multiple projects.

In a sense, the complete enterprise data model is like a jigsaw puzzle: it is assembled one piece at a time until the whole picture emerges.
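
The iteration described above can be sketched in a few lines of Python. This is only an illustration: the entity list, the function names `run_iteration` and `remaining`, and the placeholder strings standing in for real models are all hypothetical.

```python
# Hypothetical subject areas taken from a completed ERD.
erd_entities = ["customer", "order", "product", "shipment"]

# Detailed models built so far, keyed by entity name.
dis_models = {}
physical_models = {}


def run_iteration(entity):
    """Build the DIS and then the physical model for one ERD entity."""
    if entity not in erd_entities:
        raise ValueError(f"{entity!r} is not in the ERD")
    dis_models[entity] = f"DIS for {entity}"
    physical_models[entity] = f"physical model for {dis_models[entity]}"


def remaining():
    """Entities that exist in the ERD but have no detailed model yet."""
    return [e for e in erd_entities if e not in physical_models]


run_iteration("customer")  # first project
run_iteration("order")     # a later project
print(remaining())
```

The ERD is fixed up front, while the DIS and physical models fill in one entity at a time; `remaining()` shows which pieces of the puzzle are still missing.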

(There is an apparent problem here. The previous chapter said that the results of each iteration can be integrated only because every iteration uses the same data model. This chapter says that we do not need to build the whole data model up front and that the complete data model emerges gradually over many iterations. The two statements seem to conflict.

In my understanding, the "same data model" mentioned in the previous chapter refers to the ERD model. Inmon's ERD is the subject area model; that is, the data warehouse is built subject by subject. This is a highly abstract data model that should be established when data warehouse development starts. It is the basis of the complete enterprise data model, and the DIS is developed from the ERD.

What this chapter says need not be complete up front are the DIS and physical data models; the ERD model should still be created first. The physical data model Inmon refers to here also seems to differ in scope from what we usually mean by the term. Here it corresponds to a single table and its physical characteristics, whereas we usually use "physical data model" to mean the physical model of all the tables in a project.

In short, the ERD subject area model should be established at the start of data warehouse development. On the basis of the ERD model, the DIS and physical data models can then be developed gradually.)

Data Model Source

How should an enterprise data model be established? One way is to design a custom model based on the enterprise's own situation. Designing an enterprise data model from scratch in this way is expensive and slow, and it is usually unnecessary.

Consider that the data models of two insurance companies are basically the same, as are those of two banks, and those of two manufacturers are very similar. In other words, data models within the same industry are similar.

So if several banks have already built their data models, why reinvent the wheel? Why not work from theirs? You can refer to the data models they have already created. In practice, you can buy or license such a data model and then modify, extend, or trim it to fit your enterprise's needs. Building an enterprise data model on top of a generic data model is much faster and much cheaper than designing one yourself, especially when data models in the industry are mature.

If you cannot find another enterprise willing to share its data model with you, there are two places, on the Internet or in books, that provide data models. The first is a website with free data models: http://www.inmoncif.com. The other is Len Silverston's universal data models, which can be bought on Amazon.com. Of course, generic data models can be found in many other places as well.

(Len Silverston's book is titled The Data Model Resource Book and comes in two volumes. Both have been translated into Chinese under the title "Data Model Resource Manual". The English edition of the first volume can be downloaded on the Internet; I have not yet seen the English edition of the second.)

When we obtain or consider using a generic data model, we should keep in mind that it is not a perfect fit for us. No matter how complete and sophisticated it is, we need to modify, extend, or trim it to match the enterprise's actual situation.

In fact, people are better at modifying, adding, and removing than at designing from a blank sheet of paper. The value of a generic data model is that it moves people from the role of creator to the role of editor.

Note that there are two kinds of generic data models: generic operational data models and generic data warehouse data models. When building a data warehouse from a generic data model, make sure it is a generic data warehouse data model rather than a generic operational data model.

Create a data model

The first step in establishing a data model is to select the subjects and create the subject area model, that is, the high-level entity model. Once the subject area model exists, all data must be divided into two kinds: primitive data and derived data.

Primitive data is data at the lowest level of granularity that cannot be subdivided further. Examples include the customer name, customer address, and customer deposit amount.

Derived data is data produced from finer-grained data through calculation, aggregation, or other processing. Examples include the number of customers, monthly sales, and daily transaction volume.

Do not include derived data in the data model. One good reason is that derived data changes very quickly. If derived data is included in the data model, every change in how it is defined, calculated, or used forces a change to the data model, which causes serious problems for data modelers.

Another reason is that derived data bloats the data model, making it too large to be practical. Moreover, because derived data changes so often, the data model would never be finished.
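
The distinction can be illustrated with a small Python sketch. The class `CustomerAccount`, the two helper functions, and the sample rows are made up for illustration. The point is only that the modeled structure holds primitive fields, while derived values are computed from them on demand.

```python
from dataclasses import dataclass


@dataclass
class CustomerAccount:
    # Primitive data only: lowest granularity, not computed from anything else.
    customer_name: str
    customer_address: str
    deposit_amount: float


# Made-up sample rows.
accounts = [
    CustomerAccount("Ann", "1 Main St", 100.0),
    CustomerAccount("Bob", "2 Oak Ave", 250.0),
    CustomerAccount("Cy", "3 Elm Rd", 50.0),
]


def number_of_customers(rows):
    """Derived value: recomputed whenever needed, never stored in the model."""
    return len(rows)


def total_deposits(rows):
    """Derived value: an aggregate over the primitive deposit amounts."""
    return sum(r.deposit_amount for r in rows)


print(number_of_customers(accounts), total_deposits(accounts))
```

If the definition of a derived value changes, only the function changes; `CustomerAccount` stays as it is.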

After removing derived data, the next step is to select a subject area to develop. If a subject area is too large, a subset of it can be selected instead.

Generally, the more deeply the modeler understands the business, the better the resulting model.

After selecting a subject area, the next step is to develop its DIS model.

The DIS model describes the data in the subject area. Analyze the business requirements related to the subject area and identify the relevant keys, attributes, identifiers, and tables. These keys, attributes, and identifiers are then placed in the different parts of the DIS model.

The first part of the DIS model is the root data (root set). The root data holds the characteristics and identifiers of the subject. For example, when the subject is a person, the root data includes the person's identifier, such as the ID card number. If the subject is an enterprise, the root data includes the enterprise's unique identifier, such as the business registration number.

The root data also includes other attributes directly related to the subject.

The second part of the DIS model is type data (type of data). Type data describes the different categories of the subject. For example, if the subject is a customer, the categories may include customer status, such as former customer and potential customer. Other classification schemes are also possible, such as high-spending customers, commercial customers, and individual customers. Within each classification scheme, a customer should belong to exactly one category.

The third part of the DIS model is repetitive data (multiply recurring data). For example, if the subject is a person, repetitive data includes the person's education history, information about the person's children, and the employers the person has worked for.

The DIS model contains:

- Key

- Attribute

- Foreign key

Note that a key is only an identifier and is not necessarily unique.

Repetitive data depends on the root data in an ordinary hierarchical relationship.

In addition, note that type data may contain repetitive data, and repetitive data may contain type data. (This statement is not very clear. In my understanding, the type data here is what we commonly call a type table, and repetitive data is a child table of the main entity. As mentioned above, if a person has three work experiences, a work experience table is created as repetitive data, and it holds three records for that person. If a type table has a complex description, some of its attributes may themselves need to be modeled separately. Likewise, some attributes of a child table such as work experience may be described by a type table.)
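
The three parts of a DIS can be sketched as Python dataclasses. This is one possible reading rather than a definitive layout: the class names `PersonRoot`, `PersonType`, and `WorkExperience`, their fields, and the `add_type` check are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkExperience:
    """Repetitive data: zero or more rows per person."""
    person_id: str      # foreign key back to the root
    employer: str
    start_year: int


@dataclass
class PersonType:
    """Type data: one category per classification scheme."""
    person_id: str      # foreign key back to the root
    scheme: str         # e.g. "status"
    category: str       # e.g. "potential customer"


@dataclass
class PersonRoot:
    """Root data: the identifier plus attributes tied directly to the subject."""
    person_id: str      # key
    name: str           # attribute
    types: List[PersonType] = field(default_factory=list)
    work_history: List[WorkExperience] = field(default_factory=list)

    def add_type(self, scheme: str, category: str) -> None:
        # Enforce one category per classification scheme.
        if any(t.scheme == scheme for t in self.types):
            raise ValueError(f"scheme {scheme!r} already has a category")
        self.types.append(PersonType(self.person_id, scheme, category))


p = PersonRoot("ID-001", "Ann")
p.add_type("status", "potential customer")
p.work_history.append(WorkExperience(p.person_id, "Acme", 2001))
p.work_history.append(WorkExperience(p.person_id, "Globex", 2008))
print(len(p.types), len(p.work_history))
```

The child rows carry the root's key as a foreign key, which is the hierarchical dependency between repetitive data and root data.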

During data warehouse development, all purely operational data should be removed. A telephone number, for example, is typical purely operational data. If the DIS model has no time element, one must be added.

Another consideration in DW 2.0 modeling is default values. Sometimes the source system cannot supply every data value that DW 2.0 requires. In that case, default values must be provided for the data missing from the source system.
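
The three adjustments just described (dropping purely operational data, adding a time element, and supplying defaults) can be sketched as a single row transformation. The field names, the `UNKNOWN` default, and the function `to_warehouse_row` are assumptions for illustration only.

```python
from datetime import date

# Assumed for illustration: which source fields are purely operational,
# and what default each warehouse field gets when the source has no value.
OPERATIONAL_FIELDS = {"phone_number"}
DEFAULTS = {"customer_segment": "UNKNOWN", "country": "UNKNOWN"}


def to_warehouse_row(source_row, snapshot_date):
    """Drop operational fields, fill defaults, and stamp a time element."""
    row = {k: v for k, v in source_row.items() if k not in OPERATIONAL_FIELDS}
    for name, default in DEFAULTS.items():
        if row.get(name) is None:
            row[name] = default
    # Only add a time element if the source did not already carry one.
    row.setdefault("snapshot_date", snapshot_date.isoformat())
    return row


source = {"customer_id": 42, "name": "Ann", "phone_number": "555-0100", "country": None}
print(to_warehouse_row(source, date(2024, 1, 31)))
```

A field that is absent and a field that is present but empty are treated the same way here; a real design might need to distinguish them.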

Merging data items may also be used in DW 2.0 modeling. If data items that live in different places in the source system are often used together in DW 2.0, consider placing them together, that is, in the same physical table. Repeating groups are one technique for doing this.

Another common technique in the DW 2.0 DIS model is to create summary data (summarizations). If data is often used at a higher level, that is, at a coarser granularity, summary data can be created for it. Summary data is itself derived data, and in general it should not be created in the data model. However, if a summary is very widely used, it can be created as well.
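
A summarization can be as simple as rolling daily detail up to monthly totals. The sketch below uses made-up rows and a hypothetical function name, `summarize_by_month`; it shows the idea only and not any particular warehouse's load process.

```python
from collections import defaultdict

# Made-up daily detail rows: (ISO date, amount).
daily_sales = [
    ("2024-01-05", 100.0),
    ("2024-01-20", 50.0),
    ("2024-02-03", 75.0),
]


def summarize_by_month(rows):
    """Roll daily detail up to one total per calendar month."""
    totals = defaultdict(float)
    for day, amount in rows:
        month = day[:7]  # "YYYY-MM" prefix of an ISO date
        totals[month] += amount
    return dict(totals)


monthly_sales = summarize_by_month(daily_sales)
print(monthly_sales)
```

Storing `monthly_sales` trades extra space and maintenance for faster access, which is why the text reserves it for summaries that are widely used.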

After the DIS model is created, the physical data model can be created. The physical model holds all attributes together with their physical characteristics. It includes physical information such as table names, related tables, keys, indexes, attribute names, attribute types, and attribute lengths.
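
As a rough illustration of what the physical level records, the sketch below creates two tables and an index in an in-memory SQLite database through Python's standard `sqlite3` module. The table names, column names, types, and lengths are hypothetical. Note that SQLite accepts declared lengths such as `VARCHAR(18)` but does not enforce them; they are shown only because the physical model records them.

```python
import sqlite3

# Hypothetical physical model for a "person" root table and one
# repetitive-data table; names, types, and lengths are illustrative.
DDL = """
CREATE TABLE person (
    person_id   VARCHAR(18)  NOT NULL PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    snapshot_dt DATE         NOT NULL
);
CREATE TABLE work_experience (
    person_id   VARCHAR(18)  NOT NULL REFERENCES person(person_id),
    employer    VARCHAR(200) NOT NULL,
    start_year  INTEGER      NOT NULL
);
CREATE INDEX ix_work_experience_person ON work_experience(person_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# List what was created: tables and indexes with their names.
objects = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE name NOT LIKE 'sqlite_%' ORDER BY name"
).fetchall()
print(objects)
```

Each item the text lists appears here: table names, the related table via `REFERENCES`, the key, an index, and attribute names with types and lengths.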
