"Data Modeling" reading notes

I recently found a good book on data modeling at the bookstore: The Data Modeler's Workbench: Tools and Techniques for Analysis and Design, by Steve Hoberman. After a quick skim, I feel the book truly lives up to the praise of its translators and foreign experts: "This book is full of tips and techniques that are useful for improving data models and designs, and it is also very enjoyable to read, a great combination! Every data modeler should own Steve Hoberman's book on data modeling tools and techniques."

Even though I have some confidence in my own knowledge of data modeling, I still benefited a great deal from reading this book. In the spirit of sharing good books, I list below a summary and the tips from each chapter, so that friends who do not have the book at hand can refer to them when doing data modeling. The tools and templates described in the book can be downloaded from the author's pages on the publisher's site:
www.wiley.com/compbooks/hoberman

Chapter One: Using anecdotes, analogies, and presentations to illustrate the concept of data modeling

In everyday communication, we tell and hear many stories, or anecdotes. These stories cover a wide range of topics: some are events that happened around us over the weekend, others are experiences related to our work projects. Anecdotes help strengthen our relationships with the people around us, add to our enjoyment, and educate us; we can visualize what the words express. Sometimes, when a story ends, it leaves us with information or insight we had never thought of before. Anecdotes are extremely effective for explaining data modeling concepts, for several reasons:
They build a lasting image.
They are fascinating and enjoyable.
They strengthen relationships between people.
They reduce stress.

There are three simple steps to successfully craft and tell an anecdote about data modeling:
1) Define a topic. Keep in mind that the anecdote you tell has a specific goal or thesis; that is, the story exists to explain a data modeling concept or term.
2) Choose your story. There are many story types to choose from. Pick a short story that is interesting and useful, and that conveys the intent of the topic clearly and unambiguously.
3) Rehearse your story. Once you have found the right story, rehearse it until you are confident you can fully deliver its point within two minutes. Avoid long, drawn-out stories.

Data model analogies
An analogy compares two or more concepts in order to emphasize similarities or differences between them. Analogy is a good technique for introducing something foreign or new, especially when explaining computer knowledge to people who are not computer professionals. Hoberman's most commonly used analogies in data modeling are listed below (he used these analogies to impress management so effectively that his salary doubled ^_^):
The subject area model is a bird's-eye view.
The data model is a blueprint.
The enterprise model is a world map.
Standards are city planning.
The meta data repository is a library.
The Data Warehouse is the "heart".

Chapter Two: The Meta Data Bingo game
In simple terms, this chapter uses a bingo-card game to motivate project team members to validate the data model and confirm the correctness of the meta data. Meta Data Bingo emphasizes "winning together": if you are lucky, everyone wins at the end of the game.

Chapter Three: Ensuring high-quality definitions
This chapter focuses on a tool called the Definition Checklist, which contains guidelines for ensuring that definitions are of the highest quality.

Chapter Four: The data modeler's project plan
This chapter focuses on four tools for determining the data modeling phases, tasks, tools, and timelines:
• Data Modeling Phase Tool: identifies the data modeling steps at the highest level.
• Phase-to-Task-to-Tools: takes each data modeling phase and decomposes it into data modeling tasks.
• Priority Triangle: you can have any two of the following three extremes: highest quality, shortest time, and lowest cost, but you can never have all three.
• Reliable estimating tools: the Subject Area Effort Range determines, by application type, the percentage of the entire project that each data modeling phase should take. The Task Effort Tool takes each task identified in Phase-to-Task-to-Tools and lists the percentage it should represent of the entire data modeling work product. Combining the two tools lets you give the project manager a reasonably accurate estimate, as sketched below.
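
To make that combination concrete, here is a minimal sketch in Python. All phase and task percentages below are invented for illustration; the real values come from the book's templates and vary by application type.

```python
# A sketch of combining phase-level and task-level percentage
# estimates into hours. Every percentage here is hypothetical;
# the book's templates supply the real values.
TOTAL_PROJECT_HOURS = 400

# Subject Area Effort Range: share of the whole project taken by
# each data modeling phase (hypothetical values).
phase_pct = {
    "subject area analysis": 0.05,
    "subject area modeling": 0.05,
    "logical data analysis": 0.10,
}

# Task Effort Tool: share of one phase taken by each task
# (hypothetical values for the subject area analysis phase).
task_pct = {
    "subject area checklist": 0.30,
    "subject area CRUD matrix": 0.25,
    "in-the-know template": 0.15,
    "subject area family tree": 0.20,
    "subject area grain matrix": 0.10,
}

phase_hours = TOTAL_PROJECT_HOURS * phase_pct["subject area analysis"]
for task, pct in task_pct.items():
    print(f"{task}: {phase_hours * pct:.1f} hours")
```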

Chapter Five: Subject area analysis
This chapter explores five key tools that are useful in the subject area analysis phase of data modeling. They should be completed in the following order:
1) Subject Area Checklist: a complete list of the subject areas in the new application, with a definition and synonyms (or aliases) for each subject area.
2) Subject Area CRUD (Create, Read, Update, Delete) Matrix: captures the gaps and overlaps in subject areas between the new application and existing applications, thereby determining the application's scope (a small sketch follows this list).
3) In-the-Know Template: identifies the people and documents needed as resources to complete the data modeling work product for the new application.
4) Subject Area Family Tree: contains the source applications and several other key pieces of information for each subject area, clarifying where the subject area data will come from.
5) Subject Area Grain Matrix: uses a spreadsheet format to record the reporting levels for each measure or fact.
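
As a rough illustration of what the Subject Area CRUD Matrix records, here is a sketch in Python; the applications and subject areas are invented, and the book itself uses a spreadsheet rather than code.

```python
# A sketch of a Subject Area CRUD matrix: which applications
# Create, Read, Update, or Delete data in each subject area.
# The applications and subject areas are invented examples.
crud = {
    "Customer": {"Order Entry": "CRU",  "New App": "R"},
    "Product":  {"Order Entry": "R",    "New App": "CRUD"},
    "Order":    {"Order Entry": "CRUD", "New App": "R"},
}

# Print the matrix; the gaps and overlaps between the new
# application and existing ones help pin down the new app's scope.
apps = ["Order Entry", "New App"]
print("subject area".ljust(14) + "".join(a.ljust(12) for a in apps))
for area, row in crud.items():
    print(area.ljust(14) + "".join(row.get(a, "-").ljust(12) for a in apps))
```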

Chapter Six: Subject area modeling
This chapter describes three powerful tools for modeling subject area information:
• The Business Clean Slate model.
• The Application Clean Slate model.
• The Early Reality Check model.

Chapter Seven: Logical data analysis
This chapter focuses on four logical data analysis tools that should be used in the following order:
1) Data Element Family Tree: a complete list of the data elements the application contains, with the source and transformation information for each data element, along with several other key pieces of data element meta data (a small sketch follows this list).
2) Data Element Grain Matrix: uses a spreadsheet format to record the reporting levels for each measure or fact.
3) Data Quality Capture Template: records the comparison between each data element's meta data and some of the actual data.
4) Data Quality Validation Template: records the results of validating each data element's meta data against some of the actual data.
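
To give a feel for what a Data Element Family Tree row holds, here is a sketch in Python. The columns follow the description above, but the element names, sources, and transformations are invented.

```python
# A sketch of Data Element Family Tree rows: each data element in
# the new application, where it comes from, and how it is
# transformed. All values are invented examples.
family_tree = [
    {"data_element": "CUST_NAME",
     "source_application": "CRM",
     "source_element": "CUSTOMER_FULL_NAME",
     "transformation": "trim and uppercase",
     "definition": "The customer's legal name."},
    {"data_element": "ORDER_TOTAL_AMT",
     "source_application": "Order Entry",
     "source_element": "LINE_AMT",
     "transformation": "sum of line amounts per order",
     "definition": "Total monetary value of one order."},
]

for row in family_tree:
    print(f"{row['data_element']}: from {row['source_application']}."
          f"{row['source_element']} via '{row['transformation']}'")
```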

Chapter Eight: The Normalization Hike and the Denormalization Survival Guide (highly recommended: the best technical writing on relational database normalization I have ever read)
Normalization is a process of eliminating redundancy and applying rules so as to better understand and express the dependencies and participation that exist between data elements. Normalization has six levels, the highest being fifth normal form (5NF). Most technical documents consider reaching 3NF sufficient; Steve Hoberman points us toward a higher goal: 5NF. Graeme Simsion, in his book Data Modeling Essentials, wrote: "The higher normal forms are often misunderstood and ignored by practitioners, or cited to support unsound modeling practices." But we need to understand these higher levels of normalization, because they represent additional normalization opportunities and help us further reduce redundant information and improve design flexibility. While the three levels beyond 3NF are likely to produce only a small number of changes, they still offer opportunities to improve flexibility and efficiency. Below are the definitions of BCNF, 4NF, and 5NF (much easier to understand than the mathematical formulas in domestic textbooks ^_^):
BCNF = 3NF + the following rule:
Every data element must depend on the key, the whole key, and nothing but the key.
4NF = BCNF + the following rule:
Break any entity that has three or more foreign key data elements in its primary key, with no constraints among those foreign keys, into two or more entities.
5NF = 4NF + the following rule:
Where there are three or more foreign key data elements in the primary key and constraints do exist among them, decompose the entity into the many-to-many relationships that the constraints require.
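
Here is a small worked example of the 4NF rule (my own, not from the book), using Python sets to stand in for relations. An entity whose key combines two independent multivalued facts, such as an employee's skills and spoken languages, must store their full cartesian product; splitting it removes the redundancy, and a join recovers the original rows.

```python
# All 4 rows a single 4NF-violating entity would have to store:
# skills and languages vary independently, so 2 skills x 2
# languages forces 4 rows of redundancy.
combined = {
    ("Ann", "SQL", "English"),
    ("Ann", "SQL", "French"),
    ("Ann", "Python", "English"),
    ("Ann", "Python", "French"),
}

# 4NF decomposition: project the two independent facts into separate
# entities keyed on (employee, skill) and (employee, language).
employee_skill = {(e, s) for (e, s, _) in combined}
employee_language = {(e, l) for (e, _, l) in combined}

# A natural join on employee recovers every original row, so the
# decomposition is lossless: 2 + 2 rows replace 2 x 2 rows.
rejoined = {
    (e, s, l)
    for (e, s) in employee_skill
    for (e2, l) in employee_language
    if e == e2
}
assert rejoined == combined
```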

Once we have climbed to the peak of 5NF, we can then, based on actual requirements, "denormalize" to add data redundancy, simplifying development and improving query speed. Denormalization is a process in which, after defining a reliable, fully normalized data structure, you selectively introduce duplicate data to meet specific requirements. Steve Hoberman's Denormalization Survival Guide provides a set of measurable scoring criteria for deciding where it is appropriate to add redundancy. We examine six questions for each relationship; after adding up the scores for each question, we denormalize the relationship when the total is 10 or greater.

Scoring rules for the Denormalization Survival Guide:
1. What type of relationship is it: this question determines the type of the relationship we are analyzing. What kind of relationship does the parent entity have with the child entity?
Hierarchy relationship (20 points)
Peer relationship (-10 points)
Definition relationship (-20 points)
2. What is the participation ratio: this question determines how each entity participates in the relationship. In other words, how many child entity values are there for a given parent entity value? The closer the parent-child relationship is to one-to-one, the greater the chance that we will denormalize it.
Up to a one-to-five ratio (20 points)
Up to a one-to-one-hundred ratio (-10 points)
Over a one-to-one-hundred ratio (-20 points)
3. How many data elements are in the parent entity?
Fewer than 10 data elements (20 points)
Between 10 and 20 data elements (-10 points)
More than 20 data elements (-20 points)
4. What is the usage ratio: when users need information from the child entity, do they usually also need information from the parent? In other words, how strong is the coupling or correlation between the two entities?
Strong correlation (30 points)
Weak or no correlation (-30 points)
5. Is the parent entity a placeholder: do we intend to add more data elements or relationships to the parent entity in the near future? If the answer is "no", denormalization becomes even more feasible.
Yes (-20 points)
No (20 points)
6. What is the rate of change: this question determines whether the two entities have similar frequencies of inserts and updates over the same time period. If one entity rarely changes while the other changes frequently, we lean strongly toward keeping them normalized, each in its own table.
Same (20 points)
Different (-20 points)

How to use the Denormalization Survival Guide:
1) Sort the relationships in the model by priority.
2) Select a relationship.
3) Answer the six questions above for this relationship.
4) If the total score is equal to or greater than 10, denormalize the relationship.
5) Return to step 2 until all relationships have been processed.
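
Here is a minimal sketch of the scoring in Python, using the point values listed above; the question and option names are my own shorthand, not the book's exact wording.

```python
# The Denormalization Survival Guide scoring, with the point
# values listed above. Question/option names are my shorthand.
SCORES = {
    "relationship_type": {"hierarchy": 20, "peer": -10, "definition": -20},
    "participation_ratio": {"up_to_1_to_5": 20, "up_to_1_to_100": -10,
                            "over_1_to_100": -20},
    "parent_data_elements": {"under_10": 20, "10_to_20": -10, "over_20": -20},
    "usage_ratio": {"strong": 30, "weak_or_none": -30},
    "parent_is_placeholder": {"yes": -20, "no": 20},
    "rate_of_change": {"same": 20, "different": -20},
}

def should_denormalize(answers: dict[str, str]) -> bool:
    """Denormalize when the relationship's total score is >= 10."""
    total = sum(SCORES[question][answer] for question, answer in answers.items())
    return total >= 10

# Example: a hierarchical, tightly coupled one-to-few relationship
# with a small, stable parent scores far above the threshold.
answers = {
    "relationship_type": "hierarchy",        # +20
    "participation_ratio": "up_to_1_to_5",   # +20
    "parent_data_elements": "under_10",      # +20
    "usage_ratio": "strong",                 # +30
    "parent_is_placeholder": "no",           # +20
    "rate_of_change": "same",                # +20
}
print(should_denormalize(answers))  # True (total = 130)
```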

Chapter Nine: The Abstraction Safety Guide and abstraction components
Friends who have read my article "Talking About Database Design Techniques (Part 1)" should remember my second example: the design of the product information table for an online e-commerce platform. This chapter raises to the level of theory the method I used in that example: use object-oriented design to extract the common attributes of all products and abstract them into a supertype, then add a table to record the details that differ among the entities derived from the supertype, thereby achieving design flexibility (see the sketch after the list below). Abstraction is extremely useful in any of the following situations:
The design needs to last: you are required to modify the database design as little as possible.
Requirements may change: application requirements change, and business process reengineering or functional upgrades are required.
Data warehouse: when a new classification type arrives from the source application, we need not change the data warehouse design at all; we simply add a new row to the classification-type entity.
Meta data repository: similar to the data warehouse situation.
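
Here is a minimal sketch (my own, loosely following the e-commerce example; all table and column names are invented) of the supertype-plus-detail pattern, using Python dicts to stand in for tables.

```python
# Abstraction sketch: common attributes live in one "supertype"
# table, while attributes that vary by product category live in a
# generic detail table. Names and data are invented examples.

# Supertype: attributes shared by every product.
products = [
    {"product_id": 1, "name": "Laptop X", "price": 999.0, "category": "computer"},
    {"product_id": 2, "name": "Novel Y", "price": 12.5, "category": "book"},
]

# Detail table: one row per (product, attribute). Adding a new
# category or attribute adds rows, not columns, so the schema
# never has to change.
product_details = [
    {"product_id": 1, "attribute": "cpu", "value": "2.4 GHz"},
    {"product_id": 1, "attribute": "ram", "value": "16 GB"},
    {"product_id": 2, "attribute": "author", "value": "A. Writer"},
]

def details_for(product_id: int) -> dict[str, str]:
    """Assemble a product's category-specific attributes from the detail rows."""
    return {d["attribute"]: d["value"]
            for d in product_details if d["product_id"] == product_id}

print(details_for(1))  # {'cpu': '2.4 GHz', 'ram': '16 GB'}
```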

Of course, abstraction can greatly increase the amount and complexity of development work, and people usually focus on the immediate application and its immediate costs, without caring about the much higher costs to come. So I strongly agree with the idea behind agile software development: do almost no up-front design at first, but once requirements change, a programmer who pursues excellence should review the entire architecture and, while making the revision, adjust the system architecture to accommodate similar changes in the future.

Abstraction components are small, abstract model fragments that can be reused in many modeling situations, no matter what industry, organization, or even subject area is being modeled. Once you have used abstraction more than once in your modeling work, you will begin to see recurring patterns in the abstract structures that appear. These "abstraction components" serve the following purposes:
Speed up design.
Speed up development.
Provide common, useful structures.

Chapter Ten: Data model beautification techniques
This chapter focuses on how to improve the visual appearance of logical and physical data models, so that our designs go beyond the immediate application requirements. Five categories of beautification techniques are discussed in this chapter:
Logical data element arrangement tips: a recommended way to order the data elements within each entity on the logical data model.
Physical data element arrangement tips: a recommended way to order the data elements within each entity on the physical data model.
Entity layout tips: these techniques focus on the optimal placement of each entity in the data model.
Relationship layout tips: these techniques focus on adjusting overlapping relationship lines and relationships that appear to pass through (rather than around) unrelated entities.
Attention-getting tips: these techniques focus on making particular data elements, entities, or relationships stand out.

Chapter Eleven: Planning a thriving data modeling career
A list of ten pieces of advice for data modelers:
1) Remember: flexibility, accuracy, and context.
2) Modeling is only a small part of your job.
3) Try other roles.
4) Understand the 95/5 rule: 95% of your time will be spent on 5% of the data elements.
5) Data modeling is never boring: if you have been doing data modeling and find yourself frequently bored, you really should make a change. It may not be the field of data modeling itself that is boring; rather, your particular task, company, or industry is no longer exciting. Take a risk and try data modeling on a different project or in a different industry.
6) Stay at the forefront of technology.
7) Try to keep emotion out of the model: modelers must understand that opinions raised during review are directed not at the model's creator but at the model's content. As the old saying goes: address the issue, not the person.
8) Let your creativity spread its wings: it is important to be creative when considering new ways to document data requirements and improve designs. Being creative may mean modifying some of the tools in this book; it may also mean coming up with your own spreadsheets or other tools.
9) Pure theory is too expensive: throughout design activities, you must keep practicality in mind. The departments and organizations paying for the application expect to see tangible, practical results.
10) Become a great storyteller: storytelling is an important part of the data modeler's job. To help educate the team and to influence project managers and others who lack understanding of our field, we need to tell stories and anecdotes.

Finally, I personally feel that Steve Hoberman's idea of "abstraction components" is very similar to "design patterns" in object-oriented design: over many data modeling efforts, database experts abstract the similar parts of various projects and extract specific modeling fragments; on a future project, they only need to refine these model fragments to quickly build a database schema suited to that project. However, these modeling fragments have not yet been unified or standardized, and no books in this category have been published so far. I am summarizing my own experience in this area, but my knowledge is limited and I dare not show off in front of the masters; I only hope that the articles I publish in the future can serve as a brick cast out to attract jade, helping Chinese programmers take the lead in the field of data modeling "design patterns".

