I recently came across a good book on data modeling in a bookstore: The Data Modeler's Workbench: Tools and Techniques for Analysis and Design, by Steve Hoberman. After a quick read-through, I feel the book truly deserves the praise of its translators and the foreign reviewers: "This book is full of tools and techniques for improving data models and designs, and it is also very enjoyable to read; a great combination! Every data modeler should have Steve Hoberman's book on data modeling tools and techniques."
Although I have some confidence in my own knowledge of data modeling, I still benefited a great deal from reading this book. In the spirit of sharing a good book with everyone, I have listed below the summary, tips, and suggestions from each chapter, for the reference of friends doing data modeling who do not have the book. The tools and templates introduced in the book can be downloaded from the author's website:
www.wiley.com/compbooks/hoberman
Chapter 1: Using Anecdotes, Analogies, and Presentations to Illustrate Data Modeling Concepts
In everyday communication we tell and hear many stories, covering a wide range of topics: something that happened over the weekend, or an experience from one of our work projects. These anecdotes strengthen our relationships with the people around us, entertain us, and educate us; they let us visualize what is being expressed in words. Sometimes, when a story ends, it leaves us with information or insights we had never considered before. Anecdotes are extremely effective for explaining data modeling concepts, for the following reasons:
They create a lasting image.
They are engaging and enjoyable.
They strengthen relationships between people.
They reduce stress.
There are three simple steps to create a data modeling story:
1) Define a topic. Make sure your story has a specific goal or theme, namely explaining a particular data modeling concept or term.
2) Select your story. There are many kinds of stories to choose from; pick a brief one that is interesting, useful, and clearly conveys the intended theme.
3) Practice your story. Once you find a suitable story, rehearse it until you are confident it fully conveys your topic within two minutes. Avoid telling a long-winded, rambling story.
Data Modeling Analogies
An analogy compares two or more concepts to emphasize their similarities or differences. Analogy is a good technique for introducing unfamiliar or new things, especially when explaining computer expertise to non-computer professionals. Hoberman's most common data modeling analogies are listed below (he uses them to impress management with ease, and to double his salary ^_^):
The subject domain model is a bird's-eye view.
A data model is a blueprint.
The enterprise model is a world map.
Standards are city planning.
The metadata repository is a library.
A data warehouse is the heart.
Chapter 2: The Metadata Bingo Game
Simply put, metadata bingo is used to get project team members engaged, to pin down the data model, and to validate the metadata. The metadata bingo game emphasizes win-win: if you are lucky, everyone can win by the end of the game.
Chapter 3: Ensuring High-Quality Definitions
This chapter focuses on a tool called the Definition Checklist, which contains guidelines for ensuring that definitions are of the highest quality.
Chapter 4: The Data Modeler's Project Plan
This chapter focuses on four tools for determining data modeling phases, tasks, tools, and timelines:
· Data Modeling Phase Tool: identifies the data modeling phases at the highest level.
· Phase-to-Task Tool: takes each phase from the Data Modeling Phase Tool and breaks it down into data modeling tasks.
· Priority Triangle: out of high quality, shortest time, and lowest cost, you can push any two to their extremes, but never all three at once.
· Reliable Estimating Tool: the Subject Domain Effort Range determines, based on the application type, what percentage of the whole project each data modeling phase should take; the Task Effort Tool takes each task identified in the Phase-to-Task Tool and lists the percentage of the overall data modeling deliverable that the task should consume. Combining the two lets you give the project manager a reasonably accurate estimate.
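To make the idea concrete, here is a minimal sketch of how such an estimate might be computed. It is not taken from the book; the phase names, task names, and percentages are assumptions purely for illustration.

```python
# Minimal sketch (assumed figures, not the book's): combine phase-level and
# task-level percentages into a rough data modeling estimate.

total_hours = 200  # overall data modeling effort agreed with the project manager

# Assumed share of the total effort for each data modeling phase.
phase_pct = {
    "subject domain analysis": 0.20,
    "subject domain modeling": 0.15,
    "logical data analysis": 0.30,
    "logical data modeling": 0.25,
    "physical data modeling": 0.10,
}

# Assumed share of one phase's effort for each task within that phase.
task_pct_within_phase = {
    "subject domain analysis": {
        "subject domain checklist": 0.30,
        "subject domain CRUD matrix": 0.25,
        "in-the-know template": 0.15,
        "subject domain family tree": 0.20,
        "subject domain grain matrix": 0.10,
    },
}

for phase, pct in phase_pct.items():
    phase_hours = total_hours * pct
    print(f"{phase}: {phase_hours:.0f} hours")
    for task, share in task_pct_within_phase.get(phase, {}).items():
        print(f"  {task}: {phase_hours * share:.1f} hours")
```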
Chapter 5: Subject Domain Analysis
This chapter discusses five key tools that support the subject domain analysis phase of data modeling. They should be completed one by one in the following order:
1) Subject Domain Checklist: a complete list of the subject domains in the new application, together with each subject domain's definition and synonyms (or aliases).
2) Subject Domain CRUD (Create, Read, Update, Delete) Matrix: captures the gaps and overlaps between the new application and existing applications, and determines the application's scope (a small sketch of such a matrix follows this list).
3) In-the-Know Template: identifies the people and documentation needed as resources to complete the data modeling deliverables for the new application.
4) Subject Domain Family Tree: records the source application of each subject domain and several other key pieces of information, clarifying where the subject domain data will come from.
5) Subject Domain Grain Matrix: captures, in spreadsheet format, the reporting levels of each measurement or fact subject domain.
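Purely as an illustration (the application names, domains, and letters below are assumed, not taken from the book), a subject domain CRUD matrix can be pictured as a simple table of applications versus subject domains:

```python
# Illustrative sketch of a subject domain CRUD matrix (assumed data).
# Rows are applications, columns are subject domains, and each cell lists
# which of Create, Read, Update, Delete the application performs.
crud_matrix = {
    "Order Entry (existing)": {"Customer": "R", "Order": "CRUD", "Product": "R"},
    "New E-Commerce App": {"Customer": "CRU", "Order": "CRUD", "Product": "R"},
}

# Domains touched by both applications hint at overlap to reconcile;
# domains touched only by the new application help define its scope.
domains = sorted({d for row in crud_matrix.values() for d in row})
for domain in domains:
    cells = {app: row[domain] for app, row in crud_matrix.items() if domain in row}
    print(domain, "->", cells)
```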
Chapter 6: Subject Domain Modeling
This chapter describes three powerful subject domain modeling tools:
· The Business Clean Slate Model.
· The Application Clean Slate Model.
· The Early Reality Check Model.
Chapter 7: Logical Data Analysis
This chapter focuses on four logical data analysis tools which should be used in the following order:
1) Data Element Family Tree: a complete list of the application's data elements, together with each element's source and transformation information and several other key pieces of data element metadata.
2) Data Element Grain Matrix: captures, in spreadsheet format, the reporting levels of each measurement or fact.
3) Data Quality Capture Template: captures the metadata of each data element alongside some of its actual data.
4) Data Quality Validation Template: records the results of comparing each data element's metadata against some of its actual data.
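As a rough illustration of that last comparison step (the element names, rules, and sample values below are my assumptions, not the book's template), such a validation might look like this:

```python
# Illustrative sketch (assumed metadata and sample values): compare the
# documented metadata of each data element against actual sample data.
metadata = {
    "customer_last_name": {"nullable": False, "max_length": 30},
    "order_total_amount": {"nullable": False, "max_length": 12},
}

sample_data = {
    "customer_last_name": ["Smith", "", "Johnson-Wandsworth-VeryLongSurname"],
    "order_total_amount": ["19.99", "250.00", "7.50"],
}

for element, rules in metadata.items():
    values = sample_data.get(element, [])
    empties = sum(1 for v in values if v == "")
    too_long = sum(1 for v in values if len(v) > rules["max_length"])
    print(f"{element}: {empties} empty value(s), {too_long} value(s) longer than "
          f"{rules['max_length']} characters (nullable={rules['nullable']})")
```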
Chapter 8: The Normalization Hike and the Denormalization Survival Guide (highly recommended: this is the best treatment of relational database normalization I have read)
Normalization is a process of applying rules to remove redundancy, with the goal of better understanding and representing the dependencies and participation among data elements. Normalization has six levels, the highest being Fifth Normal Form (5NF). Technical literature generally assumes that reaching 3NF is sufficient; Steve Hoberman sets a higher goal for us: 5NF. In his book Data Modeling Essentials, Graeme Simsion notes that the higher normal forms are often misunderstood and ignored by practitioners, or cited to justify questionable modeling decisions. Nevertheless, we need to understand these higher normal forms, because they represent additional normalization opportunities that help us further reduce redundant information and improve design flexibility. Although the last three normal forms may yield only a few changes, they still offer some opportunities to improve flexibility and efficiency. Below are the definitions of BCNF, 4NF, and 5NF (much easier to understand than the mathematical formulas given in Chinese textbooks):
BCNF = 3NF + the following rule:
Every data element depends on the key, the whole key, and nothing but the key.
4NF = BCNF + the following rule:
An entity whose primary key contains three or more foreign key data elements, with no constraints among those foreign keys, must be broken down into two or more entities.
5NF = 4NF + the following rule:
An entity whose primary key contains three or more foreign key data elements, where constraints do exist among those foreign keys, is decomposed into the many-to-many relationships required by all of those constraints.
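To make the 4NF rule concrete, here is a small worked sketch (my own example, not the book's): an entity whose key holds Course, Teacher, and Textbook, where a course's teachers and its textbooks are independent of each other, can be split without losing information.

```python
# Illustrative 4NF decomposition (assumed example data).
# Teachers and textbooks for a course are independent, so the combined
# entity must store every combination, which is redundant.
course_teacher_textbook = {
    ("DB101", "Smith", "SQL Basics"),
    ("DB101", "Smith", "Modeling 101"),
    ("DB101", "Jones", "SQL Basics"),
    ("DB101", "Jones", "Modeling 101"),
}

# Split the two independent relationships into separate entities.
course_teacher = {(c, t) for c, t, _ in course_teacher_textbook}
course_textbook = {(c, b) for c, _, b in course_teacher_textbook}

# Joining them back reproduces the original rows, so nothing is lost.
rejoined = {(c, t, b)
            for c, t in course_teacher
            for c2, b in course_textbook if c == c2}
assert rejoined == course_teacher_textbook
```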
Once we have climbed to the peak of 5NF, we can denormalize according to actual needs, adding data redundancy to simplify development and speed up queries. Denormalization is the process of selectively reintroducing some duplicate data into a sound, fully normalized data structure in order to meet specific performance requirements. Steve Hoberman's Denormalization Survival Guide provides a measurable scoring standard for deciding where it is appropriate to add redundancy: for each relationship, answer six questions, add up the scores for the answers, and denormalize the relationship when the total is greater than or equal to 10.
Scoring rules of the Denormalization Survival Guide:
1. What type of relationship is it: this question determines the type of the relationship being analyzed. What is the relationship between the parent entity and the child entity?
A hierarchy relationship (20 points)
A peer relationship (-10 points)
A definition relationship (-20 points)
2. What is the participation ratio: this question determines the participation of each entity in the relationship. In other words, for a given parent entity value, how many child entity values will there be? The closer the parent-to-child relationship is to one-to-one, the more likely we are to denormalize it.
Up to a one-to-five ratio (20 points)
Up to a one-to-one-hundred ratio (-10 points)
More than a one-to-one-hundred ratio (-20 points)
3. How many data elements are in the parent entity:
Fewer than 10 data elements (20 points)
Between 10 and 20 data elements (-10 points)
More than 20 data elements (-20 points)
4. What is the usage ratio: do users need the parent information whenever they need the child information? In other words, how coupled or correlated are the two entities?
Strongly associated with each other (30 points)
Weakly associated or unrelated (-30 points)
5. Is the parent entity a placeholder: do we plan to add more data elements or relationships to the parent entity in the near future? If the answer is no, the parent is just a placeholder and denormalization is more feasible.
Yes, it is a placeholder (20 points)
No, it is not (-20 points)
6. What is the rate-of-change ratio: this question determines whether the two entities have similar insert and update frequencies over the same period of time. If one entity rarely changes while the other changes frequently, we strongly prefer to keep them normalized in their own separate tables.
Same (20 points)
Different (-20 points)
Usage of "reverse normalization Survival Guide:
1) sort the relationship in the model by priority
2) Select a link
3) answer questions about this relationship
4) if the score is equal to or greater than 10, reverse normalization is performed.
5) return to step 2 until all links are completed.
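Here is a minimal sketch of the scoring logic. The point values follow the guide above; the relationship being scored and its answers are an assumed example.

```python
# Minimal sketch of the Denormalization Survival Guide scoring.
# Point values follow the guide above; the example answers are assumed.
SCORES = {
    "relationship_type": {"hierarchy": 20, "peer": -10, "definition": -20},
    "participation_ratio": {"up_to_1_to_5": 20, "up_to_1_to_100": -10, "over_1_to_100": -20},
    "parent_data_elements": {"under_10": 20, "10_to_20": -10, "over_20": -20},
    "usage_ratio": {"strong": 30, "weak_or_none": -30},
    "parent_is_placeholder": {"yes": 20, "no": -20},
    "rate_of_change": {"same": 20, "different": -20},
}

def survival_guide_score(answers):
    """Sum the points for the six answers; denormalize if the total is >= 10."""
    return sum(SCORES[question][answer] for question, answer in answers.items())

# Assumed example: an order header / order line style hierarchy.
answers = {
    "relationship_type": "hierarchy",       # +20
    "participation_ratio": "up_to_1_to_5",  # +20
    "parent_data_elements": "under_10",     # +20
    "usage_ratio": "strong",                # +30
    "parent_is_placeholder": "no",          # -20
    "rate_of_change": "different",          # -20
}

total = survival_guide_score(answers)  # 50
print(total, "-> denormalize" if total >= 10 else "-> keep normalized")
```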
Chapter 9: The Abstraction Safety Guide and Abstraction Components
Readers of my earlier article "Talking About Database Design Skills (Part I)" may remember the second example I gave there: the design of the product information tables for an online e-commerce platform. This chapter lifts the approach I used in that example to a theoretical level: using object-oriented design, extract the attributes common to all products and abstract them into a supertype, then add a table that records the details of the different entities to implement derivation from the supertype, thereby achieving design flexibility (a sketch follows the list below). Abstraction is extremely useful when any of the following conditions hold:
The design must last: we want to avoid having to modify the database design in the future as much as possible.
Requirements may change: application requirements change, and business processes may be reorganized or features upgraded.
Data warehouses: when a new category type arrives from a source application, we do not need to change the data warehouse design at all; we only need to add a new row to the category type entity.
Metadata repositories: similar to the data warehouse case.
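As a rough sketch of this supertype-plus-detail idea (my own illustration with assumed table and column names, not the book's model), common attributes live in one product table while type-specific attributes become rows in a generic detail table, so new product types require no schema change:

```python
# Illustrative sketch of the abstraction approach (assumed names and data):
# common attributes in a "product" supertype, type-specific attributes as
# rows in a generic detail table.
product = [
    {"product_id": 1, "name": "Paperback Novel", "price": 9.99, "category": "book"},
    {"product_id": 2, "name": "USB Keyboard", "price": 29.99, "category": "hardware"},
]

product_detail = [
    {"product_id": 1, "attribute": "author", "value": "A. Writer"},
    {"product_id": 1, "attribute": "page_count", "value": "320"},
    {"product_id": 2, "attribute": "interface", "value": "USB"},
    {"product_id": 2, "attribute": "layout", "value": "QWERTY"},
]

def details_for(product_id):
    """Collect the type-specific attributes of one product."""
    return {row["attribute"]: row["value"]
            for row in product_detail if row["product_id"] == product_id}

for p in product:
    print(p["name"], p["price"], details_for(p["product_id"]))
```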
Of course, abstraction greatly increases the workload and development complexity, and people usually focus on the immediate application and its immediate cost rather than on the much higher cost down the road. For this reason I strongly agree with the agile software development idea: start with almost no up-front design, but once requirements change, a programmer pursuing excellence should review the entire architecture from the ground up and design a system architecture that can absorb similar changes in the future.
Abstraction components are small abstract model fragments that can be reused in many modeling situations (regardless of industry, organization, or even the subject domain being modeled). After applying abstraction several times during the modeling stage, you begin to see recurring abstract structures. These abstraction components serve the following purposes:
They speed up design.
They speed up development.
They provide generic, useful structures.
Chapter 10: Data Model Beautification Tips
This chapter focuses on improving the visual appearance of logical and physical data models so that our designs go beyond the application's immediate requirements. Five categories of beautification tips are discussed:
Logical data element sequence tips: a recommended way to order the data elements within each entity of your logical data model.
Physical data element sequence tips: a recommended way to order the data elements within each table of your physical data model.
Entity layout tips: the optimal placement of each entity on the data model.
Relationship layout tips: how to adjust relationship lines that overlap, or that appear to pass through (rather than around) unrelated entities.
Attention-getting tips: how to highlight particular data elements, entities, or relationships in our models.
Chapter 11: Planning a Prosperous Data Modeling Career
Top ten list of tips for data modelers:
1) Remember: flexibility, accuracy, and context.
2) Modeling is only a small part of your job.
3) Try other roles.
4) Understand the 95/5 rule: 95% of your time will be spent on 5% of the data elements.
5) Data modeling is never boring: if you have been doing data modeling and find yourself getting bored, make a change. It may not be the field of data modeling that bores you, but rather that your particular assignment, company, or industry no longer excites you. Take a risk and try modeling data in a different project or industry!
6) Stay at the forefront of technology.
7) Try to keep emotion out of the model: modelers must understand that comments made during a review are aimed at the content of the model, not at its creator. As the old saying goes, address the issue, not the person.
8) Let your creativity flow: be creative when thinking of new ways to capture data requirements and improve designs. That may mean modifying some of the tools in this book, or proposing your own spreadsheets and other tools.
9) Pure theory is too expensive: keep this in mind during design activities. The departments and organizations paying for the application expect to see tangible, practical results.
10) Become a great storyteller: storytelling is a very important part of a data modeler's work. We need to tell stories and anecdotes to educate the team and to influence project managers and others who do not understand our field.
Finally, I personally think that the "abstraction components" proposed by Steve Hoberman are very similar to design patterns in object-oriented design: after many rounds of data modeling, database experts abstract the similar parts of each project into specific reusable model fragments, and in a new project they only need to refine and derive from these fragments to quickly build a database architecture suited to that project. However, these model fragments have not yet been unified into standards, and no book of that kind has been published so far. I have been gradually summing up my own experience in this area, but I know my level is limited and would not presume to show off in front of my seniors. I only hope that the articles I publish in the future can serve as a modest starting point that draws out better ideas, and that programmers in China will take the lead in unifying the "design patterns" of the data modeling field.