◆ Introduction
Before reading this article, I must clearly tell you that I am not a master in the database design field. The 11 points listed below are my personal opinions on my experience gained from project practice and reading. I personally think they are of great help to my database design. You are welcome to make a picture :)
The reason why I wrote this complete article is that when many developers participate in database design, they naturally use the "Three Paradigm" as a silver bullet. They often think that following this specification is the only criterion for database design. Because of this mentality, they often stick to the project even if they hit the wall all the way.
If you are not clear about the "three paradigms", click here FQ) to learn more about the "three paradigms" step by step ".
Everyone said that standards and norms are important guidelines and are doing the same thing, but using them as a marker on a stone to remember to memorize them. The following 11 points are my top priorities for database design.
◆ Rule 1: Find out the OLTP or OPAP of the application to be developed )?
When you want to design a database, you should first analyze the type of the application you designed for it, which is "transaction processing" Transactional) or "Analytical" Analytical? You will find that many developers adopt a standardized approach to design databases without considering the type of the target program. Such programs will soon fall into performance and customer customization issues. As mentioned above, there are two application types: transaction-based processing and analysis-based ", let's take a look at what these two types mean.
Transaction Processing type:For this type of applications, your end users are more concerned with data addition, query, deletion, CRUD, Creating/Reading/Updating/Deleting ). This type is more officially called "OLTP ".
Analysis type:For this type of applications, your end users are more interested in data analysis, reports, trend prediction, and other functions. This type of database has fewer "insert" and "Update" operations. They primarily aim to query and analyze data more quickly. This type is more officially called "OLAP ".
In other words, if you think that the insert, update, and delete operations are more prominent in your program, design a standardized table. Otherwise, create a flat and nonstandard database structure.
The following simple chart shows a simple and standardized table like Names and Address on the left. How can we create a flat table structure through the nonstandard structure of the application.
◆ Rule 2: divide your data into different blocks based on logic meaning to make things easier
This rule is actually the first paradigm in the "Three Paradigm. One sign that violates this rule is that your query uses many string parsing functions.
For example, substring and charindex. If so, you need to apply this rule.
For example, the following figure shows a table with a student name. If you want to query records whose Student name contains "Koirala" but does not contain "Harisingh, you can imagine what results you will get.
Therefore, it is better to split this field into deeper logical blocks so that our table data can be written more cleanly and the query can be optimized.
◆ Rule 3: Do not over-use Rule 2"
Developers are a group of cute creatures. If you tell them that this is the right way to solve the problem, they will continue to do so. If they do so, they will lead to unnecessary consequences. This can also be applied to Rule 2 we just mentioned. When you consider splitting fields, pause and ask if you really need to do so. As mentioned, decomposition should be logical.
For example, you can see the phone number field. You rarely separate the ISD code of the phone number, unless required by your application ). So a wise decision is to keep it as it is, otherwise it will bring more problems.
◆ Rule 4: Treat repetitive and inconsistent data as your greatest enemy
Concentrate those duplicate data and refactor them. I am more worried about the confusion caused by repeated data, rather than how much disk space they occupy.
For example, in the following chart, you can see that "5th Standard" and "th standard" mean the same, they mean repeated data. Now you may say that the repeat data is transferred to your system because the repeat data has been input by those inspector or the poor verification program has not been stopped. Now, if you want to export a report that shows confusing data as different Entity Data, what should you do?
One solution is to completely move the data to another primary table and reference it with a foreign key. In the following chart, you can see how we create a master table named "Standards" course level), and then connect it with a simple foreign key.
◆ Rule 5: Beware of Data separated by delimiters. They violate the "field cannot be split" rule"
The previous rule 2, the "first paradigm", is to avoid "repeated groups ". The following chart is an example to explain what a "repeated group" looks like. If you carefully observe the syllabus course) field, you will find that too much data is filled in this field. Such fields are called "repeated groups. If we have to use the data again, these queries will be very complex and I suspect that these queries may have performance problems.
Note these data columns that are filled with delimiters. A better way is to move these fields to another table and connect them with foreign keys for better management.
So, let's apply Rule 2 first paradigm now) "avoid repeated groups. You can see the chart above. I created a separate syllabus course) table, and then joined it with the subject table using the "many-to-many" relationship.
In this way, the syllabus course in the student table in the main table) does not have duplicate data or separators.
◆ Rule 6: Beware of columns that only partially depend on the primary key
Pay attention to the columns that only partially depend on the primary key. For example, in the chart above, we can see that the primary key of this table is Roll No. + Standard. Now, take a closer look at the syllabus field. You can see that the syllabus course) field is only associated with the dependency) Standard course level) field rather than directly associated with the dependency ).
The Syllabus course) field is associated with the course level Standard field that the student is learning, rather than directly associated with the student itself. If we want to update the syllabus tomorrow, we need to modify it for each student. This is obviously not logical ). It is more meaningful to move these fields from this table to another table, and then associate them with the Standard course level) table.
You can see how we move the syllabus course) field and attach the Standard table.
This rule is only the "second Paradigm" in the "Three Paradigm": "All fields must be completely dependent on the primary key rather than partially dependent ".
◆ Rule 7: carefully select a derived Column
If you are developingOLTPIs a good idea, unless there are urgent performance requirements, such as the need for summation and calculation.OLAPProgram, for the sake of performance, these derived fields need to exist.
Through the above chart, you can see how the Average field depends on the Marks and Subjects fields. This is also a form of redundancy. Therefore, for such fields obtained from other fields, you need to think about whether they really need to exist.
This rule is also called the third in the "Three Paradigm": "columns that depend on non-primary keys should not exist ". My personal opinion is that we should not blindly use this rule. We should look at the actual situation. redundant data is not always bad. If redundant data is computed, check the actual situation and decide whether to apply this third paradigm.
◆ Rule 8: If performance is critical, do not stubbornly avoid redundancy.
Do not regard "Avoid redundancy" as an absolute rule to follow. If there is an urgent demand for performance, consider breaking the regular. In normal cases, you need to connect multiple tables. in unconventional cases, such multi-Table connections will greatly reduce the performance.
◆ Rule 9: multi-dimensional data is the aggregation of different types of data
OLAPThe project mainly solves the problem of multi-dimensional data. For example, you can look at the chart below to get the sales volume of each country, each customer, and each period of time. Simply put, the sales data you are looking at includes three dimensions.
It is a better way to make a practical design for this situation. To put it simply, you can create a simple main sales table that contains the sales field and connect all other tables with different dimensions using a foreign key.
◆ Rule 10: Design tables with names and values
I have encountered this "name-value table" many times ". "Name-value table" means it has some keys, which are associated with other data. For example, in the chart below, you can see two tables, Currency and Country. If you observe carefully, you will find that these tables only have keys and values.
For such a table, it makes more sense to create a main table and use a Type field) to distinguish different data.
◆ Rule 11: data in an infinitely hierarchical structure, referencing its primary key as a foreign key
We often encounter some data tree structures with an infinite parent-child hierarchical structure ?). For example, if you consider a multi-level sales solution, you can have multiple sales personnel under one salesperson. All of them are sales personnel ". That is to say, the data itself is a type. But the hierarchy is different. At this time, we can use our primary key as a foreign key to express this hierarchical relationship, so as to achieve our goal.
This article does not mean that you do not follow the paradigm, but that you do not follow the paradigm blindly. Make the right choice based on the nature of your project and the data type to be processed.
Original article: 11 Important Database designing rules