Database design--11 important database design rules

Source: Internet
Author: User

Brief introduction

Before you start reading this article, I have to tell you clearly that I am not a master of database design. The 11 points listed below are the personal insights I have learned from my own experience in project practice and reading. I personally think they have provided a great help to my database design. It is opinion, welcome to shoot Bricks:)

The reason why I wrote this complete article is because, as many developers participate in the database design, it is natural to use the "three paradigm" as a silver bullet. They tend to believe that following this specification is the only standard for database design. Because of this mentality, they tend to stick to the project despite all the snags.

If you are not clear about the "three paradigms", click here (FQ) to understand step-by-step what is "three paradigms".

It is said that the standard is an important guideline and is doing the same, but it can be problematic to remember (rote) as a mark on a stone. The following 11 points are my top priority in database design.

Rule 1: Figure out what is the nature of the application that will be developed (OLTP or OPAP)?

When you want to start designing a database, you should first analyze what type of application you are designing for, is It "transactional" (transactional) or "analytic" (analytical)? You'll find that many developers adopt a standardized approach to designing databases, regardless of the type of target program, so that the program will quickly fall into performance, customer-specific issues. As mentioned earlier, there are two types of applications, "transaction-based" and "based on analytics," Let's look at what these two types mean.

Transactional Type: This type of application, your end users are more concerned about the data of the search and deletion (crud,creating/reading/updating/deleting). This type is more officially called "OLTP".

Analytic: This type of application, your end users more attention to data analysis, reporting, trend prediction and so on. The "Insert" and "Update" operations for this type of database are relatively small. Their primary goal is to query and analyze data more quickly. This type is more officially called "OLAP".

So in other words, if you think inserting, updating, deleting data are more prominent in your program, then design a normalized table or create a flat, denormalized database structure.

The following simple chart shows a simple normalized table such as the left Names and Address, how to create a flat table structure by applying an denormalized structure.

Rule 2: Make things easier by dividing your data into chunks of logical meaning

This rule is actually the first paradigm in the "three paradigms". A sign of violating this rule is that your query uses a lot of string parsing functions

such as substring, charindex and so on. If that's the case, then you need to apply this rule.

For example, you can see the picture below has a student name table, if you want to query the student name contains "Koirala", but does not include "Harisingh" record, you may imagine you will get what kind of result.

So it's better to split this field into deeper logical chunks so that our table data is more cleanly written and optimized for queries.

Rule 3: Do not overuse "rule 2"

Developers are a bunch of cute creatures. If you tell them that this is the right way to solve the problem, they will continue to do so, and the end of it has led to some unnecessary consequences. This can also be applied to the rules we just mentioned earlier in rule 2. When you consider the decomposition of a field, pause it and ask yourself if you really need to do so. As is said, decomposition should be logical.

For example, you can see the phone number field, and you rarely separate the ISD code for the phone number (unless your application requires it). So a sensible decision is to keep it as it is, or it will bring more problems.

Rule 4: Treat repetitive, non-uniform data as your greatest enemy

Centralize those duplicated data and refactor them. What I am more concerned about is the confusion of these repetitive data rather than how much disk space they occupy.

For example, the following chart, you can see that "5th Standard" and "fifth standard" are the same meaning, they are duplicate data. Now you might say that because those who entered the duplicate data or the poor verification program did not stop, so that the repeated data into your system. Now, what if you want to export a report that shows the data that is confusing to the user as a different entity?

One solution is to move the data completely to another main table and then reference it through a foreign key. In the chart below you can see how we created a primary table called "standards" (course level), and then used the simple foreign key to connect to the past.

Rule 5: Beware of data separated by delimiters, which violate the "field is no longer divisible"

The preceding rule 2, "first paradigm", is about avoiding "repeating groups". The following chart, as an example, explains what a repeating group looks like. If you look closely at the field of syllabus (course), you will find that too much data is being filled in this field. Like these fields are called "repeating groups". If we have to use this data again, these queries will be very complex and I suspect that these queries will have performance problems.

These delimited data columns require special attention, and a good way to do this is to move these fields to another table, using foreign keys to connect to the past, as well as to facilitate better management.

So, let's apply Rule 2 (first paradigm) "Avoid repeating groups" now. You can see the chart above, I create a separate syllabus (course) table, and then use the "many-to-many" relationship to associate it with the subject (account) table.

With this method, the syllabus (course) field of the primary table (the student table) will no longer have duplicate data and separators.

Rule 6: Beware of columns that are only partially dependent on the primary key

Pay attention to those columns that are only partially dependent on the primary key. For example, in the above chart, we can see that the primary key of this table is roll no.+standard. Now look closely at the syllabus field, and you can see that the syllabus (course) field only correlates (depends on) the standard (course level) field instead of directly associating (relying on) a student (roll No). field).

The Syllabus (course) field is associated with which course level (standard field) the student is learning and not directly related to the student itself. If we are to update the syllabus (course) tomorrow, it will be painful to revise it for each of our classmates, which is obviously illogical (abnormal practice). It is more meaningful to move these fields from this table to another table and then associate them with the standard (course level) table.

You can see how we move the syllabus (course) field and attach the standard table as well.

This rule is just the "second paradigm" in the "Three paradigms": "All fields must be completely dependent on the primary key and not the partial dependency."

Rule 7: Carefully select derived columns

If you are developing an OLTP -type application, it would be a good idea to force a non-use of derived fields, unless there are compelling performance requirements, such as an OLAP program that often requires summation and computation, and for performance, these derived fields need to exist.

With this diagram above, you can see how the Average field is dependent on the Marks and subjects fields. This is also a form of redundancy. So for those fields that are obtained from other fields, consider whether they really need to exist.

This rule is also called the third in the "Three paradigms": "There should be no columns that depend on non-primary keys". My personal view is not to blindly apply this rule, should look at the actual situation, redundant data is not always bad. If the redundant data is calculated, look at the actual situation and decide whether to apply the third paradigm.

Rule 8: If performance is critical, do not stubbornly avoid redundancy

Don't use "avoid redundancy" as an absolute rule to follow. If there is an urgent need for performance, consider breaking the routine. In general, you need to do multiple table connection operations, and in unconventional cases such a multi-table connection can significantly degrade performance.

Rule 9: Multidimensional data is an aggregation of various different data

OLAP projects are primarily about solving multidimensional data problems. For example, you can look at the chart below and you will want to get sales for each country, every customer, and every period of time. In a nutshell, the sales data you're looking at includes a crossover of three dimensions.

It is a better way to make a practical design for this situation. Simply put, you can create a simple main sales table that contains the sales field and joins all other tables of different dimensions through a foreign key.

Rule 10: Unify the tables that have the "Name Value table" feature

Many times I have encountered this "Name Value table". A "Name value table" means that it has some keys that are associated with other data. For example, the chart below, you can see that we have Currency (currency type) and country (country) these two tables. If you look closely you will find that in fact these tables have only keys and values.

For such a table, it makes more sense to create a primary table that distinguishes different data by a type (type) field.

Rule 11: Data of an infinitely hierarchical structure, referencing its own primary key as a foreign key

We will often encounter some infinite parent-child hierarchy data (tree structure?). )。 For example, consider a multi-tiered sales scenario where you can have multiple salespeople under a sales person. Notice that they are all "salespeople". This means that the data itself is a kind of. But the hierarchy is different. At this point we can refer to our primary key as a foreign key to express this hierarchical relationship, so as to achieve the purpose.

The purpose of this article is not to ask you not to follow the paradigm, but to not blindly follow the paradigm. Make the right choice based on the nature of your project and the type of data you need to process.

Database design--11 important database design rules

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.