Database Design Principles _

Database Design Principles __ Database

Last Update:2018-07-26 Source: Internet

Author: User

Tags one table

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Introduction

If you are not clear about the "three paradigms", please click here (FQ) to learn step-by-step what is "three normal".

It is said that standard specification is an important guideline and is doing so, but remembering it as a mark on a stone can still cause trouble. The following 11 points are my top priority in database design.

Rule 1: Figure out what the nature of the application will be developed (OLTP or OPAP).

When you start designing a database, you should first analyze what type of application you are designing, whether it's "transactional" (transactional) or "analytic" (analytical). You will find that many developers use a standardized approach to designing a database, regardless of the type of target program, so that the program will quickly fall into performance, customer customization issues. As mentioned earlier, there are two application types, "transaction based" and "analysis based," and let's take a look at what these two types mean.

Transaction type: This type of application, your end user is more concerned about the data of the search and delete (crud,creating/reading/updating/deleting). This type is more officially called "OLTP".

Analytic type: This type of application, your end user more attention to data analysis, reports, trend prediction and so on functions. The "Insert" and "Update" operations of this type of database are relatively small. Their main purpose is to query and analyze data more quickly. The more official term for this type is "OLAP".

So in other words, if you think inserting, updating, and deleting data are more prominent in your program, design a normalized table or you can create a flat, unstructured database structure.

The following simple diagram shows a simple, normalized table such as the left Names and address, how to create a flat table structure by applying an unstructured structure.

Rule 2: Divide your data into different chunks in logical terms, making things simpler

This rule is actually the first paradigm in the "three paradigms". One sign of violating this rule is that your query uses a lot of string parsing functions

such as substring, charindex and so on. If that's the case, then you need to apply this rule.

For example, you can see a table with a student's name on the following picture, and if you want to check the student's name for a "Koirala", but do not include a "harisingh" record, imagine what you're going to get.

So it's better to split this field into a deeper logical chunk so that our table data is more cleanly written and the query is optimized.

Rule 3: Do not overuse "rule 2"

Developers are a bunch of cute creatures. If you tell them that this is a way to solve the problem, they will continue to do so, and it leads to some unnecessary consequences. This can also be applied to the rule 2 we just mentioned earlier. When you consider field decomposition, pause for a moment and ask yourself if you really need to do this. As is said, decomposition should be logical.

For example, you can see the phone number field, and rarely do you separate the ISD code of the phone number separately (unless your application requires it). So a sensible decision is to keep it as it is, otherwise it will lead to more problems.

Rule 4: Treat repetitive, non-unitary data as your worst enemy

Centralize those duplicate data and refactor them. My personal concern is more about the confusion of these data than how much disk space they are consuming.

For example, the following chart, you can see "5th Standard" and "fifth Standard" are the same meaning, they are duplicate data. Now you might say that it's because those people who entered the duplicate data or the bad validator didn't stop and let the repetitive data get into your system. Now, what do you do if you want to export a report that shows a very confusing piece of data that was originally in the user's eyes as a different entity data?

One workaround is to move the data completely to another primary table and then refer to it through a foreign key. In the chart below you can see how we create a primary table called "standards" (course level), and then use the same simple foreign key to connect to the past.

Rule 5: Beware of delimited split data, which violates the "field cannot be divided"

The preceding rule 2, "first paradigm", is to avoid "repeating groups". The following chart, as an example, explains what a "repeating group" looks like. If you look closely at the syllabus (course) field, you will find that there are too many data in this field. Like these fields are called "repeating groups." If we have to use this data again, then these queries will be very complex and I suspect that these queries will have performance problems.

These data columns that are stuffed with delimiters require special attention, and a better approach is to move the fields to another table, connect the past with a foreign key, and, likewise, facilitate better management.

So let's now apply Rule 2 (first paradigm) to "avoid repeating groups". You can see the chart above, I created a separate syllabus table, and then use the Many-to-many relationship to associate it with the subject (account) table.

In this way, the syllabus (course) field of the primary table (student table) no longer has duplicate data and delimiters.

Rule 6: Beware of columns that only partially rely on primary keys

Pay attention to the columns that only partially depend on the primary key. For example, the above chart, we can see that the primary key of this table is Roll No.+standard. Now, if you look closely at the syllabus field, you can see that the syllabus (course) field simply associates (relies on) Standard (Course level) fields instead of directly associating (relying on) a student (Roll No). Fields).

The Syllabus (course) field is associated with which course level (Standard field) The student is studying instead of directly associating with the student itself. It is obviously illogical (unusual) that if we are to update the syllabus tomorrow, we will also have to make a painful change for each student. It is more meaningful to move these fields from this table to another table, and then associate them with the Standard table (course level) tables.

You can see how we move the syllabus (course) field and attach the Standard table as well.

This rule is only "second normal" in "Three Paradigms": "All fields must be fully dependent on the primary key rather than partially dependent."

Rule 7: Carefully select derived columns

If you are developing an OLTP application, it would be a good idea to force the use of derived fields, unless there are urgent performance requirements, such as OLAP programs that often need to be summed and computed, and these derived fields are necessary for performance.

With this chart above, you can see how the Average field relies on the Marks and Subjects fields. This is also a form of redundancy. So for the fields that are available from other fields, consider whether they really need to exist.

This rule is also referred to as the third of the three paradigms: "There should be no columns dependent on a Non-key key." My personal view is not to blindly apply this rule, should look at the actual situation, redundant data is not always bad. If the redundant data is calculated, look at the actual situation before deciding whether to apply this third paradigm.

Rule 8: If performance is critical, do not stubbornly avoid redundancy

Don't take "avoid redundancy" as an absolute rule to follow. If there is an urgent need for performance, consider breaking the routine. Under normal circumstances you need to do more than one table connection operation, and in unconventional circumstances such a multiple-table connection can greatly reduce performance.

Rule 9: Multidimensional data is an aggregation of different data

OLAP projects are primarily about solving multidimensional data problems. For example, you can look at the chart below and you will want to get sales for each country, every customer, and every period of time. To put it simply, the sales data you're looking at includes three dimensions of crossover.

It is a better idea to do a practical design for this situation. To put it simply, you can create a simple main sales table that contains the Sales field, which joins all other tables of different dimensions through a foreign key.

Rule 10: Unify the tables that have the characteristics of a "Name value table" to design

Many times I have encountered this "Name Value table". A "Name-value table" means that it has some keys, which are associated with other data. For example, the chart below, you can see that we have Currency (currency type) and Country (country) These two tables. If you look closely you will see that these tables are actually only keys and values.

For such a table, it makes more sense to create a primary table that distinguishes different data by a type field.

Rule 11: Data with an infinite hierarchy, referencing its primary key as a foreign key

We often encounter some infinite parent-child hierarchies of data (tree structure). ）。 For example, consider a multi-level sales scenario, a salesperson can have more than one sales person. Notice that they are all "sales people." That means the data itself is a kind of. But the hierarchy is different. At this point we can refer to our primary key as a foreign key to express this hierarchical relationship, thereby achieving the goal.

The purpose of this article is not to tell people not to follow the paradigm, but to not blindly follow the paradigm. Make the right choices based on the nature of your project and the type of data you need to deal with.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More