11 Database Design Guidelines and 11 database guidelines I have followed

Last Update:2014-08-24 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Author Shivprasad koirala, a former MVC engineer from Microsoft ASP/ASP. NET, is now a CEO in India. If you have any questions about the first translation, please kindly advise.

Introduction

Before reading this article, I want to tell you that I am not a master of database design. The 11 principles below are derived from projects, my own experiences, and my own understanding and learning. I personally think that using these guidelines in database design can benefit me a lot. I also welcome any criticism and guidance.

The reason why I want to write such a detailed article is that many developers believe in the "three paradigms" when designing databases ). They believe that the three paradigm is the only true meaning of database design. With the continuous development of projects, those who hold these ideas will often encounter endless troubles.

If you do not know much about the paradigm, click here:3 normal forms: part1-part2-part3(Translator's note: the original article is the yotube video. Due to the well-known reasons, the video cannot be watched in China. I have already shared the original video to the csdn Download Area and downloaded it without points ), this tutorial will teach you these three paradigms.

"Paradigm" is a very important criterion, but if you follow it blindly and do not change it at all, you will often get into trouble. When I design databases, I keep these 11 important guidelines in mind. Criterion 1: What is the essence of an application (OLTP or OLAP )?

(Translator's note: online transaction processing or online analysis processing is biased towards addition, deletion, modification, query, or data analysis)

When you begin designing databases, the first thing you need to do is to analyze the nature of the applications you designed, whether it is transactional or analytical. You will find that many developers do not consider the type of their applications, but are all designed according to the "paradigm" principle, then we encountered various performance and customization problems. As I said, there are two types of applications: transaction-based and analysis-based. Let's take a look at what these two types are.

Transaction type: For such applications, the end user pays more attention to addition, deletion, modification, and query, such as adding, reading, updating, and deleting records. For this type of database, we officially call it OLTP (Translator's note: Online Transaction Processing, Online Transaction Processing ).

Analyticdb: For such applications, end users are more concerned with data analysis, report generation, and data prediction. This type of database seldom inserts or updates data. The main purpose of this design is to obtain and analyze data from the database as quickly as possible. For this type of database, we officially call it OLAP (Translator's note: Online Analytical Processing, Online Analytical Processing ).

In other words, if you find that it is more important to insert, update, or delete a database, you should follow the standard paradigm. Otherwise, you should establish a flat non-standard database structure.

Demonstrate how to use the nonstandard design idea in the left-side Name and Address Table to create a flat non-standard data structure that does not comply with the standard paradigm on the right.

Criterion 2: divide your data into several logical fragments to make your life better

This principle is actually the first paradigm. An obvious feature that violates this criterion is that your query statement uses a lot of string parsing functions, such as substring and charindex. If so, you need to apply this rule.

For example, a field in the table is "Student Name". If you want to query students whose names contain "Koirala" but not "Harisingh, you can imagine what the query statement you wrote at that time is.

Therefore, a better solution is to break this field into more detailed logical segments, so that we can write more clean and perfect query statements.

Criterion 3: Do not over-use Criterion 2

Developers are all smart people. If you tell them a way, they always use this method. Rule 2 overhead will lead to some results you don't want at all. However, Criterion 2 is very useful. When you want to "remove this stuff", pause and ask yourself, is it really necessary to split it? As I said earlier, splitting must be logical.

For example, you can see that there is a field in the phone number. You have almost no ISD (International Subscriber Dialing, International subscribers Dialing) for the number) manage the numbers separately (unless there is a real demand in the project), so it is more wise to put the numbers here. If you really split them, then you will have to face a lot of "splitting sequelae.

Criterion 4: redundant and inconsistent data is your greatest enemy

Be careful about redundant data and refactor them. I am not worried that redundant data will occupy a lot of disk space. What I am worried about is the confusion and confusion caused by redundant data.

For example, in the middle, you will find that "5th Standard" and "th standard" are a meaning, but the description form is slightly different. You may explain that the data was randomly input and has not been verified. When you plan to export a report, there will be two different reports for this field. When users see such reports, they may be in a mess.

One solution is to move the data to a new table, and then the original table uses the data as a reference. You can see that I have created a new table named "Standards" and connected the two tables with a foreign key.

Criterion 5: Be careful with the data separated by delimiters

The second article of the first paradigm stipulates that there cannot be multiple values in one position. Is an example of storing multiple values. After carefully observing the "Syllabus" field, we will find that we have inserted too much data into this field. This type of field becomes a "Repeat group". If we want to operate on the data, not only is the query statement very complex to write (and whether it is right or wrong), but I doubt the efficiency of this query.

This type of columns contains a lot of data fields, so be very careful. A better solution is to move these fields to different tables and then connect them with foreign keys, in order to better manage the logic and operation.

We need to follow the second rule of the first paradigm: one location cannot store multiple values. As you can see, I have created an independent "syllabus" table, and then established many-to-many associations with the master table.

In this way, the "syllabus" field in the main table is no longer inserted into a large amount of data separated by separators. It is logically easy to understand and more operational.

Criterion 6: Be careful with part of the dependency

Be careful with the fields that are partly dependent on the primary key. For example, the primary keys in the fields are Roll Number and Standard. Now let's take a look at "syllabus". This field is associated with the standard field, rather than directly associated with the Student ("Roll Number" field.

The "syllabus" field is associated with the "standard" field that the student is learning, but not directly associated with the student. When I want to update "syllabus" tomorrow, I must also update all the "Student" fields related to it. Dude, this is a tough physical activity and there is no logic! So it is more meaningful to pull out the "syllabus" field separately and associate it with "Standard" through a foreign key.

You can see how I pull out the "syllabus" field and associate it with "Standards.

This principle is actually the second paradigm itself: All attributes must be fully dependent, and fields should not be partially dependent on the primary key.

Criterion 7: carefully select a derived Column

If you are designing an OLTP-type database, that is, a transaction-type database, it would be wise to avoid deriving columns unless there are high performance requirements for this part of content. However, for OLAP databases that require a large number of summation and computing, that is, analyticdb, these derived columns are necessary and can improve database performance.

You can see that Average relies on Total Marks and Total Subject for calculation, which is one of the manifestations of data redundancy. If you encounter a field derived from other columns, do you have to think about it? Is this field really required?

As stated in the third paradigm, a field cannot depend on any non-primary key field. I personally think that we should not blindly follow the third paradigm, depending on the situation. Redundant data is not always a bad thing. If the redundant data is computed, the situation determines whether or not to observe the third paradigm.

Criterion 8: If performance is the key, do not strictly avoid redundant data.

Do not treat "Avoid all redundant data" as an dead command. If you have high performance requirements, consider non-standardization. To comply with the standards, you often need to use the join command to connect to many tables. But in non-standard mode, by setting some redundant information, there will be less join commands, so the performance will be much better.

Criterion 9: multi-dimensional data is a completely different beast

Projects that use OLAP (analyticdb) primarily process multidimensional data. For example, you can view the sales volume, sales volume, and other data of each customer in each country. Simply put, the sales volume you pay attention to is the intersection of three dimensions.

In this case, it will be better to design a dimension. To put it simply, you can create a sales table with the "sales amount" field (which is a central table) and set foreign keys to be associated with other one-dimensional tables.

Guideline 10: design of a centralized key-value table

I often encounter a key-value table. A key-value table stores the key and the data corresponding to the key, is a simple table ). For example, you can see a Currency table and a Country table in the table below. After careful observation, you will find that the two tables have only one key and one corresponding value.

For such data tables, create a centralized table and set up a "Type" field to differentiate. This will make your database work better.

Guideline 11: for multi-level data, you must reference its own primary key or set up a foreign key.

I often encounter multiple data sets and consider a multi-level sales solution. A salesperson can have many salespeople. In this case, referencing your own primary key or setting up a foreign key will help you achieve the desired effect.

This article does not mean that you do not follow the standard paradigm, but that you do not blindly abide by it. The first thing you need to consider is the project attributes and the types of data you want to process.

The following is a video that uses a simple school table to explain the three standard paradigms, the downloaded readers do not need to download it again)

SQL Server -- Can you explain First, Second and Third normal form in SQL server [part1]

SQL Server -- Can you explain First, Second and Third normal form in SQL server [part2]

SQL Server -- Can you explain First, Second and Third normal form in SQL server [part3]

(Full text)

From: http://www.codeproject.com/Articles/359654/11-important-database-designing-rules-which-I-foll

Let me explain the basic principles of database design.

The three paradigms of database design are the standards for the standardization of relational database relationship models, from the easing of standardization to the strict, which are different paradigms respectively, the first paradigm, the second paradigm, the third paradigm, and the BC paradigm are usually used. The paradigm is based on function dependencies.

Function dependency

Definition: relational mode R (U) is set. X and Y are the subsets of the property set U. function dependency is a proposition in the form of X → Y, for the two tuples t and s in any R, t [X] = s [X] contains t [Y] = s [Y], fd x → Y is established in relational mode R (U. X → Y: The 'x function determines y', or the 'Y function depends on x '. In general, if the value of a field Y in a table is determined by the value of another field or a group of fields X, the function Y depends on X. Function dependencies are determined by understanding the data items and enterprise rules. The function dependencies obtained based on the table content may be incorrect.

1NF)

Definition: If the attribute of each relational R in the relational mode r is an inseparable data item, R is called the first normal mode.
In short, each attribute is an atomic item and cannot be separated. 1NF is the minimum condition for the relational model. If the database design cannot meet the first paradigm, it is not called a relational database. The relationship normalization of relational database design research is based on 1NF.

2NF)

Definition: If the relational mode R is 1NF and each non-primary attribute fully function depends on the candidate key, R is called the second paradigm.
Simply put, the second paradigm must meet the following conditions: first, it must meet the first paradigm. Second, each non-primary attribute must have full function dependencies and candidate keys, or primary keys. That is to say, each non-primary attribute is determined by the entire primary key function, and cannot be determined by a portion of the primary key. For example:
The primary key of a stock statement is the stock code and the transaction date. Non-primary attributes include the closing price and transaction volume, which are determined by the primary key, namely the stock code and the transaction date function. Separate Stock Codes or transaction dates cannot be used to determine these non-primary attributes. If the table contains the abbreviation of a non-primary-attribute stock, the abbreviation of the stock can be determined by the stock code, so that the abbreviation of the stock is not completely dependent on the candidate key, such a design does not meet the second paradigm.

3NF)
Definition: If the relational mode R is 2NF and all non-primary attributes in the relational mode R (U, F) do not have any dependent on any candidate keywords, the relationship R belongs to the third paradigm.
Simply put, the third paradigm must meet the following conditions: first, it must meet the second paradigm, and second, there is no function dependency between non-primary attributes. Because the second paradigm is satisfied, it means that each non-primary attribute function depends on the primary key. If a function dependency exists between non-primary attributes, the passing dependency exists, which does not meet the third paradigm.
For example, in the basic stock information table, the primary key is the stock code, which has the primary and secondary industries of non-primary attributes. According to the business rules, the second-level industry can function to determine its level-1 industry. This indicates that the function is stored in this relationship: the stock code function determines its level-2 industry, the second-level industry function determines the first-level industry, which forms a transfer dependency. Such a design does not conform to the third paradigm. However, in practice, for the convenience of query and use, sometimes it violates the third paradigm. For the above example, if there is no attribute of a level-1 industry, you need to query the relevant stocks of the level-1 industry. You need to use a function to generate a level-1 industry from the level-2 industry, performance will be affected. Therefore, attributes of the primary industry are usually added.

BC Paradigm (BCNF)

The BC paradigm is the enhanced version of the third paradigm, but some people say that it has developed directly from 1NF, that is, each attribute, including the primary or non-primary attributes, is completely dependent on the candidate key, there is no transfer dependency.

Principles for designing a relational database

Guideline 1: minimize access records in transactions.
Guideline 2: keep transactions as concise as possible.
First, do not add too many modifications or delete statements to the same transaction.
Second, when updating, if there are more one-time update statements, it is best to choose the appropriate time to update
Guideline 3: do not require user input during Transaction Processing
Guideline 4: Do not open transactions when Browsing data.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

11 Database Design Guidelines and 11 database guidelines I have followed

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

11 Database Design Guidelines and 11 database guidelines I have followed

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support