The 11 database design guidelines I have followed

Source: Internet
Author: User

Foreword: Author Shivprasad Koirala, former Microsoft asp/asp. NET's MVC engineer, now a CEO in India. Students First translation, if there is inappropriate, generous enlighten.


Brief introduction

Before you start reading this article, let me tell you that I am not a master of database design. The following 11 guidelines are derived from the project, from my own experience and from my own understanding and learning. I personally think that the use of these guidelines in database design can be of benefit to me. I also welcome any criticism and advice.

The reason why I write such an exhaustive article is that many developers are very much in the "three paradigm" when designing a database (the translator notes: Not a combination of facts). They believe that the three paradigms are the only true meaning of database design. With the development of the project, the people who hold these ideas often go to the wall and encounter endless troubles.

If you still don't know what a paradigm is, click here:3 normal forms: part1-part2-part3(Translator note: Yotube video in the original, Due to the well-known reasons for domestic inability to watch, I have to share the original video to the csdn download area, download free points, this tutorial will teach you these three paradigms.

The "paradigm" is a very important criterion, but if you keep it completely, you will often get into trouble if you don't make a change. When I design a database, I keep these 11 important guidelines in mind. Guideline 1: What is the nature of the application (OLTP or OLAP)?

(Translator Note: Online transaction processing or online analysis processing, that is, preference for additions or deletions or bias to data analysis)

The first thing you need to do when you start designing a database is to analyze the nature of the application you're designing, whether it's transactional or analytic. You will find that many developers simply do not consider what type of application they belong to, but are all designed according to the "Paradigm" guidelines, and then encounter a variety of performance and customization issues. As I said, there are two applications: transactional-based and analytic-based, so let's take a look at what these two types are exactly.

transactional : This type of application, the end-user to add and remove changes to the attention, such as adding, reading, updating and deleting records. For this type of database, formally we call it OLTP (translator: Online Transaction processing, online transaction processing).

Analytical : This type of application, the end user is more concerned with the analysis of data, generate reports and the prediction of data. This type of database rarely performs an operation that inserts or updates data. The main intent of this design is to get the data from the database at the fastest speed and analyze it. For this type of database, formally we call it OLAP (translator Note: Online analytical processing, on-line Analytical processing).

In other words, if you find it more important to insert, update, or delete on a database, you should follow the standard paradigm to design, or you should create a flat, non-canonical database structure.

Shows how to use denormalized design ideas on the left name and Address table to create flat, non-canonical data structures on the right that do not conform to the standard paradigm.

Guideline 2: Divide your data into logical fragments to make life better

This rule is actually the first paradigm. One obvious feature of the violation of this rule is that your query statement uses a whole bunch of string parsing functions, such as substring, charindex, and so on. If this is the case, then you need to apply this rule.

For example, a field in the table is "Student name", and if you want to query a student whose name has "Koirala" without "Harisingh", you can imagine what the query statement would look like when you wrote it.

So a better solution would be to break up this field into more detailed logical fragments, so that we could write a cleaner and more perfect query statement.

Guideline 3: Do not over-use guideline 2

The developers are all smart men. If you tell them a way, they always use this method. Guideline 2 Using your head will result in something you don't want at all. But guideline 2 is very useful in itself. When you think of "Take this thing down", pause and ask yourself, is it really necessary to dismantle it? As I have just said, the split must be logical.

For example, you can see a field with a phone number in it, and you'll almost never manage the number of ISD (International subscriber dialing, International subscriber dialing) separately (unless there's a real need in any project), so it's obviously wiser to put the number on it. If you really tear them apart, then you will have to face a lot of "split Sequela".

Guideline 4: Redundant, non-uniform data is your biggest enemy.

Be careful with redundant data and refactor them. I'm not worried that redundant data will take up a lot of disk space, and what I'm worried about is the confusion and confusion caused by these redundant data.

For example, in the, you will find that "5th standard" and "fifth standard" is a meaning, but the description of the form slightly different. You might explain that this data was previously scrambled and has not been verified. When you plan to export a report, there will be two different reports for this field, and when the user sees such a report, they are afraid they will be in a mess in the wind.


One solution is to move the data into a new table, and the original table will use that data in a reference way. As you can see, I created a new table called "Standards" and joined the two tables with a foreign key.

Guideline 5: Beware of data separated by separators

The second section of the first paradigm specifies that multiple values cannot be stored in a single position. is an example where multiple values are stored. A closer look at the "Syllabus" field will reveal that we've stuffed too much data into this field. This kind of field we become "repeating group", if we want to manipulate this data, not only the query statement will write very complex (and right and wrong), and I very much doubt the efficiency of this query.


Some of these columns have a lot of data in the field, to be particularly careful to treat, a better solution is to move these fields to different tables, and then connect them with foreign keys, so that the logical and operational better management.


We need to comply with the second rule of the first paradigm: a location cannot store multiple values. As you can see, I have created a separate "syllabus" table and then established a many-to-many association with the main table.

In this way, the "Syllabus" field in the main table no longer plugs into a bunch of data separated by separators. It is logically easy to understand and more operational.

Guideline 6: Beware of partial dependence


Be careful with the fields that are partially dependent on the primary key, such as the primary key in roll number and standard. Now look at "syllabus", which is associated with the standard field, not directly with the student ("Roll number" field).

The "Syllabus" field is associated with the "standard" field that the student is learning and is not directly associated with the student. So tomorrow I want to update the "syllabus", I have to have all the relevant "Student" field also updated. Man, this is a hard work, and there's no logic! Therefore, it is more meaningful to pull the "syllabus" field separately and then associate it with "standard" through a foreign key.

You can see how I pulled out the "syllabus" field and then contacted "standards".

This rule is actually the second paradigm itself: all attributes must be fully dependent, and no fields should be partially dependent on the primary key.

Guideline 7: Carefully select derived columns

If you are designing an OLTP database, a transactional database, it would be wise to avoid deriving columns unless there is a high performance requirement for this piece of content. However, for OLAP databases that require a lot of summation and computation, that is, analytic databases, these derived columns are necessary and can make database performance more advantageous.

As you can see, average relies on total marks and total subject to calculate, which is one of the manifestations of data redundancy. So if you come across a field that is derived from another column, do you really need to think about this field?

As the third Paradigm says, the field cannot depend on any non-primary key fields. I personally think that the third paradigm should not be blindly followed, depending on the circumstances. Having redundant data is not always a bad thing. If these redundant data are computed, it depends on the situation to decide whether to follow the third paradigm.

Guideline 8: If performance is the key, don't avoid redundant data very strictly


Don't think of "avoiding all redundant data" as a dead order. If the performance requirements are high, consider a non-standardized bar. If you follow the standard, you often need to use the join command to connect many tables, but in non-normalization, by setting up some redundant information, there are not so many join commands, so performance is much better.

Guideline 9: Multidimensional data is a completely different beast

Projects that use OLAP (analytic) databases mainly deal with multidimensional data. For example, you can look at the sales of each country, the sales of each customer, and other data you want to get. In short, the sales you're focusing on are the intersection of three dimension data.


In this case, it would be better to design a dimension. Simply put, you can create a sales table with a "Sales Amount" field (which is the central table), and then set up some foreign keys to associate with other one-dimensional tables.


Guideline 10: Design of a centralized key-value table

I often come across key-value tables. The key value table means that it stores the key and the data corresponding to the key (the translator notes: Like the use of maps in a data structure, it is a fairly simple table). For example, as you can see in the table below, there is a currency table and a country table, and you will notice that there is actually only one key and one corresponding value in the two tables.


For this type of data table, create a centralized table and then set up a "Type" field to differentiate. This will make your database work better.

Guideline 11: For multilevel data to refer to its own primary key, or to set up a foreign key

I will often come across multiple sets of data, consider a multi-level sales plan, a salesperson can have a lot of sales staff. In this case, referencing your own primary key or setting up a foreign key will help you achieve the desired effect.


This article is not to let you not abide by the standard paradigm, but to let you do not blindly follow, you have to consider the first thing is the properties of the project and what type of data you are dealing with.


Here is a simple school table to explain the three standard paradigm of the video (translator Note: With the above "3 normal forms" for the same video, downloaded readers can not have to repeat the download)

SQL Server--Can explain First,second and third normal form in SQL Server [Part1]

SQL Server--Can explain First,second and third normal form in SQL Server [Part2]

SQL Server--Can explain First,second and third normal form in SQL Server [PART3]

(End of full text)


Originally from: Http://www.codeproject.com/Articles/359654/11-important-database-designing-rules-which-I-foll

The 11 database design guidelines I have followed

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.