Database Normalization and design skills

Source: Internet
Author: User

In the design of dynamic websites, the importance of database design is self-evident. If the design is not proper, it is very difficult to query, Program Performance will also be affected. Whether you are using a MySQL or Oracle database, you can make your php Code It is more readable and easy to expand, which will also improve the performance of applications.
To put it simply, normalization is to eliminate redundancy and uncoordinated subordination during table design. In this article, I will show you the normalization skills you should understand in design through five gradual processes. To establish a feasible and efficient database. This article also analyzes in detail the available link types.

Assume that we want to create a user information table, which stores the user name, company, company address, and personal favorites or URLs. At the beginning, you may define the following table structure:

Zero status

Users
Name company company_address url1 url2
Joe ABC 1 work Lane abc.com XYZ.com
Jill XYZ 1 job street abc.com XYZ.com

Since no normalization is performed, we call this form of table as a zero-state table. Pay attention to the url1 and url2 fields. What if we need a third URL in the application? In this way, you need to add a column to the table. Obviously, this is not a good solution. If you want to create a scalable system, you need to consider using the first formal form and apply it to the table.

Level 1 Normalization

1. Remove duplicate groups in each table
2. Create an independent table for each set of related data
3. Use a primary key to identify each set of related data

The preceding table clearly violates the first rule. What does the primary key of the third statement mean? Simply put, it adds a unique, automatically increasing integer value to each record. With this value, two records with the same name can be distinguished. Through the first-level normalization form of the application, we get the following table:

Users
Userid name company company_address URL
1 Joe ABC 1 work Lane abc.com
1 Joe ABC 1 work Lane XYZ.com
2 Jill XYZ 1 job street abc.com
2 Jill XYZ 1 job street XYZ.com

Now our table can be said to be in the first-level normalization form. It has solved the restriction problem of URL fields, but this processing has brought about a new problem. Every time we insert a record in the User table, we must repeat all the company and user data. This not only makes the database bigger than before, but also is prone to errors. Therefore, it must undergo second-level regularization.
Level 2 Regularization

1. Create an independent table for the application's fields in multiple records
2. Associate the values of these tables with a foreign key.

We put the URL value in a separate table so that we can add more data in the future without worrying about duplicate values. We also associate these fields with primary key values:

Users
Userid name company company_address
1 Joe ABC 1 work Lane
2 Jill XYZ 1 job Street

URLs
Urlid reluserid URL
1 1 abc.com
2 1 XYZ.com
3 2 abc.com
4 2 XYZ.com

As shown above, an independent table is created. The primary key userid in the users table is now associated with the foreign key reluserid in the URL table. The current situation seems to have been significantly improved. But what if we want to add an employee record for ABC? Or more, 200? In this way, we must reuse the company name and address, which is obviously not redundant. Therefore, we will apply the third-level normalization method:

Level 3 Regularization

1. Eliminate fields that do not depend on the key

The company name and address have no relationship with the user ID, so their applications have their own company ID:

Users
Userid name relcompid
1 Joe 1
2 Jill 2

Companies
COMPID company company_address
1 ABC 1 work Lane
2 XYZ 1 job Street

URLs
Urlid reluserid URL
1 1 abc.com
2 1 XYZ.com
3 2 abc.com
4 2 XYZ.com

In this way, the primary key comid in the companies table is associated with the foreign key named relcompid in the users table. Even if ABC joins 200 employees, there is only one record in the companies table. Our users and URLs tables can be expanded constantly without worrying about inserting unnecessary data. Most developers think that the three-step normalization is sufficient. The database design can easily handle the burden on the entire enterprise. This is true in most cases.

We can pay attention to the URL field-have you noticed data redundancy? If the HTML page for users to enter the URL data is a text box and can be input at will, this is no problem. The two users have less probability to enter the same favorites. However, if you use a drop-down menu, you can select only two URLs or a little more. In this case, our database can be optimized at the next level-Step 4, which is ignored by most developers, because it depends on a very special relationship-a many-to-many relationship, which has never been encountered in our applications.
Data Relationship

Before defining the fourth normalization form, I would like to mention three basic data relationships: one-to-one, one-to-many, and multiple-to-many. Let's look back at the first normalized users table. If we put the URL field in an independent table, every time we insert a record in the users table, we will insert a row in the URLs table. We will get a one-to-one relationship: each row in the User table will find the corresponding row in the URLs table. This is neither practical nor standard for our applications.

Then let's take a look at the second normalization example. For each user record, our table allows multiple URLs records to be associated with it. This is a one-to-many relationship, which is a common relationship.

The multi-to-many relationship is a bit complicated. In our third normalization form example, one of our users is related to many URLs, and we want to change this structure to allow multiple users to be related to multiple URLs, in this way, we can get a many-to-many structure. Before the discussion, let's take a look at the changes in the table structure.

Users
Userid name relcompid
1 Joe 1
2 Jill 2

Companies
COMPID company company_address
1 ABC 1 work Lane
2 XYZ 1 job Street

URLs
Urlid URL
1 abc.com
2 XYZ.com

Url_relations
Relationid relatedurlid relateduserid
1 1 1
2 1 2
3 2 1
4 2 2

To further reduce data redundancy, we use level 4 normalization. We have created a strange url_relations table, where the fields are primary keys or foreign keys. Through this table, we can eliminate duplicate items in the URLs table. The following are the specific requirements for the fourth normalization form:

Fourth normalization form

1. In a many-to-many relationship, an independent object cannot be stored in the same table.

Because it applies only to many-to-many relationships, most developers can ignore this rule. But in some cases, it is very practical. In this example, we separate the same entities and move the relationships to their own tables, this improves the URLs table.

To make it easier for you to understand, we will use an SQL statement to select all URLs belonging to Joe:

Select name, URL from users, URLs, url_relations where url_relations.relateduserid = 1 and users. userid = 1 and URLs. urlid = url_relations.relatedurlid

If we want to traverse the personal information and URL information of each person, we can do this:

Select name, URL from users, URLs, url_relations where users. userid = url_relations.relateduserid and URLs. urlid = url_relations.relatedurlid

Level 5 Regularization

There is also a form of first-level regularization, which is not common and profound, and is unnecessary in most cases. The principle is:

1. The original table must be able to be rebuilt through the table separated by it

The advantage of using this rule is that you can ensure that no extra columns are introduced in the separated table. All the table structures you create are as large as they actually need. Applying this rule is a good habit, but unless you want to process a very large data, you will not need to use it.

Hope this articleArticleIt is useful to you and can help you apply these normalization rules in all projects. You may want to know where these methods come from. I can tell you that the first three normalization rules were as follows: 1972, dr. e.f. codd proposed in his paper "further formalizing the relational model of the Database", and the rest of the Rules were developed by the later theory of set and relational mathematicians. Comment: the so-called thing-level inverse, it is sometimes not good to divide tables too carefully, because it is necessary to associate each table, this will make the query complex, and the efficiency may also be reduced, these normalization rules can be referred to. In actual applications, some tests can be conducted based on the project size and when necessary to design a more reasonable table structure.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.