Normalization-Database design principles

Last Update:2015-09-15 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

The core problem of relational database design is the design of relational model. In this paper, we will introduce the process of database design normalization with specific examples.

Summary

Relational database is a widely used database type, relational database design is the process of organizing and structuring data, and the core problem is the design of relational model. For a small database size, we can easily handle the table structure in the database. However, as the size of the project grows, the database becomes more complex, and the relational model tables are more heterogeneous, we often find that the SQL statements we write are clumsy and inefficient. Worse, due to the unreasonable definition of the table structure, it can result in incomplete data when the data is updated. Therefore, it is necessary to learn and master the standardized process of the database, to guide us to better design the table structure of the database, reduce redundant data, so as to improve the storage efficiency of the database, data integrity and scalability. In this paper, we will introduce the process of database normalization with specific examples.

Preface

The purpose of this paper is to expound the principle of standardized database design through detailed examples. In DB2, concise and structure-clear table structure is very important to the design of database. A normalized table structure design that does not occur when inserting (insert), deleting (delete), and updating (update) are exceptions in future data maintenance. Conversely, the database table structure design is unreasonable, not only to the use and maintenance of the database brings a variety of problems, but also may store a large number of unnecessary redundant information, wasting system resources.

In order to design a standardized database, we need to do it according to the database design paradigm, which is the normative principle of database design. But some of the related materials mentioned in the paradigm design, often give a lot of formulas, which to the understanding and use of the designers caused some difficulties. Therefore, this article will combine the concrete image example, as far as possible to describe the three paradigms, and how to optimize the application in practical engineering.

Standardization

The key step in designing and manipulating the maintenance database is to ensure that the data is correctly distributed to the tables in the database. Using the right data structure not only facilitates access to the database, but it also greatly simplifies the rest of the application (queries, forms, reports, code, etc.). The formal name for proper table design is "database normalization". In the following we will illustrate the specific standardized works by example. For a definition of what a paradigm is, refer to article 1 in appendix.

Data redundancy

The data should be redundant as little as possible, which means that duplicate data should be minimized. For example, a department employee's phone call should not be stored in a different table, because the phone number here is an attribute of the employee. If there is too much redundant data, which means more physical space, but also the maintenance of data and consistency check problems, when the employee's phone number changes, redundant data will lead to more than one table update action, if there is a table unfortunate ignored, it can lead to data inconsistency.

Normalization instance

To illustrate the convenience, we will use a sample data table in this article to analyze the normalization process step-by-step.

First, let's start by generating the most original table.

CREATE TABLE"SAMPLE" ("Prjnum" )INTEGER  not NULL, "Prjname"VARCHAR( $), "Emynum"INTEGER  not NULL, "Emyname"VARCHAR( $), "Salcategory"CHAR(1), "Salpackage"INTEGER)            inch"USERSPACE1";ALTER TABLE"SAMPLE"ADD PRIMARY KEY("Prjnum", "Emynum");Insert  intoSAMPLE (Prjnum, Prjname, Emynum, Emyname, Salcategory, Salpackage)Values(100001,'TPMS',200001,'Johnson','A', -), (100001,'TPMS',200002,'Christine','B', the), (100001,'TPMS',200003,'Kevin','C',4000), (100002,'TCT',200001,'Johnson','A', -), (100002,'TCT',200004,'Apple','B', the);

Table 1-1

In table 1-1, we can see that there are six fields in this table, and the analysis of each field has duplicate values, that is to say, there is a data redundancy problem. This can potentially cause anomalies in data operations such as deletions, updates, and so on, so normalization is required.

First Paradigm

Referring to the definition of paradigm, examining the above table, we find that this table has satisfied the requirements of the first paradigm.

1, because the field in this table is a single attribute, can not be divided;

2, and the records of each row are not duplicated;

3, there is the main attribute, and all the attributes are dependent on the main attribute;

4. All the main attributes are already defined

In fact, in all of the current relational database management systems (DBMS), the first paradigm has been forced to be met in the construction of tables. Therefore, this sample table is already a table that satisfies the requirements of the first paradigm. In table 1-1, we first need to find out the primary key. As you can see, the attribute to <project number, Employee number> is the primary key, and all other properties depend on the primary key.

Transformation from one paradigm to the two paradigm

According to the definition of the second paradigm, conversion to two paradigm is to eliminate partial dependence.

In table 1-1, we can see that the <project name> part of the non-primary attribute depends on the <project in the primary key number>; Non-primary properties <employee name>,<salary category> and <salary package> are partially dependent on the <employee number>; in the primary key

In the form of table 1-1, there are the following potential problems:

1. Data redundancy: Each field has a value repetition;

2. Update exception: For example, the value of <project name> field, such as the value of "TPMS" modified, then one time to update the field of multiple values;

3. Insert exception: If a new project is created with the name TPT, but no Employee has joined, then <employee Number> will be vacant, and the field will be part of the primary key, so the record cannot be inserted;

Insert into SAMPLE (Prjnum, Prjname, Emynum, Emyname, salcategory, Salpackage) VALUES (100003, ' TPT ', NULL, NULL, NULL, NULL )

4. Delete exception: If an employee 200003, Kevin leaves, to delete the employee's record from the table, and the relevant Salary information C will also be lost, because there is no other line to record Salary C information.

Delete from sample where Emynum = 200003
Select distinct salcategory, salpackage from SAMPLE

Therefore, we need to separate the main and non-primary attributes of the existence of partial dependencies from the table that satisfies the first paradigm, forming a new table, and a one-to-many relationship between the new table and the old table. Thus, we get:

CREATE TABLE"PROJECT" ("Prjnum"INTEGER  not NULL, "Prjname"VARCHAR( $))         inch"USERSPACE1";ALTER TABLE"PROJECT"ADD PRIMARY KEY("Prjnum");Insert  intoPROJECT (Prjnum, Prjname)Values(100001,'TPMS'), (100002,'TCT');

Table 1-2

Table 1-3

CREATE TABLE"EMPLOYEE" ("Emynum" )INTEGER  not NULL, "Emyname"VARCHAR( $), "Salcategory"CHAR(1), "Salpackage"INTEGER)         inch"USERSPACE1";ALTER TABLE"EMPLOYEE"ADD PRIMARY KEY("Emynum");Insert  intoEMPLOYEE (Emynum, Emyname, Salcategory, Salpackage)Values(200001,'Johnson','A', -), (200002,'Christine','B', the), (200003,'Kevin','C',4000), (200004,'Apple','B', the); Employee NumberEmployee Name Salary Category Salary Package200001Johnson A -200002Christine B the200003Kevin C4000200004Apple B theCREATE TABLE" Prj_emy" ("Prjnum ")INTEGER  not NULL, "Emynum"INTEGER  not NULL)         inch"USERSPACE1";ALTER TABLE"Prj_emy"ADD PRIMARY KEY("Prjnum", "Emynum");Insert  intoPrj_emy (Prjnum, Emynum)Values(100001,200001), (100001,200002),(100001,200003), (100002,200001), (100002,200004);

At the same time, we extract the primary key of table 1-1, which is the respective primary key of table 1-2 and table 1-3, and form a single table to indicate the association between table 1-2 and table 1-3:

Table 1-4

At this point we look at table 1-2, 1-3, 1-4, we find that the insertion exception is no longer present, when we introduce a new project TPT, we only need to insert a data to table 1-2, we need to join the project TPT, we have to table 1-3, 1-4 in each plug into one piece of data. Although we have solved a big problem, we still find that there are problems in our careful observation.

Transformation from two paradigm to three paradigm

Looking at the three tables generated in front of the table, we found that table 1-3 has transitive dependencies, namely: Key fields < Employee number >--Non-critical fields < Salary Category >--Non-critical fields < Salary Package >. And this is not satisfied with the three-paradigm rule, there are the following deficiencies:

1, data redundancy: <salary category> and <salary package> values are duplicated;

2, update the exception: there is redundant information duplication, modify the need to modify multiple records at the same time, otherwise there will be inconsistent data;

3. Delete exception: Similarly, if the employee 200003 Kevin leaves the company, it will directly lead to the loss of Salary C information.

Delete from EMPLOYEE where emynum = 200003
Select distinct salcategory, salpackage from EMPLOYEE

So we need to proceed with the normalization process and take the table 1-3 apart, and we get:

Table 1-5

And

Table 1-6

If 200003 Kevin leaves the company, we just need to remove him from table 1-5, and the salary C information that exists in table 1-6 will not be lost. But we have to note that in addition to the information on Kevin in table 1-5, there is also Kevin's information in table 1-4, which is easy to understand because Kevin is involved in the project 100001, TPMS, and, of course, it is removed from it.

At this point, we have normalized the table 1-1 to get four tables that meet the constraints of the three paradigms, data redundancy, update exceptions, insert exceptions, and delete exceptions.

In the three paradigms, there are more strictly constrained BC paradigm and four paradigm, but these two forms are seldom used in commercial applications, in the vast majority of cases, the three paradigms have satisfied the requirements of database table normalization, effectively solve the data redundancy and maintenance operation of the abnormal problem.

Conclusion

In the process described in this paper, we deduce the process of normalization of data table by using the method of example, and show how to solve the problem of data redundancy and database operation anomaly in this process.

In the concrete engineering application, using the database normalization method to design the database table, will have the practical significance.

Resources

Database Normalization Basics: The basic principle of DB normalization
Normalization principles: Database Normalization Principle and paradigm definition

Transferred from: http://www.ibm.com/developerworks/cn/data/library/techarticles/dm-0605jiangt/

Normalization-Database design principles

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Normalization-Database design principles

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Normalization-Database design principles

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support