Database design Principles-Paradigm

Source: Internet
Author: User

Normalization-Database design principles

Summary

Relational database is a kind of database which is widely used at present, the design of relational database is the process of organizing and structuring data, the core problem is the design of relational model. For the smaller database size, we can easily handle the table structure in the database. However, as the scale of the project grows, the database becomes more complex and the relational model table structure is more complicated, and we tend to find that the SQL statements we write are clumsy and inefficient. Worse, because the table structure is poorly defined, it can cause incomplete data when the data is updated. Therefore, it is necessary to learn and master the normalization process of the database to guide us to better design the table structure of the database, reduce redundant data, which can improve the storage efficiency of the database, data integrity and scalability. This article will combine the concrete example, introduces the database normalization flow.

The purpose of this paper is to elaborate the standardized database design principles through a detailed example. In Oracle, concise, structure-clear table structure is very important to the design of the database. A normalized table structure design, in which future data maintenance does not occur when inserts (insert), delete (delete), and update (update) are abnormal. On the contrary, the database table structure design is unreasonable, not only will bring a variety of problems to the use and maintenance of the database, and may store a lot of unnecessary redundant information, waste system resources.

To design a standardized database requires us to do so according to the database design paradigm-the normative principle of database design. But some of the relevant materials mentioned in the form of design, is often given a large number of formulas, which gives the designer's understanding and application of a certain difficulty. Therefore, this article will combine the concrete image example, as far as possible to describe the three paradigms, and how to optimize the application in the actual project.

Normalization

The key step in designing and operating the maintenance database is to ensure that the data is correctly distributed to the tables in the database. Using the correct data structure not only facilitates the appropriate access to the database, but also greatly simplifies other aspects of the application (queries, forms, reports, code, and so on). The formal name for proper table design is "database normalization". Later we will illustrate the specific standardization of the project through examples. Refer to Appendix 1 for a definition of what a paradigm is.

Data Redundancy

The data should be as little redundant as possible, which means that duplicate data should be minimized. For example, a department employee's phone should not be stored in a different table because the phone number here is an attribute of the employee. If there is too much redundant data, this means taking up more physical space, there are also problems with data maintenance and consistency checking, and when this employee's phone number changes, redundant data can cause updates to multiple tables, and if one of the tables is unfortunately ignored, it can cause inconsistencies in the data.

Normalize An instance:

In order to illustrate the convenience, we will use a sample data table in this article to analyze the normalization process step by step.

First, let's start by generating a first table.

CREATE TABLE "SAMPLE" (

' Prjnum ' INTEGER not NULL,

"Prjname" VARCHAR (200),

' Emynum ' INTEGER not NULL,

"Emyname" VARCHAR (200),

"Salcategory" CHAR (1),

"Salpackage" INTEGER)

In "USERSPACE1";

ALTER TABLE "SAMPLE"

ADD PRIMARY KEY

("Prjnum", "Emynum");

Insert into SAMPLE (Prjnum, Prjname, Emynum, Emyname, Salcategory, Salpackage)

VALUES (100001, ' TPMS ', 200001, ' Johnson ', ' A ', Watts), (100001, ' TPMS ', 200002,

' Christine ', ' B ', 3000, (100001, ' TPMS ', 200003, ' Kevin ', ' C ', 4000), (100002,

' TCT ', 200001, ' Johnson ', ' A ', Watts, (100002, ' TCT ', 200004, ' Apple ', ' B ',

3000);

Table 1-1


Looking at table 1-1, we can see that there are six fields in this table, and that there are duplicate values in each field, which means there is a problem with data redundancy. This will potentially cause anomalies when data operations, such as deletes, updates, and so on, are required for normalization.

First Paradigm

Referring to the definition of the reference paradigm, we find that the table has met the requirements of the first paradigm.

1, because this table is a single attribute of the field, can not be divided;

2, and the records of each row are not duplicated;

3, there are main attributes, and all attributes are dependent on the main attribute;

4, all the main properties have been defined

In fact, in all of the current relational database management systems (DBMS), the first paradigm is enforced when the tables are being built. Therefore, this sample table is already a table that satisfies the requirements of the first normal form. Looking at table 1-1, we first need to find the primary key. As you can see, the attribute to <project number, Employee number> is the primary key, all other attributes depend on the primary key.

from the first paradigm into the second normal form

according to the definition of the second paradigm, the conversion to the two paradigm is to eliminate partial dependencies.

In table 1-1, we can see that the <project name> part relies on the <project in the primary key number>; Non-primary attributes <employee name>,<salary category> and <salary package> are all partially dependent on the <employee in the primary key number>;

In the form of table 1-1, the following potential problems exist:

1. Data redundancy: Each field has a duplicate value;

2. Update exception: For example, the value of <project name> field, such as the value "TPMS" modified, you need to update multiple values of the field;

3. Insert Exception: If you create a new project with the name TPT, but no Employee joins, then <employee number> will be vacant, which is part of the primary key and will not be able to insert the record;

Insert into SAMPLE (Prjnum, Prjname, Emynum, Emyname, salcategory, Salpackage) VALUES (100003, ' TPT ', NULL, NULL, NULL, NULL )

4. Delete exception: If an employee 200003, Kevin leaves, to remove the employee's record from the table, the associated Salary information C will also be lost because there is no other row to record the Salary C information.

Delete from sample where Emynum = 200003
Select distinct salcategory, salpackage from SAMPLE

Therefore, we need to separate the main and the non main attributes that have partial dependencies from the table that satisfies the first normal form, creating a new table, and a one-to-many relationship between the new table and the old one. From this, we get:

CREATE TABLE "PROJECT" (

' Prjnum ' INTEGER not NULL,

"Prjname" VARCHAR (200))

In "USERSPACE1";

ALTER TABLE "PROJECT"

ADD PRIMARY KEY

("Prjnum");

Insert into PROJECT (Prjnum, Prjname) VALUES (100001, ' TPMS '), (100002, ' TCT ');

Table 1-2




Table 1-3

CREATE TABLE "EMPLOYEE" (

' Emynum ' INTEGER not NULL,

"Emyname" VARCHAR (200),

"Salcategory" CHAR (1),

"Salpackage" INTEGER)

In "USERSPACE1";

ALTER TABLE "EMPLOYEE"

ADD PRIMARY KEY

("Emynum");

Insert into EMPLOYEE (Emynum, Emyname, salcategory, Salpackage) VALUES (200001,

' Johnson ', ' A ', ', ' (200002, ' Christine ', ' B ', 3000), (200003, ' Kevin ', ' C ',

4000), (200004, ' Apple ', ' B ', 3000);

Employee Number employee Name Salary Category Salary Package

200001 Johnson A 2000

200002 Christine B 3000

200003 Kevin C 4000

200004 Apple B 3000

CREATE TABLE "Prj_emy" (

' Prjnum ' INTEGER not NULL,

"Emynum" INTEGER not NULL)

In "USERSPACE1";

ALTER TABLE "Prj_emy"

ADD PRIMARY KEY

("Prjnum", "Emynum");

Insert into Prj_emy (Prjnum, Emynum) VALUES (100001, 200001), (100001, 200002),

(100001, 200003), (100002, 200001), (100002, 200004);

At the same time, we extracted the primary key of table 1-1, the respective primary keys of table 1-2 and table 1-3, and formed a separate table to show the association between table 1-2 and table 1-3:


Table 1-4

At this point we looked at table 1-2, 1-3, 1-4, we found that the insert exception no longer exists, when we introduce a new project TPT, we only need to insert a data to table 1-2, when there are new entrants to the project TPT, we need to table 1-3, 1-4, insert It's OK to enter a piece of data. Although we have solved a big problem, we still find that there are problems in our careful observation.

from the second normal form to the third normal form

After examining the three tables generated earlier in the table, we found that table 1-3 has a pass-through dependency, namely: Key fields < Employee number >--> non-critical fields < Salary Category >--> non-critical Field < Salary Package >. And this is not satisfied with the rules of the three paradigms, there are the following deficiencies:

1, data redundancy: <salary category> and <salary package> values are duplicated;

2, update the exception: there is duplication of redundant information, modify the need to modify a number of records at the same time, otherwise there will be inconsistent data;

3, delete the exception: Similarly, if the employee 200003 Kevin left the company, will directly lead to the loss of Salary C information.

Delete from EMPLOYEE where emynum = 200003
Select distinct salcategory, salpackage from EMPLOYEE

So we need to continue the normalization process, take the table 1-3 apart and we get:


Table 1-5

And

Table 1-6

Now if 200003 Kevin leaves the company, we just need to remove him from table 1-5, and the salary C information that exists in table 1-6 is not lost. But we have to note that in addition to the presence of Kevin's information in table 1-5, there is also Kevin's information in table 1-4, which is easy to understand because Kevin participates in the project 100001, TPMS, and, of course, is removed from it.

At this point, we will table 1-1 through the normalized steps to get four tables to meet the three normal form constraints, data redundancy, update exceptions, insert exceptions and delete exceptions.

In the third paradigm, there are more strictly restricted BC paradigm and four paradigm, but these two forms are rarely used in commercial applications, in most cases, the three paradigms have satisfied the requirements of database table normalization, effectively solve the data redundancy and maintenance operation of the abnormal problems.

End:

In the process described in this article, we use the method of example to deduce the normalization process of the data table, and show how to solve the problems such as data redundancy and abnormal database operation in this process.

In the concrete engineering application, it is of practical significance to use database normalization method to design database table.

References: Database normalization Basics: Basic principles for the normalization of databases

Normalization principles: The principle of database normalization and the definition of paradigm

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.