Problems solved by the five Paradigms of database table design

Source: Internet
Author: User
Tags contact form

The concept of database system was learned at school, and was uniformly indefinitely, and remained at the definition and certification level. Recently in the project, seriously understand the next database paradigm problem, only the subconscious understanding of its principles and applications to quickly design a reasonable table.

First, clear concepts are as follows:

The main code is also the primary key

Candidate code if the value of an attribute group in a relationship can uniquely identify a tuple, and any of its true subsets can no longer be identified, the attribute group is called a candidate code. Candidate code is not unique, the main code is one of them.

The main attribute is contained in any of the candidate keywords for the property called the main attribute

Secondly, it is also the main play, combined with examples, to explain the major paradigm of the former improvement and application scenarios.

Problems solved by paradigm in reality

1. Data redundancy

data redundancy refers to repeated occurrences of multiple or more attributes when one or more of the attributes in a table change. That is, there is a one-to-many or many-to-many data relationship between attributes and attributes, between attributes and attribute groups, or between attribute groups and attribute groups. As the following table:

course C teacher T reference book B
------------------------
Physics Li Yong General Physics
Physics Li Yong Optics principle
Physics of Li Yong Physics problem sets
General Physics of the physics of June
Physics Wang June Optics principle
Physical Wang June Physics problem sets
Mathematical analysis of Mathematics Li Yong
Differential equation of mathematical Li Yong
Mathematics Li Yong Advanced Algebra
Mathematical analysis of Mathematics Zhang Ping
Differential equation of mathematics Zhang Ping
Mathematics Zhang Ping Advanced Algebra
. . .
2. Insert Exception

The so-called Insert exception, which refers to an attribute, especially a primary key, is empty, and cannot be inserted even though other properties have data.

Assume that the selection relationship table is Selectcourse (school number, name, age, course name, score, credit), keyword for the combination of keywords (study number, course name), namely (school number, course name) → (name, age, score, credits). At the same time there is a connection:

(course name) → (credits)

(school number) → (name, age)

For this table selectcourse, it is assumed that a new course will be opened and no one has yet been enrolled. Thus, the course name and credits cannot be recorded in the database because there is no "learning number" keyword.

3. Delete exception

The so-called Delete exception, is that for some reason, you need to delete some useless properties in the table, but also will be some other useful properties of the data also deleted.

The same is true for the above table Selectcourse, when students graduate, they need to remove all the information from the system, as well as the information of the course will be deleted.

4. Update exception

The so-called update exception, that is, if the data of a property changes, it is necessary to modify the corresponding properties of a lot of data.

Also for the above table Selectcourse, if the credit of a course is adjusted, the "credits" value of all the rows in the data table should be updated, otherwise the same course credit will appear differently.

In short, these attributes of the data between these problems, can take up too much space, affect the efficiency of the query, and also for the subsequent programming brings unnecessary trouble, Sunline also affect the integrity of the database. For this reason, it is necessary to optimize the paradigm of these tables.

Two or six description of the large paradigm

The overall paradigm optimization diagram:

elimination of determining factors

| 1NF

Non-code non-trivial | ↓ elimination of partial function dependency of non-principal attribute pair code

function dependency

| 2NF

| ↓ Elimination of the transfer function dependency of the non-principal attribute pair code

| 3NF

| ↓ eliminate the partial and transitive function dependencies of the main attribute pair code

| BCNF

| ↓ Elimination of non-trivial and non-function dependent multi-valued dependencies

| 4NF

| ↓ eliminate connection dependencies that are not implied by the candidate code

  | 5NF


1.1NF (first paradigm)

The so-called first paradigm is that each attribute in a table is non-divided, that is, it cannot be divided into two columns. This is the basis for a relational database.

2.2NF (second normal)

The so-called second paradigm is that all non-principal attributes are completely dependent on the keyword. From this definition, it can be seen that the second paradigm does not have a partial dependency of the non-primary attribute on some candidate keywords, although there is a transitive dependency between the non-primary attributes.

The following is an example of an optimization of the second paradigm:

Assume that the selection relationship table is Selectcourse (school number, name, age, course name, score, credits), keyword for the combination of keywords (study number, course name), because of the following decision relationship:

(School number, course name) → (name, age, score, credits)

This database table does not meet the second normal form because of the following decision relationship:

(course name) → (credits)

(school number) → (name, age)

That is, the presence of a field in the combo key determines the non-keyword situation.

Because it does not conform to 2NF, the following questions exist for this class selection relationship:

(1) Data redundancy:

The same course by N students elective, "credit" repeated n-1 times, the same student elective m courses, name and age repeated m-1 times.

(2) Update exception:

If the credit of a course is adjusted, the "credits" value of all the rows in the data sheet should be updated, otherwise the same course credit will be different.

(3) Insert exception:

Suppose a new course is to be opened and no one has yet been enrolled. Thus, the course name and credits cannot be recorded in the database because there is no "learning number" keyword.

(4) Delete exception:

Assuming that a group of students has completed elective courses, these elective records should be removed from the database table. At the same time, however, the course name and credit information were also removed. Obviously, this can also lead to an insertion exception.

Change the course of the elective selectcourse to the following three tables:

Student: Student (school number, name, age);

Course: Course (course name, credits);

Elective relationship: Selectcourse (School number, course name, score).

Such database tables conform to the second paradigm, eliminating data redundancy, update exceptions, insert exceptions, and delete exceptions.

In addition, all single-key database tables conform to the second normal form, as there is no possible combination of keywords.

3. The third paradigm (3NF)

The so-called third paradigm is that each non-principal attribute is neither partially dependent nor transitive depending on the keyword, which is the elimination of transitive dependencies (A>B>C) on the basis of the second paradigm.

Assume that the Student relationship table is student (school number, name, age, school, college location, college phone), the keyword is a single keyword "study number" because of the following decision relationship:

(school number) → (name, age, school, college location, college phone)

This database is 2NF compliant, but does not conform to 3NF because of the following decision relationship:

(school number) → (school) → (college location, college phone)

That is, there is a non-critical field "College location", "College phone" to the key field "study number" of the transfer function dependency.

It also has data redundancy, update exceptions, insert exceptions, and delete exceptions, which readers can analyze on their own.

The Student relations table is divided into the following two tables:

Student: (School number, name, age, school);

College: (College, location, telephone).

Such database tables conform to the third paradigm, eliminating data redundancy, update exceptions, insert exceptions, and delete exceptions.

4, BCNF

The so-called bcnf, refers to the third paradigm based on the further elimination of the main attribute of the code of the partial function dependency and transfer dependence.

Suppose the Warehouse Management Relationship table is storehousemanage (warehouse ID, store item ID, Administrator ID, number), and an administrator works only in one warehouse, and a warehouse can store multiple items. The following decision relationships exist in this database table:

(Warehouse ID, store item id) → (Administrator id, quantity)

(Admin ID, store item id) → (warehouse ID, quantity)

So, (warehouse ID, store item ID) and (Administrator ID, store item ID) are the candidate keywords for storehousemanage, the only non-critical field in the table is the number, which is in accordance with the third paradigm. However, the following decision relationship exists:

(warehouse id) → (Administrator id)

(Administrator id) → (warehouse id)

That is, the key field determines the critical field, so it does not conform to the BCNF paradigm. It will have the following exception:

(1) Delete exception:

When the repository is emptied, all "store item ID" and "quantity" information is deleted, and the "Warehouse ID" and "Administrator ID" information are also deleted.

(2) Insert exception:

You cannot assign an administrator to a warehouse when no items are stored in the warehouse.

(3) Update exception:

If the warehouse has been replaced by an administrator, the administrator ID for all rows in the table is modified.

Break down the Warehouse management relationship table into two relational tables:

Warehouse Management: Storehousemanage (warehouse ID, administrator ID);

Warehouse: Storehouse (warehouse ID, store item ID, quantity).

Such database tables conform to the BCNF paradigm, eliminating deletion exceptions, insert exceptions, and update exceptions.

Another example, there is such an accessory management table WPE (WNO,PNO,ENO,QNT), wherein WNO represents the warehouse number, PNO represents the accessory number, ENO represents the employee number, QNT represents the number.

The following constraints are required:

(1) A warehouse has multiple employees;

(2) An employee only works in one warehouse;

(3) Each warehouse in one type of accessories by the person responsible, but one can manage several accessories;

(4) The same type of accessories can be placed in several warehouses.

To parse the function dependencies in the table, you can get:

(1) eno->wno;

(2) (Wno,pno)->qnt

(3) (Wno,pno)->eno

(4) (Eno,pno)->qnt

As you can see, the candidate keys are: (eno,pno);(wno,pno). Therefore, Eno,pno,wno are the primary property, and Qnt is a non-primary property. Obviously, a non-primary property is directly dependent on the candidate key. So this table satisfies the third paradigm.

And we'll look at the main attribute: (wno,pno)->eno; Eno->wno. Obviously Wno has a transitive dependency on the candidate key (Wno,pno), so it does not conform to BCNF.

The solution to this problem is to split into two tables: Management Table EP (ENO,PNO,QNT); sheet ew (ENO,WNO). But doing so causes the function dependency (wno,pno)->eno to be lost.

Although, not satisfying the bcnf, it also leads to some redundancy and consistency issues. However, decomposing a table into a table that satisfies bcnf may also lose some function dependencies. Therefore, it is generally not mandatory to require the relational table to satisfy the bcnf.

5, 4NF (fourth paradigm)

for the fourth paradigm, from a theoretical level, the relational mode R∈1nf, if for R for each non-trivial multivalued dependency x→→y (Y does not belong to x), X contains the candidate code, then r∈4nf. 4NF is a multi-valued dependency that does not allow non-trivial and non-functional dependencies between attributes that restrict relational patterns. Obviously a relational pattern is 4NF, then it must be BCNF.

That is, when non-primary properties in a table are independent of each other (3NF), these non-primary properties should not have multiple values. If there are multiple values, it violates the fourth paradigm.

The multi-valued dependencies mentioned here are described in the blog:

There is a user contact form telephone (Customerid,phone,cell). CustomerID for the user Id,phone for the user's landline, cell for the user's mobile phone.

Originally, this is a very simple form of the 3rd paradigm. The primary key is CustomerID and there is no delivery dependency. But in some cases, such a table is still unreasonable. For example, the user has two landline phones and two mobile phones. At this point, the table's specific representation is as follows:

CUSTOMERID

PHONE

CELL

1000

8828-1234

149088888888

1000

8838-1234

149099999999

Since the phone and cell are independent of each other, some users have two and more values. This table violates the IV paradigm.

In this case, the design of this table will cause a lot of maintenance trouble. For example, if the user abandons the first line of the landline and the second line of the mobile phone, are the two rows merged? Wait a minute

The solution to the problem is to design a new table New_phone (Customerid,number,type). This allows you to process multiple phone numbers of different types for each user without violating the IV paradigm.

Obviously, the application of the fourth paradigm is relatively small, because only in certain special cases, consideration should be given to the specification of the table to the Quaternary paradigm. Therefore, in the practical application, the table is generally not required to meet the IV paradigm.

6, 5NF (fifth paradigm)

the fifth paradigm (5NF): is the final paradigm. Eliminates connection dependencies in 4NF.

The five paradigms have the following requirements:

(1) must meet the IV paradigm

(2) tables must be decomposed into smaller tables, unless those tables logically have the same primary key as the original table.

The model five is further normalized on the basis of the fourth paradigm. The Quaternary paradigm deals with the multi-valued situations which are independent of each other, while the V paradigm deals with the multi-valued cases of interdependence.

There is a Sales Information table for sale (saleperson,vendor,product). Saleperson on behalf of sales staff, vendor on behalf of vendors, product is representative of products.

In some cases, there is some redundancy in this table. Tables can be decomposed into Person_vendor tables (Saleperson,vendor), person_product tables (saleperson,product), vendor_prodict tables (vendor,product)



Problems solved by the five Paradigms of database table design

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.