For relational databases, understanding of the first and second third paradigm? _ Database

Source: Internet
Author: User

This chapter of the paradigm of database introduction is too conceptual and foggy to appreciate the explanation of the great God, included below:


Author: Knowledge of user links: https://www.zhihu.com/question/24696366/answer/29189700
Source: Know
Copyright belongs to the author. Commercial reprint please contact the author to obtain authorization, non-commercial reprint please indicate the source.

Most colleges and universities in China use the Wang Shan "Database system Introduction" This textbook, some aspects do not give a very detailed and clear explanation, and practical application is not so close, you have such a doubt is quite normal. I have taught the principles of the database for several years, and many students have raised the same questions as you, try to explain it to you. (basically from the content of my class, some places in order not too long-winded, gave up a certain degree of rigor, mainly in the "relationship" and "table" on)

The first thing to understand is what "paradigm (NF)" means. According to the definition in the textbook, The paradigm is "a set of relational patterns that conform to a certain level, representing the degree of rationalization of the connections among the attributes within a relationship". It's very obscure. You can actually interpret it roughly as the level of a design standard that a table structure of a datasheet conforms to. Like home decoration to buy building materials, the most environmentally friendly is the E0 level, followed by the E1 level, there are E2 level and so on. The database paradigm is also divided into 1nf,2nf,3nf,bcnf,4nf,5nf. In general, when we design a relational database, the most consideration is enough to bcnf. A design that conforms to a higher-level paradigm must conform to a lower-level paradigm, such as a relational model conforming to 2NF, which must conform to 1NF.

The next step is to explain each level of paradigm, first of all, the first paradigm (1NF).

Conforms to the 1NF relationship (you can understand it as a datasheet.) The difference between "relational mode" and "relationship" is similar to the difference between "class" and "Object" in object-oriented programming. "Relationship" is an example of a "relational schema" that you can interpret as a table with data, and "relational mode" is the table structure of this datasheet.

1NF is defined as: each attribute in a relationship that conforms to 1NF cannot be divided.

The situation shown in table 1 does not meet the requirements of the 1NF.

Table 1


In fact, 1NF is the most basic requirement of all relational databases, when you create data tables in a relational database management system (RDBMS), such as SQL Server,oracle,mysql, the operation must not be successful if the design of the datasheet does not meet this basic requirement. That is, as long as the data table already exists in the RDBMS, it must be in line with 1NF. If we want to represent the data in a table in an RDBMS, we have to design the form in Table 2:

Table 2


But only 1NF design, there will still be data redundancy is too large, insert exceptions, delete exceptions, modify the problem of exceptions, such as for the design in Table 3: Table 3
Each student's school number, name, department name, department director the data is repeated several times. Each department and the corresponding Dean's data are repeated many times-data redundancy is too large if the school has a new department, but for the time being, no students have been enrolled (for example, March is new, but not until August), it is not possible to add the department name and the dean's data separately to the datasheet (note 1)--Insert an exception

Note 1: According to the requirements of entity integrity in three relational integrity constraints, any one of the properties contained in the Code (note 2) of the relationship cannot be null, and the combination of all attributes cannot be duplicated. In order to meet this requirement, the table in the figure can only be the combination of the school number and class name as a code, otherwise it will not be able to separate each record of the region.
NOTE 2: Code: A property in a relationship or a combination of several attributes that distinguishes each tuple (you can interpret the tuple as each record in a table, that is, each row). If the records of all the students in a department are deleted, then the data of all the department and Dean will disappear (none of the students in one department is missing, which does not mean that the system is gone). --Delete exception if Ted transfer to the law department, then in order to ensure the consistency of the data in the database, we need to revise the data of the Department and dean in the three records. --Modify the exception.

Because of the problem of only 1NF database design, we need to improve the design standards, remove the factors that lead to the above four problems and make it conform to a higher level paradigm (2NF), which is called "normalization".

The strict definition of the second normal form (2NF) in relation theory I don't have much to introduce here (because it involves a lot of bedding), only need to understand 2NF to 1NF made what improvements can. The improvement is that 2NF is based on 1NF, eliminating the partial function dependence of the non main attribute for the code. Next, the four concepts involved in this sentence--"function dependency", "Code", "Non-master attribute", and "partial function dependency" are explained.

function dependencies
We can understand it this way (but not specifically): In a table where the value of an attribute (or attribute group) x is determined, the value of the attribute Y must be determined, so you can say that the Y function relies on X, writing x→y. That is, there are no two records in the datasheet, they have the same values on the X property (or group of properties), and the values on the Y property are different. This is the origin of the name "function dependency", similar to the function relation y = f (x), and the value of Y must be determined in the case where the value of x is determined.

For example, for the data in table 3, no records were found, and their numbers were the same and the corresponding names were different. So we can say that the name function depends on the school number, the writing number → the name. But conversely, because there may be students with the same name, there may be different two student records, they have the same value on the name, but the corresponding school number is different, so we can not say that the number function depends on the name. The other function dependencies in the table are as follows: Department name → Department head number → department Director (School number, class name) → Score

However, the following function dependencies are not valid: School number → class name → grade class name → Department Director (School number, class name) → Name

Starting with the concept of functional dependencies, there are three more concepts:

Full function dependencies

In a table, if x→y, and for any true subset of X (if the attribute group X contains more than one attribute), X ' →y does not hold, then we call Y for the X complete function dependency, which is recorded as X f→y. (that F should be written on the top of the arrow, no way to play ..., the correct wording is shown in Figure 1)

Figure 1

For example: School number f→ Name
f→ Score (note: Because the same number of the corresponding score is not sure, the same class name corresponding to the score is also uncertain)

Partial function dependencies

If the Y function relies on X, but at the same time Y does not have a complete function that relies on X, then we call the Y part function dependent on X, which is recorded as X P→y, as shown in Figure 2.


Figure 2


For example: (School number, class name) p→ name


Transfer function dependencies
If the Z function relies on Y, and the Y function relies on X (thank you for the error that the @ Hundred points out, here instead: "Y is not included in X, and X does not function dependent on the Y" premise), then we call the Z-transfer function to be X-t→z, as shown in Figure 3.


Figure 3

Code
Let K be a property or group of attributes in a table, and if all attributes except K are fully functional dependent on K (this "complete" does not leak), then we call K the candidate code, or code for short. In practice we can usually understand that if K is determined, the value of all the attributes of the table except K is determined, then K is the code. There can be more than one code in a single table. (In practical applications for convenience, usually choose one of the code as the main code)

For example:
For table 3, the attribute group is the code (school number, class name). This is the only code in the table. (assuming that all classes have no duplicate names)

Non-primary properties
Attributes that are contained in any one code become primary properties.

For example:
For table 3, there are two main attributes, the number of which is the name of the class.


I can finally come back to see the 2NF. First, we need to determine whether table 3 meets the requirements of 2NF. According to the definition of 2NF, the basis of judgment is actually to see whether there is a partial function dependency of the non main attribute for the code in the data table. If there is, then the highest data table only in line with the requirements of 1NF, if it does not exist, then meet the requirements of 2NF. The method of judgment is:

First step: Find all the code in the datasheet.
Step two: According to the code obtained in the first step, find all the main properties.
Step three: In the datasheet, all the main properties are dropped, and the rest is not the primary attribute.
Step Fourth: See if there is a partial function dependency of a non-primary property pair code.

For table 3, according to the four steps mentioned above, we can do this:

The first step is to view all of the individual properties and, when its value is determined, whether all remaining attribute values are determined. View all property groups that contain two attributes, and if the value is determined, all remaining property values are determined. ...... View all the property groups that contain six attributes, all attributes, and determine whether all the remaining property values are available when its value is determined.

Looks like a lot of trouble, doesn't it? But here's the trick: If A is a code, then all the attribute groups that contain a, such as (A,b), (A,c), (a,b,c), etc., are not yards (because there is a "full function dependency" in the code's requirements).

Figure 4 shows all the function dependencies in the table:

Figure 4

This step is completed, you can get, table 3 of the code is only one, that is (study number, class name).

Step Two:
There are two main attributes: School number and class name


Step Three:
There are four non-main attributes: Name, department name, department head, score


Fourth Step:
For (school number, class name) → name, there is a number → name, there is a non-main attribute name on the code (school number, class name) part of the function dependence.
For (school number, class name) → department name, there is a number → department name, there is a non-main attribute system name pairs of code (school number, class name) part of the function dependence.
For (school number, class name) → Dean, there is a number → Dean, there are non-main attributes on the code (school number, class name) part of the function of dependence.

Therefore, table 3 has a partial function dependency on the code, the highest only meet the requirements of 1NF, does not meet the requirements of 2NF.



To allow table 3 to meet the requirements of 2NF, we have to eliminate these partial function dependencies, only one way is to split large data tables into two or more smaller data tables, in the process of splitting, to achieve a higher level of paradigm requirements, this process is called "pattern decomposition." The method of schema decomposition is not unique, and the following are one of the methods:
Selected Course (school number, class name, score)
Student (school number, name, department name, department head)

We first judge whether the timetable and the student table meet the requirements of 2NF.

For the selected timetable, the code is (school number, class name), the main attribute is the number and class name, non-main attribute is a score, the number is determined, and can not only determine the score, class name is determined, also can not be unique to determine the score, so there is no non-primary attribute score for the code (school number, class name) Partial function dependencies, so this table meets the requirements of 2NF.

For the student table, its code is the number, the main attribute is the number, not the main attribute is the name, the name and the dean, because the code has only one attribute, so there is no main attribute for the code part of the function dependency, so this table in line with the requirements of 2NF.

Figure 5 shows a new function dependency after schema decomposition

Figure 5

Table 4 shows the new data after the pattern decomposition

Table 4

(here also involves a how to do the pattern decomposition is the correct knowledge point, first does not introduce)

Now let's take a look at the same operation and whether there are any previous problems. Ted Transfer to law department
You only need to modify the value of a Ted corresponding system. --has improved data redundancy been reduced.
The student's name, name and department head are not repeated as many times as before. --Improved deletion of all student records in a department
The information of the Department is still lost. --no improvement to insert a new system with no students.
This operation is not allowed because the code for the student table is a school number and cannot be empty. --No improvement

Therefore, only in line with the requirements of 2NF, many cases are not enough, and the cause of the problem is that there is still the director of the non-master attribute system for the transfer function dependence of the code number. In order to solve these problems further, we need to improve the data table conforming to the 2NF requirement to meet the requirements of 3NF.

The third normal form (3NF) 3NF is based on 2NF, eliminating the dependence of the non main attribute on the transfer function of the code. That is, if a non-primary attribute is dependent on the transfer function of a code, it does not conform to the requirements of 3NF.

Let's look at the design in table 4 to see if it meets the requirements of 3NF.

For the timetable, the main code for (school number, class name), the main attribute for the number and class name, not the main attribute only one, for the score, there is no transfer function dependencies, so the design of the timetable, in line with the requirements of the 3NF.

For the student table, the main code is the school number, the main attribute is the school number, and the non main attribute is the name, the name and the dean of the Department. Because the school number → department name, at the same time the department name → Dean, so there is a non-master attribute department director for the transfer function dependence of the code number, so the design of the student table, do not meet the requirements of 3NF.

In order for the data table to be designed to 3NF, we must further decompose the pattern into the following form:
Selected Course (school number, class name, score)
Student (school number, name, department name)
Department (department name, department head)

For the selected timetable, in line with the requirements of 3NF, has been analyzed before.

For the student table, the code is the number, the main attribute is the school number, the non main attribute is the system name, it is impossible to exist the non main attribute for the code transfer function dependence, therefore conforms to the 3NF request.

For the Line table, the code is the system name, the main attribute is the system name, the non-main attribute is the Dean, it is impossible to have the non-main attribute to the code transfer function dependency (at least three attributes are possible to have transitive function dependencies), so the requirement of 3NF is satisfied.


The new function dependencies are shown in Figure 6

Figure 6

The new datasheet is shown in table 5


Table 5


Now let's take a look at the same operation and whether there are any previous problems. Delete all student records in a department
Information about the system is not lost. --Improved insertion of information on a new system with no students.
Because the list and the student table are now separate two tables, so does not affect. Improved data redundancy is even less. --There are improvements


Conclusion
Thus, in line with the requirements of the 3NF database design, basically solve the data redundancy is too large, insert the exception, modify the exception, delete the problem of abnormal. Of course, in practice, often in order to performance or to meet the needs of expansion, often do 2NF or 1NF, but as a database designer, should at least know what the 3NF requirements are.

============== after six months, finally decided to fill in this pit, late ===========

BCNF Paradigm

To understand the BCNF paradigm, first look at the question:

If: A company has a number of warehouses;
There can only be one administrator per warehouse, and an administrator can only work in a single warehouse;
A warehouse can store a variety of items, and one item can be stored in different warehouses. Each item has a corresponding quantity in each warehouse.

Then the relational schema warehouse (warehouse name, admin, item name, quantity) is the first-level paradigm.

A: Known function dependencies set: Warehouse name → admin, admin → warehouse name, (warehouse name, item name) → Quantity
Code: (admin, item name), (warehouse name, item name)
Main attributes: Warehouse name, admin, name of property
Non-primary attributes: Quantity
∵ does not exist partial function dependencies and transitive function dependencies of the non-primary attribute pairs of code. ∴ This relational pattern belongs to 3NF.

Relationships based on this relational pattern (specific data) might look like the following illustration:



Well, since this relational pattern is already 3NF, is there a problem with this relational pattern? Let's take a look at the following operations: Add a new warehouse first, but have no items stored, or assign an administrator to the warehouse. --No, because the item name is also the primary property, the primary attribute cannot be empty, depending on the entity integrity requirements. After a warehouse is emptied, it is necessary to delete all the items that are related to the warehouse and what will be the problem. The information of the warehouse itself and the administrator is also deleted.
If a warehouse has been replaced by an administrator, what is the problem? --This warehouse has several items to keep records, it is necessary to modify how many times administrator information.

From here we can conclude that in some special cases, even if the relational pattern meets the requirements of 3NF, there is still the problem of inserting exception, modifying exception and deleting exception, still not "good" design.

The cause of this problem is that the main attribute is dependent on partial function and transfer function of code. (In this case, there is a partial function dependency for the main attribute "warehouse name" for code "(admin, Item name)."

The solution is to eliminate the main attribute on the basis of 3NF to the part of the code and the transfer function dependencies.

Warehouse (warehouse name, admin)
Stock (warehouse name, item name, quantity)

In this way, before the insertion exception, the problem of modifying the exception and removing the exception is resolved.

The above is the explanation about BCNF.


Recently, the body is not very comfortable, can't write. A few more typical exercises and their answers when you are free.
===============================
Question 1:

Li Dezhu: Hello, teacher, I read your answer about database paradigm, one thing is not quite understood, is about the definition of code, if all the attributes except K are fully functional dependent on K to call K as code, then in the judgment of 2NF, how can there be some function of the non-main attribute pairs of code dependent on this situation. I hope the teacher has time to enlighten, thank you

Me: In the definition of "code", all attributes except K should be considered as a set U (i.e., a whole), that is, only K is able to determine every attribute of u in a complete function, then K is the code. K is not a code if K is only able to determine a part of a property in a full function, not a complete function to determine another part of the property.

For example, there are relational schemas R (Sno, sname, Cno, Cname, Sdept, Sloc, Grade), where the function dependency set is f= {
Sno→sname, Sno→sdept, Sdept→sloc,sno→sloc, Cno→cname, (Sno, Cno) →grade}

So the code in R can only be (Sno, Cno), Sno or Cno does not fully function to determine all other attributes except Sno/cno (in fact, can not determine Grade), so the individual Sno and Cno can not be used as code.

So you can get the main attribute: Sno, Cno
Non-primary attributes: Sname, Cname, Sdept, Sloc, Grade

There is a partial function dependency (cno→cname) of the Cname attribute for the code (SNO, Cno) in R. (There are many other examples that are not listed). So R does not meet the requirements of 2NF.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.