Relational database First second third paradigm

Source: Internet
Author: User

Pick self-Knowledge Liu Yan teacher detailed, clear answer for years of doubts. Liu Comfort
Links: https://www.zhihu.com/question/24696366/answer/29189700
Source: Know

Updated on 2015-1-9, answered

@ Li Dezhu's problem ============2015-1-6 updated, supplemented the BCNF explanation ============ the majority of domestic institutions use the Wang Shan of the "Database system Introduction" This textbook, some aspects did not give a very detailed and clear explanation, and practical application is not so closely related , it is quite normal for you to have such a question. I've been teaching database principles for a few years, and a lot of students have raised questions like yours and tried to explain it to you. (basically from the content of my class, some places in order not to be too verbose, give up a certain degree of rigor, mainly in the "relationship" and "table")

The first step is to understand what the "paradigm (NF)" means. According to the textbook definition, the paradigm is "a set of relational patterns that conform to a certain level, which represents the degree of rationalization of the linkages between the various attributes within a relationship". Pretty obscure, huh? In fact, you can roughly interpret it as a level of design criteria that conforms to the table structure of a data table . Like home decoration buy building materials, the most environmentally friendly is the E0 class, followed by E1 class, there are E2 levels and so on. The database paradigm is also divided into 1nf,2nf,3nf,bcnf,4nf,5nf. Generally, when we design relational databases, it is sufficient to consider bcnf. A design that conforms to a high-level paradigm must conform to a low-level paradigm, such as a 2NF-compliant relational model, which must conform to 1NF.

The next step is to explain each level of the paradigm, first in the first paradigm (1NF).

Conforms to the 1NF relationship (you can understand it as a data table.) The difference between "relationship" and "relationship pattern" is similar to the difference between "class" and "Object" in object-oriented programming. "Relationship" is an example of "relational pattern", you can understand "relationship" as a table with data, and "relational mode" is the table structure of this data table. 1NF is defined as: each attribute in a relationship that conforms to 1NF cannot be divided. the situation shown in table 1 does not meet the requirements of 1NF.

Table 1

In fact,1NF is the most basic requirement for all relational databases , and when you create a data table in a relational database management system (RDBMS), such as SQL Server,oracle,mysql, if the datasheet design does not meet this most basic requirement, Then the operation must not be successful. That is, as long as the data table already exists in the RDBMS, it must be 1NF compliant. If we want to show the data in the RDBMS in the table, we have to design the form in table 2 :

Table 2

However, only 1NF design, there will still be too large data redundancy, insert exceptions, delete exceptions, modify the problem of the exception, for example, in table 3 design:

Table 3

    1. Each student's school number, name, department name, dean of the data are repeated several times. The data for each department and the corresponding Dean are repeated multiple times- too much data redundancy
    2. If the school has a new department but has not enrolled any students for the time being (for example, it was created in March, but not until August), it is not possible to add the data of the department name and the dean to the data sheet separately (note 1)-- Insert Exception Note 1: According to the requirements of entity integrity in three relationship integrity constraints, the code in the relationship (note 2) cannot be empty, and the combination of all attributes cannot be duplicated. In order to meet this requirement, the table in the diagram can only use the combination of the study number and the class name as the code, otherwise it will not be possible to divide each record in a unique region. NOTE 2: code: A property in a relationship or a combination of several attributes that distinguishes each tuple (a tuple can be understood as each record in a table, that is, each row).
    3. If the records of all the students in a department are deleted, then the data of all the department and department heads disappears (none of the students in the department does not mean that the department is gone). -- Delete exception
    4. If Ted Physico to the law department, then in order to ensure the consistency of data in the database, we need to revise three records of the Department and department director of the data. -- Modify the exception .

Because only 1NF database design has such problems, we need to improve the design standards, to eliminate the factors that lead to the above four problems, so that they conform to a higher-level paradigm (2NF), which is called "normalization."

The second paradigm (2NF) strict definition in relation theory I don't introduce much here (because it involves more bedding), only need to understand 2NF to 1NF of what improvements can be. The improvement is that2NF is based on 1NF, eliminating the partial function dependency of the non-main attribute on the code . Next, the four concepts involved in this sentence-"function dependency","code","Non-Principal attribute", and "partial function dependency" -are explained.

The function Dependency we can understand (but not particularly strict definition): If in a table, the value of the property (or attribute group) X is determined to determine the value of the property y, then you can say that the Y function depends on X, writing x→y. That is, no two records exist in the data table, they have the same value on the x attribute (or attribute group), and the value on the Y property is different. This is the origin of the "function dependent" name, similar to the function relationship y = f (x), where the value of x is determined, the value of Y must be deterministic.

For example, for the data in table 3, you cannot find any records, they have the same number and the corresponding names are different. So we can say that the name function depends on the school number , writing study number → name . But conversely, because students with the same name may appear, there may be different two student records, they have the same value on the name, but the corresponding number is different, so we can not say that the study number function depends on the name. Other function dependencies in the table are also as follows:

    • Department name → Department head
    • School Number → Department head
    • (School number, class name) → Score

However, the following function dependencies are not true:

    • School number → class name
    • Study number → Score
    • Class name → Department head
    • (School number, class name) → Name

There are also three concepts that can be expanded from the concept of "function dependency":

Full function dependency

In a table, if x→y, and for any true subset of X (if the attribute group X contains more than one attribute), X ' →y is not true, then we call Y for the X full function dependency , which is recorded as X f→y. (that F should be written right above the arrow, no Way out ...), correct notation ( Figure 1)

Figure 1

For example:

    • School Number f→ Name
    • (School number, class name) f→ score (note: Because the same school number corresponding to the score is not certain, the same class name corresponding to the score is not determined)

Partial function dependency

If the Y function is dependent on X, but at the same time Y is not a complete function dependent on X, then we call the Y function dependent on x, which is recorded as X P→y, as shown in Figure 2.


Figure 2


For example:

    • (School number, class name) p→ name

Transfer function Dependency if the Z function relies on y, and the Y function depends on X (thanks

@ The error indicated by the point, here instead: "Y is not included in X, and X does not depend on the function of Y" the premise), then we call the Z-transfer function depends on X, recorded as x t→z, such as Figure 3


Figure 3

The code set K is a property or group of attributes in a table, if all attributes except K are fully functional dependent on K (this "completely" do not miss), then we call K as Candidate Code , abbreviated to code . In practice we can usually understand that if K is determined, the value of all attributes except K is determined, then K is the code. there can be more than one code in a single table. (In the practical application for convenience, usually choose one of the code as the main code )

For example: For table 3,(school number, class name) This attribute group is the code. There is only one code in the table. (assuming that all classes have no duplicate name)

A non-primary property that contains a property in any one code becomes the primary property.

For example: For table 3, there are two main attributes, the number of students and the name of the class .


I can finally go back to see the 2NF. First of all, we need to determine whether table 3 meets 2NF requirements? According to the definition of 2NF, the basis of judgment is actually to see whether there is a partial function dependency of the non-main attribute on the code in the data table. If present, the data table meets the requirements of 1NF only, and if it does not exist, it meets the requirements of 2NF. The method of judging is:

The first step: Find all the codes in the data table. Step two: Find all the main properties based on the code obtained in the first step. Step three: In the datasheet, except for all the main attributes, the rest is a non-primary attribute . Fourth step: see if there is a partial function dependency of the non-primary attribute pair code.

For table 3, according to the previous four steps, we can do this:

The first step:

    1. View all of the individual properties, when its value is determined, whether all remaining property values are determined.
    2. View all attribute groups that contain two attributes, and when its value is determined, whether all remaining property values are determined.
    3. ......
    4. View all attribute groups that contain six attributes, that is, all attributes, when its value is determined, whether all remaining property values are determined.

It looks like a lot of trouble, but here's the trick: If A is a code, then all attribute groups that contain a, such as (A, B), (A,c), (a,b,c), and so on, are not yards (because there is a " full function dependency" in the code requirement).

Figure 4 shows all the function dependencies in the table:

Figure 4

This step is completed, you can get, table 3 of the code only one, that is (school number, class name).

The second step: There are two main attributes: study number and class name


Step three: There are four non-main attributes: name , department name , Dean, score


Fourth step: for (school number, class name) → name , there is a number → name , there is a non-main attribute name to the code (school number, class name) part of the function depends. For (school number, class name) → department name , there is a number → The name , there is a non-main attribute of the name of the Code (school number, class name) part of the functional dependence. For (school number, class name) → department director , there is the number of students → Department director , there are non-main attributes of the code (school number, class name) part of the function of dependency.

Therefore, table 3 existence of non-main properties of the code part of the function dependence, the highest only meet the requirements of 1NF, does not meet the requirements of 2NF.



In order for table 3 to meet 2NF requirements, we must eliminate these partial function dependencies, there is only one way, is to split the big data table into two or more smaller data tables, in the process of splitting, to achieve a higher level of the requirements of the paradigm, this process is called "pattern decomposition." The method of pattern decomposition is not unique, the following is one of the methods: elective course (school number, class name, score) students (student number, name, department name, dean)

Let us first determine the following, the choice of the table and the Student table, whether it meets the requirements of 2NF?

For the selection of the table, the code is (school number, class name), the main attribute is the number and class name , non-main attribute is the score , the number is determined, and can not uniquely determine the score , the class name is determined, also cannot determine the unique score , so there is no non-main attribute score for the Code (school number, class name) part of the function dependence, so this table meets the requirements of 2NF.

For the Student table, the code is the number, The main attribute is the number , the non-main attribute is the name, the department name and the dean, because the code has only one attribute, Therefore, it is not possible to have a non-primary attribute dependent on the partial function of the code, so this table conforms to the 2NF requirement.

Figure 5 shows the new function dependencies after the pattern decomposition

Figure 5

Table 4 shows the new data after the pattern decomposition


Table 4

(It also involves how a pattern decomposition is the correct knowledge point, not the first introduction)

Now let's take a look and do the same thing, do we still have those problems?

    1. Ted Physico to the law department you only need to modify the value of the corresponding system once Ted. -There are improvements
    2. is the data redundancy reduced? The student's name, department name and Dean are not repeated as many times as before. -There are improvements
    3. Deleting all the student records in a department all the information in the department is still missing. -No improvement
    4. Insert a new system with no students yet. This operation is not allowed because the student table code is a study number and cannot be empty. -No improvement

Therefore, only meet the requirements of 2NF, in many cases is not enough, and the cause of the problem is that there is still a non-principal attribute Department Director for the Code number of the transfer function dependency. In order to further solve these problems, we also need to improve the 2NF requirements of the data table to meet the requirements of 3NF.

Third paradigm (3NF) 3NF on the basis of 2NF, eliminate the non-main property of the code transfer function dependency . That is, if there is a non-primary property on the code's transfer function dependency, it does not meet the requirements of 3NF.

Let's look at the design in table 4 to see if it meets the requirements of 3NF.

For the selection table, the main code is (school number, class name), the main attribute is the school number and the class name, the non-main attribute only one, is the fraction, cannot exist the transfer function dependence, therefore the Choice table design, conforms to the 3NF request.

For student tables, the main code is the school number , the main attribute is the number , the non-main attribute is the name , the department name and the Dean. Because the study number → department name, at the same time the department name → Department director, therefore exists the non-main attribute department director to the Code science number 's transfer function dependence, therefore the Student table design, does not meet the 3NF request.

In order for the data table design to reach 3NF, we must further the pattern decomposition into the following form: Elective course (school number, class name, score) Student (school number, name, department name) Department (department name, dean)

For the selection table, meet the requirements of 3NF, has been analyzed before.

For the Student table, the code is the school number , the main attribute is the school number , the non-main attribute is the system name , cannot exist the non-main attribute to the code's transfer function dependence, therefore conforms to the 3NF request.

For the system table, the code is the system name , the main attribute is the system name , the non-main attribute is the Dean, It is not possible to have a non-primary property that is dependent on the pass function of the code (at least three properties are possible to have a transitive function dependency), so it conforms to the 3NF requirement:


New function Dependencies 6

Figure 6

The new data table is shown in table 5


Table 5


Now let's take a look and do the same thing, do we still have those problems?

    1. Deleting all student records in a department does not lose information about the department. -There are improvements
    2. Insert a new system with no students yet. Because the table and student table are currently independent of the two tables, so does not affect. -There are improvements
    3. Data redundancy is even less. -There are improvements

The conclusion is that the database design conforms to the 3NF requirement, which basically solves the problem that the data redundancy is too large, the inserting exception, the modification exception, the deletion exception. Of course, in practice, often for performance or to meet the needs of expansion, often do 2NF or 1NF, but as a database designer, at least should know, 3NF requirements.

============== after six months, finally decided to fill this hole, late ===========

BCNF Paradigm

To understand the BCNF paradigm, look at the question first:

If:

    1. A company has a number of warehouses;
    2. Each warehouse can have only one administrator, an administrator can only work in a warehouse;
    3. A warehouse can hold a variety of items, one item can also be stored in different warehouses. Each item has a corresponding quantity in each warehouse.

What is the primary paradigm of relational mode warehouses (warehouse name, Administrator, item name, quantity)?

A: Known function dependency set: Warehouse name → administrator, administrator → warehouse name, (warehouse name, item name) → Quantity Code: (Administrator, item name), (warehouse name, item name) main attribute: Warehouse name, Administrator, item name non-primary attribute: quantity ∵ There is no partial function dependency and transitive function dependency of non-primary attribute pair code. ∴ This relationship pattern belongs to 3NF.

Relationships based on this relational pattern (specific data) may:



Well, since this relationship pattern already belongs to 3NF, is there a problem with this relational model? Let's look at the following actions:

    1. A new warehouse is added first, but no items have been stored, can I assign an administrator to the warehouse? --No, because the item name is also the primary attribute, the primary attribute cannot be empty, depending on the entity integrity requirements.
    2. When a warehouse is emptied, you need to delete all the storage records associated with this warehouse, what is the problem? -the repository itself and the administrator's information are also deleted.
    3. What is the problem if an administrator is replaced in a warehouse? --This warehouse has several items stored in the record, it is necessary to modify how many times the administrator information.

From here we can conclude that in some special cases, even if the relational pattern conforms to the 3NF requirement, there is still a problem of inserting the exception, modifying the exception and deleting the exception, still not a "good" design.

The cause of this problem: there is a partial function dependency of the main attribute on the code and the transfer function dependency. (In this case there is a partial function dependency of the main attribute "warehouse name" for code "(Administrator, item name)".

The solution is to eliminate the main attribute from the part of the code and the transfer function dependency on the 3NF basis.

Warehouse (warehouse name, Administrator) inventory (warehouse name, item name, quantity)

This way, the previous insert exception, the problem of modifying the exception and removing the exception is resolved.

The above is the explanation about BCNF.


Recently the body is not very comfortable, can't write to move. Have time to put a few typical exercises and their solutions. =============================== Question 1:

Li Dezhu: Hello teacher, I read your answer about the database paradigm, there is little understanding, that is, about the definition of code, if all except K, all attributes are completely dependent on K to be called K, then in the judgment of 2NF, then how can there be a non-main attribute to the part of the function of the code dependency of this situation? I hope the teacher has time to point out, thank you

Me: In the definition of "code", all attributes except K should be considered as a set of U (i.e., a whole), that is, only k can fully function to determine each attribute in U, then K is the code. K is not a code if K can only be able to fully function to determine a subset of the attributes in U, and not the full function to determine another part of the property.

For example, there are relational modes R (Sno, Sname, Cno, Cname, Sdept, Sloc, Grade), where the function dependency set is f= {sno→sname, sno→sdept, Sdept→sloc,sno→sloc, Cno →cname, (Sno, Cno) →grade}

Then the code in R can only be (Sno, Cno), Sno or Cno can not fully determine all other properties except sno/cno (in fact, can not determine Grade), so the separate Sno and Cno can not be used as a code.

So you get the main attribute: Sno, cno non-main attribute: Sname, Cname, Sdept, Sloc, Grade

There is a partial function dependency (cno→cname) for the code (SNO, Cno) in the non-primary attribute Cname in R. (There are many other examples that are not listed.) Therefore R does not meet the requirements of 2NF.

========================================

It took me a few days to write the answer on and off and I was exhausted. See a lot of people have doubts about this, simply write a detailed point, hope to become this knowledge point authoritative answer ... If there are some details of the problem, such as the expression, will be modified, the big aspect, certainly is correct.

Relational database First second third paradigm

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.