Database paradigm-General handouts and database paradigm Handouts

Source: Internet
Author: User

Database paradigm-General handouts and database paradigm Handouts
Wang Shan's introduction to database systems, a textbook used by most colleges in China, does not provide a very detailed and clear explanation in some aspects, and is not closely related to practical applications, therefore, there are often many questions about learning paradigm. There are also many people who have raised a wide variety of questions. Now, let's try to explain them here and try to make it easy for everyone to understand.

First, we must understand what the "Paradigm (NF)" means. According to the definition in the textbook, the paradigm is a set of relational patterns that conform to a certain level, indicating the degree to which the associations between attributes within a link are rationalized ". Obscure? In fact, you can roughly understand it as a level of design standards that a table structure of a data table conforms. Like buying building materials for home decoration, E0 is the most environmentally friendly, followed by E1 and E2. The database paradigm is also divided into 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF. Generally, BCNF is enough when we design a relational database. In line with the design of the high level paradigm, it must be in line with the low level paradigm, for example, in line with the Relationship Model of 2NF, must comply with 1NF.

Next, we will explain each level-1 paradigm. The first is 1NF ).
1 NF-compliant relationship (you can understand it as a data table. The difference between "Link" and "link mode" is similar to the difference between "class" and "object" in Object-Oriented Programming ." The Link "is an instance of" link mode ". You can regard" Link "as a table with data, while" link mode "is the table structure of this data table. 1NF is defined as: Each attribute in the relationship that conforms to 1NF cannot be further divided. As shown in table 1, 1NF is not met.
Table 1

In fact, 1NF is the most basic requirement of all relational databases. When you create data tables in relational database management systems (RDBMS), such as SQL Server, Oracle, and MySQL, if the design of the data table does not meet this basic requirement, the operation must fail. That is to say, as long as the data table already exists in RDBMS, it must be 1NF. If we want to present the table data in RDBMS, we have to design it as table 2:
Table 2

However, the design that only complies with 1NF still has problems such as excessive data redundancy, insertion exceptions, deletion exceptions, and modification exceptions. For example, for the design in table 3:
Table 3

  1. The student ID, name, Department name, and department head are repeated multiple times. The data of each department and its department head is repeated multiple times --Excessive data redundancy
  2. If the school has created a new department, but it has not yet recruited any students (for example, it was created in February, but it will not be enrolled until February ), therefore, the data of the Department name and department head cannot be separately added to the data table (Note 1 )--Insertion exception

    Note 1: according to the requirements of entity integrity in the three link integrity constraints, any attribute contained in the Code in the Link (note 2) cannot be empty, and the combination of all attributes cannot be repeated. To meet this requirement, the tables in the figure can only combine the student ID and Course name as codes, otherwise each record cannot be located in a unique region.

    NOTE 2:Code: a link or a combination of several attributes, used to distinguish each tuples(You can think of "tuples" as every record in a table, that is, every row).
  3. If we delete the records related to all the students in a department, the data of all the departments and department heads will disappear. (all the students in a department will be absent, it does not mean that this system is gone ). --Deletion exception
  4. If Li Xiaoming is transferred to the Legal Department, in order to ensure data consistency in the database, it is necessary to modify the data of the Department and Department Head in three records. --Modification exception.
Because there is such a problem in the database design that only complies with 1NF, we need to improve the design standards and remove the factors that cause the above four problems, make it conform to a higher Paradigm (2NF), which is called "normalization ".

I will not describe the strict definition of the second Paradigm (2NF) in the relational theory here (because there are many preparations involved). I only need to know what improvements 2NF has made to 1NF. The improvement is that 2NF eliminates some function dependencies of non-primary attributes on codes based on 1NF. Next, we will explain the four concepts involved in this sentence: "function dependency", "code", "non-primary attributes", and "partial function dependency.

Function dependency
We can understand this (but it is not strictly defined): If the value of attribute (or attribute group) X is determined in a table, the value of attribute Y must be determined, so we can say that the Y function depends on X and writes X → Y. That is to say, there are no two records in the data table. They have the same value in the X attribute (or attribute group), but different values in the Y attribute. This is the origin of the "function dependency" name, similar to the Function Relation y = f (x). When the value of x is determined, the value of y must be determined.

For example, if no record is found for the data in table 3, their student IDs are the same and their names are different. So we can say that the name function depends on the student ID, writing student ID → name. However, because there may be students with the same name, there may be two different student records with the same name value but different student IDs, therefore, we cannot say that the student ID function depends on the name. Other function dependencies in the table are as follows:
  • Department name → department head
  • Student ID → department head
  • (Student ID, course name) → score
However, the following function dependency is not true:
  • Student ID → Course name
  • Student ID → score
  • Course name → department head
  • (Student ID, course name) → name
From the concept of "function dependency", there are also three concepts:
Full function dependency
In a table, if X → Y and any real subset of X (if Attribute Group X contains more than one attribute), x' → Y is not true, in this case, we call Y fully functional dependency on X, which is recorded as x f → Y. (That F should be written on the top of the arrow and cannot be typed ......, Figure 1)
Figure 1

For example:
  • Student id f → name
  • (Student ID, course name) F → score (Note: The score corresponding to the same student ID is not determined, and the score corresponding to the same class name is not determined)
Some function dependencies
If the Y function depends on X, but Y is not completely dependent on X, then we call Y function dependent on X, which is recorded as x p → Y, as shown in figure 2.

Figure 2


For example:
  • (Student ID, course name) P → name

Pass function dependency
Assume that the Z function depends on Y, and the Y function depends on X (strictly speaking, there is another X that does not contain Y, and the Y function does not depend on the Z precondition ), in this case, the Z-passing function depends on X and is recorded as x t → Z, as shown in figure 3.


Figure 3

Code
Set K to an attribute or attribute group in a table. If all the attributes except K are completely dependent on K (this "completely" should not be omitted ), then we call K as the candidate code. In practice, we can generally understand that, if K is determined, all the attribute values of the table except K are determined accordingly, then K is the code. A table can have more than one code. (For convenience, one of the codes is usually used as the master code)

For example:
For Table 3, the attribute group (student ID and Course name) is a code. This table only has this code. (Assume that no duplicate names exist in all courses)

Non-Primary attribute
The attribute contained in any code is the primary attribute.

For example:
For Table 3, there are two primary attributes: Student ID and Course name.


Finally, we can look back at 2NF. First, we need to determine whether Table 3 meets 2NF requirements? According to the definition of 2NF, the judgment is based on checking whether there is a function dependency of non-primary attributes on codes in the data table. If yes, the data table can only meet the requirements of 1NF. If no, the data table meets the requirements of 2NF. The judgment method is as follows:

Step 1: Find all the codes in the data table.
Step 2: locate all primary attributes based on the Code obtained in step 1.
Step 3: Remove all primary attributes from the data table, and the rest will be non-primary attributes.
Step 4: Check whether some function dependencies of non-primary attributes on codes exist.

For Table 3, we can do this according to the preceding four steps:
Step 1:
  1. Check whether all the remaining attribute values are determined when the value of each attribute is determined.
  2. Check all the attribute groups that contain two attributes. When its value is determined, whether all the remaining attribute values can be determined.
  3. ......
  4. Check all the attribute groups that contain six attributes. If the value of the attribute group is determined, check whether all the remaining attribute values are determined.
It seems very troublesome, right? But here is A trick, that is, if A is A code, then all the attribute groups that contain A, such as (A, B), (A, C), (A, B, C), and so on, are not codes (because there is A "full function dependency" in the code requirement ").

Figure 4 shows all functional dependencies in the table:
Figure 4

After completing this step, we can see that table 3 has only one code (student ID and Course name ).

Step 2:
There are two main attributes: Student ID and Course name.


Step 3:
There are four non-primary attributes: name, Department name, Department Head, score


Step 4:
For function dependencies (student ID, course name) → name, student ID → name, there are some non-master attribute name pairs (student ID, course name.
For function dependencies (student ID, course name) → department name, student ID → department name, there are some non-main attribute system name pairs (student ID, course name.
For function dependencies (student ID, course name) → department head, student ID → department head, there are some non-master attribute pairs (student ID, course name.

Therefore, Table 3 has some function dependencies of non-primary attributes on codes. The maximum value is 1NF, not 2NF.



To make table 3 conform to 2NF requirements, we must eliminate these functional dependencies. There is only one way to split a large data table into two or more smaller data tables, in the process of splitting, to meet the requirements of a higher level paradigm, this process is called "pattern decomposition". The pattern decomposition method is not unique. one of the following methods is as follows:
Course Selection (student ID, course name, score)
Student (student ID, name, Department name, Department Head)

Let's first determine whether the course selection table and student Table Meet the 2NF requirements?

For the course selection table, the code is (student ID, course name), the primary attribute is student ID and Course name, the non-primary attribute is score, the student ID is determined, and the score cannot be uniquely determined, the course name is determined, and the score cannot be uniquely determined. Therefore, the non-primary attribute score does not depend on some functions of the Code (student ID, course name). Therefore, this table meets 2NF requirements.

For a student table, the student ID is the student ID, the primary attribute is the student ID, and the non-primary attribute is the name, Department name, and department head. Because the Code has only one attribute, therefore, it is impossible for non-primary attributes to depend on some code functions. Therefore, this table meets 2NF requirements.

Figure 5 shows the new function dependency after schema Decomposition
Figure 5

Table 4 indicates new data after schema Decomposition

Table 4

(The correct knowledge point is how to break down the pattern here. I will not introduce it first)

Now let's take a look at whether the same operation still has the previous problems?
  1. Li Xiaoming transferred to the Legal Department
    You only need to modify the system value of Li Xiaoming once. -- Improvements
  2. Is data redundancy reduced?
    The student's name, Department name, and department head are no longer repeated as before. -- Improvements
  3. Delete all student records in a system
    All information about this system is still lost. -- No improvement
  4. Insert information about a new department without any students.
    Because the student table code is a student ID and cannot be blank, this operation is not allowed. -- No improvement
Therefore, it is not enough to meet the 2NF requirement in many cases. The reason for the problem is that the non-primary attribute department head still has a function dependency on passing the student ID. To further solve these problems, we also need to improve the 2NF compliant data table to 3NF compliant.

3NF)
Based on 2NF, 3NF eliminates the function dependency of non-primary attributes on code passing. That is to say, if a non-primary attribute has a function dependency on code passing, it does not meet the 3NF requirement.

Next, let's see if the design in Table 4 meets the 3NF requirements.

For the course selection table, the primary code is (student ID, course name), the primary attribute is student ID and Course name, and the non-primary attribute has only one score, which makes it impossible to transmit function dependencies, therefore, the course selection table is designed to meet the 3NF requirements.

For student tables, the master code is the student ID, the master attribute is the student ID, and the non-master attribute is the name, Department name, and department head. Because the student ID → department name, and department name → department head, there is a function dependency on the transfer of student ID for the non-master attribute department head. Therefore, the student table design does not meet the 3NF requirements ..

To design a data table to 3NF, we must further break down the schema into the following forms:
Course Selection (student ID, course name, score)
Student (student ID, name, Department name)
Department (Department name, Department Head)

The course selection table meets the 3NF requirements and has been analyzed before.

For a student table, the student ID code is the student ID, the primary attribute is the student ID, and the non-primary attribute is the department name. It is impossible for non-primary attributes to have function dependencies on code passing, so it meets the 3NF requirements.

For a department table, the Code is the department name, the primary attribute is the department name, and the non-primary attribute is the department head, it is impossible for non-primary attributes to have code-passing function dependencies (at least three attributes are required for passing function dependencies). Therefore, 3NF is required ..


New function dependency 6
Figure 6

New data tables, such as table 5


Table 5


Now let's take a look at whether the same operation still has the previous problems?
  1. Delete all student records in a system
    The system information will not be lost. -- Improvements
  2. Insert information about a new department without any students.
    Because the system table and student table are currently two independent tables, it is not affected. -- Improvements
  3. Data redundancy is reduced. -- Improvements

Conclusion
It can be seen that the 3NF-compliant database design basically solves the problems of excessive data redundancy, insertion exceptions, modification exceptions, and deletion exceptions. Of course, in practice, 2NF or 1NF is often used to meet performance or scaling needs. But as a Database Designer, we should at least know what the 3NF requirement is.


BCNF paradigm
To understand the BCNF paradigm, let's take a look at the following question:
If:
  1. A company has several warehouses;
  2. Each Warehouse can only have one administrator, and one administrator can only work in one warehouse;
  3. Multiple items can be stored in one warehouse, and one item can be stored in different warehouses. Each item has a corresponding quantity in each warehouse.
What kind of paradigm does the relational model repository (repository name, Administrator, item name, and quantity) belong?

A: known function dependency set: Repository name → administrator, Administrator → repository name, (repository name, item name) → quantity
Code: (Administrator, item name), (warehouse name, item name)
Main attributes: Repository name, administrator, and item name
Non-Primary attribute: Quantity
The consumer does not have some function dependencies of non-primary attributes on codes and transmits function dependencies. Ignore this link mode is 3NF.

The Link (specific data) based on this link mode may be:


Well, since this relational model already belongs to 3NF, is there a problem with this relational model? Let's take a look at the following operations:
  1. Can I assign an administrator to a new warehouse without storing any items? -- No. Because the item name is also the primary attribute, the primary attribute cannot be blank according to the object integrity requirements.
  2. After a warehouse is cleared, you need to delete the storage records of all items related to the warehouse. What problems will this cause? -- The information of the warehouse itself and the Administrator is also deleted.
  3. If a warehouse has changed its administrator, what problems will it bring? -- The number of Administrator information changes required for storing several items in this warehouse.
From this point, we can draw a conclusion that in some special cases, even if the link mode meets the 3NF requirements, there are still insertion exceptions, modification exceptions, and deletion exceptions, still not a "good" design.

Cause of this problem: some function dependencies of the main attribute on the Code and the function dependencies of passing the code exist. (In this example, some function dependencies of the main attribute [repository name] For Code [(Administrator, item name)] exist.

The solution is to eliminate the code part of the main attribute and transfer function dependencies Based on 3NF.

Repository (repository name, Administrator)
Inventory (warehouse name, item name, quantity)

In this way, the previous insertion exception and the problem of modification and deletion exceptions are solved.

The above is the explanation of BCNF.


Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.