.1 First normal form (1NF) No duplicate columns
The so-called First paradigm (1NF) refers to the fact that each column of a database table is an indivisible basic data item and cannot have multiple values in the same column, that is, an attribute in an entity cannot have multiple values or cannot have duplicate properties. If duplicate attributes are present, you may need to define a new entity, which is composed of duplicate attributes, and a one-to-many relationship between the new entity and the original entity. In the first normal form (1NF), each row of a table contains only one instance of information. In short, the first paradigm is a column with no duplicates.
Note: In any relational database, the first paradigm (1NF) is the basic requirement for relational schemas, and a database that does not meet the first normal form (1NF) is not a relational database.
1.2 Second normal form (2NF) property fully dependent on primary key [eliminate partial child function dependency]
The second paradigm (2NF) is established on the basis of the first paradigm (1NF), i.e. satisfying the second normal form (2NF) must first satisfy the first paradigm (1NF). The second normal form (2NF) requires that each instance or row in a database table must be divided by a unique region. For the implementation of the distinction, it is common to add a column to the table to store unique identities for each instance. For example, the Employee Information table has the employee number (emp_id) column added because each employee's employee number is unique, so each employee can be uniquely differentiated. This unique attribute column is called the primary key or primary key, and the main code.
The second normal form (2NF) requires that the attributes of an entity depend entirely on the primary key. The so-called full dependency is the inability to have a property that depends only on the primary key, and if so, this part of the property and the primary key should be separated to form a new entity, and the new entity is a one-to-many relationship with the original entity. For the implementation of the distinction, it is common to add a column to the table to store unique identities for each instance. In short, the second paradigm is that properties depend entirely on the primary key.
1.3 Third normal form (3NF) property does not depend on other non-primary properties [eliminate transitive dependencies]
Satisfying the third normal form (3NF) must first satisfy the second normal form (2NF). In short, the third paradigm (3NF) requires that a database table not contain non-primary key information already contained in other tables. For example, there is a departmental information table, where each department has a department number (dept_id), a department name, a department profile, and so on. Then the department number is listed in the Employee Information table, the department name, department profile and other departments related information can no longer be added to the Employee Information table. If there is no departmental information table, it should be built according to the third paradigm (3NF), otherwise there will be a lot of data redundancy. In short, the third paradigm is that properties do not depend on other non-principal properties.
Ii. Analysis of Paradigm application examples
The following is an example of a school's student system, which illustrates the application of these paradigms. First Paradigm (1NF): The fields in a database table are single attributes and cannot be divided. This single attribute consists of a basic type, including Integer, real, character, logical, date, and so on. In any current relational database management system (DBMS), it is impossible for a fool to make a database that does not conform to the first paradigm, because these DBMS do not allow you to divide a column of a database table into two or more columns. Therefore, it is impossible for you to design a database that does not conform to the first paradigm in your existing DBMS.
First, let's make sure that the content to be designed includes those. School number, student name, age, gender, course, course credit, department, academic results, office address, Department of telephone and other information. For the sake of simplicity we only consider these field information for the time being. There are several aspects to this information that we say are concerned about.
Students have the basic information
What did the students do when they chose those classes?
What are the credits for each lesson?
Students belong to that department, the basic information of the Department is what.
2.1 Example analysis of the second normal form (2NF)
First of all, we consider that all of this information is placed in a table (student, name, age, gender, course, course credit, department, academic results, office address, Office Phone) The following dependencies exist.
(school number) → (name, age, gender, department, office address, Office phone)
(course name) → (credits)
(School number, course) → (academic results)
Analysis of 2.1.1 Problems
Therefore, the following problems will arise if the requirements of the second paradigm are not satisfied
Data redundancy: The same course is taught by n students, and "credits" are repeated n-1 times; the same student took the M course, and the name and age were repeated m-1 times.
Update exception:
1) If the credit of a course is adjusted, the "credit" value of all the lines in the data sheet should be updated, otherwise the same course credit will be different.
2) Suppose that a new course is to be opened and no one has yet been enrolled. Thus, the course name and credits cannot be recorded in the database because there is no "learning number" keyword.
Delete exception: Assuming that a group of students have completed elective courses, these elective records should be removed from the database table. At the same time, however, the course name and credit information were also removed. Obviously, this can also lead to an insertion exception.
2.1.2 Solutions
Change the course of the elective selectcourse to the following three tables:
Student: Student (school number, name, age, gender, department, office address, Office phone);
Course: Course (course name, credits);
Elective relationship: Selectcourse (School number, course name, score).
2.2 Example analysis of the third paradigm (3NF)
Then look at the student table above student (school number, name, age, gender, department, office address, Office Phone), the keyword is a single keyword "study number", because the following decision relationship exists:
(school number) → (name, age, gender, department, office address, Office phone)
But there is a decision relationship
(school number) → (school) → (college location, college phone)
That is, there is a non-critical field "College location", "College phone" to the key field "study number" of the transfer function dependency.
It also has the condition of data redundancy, update exceptions, insert exceptions, and delete exceptions. (Data updates, delete exceptions are not analyzed here, you can refer to 2.1.1 for analysis)
According to the third paradigm, the student relationship table is divided into the following two tables to satisfy the third paradigm:
Student: (School number, name, age, gender, department);
Department: (Department, Office address, Office phone).
Summarize
The database tables above are in compliance with the I,II,III paradigm, eliminating data redundancy, update exceptions, insert exceptions, and delete exceptions
I. Concept of function dependency
The function dependence is defined from the mathematical point of view, which is used in the relationship to characterize the relationship between the various properties of the mutual constraints and interdependent situation. Functional dependency is prevalent in real life, for example, to describe a student's relationship, can have a number of students, name, department and other properties, because a school number corresponding to one and only one student, a student enrolled in a certain department, so when the value of the "student" attribute is determined, "name" and "department" The value is only determined, at this time, you can call the "name" and "department" function depends on "school number", or "learning number" function to determine "name" and "department", as follows: study number → name, study number → the department. The following is an exact definition of a function dependency.
Definition: Set U{a1,a2,...,an} is a collection of attributes, R (u) is a relationship on U, X, Y is a subset of U. For any possible relationship under R (U), there is a value for x that corresponds to the unique value of Y, which is called the Y function dependent on x, which is recorded as X→y. where x is called the determining factor. Then, if there is y→x, it is said that X and y depend on each other, as x←→y. For example, table 1. 2 in the "system" relationship: If the name value is unique, that is, the system names are not the same, then there is a function dependency set:
Department code → Department name, department code → department address, Department code → department phone, Department code → Department of Professional settings.
Department name → Department code, Department name → Department address, Department name → Department phone, Department name → Department of professional settings.
It can be seen that the department name and the code of the system depend on each other, as the name of the system code.
function dependencies can also be subdivided into a variety of function dependencies, respectively, described as follows:
Second, part of the function dependence
Set R (U) is the relationship on the attribute set U, x, Y is a subset of U, X ' is a true subset of X, if x→y and X ' →y, then the y part is said to be dependent on X, recorded as X→py. Obviously, some function dependencies can occur when and only if X is a composite attribute group.
For example, in table 1.6, there is obviously a course number → course name, course number → class code. From another point of view, as long as the course number is certain, and the course name is determined, the course class is only determined, so the program number + course name → class code. However, it differs from the preceding course number → The course code is different, because {course number, curriculum name} There is a true subset: "Course number", course number → class code, we take the course Number ten course name → The program department code is called "Lesson Room Code" part of the function depends on the course number + course name.
Third, complete function dependence
Set R (U) is the relationship on the attribute set U, x, Y is a subset of U, X ' is a true subset of X. If any possible relationship to R (U) is x→y but X ' →y, then the Y complete function is called X, which is recorded as X→fy.
The so-called complete dependency is a description of the dependency on the decision item (that is, the left-hand side of the dependency), there is no redundant attribute, there is a partial dependency of the redundant attribute.
For example, set up relationship mode R,r=r (school number, name, class number, course number, score), easy to know:
"(School number, class number, course number) → score" is a part of R's dependency relationship. Therefore there is a true subset of decisions (number, course number), so that "(school number, course number) → results" set up, and "study number → results" or "course number → results" set up, "(study number, course number) → results" is a complete dependency of R.
Iv. Transfer Function Dependency
Set R (U) is the relationship on the attribute set U, x, Y, Z is a subset of U, in R (U), if x→y, but y→x, if y→z, then x→z, called Z transfer function depends on X, recorded as X→tz.
For example, in a school, each course is taught by a certain teacher, but some teachers can teach many courses, there is a relationship "teaching" as shown in table 3. As shown in 1.
From the above relationship is not difficult to analyze, the course name → employee number, employee number → course name, but the employee number and other attributes of the functional relationship are the determinants, that is, employee number → teacher name, employee number → title, in this case, the teacher name, title transfer function depends on the course name.
Table 3. 1 Teaching Form
Course Name
Employee number
Teacher's name
Gender
Date of birth
Title
English
T1
Zhang ping
Man
55. 6. 3
Professor
Mathematical
T2
Wang Wen
Woman
62. 10. 5
Associate professor
C language
T3
John Doe
Woman
62. 10. 5
Associate professor
Database
T2
Wang Wen
Woman
62. 10. 5
Associate professor
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Closure concept
The following is a written comparative scientific specification of the closure solution method, set X and Y are a subset of the attribute set of the relationship R, F is a function dependency set on R, if any of the set of R property B, once x→b, there must be b⊆y, and any one of r satisfies the above conditions of the attribute set Y1, there must be y⊆y1, At this point, y is the attribute set X in the function dependency set F under the closure, recorded as x+.
The procedure for calculating the closure of a property set X for a relationship R is as follows:
The first step: Set the attribute set that will eventually become a closure is Y, and y is initialized to X;
Step two: Check each function in F depends on A→b, if all properties in attribute set A are in Y, and B has a property that is not in Y, add it to Y;
Step three: Repeat the second step until no attributes can be added to the property set Y. And the last Y is x+.
Example (1): With relationship mode R (u,f), where u={a,b,c,d,e,i},f={a→d,ab→e,bi→e,cd→i,e→c}, calculation (AE) +
Solution: (1) make x={ae},x (0) =ae
(2) in F to find unused left is a subset of AE function dependence, the result is: A→d, e→c; so x (1) =x (0) Dc=acde, obviously X (1) ≠x (0).
(3) In the F search for the unused left is a subset of the ACDE function dependency, the result is: cd→i; so X (2) =x (1) i=acdei. Although X (2) is ≠x (1), there is no subset of X (2) in the left side of the search for the function dependencies that have not been used in F, so it is no longer necessary to calculate (AE) +=acdei.
White: Closures are a collection of all properties that are deduced directly or indirectly by an attribute.
For example,:f={a->b,b->c,a->d,e->f}; can be directly obtained from a B and D, indirectly get C, then A's closure is {a,b,c,d}
Solution theory and algorithm of candidate code
For a given relationship R (A1,A2, ... An) and a function dependency set F, you can divide its properties into 4 categories:
The L class appears only on the left-hand property of the function.
The R class appears only on the right-hand property of the function.
The N class does not appear on the left or right side of the function dependency property.
The LR class has properties that appear on both sides of the function dependency.
Theorem: For a given relationship pattern R and its function dependency set F, if X (X∈r) is an L-class attribute, then x must be a member of either candidate of R.
Inference: For a given relationship pattern R and its function dependency set F, if X (X∈r) is an L-class attribute, and X+ contains all the properties of R, then x must be the unique candidate for R.
Example (2): With the relationship mode R (a,b,c,d), its function depends on set f={d→b,b→d,ad→b,ac→d}, all candidate codes for R are obtained.
Solution: Study F found that the A,c two attribute is an L-class attribute, so AC must be a candidate member of R, and because (AC) +=ABCD, AC is the only candidate for R.
Theorem: For a given relationship pattern R and its function dependency set F, if X (X∈r) is an R class attribute, X is not in any candidate code.
Theorem: For a given relationship pattern R and its function dependency set F, if X (X∈r) is an n class attribute, x must be included in any candidate code of R.
Inference: For a given relationship pattern R and its function dependency set F, if X (X∈r) is a set of properties consisting of L and N classes, and x+ contains all the properties of R, then X is the unique candidate for R.
Closure of the database, paradigm