[Repost] Database normal forms 1NF, 2NF, 3NF, BCNF (with examples): an easy-to-understand explanation
This article should be a real help to most students who are new to database principles. After reading the whole post you should clearly understand the four main normal forms of a database. If anything is unclear, leave a message so we can discuss it.
A design normal form (normal form, database design normal form) is a set of relational schemas that conform to a certain level. Building a database must follow certain rules, and in a relational database these rules are the normal forms. Relations in a relational database must meet certain requirements, that is, satisfy different normal forms. There are currently six numbered normal forms for relational databases: first normal form (1NF), second normal form (2NF), third normal form (3NF), fourth normal form (4NF), fifth normal form (5NF), and sixth normal form (6NF); Boyce-Codd normal form (BCNF), covered later, sits between 3NF and 4NF. The normal form with the minimum requirements is 1NF. A relation that satisfies further requirements on top of 1NF is in 2NF, and the remaining normal forms follow by analogy. In general, a database only needs to satisfy 3NF. Below we illustrate 1NF, 2NF, and 3NF with examples.
In the process of creating a database, normalization is the process of converting the data into a set of tables so that results retrieved from the database are clearer. This can mean that some data is repeated across tables and that additional tables are created. Normalization is a refinement step that follows the initial work of identifying the data elements and relationships in the database and defining the required tables and the items in each table.
Here is an example of a table to be normalized:
Customer | Item purchased | Purchase price
Thomas   | Shirt          | $
         | Tennis shoes   | $
Evelyn   | Shirt          | $
Pajaro   | Trousers       | $
If the table above is used to store item prices and you delete one of the customers, the price is deleted along with them. Normalization solves this problem: split the table into two, one storing each customer and what they purchased, the other storing each product and its price, so that adding or deleting rows in one table does not affect the other.
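The two-table split can be sketched with Python's built-in sqlite3 module. The table names (product, purchase), the column names, and the prices are all assumptions made for illustration; they do not come from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        item  TEXT PRIMARY KEY,
        price REAL NOT NULL          -- hypothetical prices, for illustration only
    );
    CREATE TABLE purchase (
        customer TEXT NOT NULL,
        item     TEXT NOT NULL REFERENCES product(item),
        PRIMARY KEY (customer, item)
    );
""")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [("Shirt", 40.0), ("Trousers", 25.0)])
conn.executemany("INSERT INTO purchase VALUES (?, ?)",
                 [("Thomas", "Shirt"), ("Evelyn", "Shirt"), ("Pajaro", "Trousers")])

# Deleting a customer removes only that customer's purchase rows.
conn.execute("DELETE FROM purchase WHERE customer = ?", ("Pajaro",))

remaining_customers = [r[0] for r in conn.execute(
    "SELECT DISTINCT customer FROM purchase ORDER BY customer")]
trousers_price = conn.execute(
    "SELECT price FROM product WHERE item = ?", ("Trousers",)).fetchone()[0]
print(remaining_customers, trousers_price)
```

Even though the only customer who bought trousers is gone, the price of trousers is still stored in the product table.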
Introduction to the normal forms of relational database design
1 First normal form (1NF)
In any relational database, first normal form (1NF) is the basic requirement on a relational schema; a database that does not satisfy 1NF is not a relational database.
First normal form (1NF) means that each column of a table is an indivisible basic data item and that a single column cannot hold multiple values; that is, an attribute of an entity cannot have multiple values, and an entity cannot have repeated attributes. If repeated attributes appear, you may need to define a new entity made up of those attributes, with a one-to-many relationship between the original entity and the new one. In 1NF, each row of a table contains the information of only one instance. For example, in the employee information table of Figure 3-2, the employee information cannot all be placed in a single column, nor can two or more of its columns be shown in one column; each row represents only one employee, and each employee's information appears only once in the table. In short, 1NF means no repeated columns.
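As a minimal sketch of moving a repeated attribute into a new entity, assume an employee can have several phone numbers. The phone attribute, the employee_phone table, and the sample rows are hypothetical; only emp_id appears in the text.

```python
import sqlite3

# Unnormalized input: one column holds several values, which violates 1NF.
raw_employees = [
    (1, "Ann", "555-0100;555-0101"),
    (2, "Bob", "555-0200"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- New entity for the repeated attribute: one employee, many phone rows.
    CREATE TABLE employee_phone (
        emp_id INTEGER NOT NULL REFERENCES employee(emp_id),
        phone  TEXT NOT NULL,
        PRIMARY KEY (emp_id, phone)
    );
""")
for emp_id, name, phones in raw_employees:
    conn.execute("INSERT INTO employee VALUES (?, ?)", (emp_id, name))
    conn.executemany("INSERT INTO employee_phone VALUES (?, ?)",
                     [(emp_id, p) for p in phones.split(";")])

phone_rows = conn.execute(
    "SELECT emp_id, phone FROM employee_phone ORDER BY emp_id, phone").fetchall()
print(phone_rows)
```

Each phone number now sits in its own row, so every column holds a single indivisible value.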
2 Second normal form (2NF)
Second normal form (2NF) builds on first normal form (1NF): to satisfy 2NF, a table must first satisfy 1NF. 2NF requires that each instance, or row, in a table can be uniquely distinguished. To achieve this, it is usually necessary to add a column that stores a unique identifier for each instance. In the employee information table of Figure 3-2, an employee number (emp_id) column is added; because each employee's number is unique, each employee can be uniquely distinguished. This unique attribute column is called the primary key (also known as the primary keyword or primary code).
2NF requires that the attributes of an entity depend fully on the primary key. Full dependency means that no attribute may depend on only part of the primary key. If such an attribute exists, it and that part of the primary key should be split off to form a new entity, which has a one-to-many relationship with the original entity. In short, 2NF means that no non-key attribute is partially dependent on the primary key.
3 Third normal form (3NF)
To satisfy third normal form (3NF), a table must first satisfy second normal form (2NF). In short, 3NF requires that a table not contain non-key information that is already contained in another table. For example, suppose there is a department information table in which each department has a department number (dept_id), a department name, a department profile, and so on. Once the department number is listed in the employee information table of Figure 3-2, the department name, department profile, and other department-related information must not be added to the employee information table as well. If the department information table does not exist, it should be created in accordance with 3NF; otherwise there will be a great deal of data redundancy. In short, 3NF means that attributes do not depend on other non-key attributes.
A worked analysis of applying the three normal forms of database design
The normal forms are the specifications that a database design needs to meet. A database that satisfies them is concise and clearly structured, and does not suffer from insert, delete, or update anomalies. The opposite is a mess: it not only creates trouble for the database programmer but is also ugly, and it may store a large amount of unnecessary redundant information.
Are the design normal forms hard to understand? Not really. University textbooks present them as a pile of mathematical formulas that, naturally, we neither understand nor remember, so many of us simply do not follow the normal forms when designing a database.
In fact, the normal forms can be explained clearly in vivid, concise language. This article explains them in plain terms and, using the database of a simple forum designed by the author as an example, shows how to apply them in practical engineering.
Normal form descriptions
First normal form (1NF): every field in a table is a single attribute that cannot be divided further. A single attribute is made up of a basic type, such as integer, real, character, logical, or date.
For example, the following table conforms to 1NF:
Field 1 | Field 2 | Field 3 | Field 4
A table like the following does not conform to 1NF, because Field 3 is divided into sub-fields:
Field 1 | Field 2 | Field 3               | Field 4
        |         | Field 3.1 | Field 3.2 |
Clearly, in any current relational database management system (DBMS), even a fool could not build a table that violates 1NF in this way, because these DBMSs do not allow one column of a table to be split into two or more sub-columns. It is therefore impossible to design a table with such divided columns in an existing DBMS.
Second normal form (2NF): no non-key field in the table is partially functionally dependent on any candidate key (a partial functional dependency is one in which some of the fields of a composite key determine a non-key field); in other words, every non-key field is fully dependent on every candidate key.
Suppose the course-selection relation is SelectCourse (student ID, name, age, course name, score, credits). Its key is the composite key (student ID, course name), because the following dependency holds:
(student ID, course name) → (name, age, score, credits)
This table does not satisfy 2NF, because the following dependencies also hold:
(course name) → (credits)
(student ID) → (name, age)
That is, individual fields of the composite key determine non-key fields.
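The dependencies above can be checked against sample data with a small helper. This is only a sketch: the holds function, the column names, and the rows are made up for illustration. A check on sample rows can disprove a dependency, but it cannot prove that the dependency holds for all possible data.

```python
def holds(rows, lhs, rhs):
    """Return True if equal values for `lhs` always come with equal values for `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # Remember the first right-hand value seen for this key; any later
        # row with the same key but a different value breaks the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical SelectCourse rows.
rows = [
    {"student_id": 1, "name": "Li",   "age": 20, "course": "Math",    "score": 90, "credits": 4},
    {"student_id": 1, "name": "Li",   "age": 20, "course": "Physics", "score": 85, "credits": 3},
    {"student_id": 2, "name": "Wang", "age": 21, "course": "Math",    "score": 70, "credits": 4},
]

# Partial dependencies: one field of the composite key is enough.
print(holds(rows, ["course"], ["credits"]))
print(holds(rows, ["student_id"], ["name", "age"]))
# Score needs the whole key: student_id alone does not determine it.
print(holds(rows, ["student_id"], ["score"]))
print(holds(rows, ["student_id", "course"], ["score"]))
```

In this sample, credits and (name, age) each follow from a single key field, while score is determined only by the full composite key.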
Because it does not conform to 2NF, this course-selection relation has the following problems:
(1) Data redundancy:
If the same course is taken by n students, its "credits" value is repeated n-1 times; if the same student takes m courses, their name and age are repeated m-1 times.
(2) Update anomaly:
If the credits of a course are adjusted, the "credits" value must be updated in every row for that course; otherwise the same course will show different credits.
(3) Insert anomaly:
Suppose a new course is to be offered and no one has enrolled yet. Because there is no "student ID" value for the key, the course name and credits cannot be recorded in the database.
(4) Delete anomaly:
Suppose a group of students has completed their courses and their enrollment records are to be deleted from the table. The course names and credits are deleted at the same time. Obviously, this can also lead to an insert anomaly.
Split the course-selection table SelectCourse into the following three tables:
Student: Student (student ID, name, age);
Course: Course (course name, credits);
Course selection: SelectCourse (student ID, course name, score).
These tables conform to 2NF and eliminate the data redundancy, update anomaly, insert anomaly, and delete anomaly.
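The three-table design might look like this in SQLite, driven from Python. The column names and sample values are assumptions for illustration; the table names follow the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (student_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE Course  (course_name TEXT PRIMARY KEY, credits INTEGER);
    CREATE TABLE SelectCourse (
        student_id  INTEGER REFERENCES Student(student_id),
        course_name TEXT    REFERENCES Course(course_name),
        score       INTEGER,
        PRIMARY KEY (student_id, course_name)
    );
""")
conn.execute("INSERT INTO Student VALUES (1, 'Li', 20)")
conn.execute("INSERT INTO Course VALUES ('Math', 4)")
conn.execute("INSERT INTO SelectCourse VALUES (1, 'Math', 90)")

# Insert anomaly gone: a new course can be recorded before anyone enrolls.
conn.execute("INSERT INTO Course VALUES ('Physics', 3)")

# Update anomaly gone: changing the credits touches exactly one row.
updated = conn.execute(
    "UPDATE Course SET credits = 5 WHERE course_name = 'Math'").rowcount

# Delete anomaly gone: removing enrollments keeps the course information.
conn.execute("DELETE FROM SelectCourse WHERE student_id = 1")
courses = conn.execute(
    "SELECT course_name, credits FROM Course ORDER BY course_name").fetchall()
print(updated, courses)
```

Each of the three anomalies listed earlier corresponds to one statement here that now behaves as expected.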
In addition, every table whose key is a single field conforms to 2NF, because a partial dependency on a composite key cannot arise.
Third normal form (3NF): on the basis of 2NF, a table conforms to 3NF if no non-key field is transitively functionally dependent on any candidate key. A transitive functional dependency means that if the dependencies "A → B → C" hold, then C is transitively dependent on A. A table in 3NF should therefore not contain a dependency of the following form:
key field → non-key field x → non-key field y
Suppose the student relation is Student (student ID, name, age, college, college location, college phone). Its key is the single field "student ID", because the following dependency holds:
(student ID) → (name, age, college, college location, college phone)
This table conforms to 2NF but not to 3NF, because the following dependency chain holds:
(student ID) → (college) → (college location, college phone)
That is, the non-key fields "college location" and "college phone" are transitively dependent on the key field "student ID".
This table also suffers from data redundancy and from update, insert, and delete anomalies, which readers can analyze for themselves.
Split the student relation into the following two tables:
Student: (student ID, name, age, college);
College: (college, location, phone).
These tables conform to 3NF and eliminate the data redundancy, update anomaly, insert anomaly, and delete anomaly.
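A rough way to see that this split loses no information is to project some sample rows onto the two tables and join them back. The rows below are invented for illustration.

```python
students_flat = [
    # (student_id, name, age, college, college_location, college_phone) -- sample data
    (1, "Li",   20, "Science", "Building A", "555-0001"),
    (2, "Wang", 21, "Science", "Building A", "555-0001"),
    (3, "Zhao", 19, "Arts",    "Building B", "555-0002"),
]

# Project onto Student(student_id, name, age, college).
student = sorted({row[:4] for row in students_flat})
# Project onto College(college, location, phone); repeated colleges collapse to one entry.
college = {row[3]: row[3:] for row in students_flat}

# Joining on "college" reconstructs the original rows.
rejoined = [s + college[s[3]][1:] for s in student]

print(len(college))
print(rejoined == sorted(students_flat))
```

The location and phone of each college are now stored once, however many students belong to it.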
Boyce-Codd normal form (BCNF): on the basis of 3NF, a table conforms to BCNF if no field at all, key or non-key, is transitively functionally dependent on any candidate key.
Suppose the warehouse management relation is StorehouseManage (warehouse ID, stored item ID, administrator ID, quantity), where an administrator works in only one warehouse and a warehouse can store multiple items. The following dependencies hold in this table:
(warehouse ID, stored item ID) → (administrator ID, quantity)
(administrator ID, stored item ID) → (warehouse ID, quantity)
So (warehouse ID, stored item ID) and (administrator ID, stored item ID) are both candidate keys of StorehouseManage, and the only non-key field in the table is quantity, so the table conforms to 3NF. However, the following dependencies also hold:
(warehouse ID) → (administrator ID)
(administrator ID) → (warehouse ID)
That is, a key field determines another key field, so the table does not conform to BCNF. It suffers from the following anomalies:
(1) Delete anomaly:
When a warehouse is emptied, all of its "stored item ID" and "quantity" information is deleted, and the "warehouse ID" and "administrator ID" information is deleted along with it.
(2) Insert anomaly:
An administrator cannot be assigned to a warehouse while no items are stored in it.
(3) Update anomaly:
If a warehouse changes its administrator, the administrator ID must be modified in every row of the table for that warehouse.
Split the warehouse management relation into two relations:
Warehouse management: StorehouseManage (warehouse ID, administrator ID);
Warehouse: Storehouse (warehouse ID, stored item ID, quantity).
These tables conform to BCNF and eliminate the delete, insert, and update anomalies.
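One common way to test for BCNF is to compute attribute closures and flag every non-trivial dependency whose left side is not a superkey. The sketch below uses that textbook formulation rather than the wording above; the function names and attribute identifiers are assumptions for illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """List non-trivial dependencies lhs -> rhs whose left side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(attributes)]

W, I, A, Q = "warehouse_id", "item_id", "admin_id", "quantity"

original = {W, I, A, Q}
original_fds = [
    (frozenset({W, I}), frozenset({A, Q})),
    (frozenset({A, I}), frozenset({W, Q})),
    (frozenset({W}), frozenset({A})),
    (frozenset({A}), frozenset({W})),
]
# Reports the two single-field dependencies between warehouse and administrator.
print(bcnf_violations(original, original_fds))

# After the split, each table is checked against the dependencies that apply to it.
manage_ok = bcnf_violations({W, A}, [(frozenset({W}), frozenset({A})),
                                     (frozenset({A}), frozenset({W}))]) == []
store_ok = bcnf_violations({W, I, Q}, [(frozenset({W, I}), frozenset({Q}))]) == []
print(manage_ok, store_ok)
```

In the original relation neither warehouse ID nor administrator ID alone determines every attribute, which is why both single-field dependencies are flagged; in the two smaller tables every left side is a key.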
Applying the normal forms
Let's design, step by step, the database of a forum that holds the following information:
(1) Users: user name, email, homepage, phone, contact address
(2) Posts: post title, post content, reply title, reply content
At first, we design the database as a single table:
User name | Email | Homepage | Phone | Contact address | Post title | Post content | Reply title | Reply content
This table conforms to 1NF, but no candidate key can determine the whole row: the only key field, user name, does not fully determine the tuple. We need to add "post ID" and "reply ID" fields, changing the table to:
User name | Email | Homepage | Phone | Contact address | Post ID | Post title | Post content | Reply ID | Reply title | Reply content
Now the key (user name, post ID, reply ID) determines the whole row:
(user name, post ID, reply ID) → (email, homepage, phone, contact address, post title, post content, reply title, reply content)
However, this design does not conform to 2NF, because the following dependencies hold:
(user name) → (email, homepage, phone, contact address)
(post ID) → (post title, post content)
(reply ID) → (reply title, reply content)
That is, non-key fields are partially functionally dependent on the candidate key. Clearly, this design leads to a large amount of data redundancy and to operation anomalies.
We decompose the table into the following tables:
(1) User information: user name (key), email, homepage, phone, contact address
(2) Post information: post ID (key), title, content
(3) Reply information: reply ID (key), title, content
(4) Posting: user name, post ID
(5) Replying: post ID, reply ID
This design satisfies 1NF, 2NF, 3NF, and BCNF. But is it the best design?
Not necessarily.
Notice that in table 4, "Posting", "user name" and "post ID" are in a 1:n relationship, so "Posting" can be merged into table 2, "Post information". Likewise, in table 5, "Replying", "post ID" and "reply ID" are in a 1:n relationship, so "Replying" can be merged into table 3, "Reply information". This reduces data redundancy to some extent, and the new design is:
(1) User information: user name, email, homepage, phone, contact address
(2) Post information: user name, post ID, title, content
(3) Reply information: post ID, reply ID, title, content
Table 1 clearly satisfies all the normal forms.
In table 2, with (user name, post ID) treated as the key, the non-key fields "title" and "content" are partially functionally dependent on the key field "post ID", so 2NF is not satisfied; even so, this design does not lead to data redundancy or operation anomalies.
In table 3, likewise, the non-key fields "title" and "content" are partially functionally dependent on the key field "reply ID", so 2NF is not satisfied either; but, as with table 2, this design does not lead to data redundancy or operation anomalies.
As this shows, it is not always necessary to force a design to satisfy the normal forms. For a 1:n relationship, when the "1" side is merged into the "n" side, the "n" side no longer satisfies 2NF, yet the design is better.
For an m:n relationship, neither the m side nor the n side can be merged into the other; doing so violates the normal forms and also leads to operation anomalies and data redundancy.
For a 1:1 relationship, either "1" side can be merged into the other; the resulting design does not satisfy the normal forms, but it does not lead to operation anomalies or data redundancy.
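The merged three-table forum design could be written in SQLite roughly as follows. The table and column names and the sample rows are assumptions. The sketch also declares post_id and reply_id alone as primary keys, which is a choice made here for simplicity, whereas the discussion above treats the merged tables as having composite keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_info (
        user_name TEXT PRIMARY KEY,
        email TEXT, homepage TEXT, phone TEXT, contact_address TEXT
    );
    -- "Posting" merged in: each post row carries its author's user name.
    CREATE TABLE post_info (
        post_id   INTEGER PRIMARY KEY,
        user_name TEXT NOT NULL REFERENCES user_info(user_name),
        title TEXT, content TEXT
    );
    -- "Replying" merged in: each reply row carries the ID of its post.
    CREATE TABLE reply_info (
        reply_id INTEGER PRIMARY KEY,
        post_id  INTEGER NOT NULL REFERENCES post_info(post_id),
        title TEXT, content TEXT
    );
""")
conn.execute("INSERT INTO user_info (user_name, email) VALUES ('alice', 'alice@example.com')")
conn.execute("INSERT INTO post_info VALUES (1, 'alice', 'Hello', 'First post')")
conn.executemany("INSERT INTO reply_info VALUES (?, ?, ?, ?)",
                 [(10, 1, "Re: Hello", "Welcome"), (11, 1, "Re: Hello", "Hi there")])

# Each post title is stored once, however many replies the post has.
reply_counts = conn.execute("""
    SELECT p.title, p.user_name, COUNT(r.reply_id)
    FROM post_info AS p LEFT JOIN reply_info AS r ON r.post_id = p.post_id
    GROUP BY p.post_id
""").fetchall()
print(reply_counts)
```

The 1:n links are kept as ordinary columns on the "n" side, so no separate Posting or Replying table is needed.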
Conclusion
A database design that satisfies the normal forms is clearly structured and avoids data redundancy and operation anomalies. This does not mean that a design that violates them is necessarily wrong: where a 1:1 or 1:n relationship exists between tables, a merge that breaks the normal forms can be reasonable.
When designing a database, always keep the requirements of the normal forms in mind.