Introduction
Database normal forms are the rules that a database design should satisfy. A database that follows them is concise and clearly structured, and it does not suffer from insertion, deletion, or update anomalies. A database that ignores them is a mess: it causes trouble for the programmers who work with it, it is ugly, and it may store a large amount of unnecessary redundant information.
Are normal forms hard to understand? Not really. University textbooks hand us a pile of mathematical formulas that we cannot understand, let alone remember, so many of us simply ignore normal forms when designing databases.
In essence, normal forms can be explained clearly, in simple and vivid words, so that they are understood at once. This article explains them and shows how to apply them in practice, using as an example the database of a simple forum designed by the author.
Normal Form Descriptions
First Normal Form (1NF): every field in a table holds a single, indivisible (atomic) attribute of a basic type, such as integer, real, character, logical (Boolean), or date.
For example, the following table satisfies 1NF:
Field 1 | Field 2 | Field 3 | Field 4
        |         |         |
The following table does not satisfy 1NF, because Field 3 is subdivided into Field 3.1 and Field 3.2:
Field 1 | Field 2 | Field 3               | Field 4
        |         | Field 3.1 | Field 3.2 |
In any current relational database management system (DBMS), not even a careless designer could build a table that violates 1NF, because these DBMSs do not allow a column of a table to be split into two or more sub-columns. It is therefore impossible to design a database that violates 1NF in an existing DBMS.
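When records arrive from outside a relational DBMS (for example as nested JSON), the same rule can be checked before loading them. The following is a minimal Python sketch under that assumption; the row mirrors the second table above, and the function names and sample values are hypothetical. It tests whether every value is one of the basic types listed in the 1NF definition, and flattens a subdivided field into one column per sub-field.

```python
from datetime import date

# Basic (atomic) types named in the 1NF definition:
# integer, real, character, logical, and date.
ATOMIC_TYPES = (int, float, str, bool, date)

def is_first_normal_form(row: dict) -> bool:
    """True if every field value in the row is a single atomic value."""
    return all(isinstance(v, ATOMIC_TYPES) for v in row.values())

def flatten(row: dict) -> dict:
    """Replace a field that has sub-fields (one level deep) with one column per sub-field."""
    flat = {}
    for name, value in row.items():
        if isinstance(value, dict):
            flat.update(value)  # e.g. "Field 3" -> "Field 3.1", "Field 3.2"
        else:
            flat[name] = value
    return flat

# Hypothetical row mirroring the non-1NF table: Field 3 has two sub-fields.
row = {"Field 1": 1, "Field 2": "abc",
       "Field 3": {"Field 3.1": 2.5, "Field 3.2": date(2024, 1, 1)},
       "Field 4": True}

print(is_first_normal_form(row))           # False
print(is_first_normal_form(flatten(row)))  # True
```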
Second Normal Form (2NF): a table in 1NF is in 2NF if no non-key field is partially functionally dependent on any candidate key (a partial functional dependency is one in which some of the fields of a composite key determine a non-key field). Equivalently, every non-key field is fully functionally dependent on every candidate key.
Assume the course-selection relation is SelectCourse(student ID, name, age, course name, score, credits), whose key is the composite key (student ID, course name), since the following dependency holds:
(student ID, course name) → (name, age, score, credits)
This table does not satisfy 2NF, because the following dependencies also hold:
(course name) → (credits)
(student ID) → (name, age)
That is, part of the composite key determines non-key fields.
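A functional dependency can be spot-checked against sample rows: rows that agree on the left-hand side must agree on the right-hand side. The sketch below is a hedged illustration with hypothetical data and field names; sample data can only refute a dependency, never prove it, because dependencies come from the business rules rather than from the current contents of the table.

```python
from typing import Iterable, Sequence

def fd_holds(rows: Iterable[dict], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Check whether lhs -> rhs holds in the sample rows: rows that agree
    on every lhs field must also agree on every rhs field."""
    seen = {}
    for row in rows:
        key = tuple(row[f] for f in lhs)
        value = tuple(row[f] for f in rhs)
        if seen.setdefault(key, value) != value:
            return False  # same lhs, different rhs: dependency violated
    return True

# Hypothetical SelectCourse rows (student ID, name, age, course name, score, credits).
select_course = [
    {"student_id": 1, "name": "Li", "age": 20, "course": "DB", "score": 90, "credits": 3},
    {"student_id": 1, "name": "Li", "age": 20, "course": "OS", "score": 85, "credits": 4},
    {"student_id": 2, "name": "Wang", "age": 21, "course": "DB", "score": 78, "credits": 3},
]

# Each part of the composite key alone determines some non-key fields (partial dependencies):
print(fd_holds(select_course, ["course"], ["credits"]))          # True
print(fd_holds(select_course, ["student_id"], ["name", "age"]))  # True
# ...but the score needs the whole key:
print(fd_holds(select_course, ["student_id"], ["score"]))        # False
```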
Because it violates 2NF, this course-selection relation has the following problems:
(1) Data redundancy:
If a course is taken by n students, its credits are repeated n-1 times; if a student takes m courses, the student's name and age are repeated m-1 times.
(2) Update anomaly:
If the credits of a course change, the credits value must be updated in every row for that course; otherwise the same course will show different credits.
(3) Insertion anomaly:
Suppose a new course is offered and no one has enrolled yet. Because the key field "student ID" has no value, the course name and credits cannot be recorded in the database.
(4) Deletion anomaly:
Suppose a group of students has finished their elective courses, so their enrolment records are deleted from the table. This also deletes the course names and credits. Clearly, the course then cannot be recorded again until someone enrols, which is the insertion anomaly above.
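The three anomalies can be reproduced in a throwaway database. The following is a minimal sketch using Python's built-in sqlite3 module; the table and column names and the data are hypothetical. The key columns are declared NOT NULL explicitly because SQLite, unlike most DBMSs, otherwise allows NULL in non-integer primary-key columns.

```python
import sqlite3

# Hypothetical un-normalized SelectCourse table (violates 2NF).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE SelectCourse (
    student_id INTEGER NOT NULL, name TEXT, age INTEGER,
    course TEXT NOT NULL, score INTEGER, credits INTEGER,
    PRIMARY KEY (student_id, course))""")
conn.executemany("INSERT INTO SelectCourse VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, "Li", 20, "DB", 90, 3), (2, "Wang", 21, "DB", 78, 3)])

# Update anomaly: changing credits in only one row leaves the course
# with two different credit values.
conn.execute("UPDATE SelectCourse SET credits = 4 WHERE student_id = 1 AND course = 'DB'")
distinct_credits = conn.execute(
    "SELECT COUNT(DISTINCT credits) FROM SelectCourse WHERE course = 'DB'").fetchone()[0]
print(distinct_credits)  # 2

# Insertion anomaly: a course with no students cannot be stored,
# because the key field student_id would be missing.
try:
    conn.execute("INSERT INTO SelectCourse (course, credits) VALUES ('AI', 2)")
    inserted = True
except sqlite3.IntegrityError:
    inserted = False
print(inserted)  # False

# Deletion anomaly: deleting every enrolment also loses the course's credits.
conn.execute("DELETE FROM SelectCourse WHERE course = 'DB'")
remaining = conn.execute("SELECT credits FROM SelectCourse WHERE course = 'DB'").fetchall()
print(remaining)  # []
```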
Split the SelectCourse relation into the following three tables:
Student: Student(student ID, name, age);
Course: Course(course name, credits);
Course selection: SelectCourse(student ID, course name, score).
These tables satisfy 2NF and eliminate the data redundancy as well as the update, insertion, and deletion anomalies.
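As a hedged sketch of the decomposed design, here is one possible SQLite schema (through Python's sqlite3) with foreign keys linking the three tables; the identifiers and data are illustrative, not prescribed by the text. It shows that a course can now be stored before anyone enrols, that credits are changed in one place, and that deleting enrolments keeps course data.

```python
import sqlite3

# Hypothetical SQLite schema for the three 2NF tables (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Student (
    student_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    age INTEGER);
CREATE TABLE Course (
    course TEXT PRIMARY KEY,
    credits INTEGER NOT NULL);
CREATE TABLE SelectCourse (
    student_id INTEGER NOT NULL REFERENCES Student(student_id),
    course TEXT NOT NULL REFERENCES Course(course),
    score INTEGER,
    PRIMARY KEY (student_id, course));
""")

# A new course can be recorded before anyone enrols (no insertion anomaly).
conn.executemany("INSERT INTO Course VALUES (?, ?)", [("AI", 2), ("DB", 3)])
conn.execute("INSERT INTO Student VALUES (1, 'Li', 20)")
conn.execute("INSERT INTO SelectCourse VALUES (1, 'DB', 90)")

# Credits live in one row, so a single UPDATE changes them everywhere.
conn.execute("UPDATE Course SET credits = 4 WHERE course = 'DB'")
rows = conn.execute("""
    SELECT s.name, sc.course, c.credits, sc.score
    FROM SelectCourse sc
    JOIN Student s ON s.student_id = sc.student_id
    JOIN Course c ON c.course = sc.course""").fetchall()
print(rows)  # [('Li', 'DB', 4, 90)]

# Deleting enrolments no longer deletes course information.
conn.execute("DELETE FROM SelectCourse")
course_count = conn.execute("SELECT COUNT(*) FROM Course").fetchone()[0]
print(course_count)  # 2
```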
In addition, every table whose key is a single field is in 2NF, because there is no composite key whose part could determine a non-key field.
Third Normal Form (3NF): a table in 2NF is in 3NF if no non-key field is transitively functionally dependent on any candidate key. A transitive functional dependency exists when the dependencies A → B → C hold (with B not determining A): C is then transitively dependent on A. Therefore, a table in 3NF must not contain a dependency chain of the form:
key field → non-key field X → non-key field Y
Assume the student relation is Student(student ID, name, age, college, college location, college phone), with the single-field key "student ID", since the following dependency holds:
(student ID) → (name, age, college, college location, college phone)
This table is in 2NF but not in 3NF, because of the following dependency chain:
(student ID) → (college) → (college location, college phone)
That is, the non-key fields "college location" and "college phone" are transitively dependent on the key field "student ID".
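Transitive dependencies can be found mechanically with the standard attribute-closure algorithm: repeatedly apply every dependency whose left-hand side is already covered. In this hedged Python sketch, only the direct dependencies are recorded (attribute names are illustrative), and the closure derives that "student ID" reaches the college location and phone only through "college". Because the closure of "college" does not contain "student ID", "college" is not a key, which is exactly the 3NF violation.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes functionally determined by `attrs`
    under the dependencies `fds` (a list of (lhs, rhs) frozenset pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Direct dependencies of the Student example (illustrative attribute names).
fds = [
    (frozenset({"student_id"}), frozenset({"name", "age", "college"})),
    (frozenset({"college"}), frozenset({"college_location", "college_phone"})),
]

print(sorted(closure({"student_id"}, fds)))
# ['age', 'college', 'college_location', 'college_phone', 'name', 'student_id']
print(sorted(closure({"college"}, fds)))
# ['college', 'college_location', 'college_phone']
```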
This table also suffers from data redundancy and from update, insertion, and deletion anomalies; readers can work these out for themselves.
Split the Student relation into the following two tables:
Student: Student(student ID, name, age, college);
College: College(college, location, phone).
These tables satisfy 3NF and eliminate the data redundancy as well as the update, insertion, and deletion anomalies.
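A decomposition is only useful if joining the pieces gives back exactly the original rows (a lossless join). That is guaranteed here because the shared attribute "college" is the key of the College table. The following minimal Python sketch, with hypothetical sample rows, projects the original relation onto the two 3NF tables and natural-joins them back.

```python
# Hypothetical rows of the original Student relation:
# (student ID, name, age, college, college location, college phone)
student = {
    (1, "Li", 20, "CS", "Building A", "555-1000"),
    (2, "Wang", 21, "CS", "Building A", "555-1000"),
    (3, "Zhao", 22, "Math", "Building B", "555-2000"),
}

# Project onto the two 3NF tables; using sets drops the repeated college facts.
student_3nf = {(sid, name, age, college) for sid, name, age, college, _, _ in student}
college_3nf = {(college, loc, phone) for _, _, _, college, loc, phone in student}

# Natural join on "college" rebuilds the original relation (lossless join).
rejoined = {
    (sid, name, age, college, loc, phone)
    for sid, name, age, college in student_3nf
    for c, loc, phone in college_3nf
    if c == college
}
print(len(college_3nf))     # 2: each college's location and phone stored once
print(rejoined == student)  # True
```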
Boyce-Codd Normal Form (BCNF): a table in 3NF is in BCNF if no field, whether key or non-key, is transitively dependent on any candidate key. Stated more precisely, in BCNF the left-hand side of every nontrivial functional dependency is a superkey (it contains a candidate key).
Suppose the warehouse-management relation is StorehouseManage(warehouse ID, item ID, administrator ID, quantity), where each administrator works in only one warehouse, each warehouse has exactly one administrator, and each warehouse can store many kinds of items. The table has the following dependencies:
(warehouse ID, item ID) → (administrator ID, quantity)
(administrator ID, item ID) → (warehouse ID, quantity)
Therefore, (warehouse ID, item ID) and (administrator ID, item ID) are both candidate keys of StorehouseManage. The only non-key field is quantity, and the table is in 3NF. However, the following dependencies also hold:
(warehouse ID) → (administrator ID)
(administrator ID) → (warehouse ID)
That is, a key field is determined by another key field (part of one candidate key determines part of the other), so the table is not in BCNF. It suffers from the following anomalies:
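The BCNF test can be mechanized: compute the attribute closure of each dependency's left-hand side and flag every dependency whose left-hand side does not reach all attributes (that is, is not a superkey). A minimal Python sketch, assuming illustrative attribute names for the warehouse fields:

```python
def closure(attrs, fds):
    """All attributes determined by `attrs` under the (lhs, rhs) dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Return the dependencies whose left-hand side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != all_attrs]

attrs = frozenset({"warehouse_id", "item_id", "admin_id", "quantity"})
fds = [
    (frozenset({"warehouse_id", "item_id"}), frozenset({"admin_id", "quantity"})),
    (frozenset({"admin_id", "item_id"}), frozenset({"warehouse_id", "quantity"})),
    (frozenset({"warehouse_id"}), frozenset({"admin_id"})),
    (frozenset({"admin_id"}), frozenset({"warehouse_id"})),
]
for lhs, rhs in bcnf_violations(attrs, fds):
    print(sorted(lhs), "->", sorted(rhs))
# ['warehouse_id'] -> ['admin_id']
# ['admin_id'] -> ['warehouse_id']
```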
(1) Deletion anomaly:
When a warehouse is emptied, all of its item ID and quantity information is deleted, and the warehouse ID and administrator ID information is deleted along with it.
(2) Insertion anomaly:
An administrator cannot be assigned to a warehouse that does not yet store any items.
(3) Update anomaly:
If a warehouse gets a new administrator, the administrator ID must be changed in every row for that warehouse.
Decompose the warehouse-management relation into two tables:
Warehouse management: StorehouseManage(warehouse ID, administrator ID);
Warehouse: Storehouse(warehouse ID, item ID, quantity).
These tables satisfy BCNF and eliminate the deletion, insertion, and update anomalies.
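The BCNF tables can be sketched in SQLite (via Python's sqlite3) to confirm that the anomalies disappear. This is a hedged illustration: the identifiers and data are hypothetical, and the UNIQUE constraint on the administrator column is one assumed way to encode "each administrator works in only one warehouse".

```python
import sqlite3

# Hypothetical SQLite version of the two BCNF tables.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE StorehouseManage (
    warehouse_id INTEGER PRIMARY KEY,
    admin_id INTEGER NOT NULL UNIQUE);  -- one administrator per warehouse and vice versa
CREATE TABLE Storehouse (
    warehouse_id INTEGER NOT NULL REFERENCES StorehouseManage(warehouse_id),
    item_id INTEGER NOT NULL,
    quantity INTEGER NOT NULL,
    PRIMARY KEY (warehouse_id, item_id));
""")

# An administrator is assigned while the warehouse still holds no items.
conn.execute("INSERT INTO StorehouseManage VALUES (1, 100)")
conn.executemany("INSERT INTO Storehouse VALUES (?, ?, ?)", [(1, 7, 50), (1, 8, 20)])

# Replacing the administrator touches exactly one row.
updated = conn.execute(
    "UPDATE StorehouseManage SET admin_id = 101 WHERE warehouse_id = 1").rowcount
print(updated)  # 1

# Emptying the warehouse keeps the warehouse/administrator assignment.
conn.execute("DELETE FROM Storehouse WHERE warehouse_id = 1")
assignments = conn.execute("SELECT * FROM StorehouseManage").fetchall()
print(assignments)  # [(1, 101)]
```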
Applying the Normal Forms
Let's work step by step through the database of a forum that stores the following information:
(1) Users: user name, email, homepage, phone, contact address
(2) Posts: post title, post content, reply title, reply content
In our first attempt, we design the database as a single table:
User name | Email | Homepage | Phone | Contact address | Post title | Post content | Reply title | Reply content
This table satisfies 1NF, but no set of candidate key fields determines an entire row: the only candidate, "user name", does not determine the whole tuple. We need to add the fields "post ID" and "reply ID", and the table becomes:
User name | Email | Homepage | Phone | Contact address | Post ID | Post title | Post content | Reply ID | Reply title | Reply content
The key (user name, post ID, reply ID) determines the entire row:
(user name, post ID, reply ID) → (email, homepage, phone, contact address, post title, post content, reply title, reply content)
However, this design does not satisfy 2NF, because the following dependencies exist:
(user name) → (email, homepage, phone, contact address)
(post ID) → (post title, post content)
(reply ID) → (reply title, reply content)
That is, non-key fields are partially dependent on the candidate key, and this design clearly leads to heavy data redundancy and operation anomalies.
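The 2NF check itself can be automated: for every proper subset of a composite key, compute its attribute closure and report any non-key attributes it determines. A hedged Python sketch using only the three dependencies listed above (the attribute names are illustrative; the function names are hypothetical helpers, not a library API):

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by `attrs` under the (lhs, rhs) dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, fds):
    """Map each proper subset of a composite key to the non-key attributes
    it determines; any entry indicates a 2NF violation."""
    key = frozenset(key)
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) - key
            if determined:
                found[subset] = sorted(determined)
    return found

# Dependencies of the single forum table, as listed in the text.
fds = [
    (frozenset({"user_name"}), frozenset({"email", "homepage", "phone", "address"})),
    (frozenset({"post_id"}), frozenset({"post_title", "post_content"})),
    (frozenset({"reply_id"}), frozenset({"reply_title", "reply_content"})),
]
found = partial_dependencies({"user_name", "post_id", "reply_id"}, fds)
for subset, determined in found.items():
    if len(subset) == 1:
        print(subset[0], "->", determined)
# post_id -> ['post_content', 'post_title']
# reply_id -> ['reply_content', 'reply_title']
# user_name -> ['address', 'email', 'homepage', 'phone']
```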
We decompose the table into the following tables:
(1) User information: user name, email, homepage, phone, contact address
(2) Post information: post ID, title, content
(3) Reply information: reply ID, title, content
(4) Posting: user name, post ID
(5) Replying: post ID, reply ID
This design satisfies 1NF, 2NF, 3NF, and BCNF. But is it the best design?
Not necessarily.
Observe that in table (4), "Posting", the relationship between user name and post ID is 1:n, so "Posting" can be merged into table (2), "Post information". Likewise, in table (5), "Replying", the relationship between post ID and reply ID is 1:n, so "Replying" can be merged into table (3), "Reply information". This reduces data redundancy to some extent, and the new design is:
(1) User information: user name, email, homepage, phone, contact address
(2) Post information: user name, post ID, title, content
(3) Reply information: post ID, reply ID, title, content
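In SQL terms, merging the "1" side into the "n" side usually means storing the "1" side's key as a foreign key column on the "n" side. The following is a minimal SQLite sketch (through Python's sqlite3) of the three-table design under that assumption; the table names, column names, and sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE UserInfo (
    user_name TEXT PRIMARY KEY,
    email TEXT, homepage TEXT, phone TEXT, address TEXT);
-- The former "Posting" table is merged in: the 1-side key becomes a column on the n side.
CREATE TABLE PostInfo (
    post_id INTEGER PRIMARY KEY,
    user_name TEXT NOT NULL REFERENCES UserInfo(user_name),
    title TEXT, content TEXT);
-- Likewise, the former "Replying" table is merged into ReplyInfo.
CREATE TABLE ReplyInfo (
    reply_id INTEGER PRIMARY KEY,
    post_id INTEGER NOT NULL REFERENCES PostInfo(post_id),
    title TEXT, content TEXT);
""")
conn.execute("INSERT INTO UserInfo (user_name, email) VALUES ('alice', 'alice@example.com')")
conn.execute("INSERT INTO PostInfo VALUES (1, 'alice', 'Hello', 'First post')")
conn.executemany("INSERT INTO ReplyInfo VALUES (?, 1, ?, ?)",
                 [(10, "Re: Hello", "Hi!"), (11, "Re: Hello", "Welcome")])

# Each fact is stored once; joins recover the combined view.
rows = conn.execute("""
    SELECT p.title, u.email, COUNT(r.reply_id)
    FROM PostInfo p
    JOIN UserInfo u ON u.user_name = p.user_name
    LEFT JOIN ReplyInfo r ON r.post_id = p.post_id
    GROUP BY p.post_id""").fetchall()
print(rows)  # [('Hello', 'alice@example.com', 2)]
```

The foreign keys also reject a reply to a post that does not exist, which the separate "Replying" table previously had to guarantee on its own.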
Table (1) clearly satisfies all the normal forms.
In table (2), taking (user name, post ID) as the key, the non-key fields "title" and "content" are partially dependent on the key field "post ID", so the table does not satisfy 2NF; yet this design causes no data redundancy or operation anomalies.
Table (3), taking (post ID, reply ID) as the key, likewise has the non-key fields "title" and "content" partially dependent on the key field "reply ID", so it also fails 2NF; but, as with table (2), this design causes no data redundancy or operation anomalies.
So we need not force every table to meet the normal forms. For a 1:n relationship, when the "1" side is merged into the "n" side, the n-side table no longer satisfies 2NF, but the design is better.
For an m:n relationship, neither side can be merged into the other: doing so violates the normal forms and causes operation anomalies and data redundancy.
For a 1:1 relationship, either side can be merged into the other. The resulting design does not conform to the normal forms, but it causes no operation anomalies or data redundancy.
Conclusion
A database design that satisfies the normal forms is clearly structured and avoids data redundancy and operation anomalies. This does not mean, however, that a design that departs from the normal forms is necessarily wrong: when tables have a 1:1 or 1:n relationship, merging them and thereby departing from the normal forms is reasonable.
Still, whenever we design a database, we must keep the normal forms in mind.
Summary:
1: The relationship between a base table and its fields should satisfy 3NF as far as possible. However, a design that satisfies 3NF is often not the best one. To make the database run faster, it is often necessary to lower the level of normalization: add redundancy where appropriate, trading space for time.
Example: a base table storing goods is shown in Table 1. The "amount" field means the table does not satisfy 3NF, because the amount can be obtained by multiplying the unit price by the quantity, which makes it a redundant field. However, adding this redundant field speeds up statistical queries; this is the practice of trading space for time.
Rose 2002 distinguishes two kinds of columns: data columns and computed columns. A column such as "amount" is called a computed column, while columns such as "unit price" and "quantity" are called data columns.
Table 1: Structure of the goods table
Product name | Model | Unit price | Quantity | Amount
TV | 29 inch | 2,500 | 40 | 100,000
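A stored computed column is only worth its space if it stays consistent with the data columns. One way to do that, sketched here under assumptions (SQLite through Python's sqlite3, hypothetical table and column names, the Table 1 row as data), is a pair of triggers that recompute "amount" whenever a row is inserted or its unit price or quantity changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Goods (
    name TEXT, model TEXT,
    unit_price INTEGER NOT NULL,
    quantity INTEGER NOT NULL,
    amount INTEGER);  -- redundant computed column: unit_price * quantity

-- Recompute the redundant column whenever a data column is written.
CREATE TRIGGER goods_amount_ins AFTER INSERT ON Goods
BEGIN
    UPDATE Goods SET amount = NEW.unit_price * NEW.quantity WHERE rowid = NEW.rowid;
END;
CREATE TRIGGER goods_amount_upd AFTER UPDATE OF unit_price, quantity ON Goods
BEGIN
    UPDATE Goods SET amount = NEW.unit_price * NEW.quantity WHERE rowid = NEW.rowid;
END;
""")

conn.execute("INSERT INTO Goods (name, model, unit_price, quantity) "
             "VALUES ('TV', '29 inch', 2500, 40)")
amount_after_insert = conn.execute("SELECT amount FROM Goods").fetchone()[0]
print(amount_after_insert)  # 100000

conn.execute("UPDATE Goods SET quantity = 50")
amount_after_update = conn.execute("SELECT amount FROM Goods").fetchone()[0]
print(amount_after_update)  # 125000

# Statistics read the stored column instead of recomputing it per row.
total = conn.execute("SELECT SUM(amount) FROM Goods").fetchone()[0]
print(total)  # 125000
```

The triggers only set "amount", so they do not re-trigger the UPDATE OF unit_price, quantity trigger; the cost is extra work on every write, which is the time side of the space-for-time trade.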
2: A database design with no redundancy is achievable. However, a database without redundancy is not necessarily the best one; sometimes, to improve efficiency, one must lower the level of normalization and deliberately keep redundant data. The usual practice is to follow 3NF when designing the conceptual data model and to lower the level of normalization when designing the physical data model. Lowering the normal form means adding fields and allowing redundancy.
3: An E-R diagram has no single standard answer, because its design and layout are not unique: any diagram that covers the business scope and functional requirements of the system is feasible, and one that does not must be revised. Although there is no unique answer, that does not mean the diagram can be designed arbitrarily. A good E-R diagram has a clear structure, concise relationships, a moderate number of entities, a sensible allocation of attributes, and no low-level redundancy.
Reference: http://www.iteye.com/topic/281611
Analysis of application examples of the three normal forms of database design