The most critical issue when designing and operating a database is to ensure that the data is correctly distributed to the tables in the database. Using the correct data structure not only facilitates the corresponding access to the database, but also greatly simplifies other content in the application (queries, forms, reports, code, etc.) and designs the table according to "Database Normalization", which is designed to reduce data redundancy in the database to increase data consistency.
Generalization is a fine-grained process after the initial work of identifying a data element in a database, a relationship, and defining the required tables and items in each table. Common paradigms are 1NF, 2NF, 3NF, BCNF, and 4NF. The following is a brief analysis of these common paradigms.
1, 1NF (first paradigm)
The first paradigm is that each column in a database table is an indivisible base data item and cannot have multiple values in the same column, that is, an attribute in an entity cannot have multiple values or cannot have duplicate properties.
If duplicate attributes are present, you may need to define a new entity, which is composed of duplicate attributes, and a one-to-many relationship between the new entity and the original entity. The pattern of the first paradigm requires that attribute values be no longer split into smaller parts, that is, attribute items cannot be attribute combinations or consist of a set of properties.
In short, the first paradigm is a column with no duplicates. For example, a table of "employee Number" "Name" "Phone number" (a person may have an office phone and a mobile phone), then normalized to 1NF can be divided into "office phone" and "mobile phone" two attributes, namely staff (employee number, name, office phone, mobile phone).
2, 2NF (second paradigm)
The second paradigm (2NF) is established on the basis of the first paradigm (1NF), i.e. satisfying the second normal form (2NF) must first satisfy the first paradigm (1NF). The second normal form (2NF) requires that each instance or row in a database table must be divided by a unique region. For the implementation of the distinction, it is common to add a column to the table to store unique identities for each instance.
If the relational model R is the first paradigm, and every non-primary property in R relies on a candidate key for R, then R is the second normal pattern (if a is a property of the candidate key for the relationship pattern R, then a is the main property of R, otherwise a is a non-primary attribute of R).
For example, in the Selection relationship table (school number, course number, score, credit), the keyword is a combination of keywords (study number, course number), but because the non-primary attribute credits depend only on the course number, the keyword (school number, course number) is only a partial dependency, rather than full reliance, so this way can lead to data redundancy and update The solution is to divide it into two relational models: Student tables (school numbers, course numbers, scores) and course tables (course numbers, credits), and new relationships are linked through the outside keyword course number in the student table and connected when needed.
3, 3NF (third paradigm)
If the relational model R is the second paradigm, and each non-primary attribute does not pass a candidate key that relies on R, then R is the pattern of the third paradigm.
Take the student form (school number, name, course number, score) as an example, where the name of the student is not the same, so the table has two candidate code (school number, course number) and (name, course number), so there is a functional dependency: study number--name, (student number, course number)-- The only non-primary attribute score does not have a partial dependency on the code, and there is no transitive dependency, so the attribute belongs to the third paradigm.
4. BCNF (BC Paradigm)
It is constructed on the basis of the third paradigm, and if the relational model R is the first paradigm, and each attribute does not pass a candidate key that relies on R, then the R is called the BCNF pattern.
Suppose the Warehouse Management Relationship table (warehouse number, store item number, administrator number, number), satisfies an administrator to work only in one warehouse; a warehouse can store multiple items, the following relationship exists:
(Warehouse number, storage item number)--(Administrator number, number)
(Administrator number, storage item number)--(warehouse number, quantity)
Therefore, (warehouse number, storage item number) and (Administrator number, storage item number) is the candidate code of the Warehouse Management relation table, the only non-critical field in the table is the quantity, it conforms to the third normal form. However, the following decision relationship exists:
(warehouse number)--(administrator number)
(Administrator number)--(warehouse number)
That is, the key field determines the critical field, so it does not conform to BCNF. The Warehouse Management Relationship table is decomposed into two relational tables warehouse management table (warehouse number, administrator number) and warehouse table (warehouse number, store item number, quantity), so that the database table is BCNF, and eliminates delete exception, insert exception and update exception.
5, 4NF (fourth paradigm)
Set R is a relational model and D is a multivalued dependent collection on R. If there is a multi-valued dependent x->y in D, X must be a super-key of R, then R is the mode of the fourth paradigm.
For example, the staff table (employee number, name of employee, employee elective course), in this table, the same employee may have more than one worker child's name, similarly, the same employee may have a number of staff elective courses, that is, there is a multi-valued fact, not in line with the fourth paradigm. If you want to conform to the fourth paradigm, you only need to divide the above table into two tables, so that they only have a multi-valued fact, such as staff table one (employee number, employee name), Staff Table II (employee number, staff elective course), two tables have only one multivalued fact, so in line with the fourth paradigm.
Expansion: The diagram for each paradigm is as follows: