Three main paradigms of Oracle (reproduced)

Source: Internet
Author: User

Normalization represents the process of removing data redundancy (redundancy) from your data store. If the database design is fully standardized, then all tables are concatenated with the keyword, and no copy of the data (repetition) occurs. The advantage of standardization is obvious, it avoids data redundancy, saves space naturally, also provides the fundamental guarantee to the consistency of data (consistency), eliminate the phenomenon of data inconsistency, and improve the efficiency.

The first paradigm (1NF; The first Normal Form)

The first paradigm is the minimum normalization requirement, and the first paradigm requires that the data table cannot have duplicate records, i.e. there is a keyword. The second requirement of 1NF is that each field is no longer divided, that is, it has been minimized, and the definition of the relational database determines that the database satisfies this article. The primary keyword achieves the following conditions:
1. The primary key field is unique in the table
2. There are no replicas in the primary key field
3. The primary key field cannot have a null value
4. Each record must have a primary key
5. The primary key is the smallest subset of keywords

A relationship pattern that satisfies 1NF has many unnecessary duplicate values and increases the likelihood of omission when modifying its data. In order to avoid this data redundancy and the omission of updating data, the second paradigm (2NF) is introduced.

Second paradigm (the Second normal Form)

Definition: If a relationship belongs to 1NF and all non-primary key fields are completely dependent on the primary key, it is called the second normal and précis-writers is 2NF.
To illustrate the problem, an example is given: there is a storehouse of storage library has four fields (part number, warehouse number, number of parts, warehouse address),
This library conforms to 1NF, where "part number" and "Warehouse number" form the primary key.
However, because the "warehouse address" is only completely dependent on the "warehouse number", which is only dependent on part of the primary keyword, it does not conform to 2NF,
This is where data redundancy first exists because the number of warehouses may not be large.
Second, there is a data inconsistency if a record is omitted if a warehouse address is changed.
Once again, if a part of a warehouse is finished, the address of the warehouse is lost, that is, the relationship does not allow the absence of parts in a warehouse.
We can use the projection decomposition method to eliminate the partial dependence of the situation, and the relationship reached the standard of 2NF.
The method is to decompose the new two-dimensional table from the relationship, and all the non-keywords in each of the two-dimensional tables are completely dependent on their primary keywords.
We can decompose as follows: Decomposition into two tables (part number, warehouse number, number of parts) and (warehouse number, warehouse address).
This will fully conform to the 2NF.

The third paradigm (the third normal Form)

Definition: If a relationship belongs to 2NF, and each non-keyword is not passed dependent on the primary key, this relationship is 3NF.
Removing the transfer dependency from the 2NF is 3NF. For example, there is a table (name, salary level, salary), where the name is the keyword,
This relationship conforms to 2NF, but because the wage level determines the wage, which is called delivery dependency, it does not conform to 3NF,
We can also use the projection decomposition method to decompose into two tables: (name, wage level),
(Salary level, salary amount).

The general situation, standardized to 3NF to meet the need, more standardized degree of bcnf,4nf,5nf, because not commonly used, not for interpretation and discussion. They are all a subset of the upper layers, The canonical approach is to 1nfà (eliminate the partial function dependency of the non-primary attribute on the keyword) à2nfà (eliminate the dependency of the non-primary attribute on the keyword's transfer function) À3nfà (eliminate the partial and transitive dependency of the primary attribute to the keyword) àbcnfà (eliminate the non-trivial and non-functional dependency of multivalued dependencies) À4nfà (eliminate the implied dependent) à5nf.

Projection decomposition
The projection decomposition method is mentioned above, and the normalization process of relational pattern is realized by projection decomposition. This decomposition of the low-level relational model into a number of high-level relational pattern decomposition is not unique, should be in the decomposition of the attention to meet three conditions:
1. Lossless connection decomposition, no loss of information after decomposition
2. Each relationship after decomposition is a high-level paradigm, not sibling or even low-level decomposition
3. The number of decomposition is minimal, which is the perfect requirement, should be done as little as possible.

The pros and cons of normalization
There must be a disadvantage to one's interest. The advantages of normalization are obvious. He avoids a lot of data redundancy, saves space, maintains data consistency,
If you fully reach 3NF, you will not change the same value in more than one place. If your records are constantly changing, this advantage is going to outweigh all the possible drawbacks!
The biggest disadvantage is that you put the information in different tables, increasing the difficulty of the operation, and the cost of connecting multiple tables together is also a huge
("Time Space Swap theory", this theory is invented by the author, do not take out as an argument!) Save time is bound to pay the cost of space, on the contrary, saving space is also bound to pay the cost of time, time and space in the computer field is a contradiction unity, they interact with each other, opposites). Because the table and table connection operation is to do two relations Cartesian product (if table one N records, table two m records, if there is no connection condition, the connection is N*m records, the number is not affordable, rather say a large number of tables connected together), will inevitably produce a lot of useless or even invalid records, The cost of performance is enormous.

Non-normalized (denormalization)
Even if you spend all your lunch breaks, make a fully normalized database (your university professors can attest), it's still not perfect. The performance problems that are caused by standardized design may not be tolerable to you. If this happens, you should be prepared for non-normalization. Non-normalization is what you do to violate the rules of normalization in order to gain performance benefits, and there is no magic in it. It is a performance benefit analysis that tries and re-tries and constantly re-evaluates the process.  It also has many methods, but most of them are related to the actual application, including copying attributes, copying foreign keywords, table merging, table re-grouping and so on, you can choose the most effective method according to the actual application. Introduction

The database design paradigm is the specification that database design needs to meet, the database that satisfies these specification is concise, the structure is clear, at the same time, does not take place insert (insert), delete and update operation exception. The reverse is a mess, not only to the database programmer to create trouble, and ugly, may have stored a large number of unnecessary redundant information.

Is the design paradigm difficult to understand? Not also, the university textbooks give us a bunch of mathematical formulas we certainly do not understand, also can't remember. So many of us are simply not following the paradigm to design the database.

In essence, the design paradigm with very image, very concise words can be said clearly, Tao understand. In this paper, we will explain how to apply these paradigms to practical engineering by using the database of a simple forum designed by the author as an example.

Paradigm Description

First normal form (1NF): The fields in the database table are all single attributes, no further {personal understanding: like a family, there are several sons, the other sons are made up of one part, but a son needs two parts, that is, this is not a normal family, hehe, said too much}. This single attribute consists of a basic type, including Integer, real, character, logical, date, and so on.

For example, the following database tables are in accordance with the first paradigm:
Field 1 Field 2 Field 3 Field 4

Such database tables do not conform to the first paradigm:
Field 1 Field 2 Field 3 Field 4
Field 3.1 Field 3.2

Obviously, in any current relational database management system (DBMS), it is impossible for a fool to make a database that does not conform to the first paradigm, because these DBMS do not allow you to divide a column of a database table into two or more columns. Therefore, it is impossible for you to design a database that does not conform to the first paradigm in your existing DBMS.

Second paradigm (2NF): A partial function dependency of a non-critical field on any of the candidate key fields is absent from the database table (some of the function dependencies refer to situations where some fields in the composite keyword determine non-critical fields), or all non-critical fields are completely dependent on any set of candidate keywords. {Personal understanding: If in a family, any decision can only be the father, the mother after the unanimous pass to be able to count, it is normal; if there is a daughter can only be determined by the mother to do what, then this violates the principle, do not meet the agreement. }


Assume that the selection relationship table is Selectcourse (school number, name, age, course name, score, credits), keyword for the combination of keywords (study number, course name), because of the following decision relationship:

(School number, course name) → (name, age, score, credits)

This database table does not meet the second normal form because of the following decision relationship:

(course name) → (credits)

(school number) → (name, age)

That is, the presence of a field in the combo key determines the non-keyword situation.

Because it does not conform to 2NF, the following questions exist for this class selection relationship:

(1) Data redundancy:

The same course by N students elective, "credit" repeated n-1 times, the same student elective m courses, name and age repeated m-1 times.

(2) Update exception:

If the credit of a course is adjusted, the "credits" value of all the rows in the data sheet should be updated, otherwise the same course credit will be different.

(3) Insert exception:

Suppose a new course is to be opened and no one has yet been enrolled. Thus, the course name and credits cannot be recorded in the database because there is no "learning number" keyword.

(4) Delete exception:

Assuming that a group of students has completed elective courses, these elective records should be removed from the database table. At the same time, however, the course name and credit information were also removed. Obviously, this can also lead to an insertion exception.

Change the course of the elective selectcourse to the following three tables:

Student: Student (school number, name, age);

Course: Course (course name, credits); {Personal Understanding: You can add the ID field as the primary key, because if the course name changes later, if the database runs for 10 years and has 10 million elective records, then you have to update the 10 million records, which is a matter of cost resources. If you have an ID, no matter how your name changes, it will only affect one current record}

Selectcourse (School number, course name, grade). {The corresponding change is as follows: Selectcourse (school number, course ID, score)}

Such database tables conform to the second paradigm, eliminating data redundancy, update exceptions, insert exceptions, and delete exceptions.

In addition, all single-key database tables conform to the second normal form, as there is no possible combination of keywords.

The third paradigm (3NF): On the basis of the second paradigm, if there is no non-critical field in the data table the transfer function dependency on either of the candidate key fields conforms to the third paradigm. The so-called transfer function dependency, refers to if there is a "a→b→c" decision relationship, the C transfer function depends on A. Therefore, a database table that satisfies the third paradigm should not have the following dependencies:

key field → Non-critical field x→ non-critical field Y

Assume that the Student relationship table is student (school number, name, age, school, college location, college phone), the keyword is a single keyword "study number" because of the following decision relationship:

(school number) → (name, age, school, college location, college phone)

This database is 2NF compliant, but does not conform to 3NF because of the following decision relationship:

(school number) → (school) → (college location, college phone)

That is, there is a non-critical field "College location", "College phone" to the key field "study number" of the transfer function dependency.

It also has data redundancy, update exceptions, insert exceptions, and delete exceptions, which readers can analyze on their own.

The Student relations table is divided into the following two tables:

Student: (School number, name, age, school);

College: (College, location, telephone).

Such database tables conform to the third paradigm, eliminating data redundancy, update exceptions, insert exceptions, and delete exceptions.

Boyce-Christie's Paradigm (BCNF): On the basis of the third paradigm, if no field exists in the database table, the transfer function dependency on either of the candidate key fields conforms to the third paradigm. Suppose the Warehouse Management Relationship table is storehousemanage (warehouse ID, store item ID, Administrator ID, number), and an administrator works only in one warehouse, and a warehouse can store multiple items. The following decision relationships exist in this database table:

(Warehouse ID, store item id) → (Administrator id, quantity)

(Admin ID, store item id) → (warehouse ID, quantity)

So, (warehouse ID, store item ID) and (Administrator ID, store item ID) are the candidate keywords for storehousemanage, the only non-critical field in the table is the number, which is in accordance with the third paradigm. However, the following decision relationship exists:

(warehouse id) → (Administrator id)

(Administrator id) → (warehouse id)

That is, the key field determines the critical field, so it does not conform to the BCNF paradigm. It will have the following exception:

(1) Delete exception:

When the repository is emptied, all "store item ID" and "quantity" information is deleted, and the "Warehouse ID" and "Administrator ID" information are also deleted.

(2) Insert exception:

You cannot assign an administrator to a warehouse when no items are stored in the warehouse.

(3) Update exception:

If the warehouse has been replaced by an administrator, the administrator ID for all rows in the table is modified.

Break down the Warehouse management relationship table into two relational tables:

Warehouse Management: Storehousemanage (warehouse ID, administrator ID);

Warehouse: Storehouse (warehouse ID, store item ID, quantity).

Such database tables conform to the BCNF paradigm, eliminating deletion exceptions, insert exceptions, and update exceptions. Paradigm Application

Let's step through the database of a forum with the following information:

(1) User: User name, email, homepage, telephone, contact address

(2) Posts: Post title, post content, reply title, reply content

For the first time, we designed the database as a mere existence table:
User name Email Home Phone Contact address Post title Post Content Reply title Reply content

This database table conforms to the first paradigm, but no set of candidate keywords can determine the entire row of the database table, and the Unique key field user name does not fully determine the entire tuple. We need to add the "Post ID", "Reply id" field, and the table will be modified to:
User name Email Home Phone Contact address Post ID Post title Post Content Reply ID Reply title Reply content

The key words in the data table (user name, post ID, reply ID) can determine the entire line:

(username, post ID, reply id) → (email, homepage, phone, contact address, post title, post content, reply title, reply content)

However, such a design does not conform to the second paradigm, as the following determinants exist:

(username) → (email, homepage, phone, contact address)

(post id) → (post title, post content)

(reply id) → (reply to title, reply to content)

That is, the non-critical fields part of the function relies on candidate key fields, and it is clear that this design results in a large amount of data redundancy and operational anomalies.


We decompose the database tables into (underlined keywords):

(1) User information: User name, email, home, phone, contact address

(2) Post information: Post ID, title, content

(3) Reply message: Reply ID, title, content

(4) Posts: User name, Post ID

(5) Reply: Post ID, reply ID

Such a design is to meet the 1th, 2, 3 paradigm and bcnf paradigm requirements, but this design is not the best?

Not necessarily.

It is observed that the 4th "post" in the "User name" and "Posting ID" is a 1:n relationship, so we can put "post" to the 2nd item "Post information", "Reply" in the 5th "Post id" and "Reply ID" is also a 1:n relationship, so we can put "reply" Merge to the "reply message" in item 3rd. This allows for a quantitative reduction in data redundancy, and the new design is:

(1) User information: User name, email, home, phone, contact address

(2) Post information: User name, post ID, title, content

(3) Reply message: Post ID, reply ID, title, content

database table 1 Clearly satisfies all the paradigm requirements;

In the database table 2, there is a non-critical field "title", "Content" to the key field "Post ID" part of the function dependency, that is, does not meet the requirements of the second paradigm, but this design does not result in data redundancy and operational anomalies;

The database table 3 also has a non-critical field "title", "Content" for the key field "Reply ID" part of the function dependency, also does not meet the requirements of the second paradigm, but similar to database table 2, this design does not result in data redundancy and operation exception.

Thus can be seen, do not have to force to meet the requirements of the paradigm, for 1:n relationship, when the 1 side of the merger to the other side of N, n over there will no longer meet the second paradigm, but this design is better!

In the case of m:n, it is not possible to merge m side or N side to the other side, which will result in non-conforming paradigm and result in operation exception and data redundancy.
For a 1:1 relationship, we can merge the Left 1 or 1 on the right side to the other side, and the design leads to a non-conforming paradigm, but does not result in operational anomalies and data redundancy.

Conclusion

The database design that satisfies the paradigm requirement is structurally clear and avoids data redundancy and operation Anomaly. This also means that designs that do not conform to the paradigm requirements must be wrong, and in the case of a 1:1 or 1:n relationship in a database table, the non-conforming paradigm required by the merger is reasonable.

When we design the database, we must always consider the requirements of the paradigm. Original address: http://www.cnblogs.com/elleniou/archive/2012/08/09/2630433.html

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.