[Summary] Relational database normalization

Source: Internet
Author: User

Abstract: The core of relational database design theory is the set of functional dependencies among data, and the degree of normalization of a relation is one measure of a design's quality. Redundancy and anomalies in a database usually stem from the functional dependencies among attributes.

I. Definition of the relational model

Formally, a relation schema is a five-tuple:
R(U, D, DOM, F)
where
R is the relation name;
U is the set of attribute names (the attribute group);
D is the set of domains (each a set of values of the same type) from which the attributes in U take their values;
DOM is the mapping from the attributes in U to the domains in D;
F is the set of data dependencies among the attributes in U.
Because D and DOM have little bearing on schema design, this article treats a relation schema as the triple R(U, F).
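As a concrete picture of the simplified triple R(U, F), here is a minimal Python sketch of it as a data structure. The names `FD`, `Schema`, and `fd` are hypothetical, invented for this illustration, and single-letter attribute names are assumed so that a string such as "AB" can stand for the set {A, B}. The sample schema is the one used in Example 1 later in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FD:
    """A functional dependency lhs -> rhs over attribute names."""
    lhs: frozenset
    rhs: frozenset


def fd(lhs: str, rhs: str) -> FD:
    """Hypothetical helper; assumes single-letter attribute names, e.g. fd("AB", "E")."""
    return FD(frozenset(lhs), frozenset(rhs))


@dataclass
class Schema:
    """The simplified relation schema R(U, F)."""
    name: str
    attributes: frozenset  # U
    fds: list              # F, a list of FD objects

    def __post_init__(self):
        # Every dependency in F may only mention attributes of U.
        for dep in self.fds:
            if not (dep.lhs | dep.rhs) <= self.attributes:
                raise ValueError(f"{dep} uses attributes outside U")


# The schema of Example 1: U = {A, B, C, D, E}, F = {A→C, C→D, (A, B)→E}.
example1 = Schema("R", frozenset("ABCDE"), [fd("A", "C"), fd("C", "D"), fd("AB", "E")])
```

Validating F against U at construction time is only a convenience; the theory itself does not require any particular representation.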

II. Definitions related to functional dependency

Let R(U) be a relation schema on the attribute set U, and let X and Y be subsets of U.
1. Functional dependency:
If, for every possible relation r of R(U), r cannot contain two tuples that agree on the attributes in X but differ on the attributes in Y, then X functionally determines Y (equivalently, Y is functionally dependent on X), written X→Y. (This is analogous to Y = f(X), where f is a single-valued function.)
2. Non-trivial functional dependency:
If X→Y and Y⊈X, then X→Y is called a non-trivial functional dependency.
Trivial functional dependency:
If X→Y and Y⊆X, then X→Y is called a trivial functional dependency.
3. Full functional dependency:
If X→Y and no proper subset X′ of X satisfies X′→Y, then Y is fully functionally dependent on X, written $X \xrightarrow{F} Y$. (For example, if (X₁, X₂)→Y but neither X₁ nor X₂ alone determines Y, then $(X_1, X_2) \xrightarrow{F} Y$.)
4. Partial functional dependency:
If X→Y but Y is not fully functionally dependent on X, then Y is partially functionally dependent on X, written $X \xrightarrow{P} Y$.
5. Transitive functional dependency:
If X→Y (Y⊈X), Y↛X, and Y→Z (Z⊈Y), then Z is transitively functionally dependent on X.
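The five definitions can be tried out with a short Python sketch. `fd_holds` checks definition 1 against a single relation instance; since the definition quantifies over every possible relation, one instance can only show that a dependency is violated, never prove that it holds. The remaining functions decide questions about a dependency set F with the standard attribute-closure algorithm (X→Y follows from F exactly when Y is contained in the closure X⁺). All function names, the sample rows, and the encoding of dependencies as pairs of attribute strings are assumptions made for illustration; F is the dependency set of Example 1 below.

```python
from itertools import combinations


def fd_holds(rows, X, Y):
    """Definition 1 on one instance: no two rows agree on X but differ on Y."""
    xs, ys = sorted(X), sorted(Y)
    seen = {}  # X-value -> the Y-value first observed with it
    for row in rows:
        x_val = tuple(row[a] for a in xs)
        y_val = tuple(row[a] for a in ys)
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is derivable, the right side is too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def determines(X, Y, fds):
    """True if X -> Y follows from fds."""
    return set(Y) <= closure(X, fds)


def is_trivial(X, Y):
    """Definition 2: X -> Y is trivial exactly when Y ⊆ X."""
    return set(Y) <= set(X)


def is_full(X, Y, fds):
    """Definitions 3 and 4: True if X -> Y is full, False if it is partial."""
    if not determines(X, Y, fds):
        raise ValueError("X -> Y does not follow from fds")
    xs = sorted(set(X))
    for k in range(len(xs)):  # every proper subset, including the empty set
        for subset in combinations(xs, k):
            if determines(subset, Y, fds):
                return False  # a proper subset already determines Y
    return True


def is_transitive(X, Y, Z, fds):
    """Definition 5: Z depends transitively on X through Y."""
    X, Y, Z = set(X), set(Y), set(Z)
    return (determines(X, Y, fds) and not Y <= X       # X -> Y, Y ⊈ X
            and not determines(Y, X, fds)              # Y does not determine X
            and determines(Y, Z, fds) and not Z <= Y)  # Y -> Z, Z ⊈ Y


# Made-up instance, plus the dependency set of Example 1.
rows = [{"A": 1, "B": "x", "C": "c1"},
        {"A": 1, "B": "y", "C": "c1"},
        {"A": 2, "B": "x", "C": "c2"}]
F = [("A", "C"), ("C", "D"), ("AB", "E")]

print(fd_holds(rows, "A", "C"), fd_holds(rows, "B", "C"))  # True False
print(is_trivial("AB", "A"), is_trivial("AB", "E"))        # True False
print(is_full("AB", "E", F), is_full("AB", "C", F))        # True False
print(is_transitive("A", "C", "D", F))                     # True
```

The condition Y↛X matters: if C→A were added to F, A and C would determine each other, D would depend on A directly, and `is_transitive("A", "C", "D", ...)` would return False.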

III. Normal forms

(i) First normal form (1NF)
If every component of the relation schema R is an indivisible data item, then R is in first normal form. (In other words, 1NF requires every attribute to be atomic, that is, not further decomposable.)
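Whether a value is "atomic" is ultimately a modeling decision, but a simple mechanical version can be sketched in Python. This sketch assumes rows are dicts and treats Python lists, tuples, and sets as non-atomic repeating groups; `is_atomic`, `is_1nf`, and `flatten` are hypothetical names, and the data is made up. Flattening a multi-valued attribute into one row per value is one common way to bring such a table into 1NF.

```python
def is_atomic(value):
    """Assumption: lists, tuples and sets count as non-atomic (repeating groups)."""
    return not isinstance(value, (list, tuple, set, frozenset))


def is_1nf(rows):
    """True if every component of every row is an atomic value."""
    return all(is_atomic(v) for row in rows for v in row.values())


def flatten(rows, attr):
    """Turn a multi-valued attribute into one row per value."""
    out = []
    for row in rows:
        value = row[attr]
        for v in (value if not is_atomic(value) else [value]):
            out.append({**row, attr: v})
    return out


rows = [{"A": 1, "B": ["x", "y"]}, {"A": 2, "B": "z"}]  # made-up data
print(is_1nf(rows))   # False: B holds a list in the first row
flat = flatten(rows, "B")
print(flat)           # [{'A': 1, 'B': 'x'}, {'A': 1, 'B': 'y'}, {'A': 2, 'B': 'z'}]
print(is_1nf(flat))   # True
```

Note that a string such as "x,y" would pass this check even though it may hide several values; the test only catches repeating groups that the data model makes explicit.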
For a two-dimensional table, the minimum requirement is 1NF: every component must be an indivisible data item. This is the most basic normal form. A schema that satisfies only 1NF may still suffer from the following problems:

Example 1: Consider the relation schema R(U, F), where U = {A, B, C, D, E} and F = {A→C, C→D, (A, B)→E}.

  1. Data redundancy: values of attributes such as C and D may be repeated many times in the table, wasting storage space. (That is, the functional dependencies cause redundancy.)
  2. Update anomaly: because values of attributes such as C and D are repeated, changing one of them requires changing every copy; missing any copy leaves the data inconsistent. (That is, data redundancy causes update anomalies.)
  3. Insertion anomaly: if a new combination of A, B, and E values has no corresponding C and D values yet, it cannot be inserted into the table. (That is, the functional dependencies cause insertion anomalies.)
  4. Deletion anomaly: deleting a value of B and E removes the whole tuple, so the A, C, and D values stored in that tuple are deleted as well. (That is, the functional dependencies cause deletion anomalies.)
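The first two problems can be reproduced on a small, made-up instance of Example 1. This sketch (hypothetical helper names `fd_holds` and `repeated`, rows as dicts) shows the single fact C→D being stored three times and then being broken by an update that touches only one copy.

```python
import copy
from collections import Counter


def fd_holds(rows, X, Y):
    """True if no two rows agree on the attributes X but differ on Y."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


def repeated(rows, attrs):
    """Value combinations of attrs that are stored in more than one row."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    return {vals: n for vals, n in counts.items() if n > 1}


# Made-up instance of Example 1: key (A, B); A -> C, C -> D, (A, B) -> E.
rows = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1", "E": 10},
    {"A": "a1", "B": "b2", "C": "c1", "D": "d1", "E": 20},
    {"A": "a2", "B": "b1", "C": "c1", "D": "d1", "E": 30},
]

# Data redundancy: the single fact "c1 -> d1" is stored in three rows.
print(repeated(rows, "CD"))  # {('c1', 'd1'): 3}

# Update anomaly: changing D in only one copy leaves C -> D violated.
updated = copy.deepcopy(rows)
updated[0]["D"] = "d9"
print(fd_holds(rows, "C", "D"), fd_holds(updated, "C", "D"))  # True False
```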

(ii) Second normal form (2NF)
If R ∈ 1NF and every non-prime attribute is fully functionally dependent on each candidate key (a key is an attribute or set of attributes that uniquely identifies a tuple), then R is in second normal form. In other words, a 1NF schema becomes 2NF once the partial functional dependencies of non-prime attributes on the key are eliminated. (Intuitively, 2NF is a uniqueness constraint on records: each record must have a unique identifier, which guarantees the uniqueness of each entity.)
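Finding the partial dependencies that 2NF forbids requires knowing the candidate keys. The following brute-force Python sketch, suitable only for small schemas, enumerates attribute sets in order of size, keeps the minimal ones whose closure is all of U, and then reports every non-prime attribute that depends on a proper part of a key. Function names and the string encoding of dependencies are assumptions. For Example 1 it finds the key (A, B) and reports D as well as C, because A→D follows from A→C and C→D.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(U, fds):
    """All minimal attribute sets whose closure is U (brute force, small U only)."""
    attrs = sorted(set(U))
    keys = []
    for k in range(1, len(attrs) + 1):
        for combo in combinations(attrs, k):
            s = frozenset(combo)
            # Skip supersets of keys already found: they are not minimal.
            if closure(s, fds) >= set(attrs) and not any(key <= s for key in keys):
                keys.append(s)
    return keys


def partial_dependencies(U, fds):
    """(part, attr) pairs where a non-prime attr depends on a proper part of a key."""
    keys = candidate_keys(U, fds)
    prime = set().union(*keys)
    found = []
    for key in keys:
        for k in range(1, len(key)):
            for part in combinations(sorted(key), k):
                for attr in sorted(closure(part, fds) - prime):
                    found.append(("".join(part), attr))
    return found


F = [("A", "C"), ("C", "D"), ("AB", "E")]  # Example 1
print(["".join(sorted(k)) for k in candidate_keys("ABCDE", F)])  # ['AB']
print(partial_dependencies("ABCDE", F))  # [('A', 'C'), ('A', 'D')]
```

Enumerating subsets is exponential in the number of attributes, which is acceptable for textbook examples like this one but not for large schemas.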

Example 1 shows that dependencies often lead to data redundancy and to update, insertion, and deletion anomalies, and that normalizing the dependencies among attributes can reduce these problems to some extent. The kind of dependency that holds among attributes is what distinguishes the normal forms. In Example 1 the key is (A, B), and A→C holds, so the non-prime attribute C depends on only part of the key (a partial functional dependency). Converting Example 1 to 2NF therefore means eliminating the partial functional dependencies of non-prime attributes on the key, as follows:

Example 2: Relation schema R1(U, F), where U = {A, B, E} and F = {(A, B)→E}; relation schema R2(U, F), where U = {A, C, D} and F = {A→C, C→D}.
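A decomposition is only safe if the original table can be rebuilt by joining its parts (a lossless-join decomposition). A standard sufficient condition is that the attributes shared by the two parts form a key of one of them; here the shared attribute A is the key of R2. The Python sketch below checks this on made-up data by projecting and then natural-joining, and contrasts it with a hypothetical split on B, which produces spurious tuples. Tuples are represented as frozensets of (attribute, value) pairs, an encoding chosen only for this illustration.

```python
def project(rows, attrs):
    """Projection onto attrs; each tuple becomes a frozenset of (attribute, value) pairs."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join of two projected relations on their common attributes."""
    out = set()
    for left_t in left:
        ld = dict(left_t)
        for right_t in right:
            rd = dict(right_t)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out


# Made-up instance of Example 1 that satisfies A -> C, C -> D, (A, B) -> E.
rows = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1", "E": 10},
    {"A": "a1", "B": "b2", "C": "c1", "D": "d1", "E": 20},
    {"A": "a2", "B": "b1", "C": "c2", "D": "d1", "E": 30},
]
original = project(rows, "ABCDE")

# Example 2's split: R1(A, B, E) and R2(A, C, D), joined on A.
good = natural_join(project(rows, "ABE"), project(rows, "ACD"))
print(good == original)  # True: the join reproduces exactly the original tuples

# Hypothetical contrast: splitting on B instead produces spurious tuples.
bad = natural_join(project(rows, "ABE"), project(rows, "BCD"))
print(len(original), len(bad))  # 3 5
```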

Example 2 splits the table of Example 1 into two tables. Analyzing it in the same way as before:

  1. Data redundancy: values of attributes such as C and D may still be repeated in the table, wasting storage space, but E corresponds one-to-one with its key (A, B), so Example 2 eliminates the redundancy of E.
  2. Update anomaly: eliminated for attribute E; for the other attributes the analysis is the same as in Example 1.
  3. Insertion anomaly: the problem in Example 1 is solved, but a new one appears: a new (C, D) pair that has no corresponding A value cannot be inserted.
  4. Deletion anomaly: the problem in Example 1 is solved, since the A, C, and D values are kept, but a new one appears: deleting the tuple for a given A value (and its C value) also deletes the corresponding D value.
    Converting Example 1 to 2NF, as in Example 2, is an improvement, but some problems remain, and they are caused mainly by the transitive dependency A→C, C→D. The biggest improvement of Example 2 over Example 1 is that every table in Example 2 has a key that uniquely identifies its records.
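The remaining anomalies of R2 can be observed in a real database engine. This sketch uses Python's built-in `sqlite3` module with an in-memory database; the table layout is an assumption (A is declared `NOT NULL PRIMARY KEY`, since SQLite otherwise permits NULL in a non-integer primary key), and the values are made up. The insertion anomaly shows up as a rejected row, and the deletion anomaly as a C value whose D value has vanished.

```python
import sqlite3

# In-memory database holding R2(A, C, D) from Example 2 (A -> C, C -> D).
conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite otherwise accepts NULL in a TEXT primary key.
conn.execute("CREATE TABLE R2 (A TEXT NOT NULL PRIMARY KEY, C TEXT, D TEXT)")
with conn:
    conn.execute("INSERT INTO R2 VALUES ('a1', 'c1', 'd1')")


def try_insert(a, c, d):
    """Attempt one insert into R2 and report whether SQLite accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO R2 VALUES (?, ?, ?)", (a, c, d))
        return True
    except sqlite3.IntegrityError:
        return False


# Insertion anomaly: the fact "c9 -> d9" cannot be recorded without an A value.
inserted = try_insert(None, "c9", "d9")
print(inserted)  # False

# Deletion anomaly: removing a1 also removes the only record that c1 -> d1.
with conn:
    conn.execute("DELETE FROM R2 WHERE A = 'a1'")
d_for_c1 = conn.execute("SELECT D FROM R2 WHERE C = 'c1'").fetchone()
print(d_for_c1)  # None
```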

(iii) Third normal form (3NF)
If the relation schema R(U, F) has no key X, attribute set Y, and non-prime attribute set Z (Z⊈Y) such that X→Y (Y↛X) and Y→Z hold, then R is in third normal form. (Intuitively, 3NF constrains field redundancy: it eliminates transitive functional dependencies of non-prime attributes on the key, so that no non-key field can be derived from another non-key field.)
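The definition above is phrased in terms of transitive dependencies. An equivalent formulation common in textbooks is easier to program: for every non-trivial X→a in F, either X is a superkey or a is a prime attribute (a member of some candidate key); when testing a single schema, checking only the dependencies listed in F is sufficient. The Python sketch below implements that test with brute-force key finding; function names and the dependency encoding are assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(U, fds):
    """Minimal attribute sets whose closure is all of U (brute force)."""
    attrs = sorted(set(U))
    keys = []
    for k in range(1, len(attrs) + 1):
        for combo in combinations(attrs, k):
            s = frozenset(combo)
            if closure(s, fds) >= set(attrs) and not any(key <= s for key in keys):
                keys.append(s)
    return keys


def is_3nf(U, fds):
    """For each non-trivial X -> a in fds: X is a superkey or a is prime."""
    attrs = set(U)
    prime = set().union(*candidate_keys(U, fds))
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= attrs
        for a in set(rhs) - set(lhs):  # skip the trivial part of the FD
            if not is_superkey and a not in prime:
                return False
    return True


print(is_3nf("ABCDE", [("A", "C"), ("C", "D"), ("AB", "E")]))  # False: A -> C
print(is_3nf("ACD", [("A", "C"), ("C", "D")]))                 # False: C -> D
print(is_3nf("AC", [("A", "C")]))                              # True
```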

Example 3: Relation schema R1(U, F), where U = {A, B, E} and F = {(A, B)→E}; relation schema R2(U, F), where U = {A, C} and F = {A→C}; relation schema R3(U, F), where U = {C, D} and F = {C→D}.

Example 3 splits the transitive dependency of Example 2 into two tables, R2 and R3. Checking again in the same way shows that the redundancy and anomalies above are resolved (at least as far as my current understanding goes).
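A systematic way to reach 3NF is the standard synthesis algorithm: group the dependencies of a minimal cover by left-hand side, make one schema per group, add a candidate key as its own schema if no schema already contains one, and drop redundant schemas. The result is lossless and dependency-preserving. The Python sketch below assumes its input is already a minimal cover (computing one is not shown) and uses hypothetical function names. Applied to R2 of Example 2 it yields {A, C} and {C, D}, which keeps C→D inside a single table.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(U, fds):
    """Minimal attribute sets whose closure is all of U (brute force)."""
    attrs = sorted(set(U))
    keys = []
    for k in range(1, len(attrs) + 1):
        for combo in combinations(attrs, k):
            s = frozenset(combo)
            if closure(s, fds) >= set(attrs) and not any(key <= s for key in keys):
                keys.append(s)
    return keys


def synthesize_3nf(U, fds):
    """3NF synthesis; assumes fds is already a minimal (canonical) cover."""
    # Step 1: one schema per left-hand side, holding it and all its right-hand sides.
    groups = {}
    for lhs, rhs in fds:
        groups.setdefault(frozenset(lhs), set(lhs)).update(rhs)
    schemas = list(groups.values())
    # Step 2: if no schema contains a candidate key, add one key as its own schema.
    keys = candidate_keys(U, fds)
    if not any(key <= s for key in keys for s in schemas):
        schemas.append(set(keys[0]))
    # Step 3: drop duplicates and schemas strictly contained in another one.
    result = []
    for s in schemas:
        if s not in result and not any(s < t for t in schemas):
            result.append(s)
    return result


r2_parts = synthesize_3nf("ACD", [("A", "C"), ("C", "D")])
print(sorted("".join(sorted(s)) for s in r2_parts))  # ['AC', 'CD']
all_parts = synthesize_3nf("ABCDE", [("A", "C"), ("C", "D"), ("AB", "E")])
print(sorted("".join(sorted(s)) for s in all_parts))  # ['ABE', 'AC', 'CD']
```

Run on the whole of Example 1, the same procedure produces three schemas in one step, without passing through the intermediate 2NF design.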

(iv) Other normal forms
There are further normal forms, such as BCNF and 4NF. I am not yet sure how to explain and apply them; so far I have only the summary chart I came across while researching. I will study them further and add to this section when I have time.

