Basic tutorial 2: Relational Database Design Theory

Source: Internet
Author: User

This section describes the theory of Relational Database Design Based on relational algebra. Function dependency is the basic part, and various paradigms are introduced in the constraints of Pattern Design and their coping problems.

3.1 function dependency

The relationship design theory allows people to test and improve a design based on a few simple principles. The design theory sets constraints on the relationship. The most common constraint is function dependency FD.

3.1.1 FD Definition

Function dependency on Relational R: if the two tuples of R are in attributes A1, A2 ,..., if they are the same on an, they must be in the attributes b1, b2 ,..., BM is also consistent. We call the AI function decision BJ, which is recorded in the form of A1, A2 ,.., an-> b1, b2 ,.., BM.

If every instance of the relational R makes an Fd f true, R is called to satisfy the function dependency F.

3.1.2 link key

If the property set T = {A1, A2,..., an} of the link R isKey, You must satisfy 1) These attribute functions determine all other attributes of the relationship, 2) in all true subsets of T, there is no function to determine other attributes of R. Generally, the key is the smallest property set and the function determines other attributes. If an attribute set only meets 1st conditions but does not meet 2nd conditions, it is calledSuperkey.

If a link has multiple keys, specify one of themPrimary Key.

3.2 derivation and rules of function dependencies

3.2.1 FD Derivation

FD has different expressions for relational R. Note that S and T are the set of R function dependencies

S => T: When the relational instance satisfies the function dependency of S, it also satisfies the function dependency t, but vice versa.

S and T are equivalent: relational instances are exactly the same when S is satisfied and T is satisfied. S => T and t => S.

Transmission: If fd x-> Y, Y-> Z exists, X-> Z exists.

Transfer Function dependency: R (u) exists. If X-> Y, Y-> Z, Y-> X does not exist, and Y is not the true subset of X, and Z is not the true subset of Y, the Z transfer depends on X.

Augmented: If X-> Y exists, xz-> YZ exists.

3.2.2 decomposition/combination rules

FD A1, A2,..., An-> b1, b2,..., BM is equivalent to a set of FD:

(1) A1, A2 ,..., an-> B1; (2) A1, A2 ,..., an-> B2 ;...., (m) A1, A2 ,..., an-> BM.

3.2.3 trivial function dependency

The right side of the normal function dependency is the true subset on the left. Formally, for relational R (u), U is the complete set of attributes, with a subset of u X and Y, fd x-> Y. If y ⊆ X, this FD is called an ordinary function dependency. In contrast, FD is called a non-trivial function dependency.

3.2.4 calculate the closure of an attribute set

If t = {A1, A2 ,..., an} is an attribute set, S = {F1, F2 ,.., FM} is a set of function dependencies. The closure of T in the attribute set T in S is recorded as t +. every element in T + satisfies S and can be inferred from T.

Attribute closure algorithm:

Input: Attribute Set t {A1, A2,..., an}, FD set S

Output: Closure {A1, A2,..., an} +

1. If necessary, break down FD in s so that there is only one attribute on the right.

2. Set D to the result set and initialize it to T.

3. Search for fd x-> Y from s so that X contains D, while y is not in D. If found, add y to D and repeat this process. It ends when no element can be added to D.

4 at this time, D is {A1, A2,..., an} +

The validity of an algorithm can be derived from the pass-through.

3.2.5 basic set of function dependencies

Given an FD set F, any FD set equivalent to F is called the basic set of F. Here we only consider the basic set consisting of a single attribute FD on the right.

Minimal basic set: 1) All FD has a single attribute on the right. 2) delete any FD from it. This set is no longer a basic set. 3) delete one or more attributes from any FD, this set is no longer a basic set

3.2.6 projection of function dependencies

For relationship R, and FD set F on it, if a new relationship S is generated by projection of R, the FD set F' corresponding to S is called the projection of function dependent F. F''s fd satisfies the criteria inferred from F and only contains attributes in S.

Projection Algorithm of function dependency set:

Input: relational R and its projection S = π L (r), FD set F of R

Output: FD set established in S

1. Make t the result set to be evaluated, and the initial result is an empty set.

2. Calculate X for each subset of S.+Is calculated based on the FD set F. For all+And belongs to the property A of S. Add all non-trivial fd x-> A to T.

3 at this time, T is the basic set of FD established in S, but may not be the minimum basic set. Follow these steps to construct the minimal basic set of T: (minimum basic set algorithm)

3.1 If a fd in t can be inferred from other FD in T, it will be deleted from T.

3.2 If the left side of each FD is the Fd f of multiple attributes, if the left side of the FD deletes one attribute, the new FD f'1 can still be inferred from the other FD in T, then replace F with F1.

3.3 repeat the above two steps until t does not change.

3.3 Relational Database pattern design and bcnf paradigm

3.3.1 exceptions

Poor design may result in database usage exceptions. The basic types of these exceptions include redundancy (occupying a large amount of space) and update exceptions (not all updates are allowed), deletion exception (some data is lost after deletion ).

Generally, the relationship decomposition method is used to eliminate exceptions. An unreasonable relationship is broken down into multiple rational relationships.

3.3.2 bcnf paradigm

If R belongs to bcnf and is only used: the left side of the extraordinary FD of R is a superkey. If R's non-trivial fd x-> Y is formal, X is the Super key of R.

3.3.3 bcnf decomposition algorithm

Input: relational R0 and function dependencies on it F0

Output: a set of links decomposed by R0. Each of these links belongs to bcnf.

1 Set R = r0, F = f0

2 if R is already bcnf, if {r} is returned}

3If R has a bcnf violation, assume it is X-> Y. Use the attribute closure algorithm to calculate x +, select R1 = x + as the relational mode, and use R2 to include the attribute X and the attribute not in X +.

4. Use the FD projection algorithm to calculate the FD sets of R1 and R2, respectively, as F1 and F2.

5. recursive decomposition of R1 and R2. Returns the result set obtained from decomposition.

3.4 Advantages and Disadvantages of Decomposition

Decomposition has three properties: 1. Eliminate exceptions. 2. Information can be recovered. 3. Dependency persistence.

The bcnf decomposition described above can maintain the first two properties, while the 3nf introduced in the next section can maintain the last two. In fact, there is no way to ensure these three properties at the same time.

3.4.1 restore information from decomposition

If a connection is decomposed and R can be re-obtained through the connection, the decomposition contains lossless connections.

For relational R, the property set of R is x u y u Z. If X-> Y is true, r = π X, Y (r) π X, Z (R ). From this conclusion, the bcnf decomposition algorithm can ensure that the decomposition includes lossless connections.

3.4.2 lossless connection chase test

When a link decomposition contains lossless connections, that is, π S1 (r) π S2 (r... when π Si (R) = r, each tuples In the join result belong to R.

Chase inspection method:

1) initialize the chart. Assume that R contains attributes a, B.... Use a, B,... to represent the component of the tuples on the corresponding attributes. If t is in the connection result, T1 ,..., TK makes the projection result of each ti on the corresponding property set Si join t. Therefore, Ti and t have the same attributes in the corresponding Si, but other ti components are unknown. For Ti, the same letters as t are used to represent the components of the local Si attributes. If it does not belong to Si, the subscript I is added.

2) The Chase process. Use FD in F to be as accurate as possible to the letters in the chart. When an equivalent letter is used, if one of them has no lower mark, the other will also have no lower mark. If it is equivalent to two letters with different underlying symbols, you can change the subscript of any letter to the same one. If a row is the same as t, it can prove that the connection is lossless.

3.4.3 Formal Description of the chase test

Set link mode r = A1 ,..., A decomposition P = {R1 ,..., Rk }. The following are the steps for determining lossless join decomposition: (1) construct a table with K rows and n columns. Each column corresponds to an attribute AJ (1 ≤ j ≤ n ), each row corresponds to a mode RI (1 ≤ I ≤ k ). If AJ is in RI, enter the Aj symbol in column J of row I of the table; otherwise, enter the bij symbol. (2) regard the table as a relation of mode R, and repeatedly check whether each FD in F is valid in the table. If not, modify the elements in the table. The modification method is as follows: For a FD: X → y in F, if two rows in the table are equal on the X component, the Y component is not equal, change the two rows to be equal on the Y component. If one of the components of Y is AJ, the other is also changed to AJ. If there is no AJ, replace the other with one bij (Change IJ to a smaller number as much as possible, that is, the one with a small I value ). If one row in the table is a, that is, A1, A2 ,..., An, then it can be determined immediately that p is a lossless connection decomposition relative to F, and no further modification is required. If the table cannot be modified after multiple modifications and no row in the table is a, the decomposition is lossy. Note that there is a process of repeating the modification here, because a modification may make the table continue to be modified. Pay special attention to the modification process. If a BIJ is modified, all bij in its column must be modified accordingly. An example is provided to clarify this point. For example, we modify the table before FD "H → I" and "K → L", as shown in table 1 (the table has been modified many times, not the initial table, empty unit is omitted ): Table 1
H I J K L
R1   B12     B35
R2 A1 A2   A4 B25
R3 A1 B12   A4 B35
R4   B12     B35
The H component of the row where R2 and R3 are located is A1. According to fd "H → I", You need to modify the I component corresponding to the two rows, while the I component of the row where R2 is located is a2, therefore, you need to change the I component B12 of the row where R3 is located to A2. Note that the H component of the row where R1 and R4 are located is also B12. Therefore, the I component corresponding to these two rows must also be modified to A2. The K component of the row where R2 and R3 are located is A4. According to fd "K → L", You need to modify the L component corresponding to the two rows, therefore, modify the L component b35 of the row where R3 is located to a smaller b25. It is also noted that the L component of the row where R1 and R4 are located is also b35. Therefore, the L component corresponding to the two rows must also be modified to b25. The modified table is shown in table 2: Table 2
  H I J K L
R1   A2     B25
R2 A1 A2   A4 B25
R3 A1 A2   A4 B25
R4   A2     B25

3.4.4 dependency persistence

In some cases, after being decomposed into bcnf, some tuples may not meet the functional dependencies of the original relationship after lossless connection.

3.5 1-3 paradigm

3.5.1 definition of Paradigm

Third paradigm: relational r is the third paradigm. if and only for the extraordinary FD of R, the left side of FD is either a superkey or the right side is only composed of the primary attribute. (The primary attribute is the constituent attribute of the key ).

Second paradigm: relational r is the second paradigm. if and only when r complies with the first paradigm, the non-primary attribute function of R relies entirely on the primary attribute of R.

First paradigm: Each tuples in relational R must have a primary key, and the attribute value can only be a single value.

From the first paradigm to bcnf is a process of increasing the constraints on the non-ordinary FD in R. Under the 1nf constraint, tuples may consist of multiple FD, and there are multiple irrelevant attributes. Under the 2nf constraint, irrelevant attributes are limited to attributes inferred from each FD. Under the 3nf constraint, only one FD can exist or the dependency can be transferred. Only one FD can exist under the bcnf constraint.

3.5.2 3nf decomposition algorithm

Input: Relationship R and Its FD set F

Output: a set of links decomposed by R. Each link belongs to 3nf.

1. Find a minimum basic set of F, and mark it as G.

2. For each fd x-> A in G, XA is used as a relational model Ri.

3. If the relationship obtained after Step 1 is decomposedDo not contain the r superkeysTo add a link. The mode is any key of R.

3.6 multi-value dependency and fourth paradigm

3.6.1 multi-value dependency

Given the relational mode R (u), X, y, and z are the subsets of U, and z = u-x-y. If multiple values depend on X-> Y, if only a set of values of Y are specified for a pair of values (x, z, the value of Y is only determined by X and is irrelevant to Z.

More precisely, for each T, U, Which is consistent in the X attribute set in R, the V can be found in R, meeting the following conditions: 1) V is the same as T and U in the X property set. 2) V is the same as T in the Y property set. 3) V is the same as u in the property set Z.

The number of separate property sets Y and Z based on the X attribute is often used in an attempt to merge (two or more 1: N relations appear in a single relationship ), this leads to the need to completely pair Y and Z in the relationship, resulting in a large amount of redundancy. In the following example, The Guardian column is only related to the name, but not the contact information column.

Name contact information guardian
James 10000 Ming Dad
James 20000 Ming Dad
James 10000 Ming Ma
James 20000 Ming Ma

3.6.2 derivation of multi-value dependency

Ordinary MVPs: if Y is a subset of X, the X-> Y is valid in any relation.

Transfer rule: if the number of values in the input file is "X-> Y, Y-> Z", the attributes of a in X-> Z and Z must be removed from Z.

FD elevation: FD indicates the number of UPS. That is, if X-> Y exists, X-> Y.

Complementary rules: defined by the MVPs, in R, Z = u-x-y. If there is a value of *-> Y, there is also a value of *-> Z.

Append ordinary MVPs: For R (u), if u = x u y, then the value of MVPs X-> Y is true.

3.6.3 fourth paradigm

If and only if the relationship R is 4 NF, X is the superkey for each non-trivial MVPs X-> Y of R.

Fourth paradigm decomposition algorithm:

Input: relational R0 (U0), where FD and SCC set F0

Output: a set of links decomposed by R0. Each link belongs to 4nf. Decomposition is non-destructive.

1 initialize R (u) = r0, F = f0

2. Search for 1 4nf violation in F. If not, R is 4nf, and R is returned.

3 if there is a 4nf violation X-> Y, R is divided into the following two modes: r1 (x, y), R2 (x, U-X-Y)

4. Calculate the projection FD and MVPs of R1 and R2. Recursive decomposition of R1 and R2. (In section 3.7, we will discuss the projection algorithm of the MVPs)

For example, the preceding example can be divided into two modes (name, contact information) (name, guardian)

3.6.4 relationship between paradigms

3.7 MVPs Discovery Algorithm

3.7.1 closure and chase

The closure mentioned above is essentially the same as the Chase algorithm. To determine whether X-> Y can be inferred from F, the chase method is as follows:

1. The initialization legend contains two rows, Which are consistent only on the property set X.

2. Use FD in F to execute the chase process on the legend.

3. If the final legend is consistent among all columns of Y, X-> Y is true; otherwise, X-> Y is invalid.

3.7.2 extend Chase

The FD derivation method of Chase can also be used to derive the MVPs. If you want to export the density X-> Y from the given FD and density set, the initial legend must contain two rows that are consistent on X but are different on all other attributes. Use the given FD to be equivalent to a letter, and use the given MVPs to exchange values of the Y attribute in the existing two rows to generate a new line. If the y attribute of one original tuples is replaced by the y attribute of the other original tuples in the legend, you can export the value of the corresponding MVPs.

A simple method is to define a row of the target tuples and never change it. That is, each component of the target row does not contain any lower-level letters. At first, make all the components of X in the two original rows corresponding to X-and Y in the legend do not carry any lower-case letters. All components of the first line that belong to y do not carry any subscript, and all components of the second line that do not belong to X and Y do not carry any subscript. In this way, during the chase process, you only need to check whether all the components are rows without subscript letters in the legend.

3.7.3 projection MVPs

After the relationship is decomposed, We need to calculate the projects of the projection relationship and the projects of FD. This is already described before the projection algorithm of FD. Here, we can use the chase method to test all FD and MVPs in the relationship obtained by each decomposition. However, we usually do not need to process all the mivs. The following is a simplified method:

1. No need to test the average FD and MVPs.

2 For FD, because there are merging rules, you only need to find the FD with a single attribute on the right.

3. If the left side of an FD or MVPs does not contain the left side of any given dependency, it must not be verified.

References

Multi-value dependency and fourth paradigm http://en.wikipedia.org/wiki/Fourth_normal_form

Basic tutorial on database systems Jeffrey D. Ullman, the third edition of the original book, Jenifer widom

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.