Three paradigms of database design, BCNF, 4NF

Last Update:2017-02-28 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

First, understanding the paradigm of the database requires understanding several basic concepts:

Code: A table can uniquely determine a tuple of a property (or attribute group), if such a code has more than one, then everyone is called candidate code, we choose from the candidate code to do the boss, it is called the main code. Equivalent to the value of the key. Main attribute: A property is the primary property as long as it appears in any candidate code. Non-primary attribute: Contrary to the above, it is not present in any candidate code, and this property is not a primary attribute. Foreign code: An attribute (or attribute group), it is not a code, but its other table's code, it is the outer code.

Second, the form of detailed

In order to build a database with less redundancy and reasonable structure, some rules must be followed in designing the database. In a relational database, this rule is called a paradigm. A paradigm is a summary that conforms to a particular design requirement. In order to design a relational database with reasonable structure, a certain paradigm must be satisfied.

The most common design paradigm in actual development is three:

1. First paradigm (ensure that each column remains atomic)

The first paradigm is the most basic paradigm. If all the field values in a database table are atomic values that are not decomposable, the database table satisfies the first normal form.

The rational compliance of the first paradigm needs to be determined according to the actual requirements of the system. For example, some database systems need to use the "address" this attribute, the "address" attribute directly to the design of a database table fields on the line. However, if the system often accesses the "City" section of the "address" attribute, then the "address" attribute must be split into provinces, cities, detailed addresses, and so on, so that it is convenient to operate on one part of the address. This design only satisfies the first normal form of the database, as shown in the following table.

The user information shown in the table above follows the requirements of the first paradigm, so it is convenient to classify users using the city and improve the performance of the database.

2. Second paradigm (ensure that each column in the table is related to the primary key)

The second paradigm is more advanced on the basis of the first paradigm. Second paradigm needs to ensure that each column in a database table is related to a primary key, not just a part of a primary key (primarily for a federated primary key). This means that in a database table, only one data can be saved in a table, and multiple data cannot be saved in the same database table.

For example, to design an order information table, because there may be a variety of items in the order, the order number and the item number are used as the joint primary keys for the database table, as shown in the following table.

Order Information Table

This creates a problem: This table is based on the order number and the item number as the joint primary key. In this table, the product name, unit, commodity price and other information are not related to the table's primary key, but only related to the commodity number. So it violates the design principle of the second paradigm.

And if you split the order Information table, separate the product information into another table, and separate the Order Items table into another table, it's perfect. as shown below.

This design, to a large extent, reduces the redundancy of the database. If you want to obtain the product information of the order, use the item number to inquire in the commodity information table.

3. Third paradigm (ensure that each column is directly related to the primary key column, rather than indirectly related)

The third paradigm needs to ensure that each column of data in the datasheet is directly related to the primary key, not indirectly.

For example, when designing an order datasheet, you can establish a relationship between the customer number as a foreign key and the order table. Instead of adding fields to the order table for other customer information, such as name, company, and so on. The design shown in the following two tables is a database table that satisfies the third normal form.

This way, when you query order information, you can use the customer number to reference the records in the Customer information table, and do not have to enter the contents of the customer information multiple times in the order Information table to reduce the data redundancy.

4. BCNF Paradigm

Boyce-Ke Paradigm (bcnf is an improved form of 3NF)

A condition that satisfies the BCNF relational pattern:

1. All non-primary attributes are fully functional dependent on each code.

2. All the main attributes are also fully functional dependent on each code that does not contain it.

3. No attribute full function relies on any set of attributes that are not code.

Suppose the Warehouse Management relational table is storehousemanage (warehouse ID, store item ID, admin ID, quantity), and one administrator works only in one warehouse; a warehouse can store multiple items. The following decision relationships exist in this database table:

(Warehouse ID, store item id) → (admin id, quantity)

(Admin ID, store item id) → (warehouse ID, quantity)

So, (warehouse ID, store item id) and (admin ID, store item ID) are the candidate keywords for storehousemanage, the only non-critical field in the table is quantity, it is in accordance with the third normal form. However, because of the following decision relationships:

(warehouse id) → (Administrator id)

(Administrator id) → (warehouse id)

There is a key field that determines the key field, so it does not conform to the BCNF paradigm.

Decompose the Warehouse Management relationship table into two relational tables:

Warehouse Management: Storehousemanage (warehouse ID, admin ID);

Warehouse: Storehouse (warehouse ID, store item ID, quantity).

Such a database table conforms to the BCNF paradigm.

5, 4NF

4NF is the restriction of the attributes of a relational pattern between non trivial and multivalued dependencies that are not functional dependencies. Because by definition, for each nontrivial multivalued dependent x→→y,x contains a candidate code, there is a x→y, so the nontrivial multivalued dependencies allowed by 4NF are actually functional dependencies.

Here are some concepts to solve:

function dependencies The simple point is that when a property set determines another property set, another attribute set is said to depend on the property set.

(1) Data dependence in computer science, data dependency refers to a state in which the structure of a program causes data to be processed before the data is referenced. The most important of these is function dependency and multivalued dependency. (2) Function-dependent set X,Y is a set of two properties of the relationship R, and when x property values in any of the two tuples in R are the same, their Y property values are the same, the X function is called Y, or the Y function depends on X. (3) Trivial function dependencies when the attribute set Y in the relationship is a subset of the attribute set X (y?) X), there is a function-dependent x→y, a set of property functions that determines all subsets of it, a function dependency called trivial function dependencies. (4) Non-trivial function dependencies when the attribute set Y in a relationship is not a subset of the attribute set X, there is a function dependency x→y, which is called a nontrivial function dependency. (5) A complete function-dependent set X,Y is a set of two properties of the relationship R, X ' is a true subset of X, there is x→y, but for every x ' there is an X '!→y, then the Y-complete function is called X. (6) Partial function-dependent set X,Y is a set of two attributes of the relation R, existence x→y, if X ' is a true subset of x, there is X ' →y, then the Y part function is called X. (7) The Transfer function dependency setting X,y,z is a separate set of attributes in relation R, exists x→y (Y!→x) and Y→z, then the Z transfer function is called X. (8) Multivalued dependency setting R (U) is a relational pattern on the attribute set U. X,y,z is a subset of U, and z=u-x-y. The multivalued dependency x→→y in the relational mode R (U) is established, and when and only if you have any relation to r (U), a given pair of (x,z) values have a set of Y values that are determined only by x values and are independent of the z value. Trivial multivalued dependencies and nontrivial multivalued dependencies: If x→→y, and Z is the empty set, then the x→→y is called a trivial multivalued dependency; if z is not empty, it is called a non-trivial multivalued dependency.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More