The simplest database "paradigm" tutorial

Last Update:2016-06-11 Source: Internet

Author: User

Tags one table

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Since all are called the simplest database "paradigm" tutorial, I think it must meet this requirement: After reading this blog, you will understand the "paradigm" of the database and those annoying concepts such as "full function dependency", "partial function dependency", "transfer function dependency" and so on, if you have to follow my thinking carefully read it, good , are you ready for half an hour? Let's start.

Directory

What is a paradigm?
An example that runs through the full text.
First Paradigm (1NF)
A few important concepts.
Second Paradigm (2NF)
Third Paradigm (3NF)
BC Paradigm (BCNF)
Paradigm Four (4NF)

1. What is a paradigm?

Paradigm is actually 关系数据库规范程度的级别 , to give an example of our lives, the teacher let the cleaning, the minimum standard is that the level two standard is, the level three standard is, in fact, in the process of rising standards, the 扫地 扫地+擦桌子 扫地+擦桌子+擦玻璃 first high-level standard is to meet the low-level standard, the paradigm is the same, Only its standard describes the degree of database normalization. The paradigm in the database is 1NF, 2NF, 3NF, BCNF, 4NF,5NF (not discussed in this document). The relationship between them (the higher the default, the greater the 1NF < 2NF < 3NF < BCNF < 4NF level).

2. An example that runs through the full text.

This example has the following relationship

(School number, class name)--Score
Name of study number
School Number--name
Dean, Department name
Note: a->b: Understand that the A attribute can uniquely determine the B attribute.

This example is gradually optimized when it comes to the second and third paradigms.

3. First paradigm (1NF)

each column of a database table is an atomic data item that cannot be re-divided , it is the minimum standard for a relational database, and if the first paradigm is not met, then the database is not a relational database. As shown below:

This table can not be created directly in the database because it has 系 this column, which has a 系名 系主任 conflict, and if the adjustment is made to conform to the first paradigm, then it should be:

But even then, there are exceptions such as insertions, deletions, data redundancy, and so on in the database. For example, for the following table:

Data redundancy: You can see names, department names, and Dean. The three columns of redundancy are very large.
Insert exception: If the computer department is currently open, but not yet enrolled, then the computer system can not be inserted into the table of the name of the system, (student number, class name) is the main attribute, cannot be empty.
Delete exception: If all students in the Department of Economics graduate, the Department of Economy will be deleted when all students are removed.

So the design of this database is flawed, in order to optimize it, we continue to the second paradigm. Before we say the second paradigm, let's start with a few important concepts.

4. A few important concepts.

Function dependency: In a table, if given a property (or attribute group) value, must be able to uniquely determine the value of the B attribute, then said B depends on a attribute (or attribute group), as A->b, such as a given number, you can determine the name, a school number must be able to determine a name, so simple, is it familiar? Yes, we've actually used this concept in the second part of the story that runs through the full-text example.

Full function dependency: On the basis of a function dependency, our A property is an attribute group, and only all of the properties in this attribute group can determine a unique B property, and no subset of a is allowed. For example (school number, class name) and scores, and the individual school number or class names can not determine the results, this is called complete function dependence.

Part of the function depends on: Compared to the full function, a property group of some of the properties can determine the B property, other properties are optional, such as (school number, class name), name, in fact, as long as the study number on it, such a dependency is called part of the function dependency.

Transfer function Dependency: If a->b,b->c, and B cannot->a (prevent direct A->C), then we can conclude a->c, called C transfer function dependent A. For example, the name of the school, department name, department head, and (department name can not determine the number), so the Department Director transfer function dependent number.

Code: A property or group of attributes that, in addition to this attribute or attribute group, is the same as the entire relationship 完全函数依赖 , which is the code. For example (school number, score), their two combinations can be all other attributes are decided, (school number, class name), the score, learning number, name, school number, the department name, school number, and the Department of the Dean, so (study number, class name) is a code in this relationship, Is there any other code in this relationship? Who knows, but our analysis should look like this: from one property to the N properties all over again.

An attribute: study number, name, name, department head, class name, score. To find a counter example: the number can not determine the score, (must be the name of the class), the name can not be decided, the name of the department is only the decision of the Dean, the class name can not decide what, the same score. So there must be no code in an attribute.

Two attributes: (school number, name), (school number, department name), (school number, Department director), (school number, class name), (school number, score), (name, department name), (name, Department director), (name, class name), (name, Grade), (department name, department head), (name, Class), (department, Score), (Dean, Class name), (Department Director, score), (class name, score). Call ~ finally finished, we further analysis, found that only (school number, class name), this combination and other relationships is a complete functional dependence. That is, they are a code.

Three properties: ...

Four properties: ...

Five attributes: ...

Six properties: ...

All right, is that the only way? Yes, only this, in fact, there are some ways to reduce the workload, such as if we are in the analysis of two attributes have been determined (school number, class name) is 码 , then in the future analysis, if the inclusion (school number, class name) that must not be the code, because the full function depends. For example (school number, class name, department name) will not be analyzed. Other ways to summarize it yourself.

Main properties: All the attributes contained in the code are the primary properties.

Non-primary property: An attribute other than the property that is included in the code.

Shout ~, go back and look again, we are going to the second paradigm.

5. Second paradigm (2NF)

On the basis of the first paradigm, we eliminate the partial function dependence of the principal attribute on the main property , and for our example, we analyze the following:
Main attribute: study number, class name
Non-main attributes: Name, name, department head, score

There is currently only one table:
(School number, name, name, department head, class name, score)

Obviously, the name and the surname are the part of the function that is dependent on (school number, class name). Because in fact, only need to study number to determine the name and department name. So the table is currently not in accordance with the second paradigm, we have to decompose the table, of course, the decomposition of the pattern is not unique, the following is just a case, as follows:

Elective Course (School number, class name, score)
Student (school number, name, department name, dean)

We'll see if the two tables meet the second paradigm at this point:
Choose a timetable: Main attribute: (school number, class name). Non-primary attribute: score. There is no partial function dependency of the non-primary attribute on the main attribute, which satisfies the second normal form.
Student table: Main attribute: Name. Non-main attribute: Department name, Dean, school number. Because there is only one primary attribute, there must be no partial function dependency on the primary property of the non-primary property. satisfies the second paradigm.

As for how to Brahma attributes, I don't need to emphasize it, if you forget, look back. At this point the data becomes this:

Let's look back and see if the previous question has been improved:

Data redundancy problem: The data redundancy of name, department name and Dean has been improved significantly.
Insert Exception Issue: A new department has been opened, but it is still not possible to insert the name into the student table because the number is the primary attribute and cannot be empty. No improvement.
Delete Exception issue: A department graduate, delete all student information of this department, it will also delete the information of the department. No improvement.

Therefore, it is not enough to satisfy the second paradigm only. There are still a lot of problems.

6. The third paradigm (3NF)

On the basis of the second paradigm, eliminating the transfer function dependency of the non-principal attribute on the main attribute , again, we continue to analyze our example:
Main attribute: study number, class name
Non-main attributes: Name, name, department head, score

Our table is as follows:
Elective Course (School number, class name, score)
Student (school number, name, department name, dean)

We found in the student table: there 系主任 学号 is a transfer function dependency on the main property of the non-primary attribute. Because 学号->系名，系名->系主任 that's why there are so many problems ahead. Then we try to decompose the student table again, eliminating this transfer function dependency. After decomposition, the following:

Elective Course (School number, class name, score)
Student (school number, name, name)
Department (department name, department head)

We continue to analyze:
The selection of the curriculum in the code for (school number, class name), non-primary attributes are fractions, they are fully functional dependence of the relationship, there is no non-primary property of the main property of the transfer function dependency. Conform to the third paradigm.
For the student table, the code is the school number, the main attribute is the school number, the non-main attribute is the system name, cannot exist the non-main attribute to the code the transfer function dependence, conforms to the third normal form,
For the system table, the code is the system name, the main attribute is the department name, the non-main attribute is the head of the department, it is impossible to have a non-main attribute for the transfer function dependency of the code (at least three attributes may exist in the transfer function dependency), also conform to the third paradigm.

At this point the data becomes now:

Let's look back and see if the previous question has been improved:

Insert Exception Issue: Now a new department, not yet enrolled we can save the information of the Department, because we have this table. The problem has been improved.
Delete Exception problem: A department graduate, delete all student information of this department, now will not be associated with the deletion of information, because we have this table. The problem has been improved.

When the database reaches the third paradigm, basically about data redundancy, data insertion, deletion, update of the exception problem has been resolved, this is a "legitimate" database the most basic requirements, but the efficiency problem is another matter, because the more tables, the more connected operations, but the connection is a more resource-intensive operation. For our previous example, it has been optimized to the best, and there is no place to optimize it again. Here's another example when we talk about the BC paradigm.

7. BC Paradigm (BCNF)

On the basis of the third paradigm, we eliminate the dependency of the main attribute on the main attribute and the transfer function . You're not mistaken, are you? Yes, I'm not mistaken. Let's look at the following example.
If:
1: A company has a number of warehouses;
2: Only one administrator per warehouse, an administrator can only work in a warehouse;
3: A warehouse can store a variety of items, an item can also be stored in different warehouses. Each item has a corresponding quantity in each warehouse.

As shown below:

Properties are: Warehouse, administrator, item, quantity.
Let's start by looking for the code:

An attribute: No
Two attributes: (admin, Item), (Warehouse, item)
Three attributes: Contains two attributes, pass
Four properties: Contains two attributes, pass

We get the main attribute: admin, item, Warehouse
Non-master properties: Quantity

There is no number of partial function dependencies and transitive functions dependent on the primary attribute, some people may have doubts? Clearly (warehouse, goods) and quantity, why is not part of the function depends on, because we say here the main attribute is the code, at this time the three main property is composed of two yards (administrators, goods), (warehouses, items) together, any one and the number of them can not be partially dependent on the function. So the table is full of the third paradigm, but it still has the following problem:

Delete the problem: for the last record, (Beijing Warehouse, John Doe, ipad mini,60), we will never store the ipad Mini in this warehouse, the deletion can only be deleted together with the warehouse.
Insert Issue: If you create a new warehouse, you cannot assign an administrator to the warehouse if you have not already deposited the item.
Modify exception: If a warehouse is replaced by an administrator, the changes need to be modified by one record at a time.

Now that we are satisfied with the third paradigm, why are there so many problems? Because there is a partial function dependency of the main attribute on the main attribute , this example exists (管理员，物品名)->仓库 , but in fact 管理员->仓库 . Therefore, there is a partial function dependency of the main attribute on the main attribute (code).

We decompose its pattern:
Warehouse (warehouse name, administrator)
Inventory (warehouse name, item name, quantity)

Looking back again with the original question has not been resolved

Delete the problem: Because the warehouse now has a special table, so when deleting items will not affect the warehouse. The problem has been improved.
Insert issue: Now create a new warehouse directly in the warehouse table, which is not related to the administrator. be improved.
Modify the exception: Modify the administrator of a warehouse, directly modify the warehouse table can be. be improved.

The database that complies with BCNF is already very strict. But this is strictly at the 函数依赖 level. What? Rise level? No, that's right! The next fourth Paradigm Research area is the 多值依赖 level.

8. Paradigm Four (4NF)

Well, let's just take a straight look at one example:

Teaching (c,t,b), C denotes a course, T denotes a teacher, B is a reference book.
There are the following denormalized relationships:

Let's first make it meet the 1NF as follows:

There are many problems found:

Information redundancy: There is a large amount of data redundancy in courses and teachers.
Insert problem: When a course adds a classroom teacher, you need to insert multiple tuples.
Delete a problem: If you delete a book, you need to delete a lot of records, very troublesome.

Let's see how it belongs to the first paradigm. Found in this relationship, C, T, B are unique, that is, the full key. There is no non-primary attribute, and there is no partial function dependency and transitive function dependency of the primary attribute on the primary attribute (since at least two main attributes in that code), it has only one code (C, T, B). That is, the relationship belongs to BCNF. Then why is it that there are so many problems? That's because it has a multivalued dependency.

Multivalued dependency: In relational mode R (U), X, Y, Z are subsets of U, Z=u-x-y. If a multivalued dependent x->->z is established, then given (x, y), a set of values for Z can be determined, and this value is determined only by x, not with Y.

For example: In Teaching, C, B, T is a subset of teaching, T=teaching-c-b. Given (C,B), for example (Database principle and application, SQL Server 2000), can determine a set of T values (Deng Yu, Magosawa). But this set of T values is only related to C (Database principle and application), and B (SQL Server 2000) does not matter, we say c->->t. Called T multi-valued depends on C.

Trivial multi-valued dependency: If Y is an empty set. X->->z is a trivial multi-valued dependency
Non-trivial multivalued dependency: Y is not empty.

In fact, it is because of the existence of multi-valued dependencies in the teaching relationship, so we can say the problem, we do the pattern decomposition, the resulting table is as follows:

TC Table (course, teacher)
BC Table (course, reference book)

See if our problem has been solved:

Information redundancy: improved.
Insert problem: Add a class to a course teacher, just adding a record to the TC table is fine. Improve
Delete a problem: Delete a book and simply delete a record in the BC table. Improve

Now the table is to meet the four paradigm, said so much, we finally give the definition of the fourth paradigm: the relationship mode R satisfies the premise of 1NF, if r in every non-trivial multivalued dependent x->->y,x contains keys, then R belongs to 4NF, in the relationship discussed earlier. Keys are (T, C, B) and are all keys. C->->b,c->->t, this multivalued dependency is non-trivial, but the individual C is not a key, so the IV paradigm cannot be satisfied.

Summary: function dependency and multivalued dependency are two important data dependencies. If you only consider function dependencies, BCNF is already very high level, but also to consider the addition of multi-valued dependency and connection dependency, connection dependency belongs to the category of 5NF. As can be seen from the three different examples in this article, database optimization for different databases to reach the appropriate level, a simple database to reach the third paradigm is already very perfect. and increasing the paradigm means decomposition, which means that the query may be connected later, and that will affect efficiency. At last, we give the famous database optimization 16 words by: from low to high, gradually standardize, weigh the pros and cons

If you seriously see here, I think you must have something to gain, so praise a bar.

Reference: Liu Yan teacher

The simplest database "paradigm" tutorial

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More