A Basic Introduction to Database Normalization Principles (MySQL)

Source: Internet
Author: User
If you have worked with databases long enough, you have certainly heard the term "normalization". Someone may have asked you, "Is your database normalized?" or "Does your database satisfy BCNF?" People often set normalization aside as superfluous, something only academically minded people have the spare time for. However, understanding the principles of normalization and applying them to routine database design tasks is not that complex, and normalization can significantly improve the performance of the database management system.

This article introduces the concept of normalization and describes the most common normal forms in detail.

What is normalization?

Normalization is the process of organizing the data in a database efficiently. It has two main goals: eliminating redundant data (such as the same data stored in more than one table) and making sure that data dependencies are sensible (related data is stored together in a single table). Both goals are worthwhile because they reduce the space that the database and its tables consume and keep the stored data consistent and logically organized.

Normal forms

The database community has developed a series of rules that a database design should follow in order to be normalized. In relational databases these rules are called normal forms, and they are numbered by level. There are five numbered normal forms: the first normal form (1NF), 2NF, 3NF, 4NF, and 5NF. (BCNF, mentioned above, is a slightly stricter version of 3NF.) The first normal form sets the minimum requirements, the second normal form adds further requirements on top of the first, and so on. In practice, the first, second, and third normal forms are the most common, and they are the focus of this article. The fourth normal form is satisfied only occasionally and the fifth is rarer still, so this article does not cover them in detail.

Before discussing specific normal forms, keep one thing in mind: they are guidelines, and only guidelines. Sometimes the practical needs of a business application make it necessary to deviate from them. Whenever that happens, it is important to evaluate the impact on the system and the data inconsistencies that may result.

First normal form (1NF)

For relational databases, the first normal form sets the most basic rules. A table must meet the following two requirements:

Eliminate repeating columns from the same table.

Create a separate table for each group of related data and identify each row with a unique column or set of columns (the primary key).

What do these rules mean when considering the actual design of a database? It is actually very simple.

The first rule says that a row must not contain repeated data. In other words, every attribute in the table must be atomic (indivisible). For example, suppose a human resources database stores the relationship between supervisors and their subordinates. To illustrate the rules of the first normal form, assume this business rule: each supervisor can have one or more subordinates, and each subordinate has exactly one supervisor. When we build a list or spreadsheet to track this information, we might create a table with the following columns:

| Supervisor | Subordinate 1 | Subordinate 2 | Subordinate 3 | Subordinate 4 |
|---|---|---|---|---|
| A | James | Xiaolan | Xiaojun | |
| B | Dan | | | |
| C | Rongo | Tom | Xiaodong | John |

However, recall the first rule of the first normal form: eliminate repeating columns from the same table. The four columns Subordinate 1 through Subordinate 4 are clearly repeating. Stop and think about the problem. If a supervisor has only one subordinate, the columns Subordinate 2 through Subordinate 4 are wasted storage space (and storage space is a precious resource for a database). And if a supervisor who already has four subordinates recruits another one, where does the new name go? You would have to change the structure of the entire table.

At this point, people new to databases often come up with a second "good idea": use the following structure, so that no extra column is ever needed and storage space is used flexibly:
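The two problems with the wide layout can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name supervisor_wide, the column names, and the new employee "Mei" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical wide layout: one column per subordinate slot.
conn.execute(
    "CREATE TABLE supervisor_wide ("
    " supervisor TEXT PRIMARY KEY,"
    " subordinate_1 TEXT, subordinate_2 TEXT,"
    " subordinate_3 TEXT, subordinate_4 TEXT)"
)
conn.executemany(
    "INSERT INTO supervisor_wide VALUES (?, ?, ?, ?, ?)",
    [
        ("A", "James", "Xiaolan", "Xiaojun", None),
        ("B", "Dan", None, None, None),
        ("C", "Rongo", "Tom", "Xiaodong", "John"),
    ],
)

# Problem 1: every NULL is a reserved but unused slot.
empty_slots = conn.execute(
    "SELECT SUM((subordinate_1 IS NULL) + (subordinate_2 IS NULL)"
    " + (subordinate_3 IS NULL) + (subordinate_4 IS NULL))"
    " FROM supervisor_wide"
).fetchone()[0]

# Problem 2: a fifth subordinate for supervisor C does not fit,
# so the table structure itself has to change.
conn.execute("ALTER TABLE supervisor_wide ADD COLUMN subordinate_5 TEXT")
conn.execute(
    "UPDATE supervisor_wide SET subordinate_5 = ? WHERE supervisor = ?",
    ("Mei", "C"),  # "Mei" is a made-up new hire
)
columns = [row[1] for row in conn.execute("PRAGMA table_info(supervisor_wide)")]

print(empty_slots)
print(columns)
```

With only three supervisors, four of the twelve subordinate slots are already empty, and one new hire forces a schema change that affects every row.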

| Supervisor | Subordinates |
|---|---|
| A | James, Xiaolan, Xiaojun |
| B | Dan |
| C | Rongo, Tom, Xiaodong, John |

This way, the Subordinates column can hold multiple values. If supervisor A has three subordinates, you simply enter "James, Xiaolan, Xiaojun".

This solution is closer, but it is still flawed. The Subordinates column still holds repeated (multiple) values and is not atomic. To add or remove a single subordinate, we have to read and rewrite the whole list stored in that field. That may not matter with a handful of names, but what if a supervisor has more than a hundred subordinates? Such a structure also complicates every future query against the data. The following table satisfies the first rule of the first normal form:

| Supervisor | Subordinate |
|---|---|
| A | James |
| A | Xiaolan |
| A | Xiaojun |
| B | Dan |
| C | Rongo |
| C | Tom |
| C | Xiaodong |
| C | John |

In this layout, each subordinate appears exactly once, while a supervisor can appear many times.

Next, consider the second rule: identify each row with a unique column or set of columns (the primary key). Looking at the table above, you might suggest the Subordinate column as the primary key. Given the business rule that each subordinate has only one supervisor, the Subordinate column is indeed a natural candidate. However, the data stored in it (names) makes it unsuitable. What happens if we hire two employees who are both named James? How would we record each one's reporting relationship? It is therefore best to use a truly unique identifier, such as an employee ID, as the primary key. The final table looks like this:
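As a rough sketch of why the one-pair-per-row layout is easier to work with, the snippet below loads the table above and performs the operations that were awkward with a comma-separated list. It assumes Python's built-in sqlite3 module as a stand-in for MySQL. The table name reports_to and the new employee "Lin" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reports_to ("
    " supervisor TEXT NOT NULL,"
    " subordinate TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO reports_to VALUES (?, ?)",
    [
        ("A", "James"), ("A", "Xiaolan"), ("A", "Xiaojun"),
        ("B", "Dan"),
        ("C", "Rongo"), ("C", "Tom"), ("C", "Xiaodong"), ("C", "John"),
    ],
)

# Adding or removing a subordinate touches exactly one row.
conn.execute("INSERT INTO reports_to VALUES (?, ?)", ("B", "Lin"))  # made-up hire
conn.execute(
    "DELETE FROM reports_to WHERE supervisor = ? AND subordinate = ?",
    ("A", "Xiaojun"),
)

# Finding someone's supervisor is a plain equality test, no string parsing.
toms_supervisor = conn.execute(
    "SELECT supervisor FROM reports_to WHERE subordinate = ?", ("Tom",)
).fetchone()[0]

# Counting subordinates per supervisor is an ordinary GROUP BY.
head_counts = dict(
    conn.execute("SELECT supervisor, COUNT(*) FROM reports_to GROUP BY supervisor")
)

print(toms_supervisor)
print(head_counts)
```

None of these statements depends on how many subordinates a supervisor has, so the table structure never needs to change.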

| Supervisor ID | Subordinate ID |
|---|---|
| 200101 | 200506 |
| 200101 | 200708 |
| 200101 | 200709 |
| 200302 | 200404 |
| 200010 | 200102 |
| 200010 | 200303 |
| 200010 | 200507 |
| 200010 | 200608 |
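The ID-based table can be sketched as follows, again assuming Python's built-in sqlite3 module in place of MySQL. The employee table and the assignment of the name James to IDs 200506 and 200708 are assumptions made only for this illustration. The article does not say which ID belongs to which person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical lookup table: IDs are unique even when names are not.
conn.execute(
    "CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(200506, "James"), (200708, "James")],  # two different people, same name
)

# The subordinate ID is the primary key, which also enforces the
# business rule that each subordinate has exactly one supervisor.
conn.execute(
    "CREATE TABLE reports_to ("
    " subordinate_id INTEGER PRIMARY KEY,"
    " supervisor_id INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO reports_to (subordinate_id, supervisor_id) VALUES (?, ?)",
    [
        (200506, 200101), (200708, 200101), (200709, 200101),
        (200404, 200302),
        (200102, 200010), (200303, 200010), (200507, 200010), (200608, 200010),
    ],
)

# Giving an existing subordinate a second supervisor violates the key.
try:
    conn.execute(
        "INSERT INTO reports_to (subordinate_id, supervisor_id) VALUES (?, ?)",
        (200506, 200302),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Both employees named James are stored without ambiguity.
james_rows = conn.execute(
    "SELECT e.employee_id, r.supervisor_id"
    " FROM employee AS e"
    " JOIN reports_to AS r ON r.subordinate_id = e.employee_id"
    " WHERE e.name = ?"
    " ORDER BY e.employee_id",
    ("James",),
).fetchall()

print(duplicate_rejected)
print(james_rows)
```

Using the subordinate ID as the key means the database itself rejects a second supervisor for the same person, instead of relying on whoever enters the data.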

Second normal form (2NF)

The second normal form (2NF) takes the idea of removing duplicate data further. Its main requirements are:

Meet all requirements of the first normal form.

Remove subsets of data that apply to multiple rows of a table and place them in separate tables.

Use foreign keys to relate these new tables to the original table.

In short, the second normal form extracts redundant data into a new table and links the new table back to the original, which reduces the amount of redundant data. Let's look at an example. Suppose an online store keeps all of its customer data in a database, in a single table named customer with the following attributes: customer ID, name, address, province, city, and zip code, as shown in the following table:

| Customer ID | Name | Address | Province | City | Zip Code |
|---|---|---|---|---|---|
| 01 | James | No. 12, rural Forest Road | Guangdong | Guangzhou | 510000 |
| 02 | Tom | No. 14 Beijing Road | Guangdong | Guangzhou | 510000 |
| 03 | Xiaojun | No. 4 Zhongshan Road | Zhejiang | Hangzhou | 310000 |
| 04 | Xiaolan | No. 2 Binjiang Road | Sichuan | Chengdu | 610000 |

A glance at the table above reveals some redundant data: the combination Guangdong, Guangzhou, 510000 is entered more than once. In this small example the redundancy looks minor, but in a table with thousands of rows it wastes a great deal of space. Furthermore, suppose the zip code for Guangzhou changed (unlikely, but similar information in other kinds of tables does change). We would have to update every affected row throughout the database. In a database design that follows the second normal form, this redundant information is extracted and stored in a separate table. We call this new table the zip table, shown below:

| Zip Code | Province | City |
|---|---|---|
| 510000 | Guangdong | Guangzhou |
| 310000 | Zhejiang | Hangzhou |
| 610000 | Sichuan | Chengdu |

For even more efficiency, you could obtain the full list of provinces, cities, and zip codes from the post office and fill in this table in advance. Such a table is very useful when taking orders: the operator asks for your zip code first and immediately knows your province and city. This reduces operator errors and improves efficiency.

We have now removed the duplicate data from the customer table, as the second normal form requires. Next, we need a foreign key to relate the two tables. In this example, the zip code (the primary key of the zip table) serves as the foreign key. The new customer table looks like this:

| Customer ID | Name | Address | Zip Code |
|---|---|---|---|
| 01 | James | No. 12, rural Forest Road | 510000 |
| 02 | Tom | No. 14 Beijing Road | 510000 |
| 03 | Xiaojun | No. 4 Zhongshan Road | 310000 |
| 04 | Xiaolan | No. 2 Binjiang Road | 610000 |

In this way, we minimize the redundant information stored in the database, and our table structure meets the requirements of the second normal form.

Third normal form (3NF)

A database in the third normal form must meet the following two basic requirements:

Meet all requirements of the first and second normal forms.

Remove all columns that are not fully dependent on the primary key.
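The split into a customer table and a zip table might be set up as in the sketch below. It assumes Python's built-in sqlite3 module in place of MySQL, so the PRAGMA line is SQLite-specific. The column names, the ON UPDATE CASCADE clause, customer "05", and the zip codes 999999 and 510001 are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute(
    "CREATE TABLE zip ("
    " zip_code TEXT PRIMARY KEY,"
    " province TEXT NOT NULL,"
    " city TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE customer ("
    " customer_id TEXT PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " address TEXT NOT NULL,"
    " zip_code TEXT NOT NULL"
    "   REFERENCES zip (zip_code) ON UPDATE CASCADE)"
)
conn.executemany(
    "INSERT INTO zip VALUES (?, ?, ?)",
    [
        ("510000", "Guangdong", "Guangzhou"),
        ("310000", "Zhejiang", "Hangzhou"),
        ("610000", "Sichuan", "Chengdu"),
    ],
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        ("01", "James", "No. 12, rural Forest Road", "510000"),
        ("02", "Tom", "No. 14 Beijing Road", "510000"),
        ("03", "Xiaojun", "No. 4 Zhongshan Road", "310000"),
        ("04", "Xiaolan", "No. 2 Binjiang Road", "610000"),
    ],
)

# A join rebuilds the original wide view whenever it is needed.
full_rows = conn.execute(
    "SELECT c.customer_id, c.name, z.province, z.city"
    " FROM customer AS c JOIN zip AS z ON z.zip_code = c.zip_code"
    " ORDER BY c.customer_id"
).fetchall()

# The foreign key rejects a zip code that is not in the zip table.
try:
    conn.execute(
        "INSERT INTO customer VALUES (?, ?, ?, ?)",
        ("05", "Dan", "No. 1 Example Road", "999999"),  # made-up row
    )
    unknown_zip_rejected = False
except sqlite3.IntegrityError:
    unknown_zip_rejected = True

# If Guangzhou's zip code changed, one UPDATE on the zip table is enough;
# the cascade carries the new value to every customer row.
conn.execute("UPDATE zip SET zip_code = ? WHERE zip_code = ?", ("510001", "510000"))
moved_customers = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE zip_code = ?", ("510001",)
).fetchone()[0]

print(full_rows[0])
print(unknown_zip_rejected, moved_customers)
```

Province and city are now stored once per zip code, and the foreign key keeps an operator from entering a zip code the zip table does not know.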

Suppose we have a commodity order table with the following attributes: order number, customer ID, unit price, quantity, and total price, as shown in the following table:

| Order No. | Customer ID | Unit Price | Quantity | Total Price |
|---|---|---|---|---|
| A1001 | 001 | 20 | 30 | 600 |
| A1002 | 002 | 18 | 50 | 900 |
| A1003 | 003 | 30 | 5 | 150 |
| A1004 | 001 | 15 | 100 | 1500 |
| A1005 | 004 | 20 | 40 | 800 |

Remember, the first requirement is that the table satisfy the first and second normal forms. Are there any repeating columns? No. Is there a usable primary key? Yes, the order number. So the requirements of the first normal form are met. Is there any subset of data that applies to multiple rows? No. So the requirements of the second normal form are met as well.

Now check whether every column depends entirely on the primary key rather than on other attributes in the table. The customer ID varies with the order number and does not appear to depend on any other attribute. What about the unit price? If each customer were given a standard price, this attribute would depend on the customer ID. However, the data above shows that the unit price can differ even for the same customer (see customer 001). The unit price therefore depends entirely on the order number. The quantity likewise varies from order to order, so the quantity attribute poses no problem either.

What about the total price? This is where the problem lies. The total price is the unit price multiplied by the quantity, so it does not depend entirely on the primary key. To meet the requirements of the third normal form, we must remove the total price attribute from the table. The table then has only the attributes order number, customer ID, unit price, and quantity, and it satisfies the third normal form. But what about the total price, you may ask? It is a derived attribute and should not be stored in the database at all, because it is easy to calculate while querying. For example, before removing the attribute we might have used the following query to obtain the order number and total price:
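The check on the unit price can also be run as a query. The sketch below assumes Python's built-in sqlite3 module in place of MySQL, and the table and column names are hypothetical renderings of the attributes above. A check like this can only disprove a dependency using the rows at hand. It cannot prove that a dependency always holds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE commodity_order ("
    " order_no TEXT PRIMARY KEY,"
    " customer_id TEXT NOT NULL,"
    " unit_price INTEGER NOT NULL,"
    " quantity INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO commodity_order VALUES (?, ?, ?, ?)",
    [
        ("A1001", "001", 20, 30),
        ("A1002", "002", 18, 50),
        ("A1003", "003", 30, 5),
        ("A1004", "001", 15, 100),
        ("A1005", "004", 20, 40),
    ],
)

# If customer_id determined unit_price, every customer would have exactly
# one distinct unit price. List the customers for which that is false.
conflicting_customers = [
    row[0]
    for row in conn.execute(
        "SELECT customer_id FROM commodity_order"
        " GROUP BY customer_id"
        " HAVING COUNT(DISTINCT unit_price) > 1"
        " ORDER BY customer_id"
    )
]

print(conflicting_customers)
```

Customer 001 appears with unit prices 20 and 15, so in this data the unit price is not determined by the customer ID.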

SELECT order_no, total_price
FROM commodity_order;

After the total price attribute is removed, we can obtain the same result without violating the normal form rules. Only a small change to the query is needed:

SELECT order_no, unit_price * quantity AS total_price
FROM commodity_order;
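To confirm that the computed column reproduces the stored totals from the table above, here is a small runnable sketch. It assumes Python's built-in sqlite3 module in place of MySQL and uses hypothetical table and column names. The view at the end is an optional extra that the article does not mention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The table after removing the derived total price attribute.
conn.execute(
    "CREATE TABLE commodity_order ("
    " order_no TEXT PRIMARY KEY,"
    " customer_id TEXT NOT NULL,"
    " unit_price INTEGER NOT NULL,"
    " quantity INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO commodity_order VALUES (?, ?, ?, ?)",
    [
        ("A1001", "001", 20, 30),
        ("A1002", "002", 18, 50),
        ("A1003", "003", 30, 5),
        ("A1004", "001", 15, 100),
        ("A1005", "004", 20, 40),
    ],
)

# The total price is computed at query time instead of being stored.
totals = dict(
    conn.execute(
        "SELECT order_no, unit_price * quantity AS total_price"
        " FROM commodity_order"
    )
)

# Optional: a view lets existing queries keep selecting total_price by name.
conn.execute(
    "CREATE VIEW commodity_order_with_total AS"
    " SELECT order_no, customer_id, unit_price, quantity,"
    "        unit_price * quantity AS total_price"
    " FROM commodity_order"
)
view_totals = dict(
    conn.execute("SELECT order_no, total_price FROM commodity_order_with_total")
)

print(totals)
```

Because the total is always derived from the current unit price and quantity, it can never drift out of step with them the way a stored copy can.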

Designing a database to conform to the fourth and fifth normal forms is considerably harder. The fifth normal form in particular is difficult to implement, and forcing it can harm the integrity of the database, so it is seldom considered. Normalization, and the normal forms to follow when designing a database, are basic database design theory covered in any database tutorial. Even so, many people find it hard to follow the normal forms once they reach the practical stage of database design. When designing a database, it is worth seriously considering whether these rules have been followed. After all, a sound database structure, the elimination of redundant data, and data consistency can only help database management and query performance.
