A Basic Introduction to Database Normalization Principles

Last Update:2017-05-12 Source: Internet

Author: User

If you have worked with databases for any length of time, you have certainly heard the term "normalization." Someone may have asked you, "Is your database normalized?" or "Does your database satisfy BCNF?" People often push normalization aside as superfluous, something only those with an academic bent have the spare time for. However, understanding the principles of normalization and applying them to routine database design tasks is not that complicated, and doing so can significantly improve the performance of the database management system.

This article introduces the concept of normalization and describes the most common normal forms in detail.

What is normalization?

Normalization is the process of organizing the data in a database efficiently. It has two main goals: eliminating redundant data (for example, the same data stored in more than one table) and ensuring that data dependencies make sense (related data is stored only in the table where it belongs). Both goals are worthwhile because they reduce the space that the database and its tables consume and keep the stored data consistent and logically organized.

Normal forms

The database community has developed a series of rules for building databases so that they are normalized. In relational databases, these rules are called normal forms, and they are numbered by level. There are five numbered normal forms: first normal form (1NF), second (2NF), third (3NF), fourth (4NF), and fifth (5NF). First normal form sets the minimum requirements, second normal form adds further requirements on top of the first, and so on. In practice, the first three are by far the most common, and they are the focus of this article. Fourth normal form is met only occasionally and fifth normal form more rarely still, so this article does not cover them in detail.

Before discussing the individual normal forms, keep one thing in mind: they are guidelines, and only guidelines. Sometimes the practical needs of a business application make it necessary to deviate from them. Whenever that happens, it is important to evaluate the impact on the system and the data inconsistencies that may result.

First normal form (1NF)

For relational databases, first normal form sets the most basic rules. It has two requirements:

Eliminate duplicate columns from the same table.

Create a separate table for each group of related data, and identify each row with a unique column or set of columns (the primary key).

What do these rules mean when considering the actual design of a database? It is actually very simple.

The first rule says that data must not be repeated within a single row of a table; in other words, each attribute must be atomic (indivisible). Take a human resources database that stores the relationship between supervisors and their subordinates. To illustrate first normal form, assume the following business rule: each supervisor can have one or more subordinates, and each subordinate has exactly one supervisor. When we create a list or spreadsheet to track this information, we might come up with a table with the following columns:

Supervisor	Subordinate 1	Subordinate 2	Subordinate 3	Subordinate 4
A	James	Xiaolan	Xiaojun
B	Dan
C	Rongo	Tom	Xiaodong	John

However, recall the first rule of first normal form: eliminate duplicate columns from the same table. The four columns Subordinate 1 through Subordinate 4 are clearly repeats. Stop and think about the problems this causes. If a supervisor has only one subordinate, the columns Subordinate 2 through Subordinate 4 are wasted storage space (and storage space is a precious resource for a database). And if a supervisor who already has four subordinates takes on a fifth, there is nowhere to put the new name: you would have to change the structure of the whole table.

At this point, people new to databases often come up with a second "good idea": the following structure, which avoids adding columns and uses storage space flexibly:

Supervisor	Subordinates
A	James, Xiaolan, Xiaojun
B	Dan
C	Rongo, Tom, Xiaodong, John

With this structure, the Subordinates column can hold multiple values. If supervisor A has three subordinates, you enter "James, Xiaolan, Xiaojun".

This solution is closer, but it is still flawed. The Subordinates column is still repeating (it holds multiple values) and is not atomic. To add or remove a subordinate, we have to read and rewrite the whole list of names in that field. That may not matter much here, but what if a supervisor had more than a hundred subordinates? A structure like this also complicates later queries against the database. The following table does conform to first normal form:

Supervisor	Subordinates
A	James
A	Xiaolan
A	Xiaojun
B	Dan
C	Rongo
C	Tom
C	Xiaodong
C	John

Here each subordinate appears in exactly one row, while a supervisor can appear in several.
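As a rough illustration of the step from the multi-valued design to first normal form (1NF), the sketch below splits each comma-separated cell into one row per supervisor and subordinate pair and loads the rows into an in-memory SQLite database. The names come from the first-normal-form table above. The table name `reports`, its column names, the helper functions, and the extra employee "Lily" are assumptions made up for this example; they are not part of the original design.

```python
import sqlite3

# Rows as they would look in the comma-separated design.
multi_valued = {
    "A": "James, Xiaolan, Xiaojun",
    "B": "Dan",
    "C": "Rongo, Tom, Xiaodong, John",
}

def to_1nf_rows(data):
    """Split each multi-valued cell into one (supervisor, subordinate) row."""
    return [
        (supervisor, name.strip())
        for supervisor, names in data.items()
        for name in names.split(",")
    ]

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names, chosen for this sketch only.
conn.execute("CREATE TABLE reports (supervisor TEXT, subordinate TEXT)")
conn.executemany("INSERT INTO reports VALUES (?, ?)", to_1nf_rows(multi_valued))

# Adding a subordinate is a single-row insert; no column is added and
# no existing cell has to be rewritten. "Lily" is a made-up employee.
conn.execute("INSERT INTO reports VALUES (?, ?)", ("B", "Lily"))

def subordinates_of(supervisor):
    """Return the subordinates of one supervisor in insertion order."""
    rows = conn.execute(
        "SELECT subordinate FROM reports WHERE supervisor = ? ORDER BY rowid",
        (supervisor,),
    )
    return [row[0] for row in rows]
```

Calling `subordinates_of("A")` then needs only a plain equality filter, with no string splitting inside the query.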

Now for the second rule: identify each row with a unique column or set of columns (the primary key). Looking at the table above, you might suggest using the subordinate column as the primary key. Given our business rule that each subordinate has only one supervisor, the subordinate column is indeed a good candidate. However, the data we chose to store makes it a poor choice. What happens if we hire a second employee named James? How would we store his supervisor relationship in the database? It is best to use a truly unique identifier, such as an employee ID, as the primary key. The final table looks like this:

Supervisor ID	Subordinate ID
200101	200506
200101	200708
200101	200709
200302	200404
200010	200102
200010	200303
200010	200507
200010	200608
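One way to see what the primary key buys us is to declare it and let the database enforce it. The following is a minimal sketch using Python's built-in `sqlite3` module and the ID pairs from the table above. The table name `reports`, the column names, and the `try_assign` helper are hypothetical, and making the subordinate ID the primary key is an assumption that follows from the rule that each subordinate has exactly one supervisor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: subordinate_id is the primary key, which encodes
# the business rule that each subordinate has exactly one supervisor.
conn.execute(
    """
    CREATE TABLE reports (
        supervisor_id  INTEGER NOT NULL,
        subordinate_id INTEGER PRIMARY KEY
    )
    """
)
pairs = [
    (200101, 200506), (200101, 200708), (200101, 200709),
    (200302, 200404),
    (200010, 200102), (200010, 200303), (200010, 200507), (200010, 200608),
]
conn.executemany("INSERT INTO reports VALUES (?, ?)", pairs)

def try_assign(supervisor_id, subordinate_id):
    """Return True if the row was inserted, False if the key already exists."""
    try:
        conn.execute(
            "INSERT INTO reports VALUES (?, ?)", (supervisor_id, subordinate_id)
        )
        return True
    except sqlite3.IntegrityError:
        return False

# Giving employee 200506 a second supervisor violates the primary key,
# so the database rejects the row.
duplicate_accepted = try_assign(200302, 200506)
```

Two employees who share a name no longer cause trouble here, because rows are identified by ID and never by name.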

Second normal form (2NF)

Second normal form (2NF) takes the idea of removing duplicate data further. Stated informally, its main requirements are:

Meet all the requirements of first normal form.

Remove subsets of data that apply to multiple rows of a table and place them in separate tables.

Use foreign keys to create relationships between these new tables and the original table.

In short, second normal form extracts redundant data, places it in a new table, and relates the new table to the original one, reducing the amount of redundant data. Consider an example. An online store keeps all of its customer data in a database, perhaps in a single table named customer with the following attributes: customer ID, name, address, province, city, and zip code, as shown below:

Customer ID	Name	Address	Province	City	Zip Code
01	James	No. 12, rural Forest Road	Guangdong	Guangzhou	510000
02	Tom	No. 14 Beijing Road	Guangdong	Guangzhou	510000
03	Xiaojun	No. 4 Zhongshan Road	Zhejiang	Hangzhou	310000
04	Xiaolan	No. 2 Binjiang Road	Sichuan	Chengdu	610000

A glance at this table reveals a small amount of redundant data: the combination Guangdong, Guangzhou, 510000 is stored more than once. In this small example the redundancy looks minor, but in a table with thousands of rows it wastes a lot of space. Furthermore, if the zip code for Guangzhou ever changed (unlikely, but comparable data in other kinds of tables does change), we would have to update it in many places throughout the database. In a database structure that follows second normal form, this redundant information is extracted and stored in a separate table. We will call the new table the zip table:

Zip Code	Province	City
510000	Guangdong	Guangzhou
310000	Zhejiang	Hangzhou
610000	Sichuan	Chengdu

To be more efficient still, you could obtain the full list of provinces, cities, and zip codes from the postal service and populate this table in advance. A table like this is very useful when taking orders: the operator asks for your zip code first and immediately knows your province and city. This reduces operator error and improves efficiency.

We have now removed the duplicate data from the customer table, as second normal form requires. Next, we need a foreign key to relate the two tables. In this example, the zip code (the primary key of the zip table) serves as the foreign key. The new customer table looks like this:

Customer ID	Name	Address	Zip Code
01	James	No. 12, rural Forest Road	510000
02	Tom	No. 14 Beijing Road	510000
03	Xiaojun	No. 4 Zhongshan Road	310000
04	Xiaolan	No. 2 Binjiang Road	610000

In this way, we have minimized the redundant information stored in the database, and the table structure now meets the requirements of second normal form.
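The split into a customer table and a zip table can be sketched with Python's built-in `sqlite3` module as follows. The table names `zip` and `customer`, the column names, and the `full_customer_rows` helper are assumptions for illustration, and the rejected customer "05" with zip code 999999 is made-up data used only to show the foreign key at work. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Hypothetical table and column names for this sketch.
conn.executescript(
    """
    CREATE TABLE zip (
        zip_code TEXT PRIMARY KEY,
        province TEXT NOT NULL,
        city     TEXT NOT NULL
    );
    CREATE TABLE customer (
        customer_id TEXT PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT NOT NULL,
        zip_code    TEXT NOT NULL REFERENCES zip(zip_code)
    );
    """
)
conn.executemany(
    "INSERT INTO zip VALUES (?, ?, ?)",
    [
        ("510000", "Guangdong", "Guangzhou"),
        ("310000", "Zhejiang", "Hangzhou"),
        ("610000", "Sichuan", "Chengdu"),
    ],
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        ("01", "James", "No. 12, rural Forest Road", "510000"),
        ("02", "Tom", "No. 14 Beijing Road", "510000"),
        ("03", "Xiaojun", "No. 4 Zhongshan Road", "310000"),
        ("04", "Xiaolan", "No. 2 Binjiang Road", "610000"),
    ],
)

def full_customer_rows():
    """Rebuild the original wide customer table with a join on zip_code."""
    return conn.execute(
        """
        SELECT c.customer_id, c.name, c.address, z.province, z.city, c.zip_code
        FROM customer AS c
        JOIN zip AS z ON z.zip_code = c.zip_code
        ORDER BY c.customer_id
        """
    ).fetchall()

# A customer whose zip code is missing from the zip table is rejected.
try:
    conn.execute(
        "INSERT INTO customer VALUES (?, ?, ?, ?)",
        ("05", "Dan", "No. 1 Example Road", "999999"),
    )
    unknown_zip_rejected = False
except sqlite3.IntegrityError:
    unknown_zip_rejected = True
```

With this layout, each province and city is stored once per zip code, and the join recovers the original six-column view whenever it is needed.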

Third normal form (3NF)

A database in third normal form must meet two basic requirements:

Meet all the requirements of first and second normal form.

Remove any column that is not fully dependent on the primary key.

Assume we have an order table with the following attributes: order number, customer ID, unit price, quantity, and total price, as shown below:

Order No.	Customer ID	Unit Price	Quantity	Total price
A1001	001	20	30	600
A1002	002	18	50	900
A1003	003	30	5	150
A1004	001	15	100	1500
A1005	004	20	40	800

Remember, the first requirement is that the table meets the requirements of first and second normal form. Are there any duplicate columns? No. Is there a usable primary key? Yes, the order number. So first normal form is satisfied. Are there any subsets of data that apply to multiple rows? No. So second normal form is satisfied as well.

Now, do all the columns depend fully on the primary key rather than on other attributes in the table? The customer ID varies with the order number and does not appear to depend on any other attribute. What about the unit price? If each customer were charged a fixed price, this attribute would depend on the customer ID. However, the data above shows that the same customer can be charged different unit prices (see customer 001), so the unit price depends fully on the order number. The quantity also varies from order to order, so it is fine as well.

What about the total price? This is where the problem lies. The total price is the unit price multiplied by the quantity, so it does not depend fully on the primary key. To meet third normal form, we must remove the total price attribute from the table. The table then contains only the order number, customer ID, unit price, and quantity, and satisfies third normal form. But what, you may ask, do we do about the total price? It is a derived attribute and should not be stored in the database at all; we can easily compute it when we query the database. For example, before removing the attribute, we might have used the following query to get the order number and total price:

SELECT order_number, total_price
FROM commodity_order;

After removing the total price attribute, we can get the same result without violating the normal form rules. We only need a small change to the query:

SELECT order_number, unit_price * quantity AS total_price
FROM commodity_order;
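To check that the computed column reproduces the stored totals, the sketch below loads the order data from the table above, without the total price column, into an in-memory SQLite database and runs the query. The identifiers `commodity_order`, `order_number`, `customer_id`, `unit_price`, and `quantity` are hypothetical stand-ins for the article's attribute names, and the quantity change for order A1003 is a made-up edit for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical identifiers standing in for the article's order table.
conn.execute(
    """
    CREATE TABLE commodity_order (
        order_number TEXT PRIMARY KEY,
        customer_id  TEXT NOT NULL,
        unit_price   INTEGER NOT NULL,
        quantity     INTEGER NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO commodity_order VALUES (?, ?, ?, ?)",
    [
        ("A1001", "001", 20, 30),
        ("A1002", "002", 18, 50),
        ("A1003", "003", 30, 5),
        ("A1004", "001", 15, 100),
        ("A1005", "004", 20, 40),
    ],
)

# The total price is computed at query time instead of being stored.
totals = conn.execute(
    """
    SELECT order_number, unit_price * quantity AS total_price
    FROM commodity_order
    ORDER BY order_number
    """
).fetchall()

# Made-up edit: change one quantity. The derived total follows
# automatically, so there is no stored total left to fall out of sync.
conn.execute("UPDATE commodity_order SET quantity = 6 WHERE order_number = 'A1003'")
updated_total = conn.execute(
    "SELECT unit_price * quantity FROM commodity_order WHERE order_number = 'A1003'"
).fetchone()[0]
```

The update at the end hints at why a derived attribute is better computed than stored: a stored total would need a second update every time the unit price or quantity changed.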

Designing a database to conform to fourth and fifth normal form is considerably harder. Fifth normal form in particular is difficult to achieve, and forcing it onto a design can do more harm than good, so it is seldom considered. Normalization, and the normal forms to follow when designing a database, are basic database design theory covered in any database tutorial. Even so, many people find it hard to follow these normal forms once they reach the practical stage of database design. When you design a database, consider carefully whether you have followed these rules: a sound database structure, free of redundant data and with consistent data, can only help database management and query performance.

This article is an English version of an article originally written in Chinese on aliyun.com.
