Relational Database: Normalization

Source: Internet
Author: User

In the design phase, Data normalization may take more time than any other task. The more data, the longer the process takes. Based on past experience, you may find that the most difficult thing to do is to meet all the requirements of 1nf, because when you move duplicate values to another table, improper dependencies are often eliminated.
  
After completing the most difficult part, you may choose to stop after 1nf, but do not. Please continue to normalize the data through the second Paradigm (2nf), the third paradigm (3nf), or even the Boyce-codd Paradigm (bcnf ). In this way, we can find those data elements that are dependent, otherwise they will slide quietly during the design process and cause problems in the future. It is best to discover these problems during the design period-do not wait until the user finds that his or her work cannot be completed, or when you begin to lose money.
  
Introduction to articles
You are reading an article in The builder.com Relational Database Design Series. If you miss the first three articles, read them first:
Relational Database: inspiration behind Theory
Relational Database: Using paradigm to create a database
Relational Database: application first paradigm
  
In the previous article in this series, we have started from a table and processed it to meet the 1nf requirements. The table is eventually changed to four tables. Now, we can apply 2nf, 3nf, and bcnf to complete the normalization process.
  
Continue unfinished work
Our demo database will be used to store books-related data after completion. This is a very simple purpose, so we only need a simple database. We now have four tables, all of which are normalized to 1nf (Remember, the key field is represented by an asterisk ):
  
Books: {* Title, * ISBN, price}
Authors: {* firstname, * lastname, * state, * zip}
Categories: {* category, description}
Publishers: {* publisher}
Apply 2nf
  
  
  
To meet 2nf requirements, the table must first be normalized to 1nf, that is, there are no multi-value items and no repeated groups. Each field can only contain Atomic values, each table must contain a key. So far, it seems that all tables meet this requirement. The second requirement of 2nf is that all fields (usually called "attributes" in the design stage) must depend on the primary key and can only rely on the primary key. For the moment, it seems that all attributes meet the 2nf requirement and no further action is required.
  
On the other hand, it is assumed that the books table also stores a large number of attributes used to describe the borrower. Some attributes do not violate 2nf requirements, such as an lent date (borrow date) attribute in the books table. However, other data (such as the name and address of the borrower) violates 2nf because the borrowing-related information cannot fully support or describe the book itself.
  
Application 3nf
  
After 2nf normalization is completed for a table, you can check whether it violates 3nf. 3nf requires that all fields are independent of each other. Any field dependent on one non-Keyword segment must be transferred to another table. To find out where 3nf is violated, the simplest way is to modify the value of each attribute to see if it immediately invalidates the data contained in other attributes. Although this simple test cannot find all the places that violate 3nf, it is a good start.
  
The authors table may violate 3nf: if the state value is changed, the zip file may be updated at the same time, and vice versa. For example, if the author moves to another State, both values must be modified. To avoid this form of dependency, you need to transfer the state attribute to a new table, as shown below:
  
Authors: {* firstname, * lastname, * zip}
States: {* State}
  
The result of the above modification is that each author has a zip value, and the central score may be repeated. However, in the States table, each State occupies only one record. If an author moves to another state, you only need to associate the record with a different state even though you need to update the zip value. If it is a new state, you may need to enter a new state value, but at least the state value will not be repeated.
  
For the moment, it seems that you are creating a lookup table ). In the future, these tables will interact with each other through their primary keys and foreign key values. However, it may be difficult to operate according to the above logic before the formal establishment of the contact. However, do not worry if you do not know the current situation. Currently, you only need to focus on the rules.
  
  
Now there are five tables, all of which are normalized to 3nf:
  
Books: {* Title, * ISBN, price}
Authors: {* firstname, * lastname, * zip}
Categories: {* category, description}
Publishers: {* publisher}
States: {* State}
  
Some people may have questions about this, because one attribute does not seem to be taken into account, that is, the zip in the authors table. As mentioned above, the zip value may be repeated. In this simple application, it seems acceptable to leave the ZIP file in the authors table; in any case, the database should be able to run efficiently. However, this table is not fully normalized, so we try to transfer the ZIP file to a new table. After moving the ZIP file, We have 6 tables:
  
Books: {* Title, * ISBN, price}
Authors: {* firstname, * lastname}
Zipcodes: {* zip}
Categories: {* category, description}
Publishers: {* publisher}
States: {* State}
  
Normalization may not guarantee efficiency
  
Not every table must be fully normalized before efficiency can be achieved.



In other words, if you find that you can make the database more efficient, you can perform reverse normalization on a table.
  
Use bcnf
  
Bcnf is essentially a sub-rule of 3nf. Many developers think that it is completely unnecessary, so it will stop regularization after 3nf. Some people think that if this rule is enforced, performance will be reduced. However, for systems that are usually very powerful at present, performance may not be a big problem unless you try to manipulate millions of records. Of course, you do not have to include bcnf. You must weigh whether the impact on performance after bcnf normalization is worth the flexibility advantage of a fully normalized database.
  
Bcnf requires that no attribute can depend on a non-Keyword field. For the moment, our tables seem to be able to meet this requirement. Therefore, we can add a city attribute to the States table to make the situation more complex:
  
States: {* city, * State}
  
Each city and state record is unique and related to a zip value. However, the state value shows a duplicate group because each State may have multiple cities. The solution is to transfer the city attribute to its own table, as shown below:
  
Cities: {* city}
States: {* State}
  
Although the city and state attributes are an inappropriate dependency, the cities table needs to be created only because duplicate values exist. This problem can be captured when 1nf is enforced, but only bcnf can be forced to completely capture errors that cause repeated values. In general, although a dependency problem will confuse your eyes for the moment, the dependency on non-Keyword fields can be completely eliminated only after bcnf is forced.
  
After the table originally created through bcnf is normalized, the total number of tables is increased to seven:
  
Books: {* Title, * ISBN, price}
Authors: {* firstname, * lastname}
Zipcodes: {* zip}
Categories: {* category, description}
Publishers: {* publisher}
States: {* State}
Cities: {* city}
  
Although the number of tables is increasing rapidly, do not worry. In fact, we are not finished yet. In the next article in this series, more tables may even appear. At that time, we will discuss primary key and foreign key fields and explain how to use them to establish relationships between multiple tables.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.