Database normalization Techniques (MS)

Source: Internet
Author: User
Tags define contains one table crosstab
Specification | standardization | skill | data | Normalization skills of database Database

Luke Chung
President of FMS
September 2002
Apply to:

Microsoft®access

Summary: This article provides developers with the skills to avoid certain problems when designing an Access table. This article applies to Microsoft Access databases (. mdb) and Microsoft Access projects (. adp).

Directory

Brief introduction
Understand your data
What kind of data do you need?
What do you intend to do with this data?
How do data relate to each other?
What happens to the data over time?
Learn how to use queries
Database Normalization Concepts
Store unique information in one place
The records are free, and the new fields are very expensive.
Learn when to copy data
Use a field with no exact meaning as the primary key field
Using referential integrity
Summary

Brief introduction
When designing a database, the most important step is to ensure that the data is correctly distributed to the tables in your database. With the right data structure, you can greatly simplify other content in your application (queries, forms, reports, code, and so on). The formal name for proper table design is "database normalization".

This article briefly introduces the basic concepts of database normalization and some common problems that need to be noticed and avoided.

Understand your data
Before you design a table, you should be clear about how you intend to work with the data, and how the data will change over time. The assumptions you make will affect the final design.

What kind of data do you need?
When designing an application, it is critical to understand the end result of the design to ensure that you have all the necessary data ready and know its source. For example, the appearance of the report, the source of each data, and whether all the data you want exists. The biggest loss to the project is the lack of data at the end of the project to find important reports.

Once you know what kind of data you need, you must determine the source of the data. Is the data imported from another data source? Does the data need to be cleaned or validated? Does the user need to enter data?

Defining the type and source of data you need is the first step in database design.

What do you intend to do with this data?
Does the user need to edit this data? If so, how should the data be displayed for the user to understand and edit? Are there validation rules and related lookup tables? Are there any audit issues associated with the need to edit and delete data entry for the reserved backup? What summary information do you want to display for the user? Do you need to generate an export file? Once you know this information, you can imagine how the fields are interconnected.

How do data relate to each other?
Group data into related fields (such as customer-related information, invoice-related information, and so on), and each field group represents the table to be established. Then consider how these tables are related to each other. For example, which tables have a one-to-many relationship (for example, a customer might hold multiple invoices)? Which tables have a one-to-one relationship (in this case, you'll typically consider grouping them into one table)?

What happens to the data over time?
After you design a table, you will often have serious problems after you have not considered the impact of time. Many table designs work very well at that time, but they often crash as users modify data, add data, and over time. Developers often find it necessary to redesign the structure of the table to accommodate these changes. When a table's structure changes, all related content (queries, forms, reports, code, and so on) must also be updated. Understand and predict what the data will change over time, and you can achieve better design and reduce the occurrence of problems.

Learn how to use queries
It is also important to know how to analyze and manage data. You should understand how the query works, how to use queries to link data across multiple tables, how to group and summarize data using queries, and how to use crosstab queries when you don't need to display data in canonical format.

The ultimate goal of good data design is to balance two needs: to efficiently store data over time, and to easily retrieve and analyze data. Understanding the functionality of a query can be helpful in designing a table correctly.

Database Normalization Concepts
This section introduces the basic concepts involved in database normalization, rather than the theoretical discussion of database normalization. How you apply these concepts to your actual situation may vary depending on your application needs. The purpose of this section is to understand these basic concepts, apply them to actual needs, and understand the problems that may arise from deviating from these concepts.

Store unique information in one place
Most database developers understand the basic concepts of database normalization. Ideally, you would want to store the same data in the same place and use an ID to refer to it when you need it. Therefore, if some information changes, you can make changes in one place, and the corresponding information in the entire program changes.

For example, a customer table stores records for each customer, including name, address, phone number, e-mail address, and other feature information. The Customer table may contain a unique CustomerID field (usually the Autonumber field), which is the primary key field of the table that other tables use to refer to the customer. Therefore, the invoice table can reference only the customer's ID value instead of storing all of the customer's information in each invoice (because the same customer may hold multiple invoices), so that the customer's ID value can be used to find the customer's details from the Customer table. You can easily do this by using powerful forms in Access, which use combo boxes and subforms. If you need to modify customer information (for example, a new phone number), simply modify it in the Customer table, and any other part of the application that references that information will automatically be updated.

With a properly normalized database, you can easily handle changes in data over time with simple editing. With improperly normalized databases, you typically need to use programming or queries to change multiple records or multiple tables. This not only increases the workload, but also increases the likelihood of data inconsistency due to improper execution of code or queries.

The records are free, and the new fields are very expensive.
The ideal database should simply add new records over time, and the database tables should be able to keep a large number of records. However, if you find that you need to add more fields, you may encounter design problems.

Spreadsheet experts often encounter these problems because they are accustomed to designing a database in the form of a spreadsheet. Designing fields that often change over time (for example, year, quarter, product, and salesperson) will need to add new fields in the future. The correct design should be to transform the information and place the data over time in a field so that more records can be added. For example, simply create the Year field and then enter the year values for each record in the field without creating a separate field for each year.

Adding additional fields can cause problems because changes in the table structure can affect other parts of the application. When you add more fields to a table, the objects and code that depend on the table also need to be updated. For example, the query needs to get extra fields, the form needs to display these fields, and the report needs to include those fields, and so on. However, if the data is normalized, the existing object automatically retrieves the new data and calculates or displays the data correctly. The query is particularly powerful because it allows you to group by the year field to display the summary every year, regardless of which year the table contains.

However, data normalization does not mean that you cannot display or use fields that vary over time or depend on time. Developers who need to browse or display this type of information can often use a crosstab query to achieve this goal. If you are unfamiliar with crosstab queries, you should learn how to use them. Although they differ from tables (especially when users cannot edit the results of crosstab queries), they can indeed be used to display information in a datasheet (up to 255 fields). If you want to use them in a report, it is more complex because the report needs to contain additional or changing field names. This is why most reports display data as separate groupings rather than as separate columns. For those with no choice, you have to take the time to solve the problem. It was to be hoped that all could understand the impact that such decisions would have on other resources as time changed.

This is why adding records is free (which is a huge advantage of the database) and adding fields is so expensive. If the database design is correct, you can adapt to a variety of changes.

Learn when to copy data
Sometimes data needs to be normalized to preserve information that may change over time.

In a simple example of linking an invoice to a Customer table through a customer ID number, we might want to retain the customer's address when the invoice was invoiced (not the address when the invoice was made, because the customer information might change between the two events). If the customer address is not retained at the time the invoice is invoiced, and the customer information must be updated in the future, the exact address of some invoices may not be determined. This can lead to very serious business problems. Of course, some information (such as the customer's phone number) can be saved without saving. Therefore, it is a choice to decide what data needs to be replicated.

Another example of data that needs to be replicated is the itemized item that fills the invoice. Quotations are often used to pick goods that customers order. We can store only quotation IDs, and IDs point to quotations that contain product descriptions, prices, and other details. However, product descriptions and prices will change over time. If you do not copy data from a quote to a schedule, you will not be able to print the original invoice accurately in the future. If you have not received the payment, the problem will be very serious.

Thus, while normalization can preserve the same data well in one place and simplifies editing, in some cases it does not need these advantages. If you later need a snapshot of your data for historical reasons, you must design it from the outset in the database. Otherwise, once the data is overwritten, it cannot be retrieved.

Use a field with no exact meaning as the primary key field
To improve efficiency, each table should have a primary key field. The primary key field defines uniqueness in a table and is used by the index in other fields to improve search performance. For example, a customer table can contain a CustomerID field that defines a unique number for each customer. For discussion purposes, it is assumed that a table contains more than one field, rather than simply a single table lookup (such as a country/region list).

In general, primary key fields should have the following characteristics:

Should contain only one field
You can define multiple fields as primary key fields for a table, but it is best to use one field. First, if you need to use more than one field to define uniqueness, you need to occupy more space to store the primary key. Second, other indexes in the table must also use a combination of primary key fields, which takes up more space than a single field. Finally, you need to get a field combination to identify the record in the table. Using a CustomerID field to define a customer is much better than using a different field combination.
Should be a numeric type
Access provides a AutoNumber field type that is long integer, and is ideal for primary key fields. These values automatically guarantee the uniqueness of each record and also support multiuser data entry.
Will not change over time
Primary key fields should not be changed over time. Once the primary key field is identified, it should never change (like the Social Security number). The changed primary key field will make it difficult to use historical data again because the links in it are corrupted.
There should be no exact meaning.
To make sure that the primary key field does not change over time, it should have no exact meaning. Primary key values that have no exact meaning are also useful when other data is incomplete. For example, you can specify a customer number without the full address of the customer. The rest of the application works well, and you can add information when you retrieve records. If a table uses a Country field or another identity field that you do not have as part of a primary key, it is likely that the application will not be available.
For these reasons, we recommend that you use the AutoNumber field as the primary key field in most tables. By using combo boxes and hidden columns, you can bind a field to a AutoNumber field and hide it so that the user cannot see it.

Using referential integrity
After you define a table and understand how each table is associated, be sure to add referential integrity to consolidate the relationships between the tables. This avoids the erroneous modification of the link field and leaves an orphaned record. The Microsoft Jet database engine supports complex referential integrity, allowing users to cascade updates and deletes. In general, you should not modify the ID field. Therefore, cascading updates are used less, but cascading deletes are useful.

For example, if an invoice table is associated with an order table, one of the invoices may have an unlimited number of orders (detail items), and each order record contains the invoice number that it links to, you can use the Cascade delete operation to delete the invoice record and automatically delete all corresponding order records. This avoids the occurrence of an order record that does not have a corresponding invoice record.

Summary
We hope that you can apply these database design concepts to your application design as soon as possible, thereby minimizing problems and reducing the revisions that are required to implement such designs. Good luck to you.

Luke Chung is the founder and President of FMS Inc. FMS Inc. is an industry-leading Third-party product vendor whose products are designed for Microsoft Access users and developers.


Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.