Database Normalization three Paradigm application examples

Source: Internet
Author: User
Tags date definition empty insert new set one table sort zip
Specification | normalization | data | database | Why is the application instance normalization important? At present, many databases have not been standardized for a variety of reasons. Some of these reasons are explained in this article and a claim form for an insurance company is normalized with different forms of normalization (normal form). The changes in the table and the additional tables added during this process make the database more efficient, less error, and easier to maintain.

Database normalization is the practice of optimizing the structure of tables and organizing data into tables so that data can be made more explicit. Normalization allows you to change business rules, requirements, and data without having to reconstruct the entire system.

By changing the way that data is stored-just a little bit-and changing the program that accesses that information, you can eliminate many of the opportunities for error or spam data and reduce the amount of effort necessary to update the information.

One of the real problems of the company can be summed up in a word "we usually do it". We generally like to store information in that way, and we generally allow people to write any information to <insert field name>; the way we typically program. This is usually a bad thing, especially for young and studying companies. However, when there is a new system and a better way to complete the task, sometimes the phrase "in that way the task is done well" may need to be explored and revised again. Canonical data is one of the useful ways companies often use.

While using data for COBOL programs, such as those familiar to any COBOL programmer, storing them in a relational database is similar to storing them in a flat file, the method stored in a flat file is not the best way to accomplish a task. Especially because you do not understand the difference between the two or fear change, but simply bring the idea of the past into the present way.

Note: Dictionary.com is so defined to be standardized: "To make it standard, especially to cause it to conform to a certain standard or specification." "or" mandatory acceptance of a certain standard. " Webopedia that normalization is "the process of organizing data to minimize redundancy in relational database design." Normalization typically involves dividing a database into two or more tables and defining relationships between tables. The goal is to isolate the data so that a field can be added, deleted, and modified only in one table, and then passed through the defined relationships to the remaining tables in the database. I like this definition better.

Terms

Before you know about an insurance company in the real world, you need to know some of the terms that are used in the discussion. When working with databases, especially when dealing with canonicalization issues, a new set of keywords mentioned in the following sections are useful:

· Relationship (relation): Essentially, a relationship is a two-dimensional table or array that contains rows and columns.

· Association (Relationship): An association is a method of associating data between different tables. The association exists between the data items that form different entities and the table entity itself, which constitutes the basic core problem of database normalization. There are three basic types of data associations, and it is important to understand them:

One-to-one (1:1): A one-to-one association means that any given instance of each (rather than most) is closely aligned with an instance of another entity. Each person has only one correct fingerprint is unique. Each phone number is accurately aligned with an independent private customer who pays the bill (not the company). Everyone in the United States has only one social security number.

One-to-many (1:m): A one-to-many association means that one instance of a given entity can be associated with 0 instances of another entity, one instance, or multiple instances. Each person may not have a child, have a child or many children. Everyone may not have a car, a car or many cars.
 
Many-to-many (m:n): a Many-to-many Association (0, one, or more instances of a given entity associated with 0, one, or more instances of another entity) is a straightforward simulation of a complex association that is often decomposed into multiple 1:m associations. Because multiple families are mixed together, one or more children may not have parents (orphans), one parent (single parent), more than one parent (two still together or divorced two parents, or divorced and remarried parents). A house or property may be transferred to a person or persons, who (one or more) may have one or more houses or property in their wills.

· Attribute: A property is considered to be a modifiable attribute or feature of certain components in a program or database and can be set to a different value or a relationship or a column in a table.

· Tuple:tuple is a set of values or value attributes that are sorted in a relational or non relational database: a row in a relationship.

· Delete exception: Delete exception refers to the data caused by intentional deletion of other data contradictory or unexpected data (information) loss.

· Insert Exception: Insert exception refers to the inability to add information to the database due to lack or lack of data.

· Update exception: Update exception refers to data inconsistency due to data redundancy or incomplete update of redundant data.

· Decomposition of a relationship: the decomposition of a relationship is the decomposition of a relationship into multiple relationships so that the relationship conforms to a higher paradigm.

· Data redundancy: Data redundancy means that there is no necessary duplication of data in the database.

· Data integrity: Data integrity refers to the consistency of data in a database. It is important to ensure data integrity so that users know that the data they rely on is correct, and that the results of their queries and procedures are accurate and expected.

· Atomic value: An atomic value is a value that is neither a set of values that can be further split nor a repeating group. Each column has a full value, but only one value--the value cannot be decomposed into parts that are either used by the database or accessed by users of the database.

· Referential integrity rules: Referential integrity rules mean that values stored in Non-empty external health must be key data items in a relationship.

· External health: An external health is a set of attributes (one or more columns) in a relationship and is also a primary key in some (same or other) relationship. It is a logical link between relationships. The external health reference to its own relationship is called a recursive external health.

· Functional dependencies: Functional dependencies mean that the value of one property in a row is determined by the value of another property in that row. This usually occurs between the primary key (which makes a unique piece of information for a row) and other information for that row. The combination of city and state relies on zip (postal) code, even though many zip codes in a given state are associated with a city. Every legal person in the United States relies on his social security number.

· Deterministic: function relies on the left attribute to determine the value of other attributes in the row (the ZIP code determines the city and state; The social security number determines the person's identity; the license number and the state determine the owner of the car).

· Entity integrity rules: an entity integrity rule means that the key attribute of a row may be empty (if you have a zip code in a city; If you have a car, you have a license number).

· Constraints: A constraint is a rule that qualifies a value in a database. The telephone number must be numeric, the number of dollars must be a number, the state must be a lawful states or province; the country must be a lawful state; the date cannot be February 31.

Now that you know a lot of relevant terminology, we can look at the meaning of the specification in the relevant terminology. The following example is not a typical employee-manager-department example or student-professor-course-provided sample. I will demonstrate a hypothetical database of insurance companies. The tables in the database are much more complex than those used in this example, but they are similar to what people encounter.

Figure 1 shows the non-standard definition of the claims (claim) table. Although there are many more tables in an insurance company's database, these tables provide us with some background, through which we can see normalization and its branches. Keep in mind that the examples in each chapter have only a partial column, simplifying the example and making it easy for you to see what is changing.

Claim_num, Occurance_num, Claim_status, Accdnt_yr, Accdnt_dt, Reported_dt, Entered_dt, Claim_dt1, CLAIM_DT2, CLAIM_DT3, CLAIM_DT4, Claim_dt4, Claim_dt5, Claim_dt6, Claim_dt7, Claim_dt8, Claim_dt9, claim_dt10, Closed_dt, DEATH_DT, ASSIG Ned_dt, ADJSTER_CD, Adjuster_name, AGENT_CD, AWARD_CD, CAUSE_CD, Cause_desc, LOCATION, SITE, COVERAGE_CD, Coverage_ DESC, Ded_recov, Deductible_remain, Paid_1, Reserved_1, Paid_2, Reserved_2, Paid_3, Reserved_3, Paid_4, RESERVED_4 , Paid_5, Reserved_5, Paid_6, Reserved_6, paid_7, reserved_7, Paid_8, Reserved_8, Paid_9, Reserved_9, PAID_10, Reserved_10, LEGAL_FLG, KEY1, KEY2, KEY3, KEY4, KEY5, KEY6, KEY7, KEY8, KEY9, KEY10, SEVERITY_CD, Policy_num, Payment_n UM, SSN, State, Actvy_dt, Entry_dt, ADMIN_CD, Admin_desc, Reopen_dt, Insured_name, insured_address, Insured_phone, INSURE D_city, Insured_state, Insured_zip, Claimant_name, claimant_address, claimant_city, Claimant_state, CLAIMANT_ZIP, Claimant_phone, Special_dt_1, SPECial_dt_2, Special_dt_3, Special_dt_4, Special_dt_5, Special_dt_6, special_dt_7, Special_dt_8, SPECIAL_DT_9, SPECIAL_D T_10, GROSS_PD, policy_id
Figure 1: Columns of the not-normalized claim table


First Paradigm (1NF)

Converting a database or database table to the first paradigm is generally fairly straightforward. The first paradigm requires the elimination of duplicate groups in the data, which is achieved by establishing a separate table of related data. It determines the table to complete the first normal form by observing the data and the structure of the table.

The first paradigm is to eliminate repeating groups by placing them in separate tables and linking them through a one-to-many association.

There are no duplicate attributes and a set of values that are not duplicated-that sounds simple enough. However, there are times when there are no other options to convince people that it's difficult to simply add any other set to the design, but that's what you do.

If we want the claims to be expressed in the first paradigm, we need to find all the attributes that are really associated with a claim. What exactly constitutes a claim?

· Claims must be numbered.

· Claims must be made by the person making the claim.

· Claims must have a reporting date.

· Claim to have an accident or illness date.

· Claims must have the amount of an item retained as a result of an accident or illness.

· Claims are or are written according to a policy.

· Claims can end.

· Claims can start again.

· Is there some sort of coverage for claims? Or some sort of strategy with more things?

· Is there a cause for the claim? Or is there a cause of accident or illness?

· Did you pay the claim? Or did you pay the invoice?

· Does the claim have a social security number? Or sometimes a social security number belongs to the person making the request?

· The date of death is an interesting part. Are the claims dead? No, but if it's life insurance, it may be related to claims, so it should be kept.

Modify the columns that are directly related to claims, and the results are as shown in Figure 2:

Claim_num, Claim_status, Accident_yr, Accident_dt, Reported_dt, Entered_dt, Closed_dt, Death_dt, ASSIGNED_DT, ADJSTER_ CD, Adjuster_name, AGENT_CD, Agent_name, AWARD_CD, Award_desc, Payment_num, LOCATION, SITE, Deductible_recover, Deductible_remain, Policy_no, Policy_description, State, Run_dt, Activity_dt, Entry_dt, Reopen_dt, INSURED_NAME, Insured_address, Insured_phone, insured_city, Insured_state, Insured_zip, Claimant_name, CLAIMANT_ADDRESS, CLAIMANT_ City, Claimant_state, Claimant_zip, Claimant_phone, GROSS_PD
Figure 2: Claim form for the first paradigm

Claims table revisions that conform to the first paradigm will contain information that is only relevant to claims, and does not include payments or invoices, policies, or incidents.

Payment_numclaim_statusaccident_dtaccident_yrreported_dt entered_dt123456789 Open 20-JUN-2000200028-JUN-2000 29- jun-2000234567890reviewed 15-feb-1984 1984 19-feb-1984 20-feb-1984147258369reopened 08-APR-2003 2003 10-APR-2003 11- apr-2003258369147closed 18-dec-19801980 18-dec-1980 19-dec-1980
If you have a payment form and you store a number of reserved claims to pay for other different bills, why not store them in the payment form? In short, you store some information in the payment form, so why not put it in it and not put it in the claim form?

If the only reason that this information is placed in a claim form is that a user may need this information when claiming a claim, the claim form and the payment table can be connected (join) and the information can come from all the payment totals that occur for a single claim. And since you have different types of insurance policies (so there are different types of claims), why not store payment information for all types of claims in a single table? It is logical to store all payment information in the same table. Most of the information associated with some kind of payment (attribute) is the same, regardless of the type of payment or type of claim. However, the account information for different types of claims is somewhat different.

Second Normal form (2NF)

The second paradigm deals with the deletion of redundant data. When the information in a table depends on other columns in the table that are not part of the primary key, the second normal form is usually violated.

If the list of claims for the new first paradigm is listed below, then the redundant data that can be quickly and easily seen is the city and state of the insured city and state and the person who made the claim. Cities and states are directly dependent on zip code, and do not rely on anything related to claims.

Claim_num, Claim_status, Accident_yr, Accident_dt, Reported_dt, Entered_dt, Closed_dt, Death_dt, ASSIGNED_DT, ADJSTER_ CD, Adjuster_name, AGENT_CD, Agent_name, AWARD_CD, Award_desc, LOCATION, SITE, Deductible_recover, Deductible_remain, Policy_no, Policy_description, State, Run_dt, Activity_dt, Entry_dt, Reopen_dt, Insured_name, insured_address, INSURED_ PHONE, Insured_city, Insured_state, Insured_zip, Claimant_name, claimant_address, claimant_city, CLAIMANT_STATE, Claimant_zip
Fig. 3. Claim of the second normal form

Claim_numclaimant_name Claimant_addressclaimant_cityclaimant_stateclaimant_zip123456789jennifer Smith1234 MAINPITTSBURGHPA 15201234567890Bill Smith7852 Eagle pittsburghpa 15202147258369 John Jones4562 edgeeighty Four PA 1533025 8369147 Eleanor Stillwater7531 West EASTERNSOMERSETPA 15510
zip_codecitystate15330 Eighty fourpa15510somersetpa15201pittsburghpa15202 PittsburghPA15203 Pittsburgh Pa15204pittsburgh PA15205 Pittsburghpa15206pittsburgh Pa15207pittsburgh Pa15208pittsburgh PA15209 Pa15210pittsburgh PA
Because Pittsburgh, eighty Fou and Somerset, PA are not dependent on claims, but rely on the information-related ZIP code, it is not directly attributable to the payment table. Although this is not the only problem with this table, it eliminates the difficulties associated with city, state, and zip code dependencies.

Claim_numclaimant_nameclaimant_address claimant_zip123456789jennifer Smith 1234 Main 15201234567890Bill Smith7852 Eagle15202147258369 John Jones 4562 Edge15330258369147 Eleanor Stillwater7531 West Eastern 15510
Other information that can be migrated to other tables so that the claim form conforms to the second normal form includes the compensation number and the compensation description combination, which is only required to store the compensation number in the claim form. When this method is used, any updates to the description for a given number require changes that can change a column of a row in the compensation table, and this does not occur, but updates may occur if you update a column in a table that affects hundreds of entities. The same logic can be applied to moderators and agents, and their information is migrated to their own tables, with only the value of numbered columns stored in the claims table, which makes it easy to access the auxiliary information through the connection.

Award_cdaward_desc
Adjuster_cdadjuster_name
AGENT_CD Agent_name
Third Paradigm (3NF)

Third normal form rules are found to eliminate properties that do not directly depend on the primary key of the table formed by the first and second normal forms. We have created a new table for all the information that is not associated with the primary key of the table. Each new table holds information from the source table and the primary keys on which they depend.

Note: The third normal form is usually said to be "key, all keys, there is no information except the key".

Claim_num, Claim_status, Accident_dt, Reported_dt, Entered_dt, Closed_dt, Death_dt, Assigned_dt, ADJSTER_CD, AGENT_CD, AWARD_CD, LOCATION, SITE, Deductible_recover, Deductible_remain, Policy_no, State, Run_dt, Activity_dt, Entry_dt, REOPEN _dt, Insured_name, insured_address, Insured_phone, Insured_zip, Claimant_name, claimant_address, CLAIMANT_ZIP
Figure 4: Claim form for third normal form

In the third paradigm you can see more changes in the claim table, where the insured's name, address, telephone number, zip code are more dependent on the signed strategy than on the claims themselves. Therefore, we can put the information of the insured into the policy table. This makes the rest of the claim table more directly relevant to claims, putting all other information into its own table and guaranteeing enough (no missing) information. A simple connection to these tables can reconstruct the source table's information, which is the goal of relational algebra and relational computing (relational theory and relational database dependencies).

Policy_noinsured_nameinsured_address Insured_phoneinsured_zip
The third paradigm is usually the highest level of normalization that people can get, and it is generally the highest level of standardized and standardized data in practice. But there are more paradigms. The higher the hierarchy, the greater the difficulty of using simple steps to complete, and it is getting closer to the theory.

Canonicalization or no normalization--and the canonicalization extensions you can use--are usually the result of a combination of related people. If there is enough important need to store a piece of information in a certain location, and it does not necessarily conform to the definition of a certain paradigm, such storage should be respected. In addition, the results of normalization need to be based on the use of tables and databases. Typically, in a decision support system or data warehouse, because of the image of a data warehouse's time variable component, we are strongly anxious to get extreme information that is not normalized (especially in the fact table).

In a team-oriented environment, these decisions are departmental decisions (or common guidance decisions). Hopefully these guidelines will help you understand the norm and help you make more informed decisions.


Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.