Why database design?
Here we have two questions:
Is it necessary to design a Mouch? Do I need to design a building?
The conclusion is: when the database is complex (such as large data volumes, large tables, and complex business relationships), we need to design the database first;
Because good database design can:
Q. Saving data storage space
Q can ensure data integrity
Q: Facilitate the development of database application systems
Poor Database Design:
Q. data redundancy and storage space waste
Q: memory space waste
Q Data Update and insertion exceptions
Software Project development cycle
Let's take a look at the development cycle of the software project:
• Demand analysis stage: analyzes customers' business and data processing needs;
• Outline Design Phase: design the E-R model diagram of the database to confirm that the requirement information is correct and complete;
• Detailed design phase: the E-R diagram is converted into multiple tables for logical design, and the three paradigm of Database Design for review;
• Code writing stage: select a specific database for physical implementation, and write code to implement front-end applications;
• Software testing stage :......
• Installation and deployment :......
Design Database
• In the requirement analysis stage, the general steps for designing a database are as follows:
-Collect information
-ID object
-Identifies the attributes of each object.
-Identify the relationship between objects
• The steps for designing a database are as follows:
-Drawing a E-R Diagram
-Convert a E-R diagram to a table
-Apply the three paradigm normalization tables
Next we will take the database design of a BBS simple forum as an example to look at the steps for designing a database:
• Collect information:
Communicate with the relevant personnel of the system, and fully understand the tasks to be completed by the database.
Basic functions of BBS Forum:
L user registration and logon. The background database must store user registration information and online status information;
L The user posts, and the background database needs to store information related to the posts, such as the posts content and titles;
L Forum Management: the background database must store information about each forum, such as the Moderator, Forum name, and number of posts;
• Identify object (entity-entity)
Identifies key objects or entities to be managed by the database
Entities are generally nouns:
L User: common forum users and moderators of Various Forum sections.
L user-published post
L user post (reply)
L Forum: Forum information
• Attribute of each object)
• Relationship between objects)
L follow-up has a master-slave relationship with the main post: We need to indicate in the follow-up object who it is;
L Forum has a relationship with users: users can find the corresponding moderator users based on Forum objects;
L there is a master-slave relationship between the main post and the Forum: You need to indicate which forum the post belongs;
L follow-up has a master-slave relationship with the Forum: You need to indicate which forum the follow-up belongs;
• Drawing E-R Diagrams
• Convert E-R diagrams into tables
• Convert each object to a corresponding table and convert each attribute to the corresponding column of each table
• To identify the primary key columns of each table, note that the ID number column is added to a table without a primary key, which has no actual meaning and is used as the primary key or foreign key, for example, the "uid" column in the User table, the "Sid" column in the forum table, and the "TID" column in the posting and following Tables
• Create a primary foreign key between tables to reflect the ing between objects
Here we can use Microsoft's word or Visio and Sybase's powerdesigner to draw an erdiagram. It is mainly used to communicate with the customer and modify the diagram repeatedly until the customer confirms. The customer confirms and then converts the E-R diagram to a table. We have done this job well. The next step is the last step: Apply the three major paradigms to review and standardize the structure of multiple tables.
Data Standardization
• Only good RDBMS is not enough to avoid data redundancy. A table structure must be created in the database design. After the table is designed, it is likely that the structure is unreasonable and data is retained. data redundancy is short for data redundancy. This causes a lot of problems in addition, deletion, modification, and query of data. Therefore, we need to check whether the structure is reasonable, just as after the construction drawing design, other organizations need to review whether the drawings are properly designed.
• How to review it? Some theoretical guidelines on database design are required. These rules are short for the database paradigm in the industry. Dr E. F. codd initially defines three levels of normalization. The paradigm is a table structure with minimal redundancy. These paradigms are:
-1st NF-first normal fromate)
-2nd NF-second normal fromate)
-Third Paradigm (3rd NF-third normal fromate)
• If each column is a minimum data unit that cannot be further divided (also known as the smallest atomic unit), the first paradigm (1nf) is satisfied ). The goal of the first paradigm is to ensure the atomicity of each column.
• If a relation satisfies 1nf and all columns other than the primary key depend on this primary key, the second Paradigm (2nf) is satisfied ). The second paradigm requires that each table only describes one thing, so that each column in the table is related to the primary key.
• If a link satisfies 2nf, and other columns except the primary key are not passed dependent on the primary key column, the third paradigm (3nf) is satisfied ). The third paradigm ensures that each column is directly related to the primary key column, rather than indirectly related.
Let's look at an image example! Suppose a construction company wants to design a database. The company's business rules are described as follows:
• The company undertakes multiple engineering projects, including the project number, project name, and construction personnel.
• The company has multiple employees, each of which includes employee ID, name, gender, position (engineer, Technician), etc.
• The company pays wages based on the working hours and hourly wage rate, which is determined by the employee's position (for example, the hourly wage rate of technicians is different from that of Engineers)
• The company regularly develops a salary report, as shown in-1.
Project No. |
Project name |
Employee ID |
Name |
Title |
Hourly wage rate |
Working hours |
Pay-as-you-go |
A1 |
Garden Tower |
1001 |
Qi Guangming |
Engineer |
65 |
13 |
845.00 |
1002 |
Li Siqi |
Technician |
60 |
16 |
960.00 |
1004 |
GE Yuhong |
Attorney |
60 |
19 |
1140.00 |
|
|
|
Subtotal |
|
|
|
2945.00 |
A2 |
Overpass |
1001 |
Qi Guangming |
Engineer |
65 |
15 |
975.00 |
1003 |
Ju Mingliang |
Workers |
55 |
17 |
935.00 |
|
|
|
Subtotal |
|
|
|
1910.00 |
A3 |
Linjiang Hotel |
1002 |
Li Siqi |
Technician |
60 |
18 |
1080.00 |
1004 |
GE Yuhong |
Attorney |
60 |
14 |
840.00 |
|
|
|
Subtotal |
|
|
|
1920.00 |
Figure-1 salary report printed by a company
Project No. |
Project name |
Employee ID |
Name |
Title |
Hourly wage rate |
Working hours |
A1 |
Garden Tower |
1001 |
Qi Guangming |
Engineer |
65 |
13 |
A1 |
Garden Tower |
1002 |
Li Siqi |
Technician |
60 |
16 |
A1 |
Garden Tower |
1004 |
GE Yuhong |
Attorney |
60 |
19 |
A2 |
Overpass |
1001 |
Qi Guangming |
Engineer |
65 |
15 |
A2 |
Overpass |
1003 |
Ju Mingliang |
Workers |
55 |
17 |
A3 |
Linjiang Hotel |
1002 |
Li Siqi |
Technician |
60 |
18 |
A3 |
Linjiang Hotel |
1004 |
GE Yuhong |
Attorney |
60 |
14 |
Figure-2 project man-hours table of a Company
As you can see, the tables designed above have many problems:
1. The table contains a large amount of redundancy, which may cause data exceptions:
• Update exception
For example, to change the duty of employee ID = 1001, you must modify the row of employee ID = 1001.
• Addition exception
To add a new employee, assign a project to the employee first. Or, to add data for a new employee, assign a virtual project to the employee. (Because the primary keyword cannot be blank)
• Deletion exception
For example, if an employee No. 1001 wants to resign, all data lines with employee No. = 1001 must be deleted. Such a delete operation may lose other useful data.
2. This method is used to design the table structure. Although it is easy to generate a salary report, a large amount of data must be repeatedly input every time an employee allocates a project. This type of Repeated input operation may cause data inconsistency.
We use the second paradigm to standardize:
Let's use the third paradigm to standardize it. Is it much clearer ?!
Relationship between standardization and Performance
• Database performance is more important than standardized databases to meet certain business goals
-Add additional fields to a given table to greatly reduce the time required to search information from the table.
-Insert a calculated column (such as the total score) into a given table to facilitate Query
• While standardizing, you also need to consider the database performance comprehensively. The three major database paradigms and database performance are sometimes in conflict.
For example, we all know that environmental protection is very important. The West always challenges environmental protection and China, saying that China is desperate for environmental protection and ecological nature. However, China's current economic strength is not strong enough. If people are not full enough, what is the use of environmental protection? Therefore, we can only focus on environmental protection while maintaining regional economic development. This is a typical compromise. This is also the case in this example: to meet the three paradigms, we will split the more detailed tables when standardizing the tables. However, the Customer prefers the integrated information. To meet the customer's needs, we need to restore these tables to the customer's favorite integrated data through connection queries. This greatly affects the database query performance compared to reading data from a table. Therefore, for the sake of performance, it is necessary to make a proper compromise and sacrifice the requirements of standardization to improve the performance of the database. Another example: adding a column in the score table-"total score" is data redundancy, because the total score can be obtained by each score during query. However, if you frequently query the total score and want to save it, you can simply add the total score column in the table.