Table Structure Design Skills in Database Design

Source: Internet
Author: User

1. Data Tables with tree relationships

Many programmers have encountered tree-based data during database design. For example, a common category table, that is, a large category, contains several sub-classes, some sub-classes have sub-classes. When the category is not determined, you want to add a new subclass under any category, or delete a category and all its subclasses, and the number of them is expected to increase gradually in the future, in this case, we will consider using a data table to save the data. According to the textbooks, the second type of programmers will probably design a data table structure similar to this:

Category Table _ 1 (type_table_1)

Name Type Constraints Description
Type_id Int No duplicates Category id, primary key
Type_name Char (50) Cannot be blank Type name, which cannot be repeated
Type_father Int Cannot be blank The parent category ID of this class. If it is a top node, it is set to a unique value.

This design is short and concise, fully satisfying 3nf, and can satisfy all user requirements. Is that all right? The answer is no! Why?

Let's estimate how the user wants to list the data of this table. For the user, he certainly expects to list all the categories at a time based on the hierarchy set by him, for example:
Total category
Category 1
CATEGORY 1.1
CATEGORY 1.1.1
CATEGORY 1.2
Category 2
CATEGORY 2.1
Category 3
CATEGORY 3.1
CATEGORY 3.2
......

Let's see how many times the above table needs to be searched to display such a list (first-order tree traversal? Note that although category 1.1.1 may be a record added after category 3.2, the answer is still n times. This efficiency does not affect a small amount of data, but after the data type is expanded to dozens or even hundreds of records in the future, the table will be retrieved dozens of times for a single column, the efficiency of the entire program is not flattering. Maybe the second type of programmers will say, then I will create a temporary array or temporary table to save the first order traversal results of the type table, so that the results will be retrieved dozens of times at the first run, when listing all the type relationships again, you can directly read the temporary array or temporary table. In fact, you don't need to allocate a new memory to save the data. You just need to expand the data table and then constrain the number of types to be added, you only need to retrieve the list once. The expanded data table structure is as follows:

Category Table _ 2 (type_table_2)

Name Type Constraints Description
Type_id Int No duplicates Category id, primary key
Type_name Char (50) Cannot be blank Type name, which cannot be repeated
Type_father Int Cannot be blank The parent category ID of this class. If it is a top node, it is set to a unique value.
Type_layer Char (6) Three Layers limited. The initial value is 000000. Class first traversal, mainly to reduce the number of times the database is retrieved

According to this table structure, let's take a look at the data recorded in the table in the above example:

Type_id type_name type_father type_layer
1. Total Category 0 000000
2 Category 1 1 010000
3 category 1.1 2 010100
4 category 1.2 2 010200
5 Category 2 1 020000
6 category 2.1 5 020100
7 category 3 1 030000
8 category 3.1 7 030100
9 category 3.2 7 030200
10 category 1.1.1 3 010101
......

Search by type_layer: Select * From type_table_2 order by type_layer

The record set is listed as follows:

Type_id type_name type_father type_layer
1. Total Category 0 000000
2 Category 1 1 010000
3 category 1.1 2 010100
10 category 1.1.1 3 010101
4 category 1.2 2 010200
5 Category 2 1 020000
6 category 2.1 5 020100
7 category 3 1 030000
8 category 3.1 7 030100
9 category 3.2 7 030200
......

The record order is exactly the result of first-order traversal. When controlling the hierarchy of display classes, you only need to judge the values in the type_layer field, each of which has two digits. If the value is greater than 0, move two spaces to the right. Of course, in this example, the limit is a maximum of three layers, each layer can have a maximum of 99 sub-categories, as long as you modify the length and number of digits of the type_layer according to user requirements, you can change the number of restricted layers and the number of sub-categories. In fact, the above design is not only used in the category table, some online forum programs that can be displayed by tree type list mostly adopt similar design.

Some people may think that the type_father field in type_table_2 is redundant data and can be removed. In this case, when inserting or deleting a category, you have to make a tedious judgment on the content of the type_layer, so I have not deleted the type_father field, this is also in line with the principle of appropriately retaining redundant data in the database design to reduce Program Complexity. I will introduce a case of intentionally increasing data redundancy.

Ii. Design of commodity information table

Assume that you are an electronic brain developer of a department store. One day, the boss asks you to develop an online e-commerce platform for the company. The department store sells thousands of products, however, currently, we only plan to sell dozens of products online for convenient transportation. Of course, new products may be added to the e-commerce platform in the future. Now we have begun to design the product information table of the platform database. Each sold item has the same attributes, such as the product ID, product name, product category, related information, supplier, number of items included, inventory, purchase price, sales price, and discount price. You will soon design four tables: wares_type, wares_provider, and wares_info ):

Item type table (wares_type)

Name Type Constraints Description
Type_id Int No duplicates Category id, primary key
Type_name Char (50) Cannot be blank Type name, which cannot be repeated
Type_father Int Cannot be blank The parent category ID of this class. If it is a top node, it is set to a unique value.
Type_layer Char (6) Three Layers limited. The initial value is 000000. Class first traversal, mainly to reduce the number of times the database is retrieved

Supplier table (wares_provider)

Name Type Constraints Description
Provider_id Int No duplicates Supplier ID, primary key
Provider_name Char (100) Cannot be blank Supplier name

Item info table (wares_info)

Name Type Constraints Description
Wares_id Int No duplicates Item ID, primary key
Wares_name Char (100) Cannot be blank Product Name
Wares_type Int Cannot be blank Product type identifier, associated with wares_type.type_id
Wares_info Char (200) Allowed to be empty Related information
Provider Int Cannot be blank Supplier ID, associated with wares_provider.provider_id
Setnum Int The initial value is 1. Number of embedded parts. The default value is 1.
Stock Int The initial value is 0. Inventory. The default value is 0.
Buy_price Money Cannot be blank Import price
Sell_price Money Cannot be blank Sales Price
Discount Money Cannot be blank Discount price

You check the three tables for the boss. The boss hopes to add another item image field, but only some items have images. OK. You added a haspic bool field to the item info table (wares_info), and then created a new table -- Item image table (wares_pic ):

Product image table (wares_pic)

Name Type Constraints Description
Pic_id Int No duplicates Product image ID, primary key
Wares_id Int Cannot be blank Product ID, associated with wares_info.wares_id
Pic_address Char (200) Cannot be blank Image Storage path

After the program development is completed, it fully meets the current requirements of the boss and is officially enabled. After a while, the boss plans to launch new product sales on this platform. All of these products must be added with the "length" attribute. The first round of hard work ...... Of course, you added a bool field of haslength in the item info table (wares_info) according to the old method of adding the item image table, and created a new table-the item length table (wares_length):

Item length table (wares_length)

Name Type Constraints Description
Length_id Int No duplicates Product image ID, primary key
Wares_id Int Cannot be blank Product ID, associated with wares_info.wares_id
Length Char (20) Cannot be blank Product length description

Not long after the change, the boss planned to add a new batch of products. This time, all the products of this type need to be added with the "width" attribute. You have bitten your teeth, caught your teeth, and added the commodity width table (wares_width ). After a while, the boss has some new products that need to be added with the "height" attribute. Do you feel that the database you designed is growing in this way, will it soon become a maze? So, is there any way to curb this unpredictability, but it is similar to repeated database expansion? I have read Agile Software Development: Principles, models, and practices and I have found a similar example: 7.3 "copy" program. Among them, I strongly agree with the idea of Agile Software Development: At first there was almost no pre-design, but once the demand changes, as a programmer pursuing excellence, the entire architecture design should be reviewed from the ground up to design a system architecture that can meet similar changes in the future. The following is the modification scheme I provided when I want to add the "length" attribute:

Remove the haspic field in the item info table (wares_info) and add two additional item attribute tables (wares_ex_property) and item extra information table (wares_ex_info) to add new attributes.

Additional item Attribute Table (wares_ex_property)

Name Type Constraints Description
Ex_pid Int No duplicates Additional item property identifier, primary key
P_name Char (20) Cannot be blank Additional property name

Additional item info table (wares_ex_info)

Name Type Constraints Description
Ex_iid Int No duplicates Additional item information identifier, primary key
Wares_id Int Cannot be blank Product ID, associated with wares_info.wares_id
Property_id Int Cannot be blank Additional product property identifier, associated with wares_ex_property.ex_pid
Property_value Char (200) Cannot be blank Additional item property value

Add two records to the additional item Attribute Table (wares_ex_property:
Ex_pid p_name
1. product images
2 Commodity Length

Add an additional item property management feature to the background management function of the entire e-commerce platform. New attributes will appear when new items are added later, you only need to use this function to add a record to the additional product Attribute Table (wares_ex_property. Don't be afraid of changes. It's not a bad thing to be hit by the first bullet. The bad thing is to be hit by the second and third bullets flying in the same orbit. The sooner the first bullet arrives, the heavier the injury, and the stronger the resistance. 8)

Iii. Design of multiple users and their permission management

Developing database management software cannot ignore the issue of multi-user and user permission settings. Although the large and medium-sized back-end database system software on the market currently provides multiple users and the permission setting function for a table in a database, I personally suggest: A mature set of database management software should still design the user management function on its own, for two reasons:

1. The multi-user and permission settings provided by the large and medium-sized back-end database system software are aimed at the common attributes of the database and may not fully meet the requirements of some special cases;

2. Do not rely too much on some special functions of the background database system software. Various large and medium-sized background database system software are not fully compatible. Otherwise, the previous architecture design may not be reused once the database platform or background database system software version needs to be changed in the future.

The following describes how to design a flexible multi-user management module. That is, the system administrator of the database management software can add new users, modify the permissions of existing users, and delete existing users. First, analyze the user requirements and list all the functions that need to be implemented by the database management software. Then, sort these functions according to certain contacts, that is, classify the functions that a certain user needs into one category; finally, start table creation:

Menu (function_table)

Name Type Constraints Description
F_id Int No duplicates Function ID, primary key
F_name Char (20) Cannot be blank Function name, which cannot be repeated
F_desc Char (50) Allowed to be empty Function Description

User Group Table (user_group)

Name Type Constraints Description
Group_id Int No duplicates User Group ID, primary key
Group_name Char (20) Cannot be blank User Group Name
Group_power Char (100) Cannot be blank User Group permission table, which contains a set of f_id

User table (user_table)

Name Type Constraints Description
User_id Int No duplicates User ID, primary key
User_name Char (20) No duplicates User Name
User_pwd Char (20) Cannot be blank User Password
User_type Int Cannot be blank User Group ID, associated with user_group.group_id

This user group architecture is used. To add a new user, you only need to specify the user group to which the new user belongs. When the system needs to add new features or modify the permissions of old features, only the operation menu and user group table records are used, and the functions of the original user can be changed accordingly. Of course, this architecture design moves the function determination of the database management software to the front-end, making the front-end development more complex. However, when the number of users is large (more than 10) or the probability of software upgrade is high in the future, this price is worthwhile.

Iv. Simple Batch M: N Design

When we encounter the M: n relationship, we usually create three tables, M, N, and M: N. However, M: N sometimes encounters batch processing. For example, to borrow books from the library, users are generally allowed to borrow n books at the same time. If you want to query borrowing records by batch, that is, how to design a list of all books borrowed by a user? Let's create three required tables first:

Book_table)

Name Type Constraints Description
Book_id Int No duplicates Book ID, primary key
Book_no Char (20) No duplicates Book No.
Book_name Char (100) Cannot be blank Book name
......      

Borrow User table (renter_table)

Name Type Constraints Description
Renter_id Int No duplicates User ID, primary key
Renter_name Char (20) Cannot be blank User Name
......      

Borrow record table (pai_log)

Name Type Constraints Description
Consumer _id Int No duplicates Borrow Record ID, primary key
R_id Int Cannot be blank User ID, associated with renter_table.renter_id
B _id Int Cannot be blank Book ID, associated with book_table.book_id
1__date Datetime Cannot be blank Borrow time
......      

To query borrowing records by batch, we can create another table to store the information of batch borrowing, for example:

Batch_rent)

Name Type Constraints Description
Batch_id Int No duplicates Batch borrow ID, primary key
Batch_no Int Cannot be blank Batch borrow number, which is the same as batch_no
Consumer _id Int Cannot be blank ID of the borrow record, associated with pai_log.pai_id
Batch_date Datetime Cannot be blank Batch borrow time

Is this a good design? Let's take a look at how to query to list all the books borrowed by a user? First, retrieve the batch_rent table, save the data in the region _id field of all records that meet the conditions, and then use the data as the query conditions to bring the data to the borrow _log table for query. So, is there any way to improve it? The following provides a simple batch design scheme. You do not need to add a new table. You only need to modify the borrowing record table (pai_log. The modified record table (pai_log) is as follows:

Borrow record table (pai_log)

Name Type Constraints Description
Consumer _id Int No duplicates Borrow Record ID, primary key
R_id Int Cannot be blank User ID, associated with renter_table.renter_id
B _id Int Cannot be blank Book ID, associated with book_table.book_id
Batch_no Int Cannot be blank Batch borrow number, which is the same as batch_no
1__date Datetime Cannot be blank Borrow time
......      

Here, the batch_no of the same lending is the same as the region _id of the first batch of warehouse receiving. For example, if the maximum shard _id is 64 and a user borrows three books at a time, the batch_no values of the three borrowing records inserted in batch are 65. Then another user rented a set of discs and inserted the portable _id of the rental record as 68. With this design, you only need to use a standard t_ SQL nested query to query the batch borrowing information. Of course, this design does not comply with 3nf, but which one is better than the above standard 3nf design? I don't need to say the answer.

V. Selection of redundant data

In the previous article, a redundant field is retained in the "data table with tree relationships". The example here goes further-a redundant table is added. Let's take a look at the example: in order to solve the employees' work meals in my original company, I contacted a nearby small restaurant and recorded the meals every day. The expenses were shared by the number of people and settled by the company's cash at the end of the month, each person's work meal fee is deducted from the employee's salary every month. Of course, the number of people who eat each day is not fixed, and the cost of each meal varies according to the color of the dishes selected for each work meal. For example, five people spent 40 yuan for lunch on Monday, 20 yuan for dinner, 36 yuan for lunch on Tuesday, and 18 yuan for dinner. To facilitate the calculation of each person's work meal fee per month, I wrote a simple dining Accounting Management Program, with three tables in the database:

Employee table (clerk_table)

Name Type Constraints Description
Clerk_id Int No duplicates Employee ID, primary key
Clerk_name Char (10) Cannot be blank Employee name

Total table for each meal (eatdata1)

Name Type Constraints Description
Totle_id Int No duplicates Table ID for each meal, primary key
Persons Char (100) Cannot be blank Staff identification set for dining staff
Eat_date Datetime Cannot be blank Dining date
Eat_type Char (1) Cannot be blank Dining type, used to distinguish between lunch and dinner
Totle_price Money Cannot be blank Total cost per meal
Persons_num Int Cannot be blank Number of diners

Dining billing table (eatdata2)

Name Type Constraints Description
ID Int No duplicates Dining billing table ID, primary key
T_id Int Cannot be blank Table ID of each meal, associated with eatdata1.totle _ id
C_id Int Cannot be blank Employee ID, associated with clerk_table.clerk_id
Price Money Cannot be blank Cost per meal per person

The record of the dining Bill detail table (eatdata2) is to split one record of each meal summary table (eatdata1) by the dining staff, which is a redundant table without compromise. Of course, you can also combine some fields of the eatdata1 table into the eatdata2 table, so that the eatdata1 table becomes a redundant table, however, the design of the dining billing table has more duplicate data, which is better than the above solution. However, the redundant table eatdata2 simplifies programming while collecting the monthly meal fee statistics, you can use a query statement like this to calculate the number of meals sent per person per month and the general ledger of meals:

Select clerk_name as personname, count (c_id) as eattimes, sum (price) as ptprice from eatdata2 join clerk_tabsystemic on (c_id = clerk_id) join eatdata1 on (totleid = tid) where eat_date> = convert (datetime, '"& the_date &"') and eat_date <dateadd (month, 1, convert (datetime, '"& the_date &"')) group by c_id

Imagine that if you don't need this redundant table, every time you calculate the average monthly meal fee, it will be a lot of trouble, and the program efficiency is also awkward. So when can we add some redundant data? I think there are two principles:

1. Overall user requirements. When the user pays more attention to the data that is listed after the database standard records are processed according to certain algorithms. If the algorithm can be completed directly using the built-in functions of the background database system, you can add redundant fields or even redundant tables to store the data processed by the algorithm. You must know that the efficiency of the background database system for querying, modifying, or deleting large volumes of data is much higher than that of the code we write.

2. simplify development. There are many methods to implement the same functions in modern software development. Although programmers do not need to be proficient in the vast majority of development tools and platforms, they still need to know which method to use with which development tool programs are more concise and more efficient. The essence of redundant data is to use space for time, especially the current development of hardware is much higher than that of software, so proper redundancy is acceptable. However, I would like to emphasize at the end that we should not rely too much on the features of platforms and development tools to simplify development. If we are not sure about this degree, we will continue to maintain and upgrade later.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.