1. Data Tables with tree relationships
Many programmers have encountered tree-based data during database design. For example, a common category table, that is, a large category, contains several sub-classes, some sub-classes have sub-classes. When the category is not determined, you want to add a new subclass under any category, or delete a category and all its subclasses, and the number of them is expected to increase gradually in the future, in this case, we will consider using a data table to save the data. According to the textbooks, the second type of programmers will probably design a data table structure similar to this:
Category Table _ 1 (type_table_1)
Name |
Type |
Constraints |
Description |
Type_id |
Int |
No duplicates |
Category id, primary key |
Type_name |
Char (50) |
Cannot be blank |
Type name, which cannot be repeated |
Type_father |
Int |
Cannot be blank |
The parent category ID of this class. If it is a top node, it is set to a unique value. |
This design is short and concise, fully satisfying 3nf, and can satisfy all user requirements. Is that all right? The answer is no! Why?
Let's estimate how the user wants to list the data of this table. For the user, he certainly expects to list all the categories at a time based on the hierarchy set by him, for example:
Total category
Category 1
CATEGORY 1.1
CATEGORY 1.1.1
CATEGORY 1.2
Category 2
CATEGORY 2.1
Category 3
CATEGORY 3.1
CATEGORY 3.2
......
Let's see how many times the above table needs to be searched to display such a list (first-order tree traversal? Note that although category 1.1.1 may be a record added after category 3.2, the answer is still n times. This efficiency does not affect a small amount of data, but after the data type is expanded to dozens or even hundreds of records in the future, the table will be retrieved dozens of times for a single column, the efficiency of the entire program is not flattering. Maybe the second type of programmers will say, then I will create a temporary array or temporary table to save the first order traversal results of the type table, so that the results will be retrieved dozens of times at the first run, when listing all the type relationships again, you can directly read the temporary array or temporary table. In fact, you don't need to allocate a new memory to save the data. You just need to expand the data table and then constrain the number of types to be added, you only need to retrieve the list once. The expanded data table structure is as follows:
Category Table _ 2 (type_table_2)
Name |
Type |
Constraints |
Description |
Type_id |
Int |
No duplicates |
Category id, primary key |
Type_name |
Char (50) |
Cannot be blank |
Type name, which cannot be repeated |
Type_father |
Int |
Cannot be blank |
The parent category ID of this class. If it is a top node, it is set to a unique value. |
Type_layer |
Char (6) |
Three Layers limited. The initial value is 000000. |
Class first traversal, mainly to reduce the number of times the database is retrieved |
According to this table structure, let's take a look at the data recorded in the table in the above example:
Type_id type_name type_father type_layer
1. Total Category 0 000000
2 Category 1 1 010000
3 category 1.1 2 010100
4 category 1.2 2 010200
5 Category 2 1 020000
6 category 2.1 5 020100
7 category 3 1 030000
8 category 3.1 7 030100
9 category 3.2 7 030200
10 category 1.1.1 3 010101
......
Search by type_layer: Select * From type_table_2 order by type_layer
The record set is listed as follows:
Type_id type_name type_father type_layer
1. Total Category 0 000000
2 Category 1 1 010000
3 category 1.1 2 010100
10 category 1.1.1 3 010101
4 category 1.2 2 010200
5 Category 2 1 020000
6 category 2.1 5 020100
7 category 3 1 030000
8 category 3.1 7 030100
9 category 3.2 7 030200
......
The record order is exactly the result of first-order traversal. When controlling the hierarchy of display classes, you only need to judge the values in the type_layer field, each of which has two digits. If the value is greater than 0, move two spaces to the right. Of course, in this example, the limit is a maximum of three layers, each layer can have a maximum of 99 sub-categories, as long as you modify the length and number of digits of the type_layer according to user requirements, you can change the number of restricted layers and the number of sub-categories. In fact, the above design is not only used in the category table, some online forum programs that can be displayed by tree type list mostly adopt similar design.
Some people may think that the type_father field in type_table_2 is redundant data and can be removed. In this case, when inserting or deleting a category, you have to make a tedious judgment on the content of the type_layer, so I have not deleted the type_father field, this is also in line with the principle of appropriately retaining redundant data in the database design to reduce Program Complexity. I will introduce a case of intentionally increasing data redundancy.
Ii. Design of commodity information table
Assume that you are an electronic brain developer of a department store. One day, the boss asks you to develop an online e-commerce platform for the company. The department store sells thousands of products, however, currently, we only plan to sell dozens of products online for convenient transportation. Of course, new products may be added to the e-commerce platform in the future. Now we have begun to design the product information table of the platform database. Each sold item has the same attributes, such as the product ID, product name, product category, related information, supplier, number of items included, inventory, purchase price, sales price, and discount price. You will soon design four tables: wares_type, wares_provider, and wares_info ):
Item type table (wares_type)
Name |
Type |
Constraints |
Description |
Type_id |
Int |
No duplicates |
Category id, primary key |
Type_name |
Char (50) |
Cannot be blank |
Type name, which cannot be repeated |
Type_father |
Int |
Cannot be blank |
The parent category ID of this class. If it is a top node, it is set to a unique value. |
Type_layer |
Char (6) |
Three Layers limited. The initial value is 000000. |
Class first traversal, mainly to reduce the number of times the database is retrieved |
Supplier table (wares_provider)
Name |
Type |
Constraints |
Description |
Provider_id |
Int |
No duplicates |
Supplier ID, primary key |
Provider_name |
Char (100) |
Cannot be blank |
Supplier name |
Item info table (wares_info)
Name |
Type |
Constraints |
Description |
Wares_id |
Int |
No duplicates |
Item ID, primary key |
Wares_name |
Char (100) |
Cannot be blank |
Product Name |
Wares_type |
Int |
Cannot be blank |
Product type identifier, associated with wares_type.type_id |
Wares_info |
Char (200) |
Allowed to be empty |
Related information |
Provider |
Int |
Cannot be blank |
Supplier ID, associated with wares_provider.provider_id |
Setnum |
Int |
The initial value is 1. |
Number of embedded parts. The default value is 1. |
Stock |
Int |
The initial value is 0. |
Inventory. The default value is 0. |
Buy_price |
Money |
Cannot be blank |
Import price |
Sell_price |
Money |
Cannot be blank |
Sales Price |
Discount |
Money |
Cannot be blank |
Discount price |
You check the three tables for the boss. The boss hopes to add another item image field, but only some items have images. OK. You added a haspic bool field to the item info table (wares_info), and then created a new table -- Item image table (wares_pic ):
Product image table (wares_pic)
Name |
Type |
Constraints |
Description |
Pic_id |
Int |
No duplicates |
Product image ID, primary key |
Wares_id |
Int |
Cannot be blank |
Product ID, associated with wares_info.wares_id |
Pic_address |
Char (200) |
Cannot be blank |
Image Storage path |
After the program development is completed, it fully meets the current requirements of the boss and is officially enabled. After a while, the boss plans to launch new product sales on this platform. All of these products must be added with the "length" attribute. The first round of hard work ...... Of course, you added a bool field of haslength in the item info table (wares_info) according to the old method of adding the item image table, and created a new table-the item length table (wares_length):
Item length table (wares_length)
Name |
Type |
Constraints |
Description |
Length_id |
Int |
No duplicates |
Product image ID, primary key |
Wares_id |
Int |
Cannot be blank |
Product ID, associated with wares_info.wares_id |
Length |
Char (20) |
Cannot be blank |
Product length description |
Not long after the change, the boss planned to add a new batch of products. This time, all the products of this type need to be added with the "width" attribute. You have bitten your teeth, caught your teeth, and added the commodity width table (wares_width ). After a while, the boss has some new products that need to be added with the "height" attribute. Do you feel that the database you designed is growing in this way, will it soon become a maze? So, is there any way to curb this unpredictability, but it is similar to repeated database expansion? I have read Agile Software Development: Principles, models, and practices and I have found a similar example: 7.3 "copy" program. Among them, I strongly agree with the idea of Agile Software Development: At first there was almost no pre-design, but once the demand changes, as a programmer pursuing excellence, the entire architecture design should be reviewed from the ground up to design a system architecture that can meet similar changes in the future. The following is the modification scheme I provided when I want to add the "length" attribute:
Remove the haspic field in the item info table (wares_info) and add two additional item attribute tables (wares_ex_property) and item extra information table (wares_ex_info) to add new attributes.
Additional item Attribute Table (wares_ex_property)
Name |
Type |
Constraints |
Description |
Ex_pid |
Int |
No duplicates |
Additional item property identifier, primary key |
P_name |
Char (20) |
Cannot be blank |
Additional property name |
Additional item info table (wares_ex_info)
Name |
Type |
Constraints |
Description |
Ex_iid |
Int |
No duplicates |
Additional item information identifier, primary key |
Wares_id |
Int |
Cannot be blank |
Product ID, associated with wares_info.wares_id |
Property_id |
Int |
Cannot be blank |
Additional product property identifier, associated with wares_ex_property.ex_pid |
Property_value |
Char (200) |
Cannot be blank |
Additional item property value |
Add two records to the additional item Attribute Table (wares_ex_property:
Ex_pid p_name
1. product images
2 Commodity Length
Add an additional item property management feature to the background management function of the entire e-commerce platform. New attributes will appear when new items are added later, you only need to use this function to add a record to the additional product Attribute Table (wares_ex_property. Don't be afraid of changes. It's not a bad thing to be hit by the first bullet. The bad thing is to be hit by the second and third bullets flying in the same orbit. The sooner the first bullet arrives, the heavier the injury, and the stronger the resistance. 8)
Iii. Design of multiple users and their permission management
Developing database management software cannot ignore the issue of multi-user and user permission settings. Although the large and medium-sized back-end database system software on the market currently provides multiple users and the permission setting function for a table in a database, I personally suggest: A mature set of database management software should still design the user management function on its own, for two reasons:
1. The multi-user and permission settings provided by the large and medium-sized back-end database system software are aimed at the common attributes of the database and may not fully meet the requirements of some special cases;
2. Do not rely too much on some special functions of the background database system software. Various large and medium-sized background database system software are not fully compatible. Otherwise, the previous architecture design may not be reused once the database platform or background database system software version needs to be changed in the future.
The following describes how to design a flexible multi-user management module. That is, the system administrator of the database management software can add new users, modify the permissions of existing users, and delete existing users. First, analyze the user requirements and list all the functions that need to be implemented by the database management software. Then, sort these functions according to certain contacts, that is, classify the functions that a certain user needs into one category; finally, start table creation:
Menu (function_table)
Name |
Type |
Constraints |
Description |
F_id |
Int |
No duplicates |
Function ID, primary key |
F_name |
Char (20) |
Cannot be blank |
Function name, which cannot be repeated |
F_desc |
Char (50) |
Allowed to be empty |
Function Description |
User Group Table (user_group)
Name |
Type |
Constraints |
Description |
Group_id |
Int |
No duplicates |
User Group ID, primary key |
Group_name |
Char (20) |
Cannot be blank |
User Group Name |
Group_power |
Char (100) |
Cannot be blank |
User Group permission table, which contains a set of f_id |
User table (user_table)
Name |
Type |
Constraints |
Description |
User_id |
Int |
No duplicates |
User ID, primary key |
User_name |
Char (20) |
No duplicates |
User Name |
User_pwd |
Char (20) |
Cannot be blank |
User Password |
User_type |
Int |
Cannot be blank |
User Group ID, associated with user_group.group_id |
This user group architecture is used. To add a new user, you only need to specify the user group to which the new user belongs. When the system needs to add new features or modify the permissions of old features, only the operation menu and user group table records are used, and the functions of the original user can be changed accordingly. Of course, this architecture design moves the function determination of the database management software to the front-end, making the front-end development more complex. However, when the number of users is large (more than 10) or the probability of software upgrade is high in the future, this price is worthwhile.
Iv. Simple Batch M: N Design
When we encounter the M: n relationship, we usually create three tables, M, N, and M: N. However, M: N sometimes encounters batch processing. For example, to borrow books from the library, users are generally allowed to borrow n books at the same time. If you want to query borrowing records by batch, that is, how to design a list of all books borrowed by a user? Let's create three required tables first:
Book_table)
Name |
Type |
Constraints |
Description |
Book_id |
Int |
No duplicates |
Book ID, primary key |
Book_no |
Char (20) |
No duplicates |
Book No. |
Book_name |
Char (100) |
Cannot be blank |
Book name |
...... |
|
|
|
Borrow User table (renter_table)
Name |
Type |
Constraints |
Description |
Renter_id |
Int |
No duplicates |
User ID, primary key |
Renter_name |
Char (20) |
Cannot be blank |
User Name |
...... |
|
|
|
Borrow record table (pai_log)
Name |
Type |
Constraints |
Description |
Consumer _id |
Int |
No duplicates |
Borrow Record ID, primary key |
R_id |
Int |
Cannot be blank |
User ID, associated with renter_table.renter_id |
B _id |
Int |
Cannot be blank |
Book ID, associated with book_table.book_id |
1__date |
Datetime |
Cannot be blank |
Borrow time |
...... |
|
|
|
To query borrowing records by batch, we can create another table to store the information of batch borrowing, for example:
Batch_rent)
Name |
Type |
Constraints |
Description |
Batch_id |
Int |
No duplicates |
Batch borrow ID, primary key |
Batch_no |
Int |
Cannot be blank |
Batch borrow number, which is the same as batch_no |
Consumer _id |
Int |
Cannot be blank |
ID of the borrow record, associated with pai_log.pai_id |
Batch_date |
Datetime |
Cannot be blank |
Batch borrow time |
Is this a good design? Let's take a look at how to query to list all the books borrowed by a user? First, retrieve the batch_rent table, save the data in the region _id field of all records that meet the conditions, and then use the data as the query conditions to bring the data to the borrow _log table for query. So, is there any way to improve it? The following provides a simple batch design scheme. You do not need to add a new table. You only need to modify the borrowing record table (pai_log. The modified record table (pai_log) is as follows:
Borrow record table (pai_log)
Name |
Type |
Constraints |
Description |
Consumer _id |
Int |
No duplicates |
Borrow Record ID, primary key |
R_id |
Int |
Cannot be blank |
User ID, associated with renter_table.renter_id |
B _id |
Int |
Cannot be blank |
Book ID, associated with book_table.book_id |
Batch_no |
Int |
Cannot be blank |
Batch borrow number, which is the same as batch_no |
1__date |
Datetime |
Cannot be blank |
Borrow time |
...... |
|
|
|
Here, the batch_no of the same lending is the same as the region _id of the first batch of warehouse receiving. For example, if the maximum shard _id is 64 and a user borrows three books at a time, the batch_no values of the three borrowing records inserted in batch are 65. Then another user rented a set of discs and inserted the portable _id of the rental record as 68. With this design, you only need to use a standard t_ SQL nested query to query the batch borrowing information. Of course, this design does not comply with 3nf, but which one is better than the above standard 3nf design? I don't need to say the answer.
V. Selection of redundant data
In the previous article, a redundant field is retained in the "data table with tree relationships". The example here goes further-a redundant table is added. Let's take a look at the example: in order to solve the employees' work meals in my original company, I contacted a nearby small restaurant and recorded the meals every day. The expenses were shared by the number of people and settled by the company's cash at the end of the month, each person's work meal fee is deducted from the employee's salary every month. Of course, the number of people who eat each day is not fixed, and the cost of each meal varies according to the color of the dishes selected for each work meal. For example, five people spent 40 yuan for lunch on Monday, 20 yuan for dinner, 36 yuan for lunch on Tuesday, and 18 yuan for dinner. To facilitate the calculation of each person's work meal fee per month, I wrote a simple dining Accounting Management Program, with three tables in the database:
Employee table (clerk_table)
Name |
Type |
Constraints |
Description |
Clerk_id |
Int |
No duplicates |
Employee ID, primary key |
Clerk_name |
Char (10) |
Cannot be blank |
Employee name |
Total table for each meal (eatdata1)
Name |
Type |
Constraints |
Description |
Totle_id |
Int |
No duplicates |
Table ID for each meal, primary key |
Persons |
Char (100) |
Cannot be blank |
Staff identification set for dining staff |
Eat_date |
Datetime |
Cannot be blank |
Dining date |
Eat_type |
Char (1) |
Cannot be blank |
Dining type, used to distinguish between lunch and dinner |
Totle_price |
Money |
Cannot be blank |
Total cost per meal |
Persons_num |
Int |
Cannot be blank |
Number of diners |
Dining billing table (eatdata2)
Name |
Type |
Constraints |
Description |
ID |
Int |
No duplicates |
Dining billing table ID, primary key |
T_id |
Int |
Cannot be blank |
Table ID of each meal, associated with eatdata1.totle _ id |
C_id |
Int |
Cannot be blank |
Employee ID, associated with clerk_table.clerk_id |
Price |
Money |
Cannot be blank |
Cost per meal per person |
The record of the dining Bill detail table (eatdata2) is to split one record of each meal summary table (eatdata1) by the dining staff, which is a redundant table without compromise. Of course, you can also combine some fields of the eatdata1 table into the eatdata2 table, so that the eatdata1 table becomes a redundant table, however, the design of the dining billing table has more duplicate data, which is better than the above solution. However, the redundant table eatdata2 simplifies programming while collecting the monthly meal fee statistics, you can use a query statement like this to calculate the number of meals sent per person per month and the general ledger of meals:
Select clerk_name as personname, count (c_id) as eattimes, sum (price) as ptprice from eatdata2 join clerk_tabsystemic on (c_id = clerk_id) join eatdata1 on (totleid = tid) where eat_date> = convert (datetime, '"& the_date &"') and eat_date <dateadd (month, 1, convert (datetime, '"& the_date &"')) group by c_id
Imagine that if you don't need this redundant table, every time you calculate the average monthly meal fee, it will be a lot of trouble, and the program efficiency is also awkward. So when can we add some redundant data? I think there are two principles:
1. Overall user requirements. When the user pays more attention to the data that is listed after the database standard records are processed according to certain algorithms. If the algorithm can be completed directly using the built-in functions of the background database system, you can add redundant fields or even redundant tables to store the data processed by the algorithm. You must know that the efficiency of the background database system for querying, modifying, or deleting large volumes of data is much higher than that of the code we write.
2. simplify development. There are many methods to implement the same functions in modern software development. Although programmers do not need to be proficient in the vast majority of development tools and platforms, they still need to know which method to use with which development tool programs are more concise and more efficient. The essence of redundant data is to use space for time, especially the current development of hardware is much higher than that of software, so proper redundancy is acceptable. However, I would like to emphasize at the end that we should not rely too much on the features of platforms and development tools to simplify development. If we are not sure about this degree, we will continue to maintain and upgrade later.