Database design skills (reposted)

Source: Internet
Author: User

When it comes to databases, I think we have to talk about data structures first. In 1996, when I had just entered college to study computer programming, my teacher told us: computer program = data structure + algorithm. Although program development has since moved from process-oriented to object-oriented, I still strongly agree with the formula the teacher gave us eight years ago. The first step in object-oriented development is to analyze the data the whole program has to process, extract an abstract template from it, and design a class from that template; then add the functions (algorithms) that process the data; and finally assign access permissions to the data members and functions of the class in order to encapsulate the data.

The earliest prototype of the database is said to have been the account books of a dairy farm in the United States (on paper, which shows that a database does not have to be data stored in a computer ^_^ ), which recorded the farm's income and expenditure. Inspired by this, programmers organized the accounts and entered them into a computer. Once the amount of data collected according to a prescribed data structure reached a certain scale, programmers, for the sake of execution efficiency, separated retrieval, update, maintenance, and similar functions from the program into independently callable modules. These modules gradually evolved into the database management system (DBMS) we use today, an important branch of program development.

Now to the main topic. First, here is my personal grading of the skill of database designers:
1. Programmers who have never systematically studied data structures. Their works are often just improvised toys. They tend to design only a handful of tables, cram all the data for a given function into one table, and leave the tables almost unrelated to one another. Much of the free management software on the Internet is like this. While the program's functions are limited and the data volume is small, it runs without problems. Using it to manage important data, however, is very risky.
2. Programmers who have systematically studied data structures but have not yet developed management software with demanding efficiency requirements. Most of them have only recently graduated. When designing table structures they strictly follow the textbook: E-R diagrams and 3NF (don't lose heart, every database design expert started from this step). Their work is adequate for ordinary lightweight, Access-type management software. However, once the system needs new features, the original tables almost have to be replaced wholesale.
3. Programmers of the second type who, after several rounds of efficiency improvements and feature upgrades, have finally become database design veterans, experts in the eyes of first-type programmers. They can develop medium-sized commercial data management systems with more than 20 tables. They know when a certain amount of redundant data should be kept to improve efficiency, and their designs are extensible: when users need new features, only minor changes to the existing tables are required.
4. Programmers of the third type who, after designing some ten similar database management programs over and over and looking for a "lazy" shortcut, gradually reach enlightenment and make the leap from quantitative change to qualitative change. The table structures they design show foresight: they anticipate the data that future feature upgrades will need and leave room for it in advance. Most of these programmers have by now been promoted to senior software developers working on data mining.
5. Programmers of the third or fourth type who, after studying the principles and development of existing database management systems, either do secondary development on top of them or develop a general-purpose database management system of their own with independent copyright.

I personally sit at the tail end of the third type, so the design techniques listed below suit only second-type and some third-type database designers. Also, since I rarely meet colleagues who are interested in digging into this area, errors and omissions in this article are inevitable. Let me state that up front. Corrections are welcome, so please do not keep them to yourself. 8)

1. Data Tables with tree relationships
Many programmers run into tree-structured data during database design. A common example is a category table: a top-level category contains several subcategories, and some subcategories contain subcategories of their own. When the categories are not fixed, that is, when users want to add a new subcategory under any category or delete a category together with all of its subcategories, and the number of categories is expected to grow over time, we consider storing the data in a table. Following the textbook, a second-type programmer will probably design a table structure like this:

Category table 1 (type_table_1)
Name | Type | Constraints | Description
type_id | int | no duplicates | category ID, primary key
type_name | char(50) | not null, no duplicates | category name
type_father | int | not null | parent category ID; for the top node it is set to a unique value

This design is short and concise, fully satisfies 3NF, and can meet all of the user's requirements. So is that all there is to it? The answer is no! Why?

Let's consider how the user wants the data in this table to be listed. The user will certainly expect to see all categories at once, laid out according to the hierarchy he has defined, for example:
Total category
  Category 1
    Category 1.1
      Category 1.1.1
    Category 1.2
  Category 2
    Category 2.1
  Category 3
    Category 3.1
    Category 3.2
......
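
To make the cost of producing such a listing from type_table_1 concrete, here is a minimal sketch using Python's built-in sqlite3 module. The SQLite column types, the sample rows (which mirror the example data used in this section), and the helper name list_tree are assumptions made for illustration; they are not part of the original design. The traversal fetches the children of one node per SELECT and counts the statements it issues.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE type_table_1 ("
    " type_id INTEGER PRIMARY KEY,"
    " type_name TEXT NOT NULL UNIQUE,"
    " type_father INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO type_table_1 VALUES (?, ?, ?)",
    [
        (1, "Total category", 0),   # top node: type_father 0 is the assumed unique value
        (2, "Category 1", 1),
        (3, "Category 1.1", 2),
        (4, "Category 1.2", 2),
        (5, "Category 2", 1),
        (6, "Category 2.1", 5),
        (7, "Category 3", 1),
        (8, "Category 3.1", 7),
        (9, "Category 3.2", 7),
        (10, "Category 1.1.1", 3),  # added last, but belongs under Category 1.1
    ],
)

query_count = 0  # number of SELECT statements issued by the traversal


def list_tree(parent_id, depth=0, out=None):
    """Preorder traversal that runs one SELECT for every visited node."""
    global query_count
    if out is None:
        out = []
    query_count += 1
    children = conn.execute(
        "SELECT type_id, type_name FROM type_table_1"
        " WHERE type_father = ? ORDER BY type_id",
        (parent_id,),
    ).fetchall()
    for type_id, type_name in children:
        out.append("  " * depth + type_name)
        list_tree(type_id, depth + 1, out)
    return out


listing = list_tree(0)
print("\n".join(listing))
print("queries issued:", query_count)
```

With these ten rows the sketch issues one query per category plus one for the starting call, so the number of queries grows in step with the number of records.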

Let's see how many times the table above must be queried to display such a list (a preorder traversal of the tree). Note that Category 1.1.1 may well be a record added after Category 3.2. The answer is n times, once per record. With a small amount of data this has no noticeable effect, but once the categories grow to dozens or even hundreds of records, producing a single listing requires dozens of queries against the table, and the efficiency of the whole program is nothing to boast about. A second-type programmer may say: then I will create a temporary array or temporary table to hold the preorder traversal result of the category table, so that the dozens of queries run only the first time, and later listings simply read the temporary array or table. In fact there is no need to allocate new storage for this. It is enough to extend the table slightly and constrain the number of categories that may be added; the list can then be retrieved with a single query. The extended table structure is as follows:

Category table 2 (type_table_2)
Name | Type | Constraints | Description
type_id | int | no duplicates | category ID, primary key
type_name | char(50) | not null, no duplicates | category name
type_father | int | not null | parent category ID; for the top node it is set to a unique value
type_layer | char(6) | limited to three layers; initial value 000000 | preorder traversal code of the category, mainly used to reduce the number of database queries

According to this table structure, let's take a look at the data recorded in the table in the above example:

type_id | type_name | type_father | type_layer
1 | Total category | 0 | 000000
2 | Category 1 | 1 | 010000
3 | Category 1.1 | 2 | 010100
4 | Category 1.2 | 2 | 010200
5 | Category 2 | 1 | 020000
6 | Category 2.1 | 5 | 020100
7 | Category 3 | 1 | 030000
8 | Category 3.1 | 7 | 030100
9 | Category 3.2 | 7 | 030200
10 | Category 1.1.1 | 3 | 010101
......

Query ordered by type_layer: SELECT * FROM type_table_2 ORDER BY type_layer

The record set is returned as follows:

type_id | type_name | type_father | type_layer
1 | Total category | 0 | 000000
2 | Category 1 | 1 | 010000
3 | Category 1.1 | 2 | 010100
10 | Category 1.1.1 | 3 | 010101
4 | Category 1.2 | 2 | 010200
5 | Category 2 | 1 | 020000
6 | Category 2.1 | 5 | 020100
7 | Category 3 | 1 | 030000
8 | Category 3.1 | 7 | 030100
9 | Category 3.2 | 7 | 030200
......

The records come back in exactly preorder traversal order. To control the indentation of the displayed categories, you only need to examine the value in the type_layer field two digits at a time: for each pair of digits greater than 0, shift the name two spaces to the right. In this example the design is limited to at most three layers with at most 99 subcategories per layer; by changing the length of type_layer and the number of digits per layer according to user requirements, you can change both limits. In fact this design is not limited to category tables: most online forum programs that can display threads as a tree use a similar design.
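
The single-query listing described above can be sketched as follows with Python's built-in sqlite3 module. The SQLite column types and the helper name depth_of are assumptions for illustration; the rows are the example data from this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE type_table_2 ("
    " type_id INTEGER PRIMARY KEY,"
    " type_name TEXT NOT NULL UNIQUE,"
    " type_father INTEGER NOT NULL,"
    " type_layer TEXT NOT NULL DEFAULT '000000')"
)
conn.executemany(
    "INSERT INTO type_table_2 VALUES (?, ?, ?, ?)",
    [
        (1, "Total category", 0, "000000"),
        (2, "Category 1", 1, "010000"),
        (3, "Category 1.1", 2, "010100"),
        (4, "Category 1.2", 2, "010200"),
        (5, "Category 2", 1, "020000"),
        (6, "Category 2.1", 5, "020100"),
        (7, "Category 3", 1, "030000"),
        (8, "Category 3.1", 7, "030100"),
        (9, "Category 3.2", 7, "030200"),
        (10, "Category 1.1.1", 3, "010101"),
    ],
)


def depth_of(type_layer):
    """Count the two-digit groups of type_layer that are greater than 0."""
    groups = [type_layer[i:i + 2] for i in range(0, len(type_layer), 2)]
    return sum(1 for group in groups if int(group) > 0)


# One query returns the rows already in preorder.
rows = conn.execute(
    "SELECT type_name, type_layer FROM type_table_2 ORDER BY type_layer"
).fetchall()

# Indent each name by two spaces per non-zero group.
listing = ["  " * depth_of(layer) + name for name, layer in rows]
print("\n".join(listing))
```

Because type_layer is a fixed-width string of digits, ordinary string ordering is enough; no recursion or temporary table is involved.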

Some people may think that the type_father field in type_table_2 is redundant and can be removed. If it were removed, inserting or deleting a category would require tedious checks on the contents of type_layer, so I have kept the type_father field. This is also in line with the principle of keeping an appropriate amount of redundant data in a database design in order to reduce program complexity. Later I will present a case in which data redundancy is increased deliberately.
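
As an illustration of how type_father keeps insertion simple, here is a hedged sketch of adding a subcategory. The function name add_category, the constants, the reduced sample data, and the rule "largest sibling code plus one" are assumptions of this sketch, not something the design above prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE type_table_2 ("
    " type_id INTEGER PRIMARY KEY,"
    " type_name TEXT NOT NULL UNIQUE,"
    " type_father INTEGER NOT NULL,"
    " type_layer TEXT NOT NULL DEFAULT '000000')"
)
conn.executemany(
    "INSERT INTO type_table_2 VALUES (?, ?, ?, ?)",
    [
        (1, "Total category", 0, "000000"),
        (2, "Category 1", 1, "010000"),
        (3, "Category 1.1", 2, "010100"),
        (5, "Category 2", 1, "020000"),
        (10, "Category 1.1.1", 3, "010101"),
    ],
)

MAX_LAYERS = 3     # char(6) holds three two-digit groups
MAX_CHILDREN = 99  # largest value one two-digit group can hold


def add_category(type_name, parent_id):
    """Insert a child of parent_id and return the type_layer assigned to it."""
    row = conn.execute(
        "SELECT type_layer FROM type_table_2 WHERE type_id = ?", (parent_id,)
    ).fetchone()
    if row is None:
        raise ValueError("unknown parent category")
    parent_layer = row[0]
    # Depth of the parent = number of non-zero two-digit groups in its code.
    depth = sum(1 for i in range(0, 6, 2) if parent_layer[i:i + 2] != "00")
    if depth >= MAX_LAYERS:
        raise ValueError("type_layer allows at most three layers")
    # type_father locates the siblings directly, without decoding other rows.
    last = conn.execute(
        "SELECT MAX(type_layer) FROM type_table_2 WHERE type_father = ?",
        (parent_id,),
    ).fetchone()[0]
    index = 1 if last is None else int(last[depth * 2:depth * 2 + 2]) + 1
    if index > MAX_CHILDREN:
        raise ValueError("at most 99 subcategories per layer")
    new_layer = (
        parent_layer[:depth * 2]
        + f"{index:02d}"
        + "00" * (MAX_LAYERS - depth - 1)
    )
    conn.execute(
        "INSERT INTO type_table_2 (type_name, type_father, type_layer)"
        " VALUES (?, ?, ?)",
        (type_name, parent_id, new_layer),
    )
    return new_layer


new_layer = add_category("Category 1.1.2", 3)
print(new_layer)
```

Without type_father, finding the siblings of the new node would mean matching prefixes of type_layer across the table, which is the tedious checking mentioned above.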


2. Design of the product information table
Suppose you are a computer programmer at a department store. One day the boss asks you to develop an online e-commerce platform for the company. The store sells thousands of products, but for now it plans to sell only a few dozen online, chosen because they are easy to ship; more products may of course be added to the platform later. You now begin designing the product information tables of the platform database. Every product on sale has the same attributes: product ID, product name, product category, related information, supplier, number of items included, inventory, purchase price, sale price, and discount price. You quickly design three tables: the item type table (wares_type), the supplier table (wares_provider), and the item info table (wares_info):

Item type table (wares_type)
Name | Type | Constraints | Description
type_id | int | no duplicates | category ID, primary key
type_name | char(50) | not null, no duplicates | category name
type_father | int | not null | parent category ID; for the top node it is set to a unique value
type_layer | char(6) | limited to three layers; initial value 000000 | preorder traversal code of the category, mainly used to reduce the number of database queries

Supplier table (wares_provider)
Name | Type | Constraints | Description
provider_id | int | no duplicates | supplier ID, primary key
provider_name | char(100) | not null | supplier name

Item info table (wares_info)
Name | Type | Constraints | Description
wares_id | int | no duplicates | item ID, primary key
wares_name | char(100) | not null | item name
wares_type | int | not null | item type ID, references wares_type.type_id
wares_info | char(200) | null allowed | related information
provider | int | not null | supplier ID, references wares_provider.provider_id
setnum | int | initial value 1 | number of items included, default 1
stock | int | initial value 0 | inventory, default 0
buy_price | money | not null | purchase price
sell_price | money | not null | sale price
discount | money | not null | discount price

You show the three tables to the boss. The boss would like to add a product image field, but only some products have images. OK: you add a haspic bool field to the item info table (wares_info) and create a new table, the item image table (wares_pic):

Item image table (wares_pic)
Name | Type | Constraints | Description
pic_id | int | no duplicates | item image ID, primary key
wares_id | int | not null | item ID, references wares_info.wares_id
pic_address | char(200) | not null | image storage path

Once development is complete, the program fully meets the boss's current requirements and goes into service. After a while, the boss plans to launch a new line of products on the platform, all of which need an additional "length" attribute. The first round of toil begins ...... Naturally, following the same approach as for the item image table, you add a haslength bool field to the item info table (wares_info) and create a new table, the item length table (wares_length):

Item length table (wares_length)
Name | Type | Constraints | Description
length_id | int | no duplicates | item length ID, primary key
wares_id | int | not null | item ID, references wares_info.wares_id
length | char(20) | not null | item length description

Not long after that change, the boss plans to add another batch of products, and this time every product of that kind needs a "width" attribute. You grit your teeth, follow the same recipe again, and add an item width table (wares_width). Some time later the boss has yet more new products that need a "height" attribute. Do you begin to feel that a database that keeps growing this way will soon turn into a maze? Is there a way to curb this kind of expansion, which is unpredictable in its details yet always repeats the same pattern? While reading Agile Software Development: Principles, Patterns, and Practices I found a similar example: 7.3, the "Copy" program. I strongly agree with the agile approach described there: do almost no up-front design at first, but once the requirements change, a programmer who pursues excellence should re-examine the whole architecture from the ground up and design one that can accommodate similar changes in the future. The following is the revised design that should have been made when the "length" attribute was first requested:

Remove the haspic field from the item info table (wares_info), and add two tables for holding new attributes: the additional item attribute table (wares_ex_property) and the additional item info table (wares_ex_info).

Additional item attribute table (wares_ex_property)
Name | Type | Constraints | Description
ex_pid | int | no duplicates | additional attribute ID, primary key
p_name | char(20) | not null | additional attribute name

Additional item info table (wares_ex_info)
Name | Type | Constraints | Description
ex_iid | int | no duplicates | additional information ID, primary key
wares_id | int | not null | item ID, references wares_info.wares_id
property_id | int | not null | additional attribute ID, references wares_ex_property.ex_pid
property_value | char(200) | not null | additional attribute value

Add two records to the additional item attribute table (wares_ex_property):
ex_pid | p_name
1 | product image
2 | product length

Add an additional-attribute management feature to the back-end management functions of the e-commerce platform. When new products with new attributes appear later, you only need to use this feature to add one record to the additional item attribute table (wares_ex_property). Don't be afraid of change. Being hit by the first bullet is not a bad thing; the bad thing is being hit by the second and third bullets flying along the same trajectory. The sooner the first bullet arrives, the heavier the wound, and the stronger the resistance afterwards. 8)
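
The following is a minimal sketch of this attribute-table design using Python's built-in sqlite3 module. The wares_info table is cut down to two columns, and the sample items, attribute values, and the helper name extra_properties are hypothetical, chosen only to show that a new "width" requirement becomes a new row rather than a new table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wares_info (
    wares_id   INTEGER PRIMARY KEY,
    wares_name TEXT NOT NULL
);
CREATE TABLE wares_ex_property (
    ex_pid INTEGER PRIMARY KEY,
    p_name TEXT NOT NULL
);
CREATE TABLE wares_ex_info (
    ex_iid         INTEGER PRIMARY KEY,
    wares_id       INTEGER NOT NULL REFERENCES wares_info(wares_id),
    property_id    INTEGER NOT NULL REFERENCES wares_ex_property(ex_pid),
    property_value TEXT NOT NULL
);
INSERT INTO wares_ex_property VALUES (1, 'product image'), (2, 'product length');
-- Hypothetical sample items and values.
INSERT INTO wares_info VALUES (1, 'desk'), (2, 'rope');
INSERT INTO wares_ex_info (wares_id, property_id, property_value) VALUES
    (1, 1, '/pic/desk.jpg'),
    (2, 2, '10 m');
""")

# A new "width" requirement: one row in wares_ex_property, no schema change.
conn.execute("INSERT INTO wares_ex_property VALUES (3, 'product width')")
conn.execute(
    "INSERT INTO wares_ex_info (wares_id, property_id, property_value)"
    " VALUES (1, 3, '120 cm')"
)


def extra_properties(wares_id):
    """Return the additional attributes of one item as a name -> value dict."""
    rows = conn.execute(
        "SELECT p.p_name, i.property_value"
        " FROM wares_ex_info AS i"
        " JOIN wares_ex_property AS p ON p.ex_pid = i.property_id"
        " WHERE i.wares_id = ? ORDER BY p.ex_pid",
        (wares_id,),
    ).fetchall()
    return dict(rows)


print(extra_properties(1))
print(extra_properties(2))
```

The trade-off is that every attribute value is stored as text in property_value, so type checking and unit handling move into the application.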

3. Design of multiple users and their permission management
Database management software cannot avoid the question of multiple users and user permission settings. Although the large and medium-sized back-end database systems on the market already provide multiple users and fine-grained permission settings down to individual tables in a database, my personal suggestion is that a mature piece of database management software should still implement user management itself, for two reasons:
1. The multi-user and permission settings provided by large and medium-sized back-end database systems target the generic properties of a database and may not fully meet the needs of certain special cases;
2. Do not rely too heavily on special features of the back-end database system, because the various large and medium-sized back-end database systems are not fully compatible with one another. Otherwise, if the database platform or the version of the back-end database system has to change in the future, the earlier architecture design may not be reusable.

The following describes how to design a flexible multi-user management module, in which the system administrator of the database management software can add new users, modify the permissions of existing users, and delete existing users. First, analyze the user requirements and list every function the software needs to implement. Then group these functions by how they relate to one another, that is, put the functions needed by a given kind of user into one group. Finally, create the tables:

Function table (function_table)
Name | Type | Constraints | Description
f_id | int | no duplicates | function ID, primary key
f_name | char(20) | not null, no duplicates | function name
f_desc | char(50) | null allowed | function description

User group table (user_group)
Name | Type | Constraints | Description
group_id | int | no duplicates | user group ID, primary key
group_name | char(20) | not null | user group name
group_power | char(100) | null allowed | user group permission list, containing a set of f_id values

User table (user_table)
Name | Type | Constraints | Description
user_id | int | no duplicates | user ID, primary key
user_name | char(20) | no duplicates | user name
user_pwd | char(20) | not null | user password
user_type | int | not null | user group ID, references user_group.group_id

With this user group architecture, adding a new user only requires specifying the user group the new user belongs to. When the system needs new functions or the permissions for existing functions change, only the records of the function table and the user group table need to be modified, and the functions available to existing users change accordingly. Of course, this design moves the decision about which functions are available into the front end, which makes front-end development somewhat more complex. However, when the number of users is large (more than 10) or the software is likely to be upgraded in the future, the price is worth paying.
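
The front-end permission check this architecture implies can be sketched as follows. The text only says that group_power contains a set of f_id values, so the comma-separated encoding used here is an assumption, as are the sample functions, groups, users, and the helper names allowed_functions and can_use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE function_table (
    f_id   INTEGER PRIMARY KEY,
    f_name TEXT NOT NULL UNIQUE,
    f_desc TEXT
);
CREATE TABLE user_group (
    group_id    INTEGER PRIMARY KEY,
    group_name  TEXT NOT NULL,
    group_power TEXT              -- assumed format: comma-separated f_id values
);
CREATE TABLE user_table (
    user_id   INTEGER PRIMARY KEY,
    user_name TEXT NOT NULL UNIQUE,
    user_pwd  TEXT NOT NULL,
    user_type INTEGER NOT NULL REFERENCES user_group(group_id)
);
-- Hypothetical sample data; the password values are placeholders.
INSERT INTO function_table VALUES
    (1, 'view stock', NULL), (2, 'edit stock', NULL), (3, 'manage users', NULL);
INSERT INTO user_group VALUES (1, 'admin', '1,2,3'), (2, 'clerk', '1');
INSERT INTO user_table VALUES
    (1, 'alice', 'placeholder', 1), (2, 'bob', 'placeholder', 2);
""")


def allowed_functions(user_name):
    """Return the set of f_id values granted to the user's group."""
    row = conn.execute(
        "SELECT g.group_power FROM user_table AS u"
        " JOIN user_group AS g ON g.group_id = u.user_type"
        " WHERE u.user_name = ?",
        (user_name,),
    ).fetchone()
    if row is None or not row[0]:
        return set()
    return {int(f_id) for f_id in row[0].split(",")}


def can_use(user_name, f_id):
    """Front-end check: may this user call the function with this f_id?"""
    return f_id in allowed_functions(user_name)


before = can_use("bob", 2)
# Changing one user_group record changes the rights of every user in the group.
conn.execute("UPDATE user_group SET group_power = '1,2' WHERE group_id = 2")
after = can_use("bob", 2)
print(before, after)
```

No row in user_table is touched when permissions change, which is the point of routing rights through the group.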

4. Simple batch M:N design
When we encounter an M:N relationship, we usually create three tables: M, N, and M:N. Sometimes, however, an M:N relationship involves batch processing. In a library, for example, a user is generally allowed to borrow n books at once. If borrowing records must be queryable by batch, that is, if we need to list all the books a user borrowed in one visit, how should this be designed? Let's first create the three required tables:

Book table (book_table)
Name | Type | Constraints | Description
book_id | int | no duplicates | book ID, primary key
book_no | char(20) | no duplicates | book number
book_name | char(100) | not null | book name
......

Borrower table (renter_table)
Name | Type | Constraints | Description
renter_id | int | no duplicates | user ID, primary key
renter_name | char(20) | not null | user name
......

Borrowing record table (rent_log)
Name | Type | Constraints | Description
rent_id | int | no duplicates | borrowing record ID, primary key
r_id | int | not null | user ID, references renter_table.renter_id
b_id | int | not null | book ID, references book_table.book_id
rent_date | datetime | not null | borrowing time
......

To query borrowing records by batch, we can create another table to store the information of batch borrowing, for example:

Batch borrowing table (batch_rent)
Name | Type | Constraints | Description
batch_id | int | no duplicates | batch borrowing ID, primary key
batch_no | int | not null | batch borrowing number; records borrowed in the same batch share the same batch_no
rent_id | int | not null | borrowing record ID, references rent_log.rent_id
batch_date | datetime | not null | batch borrowing time

Is this a good design? Consider how to list all the books a user borrowed in one batch. First query the batch_rent table and save the rent_id values of all matching records, then use those values as the condition for a second query against the rent_log table. Can this be improved? Here is a simple batch design that needs no new table, only a change to the borrowing record table (rent_log). The modified table is as follows:

Borrowing record table (rent_log)
Name | Type | Constraints | Description
rent_id | int | no duplicates | borrowing record ID, primary key
r_id | int | not null | user ID, references renter_table.renter_id
b_id | int | not null | book ID, references book_table.book_id
batch_no | int | not null | batch borrowing number; records borrowed in the same batch share the same batch_no
rent_date | datetime | not null | borrowing time
......

Here, the batch_no shared by the records of one borrowing is the rent_id of the first record inserted for that batch. For example, suppose the current maximum rent_id is 64 and a user then borrows three books at once: the batch_no of all three inserted records is 65. If another user then borrows a set of discs, the rent_id of the inserted record is 68. With this design, batch borrowing information can be retrieved with a single standard T-SQL nested query. Of course, this design does not conform to 3NF, but which is better, this one or the standard 3NF design above? I don't think I need to spell out the answer.
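
A hedged sketch of this batch scheme follows, written against SQLite rather than T-SQL. The table and key names rent_log and rent_id are assumed spellings of the borrowing record table and its primary key; the helper names, sample IDs, and dates are hypothetical. Picking the next rent_id with MAX + 1 is only safe here because a single connection is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rent_log ("
    " rent_id INTEGER PRIMARY KEY,"
    " r_id INTEGER NOT NULL,"
    " b_id INTEGER NOT NULL,"
    " batch_no INTEGER NOT NULL,"
    " rent_date TEXT NOT NULL)"
)
# An existing record, so that the current maximum rent_id is 64 as in the text.
conn.execute("INSERT INTO rent_log VALUES (64, 1, 100, 64, '2004-03-01')")


def borrow_batch(r_id, book_ids, rent_date):
    """Insert one record per book; all share the rent_id of the first record."""
    with conn:  # one transaction for the whole batch
        first_id = conn.execute(
            "SELECT COALESCE(MAX(rent_id), 0) + 1 FROM rent_log"
        ).fetchone()[0]
        for offset, b_id in enumerate(book_ids):
            conn.execute(
                "INSERT INTO rent_log VALUES (?, ?, ?, ?, ?)",
                (first_id + offset, r_id, b_id, first_id, rent_date),
            )
    return first_id


def books_in_same_batch(rent_id):
    """Nested query: every book borrowed in the same batch as this record."""
    rows = conn.execute(
        "SELECT b_id FROM rent_log"
        " WHERE batch_no = (SELECT batch_no FROM rent_log WHERE rent_id = ?)"
        " ORDER BY rent_id",
        (rent_id,),
    ).fetchall()
    return [b_id for (b_id,) in rows]


batch_a = borrow_batch(2, [201, 202, 203], "2004-03-02")  # rent_id 65 to 67
batch_b = borrow_batch(3, [301], "2004-03-02")            # rent_id 68
print(batch_a, batch_b, books_in_same_batch(66))
```

The whole batch is found with one statement against one table, whereas the batch_rent design needs a lookup in a second table first.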

5. Choosing redundant data
The earlier section on data tables with tree relationships kept one redundant field. The example here goes a step further and adds a redundant table. At my former company, to take care of employees' working meals, we made an arrangement with a small restaurant nearby: meals were put on account every day, the cost of each meal was split evenly among the people who ate, the company settled the bill in cash at the end of the month, and each person's meal charges were deducted from that month's salary. The number of people eating was of course different every day, and the cost of each meal varied with the dishes chosen. For example, on Monday lunch for five people cost 40 yuan and dinner cost 20 yuan; on Tuesday lunch cost 36 yuan and dinner cost 18 yuan. To make it easy to calculate each person's monthly meal charges, I wrote a crude meal accounting program whose database has three tables:

Employee table (clerk_table)
Name | Type | Constraints | Description
clerk_id | int | no duplicates | employee ID, primary key
clerk_name | char(10) | not null | employee name

Per-meal summary table (eatdata1)
Name | Type | Constraints | Description
totle_id | int | no duplicates | per-meal summary ID, primary key
persons | char(100) | not null | set of IDs of the employees who ate
eat_date | datetime | not null | meal date
eat_type | char(1) | not null | meal type, used to distinguish lunch from dinner
totle_price | money | not null | total cost of the meal
persons_num | int | not null | number of diners

Meal billing detail table (eatdata2)
Name | Type | Constraints | Description
id | int | no duplicates | record ID, primary key
t_id | int | not null | per-meal summary ID, references eatdata1.totle_id
c_id | int | not null | employee ID, references clerk_table.clerk_id
price | money | not null | cost per person for the meal

Each record in the meal billing detail table (eatdata2) is one record of the per-meal summary table (eatdata1) split by diner, so eatdata2 is a redundant table through and through. You could instead merge some fields of eatdata1 into eatdata2 so that eatdata1 becomes the redundant table, but the billing detail table designed that way would contain even more duplicated data, so the scheme above is the better one. It is precisely the redundant table eatdata2 that simplifies the programming of the monthly meal statistics: a query like the following computes each person's number of meals and total meal charges for the month:

Select clerk_name as personname, count (c_id) as eattimes, sum (price) as ptprice from eatdata2 join clerk_table on (c_id = clerk_id) join eatdata1 on (totle_id = t_id) where eat_date >= convert (datetime, '"& the_date &"') and eat_date ...
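
Since the statement above is cut off, here is a self-contained sketch of the same kind of monthly statistics in SQLite. The exclusive upper date bound and the GROUP BY clause are assumptions about how such a query would be completed, not a reconstruction of the original. The comma-separated persons format, the 'L'/'D' meal codes, the employee names, the dates, the dinner head count, and the helper names record_meal and monthly_report are likewise hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clerk_table (
    clerk_id   INTEGER PRIMARY KEY,
    clerk_name TEXT NOT NULL
);
CREATE TABLE eatdata1 (
    totle_id    INTEGER PRIMARY KEY,
    persons     TEXT NOT NULL,     -- assumed format: comma-separated clerk_id values
    eat_date    TEXT NOT NULL,     -- ISO dates, so text comparison orders correctly
    eat_type    TEXT NOT NULL,     -- assumed codes: 'L' lunch, 'D' dinner
    totle_price REAL NOT NULL,
    persons_num INTEGER NOT NULL
);
CREATE TABLE eatdata2 (
    id    INTEGER PRIMARY KEY,
    t_id  INTEGER NOT NULL REFERENCES eatdata1(totle_id),
    c_id  INTEGER NOT NULL REFERENCES clerk_table(clerk_id),
    price REAL NOT NULL
);
INSERT INTO clerk_table VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E');
""")


def record_meal(eat_date, eat_type, totle_price, clerk_ids):
    """Store one meal in eatdata1 and its even per-person split in eatdata2."""
    cur = conn.execute(
        "INSERT INTO eatdata1"
        " (persons, eat_date, eat_type, totle_price, persons_num)"
        " VALUES (?, ?, ?, ?, ?)",
        (",".join(str(c) for c in clerk_ids), eat_date, eat_type,
         totle_price, len(clerk_ids)),
    )
    share = totle_price / len(clerk_ids)
    conn.executemany(
        "INSERT INTO eatdata2 (t_id, c_id, price) VALUES (?, ?, ?)",
        [(cur.lastrowid, c_id, share) for c_id in clerk_ids],
    )


def monthly_report(first_day, first_day_of_next_month):
    """Meals eaten and total charge per person within [first_day, next month)."""
    return conn.execute(
        "SELECT clerk_name AS personname, COUNT(c_id) AS eattimes,"
        " SUM(price) AS ptprice"
        " FROM eatdata2"
        " JOIN clerk_table ON c_id = clerk_id"
        " JOIN eatdata1 ON totle_id = t_id"
        " WHERE eat_date >= ? AND eat_date < ?"
        " GROUP BY clerk_id, clerk_name ORDER BY clerk_id",
        (first_day, first_day_of_next_month),
    ).fetchall()


record_meal("2004-03-01", "L", 40, [1, 2, 3, 4, 5])  # 8 yuan each
record_meal("2004-03-01", "D", 20, [1, 2])           # 10 yuan each
record_meal("2004-04-01", "L", 36, [1, 2, 3])        # falls in the next month
report = monthly_report("2004-03-01", "2004-04-01")
print(report)
```

REAL is used for the prices only to keep the sketch short; a production schema would use an exact decimal or integer-cents type for money.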

Imagine how much trouble it would be to compute each person's monthly meal charges without this redundant table, and how poor the program's efficiency would be. So when is it appropriate to add redundant data? I think there are two principles:

1. Users' overall needs. When users care mostly about data that is produced by processing the normalized records with some algorithm, and that algorithm can be carried out directly with the built-in functions of the back-end database system, you can add redundant fields, or even redundant tables, to store the processed data. Keep in mind that the back-end database system queries, modifies, and deletes large volumes of data far more efficiently than code we write ourselves.
2. Reducing development complexity. Modern software development offers many ways to implement the same function. Programmers do not need to master most development tools and platforms, but they do need to know which method, combined with which tool, yields a simpler and more efficient program. The essence of redundant data is trading space for time, and since hardware is currently advancing much faster than software, a moderate amount of redundancy is acceptable. I want to stress one last point, though: do not rely too heavily on the features of a platform or development tool to simplify development. If you misjudge the degree, later maintenance and upgrades will be very painful.
