When it comes to databases, I don't think we have to talk about the data structure first. In 1996, when I first went to college to study computer programming, the teacher told us that computer programs = data structures + algorithms. Although the current development of programs has gradually transitioned from process-oriented to object-oriented, I strongly agree with the formula that the teacher told us eight years ago: computer programs = data structures + algorithms. The first thing to do in object-oriented program development is to analyze the data to be processed in the entire program, extract the abstract template from it, and design the class with this abstract template, then, add a function (algorithm) to process the data. Finally, assign access permissions to the data members and functions in the class to encapsulate the data.
The original prototype of the database is said to have originated from the bookkeeping of a dairy farm in the United States (paper, it can be seen that the database is not necessarily stored in the computer's data ^_^ ), it records the income and expenditure accounts of the dairy farm. Programmers are inspired to organize and input the income and expenditure accounts into the computer. When the amount of data collected according to the specified data structure reaches a certain level, the programmer separates the retrieval, update, maintenance, and other functions from the program execution efficiency, as a separate calling module, this module gradually develops and evolves into the database management system (DBMS) that we are currently exposed to-an important branch in program development.
Next, let's start with the question. First of all, we will give the Database Designer the skills of my personal programs:
1. programmers who have not systematically studied data structures. This type of programmer's work is often just their impromptu toys. They often get used to designing only a few limited tables, and all the data for certain functions is inserted in one table, with almost no association between the tables. A lot of free management software on the internet is like this. When the program function is limited and the data volume is small, it is no problem to run the program. However, if it is used to manage important data, high risk.
2. The system has learned the data structure, but has not developed any programmers who want to manage software with high program efficiency. Most of these people just graduated from school soon, they in the design of the database table structure, strictly in accordance with the provisions of the textbook, the E-R and 3nf (Don't lose heart, all database design experts start from this step ). Their work is sufficient for general access-type lightweight management software. However, once the system needs to add new features, the original database tables will almost have to be replaced.
3. The second type of programmers, after several program efficiency improvements and functional upgrades, finally upgraded to the database design veteran, the first class of programmers in the eyes of high people. These programmers can develop medium-sized commercial data management systems with more than 20 tables. They know under what circumstances they should keep a certain amount of redundant data to improve program efficiency, and their database design is scalable. When users need to add new features, you only need to make a few changes to the original database table.
4. After having undergone repeated design of ten similar database management software, the third type of programmers will not change their minds, but those who wish to find out the "lazy" tips will gradually become enlightened, thus, the transformation from the completion volume to the qualitative change. The database table structure they designed has a certain foresight and can predict the data required for future function upgrades, thus leaving a foreshadowing in advance. Most of these programmers are now promoted to senior software developers in data mining.
5. Third-class programmers or fourth-class programmers, after studying the principles and development of existing database management systems, either conduct secondary development based on them, you can either develop a universal database management system with independent copyrights.
I am personally at the end of the third category, so some of the design skills listed below are only applicable to the second and some third-category database designers. At the same time, as I rarely encounter colleagues who are interested in drilling down in this area, it is inevitable that errors and omissions will appear in this article. I would like to make a statement first. You are welcome to correct it and do not hide it in private. 8)
1. Data Tables with tree relationships
Many programmers have encountered tree-based data during database design. For example, a common category table, that is, a large category, contains several sub-classes, some sub-classes have sub-classes. When the category is not determined, you want to add a new subclass under any category, or delete a category and all its subclasses, and the number of them is expected to increase gradually in the future, in this case, we will consider using a data table to save the data. According to the textbooks, the second type of programmers will probably design a data table structure similar to this:
Category Table _ 1 (type_table_1)
Description of name type constraints
Type_id int no duplicate category ID, primary key
Type_name char (50) cannot be empty or repeated
Type_father int cannot be null. If it is a top node, it is set to a unique value.
This design is short and concise, fully satisfying 3nf, and can satisfy all user requirements. Is that all right? The answer is no! Why?
Let's estimate how the user wants to list the data of this table. For the user, he certainly expects to list all the categories at a time based on the hierarchy set by him, for example:
Total category
Category 1
CATEGORY 1.1
CATEGORY 1.1.1
CATEGORY 1.2
Category 2
CATEGORY 2.1
Category 3
CATEGORY 3.1
CATEGORY 3.2
......
Let's see how many times the above table needs to be searched to display such a list (first-order tree traversal? Note that although category 1.1.1 may be a record added after category 3.2, the answer is still n times. This efficiency does not affect a small amount of data, but after the data type is expanded to dozens or even hundreds of records in the future, the table will be retrieved dozens of times for a single column, the efficiency of the entire program is not flattering. Maybe the second type of programmers will say, then I will create a temporary array or temporary table to save the first order traversal results of the type table, so that the results will be retrieved dozens of times at the first run, when listing all the type relationships again, you can directly read the temporary array or temporary table. In fact, you don't need to allocate a new memory to save the data. You just need to expand the data table and then constrain the number of types to be added, you only need to retrieve the list once. The expanded data table structure is as follows:
Category Table _ 2 (type_table_2)
Description of name type constraints
Type_id int no duplicate category ID, primary key
Type_name char (50) cannot be empty or repeated
Type_father int cannot be null. If it is a top node, it is set to a unique value.
Type_layer char (6) is limited to three layers. The initial value is the first sequential traversal of the 000000 category, mainly to reduce the number of times the database is retrieved.
According to this table structure, let's take a look at the data recorded in the table in the above example:
Type_id type_name type_father type_layer
1. Total Category 0 000000
2 Category 1 1 010000
3 category 1.1 2 010100
4 category 1.2 2 010200
5 Category 2 1 020000
6 category 2.1 5 020100
7 category 3 1 030000
8 category 3.1 7 030100
9 category 3.2 7 030200
10 category 1.1.1 3 010101
......
Search by type_layer: Select * From type_table_2 order by type_layer
The record set is listed as follows:
Type_id type_name type_father type_layer
1. Total Category 0 000000
2 Category 1 1 010000
3 category 1.1 2 010100
10 category 1.1.1 3 010101
4 category 1.2 2 010200
5 Category 2 1 020000
6 category 2.1 5 020100
7 category 3 1 030000
8 category 3.1 7 030100
9 category 3.2 7 030200
......
The record order is exactly the result of first-order traversal. When controlling the hierarchy of display classes, you only need to judge the values in the type_layer field, each of which has two digits. If the value is greater than 0, move two spaces to the right. Of course, in this example, the limit is a maximum of three layers, each layer can have a maximum of 99 sub-categories, as long as you modify the length and number of digits of the type_layer according to user requirements, you can change the number of restricted layers and the number of sub-categories. In fact, the above design is not only used in the category table, some online forum programs that can be displayed by tree type list mostly adopt similar design.
Some people may think that the type_father field in type_table_2 is redundant data and can be removed. In this case, when inserting or deleting a category, you have to make a tedious judgment on the content of the type_layer, so I have not deleted the type_father field, this is also in line with the principle of appropriately retaining redundant data in the database design to reduce Program Complexity. I will introduce a case of intentionally increasing data redundancy.
Ii. Design of commodity information table
Assume that you are an electronic brain developer of a department store. One day, the boss asks you to develop an online e-commerce platform for the company. The department store sells thousands of products, however, currently, we only plan to sell dozens of products online for convenient transportation. Of course, new products may be added to the e-commerce platform in the future. Now we have begun to design the product information table of the platform database. Each sold item has the same attributes, such as the product ID, product name, product category, related information, supplier, number of items included, inventory, purchase price, sales price, and discount price. You will soon design four tables: wares_type, wares_provider, and wares_info ):
Item type table (wares_type)
Description of name type constraints
Type_id int no duplicate category ID, primary key
Type_name char (50) cannot be empty or repeated
Type_father int cannot be null. If it is a top node, it is set to a unique value.
Type_layer char (6) is limited to three layers. The initial value is the first sequential traversal of the 000000 category, mainly to reduce the number of times the database is retrieved.
Supplier table (wares_provider)
Description of name type constraints
Provider_id int no duplicate supplier ID, primary key
Provider_name char (100) cannot be empty Supplier name
Item info table (wares_info)
Description of name type constraints
Wares_id int no duplicate item ID, primary key
Wares_name char (100) cannot be empty product names
Wares_type int cannot be a null product type identifier, which is associated with wares_type.type_id
Wares_info char (200) allows null Information
Provider int cannot be a blank supplier ID, which is associated with wares_provider.provider_id
The initial value of setnum int is 1. The default value is 1.
The initial value of stock int is 0. The default value is 0.
Buy_price money cannot be empty purchase price
Sell_price money cannot be empty sale price
Discount money cannot be empty discount price
You check the three tables for the boss. The boss hopes to add another item image field, but only some items have images. OK. You added a haspic bool field to the item info table (wares_info), and then created a new table -- Item image table (wares_pic ):
Product image table (wares_pic)
Description of name type constraints
Pic_id int no duplicate product image ID, primary key
Wares_id int cannot be empty product identifier, associated with wares_info.wares_id
Pic_address char (200) cannot be empty image storage path
After the program development is completed, it fully meets the current requirements of the boss and is officially enabled. After a while, the boss plans to launch new product sales on this platform. All of these products must be added with the "length" attribute. The first round of hard work ...... Of course, you added a bool field of haslength in the item info table (wares_info) according to the old method of adding the item image table, and created a new table-the item length table (wares_length):
Item length table (wares_length)
Description of name type constraints
Length_id int no duplicate product image ID, primary key
Wares_id int cannot be empty product identifier, associated with wares_info.wares_id
Length char (20) cannot be empty item length description
Not long after the change, the boss planned to add a new batch of products. This time, all the products of this type need to be added with the "width" attribute. You have bitten your teeth, caught your teeth, and added the commodity width table (wares_width ). After a while, the boss has some new products that need to be added with the "height" attribute. Do you feel that the database you designed is growing in this way, will it soon become a maze? So, is there any way to curb this unpredictability, but it is similar to repeated database expansion? I have read Agile Software Development: Principles, models, and practices and I have found a similar example: 7.3 "copy" program. Among them, I strongly agree with the idea of Agile Software Development: At first there was almost no pre-design, but once the demand changes, as a programmer pursuing excellence, the entire architecture design should be reviewed from the ground up to design a system architecture that can meet similar changes in the future. The following is the modification scheme I provided when I want to add the "length" attribute:
Remove the haspic field in the item info table (wares_info) and add two additional item attribute tables (wares_ex_property) and item extra information table (wares_ex_info) to add new attributes.
Additional item Attribute Table (wares_ex_property)
Description of name type constraints
Ex_pid int no duplicate item additional attribute identifier, primary key
P_name char (20) cannot be empty.
Additional item info table (wares_ex_info)
Description of name type constraints
Ex_iid int no duplicate item additional information identifier, primary key
Wares_id int cannot be empty product identifier, associated with wares_info.wares_id
Property_id int cannot be empty item additional property identifier, associated with wares_ex_property.ex_pid
Property_value char (200) does not allow null additional item attribute values
Add two records to the additional item Attribute Table (wares_ex_property:
Ex_pid p_name
1. product images
2 Commodity Length
Add an additional item property management feature to the background management function of the entire e-commerce platform. New attributes will appear when new items are added later, you only need to use this function to add a record to the additional product Attribute Table (wares_ex_property. Don't be afraid of changes. It's not a bad thing to be hit by the first bullet. The bad thing is to be hit by the second and third bullets flying in the same orbit. The sooner the first bullet arrives, the heavier the injury, and the stronger the resistance. 8) (to be continued)