Reposted: an article on database design techniques written by someone else. It is not necessarily correct in every detail, but it may open up your thinking.

When it comes to databases, I think we cannot avoid talking about data structures first. In 1996, when I had just entered college to study computer programming, the teacher told us: computer program = data structure + algorithm. Although program development has since moved gradually from process-oriented to object-oriented, I still strongly agree with the formula the teacher gave us eight years ago. The first thing to do in object-oriented development is to analyze the data the whole program has to process, extract an abstract template from it, and design a class from that template. Then you add functions (algorithms) that process the data, and finally you assign access permissions to the class's data members and functions to encapsulate the data.

The earliest prototype of the database is said to have been the account books of a dairy farm in the United States (on paper, which shows that a database is not necessarily data stored in a computer), recording the farm's income and expenditure. Programmers were inspired by such books when they organized the records and entered them into a computer. Once the data collected under a fixed data structure had grown to a certain volume, programmers, for the sake of execution efficiency, separated the retrieval, update, maintenance, and similar functions into an independently callable module. That module gradually evolved into the database management system (DBMS) we use today, an important branch of program development.

Now let's get to the point. First, based on the programs I have personally come across, I sort database designers into the following categories by skill:

1. Programmers who have never studied data structures systematically. Their work is often just an improvised toy.
They tend to design only a handful of tables and cram all the data for a given feature into a single table, with almost no relationships between tables. Much of the free management software on the internet is like this. While the program's features are limited and the data volume is small, it runs without problems, but using it to manage important data is highly risky.

2. Programmers who have studied data structures systematically but have never developed management software with demanding efficiency requirements. Most of them have just left school. When designing table structures they follow the textbook rules strictly and stick rigidly to E-R diagrams and 3NF (don't be discouraged: every database design expert starts from this step). Their work is good enough for ordinary lightweight Access-type management software. However, once the system needs new features, the original tables almost have to be torn down and rebuilt.

3. Second-category programmers who, after struggling through several rounds of efficiency improvements and feature upgrades, have finally become database design veterans: the masters in the eyes of first-category programmers. They are up to developing medium-sized commercial data management systems with more than 20 tables. They know when a certain amount of redundant data should be kept to improve efficiency, and their designs are reasonably extensible: when users need a new feature, only small changes to the existing tables are required.

4. Third-category programmers who, after designing some ten similar database management programs over and over, stayed in the field instead of changing careers and hoped to distill from the repetition some tricks for being "lazy". The attentive ones gradually reach enlightenment and complete the transformation from quantitative to qualitative change.
The table structures they design show foresight: they anticipate the data that future feature upgrades will need and leave room for it in advance. Most of these programmers have since moved up to become senior developers in areas such as data mining.

5. Third- or fourth-category programmers who, after studying the principles and development of existing database management systems, either do secondary development on top of them or develop a general-purpose database management system under their own copyright.

I am personally at the tail end of the third category, so the design techniques below only suit second-category and some third-category database designers. Also, because I rarely meet peers interested in digging into this area, errors and omissions in this article are inevitable. Let me state that up front: corrections are welcome, so please don't keep them to yourself.

I. Tree-structured data tables

Many programmers have run into tree-structured data during database design. A common example is a category table: one top-level category with several subcategories beneath it, some of which have subcategories of their own. When the categories are not fixed in advance, when users want to add a subcategory under any category or delete a category together with all its subcategories, and when the number of categories is expected to grow over time, we consider storing them in a data table. Following the textbook, a second-category programmer will probably design a table like this:

Category Table 1 (type_table_1)

Name         Type      Constraint     Description
type_id      Int       No duplicates  Category ID, primary key
type_name    Char(50)  Not null       Category name; duplicates not allowed
type_father  Int       Not null       Parent category ID; for the top node, set to a fixed unique value
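To make the structure concrete, here is a minimal sketch of type_table_1 using Python's standard sqlite3 module. The choice of SQLite, the helper names build_type_table_1 and children, the sample rows, and the use of 0 as the top node's parent value are all assumptions made for illustration, not part of the original design. The helper shows the basic operation this layout supports directly: fetching the direct subcategories of one category.

```python
import sqlite3


def build_type_table_1(conn):
    """Create type_table_1 and load a few illustrative categories."""
    conn.execute(
        """
        CREATE TABLE type_table_1 (
            type_id     INTEGER PRIMARY KEY,       -- category ID, no duplicates
            type_name   CHAR(50) NOT NULL UNIQUE,  -- category name, no duplicates
            type_father INTEGER NOT NULL           -- parent ID; 0 marks the top node (assumed)
        )
        """
    )
    rows = [
        (1, "Total category", 0),
        (2, "Category 1", 1),
        (3, "Category 1.1", 2),
        (4, "Category 1.2", 2),
        (5, "Category 2", 1),
        (6, "Category 2.1", 5),
        (7, "Category 3", 1),
        (8, "Category 3.1", 7),
        (9, "Category 3.2", 7),
        (10, "Category 1.1.1", 3),  # added last, but belongs deep under Category 1
    ]
    conn.executemany("INSERT INTO type_table_1 VALUES (?, ?, ?)", rows)


def children(conn, parent_id):
    """Return (type_id, type_name) for the direct subcategories of parent_id."""
    cur = conn.execute(
        "SELECT type_id, type_name FROM type_table_1 "
        "WHERE type_father = ? ORDER BY type_id",
        (parent_id,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_type_table_1(conn)
    for type_id, type_name in children(conn, 1):
        print(type_id, type_name)
```

Each call to children costs one query, which is worth keeping in mind when a whole tree has to be listed.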
This design is short and concise, fully satisfies 3NF, and can meet every user requirement. So is that enough? The answer is no! Why?

Consider how the user will want this table's data listed. The user will certainly expect to see all categories at once, following the hierarchy they set up, for example:

Total category
  Category 1
    Category 1.1
      Category 1.1.1
    Category 1.2
  Category 2
    Category 2.1
  Category 3
    Category 3.1
    Category 3.2
  ......

How many times must the table above be queried to display such a list (a pre-order traversal of the tree)? Note that Category 1.1.1 may well have been added after Category 3.2. The answer is N times, roughly one lookup per category. That inefficiency does no harm with a small amount of data, but once the categories grow to dozens or even hundreds of records, a single listing has to query the table dozens of times, and the efficiency of the whole program is nothing to be proud of.

A second-category programmer might say: then I will build a temporary array or temporary table holding the pre-order traversal of the category table, so the dozens of lookups happen only on the first run, and every later listing reads straight from the temporary array or table. In fact, there is no need to allocate a new block of memory for this. Extend the table a little, put a limit on how many categories can be added, and the whole list can be produced with a single query. Here is the extended table structure:

Category Table 2 (type_table_2)

Name         Type      Constraint                                      Description
type_id      Int       No duplicates                                   Category ID, primary key
type_name    Char(50)  Not null                                        Category name; duplicates not allowed
type_father  Int       Not null                                        Parent category ID; for the top node, set to a fixed unique value
type_layer   Char(6)   Limited to three levels, initial value 000000   Pre-order traversal key of the category, used mainly to reduce the number of database queries

With this table structure, the example data above is recorded as follows:

type_id  type_name       type_father  type_layer
1        Total category  0            000000
2        Category 1      1            010000
3        Category 1.1    2            010100
4        Category 1.2    2            010200
5        Category 2      1            020000
6        Category 2.1    5            020100
7        Category 3      1            030000
8        Category 3.1    7            030100
9        Category 3.2    7            030200
10       Category 1.1.1  3            010101
......

Now query the table ordered by type_layer:

    select * from type_table_2 order by type_layer

The record set comes back in this order:

type_id  type_name       type_father  type_layer
1        Total category  0            000000
2        Category 1      1            010000
3        Category 1.1    2            010100
10       Category 1.1.1  3            010101
4        Category 1.2    2            010200
5        Category 2      1            020000
6        Category 2.1    5            020100
7        Category 3      1            030000
8        Category 3.1    7            030100
9        Category 3.2    7            030200
......

The record order is exactly the result of a pre-order traversal. To control the indentation when displaying the categories, you only need to examine the type_layer field two digits at a time: for each pair whose value is greater than 0, shift the name two spaces to the right. In this example the limit is three levels with at most 99 subcategories per level. By changing the length of type_layer and the number of digits per level to suit the user's requirements, you can change both the maximum depth and the maximum number of subcategories. This design is not limited to category tables: online forum programs that display posts as a tree mostly use a similar design.

Some people may think that the type_father field in type_table_2 is redundant data and can be removed.
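The type_layer ordering and the two-digit indentation rule described above can be sketched as follows, again with Python's sqlite3 module. The SQLite backend and the helper names build_type_table_2, depth, and list_tree are assumptions for illustration; the rows are the sample data from the table above. Because type_layer is a fixed-width string of digits, a plain text sort gives the pre-order sequence with a single query.

```python
import sqlite3

# Sample rows from the article: (type_id, type_name, type_father, type_layer)
SAMPLE_ROWS = [
    (1, "Total category", 0, "000000"),
    (2, "Category 1", 1, "010000"),
    (3, "Category 1.1", 2, "010100"),
    (4, "Category 1.2", 2, "010200"),
    (5, "Category 2", 1, "020000"),
    (6, "Category 2.1", 5, "020100"),
    (7, "Category 3", 1, "030000"),
    (8, "Category 3.1", 7, "030100"),
    (9, "Category 3.2", 7, "030200"),
    (10, "Category 1.1.1", 3, "010101"),
]


def build_type_table_2(conn):
    """Create type_table_2 and load the sample rows."""
    conn.execute(
        """
        CREATE TABLE type_table_2 (
            type_id     INTEGER PRIMARY KEY,
            type_name   CHAR(50) NOT NULL UNIQUE,
            type_father INTEGER NOT NULL,
            type_layer  CHAR(6) NOT NULL DEFAULT '000000'
        )
        """
    )
    conn.executemany("INSERT INTO type_table_2 VALUES (?, ?, ?, ?)", SAMPLE_ROWS)


def depth(type_layer):
    """Count the two-digit groups of type_layer that are greater than 0."""
    pairs = [type_layer[i:i + 2] for i in range(0, len(type_layer), 2)]
    return sum(1 for pair in pairs if int(pair) > 0)


def list_tree(conn):
    """Return the indented category list using one query ordered by type_layer."""
    cur = conn.execute(
        "SELECT type_name, type_layer FROM type_table_2 ORDER BY type_layer"
    )
    return ["  " * depth(layer) + name for name, layer in cur]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_type_table_2(conn)
    print("\n".join(list_tree(conn)))
```

The sketch relies on every type_layer value having the same width; values of different lengths would no longer sort in traversal order.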
If type_father were removed, however, inserting or deleting a category would require tedious checks on the contents of type_layer, so I have kept the field. This also follows the database design principle of retaining an appropriate amount of redundant data to reduce program complexity. Later I will present a case where data redundancy is increased deliberately.

II. Design of the product information table

Suppose you are a software developer at a department store company. One day the boss asks you to build an online e-commerce platform for the company. The store sells thousands of products, but for now it plans to sell online only a few dozen that are easy to ship. Of course, more products may be added to the platform later. So you start designing the product information tables of the platform's database. Every product on sale has the same attributes: product ID, product name, product category, related information, supplier, number of items included, stock, purchase price, selling price, and discount price. You quickly design three tables: the product category table (wares_type), the supplier table (wares_provider), and the product information table (wares_info).

Product category table (wares_type)

Name         Type      Constraint                                      Description
type_id      Int       No duplicates                                   Category ID, primary key
type_name    Char(50)  Not null                                        Category name; duplicates not allowed
type_father  Int       Not null                                        Parent category ID; for the top node, set to a fixed unique value
type_layer   Char(6)   Limited to three levels, initial value 000000   Pre-order traversal key of the category, used mainly to reduce the number of database queries

Supplier table (wares_provider)

Name           Type       Constraint     Description
provider_id    Int        No duplicates  Supplier ID, primary key
provider_name  Char(100)  Not null       Supplier name

Product information table (wares_info)

Name        Type       Constraint       Description
wares_id    Int        No duplicates    Product ID, primary key
wares_name  Char(100)  Not null         Product name
wares_type  Int        Not null         Product category ID, references wares_type.type_id
wares_info  Char(200)  Nullable         Related information
provider    Int        Not null         Supplier ID, references wares_provider.provider_id
setnum      Int        Initial value 1  Number of items included, default 1
stock       Int        Initial value 0  Stock, default 0
buy_price   Money      Not null         Purchase price
sell_price  Money      Not null         Selling price
discount    Money      Not null         Discount price

You take the three tables to the boss for review. The boss would like a product image field added, but only some products have images. Fine: you add a bool field haspic to the product information table (wares_info) and create a new table, the product image table (wares_pic):

Product image table (wares_pic)

Name         Type       Constraint     Description
pic_id       Int        No duplicates  Product image ID, primary key
wares_id     Int        Not null       Product ID, references wares_info.wares_id
pic_address  Char(200)  Not null       Image storage path

Development is completed, the program fully meets the boss's current requirements, and the platform officially goes live. After a while, the boss plans to put a new batch of products on sale on the platform, and all of them need an additional "length" attribute. The first round of upheaval begins ...... Naturally, following the approach you used for the product image table, you add a bool field haslength to the product information table (wares_info) and create a new table, the product length table (wares_length):

Product length table (wares_length)

Name       Type      Constraint     Description
length_id  Int       No duplicates  Product length ID, primary key
wares_id   Int       Not null       Product ID, references wares_info.wares_id
length     Char(20)  Not null       Description of the product length

Not long after that change, the boss plans to launch yet another batch of new products.
This time, all of the products need an additional "width" attribute. You grit your teeth, follow the same recipe, and add a product width table (wares_width). Some time later, the boss has more new products, and these need a "height" attribute. Do you start to feel that a database growing this way will soon turn into a maze? So, is there a way to contain this kind of change, which is unpredictable yet repeats in a similar form each time the database is extended? While reading "Agile Software Development: Principles, Patterns, and Practices" I found a similar example by the author: 7.3 The "Copy" program. I strongly agree with the agile view expressed there: do almost no up-front design at first, but once the requirements change, a programmer who pursues excellence should review the whole architecture from the ground up and design one that can accommodate similar changes in the future. Here is the modification I would propose at the point where the "length" attribute has to be added:

Remove the haspic field from the product information table (wares_info), and add two tables, the additional product attribute table (wares_ex_property) and the additional product information table (wares_ex_info), to handle new attributes.

Additional product attribute table (wares_ex_property)

Name    Type      Constraint     Description
ex_pid  Int       No duplicates  Additional attribute ID, primary key
p_name  Char(20)  Not null       Additional attribute name

Additional product information table (wares_ex_info)

Name            Type       Constraint     Description
ex_iid          Int        No duplicates  Additional information ID, primary key
wares_id        Int        Not null       Product ID, references wares_info.wares_id
property_id     Int        Not null       Additional attribute ID, references wares_ex_property.ex_pid
property_value  Char(200)  Not null       Additional attribute value

Add two records to the additional product attribute table (wares_ex_property):

ex_pid  p_name
1       Product image
2       Product length

Then add a feature to the platform's back-end administration for managing additional product attributes. When later batches of products bring new attributes, you only need to use this feature to add one record to the additional product attribute table (wares_ex_property).

Don't be afraid of change. Being hit by the first bullet is not a bad thing. The bad thing is being hit by a second and a third bullet flying along the same trajectory. The earlier the first bullet arrives and the harder it hits, the stronger your resistance afterwards.

(To be continued)
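The additional-attribute design above can be sketched as follows with Python's sqlite3 module. This is only an illustration under stated assumptions: SQLite as the backend, a wares_info table cut down to two columns, and the hypothetical helpers build_schema, add_property, set_value, and extra_properties. It shows that a new attribute such as "width" becomes one new row in wares_ex_property instead of a new table.

```python
import sqlite3


def build_schema(conn):
    """Create simplified versions of the three tables and the two initial attributes."""
    conn.executescript(
        """
        CREATE TABLE wares_info (               -- reduced to two columns for brevity
            wares_id   INTEGER PRIMARY KEY,
            wares_name CHAR(100) NOT NULL
        );
        CREATE TABLE wares_ex_property (
            ex_pid INTEGER PRIMARY KEY,
            p_name CHAR(20) NOT NULL
        );
        CREATE TABLE wares_ex_info (
            ex_iid         INTEGER PRIMARY KEY,
            wares_id       INTEGER NOT NULL REFERENCES wares_info(wares_id),
            property_id    INTEGER NOT NULL REFERENCES wares_ex_property(ex_pid),
            property_value CHAR(200) NOT NULL
        );
        INSERT INTO wares_ex_property (ex_pid, p_name) VALUES (1, 'Product image');
        INSERT INTO wares_ex_property (ex_pid, p_name) VALUES (2, 'Product length');
        """
    )


def add_property(conn, name):
    """Register a new additional attribute and return its ex_pid."""
    cur = conn.execute("INSERT INTO wares_ex_property (p_name) VALUES (?)", (name,))
    return cur.lastrowid


def set_value(conn, wares_id, property_id, value):
    """Attach one additional attribute value to a product."""
    conn.execute(
        "INSERT INTO wares_ex_info (wares_id, property_id, property_value) "
        "VALUES (?, ?, ?)",
        (wares_id, property_id, value),
    )


def extra_properties(conn, wares_id):
    """Return {attribute name: value} for one product."""
    cur = conn.execute(
        "SELECT p.p_name, i.property_value "
        "FROM wares_ex_info AS i "
        "JOIN wares_ex_property AS p ON p.ex_pid = i.property_id "
        "WHERE i.wares_id = ? ORDER BY p.ex_pid",
        (wares_id,),
    )
    return dict(cur.fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    conn.execute("INSERT INTO wares_info VALUES (1, 'Ruler')")  # hypothetical product
    width_id = add_property(conn, "Product width")  # new attribute, no new table
    set_value(conn, 1, 2, "30 cm")
    set_value(conn, 1, width_id, "3 cm")
    print(extra_properties(conn, 1))
```

The trade-off is that every additional value is stored as text in property_value, so type checks and numeric comparisons move into the program.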