Talking about database design techniques (I)

When it comes to databases, I think we have to talk about data structures first. In 1996, when I had just entered university to study computer programming, our teacher told us: computer program = data structure + algorithm. Although program development today is mainly object-oriented, I still strongly agree with the formula our teacher gave us 8 years ago: computer program = data structure + algorithm. The first step in object-oriented development is to analyze the data the whole program has to process and extract an abstract template from it. We then design a class from this template, add the functions (that is, the algorithms) that process its data, and finally assign access rights to the data members and functions of the class to achieve encapsulation.

The earliest prototype of the database is said to have been the account books of a dairy farm in the United States (on paper, which shows that a database is not necessarily data stored in a computer ^_^). The books recorded the farm's income and expenditure, and programmers got the inspiration while collating them and entering them into the computer. Once the data collected according to the predefined data structure grew large enough, programmers, for the sake of execution efficiency, separated functions such as retrieval, update, and maintenance into an independently callable module. This module gradually developed into the database management system (DBMS) we use today, an important branch of program development.

Now to the point. First, based on the programmers I have personally worked with, I classify database designers by their foundation as follows:
1. Programmers who have never studied data structures systematically. Their work is often just an improvised toy. They tend to design only a handful of tables, stuff all the data for a given kind of function into one table, and leave the tables almost unrelated to each other. A lot of free management software on the Internet is like this. While the program's functions are limited and the amount of data is small, it runs without problems, but using it to manage important data is very risky.
2. Programmers who have studied data structures systematically but have not yet developed management software with demanding efficiency requirements. Most of them have just graduated. When designing table structures they follow the textbook strictly, sticking rigidly to E-R diagrams and 3NF (don't lose heart, every database design master started from this step). Their work is sufficient for ordinary Access-style lightweight management software. But once the system needs new features, the original tables need almost a complete overhaul.
3. Programmers of the second category who, after several rounds of struggling with efficiency improvements and feature upgrades, have finally become database design veterans, and experts in the eyes of the first category. They are capable of developing business data management systems with more than 20 tables. They know when to keep a certain amount of redundant data to improve efficiency, and the databases they design are more extensible: when users need new features, the original tables need only minor changes.
4. Programmers of the third category who, after repeatedly designing some ten similar database management systems, have stuck with it rather than changing careers, and who hope to find "lazy" shortcuts in the work. The conscientious ones gradually have an awakening, completing the transformation from quantitative to qualitative change. The table structures they design show a certain foresight: they can predict the data that future feature upgrades will need and leave room for it in advance. Most of these programmers have now been promoted to senior software developers in data mining.
5. Programmers of the third or fourth category who have studied the principles and development of existing database management systems in some depth, and who either do secondary development on top of them or develop a general-purpose database management system of their own.

I am personally at the tail end of the third category, so the design techniques listed below are suited to database designers of the second and third categories. Also, because I have rarely met peers interested in digging deep into this area, the article will inevitably contain errors and omissions. Let me state this up front: corrections are welcome, so please don't hold anything back 8)

First, the data table for tree-type relationships
Many programmers have run into tree-type data relationships in database design, such as the common category table: a top-level category has several subcategories beneath it, and some subcategories have subcategories of their own. When the categories are not fixed, the user wants to be able to add a new subcategory under any category or delete a category together with all the subcategories under it, and the number of categories is expected to grow over time, we consider using a data table to hold this data. Following the textbook, a second-category programmer would probably design a table structure like this:

Category table 1 (type_table_1)
Name         Type      Constraint     Description
type_id      int       no duplicates  Category ID, primary key
type_name    char(50)  not null       Category name, duplicates not allowed
type_father  int       not null       ID of this category's parent; set to a fixed unique value for the top node

This design is short and concise, fully satisfies 3NF, and can meet all of the user's requirements. Is that all there is to it? The answer is no! Why?

Let's consider how the user would want the data in this table to be listed. The user, of course, expects all categories to be listed at once according to the hierarchy he has set up, for example:
General Category
  Category 1
    Category 1.1
      Category 1.1.1
    Category 1.2
  Category 2
    Category 2.1
  Category 3
    Category 3.1
    Category 3.2
......
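
To make the cost of such a listing concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the original author's code: the sample rows, the use of 0 as the parent of the top node, and the helper names build_type_table_1 and list_preorder are assumptions made for illustration. With only type_father to go on, the listing is built by asking for the children of each node in turn, so the number of queries grows with the number of categories.

```python
import sqlite3

def build_type_table_1():
    """Create an in-memory type_table_1 holding the sample categories."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE type_table_1 ("
        " type_id INTEGER PRIMARY KEY,"
        " type_name CHAR(50) NOT NULL UNIQUE,"
        " type_father INTEGER NOT NULL)"
    )
    rows = [
        (1, "General Category", 0),  # 0 marks the top node (assumed convention)
        (2, "Category 1", 1),
        (3, "Category 1.1", 2),
        (4, "Category 1.2", 2),
        (5, "Category 2", 1),
        (6, "Category 2.1", 5),
        (7, "Category 3", 1),
        (8, "Category 3.1", 7),
        (9, "Category 3.2", 7),
        (10, "Category 1.1.1", 3),  # added last, but belongs under Category 1.1
    ]
    conn.executemany("INSERT INTO type_table_1 VALUES (?, ?, ?)", rows)
    return conn

def list_preorder(conn, father=0, depth=0, out=None, counter=None):
    """Pre-order listing that issues one SELECT per visited node."""
    if out is None:
        out, counter = [], [0]
    counter[0] += 1  # count every query sent to the database
    children = conn.execute(
        "SELECT type_id, type_name FROM type_table_1"
        " WHERE type_father = ? ORDER BY type_name",  # sibling order by name is an assumption
        (father,),
    ).fetchall()
    for type_id, name in children:
        out.append("  " * depth + name)
        list_preorder(conn, type_id, depth + 1, out, counter)
    return out, counter[0]

conn = build_type_table_1()
lines, query_count = list_preorder(conn)
print("\n".join(lines))
print("queries issued:", query_count)
```

In this sketch every category costs one extra round trip to the database, which is harmless for ten rows and increasingly wasteful as the table grows.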

How many times must the table above be queried to produce such a list (a pre-order traversal of the tree)? Note that even though Category 1.1.1 may be a record added after Category 3.2, the answer is still N times. With a small amount of data this has no noticeable effect, but once the categories grow to dozens or even hundreds of records, listing them once means querying the table dozens of times, and the efficiency of the whole program will be nothing to boast about. A second-category programmer might say: then I'll build a temporary array or temporary table to store the pre-order traversal result of the category table, so that the dozens of queries are needed only on the first run, and every later listing reads directly from that temporary array or table. In fact, there is no need to allocate new memory for this data. It is enough to extend the data table somewhat and put a limit on the number of categories that can be added; the list above can then be produced with a single query. The expanded table structure is as follows:

Category table 2 (type_table_2)
Name         Type      Constraint                                 Description
type_id      int       no duplicates                              Category ID, primary key
type_name    char(50)  not null                                   Category name, duplicates not allowed
type_father  int       not null                                   ID of this category's parent; set to a fixed unique value for the top node
type_layer   char(6)   limited to 3 levels, initial value 000000  Pre-order traversal code of the category, mainly to reduce the number of database queries

Following this table structure, let's take a look at how the data in the table is recorded in the above example:

type_id  type_name         type_father  type_layer
1        General Category  0            000000
2        Category 1        1            010000
3        Category 1.1      2            010100
4        Category 1.2      2            010200
5        Category 2        1            020000
6        Category 2.1      5            020100
7        Category 3        1            030000
8        Category 3.1      7            030100
9        Category 3.2      7            030200
10       Category 1.1.1    3            010101
......

Now query the table ordered by type_layer: SELECT * FROM type_table_2 ORDER BY type_layer

The recordset is listed as follows:

type_id  type_name         type_father  type_layer
1        General Category  0            000000
2        Category 1        1            010000
3        Category 1.1      2            010100
10       Category 1.1.1    3            010101
4        Category 1.2      2            010200
5        Category 2        1            020000
6        Category 2.1      5            020100
7        Category 3        1            030000
8        Category 3.1      7            030100
9        Category 3.2      7            030200
......

The records are now listed in exactly pre-order traversal order. To control the indentation level when displaying the categories, just examine the value in the type_layer field in groups of 2 digits: for each group greater than 0, shift the name 2 spaces to the right. Of course, the limits I set in this example are at most 3 levels with at most 99 subcategories per level. By changing the length of type_layer and the number of digits per group to suit the user's needs, you can change the limits on levels and subcategories. In fact, this design is not limited to category tables: most forum programs on the Internet that offer a tree-type list view use a similar design.
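
The single-query listing and the indentation rule can be tried out with a short sketch in Python's built-in sqlite3 module. This is only an illustration under assumptions: the helper names depth_of and list_categories are invented here, and 2 spaces per level is taken from the rule described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE type_table_2 ("
    " type_id INTEGER PRIMARY KEY,"
    " type_name CHAR(50) NOT NULL UNIQUE,"
    " type_father INTEGER NOT NULL,"
    " type_layer CHAR(6) NOT NULL DEFAULT '000000')"
)
conn.executemany(
    "INSERT INTO type_table_2 VALUES (?, ?, ?, ?)",
    [
        (1, "General Category", 0, "000000"),
        (2, "Category 1", 1, "010000"),
        (3, "Category 1.1", 2, "010100"),
        (4, "Category 1.2", 2, "010200"),
        (5, "Category 2", 1, "020000"),
        (6, "Category 2.1", 5, "020100"),
        (7, "Category 3", 1, "030000"),
        (8, "Category 3.1", 7, "030100"),
        (9, "Category 3.2", 7, "030200"),
        (10, "Category 1.1.1", 3, "010101"),  # inserted last, sorts under 1.1
    ],
)

def depth_of(type_layer):
    """Count the 2-digit groups of type_layer that are greater than 0."""
    groups = [type_layer[i:i + 2] for i in range(0, len(type_layer), 2)]
    return sum(1 for group in groups if int(group) > 0)

def list_categories(conn):
    """Return the indented category listing using a single query."""
    rows = conn.execute(
        "SELECT type_name, type_layer FROM type_table_2 ORDER BY type_layer"
    ).fetchall()
    # Indent 2 spaces for every non-zero group, as described in the text.
    return ["  " * depth_of(layer) + name for name, layer in rows]

listing = list_categories(conn)
print("\n".join(listing))
```

Because type_layer is a fixed-width string of digits, plain string ordering is enough to reproduce the pre-order sequence, regardless of the order in which rows were inserted.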

It may be argued that the type_father field in type_table_2 is redundant data that could be removed. If it were removed, inserting or deleting a category would require rather cumbersome checks on the contents of type_layer, so I did not eliminate the type_father field. This is also consistent with the database design principle of keeping an appropriate amount of redundant data to reduce program complexity. Later I'll give a case of intentionally increasing data redundancy.
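
As a rough illustration of why keeping type_father is convenient, here is a sketch of inserting and deleting categories with Python's built-in sqlite3 module. None of this comes from the original article: the helpers make_table, layer_prefix, add_category and delete_category, and the rule "next code = largest sibling code + 1", are assumptions chosen for the example.

```python
import sqlite3

GROUP = 2   # digits per level
LEVELS = 3  # maximum depth, so type_layer has GROUP * LEVELS digits

def make_table():
    """Create type_table_2 with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE type_table_2 ("
        " type_id INTEGER PRIMARY KEY,"
        " type_name CHAR(50) NOT NULL UNIQUE,"
        " type_father INTEGER NOT NULL,"
        " type_layer CHAR(6) NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO type_table_2 VALUES (?, ?, ?, ?)",
        [
            (1, "General Category", 0, "000000"),
            (2, "Category 1", 1, "010000"),
            (3, "Category 1.1", 2, "010100"),
            (4, "Category 1.2", 2, "010200"),
            (5, "Category 2", 1, "020000"),
        ],
    )
    return conn

def layer_prefix(conn, type_id):
    """Return the non-zero leading groups of a category's type_layer."""
    (layer,) = conn.execute(
        "SELECT type_layer FROM type_table_2 WHERE type_id = ?", (type_id,)
    ).fetchone()
    depth = sum(
        1 for i in range(0, GROUP * LEVELS, GROUP) if int(layer[i:i + GROUP]) > 0
    )
    return layer[:depth * GROUP]

def add_category(conn, name, father_id):
    """Insert a child of father_id and return its new type_layer."""
    prefix = layer_prefix(conn, father_id)
    if len(prefix) >= GROUP * LEVELS:
        raise ValueError("maximum depth reached")
    # type_father turns "find the siblings" into a simple equality test.
    (largest,) = conn.execute(
        "SELECT MAX(type_layer) FROM type_table_2 WHERE type_father = ?",
        (father_id,),
    ).fetchone()
    last = int(largest[len(prefix):len(prefix) + GROUP]) if largest else 0
    if last >= 10 ** GROUP - 1:
        raise ValueError("no free subcategory code at this level")
    code = prefix + str(last + 1).zfill(GROUP)
    new_layer = code.ljust(GROUP * LEVELS, "0")
    conn.execute(
        "INSERT INTO type_table_2 (type_name, type_father, type_layer)"
        " VALUES (?, ?, ?)",
        (name, father_id, new_layer),
    )
    return new_layer

def delete_category(conn, type_id):
    """Delete a category together with every category below it."""
    prefix = layer_prefix(conn, type_id)
    conn.execute(
        "DELETE FROM type_table_2 WHERE type_layer LIKE ?", (prefix + "%",)
    )

conn = make_table()
print(add_category(conn, "Category 3", 1))      # prints 030000
print(add_category(conn, "Category 1.1.1", 3))  # prints 010101
```

Without type_father, the sibling lookup in add_category would have to be expressed as pattern matching on type_layer, which is the extra checking the paragraph above refers to.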

Second, the design of the commodity information table
Suppose you are a developer in the computer department of a department store. One day the boss asks you to develop an online e-commerce platform for the company. The store has thousands of items for sale, but for now only a few dozen kinds that are easy to ship will be sold online; of course, more items may be added to the platform in the future. You now start designing the commodity information tables of the platform database. Every item for sale has the same basic attributes: commodity number, commodity name, commodity category, related information, supplier, number of items included, inventory, purchase price, sales price, and preferential price. You quickly design 3 tables: the commodity type table (wares_type), the supplier table (wares_provider), and the commodity information table (wares_info):

Commodity type table (wares_type)
Name         Type      Constraint                                 Description
type_id      int       no duplicates                              Category ID, primary key
type_name    char(50)  not null                                   Category name, duplicates not allowed
type_father  int       not null                                   ID of this category's parent; set to a fixed unique value for the top node
type_layer   char(6)   limited to 3 levels, initial value 000000  Pre-order traversal code of the category, mainly to reduce the number of database queries

Supplier table (wares_provider)
Name           Type       Constraint     Description
provider_id    int        no duplicates  Supplier ID, primary key
provider_name  char(100)  not null       Supplier name

Commodity information table (wares_info)
Name        Type       Constraint       Description
wares_id    int        no duplicates    Commodity ID, primary key
wares_name  char(100)  not null         Commodity name
wares_type  int        not null         Commodity type ID, associated with wares_type.type_id
wares_info  char(200)  null allowed     Related information
provider    int        not null         Supplier ID, associated with wares_provider.provider_id
setnum      int        initial value 1  Number of items included, default 1
stock       int        initial value 0  Inventory, default 0
buy_price   money      not null         Purchase price
sell_price  money      not null         Sales price
discount    money      not null         Preferential price
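
For readers who want to experiment, the three tables can be declared roughly as follows using Python's built-in sqlite3 module. This is a sketch under assumptions rather than the schema of any real system: SQLite has no money type, so NUMERIC stands in for it, the "associated with" notes are expressed as foreign keys, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE wares_type (
    type_id     INTEGER PRIMARY KEY,
    type_name   CHAR(50) NOT NULL UNIQUE,
    type_father INTEGER NOT NULL,
    type_layer  CHAR(6) NOT NULL DEFAULT '000000'
);
CREATE TABLE wares_provider (
    provider_id   INTEGER PRIMARY KEY,
    provider_name CHAR(100) NOT NULL
);
CREATE TABLE wares_info (
    wares_id   INTEGER PRIMARY KEY,
    wares_name CHAR(100) NOT NULL,
    wares_type INTEGER NOT NULL REFERENCES wares_type(type_id),
    wares_info CHAR(200),
    provider   INTEGER NOT NULL REFERENCES wares_provider(provider_id),
    setnum     INTEGER NOT NULL DEFAULT 1,
    stock      INTEGER NOT NULL DEFAULT 0,
    buy_price  NUMERIC NOT NULL,  -- NUMERIC stands in for the money type
    sell_price NUMERIC NOT NULL,
    discount   NUMERIC NOT NULL
);
""")

# Hypothetical sample rows, only to show the associations at work.
conn.execute("INSERT INTO wares_type VALUES (1, 'General Category', 0, '000000')")
conn.execute("INSERT INTO wares_provider VALUES (1, 'Example Supplier')")
conn.execute(
    "INSERT INTO wares_info (wares_id, wares_name, wares_type, provider,"
    " buy_price, sell_price, discount)"
    " VALUES (1, 'Example Item', 1, 1, 8, 10, 9)"
)

# Join the commodity with its type and supplier; setnum and stock use defaults.
row = conn.execute(
    "SELECT w.wares_name, t.type_name, p.provider_name, w.setnum, w.stock"
    " FROM wares_info AS w"
    " JOIN wares_type AS t ON t.type_id = w.wares_type"
    " JOIN wares_provider AS p ON p.provider_id = w.provider"
).fetchone()
print(row)
```

With foreign keys enabled, inserting a commodity that points at a supplier or type that does not exist is rejected by the database instead of silently creating an orphan row.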

You take these 3 tables to the boss for review. The boss wants a field for product pictures, but only some of the products have pictures. OK, you add a haspic field of type bool to the commodity information table (wares_info) and create a new table, the product picture table (wares_pic):

Product picture table (wares_pic)
Name         Type       Constraint     Description
pic_id       int        no duplicates  Product picture ID, primary key
wares_id     int        not null       ID of the owning commodity, associated with wares_info.wares_id
pic_address  char(200)  not null       Path where the picture is stored

Once development is complete, the program fully meets the boss's current requirements and goes live. After a while, the boss plans to sell a new batch of products on the platform, and one category among them needs a "length" attribute. The first round of upheaval begins... Of course, following the old approach used for the product picture table, you add a haslength field of type bool to the commodity information table (wares_info) and create another new table, the commodity length table (wares_length):

Commodity length table (wares_length)
Name       Type     Constraint     Description
length_id  int      no duplicates  Commodity length ID, primary key
wares_id   int      not null       ID of the owning commodity, associated with wares_info.wares_id
length     char(…)  not null       Description of the commodity length

Not long after that change, the boss brings in another batch of products, and this time one category needs a "width" attribute. You grit your teeth, follow the same recipe, and add a commodity width table (wares_width). A while later, some of the boss's new products need a "height" attribute. Are you starting to feel that a database that keeps growing this way will soon turn into a maze? Is there a way to curb this unpredictable yet repetitive kind of database bloat? While reading Agile Software Development: Principles, Patterns, and Practices, I found that the author cites a similar example in section 7.3, the "Copy" program. I very much agree with the agile view expressed there: at the beginning do almost no upfront design, but once the requirements change, a programmer who pursues excellence should review the entire architecture and, in this revision, arrive at a design that can accommodate similar changes in the future. Here is the modification I would make when the "length" attribute first needs to be added:

Remove the haspic field from the commodity information table (wares_info), and add 2 tables, the commodity extra property table (wares_ex_property) and the commodity extra information table (wares_ex_info), to support adding new properties.

Commodity extra property table (wares_ex_property)
Name    Type      Constraint     Description
ex_pid  int       no duplicates  Commodity extra property ID, primary key
p_name  char(20)  not null       Extra property name

Commodity extra information table (wares_ex_info)
Name            Type       Constraint     Description
ex_iid          int        no duplicates  Commodity extra information ID, primary key
wares_id        int        not null       ID of the owning commodity, associated with wares_info.wares_id
property_id     int        not null       Commodity extra property ID, associated with wares_ex_property.ex_pid
property_value  char(200)  not null       Value of the commodity extra property

Add 2 records to the commodity extra property table (wares_ex_property):
ex_pid  p_name
1       Product pictures
2       Product length
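
To see how these two tables absorb a new attribute without any change to the table structure, here is a small sketch with Python's built-in sqlite3 module. It rests on assumptions: the helper names add_property, set_value and extra_info are invented, the sample values are hypothetical, and the link from wares_id to wares_info is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wares_ex_property (
    ex_pid INTEGER PRIMARY KEY,
    p_name CHAR(20) NOT NULL
);
CREATE TABLE wares_ex_info (
    ex_iid         INTEGER PRIMARY KEY,
    wares_id       INTEGER NOT NULL,  -- would reference wares_info.wares_id
    property_id    INTEGER NOT NULL REFERENCES wares_ex_property(ex_pid),
    property_value CHAR(200) NOT NULL
);
INSERT INTO wares_ex_property VALUES (1, 'Product pictures'), (2, 'Product length');
""")

def add_property(conn, name):
    """Register a new kind of extra property and return its ex_pid."""
    cur = conn.execute(
        "INSERT INTO wares_ex_property (p_name) VALUES (?)", (name,)
    )
    return cur.lastrowid

def set_value(conn, wares_id, property_id, value):
    """Attach one extra property value to a commodity."""
    conn.execute(
        "INSERT INTO wares_ex_info (wares_id, property_id, property_value)"
        " VALUES (?, ?, ?)",
        (wares_id, property_id, value),
    )

def extra_info(conn, wares_id):
    """Return {property name: value} for one commodity."""
    rows = conn.execute(
        "SELECT p.p_name, i.property_value"
        " FROM wares_ex_info AS i"
        " JOIN wares_ex_property AS p ON p.ex_pid = i.property_id"
        " WHERE i.wares_id = ? ORDER BY p.ex_pid",
        (wares_id,),
    ).fetchall()
    return dict(rows)

# A "width" attribute arrives later: one new row, no new table.
width_id = add_property(conn, "Product width")
set_value(conn, 1, 1, "/pics/1.jpg")  # hypothetical picture path
set_value(conn, 1, 2, "120 cm")       # hypothetical length
set_value(conn, 1, width_id, "60 cm")
print(extra_info(conn, 1))
```

The trade-off in this sketch is that every value is stored as text in property_value, so type checks and units have to be handled by the application rather than by the column definition.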

In addition, add an extra-property management function to the back-end administration of the e-commerce platform. When new products with new properties appear in the future, just use this function to add a record to the commodity extra property table (wares_ex_property). Don't be afraid of change. Being hit by the first bullet is not a bad thing; the bad thing is being hit by a second and a third bullet flying along the same trajectory. The earlier the first bullet comes, the heavier the injury, and the stronger the resistance becomes 8) (to be continued)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
