MySQL series: Database Design
Preface
Here we will re-pick up the database design knowledge that we were not very familiar with before and make a simple sorting of knowledge. Previously, I thought that database design is nothing more than creating a database, creating a table, adding fields, and determining the field type (this is random), and so on. When the system re-learns the database knowledge, it is found that the database design also has a set of specifications similar to the software development process, and each step has a different focus.
Concept of Database Design
Simply put, database design is the process of analyzing requirements, logical design, physical design, maintenance, and optimization. We can see that the database design is not only reflected in the software development process, but also in the later maintenance of the software. (Time period)
The software requirement analysis here is not the same as the requirement analysis in the software development process. The requirement analysis in the database design focuses more on the data source (what data), data attributes, and the characteristics of data and attributes.
A series of database design processes must be combined with our existing DBMS to design tables and relationships between tables, so as to store and efficiently access data.
Database Design Steps
Before that, I had another question: Why should I design a database? In my summary, there are the following benefits:
A good database design helps reduce data redundancy and operation exceptions, as well as limited data storage and efficient access. The experiment management system that was designed and implemented by myself after graduation is because the database is not well designed, which makes it difficult to search for data in the future. The fact is, writing a lot of SQL code does not necessarily get the expected results, so this is the consequence of poor database design. You should take it as an example.
Database Design Steps
OK. Let's get down to the truth. The database design involves four steps:
Requirement Analysis Logic Design physical design maintenance and Optimization
As a non-professional DBA, in line with the principle of practicality as the king of the road, I think that the knowledge of the surrounding area can be understood without looking into it. Therefore, we will not detail the final maintenance and optimization. If you are interested in this part, you can refer to the database design tutorial.
Requirement Analysis
Requirement analysis needs to solve three problems:What data does the software need?,Attributes of dataAndFeatures of data attributes. First, the data required by the software is determined by the software business. This can be seen in the previous requirement documents. The attribute of the data is the fields in each table of the database, A data attribute is an indispensable element of data. A row of data in a database becomes a basic unit of data, also known as a tuples. A data attribute is characterized by analyzing whether the data needs to be permanently saved, if yes, the data will always exist in the Database. If no, the data cannot always exist in the Database (this type of data is time-sensitive and involves frequent read/write operations ).
Logic Design
Logic Design to undertake the demand analysis, to solve the core problem of one: Drawing E-R diagram. E-R diagram is the process of converting the result of Requirement Analysis into logic model. A E-R diagram consists of three elements:Entity set,Attribute SetAndContact set. The object set has the same attributes, the property set is the object, and the contact set is composed of links between entities (the links here include multiple links, this will be detailed in subsequent articles ). The so-called "logical design" is irrelevant to the specific DBMS. To draw a E-R diagram, you need to understand the following concepts:
Link: A link corresponds to a table entity in the database: database management object tuples with the same attributes: a row of data attributes of the Table: Each attribute corresponds to a column of primary keywords in the database table: one or more attribute candidate keywords that uniquely identify an object: this is a situation where multiple attributes are required to identify an object.
The basic concepts are explained in the following example. The following describes the projects I am working on:
The whole system is a student information management system with many modules. I am responsible for accommodation information management. After requirement analysis, I finally confirmed that this module has three functions: student Item Repair report, water and electricity information query by students, accommodation registration information of boarding management, repair Report of property handling items, and accommodation information allocation by logistics.
Student :{Student ID, Name, gender, contact information, Dormitory number}
Cebu :{Cebu ID, Name, gender, contact info}
Property :{Property Management Personnel ID, Name, gender, contact info}
Logistics :{Logistics Management Personnel ID, Name, gender, contact info}
Accommodation info table :{Id, Name, student ID, gender, Dormitory number, Major, class number, contact information}
Dormitory :{Dormitory ID, Building no., dormitory No}
Item Repair information table :{Id, Item name, damage, repair reporter, contact information, Dormitory number, repair time, emergency degree, whether to handle}
Water Information :{Id, Dormitory number, water usage, remaining water volume this month, remaining amount, overdue payment status}
Electricity usage information :{Id, Dormitory number, electricity usage, remaining power for this month, remaining amount, in arrears}
Based on the above data, you can draw the following E-R diagram:
Because the source image is too large, only the part is displayed. The following briefly describes the content. Each rectangle represents an object, and each object has an attribute set, This field cannot be blank, Indicates that this field is the primary key of the object,This field is a candidate keyword of the object. There are various links between objects, and the lines in the figure indicate the specific links between objects. The following is a brief description of what the contact is:
There are four basic typesRelationship): One-to-many, one-to-one, and multiple-to-one. For example, a teacher can take multiple students to attend classes, and the students also have multiple teachers. Therefore, the teachers and students are many-to-many contacts. One student can only be in one class, A class can have multiple students, so there is a many-to-one relationship between the students and the class. The analysis logic is whether one or more mappings can be established.Inheritance)For example, if both students and teachers are people, the "student" and "teacher" entities and "persons" constitute an inheritance relationship. Inheritance relationships exist for better extension in the future.Connection: The status of entities with links is equal. Think carefully:
The following is the relationship between developers, experts, and lectures:
It can be seen that if the two are not connected, the status will be unequal. Therefore, the connection relationship is quite understandable. OK, next,Dependency: Dependency means that an object cannot exist independently and must coexist with another object. For example, doors and windows must depend on the house. If there is no house, there is no need for doors and windows to exist.
Physical Design
Physical Design is the core of the final database design and a key step for the visible results. What problems should physical design solve?
Select the appropriate DBMS. Specify the naming rules for databases, tables, and fields. Select the appropriate database based on the selected DBMS.
Currently, enterprise-level databases include Oracle and SQL Server, which have high requirements on data security and capacity. Internet projects generally use MySQL and PgSQL, so select the appropriate database based on the type of your project.
Naming rules
The naming rules must follow the field readability and naming rules. Otherwise, you must create a data dictionary for any field name to increase the workload.
Confirm Field Type
For myself, I think the most difficult to judge is the char and var char Types. The two data types are particularly easy to choose, so generally they are more conservative varchar types. However, as long as you carefully analyze and find that the two types have restrictions in the expression range, the char type cannot exceed 255 bytes, so as long as it is not the common text, it can be accommodated, varchar saves space than char, but varchar is less efficient than char, which can be understood as follows: when modifying varchar data, it may be because the data length is different (taking the string "abc" as an example, the char type requires 5 bytes, And the varchar requires only 3 bytes).Row Migration)", The following Oracle official explanation of row migration:
When a row of records is initially inserted, they can be stored in a block (block is the smallest unit of disk storage). The row is increased due to the update operation, and the free space of the block is full, at this time, row migration is generated. In this case, oracle will migrate the row data to a new block, and oracle will keep the original pointer of the migrated row pointing to the new block for storing row data, this means that the row id to be migrated will not change.
It is a bit complicated, but it can be concluded that the gap between the varchar and char Types is not big, except for big data development applications.