1. Relationship between original documents and entities
It can be a one-to-one, one-to-many, and many-to-many relationship. In general, they are a one-to-one relationship: an original document corresponds to only one entity. In special circumstances, they may be one-to-many or many-to-one relationships, that is, an original document corresponds to multiple entities, or multiple original documents correspond to one entity. The entity here can be understood as a basic table. After clarifying the corresponding relationship, it will be of great benefit to the design of the input interface.
[Example 1]: A copy of employee resume. In the human resources information system, three basic tables are required: employee basic information table, social relationship table, and work history table. This is a typical example of "one original document corresponds to multiple entities.
2. Primary Key and foreign key
Generally, an object cannot have neither a primary key nor a foreign key. In e? In the r diagram, the entity in the leaf area can define the primary key or not the primary key (because it has no children), but must have a foreign key (because it has a father ).
The Design of primary keys and Foreign keys plays an important role in the design of global databases. After the design of the global database was completed, an American database design expert said: "keys are everywhere. There is nothing except keys." This is his experience in database design, it also reflects his highly abstract thoughts on the core of the information system (Data Model. Because: The primary key is the height abstraction of the object. The pairing between the primary key and the foreign key indicates the connection between the object.
3. Basic table Properties
The basic table is different from the intermediate table and temporary table because it has the following four features:
(1) atomicity. Fields in the basic table cannot be decomposed.
(2) primitive. The record in the basic table is the record of the raw data (basic data.
(3) adequacy. From Basic Tables andCodeThe data in the table. All output data can be derived.
(4) stability. The structure of the basic table is relatively stable, and the records in the table must be stored for a long time.
After understanding the nature of the basic table, you can distinguish the basic table from the intermediate table and temporary table when designing the database.
4. paradigm standards
The relationship between the basic table and its fields should satisfy the third paradigm as much as possible. However, database design that meets the third paradigm is often not the best design. In order to improve the efficiency of database operation, we often need to reduce the paradigm standard: appropriately increase redundancy to achieve the purpose of changing the space for time.
[Example 2]: There is a basic table for storing items, as shown in table 1. The existence of the "amount" field indicates that the table design does not meet the third paradigm, because the "amount" can be obtained by multiplying the "unit price" by "quantity, the "amount" is a redundant field. However, adding the redundant field "amount" can increase the speed of query statistics. This is the practice of changing the space for time.
In Rose 2002, two types of columns are required: Data columns and calculation columns. Columns such as "amount" are called "calculation columns", while columns such as "unit price" and "quantity" are called "Data columns ".
Table 1 Table Structure of the commodity table
Product Name product model unit price quantity amount
TV 29? 2,500 40 100,000
5. A general understanding of the three paradigms
A general understanding of the three paradigms is of great benefit to database design. In database design, in order to better apply the three paradigms, the three paradigms must be understood in a general sense (the general understanding is sufficient, not the most scientific and accurate understanding ):
1 paradigm: 1nf is an atomic constraint on attributes. Attributes must be atomic and cannot be decomposed;
Second paradigm: 2nf is the uniqueness constraint on the record and requires that the record have a unique identifier, that is, the uniqueness of the object;
Third paradigm: 3nf is a constraint on field redundancy, that is, any field cannot be derived from other fields, and it requires that the fields are not redundant.
No redundant database can be designed. However, databases without redundancy may not be the best. Sometimes, to improve operational efficiency, we must lower the paradigm standard and properly retain redundant data. The specific approach is to follow the third paradigm in conceptual data model design, and to reduce the paradigm standard to be considered in physical data model design. Reducing the paradigm is to add fields and allow redundancy.
6. Be good at identifying and correctly handling many-to-many relationships
If there is a many-to-many relationship between two entities, this relationship should be eliminated. The solution is to add a third entity between the two. In this way, the original multi-to-many relationship is now two one-to-many relationships. The attributes of the original two entities should be reasonably allocated to the three entities. The third entity here is essentially a complex relationship, which corresponds to a basic table. Generally, database design tools cannot identify many-to-many relationships, but can process many-to-many relationships.
[Example 3]: In the "library information system", "books" are an entity, and "readers" are also an entity. The relationship between the two entities is a typical many-to-many relationship: A book can be borrowed by multiple readers at different times, and one reader can borrow multiple books. To this end, you need to add a third entity between the two. This entity is named "borrow and return". Its attribute is: Borrow and return time, and borrow and return sign (0 indicates borrowing and returning, 1 indicates Returning books). In addition, it should have two foreign keys (primary keys of "books" and primary keys of "readers ), enable it to connect to "books" and "readers.
7. Primary Key pK value method
PK is suppliedProgramThe table connection tool used by members can be a numerical string without physical significance, which is automatically implemented by the program by adding 1. It can also be a combination of physical field names or field names. However, the former is better than the latter. When PK is a combination of field names, it is recommended that the number of fields should not be too large. If it is more, not only the index occupies a large space, but also the speed is slow.
8. Correct understanding of data redundancy
Duplicate primary keys and Foreign keys appear in multiple tables, not data redundancy. This concept must be clear. In fact, many people still do not know. The repeated occurrence of Non-key fields is data redundancy! It is also a low-level redundancy, that is, repetitive redundancy. Advanced redundancy does not appear repeatedly, but is derived from a field.
[Example 4]: the unit price, quantity, and amount fields in the product. The "amount" is derived from the "unit price" multiplied by the "quantity", which is redundancy, it is also a type of advanced redundancy. Redundancy is designed to speed up processing. Only low-level redundancy increases data inconsistency, because the same data may be input multiple times at different times, locations, and roles. Therefore, we advocate advanced redundancy (derivative redundancy) against low-level redundancy (repetitive redundancy ).
9. There is no standard answer to the E--R Diagram
There is no standard answer to the E--R diagram of the information system, because its design and painting is not the only, as long as it covers the business scope and functional content of the system requirements, is feasible. Instead, modify the E--R diagram. Although it does not have the only standard answer, it does not mean that it can be designed at will. Okay, E? The r diagram standards are: clear structure, concise Association, moderate number of entities, reasonable attribute allocation, and no low-level redundancy.
10. View technology is useful in Database Design
Unlike basic tables, code tables, and intermediate tables, a view is a virtual table that depends on the real table of the data source. A view is a window for programmers to use the database. It is a form of base table data synthesis, a method of data processing, and a means of user data confidentiality. In order to perform complex processing, increase computing speed, and save storage space, the definition depth of a view generally cannot exceed three layers. If the three-tier view is not enough, you should define a temporary table on the view and define the view on the temporary table. In this way, the depth of the view is unlimited.
Views play a more important role in certain information systems related to national political, economic, technical, military, and security interests. After the physical design of the basic tables of these systems is completed, the first view is created on the basic table. The number and structure of the view on this layer are exactly the same as the number and structure of the basic table. It is also stipulated that all programmers are only allowed to operate on The View. Only the "Security Key" that the database administrator can master with multiple personnel can directly operate on the basic table. Readers can think about this: Why?
11. Intermediate tables, reports, and temporary tables
An intermediate table is a table that stores statistical data. It is designed for a data warehouse, an output report, or a query result. Sometimes it does not have a primary key or a foreign key (except a data warehouse ). Temporary tables are designed by programmers and are used by individuals to store temporary records. The base table and intermediate table are maintained by the DBA, and the temporary table is automatically maintained by the programmer.
12. integrity constraints are manifested in three aspects
Domain Integrity: Use check to implement constraints. In the database design tool, when defining the value range of a field, there is a check button through which to define the value city of the field. Integrity of reference: it can be achieved through PK, FK, and table-level triggers. User-Defined Integrity: it is a business rule that is implemented using stored procedures and triggers.
13. The method to prevent Database Design patching is the "Three Principles"
(1) The fewer tables in a database, the better. Only when the number of tables is small can we explain that the E--R of the system is small and refined, remove redundant entities, form a high abstraction of the objective world, the system data integration, it prevents patching design;
(2) The fewer fields that combine primary keys in a table, the better. Because the primary key is used to create a primary key index and the other is used as a foreign key of the sub-table, the number of fields in the primary key combination is less, which not only saves the running time, it also saves the index storage space;
(3) The fewer fields in a table, the better. Only when the number of fields is small can we see that there is no data duplication in the system and there is little data redundancy. More importantly, we urge readers to "change columns to rows ", this prevents the fields in the sub-table from being pulled into the master table, leaving many blank fields in the master table. The so-called "Change columns to rows" means to pull out part of the main table and create a sub-Table separately. This method is very simple. Some people just don't get used to it, don't adopt it, don't execute it.
The practical principle of database design is to find a proper balance between data redundancy and processing speed. "Three shao" is an overall concept. A general viewpoint cannot isolate a certain principle. This principle is relative, not absolute. The "three-plus" principle is certainly incorrect. Imagine it would be much better to override the E--R diagram of one hundred entities (one thousand properties in total) with the same system functionality than the E--R diagram of two hundred entities (two thousand properties in total.
We advocate the "Three shao" principle, which allows readers to learn to use Database Design technology for system data integration. The data integration step is to integrate the file system into an application database, integrate the application database into a topic database, and integrate the topic database into a global integrated database. The higher the degree of integration, the stronger the data sharing, the fewer information islands, and the global E of the entire enterprise information system? In the r diagram, the number of entities, the number of primary keys, and the number of attributes are smaller.
The purpose of advocating the "Three shao" principle is to prevent readers from using the patch technology to constantly add, delete, and modify databases, so that enterprise databases become the "garbage dumps" for randomly designing database tables ", or the database table "Miscellaneous", and finally cause the database basic tables, code tables, intermediate tables, temporary tables in disorder, countless, leading to the failure to maintain the information system of enterprises and institutions.
The "three-plus" principle can be implemented by anyone. This principle is the theory of "patching methods" for designing databases. The "three less" principle is a little but refined principle. It requires a high level of database design skills and art, not everyone can do it, this principle is to eliminate the theoretical basis for designing databases using the patching method.
14. Ways to Improve Database Operation Efficiency
Under the given system hardware and system software conditions, the ways to improve the operational efficiency of the database system are as follows:
(1) Reduce the paradigm, increase redundancy, use fewer triggers, and use more stored procedures during database physical design.
(2) When the computation is very complex and the number of records is very large (for example, 10 million records), complex computation must first be performed outside the database, after the file system is processed in C ++, the database is appended to the table. This is the experience of designing the telecom billing system.
(3) It is found that there are too many records in a table. For example, if there are more than 10 million records, the table should be divided horizontally. The horizontal split method uses a value of the primary key PK of the table as the boundary to horizontally split the records of the table into two tables. If you find that there are too many fields in a table, for example, more than 80 fields, split the table vertically and split the original table into two tables.
(4) Optimize the Database Management System DBMS, that is, optimize various system parameters, such as the number of buffers.
(5) Optimize the program using the data-oriented SQL language as much as possible Algorithm .
In short, to improve the operational efficiency of the database, we must make efforts at the same time at the three levels: database system-level optimization, database design-level optimization, and program-level optimization.