1. Relationship between original documents and entities
An original document can be understood as the data of an entire input screen, and an entity as a base table. The relationship between them can be one-to-one, one-to-many, or many-to-many.
In general, the relationship is one-to-one: one original document corresponds to exactly one entity (base table).
In special cases, it may be one-to-many or many-to-one: one original document corresponds to several entities, or several original documents correspond to one entity.
Making this correspondence clear is a great help when designing input screens.
Example 1: In a human resources information system, an employee's biographical record corresponds to three base tables: the employee basic information table, the social relations table, and the work history table. This is a typical example of "one original document corresponding to multiple entities".
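A minimal sketch of Example 1, assuming SQLite (through Python's standard `sqlite3` module) as the DBMS; the table and column names and the helper `save_biography` are hypothetical. One original document, the biographical form, is saved in a single transaction that writes to three base tables.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not taken from the source.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    birth  TEXT
);
CREATE TABLE social_relation (
    emp_id   INTEGER NOT NULL REFERENCES employee(emp_id),
    relative TEXT NOT NULL,
    relation TEXT NOT NULL
);
CREATE TABLE work_history (
    emp_id   INTEGER NOT NULL REFERENCES employee(emp_id),
    employer TEXT NOT NULL,
    period   TEXT
);
""")

def save_biography(conn, form):
    """Store one original document (the whole input form) across three base tables."""
    with conn:  # one transaction: the form is saved completely or not at all
        emp_id = conn.execute(
            "INSERT INTO employee (name, birth) VALUES (?, ?)",
            (form["name"], form["birth"])).lastrowid
        conn.executemany(
            "INSERT INTO social_relation VALUES (?, ?, ?)",
            [(emp_id, who, rel) for who, rel in form["relations"]])
        conn.executemany(
            "INSERT INTO work_history VALUES (?, ?, ?)",
            [(emp_id, employer, period) for employer, period in form["history"]])
    return emp_id

emp = save_biography(conn, {
    "name": "Zhang San", "birth": "1980-05-01",
    "relations": [("Li Mei", "spouse")],
    "history": [("Factory A", "2000-2008"), ("Company B", "2008-")],
})
jobs = conn.execute("SELECT COUNT(*) FROM work_history WHERE emp_id = ?", (emp,)).fetchone()[0]
```

The input screen stays a single form; only the storage layer knows that it maps to three tables.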
2. Primary key and foreign key
In general, an entity (base table) cannot lack both a primary key and a foreign key.
In an E-R diagram, an entity at a leaf position may or may not define a primary key (because it has no descendants), but it must have a foreign key (because it has a parent).
The design of primary and foreign keys plays an important role in global database design. After completing the design of a global database, an American database design expert remarked: "Keys, keys everywhere; apart from keys, there is nothing." This sums up his database design experience and reflects his highly abstract view of the core of an information system (its data model), because the primary key is a high-level abstraction of an entity, and the pairing of a primary key with a foreign key represents a connection between entities.
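The pairing of primary and foreign keys, and the leaf-entity rule above, can be sketched in SQLite as follows. The tables `reader` and `reader_note` are hypothetical; `reader_note` is a leaf entity with no primary key of its own but with a foreign key to its parent. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # must be set before any data change
conn.executescript("""
CREATE TABLE reader (reader_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Leaf entity: no declared primary key, but a foreign key to its parent.
CREATE TABLE reader_note (
    reader_id INTEGER NOT NULL REFERENCES reader(reader_id),
    note      TEXT
);
""")
conn.execute("INSERT INTO reader VALUES (1, 'Wang')")
conn.execute("INSERT INTO reader_note VALUES (1, 'prefers e-books')")

def try_orphan(conn):
    """Attempt to insert a child row whose parent does not exist."""
    try:
        conn.execute("INSERT INTO reader_note VALUES (99, 'no such reader')")
        return "accepted"
    except sqlite3.IntegrityError:  # raised for a foreign key violation
        return "rejected"

result = try_orphan(conn)
```

Because every note must point at an existing reader, the key pair is what connects the two entities.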
3. Properties of base tables
A base table differs from intermediate and temporary tables in that it has the following four properties:
(1) Atomicity. The fields in a base table cannot be decomposed further.
(2) Primitiveness. The records in a base table are records of original (underlying) data.
(3) Deducibility. All output data can be derived from the data in the base tables and code tables.
(4) Stability. The structure of a base table is relatively stable, and its records are kept for a long time.
Once you understand these properties, you can distinguish base tables from intermediate and temporary tables when designing a database.
4. An informal understanding of the three normal forms
An informal understanding of the three normal forms is of great benefit to database design. To apply them well in practice, it is enough to understand them informally (an informal understanding is a sufficient one, though not the most rigorous or precise):
First normal form (1NF) is an atomicity constraint on attributes: attributes must be atomic and cannot be decomposed.
Second normal form (2NF) is a uniqueness constraint on records: every record must have a unique identifier, that is, each entity instance must be unique.
Third normal form (3NF) is a constraint on field redundancy: no field may be derivable from other fields, so the fields contain no redundancy.
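To make the informal 3NF rule concrete, here is a small Python sketch, with hypothetical sample rows and a hypothetical helper `fd_holds`, that checks whether one field is determined by another and then moves the derivable field into its own table. A check over sample data can only refute a dependency, never prove it; real dependencies come from the business rules.

```python
# Hypothetical rows of an employee table that also stores the department name.
rows = [
    {"emp_id": 1, "name": "Zhao", "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 2, "name": "Qian", "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 3, "name": "Sun",  "dept_id": 20, "dept_name": "IT"},
]

def fd_holds(rows, lhs, rhs):
    """True if, in these rows, equal values of lhs always come with equal values of rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

# dept_name is determined by dept_id, which is not the key, so dept_name
# can be "derived from another field": the table is not in 3NF.
transitive = fd_holds(rows, ["dept_id"], ["dept_name"])

# 3NF fix: keep dept_id in the employee table and move dept_name to a dept table.
employee = [{k: r[k] for k in ("emp_id", "name", "dept_id")} for r in rows]
dept = {r["dept_id"]: r["dept_name"] for r in rows}
```

After the split, renaming a department changes one entry in `dept` instead of every employee row.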
5. Normalization standards
The relationship between a base table and its fields should satisfy 3NF as far as possible. However, a database design that satisfies 3NF is often not the best design. To improve database efficiency, it is often necessary to lower the normalization standard: add redundancy where appropriate, trading space for time.
Example 2: A base table stores product data, as shown in Table 1. The presence of the "Amount" field shows that the table does not satisfy 3NF, because "Amount" can be obtained by multiplying "Unit price" by "Quantity", which makes "Amount" a redundant field. However, adding this redundant field speeds up statistical queries; this is the practice of trading space for time.
In Rose 2002, two types of columns are defined: data columns and computed columns. A column such as "Amount" is called a computed column, while columns such as "Unit price" and "Quantity" are called data columns.
Table 1. Structure of the product table
Product name   Model   Unit price   Quantity   Amount
TV             29"     2,500        40         100,000
A database design with no redundancy is achievable. However, a database without redundancy is not necessarily the best one; sometimes, to improve operational efficiency, it is necessary to lower the normalization standard and deliberately keep redundant data. The approach is to follow 3NF when designing the conceptual data model and to do the work of lowering the normalization standard in the physical data model design. Lowering the normal form means adding fields and allowing redundancy.
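A sketch of Example 2 in SQLite, assuming hypothetical table names: the normalized table derives the amount at query time (2,500 × 40 = 100,000), while the denormalized table stores it as a redundant column that the application (or a trigger) must keep consistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual model (3NF): amount is not stored.
CREATE TABLE product_3nf (name TEXT, model TEXT, unit_price INTEGER, quantity INTEGER);
-- Physical model (denormalized): amount is stored to speed up statistics.
CREATE TABLE product_fast (name TEXT, model TEXT, unit_price INTEGER,
                           quantity INTEGER, amount INTEGER);
""")
row = ("TV", '29"', 2500, 40)
conn.execute("INSERT INTO product_3nf VALUES (?, ?, ?, ?)", row)
# Whoever writes the row is responsible for amount = unit_price * quantity.
conn.execute("INSERT INTO product_fast VALUES (?, ?, ?, ?, ?)", row + (row[2] * row[3],))

# 3NF: the multiplication is paid on every query (time).
derived = conn.execute("SELECT SUM(unit_price * quantity) FROM product_3nf").fetchone()[0]
# Denormalized: the value is read directly (space).
stored = conn.execute("SELECT SUM(amount) FROM product_fast").fetchone()[0]
```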
6. Recognize many-to-many relationships and handle them correctly
If two entities have a many-to-many relationship, that relationship should be eliminated. The solution is to add a third entity between the two. The original many-to-many relationship then becomes two one-to-many relationships, and the attributes of the original two entities must be distributed appropriately among the three entities. The third entity here is essentially a more complex relationship, and it corresponds to a base table. Generally speaking, database design tools cannot recognize many-to-many relationships, but they can handle them.
Example 3: In a "library information system", "book" is an entity and "reader" is also an entity. The relationship between them is a typical many-to-many relationship: a book can be borrowed by multiple readers at different times, and a reader can borrow multiple books. To handle this, add a third entity between the two, named "borrowing", with the attributes borrow/return time and borrow/return flag (0 means borrowed, 1 means returned). It should also have two foreign keys (the primary key of "book" and the primary key of "reader") so that it can be connected to both "book" and "reader".
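Example 3 can be sketched as three SQLite tables; the names `book`, `reader`, and `borrowing` and the sample rows are illustrative assumptions. The `borrowing` table carries both foreign keys plus the time and flag attributes, so each side of the original many-to-many relationship becomes a one-to-many join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE book   (book_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE reader (reader_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
-- Third entity: replaces book <-> reader (many-to-many) with two one-to-many links.
CREATE TABLE borrowing (
    book_id    INTEGER NOT NULL REFERENCES book(book_id),
    reader_id  INTEGER NOT NULL REFERENCES reader(reader_id),
    event_time TEXT    NOT NULL,
    flag       INTEGER NOT NULL CHECK (flag IN (0, 1))  -- 0 = borrowed, 1 = returned
);
""")
conn.executemany("INSERT INTO book VALUES (?, ?)", [(1, "SQL Basics"), (2, "Data Modeling")])
conn.executemany("INSERT INTO reader VALUES (?, ?)", [(10, "Li"), (20, "Chen")])
conn.executemany("INSERT INTO borrowing VALUES (?, ?, ?, ?)", [
    (1, 10, "2024-01-02", 0), (1, 10, "2024-01-20", 1),
    (1, 20, "2024-02-01", 0), (2, 10, "2024-02-03", 0),
])

# One book, many readers.
readers_of_book1 = [n for (n,) in conn.execute(
    "SELECT DISTINCT r.name FROM borrowing b JOIN reader r USING (reader_id) "
    "WHERE b.book_id = 1 AND b.flag = 0 ORDER BY r.name")]
# One reader, many books.
books_of_reader10 = [t for (t,) in conn.execute(
    "SELECT DISTINCT k.title FROM borrowing b JOIN book k USING (book_id) "
    "WHERE b.reader_id = 10 AND b.flag = 0 ORDER BY k.title")]
```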
7. How to choose primary key (PK) values
A PK is a tool programmers use to join tables. It can be a string of digits with no physical meaning, generated automatically by the program, or it can be a field or a combination of fields with physical meaning. The former is better than the latter. When the PK is a combination of fields, it should contain few fields: too many fields not only make the index large but also make it slow.
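The contrast between a meaningful composite key and an automatically generated one can be sketched in SQLite as follows (hypothetical `course` tables). Keeping a `UNIQUE` constraint on the meaningful fields is an added assumption, not something the text prescribes; it preserves their uniqueness while joins use the single integer key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Meaningful composite PK: every child table must carry all three fields as its FK.
CREATE TABLE course_natural (
    dept TEXT, course_no TEXT, term TEXT, title TEXT,
    PRIMARY KEY (dept, course_no, term)
);
-- Meaningless surrogate PK assigned automatically; child tables carry one integer.
CREATE TABLE course (
    course_id INTEGER PRIMARY KEY,   -- SQLite fills this in when it is omitted
    dept TEXT NOT NULL, course_no TEXT NOT NULL, term TEXT NOT NULL, title TEXT,
    UNIQUE (dept, course_no, term)   -- assumption: natural fields still kept unique
);
CREATE TABLE enrollment (
    course_id  INTEGER NOT NULL REFERENCES course(course_id),
    student_id INTEGER NOT NULL
);
""")
sql = "INSERT INTO course (dept, course_no, term, title) VALUES (?, ?, ?, ?)"
new_id = conn.execute(sql, ("CS", "101", "2024S", "Intro to Databases")).lastrowid
try:
    conn.execute(sql, ("CS", "101", "2024S", "Duplicate"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```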
8. Understand data redundancy correctly
The repetition of primary and foreign keys across multiple tables is not data redundancy. This concept must be clear, yet in practice many people are unclear about it. Repetition of non-key fields is data redundancy, and it is a low-level kind, namely repetitive redundancy. High-level redundancy is not the repeated appearance of a field, but the appearance of a derived field.
Example 4: Among the three fields "Unit price", "Quantity", and "Amount" of a product, "Amount" is derived by multiplying "Unit price" by "Quantity"; it is redundant, and it is high-level redundancy. The purpose of redundancy is to improve processing speed. Only low-level redundancy increases data inconsistency, because the same data may be entered multiple times, at different times and places, and by different roles. Therefore, we advocate high-level redundancy (derivative redundancy) and oppose low-level redundancy (repetitive redundancy).
9. E-R diagrams have no standard answer
An information system's E-R diagram has no standard answer, because its design and drawing are not unique: as long as it covers the system's business requirements and functionality, it is feasible. Otherwise, the E-R diagram must be revised. Although there is no single standard answer, this does not mean it can be designed arbitrarily. The criteria for a good E-R diagram are: a clear structure, concise associations, a moderate number of entities, a reasonable allocation of attributes, and no low-level redundancy.
10. Views are useful in database design
Unlike base tables, code tables, and intermediate tables, a view is a virtual table whose data depends on the real tables it draws from. A view is a window through which programmers use the database, a way of combining base table data, a method of data processing, and a means of keeping user data secure. To support complex processing, improve computation speed, and save storage space, views should generally be nested no more than three levels deep. If three levels of views are not enough, define a temporary table on top of the views and then define further views on the temporary table. By alternating definitions in this way, the depth of views is no longer limited.
Views play an even more important role in certain information systems related to national political, economic, technical, military, and security interests. In these systems, as soon as the physical design of the base tables is complete, a first layer of views is created on top of the base tables, and the number and structure of these views are exactly the same as those of the base tables. It is also stipulated that all programmers may operate only on the views. Only the database administrator, holding a "security key" shared by several people, may operate directly on the base tables. Readers may consider: why is this?
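A sketch of the layered-view arrangement, assuming SQLite and hypothetical names. SQLite has no `GRANT`/`REVOKE`, so the rule that programmers may touch only views is merely a convention here; a server DBMS would enforce it with permissions. The second view is built only on the first-layer view, keeping the depth within three levels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
-- Layer 1: one view per base table, with the same columns as the base table.
CREATE VIEW v_employee AS SELECT emp_id, name, salary FROM t_employee;
-- Layer 2: built only on layer-1 views, never on base tables directly.
CREATE VIEW v_payroll AS
    SELECT COUNT(*) AS headcount, SUM(salary) AS total FROM v_employee;
""")
# In this sketch only the administrator's code writes to the base table.
conn.executemany("INSERT INTO t_employee VALUES (?, ?, ?)",
                 [(1, "Zhao", 8000), (2, "Qian", 9500)])
# Application code reads through views only.
summary = conn.execute("SELECT headcount, total FROM v_payroll").fetchone()
```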
11. Intermediate tables, reports, and temporary tables
An intermediate table holds statistics. It is designed for a data warehouse, an output report, or query results, and sometimes it has no primary key or foreign key (except in a data warehouse). Temporary tables are designed by programmers to hold temporary records for personal use. Base tables and intermediate tables are maintained by the DBA, while temporary tables are maintained automatically by the programmer's own programs.
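An intermediate (statistics) table can be sketched as a report table that is dropped and rebuilt from the base table on demand. The names `sale`, `rpt_sales_by_region`, and `rebuild_region_report` are hypothetical, and rebuilding from scratch is just one simple maintenance strategy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, "
             "region TEXT NOT NULL, amount INTEGER NOT NULL)")
conn.executemany("INSERT INTO sale (region, amount) VALUES (?, ?)",
                 [("North", 100), ("South", 250), ("North", 50)])

def rebuild_region_report(conn):
    """Recreate the intermediate table from the base table (e.g. a DBA's scheduled job)."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS rpt_sales_by_region")
        # Statistics only: no primary or foreign key is needed here.
        conn.execute("CREATE TABLE rpt_sales_by_region (region TEXT, total INTEGER)")
        conn.execute("INSERT INTO rpt_sales_by_region "
                     "SELECT region, SUM(amount) FROM sale GROUP BY region")

rebuild_region_report(conn)
report = dict(conn.execute("SELECT region, total FROM rpt_sales_by_region"))
```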
12. Integrity constraints take three forms
Domain integrity: implemented with CHECK constraints. In database design tools, when defining a field's value range, there is a Check button through which the field's value domain is defined.
Referential integrity: implemented with PKs, FKs, and table-level triggers.
User-defined integrity: business rules, implemented with stored procedures and triggers.
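The three kinds of integrity can be sketched in SQLite, with one caveat: SQLite has no stored procedures, so the user-defined rule here is enforced with a trigger only. The rule "salaries never decrease", the helper `violates`, and all table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER NOT NULL REFERENCES dept(dept_id),  -- referential integrity
    gender  TEXT CHECK (gender IN ('M', 'F')),           -- domain integrity
    salary  INTEGER CHECK (salary > 0)                   -- domain integrity
);
-- User-defined integrity: a hypothetical business rule as a trigger.
CREATE TRIGGER no_pay_cut BEFORE UPDATE OF salary ON employee
WHEN NEW.salary < OLD.salary
BEGIN
    SELECT RAISE(ABORT, 'salary may not decrease');
END;
""")
conn.execute("INSERT INTO dept VALUES (1, 'IT')")
conn.execute("INSERT INTO employee VALUES (1, 1, 'F', 9000)")
conn.commit()

def violates(conn, sql):
    """Return True if the statement is rejected by a constraint or trigger."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        conn.rollback()  # discard the open transaction, keep committed data
        return True
```

For example, `violates(conn, "INSERT INTO employee VALUES (2, 1, 'X', 5000)")` is rejected by the CHECK constraint on `gender`.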
13. The way to avoid patch-style database design: the "three-less" principle
(1) The fewer tables in a database, the better. Only a small number of tables shows that the system's E-R diagram is "few but good": duplicated and redundant entities have been removed, a high degree of abstraction of the real world has been achieved, and the system's data have been integrated, preventing patch-style design.
(2) The fewer fields in a table's composite primary key, the better. A primary key serves two purposes: it builds the primary key index, and it acts as the foreign key of child tables. So a composite primary key with fewer fields saves both running time and index storage space.
(3) The fewer fields in a table, the better. Only a small number of fields shows that there is no duplicated data in the system and very little data redundancy. More importantly, it urges readers to learn to "convert columns into rows", which prevents fields belonging to a child table from being pulled into the main table and leaving many spare fields there. "Converting columns into rows" means pulling part of the main table's content out into a separate child table. The method is very simple, yet some people are not used to it and do not adopt or apply it.
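Converting columns into rows can be sketched in SQLite with a hypothetical customer table whose spare `phone1` to `phone3` columns are moved into a child table, one row per phone number. After the move, a fourth phone number needs a new row, not a new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Before: repeated spare columns in the main table, mostly NULL.
CREATE TABLE customer_wide (cust_id INTEGER PRIMARY KEY, name TEXT,
                            phone1 TEXT, phone2 TEXT, phone3 TEXT);
-- After: the repeating group lives in a child table.
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer_phone (cust_id INTEGER NOT NULL REFERENCES customer(cust_id),
                             phone   TEXT NOT NULL);
""")
conn.executemany("INSERT INTO customer_wide VALUES (?, ?, ?, ?, ?)", [
    (1, "Sun", "555-0101", None, None),
    (2, "Zhou", "555-0201", "555-0202", None),
])

# Migration: each non-NULL phone column becomes a row; empty slots disappear.
conn.execute("INSERT INTO customer SELECT cust_id, name FROM customer_wide")
for col in ("phone1", "phone2", "phone3"):  # fixed, trusted column names
    conn.execute(f"INSERT INTO customer_phone SELECT cust_id, {col} "
                 f"FROM customer_wide WHERE {col} IS NOT NULL")
phone_rows = conn.execute("SELECT COUNT(*) FROM customer_phone").fetchone()[0]
```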
The practical principle of database design is to find the right balance between data redundancy and processing speed. "Three-less" is a holistic concept and an overall view; no single part of it can be taken in isolation. The principle is relative, not absolute. A "three-more" principle is certainly wrong. Consider: if the same system functionality is covered, an E-R diagram with 100 entities (1,000 attributes in total) is certainly much better than one with 200 entities (2,000 attributes in total).
Advocating the "three-less" principle is meant to teach readers to use database design techniques to integrate system data. The steps of data integration are: integrate the file system into application databases, integrate application databases into subject databases, and integrate subject databases into a global comprehensive database. The higher the degree of integration, the greater the data sharing, the fewer the information islands, and the fewer the entities, primary keys, and attributes in the global E-R diagram of the whole enterprise information system.
The purpose of advocating the "three-less" principle is to stop readers from using patching techniques, constantly adding, deleting, and modifying the database until the enterprise database becomes a "garbage heap" or a "jumble" of arbitrarily designed tables. Eventually the base tables, code tables, intermediate tables, and temporary tables become disorderly and countless, and the enterprise information system becomes unmaintainable and paralyzed.
Anyone can follow a "three-more" principle; it is the fallacy underlying "patch-style" database design. The "three-less" principle is a principle of "few but good". It demands considerable database design skill and artistry, and not everyone can achieve it, because it is the theoretical basis for eliminating "patch-style" database design.
14. Ways to improve database efficiency
Given fixed system hardware and system software, the ways to improve the efficiency of a database system are:
(1) In the physical design of the database, lower the normal form, add redundancy, use few triggers, and use stored procedures extensively.
(2) When the computation is very complex and the number of records is very large (for example, 10 million), perform the complex computation outside the database first, processing it in the file system with C++, and finally append the results to the table. This comes from experience in telecom billing system design.
(3) If a table turns out to have too many records, for example more than 10 million, split it horizontally: divide the table's records into two tables at some value of the table's primary key (PK). If a table turns out to have too many fields, for example more than 80, split it vertically, decomposing the original table into two tables.
(4) Tune the database management system (DBMS), that is, optimize its system parameters, such as the number of buffers.
(5) When programming in the data-oriented SQL language, use optimized algorithms wherever possible.
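Tip (3) on horizontal splitting can be sketched in SQLite as follows. The table `call_record`, the 1,000 sample rows, the boundary value 501, and the helpers `split_horizontally` and `table_for` are hypothetical; the `UNION ALL` view that lets old queries see every row is an added convenience, not part of the original tip.

```python
import sqlite3

SCHEMA = "(call_id INTEGER PRIMARY KEY, caller TEXT, seconds INTEGER)"
BOUNDARY = 501  # hypothetical PK value at which the table is split

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE call_record " + SCHEMA)
conn.executemany("INSERT INTO call_record VALUES (?, ?, ?)",
                 [(i, f"user{i % 7}", 30 + i % 120) for i in range(1, 1001)])
conn.commit()

def split_horizontally(conn, boundary):
    """Split call_record into two tables with the same structure, by PK value."""
    with conn:
        conn.execute("CREATE TABLE call_record_lo " + SCHEMA)
        conn.execute("CREATE TABLE call_record_hi " + SCHEMA)
        conn.execute("INSERT INTO call_record_lo SELECT * FROM call_record "
                     "WHERE call_id < ?", (boundary,))
        conn.execute("INSERT INTO call_record_hi SELECT * FROM call_record "
                     "WHERE call_id >= ?", (boundary,))
        conn.execute("DROP TABLE call_record")
        # Optional: one view over both halves for queries that need all rows.
        conn.execute("CREATE VIEW call_record_all AS SELECT * FROM call_record_lo "
                     "UNION ALL SELECT * FROM call_record_hi")

def table_for(call_id, boundary=BOUNDARY):
    """Route a new record to the table that owns its key range."""
    return "call_record_lo" if call_id < boundary else "call_record_hi"

split_horizontally(conn, BOUNDARY)
counts = [conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in ("call_record_lo", "call_record_hi", "call_record_all")]
```

A vertical split works the same way, except that the two new tables share the primary key and divide the columns between them.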
In short, improving database efficiency requires optimization at three levels working together: the database system level, the database design level, and the program implementation level.
The 14 tips above were gradually distilled by many people from extensive database analysis and design practice. Readers should not apply these experiences rigidly or memorize them by rote, but should digest and understand them, be pragmatic, and apply them flexibly, gradually reaching the point of developing through application and applying through development.
Database Design (Understanding Part)