5-2 Database Design
Basic steps
Steps:
1. Requirements Analysis Phase
Database design must begin with an accurate understanding and analysis of user requirements. Requirements analysis is the foundation of the entire design process and is the most difficult and time-consuming step.
2. Conceptual structure Design phase
Conceptual design is the key to the whole database design. By synthesizing, summarizing, and abstracting the user requirements, it produces a conceptual model that is independent of any specific database management system.
In the conceptual structure design phase, the result is a conceptual model independent of machine characteristics and of any particular relational database management system product, usually represented by an E-R diagram.
3. Logical Structure design phase
Logical structure design converts the conceptual structure into a data model supported by the chosen database management system and optimizes it.
In the logical structure design phase, the E-R diagram is converted into a data model supported by the database product, such as the relational model, forming the database's logical schema. Then, based on the users' processing requirements and security considerations, the necessary views are built on top of the base tables, forming the database's external schemas.
4. Physical Structure design phase
Physical structure design chooses, for the logical data model, the physical structure best suited to the application environment, including the storage structures and access methods.
In the physical structure design phase, physical storage arrangements are made and indexes are built according to the characteristics of the relational database management system and the processing requirements, forming the database's internal schema.
5. Database implementation phase
During the database implementation phase, the designers use the database language and host languages provided by the database management system to build the database according to the results of logical and physical design, write and debug the application programs, load the data into the database, and carry out a trial run.
6. Database operation and Maintenance phase
After a successful trial run, the database application system can be put into operation. During operation, the database system must be continuously evaluated, adjusted, and modified.
Requirements analysis
The focus of the requirements analysis survey is "data" and "processing". Through investigation, collection, and analysis, the following user requirements on the database are obtained:
1. Information requirements. The content and nature of the information the user needs to obtain from the database. Data requirements can be derived from information requirements, i.e., what data needs to be stored in the database.
2. Processing requirements. The data processing functions the user needs to accomplish and the performance required of that processing.
3. Security and integrity requirements.
Conceptual design
Conceptual structure design is the process of abstracting the user requirements obtained in requirements analysis into an information structure.
It is the key to the entire database design.
The main features of the conceptual model
Characteristics:
1. Truthful: it fully reflects the real world, including things and the relationships between them, and satisfies the data processing requirements; it is a model of the real world.
2. Easy to understand: it can be used to exchange ideas with users unfamiliar with computers.
3. Easy to change: it is easy to modify and extend the conceptual model when the application requirements or the environment change.
4. Easy to convert to the relational, network, hierarchical, and other data models.
Relationships between entities in the E-R model
Types of relationships:
1. Relationships between two entity types: one-to-one, one-to-many, many-to-many.
2. Relationships among more than two entity types: one-to-many, many-to-many.
3. Relationships within a single entity type: one-to-one, one-to-many, many-to-many.
E-R diagram drawing method
Drawing rules:
1. An entity type is represented by a rectangle, with the entity name written inside.
2. An attribute is represented by an ellipse, connected to its entity type by an undirected edge.
3. A relationship is represented by a diamond, with the relationship name written inside; the diamond is connected to the related entity types by undirected edges, and the type of the relationship (1:1, 1:n, or m:n) is marked on each edge.
4. If the existence of an entity type depends on the existence of another entity type, it is called a weak entity type; otherwise it is a strong entity type.
Integration of ER diagrams
Integration method:
1. Merging: resolve the conflicts between the sub-E-R diagrams and combine them into a preliminary E-R diagram.
Each local application faces different problems and is usually designed by different designers, so there are often many inconsistencies between the E-R diagrams of the subsystems; these are called conflicts.
2. Modifying and restructuring: eliminate unnecessary redundancy and generate the basic E-R diagram. Attribute conflicts, naming conflicts, and structural conflicts must be eliminated.
Not all redundant data and redundant relationships have to be eliminated; sometimes redundancy is kept deliberately, trading it for efficiency. Therefore, when designing the conceptual structure of a database, which redundant information must be eliminated and which may remain must be determined according to the overall needs of the users.
Logic Design
The task of logical structure design is to transform the basic E-R diagram produced in the conceptual structure design phase into a logical structure conforming to the data model supported by the chosen database management system product.
Method of conversion to the relational model
Conversion method:
1. A 1:1 relationship can be converted into an independent relational schema, or merged with the relational schema corresponding to either end.
2. A 1:n relationship can be converted into an independent relational schema, or merged with the relational schema corresponding to the n end.
3. An m:n relationship is converted into its own relational schema; the keys of the entities participating in the relationship, together with the relationship's own attributes, become the attributes of that schema.
4. A relationship among three or more entities can be converted into an independent relational schema.
5. Relational schemas with the same key can be merged.
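Rule 3 above can be sketched with SQLite (via Python's standard sqlite3 module). The tables and column names here are hypothetical, chosen only to illustrate the conversion: an m:n "Enrolls" relationship between Student and Course becomes its own relation, keyed by the combination of both entities' keys and carrying the relationship's own attribute (grade).

```python
import sqlite3

# Hypothetical schema: Student and Course entities linked by an m:n
# "Enrolls" relationship that has its own attribute, grade.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (sno INTEGER PRIMARY KEY, sname TEXT);
CREATE TABLE Course  (cno INTEGER PRIMARY KEY, cname TEXT);
CREATE TABLE Enrolls (               -- relation derived from the m:n link
    sno   INTEGER REFERENCES Student(sno),
    cno   INTEGER REFERENCES Course(cno),
    grade INTEGER,                   -- attribute of the relationship itself
    PRIMARY KEY (sno, cno)           -- combined key of both entities
);
""")
conn.execute("INSERT INTO Student VALUES (1, 'Li')")
conn.execute("INSERT INTO Course VALUES (10, 'Databases')")
conn.execute("INSERT INTO Enrolls VALUES (1, 10, 92)")
row = conn.execute("""
    SELECT s.sname, c.cname, e.grade
    FROM Enrolls e JOIN Student s ON e.sno = s.sno
                   JOIN Course  c ON e.cno = c.cno
""").fetchone()
print(row)   # ('Li', 'Databases', 92)
```

A 1:n relationship, by contrast, would usually be absorbed into the n-end table as a foreign-key column rather than getting its own table.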
Optimization of relational model
The result of the logical design of a database is not unique. To improve the performance of the database application system, the structure of the data model usually needs to be adjusted according to the application requirements; this is data model optimization.
Optimization method:
1. Determine the data dependencies. Based on the semantics obtained in the requirements analysis phase, write out the data dependencies between the attributes within each relational schema and between the attributes of different relational schemas.
2. Minimize the data dependencies of each relational schema and eliminate redundant relationships.
3. Using the theory of data dependencies, examine the relational schemas one by one for partial functional dependencies, transitive functional dependencies, multivalued dependencies, and so on, and determine which normal form each schema belongs to.
4. Analyze the processing requirements from the requirements analysis phase to determine whether the schemas are appropriate for the application environment, and decide whether certain schemas should be merged or decomposed.
5. Decompose the relational schemas as necessary to improve the efficiency of data operations and the utilization of storage space. Two common decomposition methods are horizontal decomposition and vertical decomposition.
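Step 3 can be illustrated with a small, hypothetical example: in a relation R(sno, cno, grade, sname) with key (sno, cno), sname depends on sno alone, a partial functional dependency that keeps R out of 2NF. The sketch below (invented data) decomposes R and checks that the natural join of the pieces restores R exactly, i.e., the decomposition is lossless.

```python
# Hypothetical relation with a partial functional dependency:
# key is (sno, cno), but sname depends on sno alone, so R is not in 2NF.
R = [
    (1, 101, 85, "Li"),
    (1, 102, 90, "Li"),
    (2, 101, 78, "Wang"),
]

# Decompose R to remove the partial dependency sno -> sname.
student = {(sno, sname) for sno, _, _, sname in R}         # Student(sno, sname)
sc      = {(sno, cno, grade) for sno, cno, grade, _ in R}  # SC(sno, cno, grade)

# Lossless-join check: natural-joining the pieces restores R exactly.
rejoined = {(sno, cno, grade, sname)
            for sno, cno, grade in sc
            for sno2, sname in student if sno2 == sno}
print(rejoined == set(R))   # True
```

Note that the decomposition also removes redundancy: "Li" is now stored once in Student instead of once per enrollment.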
Designing user sub-schemas
After the conceptual model has been transformed into a global logical model, the users' external schemas should be designed according to the local application requirements and the characteristics of the specific relational database management system.
Relational database management systems generally provide a view facility, which can be used to design external schemas that better fit the needs of local users.
The global schema of the database is defined mainly from the angles of time efficiency, space efficiency, and ease of maintenance. The following aspects are generally considered when defining user external schemas:
1. Use aliases that better match the users' habits.
2. Define different views for different levels of users to ensure the security of the system.
3. Simplify the user's use of the system.
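The three points above can be sketched with a SQLite view (table and column names are hypothetical): the view renames columns to user-friendly aliases and hides the salary column from ordinary users.

```python
import sqlite3

# Hypothetical base table; the view is an external schema for one class of
# users: it renames columns (point 1) and hides salary (point 2), giving a
# simpler interface (point 3).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (eno INTEGER, ename TEXT, dept TEXT, salary REAL)")
conn.execute("INSERT INTO Employee VALUES (1, 'Zhao', 'Sales', 5000)")
conn.execute("INSERT INTO Employee VALUES (2, 'Qian', 'IT', 6000)")
conn.execute("""
CREATE VIEW StaffDirectory AS
    SELECT ename AS name, dept AS department FROM Employee
""")
rows = conn.execute("SELECT * FROM StaffDirectory ORDER BY name").fetchall()
print(rows)   # [('Qian', 'IT'), ('Zhao', 'Sales')]
```

In a real system the security effect comes from granting users access to the view but not to the base table.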
Horizontal decomposition
Horizontal decomposition divides the tuples of a base relation into several subsets and defines each subset as a sub-relation, in order to improve system efficiency. According to the 80/20 principle, the data actively used in a large relation is usually only part of it, about 20%; that frequently used part can be split off to form a sub-relation. If the relation R is used by n transactions and the data accessed by most transactions does not overlap, R can be decomposed into no more than n sub-relations, so that each transaction's access corresponds to one relation.
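A minimal sketch of horizontal decomposition, with invented data: the tuples of a hypothetical orders relation are partitioned by status, so the frequently accessed "open" rows form one sub-relation and the rest form an archive.

```python
# Horizontal decomposition splits the *tuples* of a relation into subsets.
# Hypothetical orders relation: (order_id, status).
orders = [
    (1, "open"), (2, "closed"), (3, "open"), (4, "closed"), (5, "closed"),
]

active_orders  = [t for t in orders if t[1] == "open"]   # hot sub-relation
archive_orders = [t for t in orders if t[1] != "open"]   # cold sub-relation

# The decomposition is a partition: the union restores the original relation.
print(sorted(active_orders + archive_orders) == sorted(orders))  # True
```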
Vertical decomposition
Vertical decomposition splits the attributes of a relational schema R into several subsets, forming several sub-relational schemas. The principle of vertical decomposition is to take the attributes that are often used together out of R to form a sub-relational schema. Vertical decomposition can improve the efficiency of some transactions, but it may force other transactions to perform joins, which reduces their efficiency; whether to decompose vertically therefore depends on whether the total efficiency of all transactions on R improves after decomposition. Vertical decomposition must ensure a lossless join and preserve functional dependencies, i.e., the decomposed schemas must have the lossless-join property and maintain the functional dependencies.
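A minimal sketch of vertical decomposition, again with invented data: a hypothetical R(eno, ename, photo) is split by attribute into a hot (eno, ename) schema and a cold (eno, photo) schema, both keeping the key eno so that rejoining on it is lossless.

```python
# Vertical decomposition splits the *attributes* of R into sub-schemas that
# share the key, so a natural join can reconstruct R (lossless join).
# Hypothetical R(eno, ename, photo): ename is read constantly, photo rarely.
R = {1: ("Zhao", b"...jpeg..."), 2: ("Qian", b"...png...")}

names  = {eno: vals[0] for eno, vals in R.items()}  # hot:  (eno, ename)
photos = {eno: vals[1] for eno, vals in R.items()}  # cold: (eno, photo)

# Lossless-join check: rejoin on the shared key eno.
rejoined = {eno: (names[eno], photos[eno]) for eno in names}
print(rejoined == R)   # True
```

The cost side is also visible here: any transaction that needs both ename and photo must now perform the join.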
Data dictionary
Data dictionaries are established during the requirements analysis phase.
A data dictionary is a database about the database itself: a reserved area used to store information describing the data in the database.
The term has a second meaning: in database design, a data dictionary is a tool used to describe the design of the base tables, covering the attributes of each table, such as field name, data type, primary key, and foreign key.
The composition of the Data dictionary:
1. Data item
2. Data structure
3. Data Flow
4. Data storage
5. Processing process
Description of the various parts of the data dictionary
① Data item: describes the data items within the data structures of the blocks in the data flow diagram
A data item is the smallest unit of data, one that cannot be subdivided. The description of a data item usually includes the following:
Data item description = {data item name, meaning, alias, data type, length, value range, value meaning, logical relationships with other data items}
"Value range" and "logical relationships with other data items" define the integrity constraints on the data and are the basis for designing the data checking functions.
Several data items can be composed into a data structure.
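A data-dictionary entry of this shape can be sketched as a plain record; all field values below are hypothetical. The sketch also shows how the recorded value range and length drive a data checking function, as the text describes.

```python
# A data-dictionary "data item" entry, modeled as a Python dict with the
# fields listed above (the concrete values are invented for illustration).
data_item = {
    "name": "sno",
    "meaning": "student number, uniquely identifies a student",
    "alias": "student_id",
    "type": "char",
    "length": 8,
    "value_range": ("00000000", "99999999"),  # integrity constraint
    "related_items": "none",
}

# The constraints recorded in the dictionary drive the data checking logic:
def check_sno(sno: str, item=data_item) -> bool:
    lo, hi = item["value_range"]
    return len(sno) == item["length"] and lo <= sno <= hi and sno.isdigit()

print(check_sno("20230104"), check_sno("abc"))   # True False
```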
② Data structure: describes the information structure of the blocks in the data flow diagram
A data structure reflects how data items are combined. A data structure may consist of several data items, of several data structures, or of a mixture of both. The description of a data structure typically includes the following:
Data structure description = {data structure name, meaning, composition: {data items or data structures}}
③ Data flow: describes the flow lines in the data flow diagram
A data flow is a path along which data is transmitted within the system. The description of a data flow typically includes the following:
Data flow description = {data flow name, description, source, destination, composition: {data structures}, average traffic, peak traffic}
"Source" states which process the data flow comes from, i.e., where the data originates; "destination" states which process the data flow goes to, i.e., where the data ends up. "Average traffic" is the number of transmissions per unit of time (per day, week, month, etc.); "peak traffic" is the data traffic during peak periods.
④ Data storage: describes the storage characteristics of the data blocks in the data flow diagram
A data store is where data structures stay or are saved, and it is also one of the sources and destinations of data flows. The description of a data store typically includes the following:
Data storage description = {data storage name, description, number, inflowing data flows, outflowing data flows, composition: {data structures}, data volume, access method}
"Data volume" records how much data is accessed each time and how many times it is accessed per day (or per hour, per week, etc.). "Access method" covers whether access is batch or online, retrieval or update, and sequential or random, among other things.
In addition, each inflowing data flow should indicate its source, and each outflowing data flow should indicate its destination.
⑤ Process: describes the function blocks in the data flow diagram
The data dictionary need only record descriptive information about a process, usually including the following:
Process description = {process name, description, input: {data flows}, output: {data flows}, processing: {brief description}}
The "brief description" mainly states the function of the process and the processing requirements. The function is what the process is used for (not how it is done); the processing requirements include frequency requirements, such as how many transactions and how much data are processed per unit of time, as well as response time requirements. These processing requirements are inputs to the later physical design and criteria for performance evaluation.
Physical Structure Design
The storage structure and access methods of a database on physical devices are called its physical structure; it depends on the chosen database management system. The process of selecting, for a given logical data model, the physical structure that best suits the application requirements is the physical design of the database.
In general, the main content of the physical design of a relational database includes selecting the access methods for the relational schemas and designing the physical storage structures of the database files, such as relations and indexes.
Selection of access methods for relational schemas
An access method is a technique for accessing the data in a database quickly. A database management system generally provides several access methods; the most common are indexing and clustering.
Choosing access methods means, in practice, deciding according to the application requirements which attributes of a relation should be indexed, which attribute groups should get composite indexes, which indexes should be unique indexes, and so on.
Selection of the B+-tree index access method
- If an attribute or attribute group frequently appears in query conditions, consider indexing on it.
- If an attribute is often used as an argument to an aggregate function such as MAX or MIN, consider indexing on it.
- If an attribute or attribute group frequently appears in the join condition of a join operation, consider indexing on it.
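These rules can be tried out with SQLite (via Python's sqlite3 module), whose ordinary indexes are B-tree based; the table and index names are hypothetical. EXPLAIN QUERY PLAN confirms that a selection on the indexed attribute actually uses the index.

```python
import sqlite3

# Hypothetical Student table: sdept appears in query and join conditions
# (rules 1 and 3), age feeds MIN/MAX aggregates (rule 2).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (sno INTEGER PRIMARY KEY, sdept TEXT, age INTEGER)")
conn.execute("CREATE INDEX idx_student_sdept ON Student(sdept)")  # rules 1 and 3
conn.execute("CREATE INDEX idx_student_age   ON Student(age)")    # rule 2
conn.execute("INSERT INTO Student VALUES (1, 'IS', 20)")

# EXPLAIN QUERY PLAN shows that the selection condition uses the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Student WHERE sdept = 'IS'"
).fetchone()
print("idx_student_sdept" in plan[3])   # True
```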
Selection of the hash index access method
The hash access method can be chosen for a relation if one of its attributes appears mainly in equality join conditions or mainly in equality comparison selection conditions, and one of the following two conditions is satisfied.
1. The size of the relation is predictable and constant.
2. The size of the relation changes dynamically, but the database management system provides a dynamic hash access method.
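Why hashing suits only equality conditions can be sketched with a plain Python dict, which behaves like a hash access structure (the data is invented): an exact-match lookup is a single probe, while a range condition degenerates into scanning every entry.

```python
# A dict as a stand-in for a hash access structure keyed on eno,
# the attribute used in equality comparisons.
by_eno = {1: "Zhao", 2: "Qian", 3: "Sun"}

print(by_eno[2])   # 'Qian' -- equality lookup: one hash probe

# A range condition (eno < 3) cannot use the hash: it scans all entries.
print(sorted(name for eno, name in by_eno.items() if eno < 3))
# ['Qian', 'Zhao']
```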
Selection of cluster access methods
To speed up queries on an attribute or attribute group, tuples with the same value on that attribute (group) can be stored in contiguous physical blocks; this is called clustering, and the attribute (group) is called the cluster key.
Clustering can greatly improve the speed of queries on an attribute or attribute group.
For example, suppose we query all students in the information systems department, and there are 500 of them. In the extreme case, the tuples of these 500 students are spread over 500 different physical blocks. Even though the student relation has an index on the department attribute and the tuple identifiers can be found quickly through the index, fetching the tuples still touches 500 physical blocks and performs 500 I/O operations. If the tuples of students in the same department are stored together, each physical block read returns multiple tuples that satisfy the query, significantly reducing the number of disk accesses.
It must be emphasized that clustering only improves the performance of certain applications, and the overhead of building and maintaining clusters is considerable. For an existing relation, establishing a cluster moves the physical storage locations of its tuples and invalidates all indexes previously built on the relation, which must then be rebuilt.