Physical Design of database
The storage structure and access method of the database on the physical device is called the physical structure of the database, it relies on the selected database management system
The process of selecting a physical structure that best suits the application environment for a given logical data model is the physical design of the database
The steps of database physical design
Determine the physical structure of a database, mainly referring to the access method and the storage structure in the relational database
Evaluation of physical structure is focused on time and space efficiency
If the evaluation results meet the original design requirements, you can enter the physical implementation phase, otherwise you will need to redesign or modify the physical structure, and sometimes even return to the logical design stage to modify the data model
Contents and methods of database physical design
Prepare to design the physical database structure
Detailed analysis of the transactions to be run, obtaining the parameters required to select the physical database design
Fully understand the internal characteristics of the RDBMS used, especially the system-provided access method and storage structure
Selecting the parameters required for physical database design
Database query transactions
Relationship of Queries
Properties that are involved in a query condition
Properties involved in the join condition
Projection Properties for queries
Selecting the parameters required for physical database design (cont.)
Data Update transactions
The updated relationship
The properties involved in the update operation condition on each relationship
Modify the property value to be changed by the operation
Frequency and performance requirements for each transaction running on each relationship
The content of the physical design of the relational database
Select access method for relational mode (establish access path)
Design the physical storage structure of database files such as relationships, indexes, etc.
Selection of access methods for relational mode
The database system is a multi-user shared system, to establish multiple access paths for the same relationship to meet the multiple application requirements of multiple users
One of the tasks of physical design is to determine which access methods to choose, which access paths to establish
Common access Methods for DBMS
Index method
Currently the main is the B + Tree Index method
Classic access methods, using the most common
Cluster (Cluster) method
Hash method
Choosing an index access method
Determine according to application requirements
Which property columns to index
Which property columns to set up the composite index
which indexes to design as unique indexes
选择索引存取方法的一般规则如果一个(或一组)属性经常在查询条件中出现,则考虑在这个(或这组)属性上建立索引(或组合索引)如果一个属性经常作为最大值和最小值等聚集函数的参数,则考虑在这个属性上建立索引如果一个(或一组)属性经常在连接操作的连接条件中出现,则考虑在这个(或这组)属性上建立索引关系上定义的索引数过多会带来较多的额外开销 维护索引的开销 查找索引的开销
Cluster
In order to increase the query speed of a property (or group of properties), the grouping of tuples with the same value on this or these properties (called the clustered code) is stored in a contiguous physical block called a cluster
Use of clusters
Greatly improve the efficiency of query by cluster code
For example, if the student relationship is indexed by the department, it is now necessary to query all student lists of the information system.
500 students in the information system perform at least 500 I/O operations when distributed over 500 different physical blocks
If the student tuples of the same department are centrally stored, a single physical block can be used to obtain multiple tuples that meet the query criteria, thus significantly reducing the number of times to access the disk
Save storage space
After clustering, the cluster code of the same tuple is concentrated together, so that the cluster code value does not have to be repeated in each tuple, as long as in a group to save once on the line
Limitations of clustering
- Clustering can only improve the performance of specific applications
- The overhead of building and maintaining a cluster is quite large
Having a cluster on an existing relationship causes the tuple in the relationship to move its physical storage location and invalidate the original index on the relationship, which must be rebuilt
When the cluster code of a tuple changes, the storage location of the tuple should also be moved accordingly.
The applicable range of the cluster
1. Suitable for single-relationship independent clustering, also suitable for multiple relationship combination clusters
Example: Assume that users often have to press the department to query student transcripts, this query involves student relations and elective relationship connection operation, that is, the need to connect the two relations by the number of students, in order to improve the efficiency of the connection operation, you can have the same number of student tuples and elective tuples are physically clustered together. This is equivalent to a number of relationships in the form of "pre-connected" storage, which greatly improve the efficiency of the connection operation.
- Clustering can be used when access or connection through a clustered code is the primary application of the relationship, and other access that is not related to the cluster code is small or minor.
In particular, when a SQL statement contains clauses or phrases such as the order By,group BY,UNION,DISTINCT related to the cluster code, the use of clustering is particularly advantageous, eliminating the sorting operation on the result set
设计候选聚簇对经常在一起进行连接操作的关系可以建立聚簇如果一个关系的一组属性经常出现在相等比较条件中,则该单个关系可建立聚簇如果一个关系的一个(或一组)属性上的值重复率很高,则此单个关系可建立聚簇。即对应每个聚簇码值的平均元组数不太少。太少了,聚簇的效果不明显
Optimal cluster design
Delete the relationship of frequent full table scan from cluster;
Remove the update operation from the cluster far more than the relationship of the connection operation;
Different clusters may contain the same relationship, a relationship can be in one cluster, but not multiple clusters at the same time
Choose a better one from these multiple clustering schemes, including non-clustered clusters, where the total cost of running transactions on this cluster is minimal
Rules for choosing a hash access method
When a relationship satisfies the following two conditions, you can choose the hash access method
The properties of the relationship mainly appear in the equivalent join condition or appear mainly in the Equality comparison selection condition
The size of the relationship is predictable and constant;
Or
The size of the relationship changes dynamically, but the selected DBMS provides a dynamic hash access method
Determining the storage structure of a database
Determining the contents of a database's physical structure
1. Determine the location and storage structure of the data
Relationship
Index
Cluster
Log
Backup
2. Determine the System configuration
Factors that determine the location and storage structure of data
Access time
Storage space Utilization
Maintenance costs
These three aspects are often contradictory to one another.
Example: eliminating all redundant data saves storage space and reduces maintenance costs, but often leads to increased retrieval costs
Tradeoffs must be made to choose a compromise solution
Determine where the data is stored
Basic principles
Depending on the application, the
The variable part is stored separately from the stable part
High frequency of access and lower frequency of access to the part, stored separately
例:数据库数据备份、日志文件备份等由于只在故障恢复时才使用,而且数据量很大,可以考虑存放在磁带上如果计算机有多个磁盘或磁盘阵列 ,可以考虑将表和索引分别放在不同的磁盘上,在查询时,由于磁盘驱动器并行工作,可以提高物理I/O读写的效率 例(续):可以将比较大的表分别放在两个磁盘上,以加快存取速度,这在多用户环境下特别有效可以将日志文件与数据库对象(表、索引等)放在不同的磁盘以改进系统的性能
DBMS products typically provide some storage allocation parameters
Number of users using the database at the same time
Number of database objects open at the same time
Memory allocation parameters
Length, number of buffers used
Storage Allocation parameters
.......
Evaluate the physical structure
Evaluation Content
This paper makes a detailed evaluation of the various schemes produced in the process of database physical design, and chooses a better scheme as the physical structure of the database.
Evaluation method (fully dependent on the DBMS chosen)
Quantitative estimation of various scenarios
Storage space
Access time
Maintenance costs
The evaluation results are weighed and compared, and a better rational physical structure is chosen.
If the structure does not meet user requirements, you need to modify the design
Once the database structure is set up, you can load the data into the database. Organizing data warehousing is the most important work in the database implementation phase.
Data Loading methods
Manual method
Computer-assisted data warehousing
Trial run of the database
Emphasize two points:
Organize data warehousing in batches
Re-engineering physical structures and even logical structures can cause data to be re-put into storage.
Because the data storage workload is too large, time-consuming, laborious, so should be organized in batches of data warehousing
Enter small batch data for trial
After the test run basic qualified and then bulk input data
Gradually increase the amount of data, and gradually complete the operation evaluation
Dumps and restores of databases
In the database trial run phase, the system is still unstable, hard, software failure can occur at any time
The operator of the system is not familiar with the new system, and the misoperation is unavoidable.
Therefore, it is necessary to do a good job of database dump and recovery to minimize the damage to the database.
Operation and maintenance of database
After the database test run is qualified, the database can be put into full operation.
Database commissioning marks the beginning of basic completion and maintenance of development tasks
It is a long-term task to evaluate, adjust and revise the database design, and it is also the continuation and improvement of the design work.
The application environment is constantly changing
Physical storage is constantly changing during database operation
During the database run phase, the regular maintenance of the database is performed primarily by the DBA, including:
Dumps and restores of databases
Security and integrity control of the database
Monitoring, analysis and improvement of database performance
Re-organization and re-structuring of databases
Re-organization and re-structuring of databases
form of the re-organization
All re-organization
Partial re-organization
Re-organize only the frequently added and deleted tables
Objectives of the re-organization
Improve system performance
Re-organized work
According to the original design requirements
Reschedule Storage Locations
Recycling waste
Reduce the pointer chain
The reconfiguration of the database does not change the logical and physical structure of the original design data
Database re-construction
Adjust the database mode and internal mode according to the new environment
Add a new data item
Change the type of the data item
Changing the capacity of a database
Add or remove an index
Modify integrity Constraint conditions
The formation of models at all levels of databases
The model of the database is gradually formed during the design process.
Requirements analysis phase to synthesize the application needs of individual users (real-world needs)
The conceptual design phase forms a conceptual model independent of the machine characteristics, independent of each DBMS product (the information World model), which is described by the E-r diagram.
In the logic design phase, the E-r diagram is transformed into a data model supported by the database product, such as the relational model, which forms the database logical mode. Then according to the user processing requirements, security considerations, based on the basic table to establish the necessary view (view) to form the external mode of the data
In the physical design phase according to the DBMS characteristics and processing needs, the physical storage arrangement, the design index, forms the database inside the pattern
Database-Physical Design