Database-Physical Design

Source: Internet
Author: User

Physical Design of database

The storage structure and access method of the database on the physical device is called the physical structure of the database, it relies on the selected database management system
The process of selecting a physical structure that best suits the application environment for a given logical data model is the physical design of the database

The steps of database physical design
Determine the physical structure of a database, mainly referring to the access method and the storage structure in the relational database
Evaluation of physical structure is focused on time and space efficiency
If the evaluation results meet the original design requirements, you can enter the physical implementation phase, otherwise you will need to redesign or modify the physical structure, and sometimes even return to the logical design stage to modify the data model

Contents and methods of database physical design

Prepare to design the physical database structure
Detailed analysis of the transactions to be run, obtaining the parameters required to select the physical database design
Fully understand the internal characteristics of the RDBMS used, especially the system-provided access method and storage structure
Selecting the parameters required for physical database design
Database query transactions
Relationship of Queries
Properties that are involved in a query condition
Properties involved in the join condition
Projection Properties for queries

Selecting the parameters required for physical database design (cont.)
Data Update transactions
The updated relationship
The properties involved in the update operation condition on each relationship
Modify the property value to be changed by the operation
Frequency and performance requirements for each transaction running on each relationship

The content of the physical design of the relational database
Select access method for relational mode (establish access path)
Design the physical storage structure of database files such as relationships, indexes, etc.

Selection of access methods for relational mode

The database system is a multi-user shared system, to establish multiple access paths for the same relationship to meet the multiple application requirements of multiple users
One of the tasks of physical design is to determine which access methods to choose, which access paths to establish
Common access Methods for DBMS
Index method
Currently the main is the B + Tree Index method
Classic access methods, using the most common
Cluster (Cluster) method
Hash method

Choosing an index access method

Determine according to application requirements
Which property columns to index
Which property columns to set up the composite index
which indexes to design as unique indexes

选择索引存取方法的一般规则如果一个(或一组)属性经常在查询条件中出现,则考虑在这个(或这组)属性上建立索引(或组合索引)如果一个属性经常作为最大值和最小值等聚集函数的参数,则考虑在这个属性上建立索引如果一个(或一组)属性经常在连接操作的连接条件中出现,则考虑在这个(或这组)属性上建立索引关系上定义的索引数过多会带来较多的额外开销 维护索引的开销 查找索引的开销
Cluster

In order to increase the query speed of a property (or group of properties), the grouping of tuples with the same value on this or these properties (called the clustered code) is stored in a contiguous physical block called a cluster

Use of clusters
    1. Greatly improve the efficiency of query by cluster code
      For example, if the student relationship is indexed by the department, it is now necessary to query all student lists of the information system.
      500 students in the information system perform at least 500 I/O operations when distributed over 500 different physical blocks
      If the student tuples of the same department are centrally stored, a single physical block can be used to obtain multiple tuples that meet the query criteria, thus significantly reducing the number of times to access the disk

    2. Save storage space
      After clustering, the cluster code of the same tuple is concentrated together, so that the cluster code value does not have to be repeated in each tuple, as long as in a group to save once on the line

Limitations of clustering
    1. Clustering can only improve the performance of specific applications
    2. The overhead of building and maintaining a cluster is quite large
      Having a cluster on an existing relationship causes the tuple in the relationship to move its physical storage location and invalidate the original index on the relationship, which must be rebuilt
      When the cluster code of a tuple changes, the storage location of the tuple should also be moved accordingly.

The applicable range of the cluster
1. Suitable for single-relationship independent clustering, also suitable for multiple relationship combination clusters
Example: Assume that users often have to press the department to query student transcripts, this query involves student relations and elective relationship connection operation, that is, the need to connect the two relations by the number of students, in order to improve the efficiency of the connection operation, you can have the same number of student tuples and elective tuples are physically clustered together. This is equivalent to a number of relationships in the form of "pre-connected" storage, which greatly improve the efficiency of the connection operation.

    1. Clustering can be used when access or connection through a clustered code is the primary application of the relationship, and other access that is not related to the cluster code is small or minor.
      In particular, when a SQL statement contains clauses or phrases such as the order By,group BY,UNION,DISTINCT related to the cluster code, the use of clustering is particularly advantageous, eliminating the sorting operation on the result set
设计候选聚簇对经常在一起进行连接操作的关系可以建立聚簇如果一个关系的一组属性经常出现在相等比较条件中,则该单个关系可建立聚簇如果一个关系的一个(或一组)属性上的值重复率很高,则此单个关系可建立聚簇。即对应每个聚簇码值的平均元组数不太少。太少了,聚簇的效果不明显

Optimal cluster design
Delete the relationship of frequent full table scan from cluster;
Remove the update operation from the cluster far more than the relationship of the connection operation;
Different clusters may contain the same relationship, a relationship can be in one cluster, but not multiple clusters at the same time
Choose a better one from these multiple clustering schemes, including non-clustered clusters, where the total cost of running transactions on this cluster is minimal

Rules for choosing a hash access method

When a relationship satisfies the following two conditions, you can choose the hash access method
The properties of the relationship mainly appear in the equivalent join condition or appear mainly in the Equality comparison selection condition
The size of the relationship is predictable and constant;
Or
The size of the relationship changes dynamically, but the selected DBMS provides a dynamic hash access method

Determining the storage structure of a database

Determining the contents of a database's physical structure
1. Determine the location and storage structure of the data
Relationship
Index
Cluster
Log
Backup
2. Determine the System configuration
Factors that determine the location and storage structure of data
Access time
Storage space Utilization
Maintenance costs
These three aspects are often contradictory to one another.
Example: eliminating all redundant data saves storage space and reduces maintenance costs, but often leads to increased retrieval costs
Tradeoffs must be made to choose a compromise solution

Determine where the data is stored

Basic principles
Depending on the application, the
The variable part is stored separately from the stable part
High frequency of access and lower frequency of access to the part, stored separately

例:数据库数据备份、日志文件备份等由于只在故障恢复时才使用,而且数据量很大,可以考虑存放在磁带上如果计算机有多个磁盘或磁盘阵列 ,可以考虑将表和索引分别放在不同的磁盘上,在查询时,由于磁盘驱动器并行工作,可以提高物理I/O读写的效率 例(续):可以将比较大的表分别放在两个磁盘上,以加快存取速度,这在多用户环境下特别有效可以将日志文件与数据库对象(表、索引等)放在不同的磁盘以改进系统的性能

DBMS products typically provide some storage allocation parameters
Number of users using the database at the same time
Number of database objects open at the same time
Memory allocation parameters
Length, number of buffers used
Storage Allocation parameters
.......

Evaluate the physical structure

Evaluation Content
This paper makes a detailed evaluation of the various schemes produced in the process of database physical design, and chooses a better scheme as the physical structure of the database.
Evaluation method (fully dependent on the DBMS chosen)
Quantitative estimation of various scenarios
Storage space
Access time
Maintenance costs
The evaluation results are weighed and compared, and a better rational physical structure is chosen.
If the structure does not meet user requirements, you need to modify the design

Once the database structure is set up, you can load the data into the database. Organizing data warehousing is the most important work in the database implementation phase.

Data Loading methods
Manual method
Computer-assisted data warehousing

Trial run of the database

Emphasize two points:
Organize data warehousing in batches
Re-engineering physical structures and even logical structures can cause data to be re-put into storage.
Because the data storage workload is too large, time-consuming, laborious, so should be organized in batches of data warehousing
Enter small batch data for trial
After the test run basic qualified and then bulk input data
Gradually increase the amount of data, and gradually complete the operation evaluation

Dumps and restores of databases
In the database trial run phase, the system is still unstable, hard, software failure can occur at any time
The operator of the system is not familiar with the new system, and the misoperation is unavoidable.
Therefore, it is necessary to do a good job of database dump and recovery to minimize the damage to the database.

Operation and maintenance of database

After the database test run is qualified, the database can be put into full operation.
Database commissioning marks the beginning of basic completion and maintenance of development tasks
It is a long-term task to evaluate, adjust and revise the database design, and it is also the continuation and improvement of the design work.
The application environment is constantly changing
Physical storage is constantly changing during database operation

During the database run phase, the regular maintenance of the database is performed primarily by the DBA, including:
Dumps and restores of databases
Security and integrity control of the database
Monitoring, analysis and improvement of database performance
Re-organization and re-structuring of databases

Re-organization and re-structuring of databases
form of the re-organization
All re-organization
Partial re-organization
Re-organize only the frequently added and deleted tables
Objectives of the re-organization
Improve system performance

Re-organized work
According to the original design requirements
Reschedule Storage Locations
Recycling waste
Reduce the pointer chain
The reconfiguration of the database does not change the logical and physical structure of the original design data

Database re-construction
Adjust the database mode and internal mode according to the new environment
Add a new data item
Change the type of the data item
Changing the capacity of a database
Add or remove an index
Modify integrity Constraint conditions

The formation of models at all levels of databases
The model of the database is gradually formed during the design process.
Requirements analysis phase to synthesize the application needs of individual users (real-world needs)
The conceptual design phase forms a conceptual model independent of the machine characteristics, independent of each DBMS product (the information World model), which is described by the E-r diagram.

In the logic design phase, the E-r diagram is transformed into a data model supported by the database product, such as the relational model, which forms the database logical mode. Then according to the user processing requirements, security considerations, based on the basic table to establish the necessary view (view) to form the external mode of the data
In the physical design phase according to the DBMS characteristics and processing needs, the physical storage arrangement, the design index, forms the database inside the pattern

Database-Physical Design

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.