Database-Physical Design

Source: Internet
Author: User

Database-Physical Design
Physical Design of databases

The storage structure and access method of a database on physical devices are called the physical structure of a database. It depends on the selected database management system.
The process of selecting a physical structure that is most suitable for the application environment for a given Logical Data Model is the physical design of the database.

Steps for physical database design
Determine the physical structure of a database. In a relational database, it mainly refers to the access method and storage structure.
Evaluate the physical structure, focusing on time and space efficiency
If the evaluation results meet the original design requirements, you can enter the physical implementation stage. Otherwise, you need to re-design or modify the physical structure, and sometimes even return to the logical design stage to modify the data model.

Contents and methods of database physical design

Preparations for designing the physical database structure
Perform a detailed analysis on the transactions to be executed to obtain the required parameters for selecting the physical database design.
Fully understand the internal characteristics of the RDBMS, especially the access methods and storage structures provided by the system
Select the parameters required for the physical database design
Database Query transactions
Query relationship
Attributes involved in the query Condition
Attributes related to connection conditions
Query projection attributes

Select the parameters required for the physical database design (continued)
Data Update transaction
Updated Link
Attributes involved in update operation conditions on each link
Modify the attribute value to be changed
The frequency and performance requirements of each transaction on each link

Physical Design of Relational Database
Select an access method for the link mode (create an access path)
Design the physical storage structure of relational, index, and other database files

Select a link Mode Access Method

A database system is shared by multiple users. Multiple access paths must be established for the same link to meet the requirements of multiple applications.
One of the physical design tasks is to determine which access methods to choose, that is, to create which access paths
Common DBMS access methods
Index Method
Currently, the main method is the B + tree index method.
Classic access method, most commonly used
Cluster Method
HASH Method

Select an index access method

Determined Based on Application Requirements
Which attribute columns are indexed?
Create a composite index for Attribute Columns
Which indexes should be designed as unique indexes?

Select the general rules for the index access method. If one (or a group) attribute is frequently found in the query condition, you need to create an index (or composite index) on this (or this group) attribute) if an attribute is often used as a parameter for Aggregate functions such as the maximum and minimum values, you must create an index for this attribute. If an (or a group) attribute is frequently used in the connection condition of the connection operation, if the index relationship is set up on this (or this set of attributes), too many indexes are defined, resulting in a large amount of additional overhead for maintaining the index overhead for searching the index.
Clustering

To improve the query speed of an attribute (or attribute group), the tuples with the same value on this or these attributes (called clustering codes) are stored in a continuous physical block called clustering.

Usage of clustering

Greatly improves query efficiency by clustering code
For example, if the student relationship is indexed by the department, you need to query the list of all students in the information system.
500 students in the Information System must perform at least 500 I/O operations on 500 different physical blocks.
If the student tuples of the same series are stored in a centralized manner, multiple tuples meeting the query conditions can be obtained for each read physical block, significantly reducing the number of disk accesses.

Save storage space
After clustering, clustering codes have the same tuples together, so clustering code values do not have to be stored repeatedly in each tuple, as long as they are saved once in a group.

The limitations of clustering can only improve the performance of certain applications. The overhead of building and maintaining clustering is quite large.
Creating a cluster for an existing link will cause the link Meta Group to move its physical storage location and make the original index on the Link invalid and must be rebuilt.
When the clustering code of a tuples changes, the storage location of the tuples must be moved accordingly.

Applicable scope of clustering
1. applicable to single-link independent clustering and multiple link combination Clustering
For example, if the user needs to query the student Transcript by department, this query involves the connection operation between the student relationship and the elective relationship, that is, the two relationships need to be connected by student ID, to improve the efficiency of connection operations, student tuples and optional tuples with the same student ID can be physically clustered. This is equivalent to storing multiple links in the form of "pre-Connections", which greatly improves the efficiency of connection operations.

When you access or connect to a cluster by using a cluster code, the cluster can be used if there are few or secondary accesses unrelated to the cluster code.
Clustering is especially advantageous when SQL statements contain clauses or phrases related to clustering codes, such as ORDER BY, GROUP BY, UNION, and DISTINCT, this removes the need to sort result sets.
Design the relationship between candidate clustering pairs that are often connected together to establish a clustering. If a group of attributes of a relationship often appear in equal comparison conditions, A single link can be used to establish a clustering. If the repetition rate of values on one (or a group) attribute of a link is high, a single link can be used to establish a clustering. That is, the average number of metagroups corresponding to each clustering code value is not small. The effect of clustering is not obvious.

Optimize cluster Design
Delete the relationships that frequently perform full table scans from the cluster;
The relationship between deleting update operations from the cluster is far greater than that of connection operations;
Different clustering clusters may contain the same relationship. A link can be in a cluster, but cannot be added to multiple clusters at the same time.
Select a better clustering solution (including not creating clustering), that is, the minimum total cost of running various transactions on the cluster.

Rules for selecting HASH access methods

When a link meets the following two conditions, you can select the HASH access method.
The attributes of the link are mainly displayed in the equivalent join condition or in the equal comparison selection condition.
The relationship size is predictable and remains unchanged;
Or
The relationship size changes dynamically, but the selected DBMS provides a dynamic HASH access method.

Determine the database storage structure

Determine the physical structure of the database
1. Determine the data storage location and Storage Structure
Link
Index
Clustering
Logs
Backup
2. Confirm System Configuration
Factors for determining the data storage location and Storage Structure
Access time
Storage space utilization
Maintenance Cost
These three aspects are often conflicting.
For example, eliminating all redundant data can save storage space and reduce maintenance costs, but it often leads to an increase in Retrieval costs.
You must make a trade-off and select a compromise.

Determine the data storage location

Basic Principles
Set
The variable part and the stable part are stored separately.
High Access frequency and low access frequency are separated for storage

For example, database data backup and log file backup are used only for fault recovery, and the data volume is large. You can consider storing them on tape. If the computer has multiple disks or disk arrays, you can consider placing the tables and indexes on different disks separately. During query, the disk drive works in parallel to improve the efficiency of physical I/O read/write (continued ): you can store large tables on two disks to accelerate access. This is especially effective in multi-user environments for storing log files and database objects (tables and indexes) put on different disks to improve system performance

DBMS products generally provide storage allocation parameters
Number of users who use the database at the same time
Number of database objects simultaneously opened
Memory Allocation Parameters
Length and number of buffers used
Storage Allocation Parameters
.......

Review physical structure

Rating content
Evaluate the various solutions generated during the physical design of the database, and select a better solution as the physical structure of the database.
Evaluation Method (fully dependent on the selected DBMS)
Quantitative Estimation of various schemes
Storage space
Access time
Maintenance Cost
Weigh and compare the estimation results and select a reasonable physical structure.
If the structure does not meet your needs, you need to modify the design.

After the database structure is established, you can load data into the database. Organizing data warehouse receiving is the most important task in the database implementation phase.

Data Loading Method
Manual method
Computer-aided data warehouse receiving

Database trial run

Emphasize two points:
Organize data warehouse receiving in batches in installments
Re-designing the physical structure or even the logical structure will lead to data re-warehouse receiving.
Because the data warehouse receiving workload is too large and time-consuming and labor-consuming, data warehouse receiving should be organized in batches.
Enter a small batch of data for trial
Input data in large batches after the test run is passed
Gradually increase data volume and complete operation evaluation

Database dumping and recovery
In the database Trial Run phase, the system is still unstable and hardware and software faults may occur at any time.
System operators are not familiar with the new system, and Misoperation is inevitable.
Therefore, we must do a good job of database dumping and recovery to minimize the damage to the database.

Database Operation and Maintenance

After the database has passed the test run, the database can be put into operation.
Database input indicates the basic completion of development tasks and the start of maintenance.
It is a long-term task to evaluate, adjust, modify, and maintain the database design. It is also the continuation and improvement of the design work.
Changing application environment
The physical storage changes constantly during database operation.

In the database operation stage, frequent maintenance of the database is mainly performed by the DBA, including:
Database dumping and recovery
Database Security and Integrity Control
Monitoring, analysis and improvement of database performance
Re-organization and re-construction of databases

Re-organization and re-construction of databases
Organizational Structure
All organizations
Partially re-organized
Only reorganizes tables that are frequently added or deleted
Target of re-organization
Improve system performance

Re-Organizational Work
Based on original design requirements
Reschedule storage locations
Garbage Collection
Reduce pointer chains
The re-organization of the database does not change the logic structure and physical structure of the original data.

Database Reconstruction
Adjust the database mode and internal mode according to the new environment
Add new data items
Change data item type
Change database capacity
Add or delete an index
Modify integrity constraints

Formation of database models at all levels
Database models at all levels are gradually formed during the design process
The demand analysis stage integrates the application requirements of various users (real-world requirements)
Concept Design Stage forms a concept model (information world model) independent of the machine characteristics, independent of various DBMS products, which is described by E-R diagram.

In the logic design stage, the E-R diagram is converted into the data model supported by the specific database product, such as the relational model, to form the database logic model. Then, based on the user's processing requirements and security considerations, the necessary VIEW is created on the basis of the basic table to form an external data mode.
In the physical design stage, physical storage is arranged based on DBMS characteristics and processing needs, and indexes are designed to form the internal database mode.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.