Distributed Database Overview

Source: Internet
Author: User

1. Preface

With the maturity of traditional database technology, the rapid development of computer network technology, and the expanding scope of applications, database applications are now widely built on computer networks. In this setting, centralized database systems show their shortcomings. Data is already stored across the network wherever it is needed, yet it must be processed centrally, which inevitably causes high communication overhead. Applications run on a single computer, so if that computer fails the entire system is affected, and reliability is low. Centralized processing also makes the system's scale and configuration inflexible and limits its scalability. As a result, the "centralized computing" concept behind centralized databases is evolving toward "distributed computing", which mainly involves the client/server (C/S) model and the distributed database architecture.

2. Distributed Database System (DDBS) Overview

Against this background, research into database systems whose defining characteristic is distribution attracted attention. A distributed database (DDB) is a product of combining database technology with network technology, and it has become a branch of the database field in its own right. Research on distributed databases began in the mid-1970s. The world's first distributed database system, SDD-1, was implemented by the Computer Corporation of America (CCA) on DEC computers in 1979. Since the 1990s, distributed database systems have entered the stage of commercial application: the traditional relational database products have all evolved into distributed products built around computer networks and multitasking operating systems, and at the same time distributed databases have gradually moved toward the client/server model.

3. DDBS Classification

1) Homogeneous, same-DBMS DDBS: every site uses the same type of data model (for example, all relational) and the same DBMS product.
2) Homogeneous, mixed-DBMS DDBS: every site uses the same type of data model, but the DBMS products differ, for example DB2, ORACLE, SYBASE, and SQL Server.
3) Heterogeneous DDBS: the sites use different DBMS products and may even use different types of data model. As computer network technology has developed, the problem of connecting heterogeneous machines over a network has largely been solved, so a heterogeneous DDBS can give access to the data held in the various heterogeneous local databases across the entire network.
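
A minimal Python sketch of one reading of these three categories, assuming each site can be described only by the type of its data model and its DBMS product. The `Site` class and `classify_ddbs` function are hypothetical names for illustration. The data-model type is checked first, so sites that differ in data model make the system heterogeneous regardless of which products they run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """One site of a hypothetical DDBS, described only by what classification needs."""
    name: str
    data_model: str    # type of data model, e.g. "relational" or "hierarchical"
    dbms_product: str  # DBMS product, e.g. "DB2", "ORACLE", "SYBASE", "SQL Server"


def classify_ddbs(sites):
    """Return the DDBS category for a collection of sites."""
    data_models = {site.data_model for site in sites}
    dbms_products = {site.dbms_product for site in sites}
    if len(data_models) > 1:
        # Different types of data model at different sites.
        return "heterogeneous"
    if len(dbms_products) > 1:
        # Same data model everywhere, but different DBMS products.
        return "homogeneous, mixed DBMS"
    # Same data model and the same DBMS product at every site.
    return "homogeneous, same DBMS"


if __name__ == "__main__":
    # prints: homogeneous, mixed DBMS
    print(classify_ddbs([Site("S1", "relational", "ORACLE"),
                         Site("S2", "relational", "DB2")]))
```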

4. Features, Advantages, and Disadvantages of DDBS

Basic features of DDBS:
1) Physical distribution: data is not stored at one site but at multiple sites on the computer network.
2) Logical integrity: although the data is physically distributed across sites, it forms one logical whole that is shared by all users and managed by a DDBMS.
3) Site autonomy: the data at each site is managed by a local DBMS, which can process it autonomously to run the site's local applications.
4) Cooperation between sites: although each site is highly autonomous, the sites cooperate to form a whole.
Other features of DDBS:
1) Data independence
2) Combined centralized and autonomous control
3) Appropriately increased data redundancy
4) Distributed transaction management

Advantages of DDBS:

1) Flexible architecture
2) Suits organizations whose management and control are themselves distributed
3) Good economic performance
4) High system reliability and availability
5) Fast response for local applications
6) Good scalability; easy to integrate with existing systems
Disadvantages of DDBS:
1) High system overhead, mainly from communication.
2) Complex access structures, such as secondary indexes and file linking, are an important technique for efficient data access in a centralized DBS, but they are not necessarily effective in a distributed system.
3) Data security and confidentiality are difficult to ensure.

5. Data Fragmentation (Sharding)

Types:
1) Horizontal fragmentation: all tuples of a global relation are divided into disjoint subsets according to certain conditions. Each subset is a horizontal fragment of the relation.
2) Vertical fragmentation: the attribute set of a global relation is divided into several subsets, and the relation is projected onto each subset. Each projection is called a vertical fragment.
3) Derived fragmentation: also called derived horizontal fragmentation. The horizontal fragmentation condition is defined not on the relation's own attributes but on the attributes of another relation.
4) Hybrid fragmentation: a combination of the above three methods. Horizontal fragmentation may be applied before vertical fragmentation, vertical before horizontal, or in other orders, and the resulting fragments differ depending on the order.
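
To make the fragmentation types concrete, here is a small Python sketch that treats a relation as a list of dict rows. The relations `EMP` and `DEPT`, their attributes, and the helper functions are all hypothetical. Horizontal fragmentation is a selection, vertical fragmentation is a projection that keeps the primary key, and derived fragmentation splits `EMP` by the fragments of `DEPT` through a semi-join on the department number.

```python
# Hypothetical sample relations; each relation is a list of rows, each row a dict.
DEPT = [
    {"dno": "D1", "city": "Beijing"},
    {"dno": "D2", "city": "Shanghai"},
]
EMP = [
    {"eno": 1, "name": "Li", "dno": "D1", "salary": 5000},
    {"eno": 2, "name": "Wang", "dno": "D2", "salary": 7000},
    {"eno": 3, "name": "Zhang", "dno": "D1", "salary": 6500},
]


def horizontal_fragments(relation, predicates):
    """Selection: one fragment per predicate (predicates assumed disjoint and exhaustive)."""
    return [[row for row in relation if pred(row)] for pred in predicates]


def vertical_fragments(relation, key, attribute_groups):
    """Projection: each fragment keeps the primary key plus one group of attributes.

    Keeping the key in every fragment means no duplicate rows arise and the
    relation can later be rebuilt by a join on the key.
    """
    return [[{a: row[a] for a in (key, *group)} for row in relation]
            for group in attribute_groups]


def derived_fragments(relation, join_attr, owner_fragments):
    """Derived horizontal fragmentation: split `relation` by another relation's
    fragments, keeping the rows whose join attribute matches each owner fragment."""
    result = []
    for owner in owner_fragments:
        keys = {row[join_attr] for row in owner}
        result.append([row for row in relation if row[join_attr] in keys])
    return result


# Horizontal: DEPT by city. Derived: EMP follows DEPT's fragments via dno.
dept_frags = horizontal_fragments(DEPT, [lambda r: r["city"] == "Beijing",
                                         lambda r: r["city"] != "Beijing"])
emp_frags = derived_fragments(EMP, "dno", dept_frags)
# Hybrid: vertical fragmentation applied to each derived horizontal fragment.
hybrid = [vertical_fragments(f, "eno", [("name", "dno"), ("salary",)]) for f in emp_frags]
```

Reversing the order in the last line (vertical first, then horizontal) would give a different set of fragments, which is the point made in item 4).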

Conditions:
1) Completeness: all data in the global relation must be mapped to fragments; no data of the global relation may be left out of every fragment.
2) Reconstructability: the global relation must be rebuildable from its fragments. For horizontal fragmentation it is reconstructed by union, and for vertical fragmentation by join.
3) Disjointness: apart from the primary key, which is repeated in every vertical fragment, the fragments obtained by splitting a global relation must not overlap.
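
The three conditions can be checked mechanically. The Python sketch below, using hypothetical helper names and sample data, tests completeness and disjointness and rebuilds a global relation by union (horizontal case) or by a join on the primary key (vertical case). It is a simplification that models relations as lists of dict rows, not a DBMS implementation.

```python
def is_complete(relation, fragments):
    """Completeness (horizontal case): every row of the global relation is in some fragment."""
    return all(any(row in frag for frag in fragments) for row in relation)


def reconstruct_horizontal(fragments):
    """Reconstruction for horizontal fragments: the union of all fragments."""
    result = []
    for frag in fragments:
        for row in frag:
            if row not in result:  # union removes duplicates
                result.append(row)
    return result


def reconstruct_vertical(fragments, key):
    """Reconstruction for vertical fragments: natural join on the primary key."""
    joined = {row[key]: dict(row) for row in fragments[0]}
    for frag in fragments[1:]:
        by_key = {row[key]: row for row in frag}
        # Inner join: keep only key values present on both sides.
        joined = {k: {**row, **by_key[k]} for k, row in joined.items() if k in by_key}
    return list(joined.values())


def horizontal_disjoint(fragments):
    """Disjointness (horizontal case): no row appears in more than one fragment."""
    seen = []
    for frag in fragments:
        for row in frag:
            if row in seen:
                return False
            seen.append(row)
    return True


def vertical_disjoint(fragments, key):
    """Disjointness (vertical case): only the primary key may be shared between fragments."""
    seen = set()
    for frag in fragments:
        attrs = set(frag[0]) - {key} if frag else set()
        if attrs & seen:
            return False
        seen |= attrs
    return True


# Hypothetical global relation and a correct fragmentation of it.
EMP = [{"eno": 1, "name": "Li", "salary": 5000},
       {"eno": 2, "name": "Wang", "salary": 7000}]
horizontal = [[EMP[0]], [EMP[1]]]
vertical = [[{"eno": r["eno"], "name": r["name"]} for r in EMP],
            [{"eno": r["eno"], "salary": r["salary"]} for r in EMP]]
```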

6. Data allocation methods

1) Centralized: all fragments are placed at the same site.
2) Partitioned: all data is divided into several logical fragments, and each fragment is assigned to one specific site.
3) Full replication: the data is stored repeatedly at every site, so each site holds a complete copy.
4) Hybrid: a scheme between partitioned and full replication, in which each fragment is stored at one or more sites, but not necessarily at all of them.
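
As a rough illustration, this Python sketch maps hypothetical fragments to sites under each of the four allocation methods and counts how many fragment copies each one stores. The round-robin site choice and the `replicas` parameter for the hybrid method are assumptions for illustration only.

```python
def allocate(fragments, sites, scheme, replicas=2):
    """Map each fragment to the list of sites that store it (hypothetical helper).

    Site choice here is simple round-robin; a real design would weigh access
    frequency, storage, and communication cost.
    """
    if scheme == "centralized":
        # Every fragment at one site.
        return {f: [sites[0]] for f in fragments}
    if scheme == "partitioned":
        # Exactly one copy of each fragment, spread over the sites.
        return {f: [sites[i % len(sites)]] for i, f in enumerate(fragments)}
    if scheme == "full_replication":
        # A complete copy of the data at every site.
        return {f: list(sites) for f in fragments}
    if scheme == "hybrid":
        # Each fragment on `replicas` consecutive sites: more copies than
        # partitioned, fewer than full replication when replicas < len(sites).
        n = min(replicas, len(sites))
        return {f: [sites[(i + k) % len(sites)] for k in range(n)]
                for i, f in enumerate(fragments)}
    raise ValueError(f"unknown allocation scheme: {scheme}")


def copies_stored(allocation):
    """Total fragment copies stored, a rough proxy for storage and update cost."""
    return sum(len(s) for s in allocation.values())
```

More copies favor local reads and availability, while fewer copies reduce storage and the cost of propagating updates; the hybrid method trades between the two.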

7. Architecture

This architecture has three notable characteristics:
1) Separating data fragmentation from data allocation yields the concept of "data distribution independence".
2) Explicit control of data redundancy: the allocation schema shows clearly which data is placed at each site, which makes the system easier to manage.
3) Independence from local DBMSs: this feature, also called "local mapping transparency", lets us study DDB management issues without considering the particular data model of each local DBMS.
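
The following Python sketch, under simplifying assumptions, shows why separating fragmentation from allocation gives data distribution independence. It keeps a hypothetical fragmentation schema and allocation schema as dictionaries; users insert into and read only the global relation `EMP`, so moving a fragment to another site changes the allocation schema but not the result users see.

```python
# Hypothetical schema layers for one global relation, EMP.
# Fragmentation schema: global relation -> {fragment name: defining predicate}.
fragmentation_schema = {
    "EMP": {
        "EMP1": lambda row: row["dno"] == "D1",
        "EMP2": lambda row: row["dno"] != "D1",
    },
}
# Allocation schema: fragment name -> site that stores it.
allocation_schema = {"EMP1": "S1", "EMP2": "S2"}
# Local databases: site -> {fragment name: rows}.
site_storage = {"S1": {}, "S2": {}, "S3": {}}


def insert_global(relation, row):
    """Insert into the global relation; the schemas decide fragment and site."""
    for fragment, predicate in fragmentation_schema[relation].items():
        if predicate(row):
            site = allocation_schema[fragment]
            site_storage[site].setdefault(fragment, []).append(row)
            return fragment, site
    raise ValueError("row belongs to no fragment (completeness violated)")


def read_global(relation):
    """Rebuild the global relation as the union of its horizontal fragments."""
    rows = []
    for fragment in fragmentation_schema[relation]:
        site = allocation_schema[fragment]
        rows.extend(site_storage[site].get(fragment, []))
    return rows


def move_fragment(fragment, new_site):
    """Reallocate a fragment: only the allocation schema and local storage change."""
    old_site = allocation_schema[fragment]
    rows = site_storage[old_site].pop(fragment, [])
    site_storage.setdefault(new_site, {})[fragment] = rows
    allocation_schema[fragment] = new_site
```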

8. Distributed Database Management System

The main functions of a DDBMS are:
1) Accept user requests and determine where to send them, that is, which computers must be accessed to satisfy each request.
2) Consult the network data dictionary to learn how to request and use the information.
3) Perform distributed processing when the target data is stored on more than one computer in the system.
4) Provide communication interfaces that coordinate among users, the local DBMS, and the DBMSs of other computers.
5) In a heterogeneous distributed processing environment, also support porting of data and processes to bridge the hardware and software differences between sites.
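
A minimal sketch of the first three functions in Python, assuming a hypothetical network data dictionary that records, for each fragment of a global relation, the city it covers and the sites holding a copy. `plan_request` chooses a site for each needed fragment, preferring a local copy, and reports whether the request can be handled locally, at one remote site, or requires distributed processing.

```python
# Hypothetical network data dictionary: for each global relation, its fragments,
# the city each fragment covers, and the sites holding a copy.
DATA_DICTIONARY = {
    "EMP": [
        {"fragment": "EMP1", "city": "Beijing", "sites": ["S1"]},
        {"fragment": "EMP2", "city": "Shanghai", "sites": ["S2", "S3"]},
    ],
}


def plan_request(relation, city, local_site):
    """Decide which site to read each needed fragment from, and how to process the request.

    `city=None` means the request needs the whole relation.
    """
    needed = [e for e in DATA_DICTIONARY[relation]
              if city is None or e["city"] == city]
    if not needed:
        raise LookupError(f"no fragment of {relation} matches city={city!r}")
    targets = {}
    for entry in needed:
        # Prefer a local copy so no network access is needed.
        if local_site in entry["sites"]:
            targets[entry["fragment"]] = local_site
        else:
            targets[entry["fragment"]] = entry["sites"][0]
    sites = set(targets.values())
    if sites == {local_site}:
        mode = "local"
    elif len(sites) == 1:
        mode = "remote"        # one other site: forward the request there
    else:
        mode = "distributed"   # data on several computers: distributed processing
    return targets, mode
```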

9. Summary

1) The concept of "distributed computing" breaks through the framework of centralized DBS. Distributing data leads the system toward distributed databases, while distributing functions leads it toward C/S systems. These are the two development directions of DBS.
2) A C/S system includes a computer network, usually a LAN. In almost all cases the client is a microcomputer; the server is sometimes a minicomputer or mainframe, but in most cases it is also a microcomputer or a high-end microcomputer. Application processing runs on the client, while DBMS data management and operating system functions run on the server.
3) The C/S structure has evolved from two tiers to three tiers and then to multiple tiers. The general trend is for the client to become increasingly "thin", down to a browser, while the number of server types grows, which makes systems easier to assemble.
4) A C/S system brings applications closer to users and offers them better performance and richer interfaces.
5) Distributed systems developed from centralized systems. A DDB is a product of combining database technology with network technology, and with the rapid development of computer networks, DDBS has become a mainstream direction in the database field.
6) A DDB is characterized by physical distribution of data combined with logical integrity, and a DDBS supports global applications that involve multiple sites. Storing data in a DDB involves two strategies: data fragmentation and data allocation.
7) The DDB schema structure provides a general conceptual framework for understanding DDBS. Distribution transparency is a goal of DDBS. A DDBMS is a software system that manages access to logically integrated data in a distributed environment and maintains its consistency, validity, and integrity.
8) Because data in a DDBS is spread across sites, the system's overriding performance goal is to minimize the number of messages sent over the network and the volume of data transmitted. Distributed query optimization commonly uses semi-join-based strategies.
9) Concurrency control and recovery in a DDBMS raise many problems that never arise in a centralized DBMS. Distributed concurrency control methods include the primary site method and the primary copy method, supplemented by backup site techniques.
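
To illustrate why semi-joins reduce transmission (item 8), the Python sketch below compares two ways of computing the join of R and S when R is stored at site 1, S at site 2, and the result is needed at site 2. The cost model (one unit per attribute value shipped), the data, and the function names are assumptions for illustration.

```python
def ship_cost(rows):
    """Assumed cost model: one unit per attribute value sent over the network."""
    return sum(len(row) for row in rows)


def join(r, s, attr):
    """Equi-join of two row lists on attribute `attr`."""
    index = {}
    for row in s:
        index.setdefault(row[attr], []).append(row)
    return [{**a, **b} for a in r for b in index.get(a[attr], [])]


def ship_whole_relation(r_at_site1, s_at_site2, attr):
    """Baseline: send all of R from site 1 to site 2, then join at site 2."""
    cost = ship_cost(r_at_site1)
    return join(r_at_site1, s_at_site2, attr), cost


def semi_join_strategy(r_at_site1, s_at_site2, attr):
    """Semi-join: send only S's join column to site 1, reduce R there, send the rest back."""
    join_values = {row[attr] for row in s_at_site2}    # projection of S on attr
    cost = len(join_values)                             # site 2 -> site 1
    reduced_r = [row for row in r_at_site1 if row[attr] in join_values]  # R semi-join S
    cost += ship_cost(reduced_r)                        # site 1 -> site 2
    return join(reduced_r, s_at_site2, attr), cost


# Hypothetical data: R (6 rows, 3 attributes) at site 1, S (1 row) at site 2.
R = [{"id": i, "dno": f"D{i % 3}", "note": "x"} for i in range(6)]
S = [{"dno": "D1", "city": "Beijing"}]
```

With this data both strategies return the same two joined rows, but the semi-join ships 7 values instead of 18. The saving depends on selectivity: if most rows of R match S, shipping the join column first is extra work rather than a saving.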

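The Python sketch below illustrates the primary copy method mentioned in item 9): each data item has a designated primary copy, and lock requests for that item go to the lock manager at the primary copy's site. Mapping every item to a single site turns it into the primary (home) site method. The class names, the simple shared/exclusive lock table, and the item-to-site mapping are hypothetical, and backup sites and deadlock handling are not modeled.

```python
class LockManager:
    """Lock table kept at one site, with shared ("S") and exclusive ("X") locks."""

    def __init__(self, site):
        self.site = site
        self.locks = {}  # item -> (mode, set of transaction ids holding it)

    def acquire(self, txn, item, mode):
        """Grant the lock and return True, or return False on a conflict (caller waits or aborts)."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, owners = held
        if mode == "S" and held_mode == "S":
            owners.add(txn)  # shared locks are compatible
            return True
        if owners == {txn}:
            # Sole holder re-requesting or upgrading S -> X.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.locks[item] = (new_mode, owners)
            return True
        return False

    def release_all(self, txn):
        """Release every lock held by `txn`, e.g. when it commits."""
        for item in list(self.locks):
            _mode, owners = self.locks[item]
            owners.discard(txn)
            if not owners:
                del self.locks[item]


class PrimaryCopyLocking:
    """Primary copy method: each item's locks are managed at the site of its primary copy.

    Mapping every item to the same site gives the primary (home) site method.
    """

    def __init__(self, primary_site_of):
        self.primary_site_of = primary_site_of  # item -> site of its primary copy
        self.managers = {site: LockManager(site) for site in set(primary_site_of.values())}

    def lock(self, txn, item, mode):
        """Send the lock request to the lock manager at the item's primary site."""
        return self.managers[self.primary_site_of[item]].acquire(txn, item, mode)

    def commit(self, txn):
        """Release the transaction's locks at every site."""
        for manager in self.managers.values():
            manager.release_all(txn)
```

Spreading primary copies over several sites avoids the bottleneck of a single lock site, at the cost of contacting more sites per transaction.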