I. What is a distributed database:
The distributed database system is developed on the basis of the centralized database system. It is a product of the combination of database technology and network technology.
There are two types of distributed database systems. The first is physically distributed but logically centralized; this type is suitable only for a single, small unit or department. The second is distributed both physically and logically, that is, a federated distributed database system. Because the sub-database systems that make up the federation are relatively autonomous, a federated system can accommodate databases that differ widely in purpose and design, and is better suited to integrating databases across a wide scope.
A distributed database system (DDBS) comprises a distributed database management system (DDBMS) and a distributed database (DDB).
In a distributed database system, an application program can operate on the database transparently, even though the data is stored in different local databases, managed by different DBMSs, run on different machines, supported by different operating systems, and connected by different communication networks.
A distributed database is logically a unified whole: what users see as a single logical database is physically stored on different nodes. An application can access databases located in different geographical locations through network connections. The distribution lies in the fact that the data is not stored at a single site or, more precisely, not on the storage devices of a single computer. This is what distinguishes it from a centralized database. From the user's point of view, a distributed database system is logically the same as a centralized one, and users can run global applications from any site. It is as if the data were stored on one computer and managed by a single database management system (DBMS); users notice no difference.
The database servers in a distributed database work together to maintain the consistency of the global database.
A distributed database system uses a client/server architecture.
Each computer in the system is called a node. A node that runs database management software is called a database server. A node that runs an application requesting information from a server is called a client. An Oracle client runs database applications that access data and interact with users, while the server runs the Oracle software that handles concurrent, shared access to the Oracle database. Oracle allows both parts to run on the same computer, but it is more efficient when the client and the server run on different computers connected by a network.
Distributed processing means that multiple processors share the work of a single task. Examples of distributed processing in the Oracle database system are as follows:
◆ The client and the server are located on different computers connected by a network.
◆ A single computer has multiple processors, and different processors execute client applications separately.
Each server participating in a distributed database manages its own database independently, as if that database were not networked at all. This independent management of each database is called site autonomy. Site autonomy has the following benefits:
◆ The system node can reflect the company's logical organization.
◆ Local database administrators control local data, so that each database administrator has a smaller responsibility domain and can better manage it.
◆ As long as one database and the network are available, the global database remains partially available. The failure of a single database does not halt all operations or create a performance bottleneck.
◆ Fault recovery is usually performed on a single node.
◆ Each local database has a data dictionary.
◆ Node software can be upgraded independently.
Because an object can be accessed from every node in a distributed database, a distributed DBMS, just like a non-distributed local DBMS, must provide a naming scheme so that each object can be uniquely identified and referenced by applications. Generally, uniqueness is enforced at each level of a naming hierarchy. A distributed DBMS simply extends the hierarchical naming model by also requiring database names to be unique across the network. As a result, the global object name of an object is unique within the distributed database.
Oracle allows SQL statements to use global object names to reference schema objects (tables, views, and procedures) in a distributed database. In Oracle, the global name of a schema object consists of three parts: the name of the schema that contains the object, the object name, and the database name, for example:
SCOTT.EMP@SALES.DIVISION3.ACME.COM
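The three-part structure of the name above can be illustrated with a small Python sketch. This is only an illustration of the naming convention, not how Oracle itself resolves names; the `GlobalObjectName` type and the `parse_global_name` function are hypothetical helpers introduced here.

```python
from typing import NamedTuple


class GlobalObjectName(NamedTuple):
    """Hypothetical container for the three parts of a global object name."""
    schema: str
    object_name: str
    database: str


def parse_global_name(name: str) -> GlobalObjectName:
    """Split 'schema.object@database' into schema, object, and database."""
    # Everything after the first '@' is the (possibly dotted) database name.
    local_part, at_sign, database = name.partition("@")
    if not at_sign or not database:
        raise ValueError(f"missing '@database' part: {name!r}")
    # Before the '@', the first '.' separates the schema from the object.
    schema, dot, object_name = local_part.partition(".")
    if not dot or not schema or not object_name:
        raise ValueError(f"expected 'schema.object' before '@': {name!r}")
    return GlobalObjectName(schema, object_name, database)


parsed = parse_global_name("SCOTT.EMP@SALES.DIVISION3.ACME.COM")
```

Here `parsed.schema` is the schema name, `parsed.object_name` the table, and `parsed.database` the network-wide database name, which is the part that a distributed DBMS adds on top of ordinary local naming.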
A remote query is a query that selects information from one or more remote tables that reside on the same remote node.
A distributed query retrieves data from two or more nodes. A distributed update modifies data on two or more nodes.
A remote transaction is a transaction that contains one or more remote statements, all of which reference the same remote node. A distributed transaction contains one or more statements that modify data on two or more different nodes of the distributed database.
In a distributed database, transaction control must be coordinated across the network to ensure data consistency. The two-phase commit mechanism guarantees that all database servers participating in a distributed transaction either all commit or all roll back the statements in the transaction.
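The all-or-nothing behavior of two-phase commit can be sketched in a few lines of Python. This is a simplified, in-memory illustration under stated assumptions, not Oracle's implementation: the `Participant` class and `two_phase_commit` function are hypothetical, and real systems additionally need logging, timeouts, and recovery after coordinator or participant failure.

```python
from typing import List


class Participant:
    """Hypothetical stand-in for one database server in the transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> bool:
        # Phase 1: vote on whether the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Return True if every participant committed, False if all rolled back."""
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only on a unanimous "yes"; otherwise roll everyone back.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


ok_nodes = [Participant("sales"), Participant("hq")]
bad_nodes = [Participant("sales"), Participant("hq", can_commit=False)]
ok_result = two_phase_commit(ok_nodes)
bad_result = two_phase_commit(bad_nodes)
```

In the second run a single "no" vote from one node is enough to roll back the work on every node, which is the consistency guarantee the paragraph above describes.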
The Oracle distributed database architecture allows the database administrator to provide location transparency for end users and applications. In an Oracle distributed database system, location transparency is achieved with views, synonyms, and procedures.
Oracle provides two mechanisms for table replication transparency in distributed databases: table snapshots provide asynchronous table replication, and triggers provide synchronous table replication. In both cases, the replication is transparent to users.
In both non-distributed and distributed databases, every transaction ends with a COMMIT or ROLLBACK statement.
II. Classification of distributed database systems:
(1) Homogeneous DDBS with the same DBMS: every site uses the same data model (for example, the relational model) and the same type of DBMS.
(2) Homogeneous DDBS with different DBMSs: every site uses the same data model, but the DBMS products differ, such as DB2, Oracle, Sybase, and SQL Server.
(3) Heterogeneous DDBS: the sites use different data models, and possibly different types of DBMS as well. With the development of computer network technology, the problem of connecting heterogeneous machines over a network has been largely solved, so a heterogeneous DDBS can access data held in the various heterogeneous local databases across the whole network.
III. Main features of the distributed database system:
Basic Features of DDBS:
(1) Physical distribution: data is not stored at a single site but spread across multiple sites on the computer network.
(2) Logical unity: although the data is physically distributed across sites, it forms one logical whole. It is shared by all users (global users) and managed uniformly by a DDBMS.
(3) Site autonomy: the data at each site is managed by a local DBMS, which can process it autonomously to serve that site's own applications (local applications).
(4) Collaboration between sites: although each site has a high degree of autonomy, the sites cooperate to form a single whole.
Other features of DDBS:
(1) Data independence
(2) A control mechanism that combines centralization with autonomy
(3) A moderate increase in data redundancy
(4) Distributed transaction management
IV. Advantages of the distributed database system:
(1) Better suited to distributed management and control.
The structure of a distributed database system suits organizations whose departments are geographically dispersed. It lets departments in different regions keep local control over their own data. For example, data belonging to the global database can be entered, queried, and maintained locally. Because computing resources are close to the users, communication costs fall and response times improve. Only a small share of requests involve data in databases at other sites, which greatly reduces the amount of information transmitted over the network and also improves the security of local data.
(2) flexible architecture.
Centralized database systems emphasize centralized control: the physical database is stored at one site and managed by a single DBMS, and multiple users can share its data only through local or remote terminals, with the support of a multi-user operating system. In a distributed database system, by contrast, the autonomy of the local DBMSs means that most local transactions are managed and controlled in the local environment; only when data at other sites is involved must the work be handled over the network as a global transaction. A distributed DBMS can be designed with different degrees of autonomy, from full site autonomy to almost completely centralized control.
(3) The system is cost-effective, with high reliability and good availability.
Compared with a large centralized database running on a mainframe, a distributed database system built on supermicrocomputers or superminicomputers is often more cost-effective and more flexible to implement. Distributed systems also offer higher reliability and availability than centralized systems. When data is distributed across multiple sites and multiple copies exist, the failure of an individual site or communication link does not bring down the whole system, and a local fault does not cause a global loss of control.
(4) accelerated response under certain conditions.
If the data being accessed is stored in the local database, the request can be executed on the user's own computer, which is fast.
(5) good scalability, easy to integrate with existing systems, and easy to expand.
An enterprise or organization can use distributed database technology to develop global applications on top of the databases it has already built, making only modest changes to the original local database systems to form a distributed system. This is easier than building one large database system from scratch and saves time, money, and material resources. An existing distributed database system can also be expanded quickly by adding sites.
V. Disadvantages of distributed database systems:
(1) High communication overhead and a high failure rate.
For example, when the network transmission speed is low, the system responds slowly. Communication-related factors often cause system faults, and the complexity of the system itself can also raise the failure rate. Recovery after a fault is complex, so reliability still needs to be improved.
(2) the data access structure is complex.
Generally, accessing data in a distributed database is more complex and incurs more overhead than accessing data in a centralized database.
(3) data security and confidentiality are difficult to control.
When the database has a high degree of site autonomy, the local database administrators at different sites can adopt different security measures, so the security of the global data cannot be guaranteed. There are also security problems inherent in distributed systems: because distributed control is carried out over communication networks, and those networks are themselves weak at protecting the security and confidentiality of data, data can easily be stolen.
Designing a distributed database, dividing it into sites, and distributing data across those sites are all complex tasks. How the data is divided and distributed has a great impact on system performance, response time, and availability. Communication between sites is much slower than data access within a local database system; the communication system has high latency, and processing communication messages is costly in CPU time. A distributed database system therefore demands careful attention to distributed database design, query processing and optimization, transaction management, concurrency control, and directory management.
VI. Distributed Database System: Data sharding
Types:
1. Horizontal sharding:
All the tuples of a global relation are divided into several disjoint subsets according to certain conditions. Each subset is a horizontal fragment of the relation.
2. Vertical sharding:
The attribute set of a global relation is divided into several subsets, and the relation is projected onto each subset. Each projection is a vertical fragment.
3. Derived sharding:
Also known as derived horizontal sharding. The condition used for the horizontal sharding is not a condition on the attributes of the relation itself but a condition on the attributes of another relation.
4. Hybrid sharding:
The preceding three methods are combined. Horizontal sharding can be applied before vertical sharding, vertical before horizontal, or in other combinations, but the results differ.
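The four types above can be illustrated with a short Python sketch that treats a relation as a list of dictionaries. The `EMP` and `DEPT` relations, their attribute names, and the helper functions are all hypothetical, chosen only to make the selection and projection steps concrete.

```python
# Hypothetical global relation EMP and a second relation DEPT.
EMP = [
    {"emp_id": 1, "name": "Ann", "dept": "SALES", "salary": 5000},
    {"emp_id": 2, "name": "Bob", "dept": "HQ", "salary": 6000},
    {"emp_id": 3, "name": "Cy", "dept": "SALES", "salary": 5500},
]
DEPT = [
    {"dept": "SALES", "city": "Paris"},
    {"dept": "HQ", "city": "Oslo"},
]


def horizontal(rows, predicate):
    """Horizontal sharding: select the tuples that satisfy a condition."""
    return [row for row in rows if predicate(row)]


def vertical(rows, attrs):
    """Vertical sharding: project every tuple onto a subset of attributes."""
    return [{a: row[a] for a in attrs} for row in rows]


def derived(rows, other_rows, join_attr, other_predicate):
    """Derived sharding: select tuples by a condition on ANOTHER relation."""
    keys = {o[join_attr] for o in other_rows if other_predicate(o)}
    return [row for row in rows if row[join_attr] in keys]


emp_sales = horizontal(EMP, lambda r: r["dept"] == "SALES")
emp_pay = vertical(EMP, ["emp_id", "salary"])  # primary key kept in the fragment
emp_paris = derived(EMP, DEPT, "dept", lambda d: d["city"] == "Paris")
# Hybrid sharding: a horizontal fragment that is then sharded vertically.
emp_sales_pay = vertical(emp_sales, ["emp_id", "salary"])
```

Note that `emp_paris` is chosen by the `city` attribute, which belongs to `DEPT` rather than `EMP`; that is what distinguishes derived sharding from ordinary horizontal sharding.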
Conditions:
(1) Completeness:
All the data of the global relation must be mapped to fragments. No data may belong to the global relation without belonging to any of its fragments.
(2) Reconstructability:
It must be possible to reconstruct the global relation from its fragments. For horizontal sharding, the global relation is reconstructed with the union operation; for vertical sharding, with the join operation.
(3) Disjointness:
The fragments obtained by splitting a global relation must not overlap one another (except for the primary key in vertical sharding).
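As a rough illustration of these three conditions, the Python sketch below checks them for a horizontal sharding and rebuilds a vertically sharded relation with a join on the primary key. The relation `EMP`, the key `emp_id`, and the functions `check_horizontal` and `join_on_key` are assumptions made for this example only.

```python
def check_horizontal(global_rows, fragments, key):
    """Check the three conditions for a horizontal sharding, by key value."""
    global_keys = {row[key] for row in global_rows}
    fragment_keys = [row[key] for frag in fragments for row in frag]
    # Union of the fragments, with duplicates collapsed by key.
    rebuilt = {row[key]: row for frag in fragments for row in frag}
    return {
        "complete": global_keys <= set(fragment_keys),
        "reconstructable": rebuilt == {row[key]: row for row in global_rows},
        "disjoint": len(fragment_keys) == len(set(fragment_keys)),
    }


def join_on_key(left, right, key):
    """Rebuild a vertically sharded relation by joining on the primary key."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


EMP = [
    {"emp_id": 1, "name": "Ann", "dept": "SALES"},
    {"emp_id": 2, "name": "Bob", "dept": "HQ"},
    {"emp_id": 3, "name": "Cy", "dept": "SALES"},
]
sales = [r for r in EMP if r["dept"] == "SALES"]
others = [r for r in EMP if r["dept"] != "SALES"]

good = check_horizontal(EMP, [sales, others], "emp_id")
missing = check_horizontal(EMP, [sales], "emp_id")          # tuple 2 is lost
overlapping = check_horizontal(EMP, [sales, EMP], "emp_id")  # tuples repeated

# Vertical fragments share only the primary key emp_id.
names = [{"emp_id": r["emp_id"], "name": r["name"]} for r in EMP]
depts = [{"emp_id": r["emp_id"], "dept": r["dept"]} for r in EMP]
rebuilt_emp = join_on_key(names, depts, "emp_id")
```

The `missing` case violates completeness, while the `overlapping` case can still be reconstructed but violates disjointness, which shows that the conditions are independent of one another.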
VII. Distributed Database System: Data Distribution Method
(1) Centralized: all data fragments are placed at the same site.
(2) Partitioned:
There is only one copy of the data. It is split into several logical fragments, each of which is assigned to a specific site.
(3) Full replication: the data is stored redundantly at every site, that is, each site holds a complete copy of the data.
(4) Hybrid: a distribution method that falls between partitioning and full replication.
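The four distribution methods listed above can be compared by looking at the fragment-to-site map each one produces. The Python sketch below is one possible way to build such a map; the `allocate` function, the round-robin placement, and the `copies` parameter for the hybrid case are assumptions of this example, not something the methods themselves prescribe.

```python
def allocate(fragments, sites, method, copies=2):
    """Return a dict mapping each fragment to the list of sites that store it."""
    n = len(sites)
    if method == "centralized":
        # Every fragment lives at one and the same site.
        return {f: [sites[0]] for f in fragments}
    if method == "partitioned":
        # One copy of each fragment, spread over the sites (round robin here).
        return {f: [sites[i % n]] for i, f in enumerate(fragments)}
    if method == "full_replication":
        # Every site holds every fragment.
        return {f: list(sites) for f in fragments}
    if method == "hybrid":
        # Between the two: each fragment is stored at some, but not all, sites.
        k = min(copies, n)
        return {
            f: [sites[(i + j) % n] for j in range(k)]
            for i, f in enumerate(fragments)
        }
    raise ValueError(f"unknown method: {method!r}")


fragments = ["F1", "F2", "F3"]
sites = ["A", "B", "C"]
centralized = allocate(fragments, sites, "centralized")
partitioned = allocate(fragments, sites, "partitioned")
replicated = allocate(fragments, sites, "full_replication")
hybrid = allocate(fragments, sites, "hybrid", copies=2)
```

With three fragments and three sites, the hybrid map stores each fragment twice, so losing any single site still leaves a copy of every fragment, at the cost of keeping the copies consistent.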
VIII. Distributed Database System: Architecture
Separating data sharding from data allocation gives rise to the concept of "data distribution independence".
Data redundancy is controlled explicitly. How data is allocated to each site is clearly visible in the allocation schema, which makes the system easier to manage.
The local DBMSs are independent. This feature is also called "local mapping transparency". It allows issues of DDB management to be studied without considering the specific data models of the local DBMSs.
IX. Distributed Database Management System
Accept user requests and determine where to send them, or which computers must be accessed to satisfy them.
Access the network data dictionary to find out how to request and use the information in it.
Carry out distributed processing when the target data is stored on multiple computers in the system.
Provide a communication interface that coordinates between users, the local DBMS, and the DBMSs on other computers.
In a heterogeneous distributed processing environment, also provide data and process portability, bridging the differences between the hardware and software of each site.
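The first three functions above (accepting a request, consulting the data dictionary, and deciding whether distributed processing is needed) might be sketched as follows. The dictionary contents, the site names, and the `plan_request` function are hypothetical; a real DDBMS keeps far richer catalog information than a fragment-to-site map.

```python
# Hypothetical network data dictionary: which site stores which fragment.
DATA_DICTIONARY = {
    "EMP_SALES": "site_a",
    "EMP_HQ": "site_b",
    "DEPT": "site_a",
}


def plan_request(fragments_needed, dictionary, local_site):
    """Decide which sites a request must visit and how it must be processed."""
    unknown = [f for f in fragments_needed if f not in dictionary]
    if unknown:
        raise LookupError(f"not in data dictionary: {unknown}")
    sites = sorted({dictionary[f] for f in fragments_needed})
    if sites == [local_site]:
        kind = "local"          # all data is at the site that got the request
    elif len(sites) == 1:
        kind = "remote"         # all data is at one other site
    else:
        kind = "distributed"    # data spans several sites
    return {"sites": sites, "kind": kind}


local_plan = plan_request(["EMP_SALES", "DEPT"], DATA_DICTIONARY, "site_a")
remote_plan = plan_request(["EMP_HQ"], DATA_DICTIONARY, "site_a")
distributed_plan = plan_request(["EMP_SALES", "EMP_HQ"], DATA_DICTIONARY, "site_a")
```

Only the last plan spans more than one site and would therefore require distributed processing and coordination between the DBMSs involved.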
Reprinted: http://tech.ddvip.com/2008-09/122180807067490_6.html