Medium-sized and large companies are often made up of geographically dispersed departments that need to share data. One option is to store the shared data at a single site and have all users access that site. The advantage of this approach is that data consistency is easy to ensure, but the disadvantages are also prominent: the site carries a heavy load, network traffic is high, and remote users see slow response times. Data replication addresses this problem by copying the shared data to multiple databases in different locations, so that data can be accessed locally. This reduces network load and improves data access performance, and regularly synchronizing the databases (usually every night) ensures that all users work with the same, up-to-date data. The technology suits applications with a large number of users, a wide geographical distribution, and a need for real-time access to the same data.
Concepts and Features of Data Replication
1. Concept and classification of data replication
Data replication copies data in a database to one or more other physical sites and keeps the specified data in the source database and the target databases consistent.
Based on timeliness, data replication can be divided into synchronous and asynchronous replication. Synchronous replication copies local production data to a remote location in fully synchronous mode: each local I/O transaction completes only after the remote side has confirmed it. Asynchronous replication also copies local production data to a remote location, but each local I/O transaction completes normally without waiting for remote replication to finish. Synchronous replication is highly real-time and keeps remote data fully in step with local data, but it is strongly affected by bandwidth and works only over short transmission distances. Asynchronous replication does not slow down local transactions and works over long distances, but remote data lags slightly behind local data. In an asynchronous replication environment, the most important concern for any application is ensuring data consistency.
Based on the type of replication site, data replication can be divided into multi-master replication, materialized view replication, and hybrid replication. Multi-master replication is also called peer-to-peer replication: every site is a master site, all sites are equal, and each site must communicate with the others. Materialized view replication involves one master site and one or more materialized view sites. A materialized view is a full or partial copy of its target master object as of a certain point in time; the target master object can be a table at the master site or a master materialized view at a materialized view site. Hybrid replication includes multiple master sites and multiple materialized view sites. It combines multi-master replication with materialized view replication and is suitable for complex business situations.
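As a rough illustration of materialized view replication, the sketch below creates a read-only materialized view of a master table. It borrows the cl.world master site and the integration.survey table from the worked example later in this article. The materialized view name survey_mv, the nightly refresh interval, the existence of a primary key on survey, and a database link named cl.world at the materialized view site are all assumptions made for illustration.

```sql
-- At the master site (cl.world): log changes to survey so that
-- materialized views can be refreshed incrementally (fast refresh).
-- Assumes integration.survey has a primary key.
CREATE MATERIALIZED VIEW LOG ON integration.survey WITH PRIMARY KEY;

-- At a hypothetical materialized view site that has a database link
-- named cl.world pointing at the master site.
-- survey_mv is an illustrative name; the view is refreshed once a day.
CREATE MATERIALIZED VIEW integration.survey_mv
  REFRESH FAST
  START WITH SYSDATE
  NEXT SYSDATE + 1
  AS SELECT * FROM integration.survey@cl.world;
```

A view created this way is read-only at the materialized view site; it reflects the master table as of the most recent refresh.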
2. Data replication features
Data replication keeps copies of the data at multiple sites, which improves data security and availability. If one site fails, users can switch to another site and the application system can keep running, so data replication provides a fault-tolerance mechanism.
However, the most basic function of data replication is to improve database performance. By copying data from a remote database to a local one, applications can access data close to where they run, which reduces network traffic and improves efficiency. In addition, a data replication system can balance load across multiple sites: some users work against one server while other users work against other servers, so that no single site is overloaded.
Materialized views also support replicating a subset of the data, so that each site copies only the data it needs, which reduces the amount of data sent over the network.
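A subset can be expressed simply as a query on the master table. The following is a minimal sketch, not a prescribed design: the column names survey_id, survey_name, and region, the value 'NORTH', and the view name survey_north_mv are hypothetical, and a database link named cl.world to the master site is assumed to exist at the materialized view site.

```sql
-- Replicate only selected columns and rows of the master table.
-- REFRESH FORCE uses a fast (incremental) refresh when a materialized
-- view log is available on the master table, and a complete refresh
-- otherwise.
-- survey_id, survey_name and region are hypothetical columns.
CREATE MATERIALIZED VIEW integration.survey_north_mv
  REFRESH FORCE
  NEXT SYSDATE + 1
  AS SELECT survey_id, survey_name, region
       FROM integration.survey@cl.world
      WHERE region = 'NORTH';
```

Only the rows matching the WHERE clause, and only the listed columns, are transferred to the site on each refresh.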
Implementation of Data Replication
Implementation must be preceded by careful design and planning. This means analyzing the specific business conditions and designing a solution that meets the business needs. During design, you need to decide which database sites to create, the type of each site, the data objects to replicate, the synchronization method, and the conflict resolution method.
After the design is complete, you can implement data replication. The implementation process, illustrated by a figure in the original article, mainly includes the following steps:
(1) Create the replication sites
(2) Create the master group and its objects
(3) Configure conflict resolution
The following example illustrates the specific work required in each step. It uses multi-master replication with two master sites and two shared tables. The two master sites are the processing site (cl.world) and the interpretation site (js.world). The two tables are survey (survey areas) and line (survey lines).
Step 1: Create a replication site
(1) First, log on to the master site database cl.world as SYSTEM:
CONNECT system/manager@cl.world
(2) Create the replication administrator user and grant it the privileges needed to create and manage the replication site. Every replication site must have a replication administrator:
CREATE USER repadmin IDENTIFIED BY repadmin;
BEGIN
  DBMS_REPCAT_ADMIN.GRANT_ADMIN_ANY_SCHEMA(username => 'repadmin');
END;
/
(3) Designate the propagator for this site
The propagator is responsible for sending the latest local changes to the other sites:
BEGIN
  DBMS_DEFER_SYS.REGISTER_PROPAGATOR(username => 'repadmin');
END;
/
(4) Designate the receiver for this site
The receiver is responsible for receiving the data sent by the propagators at other sites:
BEGIN
  DBMS_REPCAT_ADMIN.REGISTER_USER_REPGROUP(
    username       => 'repadmin',
    privilege_type => 'receiver',
    list_of_gnames => NULL);
END;
/
(5) Schedule the purge
To keep the deferred transaction queue from growing too large, transactions that have been applied successfully must be purged from the queue. Here the purge is scheduled to run every hour (1/24 of a day):
CONNECT repadmin/repadmin@cl.world
BEGIN
  DBMS_DEFER_SYS.SCHEDULE_PURGE(
    next_date     => SYSDATE,
    interval      => 'sysdate + 1/24',
    delay_seconds => 0);
END;
/
After creating the site cl.world, create the site js.world in the same way.
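For reference, repeating sub-steps (1) to (5) on the interpretation site might look like the sketch below. It assumes that js.world uses the same SYSTEM password and the same repadmin user name and password as cl.world, which the article does not state explicitly.

```sql
-- Sub-steps (1)-(5) repeated on the interpretation site js.world.
-- Assumption: same accounts and passwords as on cl.world.
CONNECT system/manager@js.world

-- (2) Replication administrator
CREATE USER repadmin IDENTIFIED BY repadmin;
BEGIN
  DBMS_REPCAT_ADMIN.GRANT_ADMIN_ANY_SCHEMA(username => 'repadmin');
END;
/

-- (3) Propagator
BEGIN
  DBMS_DEFER_SYS.REGISTER_PROPAGATOR(username => 'repadmin');
END;
/

-- (4) Receiver
BEGIN
  DBMS_REPCAT_ADMIN.REGISTER_USER_REPGROUP(
    username       => 'repadmin',
    privilege_type => 'receiver',
    list_of_gnames => NULL);
END;
/

-- (5) Hourly purge of successfully applied deferred transactions
CONNECT repadmin/repadmin@js.world
BEGIN
  DBMS_DEFER_SYS.SCHEDULE_PURGE(
    next_date     => SYSDATE,
    interval      => 'sysdate + 1/24',
    delay_seconds => 0);
END;
/
```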
(6) Create scheduled links between the master sites
To create a scheduled link between the master sites, first establish database links between them, and then define a propagation schedule for each database link.
First, on the processing site, create the database links to the interpretation site. A public database link must be created first, because the private database links are built on it.
CONNECT system/manager@cl.world
CREATE PUBLIC DATABASE LINK js.world USING 'js.world';
CONNECT repadmin/repadmin@cl.world
CREATE DATABASE LINK js.world CONNECT TO repadmin IDENTIFIED BY repadmin;
Similarly, on the interpretation site, create the database links to the processing site:
CONNECT system/manager@js.world
CREATE PUBLIC DATABASE LINK cl.world USING 'cl.world';
CONNECT repadmin/repadmin@js.world
CREATE DATABASE LINK cl.world CONNECT TO repadmin IDENTIFIED BY repadmin;
The scheduled link determines how often transactions at this site are pushed to the other sites. The following code pushes once every 10 minutes (1/144 of a day):
CONNECT repadmin/repadmin@cl.world
BEGIN
  DBMS_DEFER_SYS.SCHEDULE_PUSH(
    destination       => 'js.world',
    interval          => 'sysdate + (1/144)',
    next_date         => SYSDATE,
    parallelism       => 1,
    execution_seconds => 1500,
    delay_seconds     => 1200);
END;
/
Perform the same work on the interpretation site.
Step 2: Create a master group
In a replication environment, Oracle uses groups to manage replication objects. Placing related replication objects in one group makes a large number of data objects easy to manage.
Here we assume that the schema integration already exists at both the processing site and the interpretation site, and that the survey area table (survey) and the survey line table (line) have already been created in it.
(1) Create the master group
CONNECT repadmin/repadmin@cl.world
BEGIN
  DBMS_REPCAT.CREATE_MASTER_REPGROUP(
    gname => 'inte_repg');
END;
/
(2) Add data objects to the master group. Here the survey area table survey is added to the inte_repg group:
BEGIN
  DBMS_REPCAT.CREATE_MASTER_REPOBJECT(
    gname               => 'inte_repg',
    type                => 'table',
    oname               => 'survey',
    sname               => 'integration',
    use_existing_object => TRUE,
    copy_rows           => FALSE);
END;
/
Add the line table to the inte_repg group in the same way.
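Assuming the line table is registered with the same options as survey, the corresponding call would differ only in the object name:

```sql
-- Register the existing table integration.line in the inte_repg group.
-- Same options as for survey: reuse the existing table, copy no rows.
BEGIN
  DBMS_REPCAT.CREATE_MASTER_REPOBJECT(
    gname               => 'inte_repg',
    type                => 'table',
    oname               => 'line',
    sname               => 'integration',
    use_existing_object => TRUE,
    copy_rows           => FALSE);
END;
/
```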
(3) Add the other sites that take part in replication to the master group. The propagation mode between the databases is specified here:
BEGIN
  DBMS_REPCAT.ADD_MASTER_DATABASE(
    gname                => 'inte_repg',
    master               => 'js.world',
    use_existing_objects => TRUE,
    copy_rows            => FALSE,
    propagation_mode     => 'asynchronous');
END;
/
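For comparison with the synchronous and asynchronous modes described earlier, a synchronous configuration would change only the propagation_mode value. The sketch below is an alternative to the asynchronous call in this step, not an additional step; whether synchronous propagation is appropriate depends on the bandwidth and distance between the two sites.

```sql
-- Alternative to the asynchronous call: with synchronous propagation,
-- changes are applied at js.world as part of the originating
-- transaction, so a local commit waits for the remote site.
BEGIN
  DBMS_REPCAT.ADD_MASTER_DATABASE(
    gname                => 'inte_repg',
    master               => 'js.world',
    use_existing_objects => TRUE,
    copy_rows            => FALSE,
    propagation_mode     => 'SYNCHRONOUS');
END;
/
```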
(4) If conflicts may occur, you need to configure conflict resolution. Conflict resolution is described later.
(5) Generate replication support for each object:
BEGIN
  DBMS_REPCAT.GENERATE_REPLICATION_SUPPORT(
    sname             => 'integration',
    oname             => 'survey',
    type              => 'table',
    min_communication => TRUE);
END;
/
Do the same for the line table.
(6) Resume replication activity:
BEGIN
  DBMS_REPCAT.RESUME_MASTER_ACTIVITY(
    gname => 'inte_repg');
END;
/
Configure the interpretation site in the same way. Once the configuration succeeds, the replication setup is complete and data can be replicated between the databases.
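One way to check the result, sketched here under the assumption that you are connected as the replication administrator, is to query the replication catalog views DBA_REPGROUP, DBA_REPCATLOG, and DEFERROR at each master site:

```sql
CONNECT repadmin/repadmin@cl.world

-- Status of the master group; NORMAL indicates that replication
-- activity has resumed (group names are stored in uppercase).
SELECT gname, master, status
  FROM dba_repgroup
 WHERE gname = 'INTE_REPG';

-- Administrative requests that are still pending or have failed.
SELECT request, oname, status, message
  FROM dba_repcatlog
 WHERE gname = 'INTE_REPG';

-- Deferred transactions that could not be applied at this site.
SELECT COUNT(*) AS error_count
  FROM deferror;
```

An empty DBA_REPCATLOG result for the group and a zero error count suggest that the setup requests completed and that no pushed transactions have failed so far.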
Conflict Resolution in Data Replication
In a replication environment, the database and applications are designed to avoid conflicts between sites as far as possible, but conflicts can rarely be ruled out completely. When a conflict does occur, a conflict resolution mechanism based on specific business rules is needed to keep data consistent across sites.
First, analyze which objects are prone to conflicts. Static data generally changes little and is less likely to conflict, while frequently changing data is more likely to conflict. After identifying the conflict-prone objects, decide how conflicts on them should be resolved. For example, you can assign priorities to the sites so that the data at the higher-priority site prevails when an inconsistency occurs, or you can let the most recent modification prevail.
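The "most recent modification prevails" rule corresponds to Oracle's LATEST TIMESTAMP update resolution method. The sketch below assumes that the survey table has a hypothetical DATE column last_modified that the application or a trigger maintains on every update, and a hypothetical column survey_name; the column group name survey_cg is also illustrative. It is meant to run while the master group is quiesced, for example at sub-step (4) of Step 2, before replication support is generated.

```sql
CONNECT repadmin/repadmin@cl.world

-- Group the columns that should be resolved together.
-- survey_name and last_modified are hypothetical column names.
BEGIN
  DBMS_REPCAT.MAKE_COLUMN_GROUP(
    sname                => 'integration',
    oname                => 'survey',
    column_group         => 'survey_cg',
    list_of_column_names => 'survey_name,last_modified');
END;
/

-- On an update conflict, keep the row version with the later
-- last_modified value.
BEGIN
  DBMS_REPCAT.ADD_UPDATE_RESOLUTION(
    sname                 => 'integration',
    oname                 => 'survey',
    column_group          => 'survey_cg',
    sequence_no           => 1,
    method                => 'LATEST TIMESTAMP',
    parameter_column_name => 'last_modified');
END;
/
```

Timestamp-based resolution only gives sensible results if the clocks at the sites are reasonably synchronized and the timestamp column is updated on every change.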
Oracle provides a variety of conflict resolution methods, covering update conflicts, uniqueness conflicts, and delete conflicts. In addition to these built-in methods, you can define your own conflict resolution methods. Each method suits particular situations, so an appropriate one must be selected according to the specific business.
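As one illustration of a built-in update method, the sketch below configures site priority so that the processing site wins over the interpretation site. The priority values, the priority group name inte_site_pg, the column group name line_cg, and the columns line_name and site_name are assumptions; site_name is a hypothetical column that must hold the global name of the site that last changed the row. As with other resolution methods, this is meant to run while the master group is quiesced.

```sql
CONNECT repadmin/repadmin@cl.world

-- Define a site priority group and rank the two master sites
-- (a higher number means a higher priority).
BEGIN
  DBMS_REPCAT.DEFINE_SITE_PRIORITY(
    gname => 'inte_repg',
    name  => 'inte_site_pg');
  DBMS_REPCAT.ADD_SITE_PRIORITY_SITE(
    gname    => 'inte_repg',
    name     => 'inte_site_pg',
    site     => 'cl.world',
    priority => 100);
  DBMS_REPCAT.ADD_SITE_PRIORITY_SITE(
    gname    => 'inte_repg',
    name     => 'inte_site_pg',
    site     => 'js.world',
    priority => 50);
END;
/

-- Resolve update conflicts on the line table by site priority.
-- line_name and site_name are hypothetical column names.
BEGIN
  DBMS_REPCAT.MAKE_COLUMN_GROUP(
    sname                => 'integration',
    oname                => 'line',
    column_group         => 'line_cg',
    list_of_column_names => 'line_name,site_name');
  DBMS_REPCAT.ADD_UPDATE_RESOLUTION(
    sname                 => 'integration',
    oname                 => 'line',
    column_group          => 'line_cg',
    sequence_no           => 1,
    method                => 'SITE PRIORITY',
    parameter_column_name => 'site_name',
    priority_group        => 'inte_site_pg');
END;
/
```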
Conclusion
This article has described data replication in Oracle distributed systems in detail. Many complicated problems remain in specific applications, for example, how to handle tables with circular dependencies or self-referencing tables in a master group, how to create materialized view sites using the template mechanism, and how to manage and maintain a replication environment. These problems need to be explored gradually in practice and studied in greater depth.