Introduction to Oracle GoldenGate Principles

Source: http://www.askoracle.org/oracle/HighAvailability/20140109953.html

 

Introduction to GoldenGate

Oracle GoldenGate is log-based replication software for structured data. It parses the Source Database's online redo logs or archive logs to obtain incremental data changes, and then applies those changes to the Target Database, keeping the source and target databases synchronized.

Oracle GoldenGate can synchronize large volumes of data between heterogeneous IT platforms in real time, with second-level latency. Because of these capabilities, it is used in many application scenarios, such as online reporting, real-time data warehouse feeds, data synchronization, data migration, and dual active data centers. Oracle GoldenGate supports one-to-one, one-to-many, many-to-one, cascading, and other topologies.

Oracle also provides the Oracle GoldenGate software download and online learning documentation (11g R2).

The Oracle GoldenGate logical architecture (shown in a figure in the original article, not reproduced here) covers both initial data loads and DML/DDL change synchronization. The exact configuration to use depends on business requirements.

Extract runs on the Source System and is the capture (extraction) mechanism of GoldenGate. It is used mainly for the following purposes:

1. Initial Loads: for initial data loading, Extract captures a current, static set of data directly from the source objects. (Alternatively, the EXPDP/IMPDP utilities can be used to load data from the source database into the target database; see Oracle's EXPDP & IMPDP documentation for details. A minimal example appears after this list.)

2. Change Synchronization: after the initial load is complete, Extract captures DML and DDL operations to keep the Source Database synchronized with the target data set.
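A minimal sketch of the EXPDP/IMPDP route mentioned above (the scott schema, the DATA_PUMP_DIR directory object, and the file names are placeholders, not from the article):

# on the source system: export the schema to a dump file
expdp system/password schemas=scott directory=DATA_PUMP_DIR dumpfile=scott.dmp logfile=exp_scott.log

# copy scott.dmp to the target's DATA_PUMP_DIR directory, then import it there
impdp system/password schemas=scott directory=DATA_PUMP_DIR dumpfile=scott.dmp logfile=imp_scott.log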

Extract can capture source database data in the following ways:

1. For initial loads, data can be captured directly from the source tables.

2. For change synchronization, data is captured from the database's recovery logs or transaction logs (such as Oracle's redo logs or SQL/MX audit trails); the actual capture method depends on the database type.

3. From a third-party capture module. This method provides a communication layer that passes data and metadata from an external API to the Extract API; the capture module is supplied by the data provider or a third-party vendor.

When Extract is configured for Change Synchronization, it captures the DML and DDL executed against the configured objects. Extract stores these operations until it receives either a commit or a rollback for the transaction that contains them. When a rollback is received, Extract discards the transaction's operations. When a commit is received, Extract persists the transaction to a series of files on disk, called a trail, where the operations are queued for transmission to the Target Database. All operations in a transaction are written to the trail as a sequentially organized transaction unit. This design ensures both speed and data integrity.

You can configure multiple Extract processes to operate on different objects concurrently. For example, when the volume of data changes is large, two Extract processes can extract and transmit data in parallel to two Replicat processes, minimizing latency on the target. Each Extract must be assigned a Group.
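As a minimal GGSCI sketch (the group name ext1 and the trail prefix ./dirdat/lt are assumptions, not from the article), a change-capture Extract group is created and linked to a local trail like this:

GGSCI> ADD EXTRACT ext1, TRANLOG, BEGIN NOW
GGSCI> ADD EXTTRAIL ./dirdat/lt, EXTRACT ext1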

Data Pump is a secondary Extract group configured on the Source side of Oracle GoldenGate. If a Data Pump is not used, the primary Extract is responsible for transmitting captured operations to the remote trail on the Target side. In the typical configuration that uses a Data Pump, the primary Extract only writes captured operations to a trail on the Source system, and the Data Pump reads that trail and sends the operations over the network to the remote trail on the Target. The Data Pump adds storage flexibility and effectively isolates the primary Extract from TCP/IP activity.

In general, a Data Pump can perform data filtering, mapping, conversion, and other operations, or it can be configured in pass-through mode, in which data is transmitted passively. Pass-through mode increases Data Pump throughput because the functionality that looks up object definitions is bypassed.
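A minimal pass-through Data Pump parameter file might look like the following sketch (the group name dpump, the host target1, the port, and the trail paths are assumptions):

EXTRACT dpump
-- pass data through without looking up object definitions
PASSTHRU
RMTHOST target1, MGRPORT 7809
RMTTRAIL ./dirdat/rt
TABLE scott.*;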

Data Pump has the following advantages:

1. Protection against network and target failures

In a basic GoldenGate configuration without a Data Pump, only the Target side has a trail. Extract extracts data continuously into memory but persists no trail on the Source; if the network or the Target becomes unavailable, the Extract process can terminate abnormally when memory is exhausted. With a Data Pump and a local trail, the captured data is persisted on the Source, and once the network or the Target recovers, the Data Pump resends the data captured in the Source trail to the Target.

2. Data Filtering in multiple stages

When complex data filtering is required, a Data Pump can perform a first filtering pass on the source, the target, or even an intermediary system, and another Data Pump or the Replicat process can then perform a second pass.

3. Merging data from multiple sources into one target

When you need to synchronize databases from multiple source systems to one target, you can configure an Extract process on each source and use a Data Pump on each source to send the data to a trail on the target. This divides the storage load between the source and the target.

4. Single Source to multiple targets

When you need to synchronize one source with multiple targets, you can configure a Data Pump for each target. If any one target becomes unavailable, data can still be delivered to the other targets, which keeps the data available. A sketch of this layout follows.
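A sketch of the one-to-many layout (group, host, and trail names are assumptions): two Data Pumps read the same local trail and each delivers to its own target:

-- parameter file for pump group pumpa
EXTRACT pumpa
PASSTHRU
RMTHOST targeta, MGRPORT 7809
RMTTRAIL ./dirdat/ra
TABLE scott.*;

-- parameter file for pump group pumpb
EXTRACT pumpb
PASSTHRU
RMTHOST targetb, MGRPORT 7809
RMTTRAIL ./dirdat/rb
TABLE scott.*;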

Replicat runs on the Target side. It reads the trail files on that system, reconstructs the DML and DDL operations, and applies them to the Target database. You can configure the Replicat process for one of the following purposes:

1. Initial Loads: used for initial data loading.

2. Change Synchronization: when configured for change synchronization, Replicat applies the data operations replicated from the Source to the Target through the native database interface or ODBC, depending on the database type. To preserve data integrity, Replicat applies the replicated operations in the same order in which they were committed on the source.

You can configure multiple Replicat processes to increase throughput. To preserve data integrity, each Replicat process must handle a different set of objects, and each Replicat process is assigned its own Group.
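A minimal Replicat sketch (the group name rep1, the ogg schema, and the trail path are assumptions, not from the article):

GGSCI> ADD REPLICAT rep1, EXTTRAIL ./dirdat/rt, CHECKPOINTTABLE ogg.ckpt

-- parameter file for rep1
REPLICAT rep1
USERID ogg, PASSWORD ogg
-- assume the target tables have the same definitions as the source
ASSUMETARGETDEFS
MAP scott.*, TARGET scott.*;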

Delayed replication lets you wait a specified amount of time before the replicated operations are applied on the target. Such a delay can be deliberate, for example to prevent the propagation of erroneous SQL, to control data arrival across different time zones, or to allow time for another scheduled event to occur. The length of the delay is controlled by the DEFERAPPLYINTERVAL parameter.
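As a sketch (the group name rep1 carries over from the previous sketch; the 10-minute value is illustrative), the delay is set in the Replicat parameter file:

REPLICAT rep1
-- apply operations 10 minutes after they were committed on the source
DEFERAPPLYINTERVAL 10 MINUTES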

To support continuous extraction and replication of database changes, Oracle GoldenGate temporarily stores the changed records in a series of files on disk, called trails. A trail can exist on the source system, the target system, an intermediary system, or any combination of these, depending on how you configure Oracle GoldenGate. On the local system it is called an extract trail (or local trail); on a remote system it is called a remote trail.

By using trails for storage, Oracle GoldenGate supports data accuracy and fault tolerance. The trail also allows extraction and replication to occur independently of each other. Because these processes are decoupled, you have more choices for how data is processed and delivered; for example, instead of extracting and replicating changes continuously, you can extract continuously but replicate to the target only when needed.

The Extract and Data Pump processes on the Source side are responsible for writing the trail, and each Extract process must be linked to a trail.

The following processes read the trail:

Data Pump Extract: extracts DML and DDL operations from a local trail that is linked to a previous Extract.

Replicat: reads the trail files and applies the replicated DML and DDL to the target database.

Trail files are created automatically during processing, but you must assign a two-character name to the trail when you add it with ADD EXTTRAIL or ADD RMTTRAIL in the Oracle GoldenGate configuration. By default, trails are stored in the dirdat subdirectory of the Oracle GoldenGate installation directory.

Trail files age automatically so that processing can continue without interruption for file maintenance. As each new trail file is created, it inherits the two-character trail name and receives a unique six-digit sequence number appended to the end, ranging from 000000 to 999999, for example /home/ogg/dirdat/lt000001. When the sequence number reaches 999999, numbering starts over at 000000.

You can create multiple trails to separate the data of different objects or applications. You link a trail to the objects it carries by placing the EXTTRAIL or RMTTRAIL parameter before the TABLE and SEQUENCE parameters in the Extract parameter file. Accumulating trail files can be purged with the PURGEOLDEXTRACTS parameter in the Manager parameter file.
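A minimal sketch of trail purging in the Manager parameter file (the trail name lt and the 3-day retention are assumptions):

-- Manager parameter file
PORT 7809
-- purge trail files only after all processes are done with them; keep at least 3 days
PURGEOLDEXTRACTS ./dirdat/lt*, USECHECKPOINTS, MINKEEPDAYS 3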

In some configurations, Oracle GoldenGate stores the extracted data in an extract file instead of a trail. The extract file can be a single file, or it can be configured to roll over into multiple files, for example to comply with operating-system limits on file size. In this scenario it is similar to a trail, except that checkpoints are not recorded. The file or files are created automatically at run time, and the same versioning features that apply to trails also apply to extract files.

For recovery purposes, checkpoints store the current read and write positions of a process to disk. Checkpoints ensure that data changes marked for synchronization are actually captured by Extract and applied by Replicat, and they prevent redundant processing. They provide fault tolerance against data loss when the network, the system, or an Oracle GoldenGate process needs to be restarted. For complex synchronization configurations, checkpoints enable multiple Extract or Replicat processes to read the same set of trails. Checkpoints work with an internal acknowledgment process to prevent messages from being lost in the network.

Extract creates checkpoints for its position in the data source and its position in the trail. Because Extract captures only committed transactions, it must keep track of the operations in all open transactions in case any of them is committed. This requires Extract to record a checkpoint consisting of its current position in the transaction log plus the starting position of the oldest open transaction, which can be in the current log or in an earlier log.

To control the volume of transaction log that must be reprocessed after an outage, Extract persists its current state and data to disk at specific intervals, including the state and data (if any) of long-running transactions. If Extract stops between two of these intervals, it can recover from a position within the previous interval or from the last checkpoint, instead of having to return to the log position where the oldest open long-running transaction first appeared.

Replicat creates checkpoints for its position in the trail. Replicat stores its checkpoints in a checkpoint table in the target database, so that each transaction it commits is tied to its position in the trail. The checkpoint table guarantees consistency after a database recovery by ensuring that a transaction is applied only once, even if the Replicat process or the database process fails. For reporting purposes, Replicat also keeps a checkpoint file on disk in the dirchk subdirectory of the Oracle GoldenGate installation directory.
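A minimal sketch of creating the checkpoint table in GGSCI (the ogg user and the table name ogg.ckpt are assumptions, not from the article):

GGSCI> DBLOGIN USERID ogg, PASSWORD ogg
GGSCI> ADD CHECKPOINTTABLE ogg.ckpt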

Checkpoints are not required for non-continuous runs that can simply be re-run from their start point if needed, such as initial loads.

Manager is the control process of Oracle GoldenGate. Manager must be running on each system before Extract or Replicat can be started, and it must remain running while those processes run so that management functions are performed. Manager performs the following functions:

starting Oracle GoldenGate processes, starting dynamic processes, maintaining port information for the processes, performing trail management, and creating event, error, and threshold reports. One Manager process can control many Extract or Replicat processes. On Windows, Manager can run as a service. Example:

GGSCI (ogg2) 1> edit params ./GLOBALS
MGRSERVNAME OracleGoldenGate
ggschema ogg
C:\>cd D:\goldengate
C:\>D:
D:\goldengate>install addservice addevents

After these steps, an OracleGoldenGate service appears in the Windows Services list.

During continuous, online change synchronization, Collector is a background process that runs on the target. Collector does the following:

1. Upon a connection request from a remote Extract to Manager, Collector scans for and binds to an available port, and returns the port number to Manager, which assigns it to the requesting Extract process.

2. Collector receives the extracted database changes sent by Extract and writes them to a trail file. Manager starts Collector automatically when a network connection is required, so Oracle GoldenGate users do not normally interact with it. Collector can receive information from only one Extract process, so there is one Collector for each Extract that you use.

By default, Extract on the source initiates the TCP/IP connection to Collector on the target, but Oracle GoldenGate can also be configured so that Collector on the target initiates the connection. Initiating the connection from the target may be required, for example, when the target is in a trusted network zone and the source is in an untrusted zone.
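The range of ports that Manager may hand to Collector is commonly constrained in the Manager parameter file; a minimal sketch (the port numbers are illustrative):

-- Manager parameter file
PORT 7809
-- ports Manager may assign to dynamically started Collector processes
DYNAMICPORTLIST 7810-7820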
