Oracle GoldenGate Architecture in Detail (reprint)


Excerpted from Czmmiao's Iteye blog: http://czmmiao.iteye.com/blog/1550877

GoldenGate Introduction
Oracle GoldenGate is a log-based structured data replication product. It keeps a source database synchronized with a target database by parsing the source database's online redo logs or archive logs for incremental data changes and then applying those changes to the target database. Oracle GoldenGate can replicate large volumes of data with sub-second latency between heterogeneous IT infrastructures, covering almost all common operating system and database platforms, which makes it suitable for emergency standby systems, online reporting, real-time data warehouse feeds, transaction tracking, data synchronization, centralization/distribution, disaster recovery, database upgrades and migrations, dual data centers, and more. Oracle GoldenGate also supports many flexible topologies: one-to-one, broadcast (one-to-many), consolidation (many-to-one), bidirectional, peer-to-peer, cascading, and so on.


GoldenGate Technical Architecture
Like traditional logical replication, Oracle GoldenGate works by extracting changes from the source side's redo logs or archive logs, delivering them to the target over TCP/IP, and finally parsing and applying them on the target side, so that the target stays synchronized with the source. The components of this architecture are described below.

Manager Process
The Manager process is GoldenGate's control process and runs on both the source and target sides. Its main functions are: starting, monitoring, and restarting the other GoldenGate processes; reporting errors and events; allocating data storage space; and publishing threshold reports. There is only one Manager process on each of the source and target sides, and its status is either RUNNING or STOPPED. On Windows the Manager process can run as a service, while on Linux/UNIX it runs as an ordinary system process.
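
As a rough illustration, a minimal Manager parameter file (mgr.prm) might look like the following sketch; the port number, retry policy, and trail path are illustrative assumptions, not values from the original article:

    PORT 7809
    -- Restart abended Extract/Replicat processes automatically (illustrative policy)
    AUTORESTART EXTRACT *, RETRIES 3, WAITMINUTES 5
    -- Purge trail files that all readers have fully consumed
    PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS, MINKEEPDAYS 3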
Extract Process
Extract runs on the source database side and is responsible for capturing data from source tables or logs. Its role can be divided by time into two phases:
Initial load phase: during the initial data load, the Extract process extracts data directly from the source tables.

Synchronization (change capture) phase: after the initial data synchronization completes, the Extract process is responsible for capturing source-side data changes (DML and DDL).
Note that GoldenGate does not support DDL capture on all databases.
The Extract process captures changes to all configured objects that need to be synchronized, but only committed transactions are sent to the remote trail file for synchronization. When a transaction commits, all log records belonging to that transaction are written to the trail file in order, as one transaction unit. The Extract process also uses its built-in checkpoint mechanism to periodically record its read and write positions, so that if the Extract process terminates or the operating system restarts, GoldenGate can return to the previous state after Extract restarts and continue from the last breakpoint. Together, these two mechanisms ensure data integrity.
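
As a minimal sketch, a change-capture Extract might be registered in GGSCI and configured as follows; the group name ext1, the credentials, the trail path, and the table are all hypothetical:

    GGSCI> ADD EXTRACT ext1, TRANLOG, BEGIN NOW
    GGSCI> ADD EXTTRAIL ./dirdat/lt, EXTRACT ext1

    -- ext1.prm (parameter file for the hypothetical group ext1)
    EXTRACT ext1
    USERID ggs_admin, PASSWORD ggs_admin
    EXTTRAIL ./dirdat/lt
    TABLE scott.emp;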

Multiple Extract processes can operate on different objects at the same time. For example, one Extract process can extract transaction data and deliver it to the target while a second Extract process extracts data used for reporting. Alternatively, two Extract processes can write to two trail files, extracting and transmitting concurrently to two Replicat processes to reduce data synchronization latency.
During initial load, or when data is synchronized in batch mode, GoldenGate generates extract files rather than trail files to store the data. By default only one extract file is generated, but multiple files can be generated through configuration if the operating system has a single-file size limit or for other reasons. Extract files do not record checkpoints.

The status of the Extract process is one of STOPPED (stopped normally), STARTING, RUNNING, or ABENDED (short for "abnormal end", marking abnormal termination).
Pump Process
The Data Pump process runs on the source database side, and configuring it is generally recommended. It reads the local trail file generated on the source and sends the trail data in blocks over TCP/IP to the target. The Data Pump is in essence a special form of the Extract process: if a local trail file is not used, the Extract process itself delivers the extracted data to the target and generates a remote trail file there.
The Server Collector process, the target-side counterpart of the Data Pump, rarely needs our attention because it is transparent in practice and requires no configuration. It runs on the target side, and its task is to reassemble the data sent by Extract/Pump into the remote trail file.

Note: trail files are generated on the target side regardless of whether a Data Pump process is used.
The Data Pump can be configured for online or batch processing. It can perform data filtering, mapping, and transformation, or it can be configured in "passthrough mode", in which the data is transmitted to the target in the desired format as-is, with no additional processing. Passthrough mode improves Data Pump efficiency because the pump does not need to look up the definitions of the objects it forwards.
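
A passthrough Data Pump parameter file might look like the following minimal sketch; the group name, host, and trail paths are hypothetical:

    EXTRACT pump1
    -- PASSTHRU: forward trail data unchanged, no object-definition lookups
    PASSTHRU
    RMTHOST tgthost, MGRPORT 7809
    RMTTRAIL ./dirdat/rt
    TABLE scott.*;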
In most cases, Oracle recommends using a Data Pump, for the following reasons:
1. Protection against target-side or network failures: if the trail file exists only on the target side, the source-side Extract process keeps the captured data in memory and sends it to the target as it goes. When the network or the target fails, the Extract process cannot ship the data in time; it eventually exhausts its memory and terminates abnormally. With a Data Pump configured on the source side, the captured data is persisted to disk, preventing this abnormal termination. Once the fault is repaired and the source and target are reconnected, the Data Pump sends the source-side trail file contents to the target.
2. Support for complex data filtering or transformation: when filtering or transforming data, you can configure one Data Pump to perform the first stage of the transformation on either the source or the target, and use another Data Pump process or the Replicat group for the second stage.

3. Efficient use of storage resources: when synchronizing multiple data sources to a central data center, using Data Pumps lets each source keep its extracted data locally while the trail files are stored on the target, which spreads out the storage load.
4. Avoiding a single point of failure when one data source feeds multiple targets: when sending data from one source to several targets, a separate Data Pump can be configured for each target. That way, if one target or its network fails, the other targets are unaffected and can continue synchronizing data, as sketched below.
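
For instance, two Data Pumps reading the same local trail and shipping to two different targets might be registered like this; the group names, trail paths, and hosts are hypothetical:

    GGSCI> ADD EXTRACT pmpa, EXTTRAILSOURCE ./dirdat/lt
    GGSCI> ADD RMTTRAIL ./dirdat/ra, EXTRACT pmpa
    GGSCI> ADD EXTRACT pmpb, EXTTRAILSOURCE ./dirdat/lt
    GGSCI> ADD RMTTRAIL ./dirdat/rb, EXTRACT pmpb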
Replicat Process
The Replicat process, which we also call the apply process, runs on the target side and is the last station in data delivery. It reads the content of the target-side trail file, resolves it into DML or DDL statements, and applies them to the target database.
Like the Extract process, Replicat has its own internal checkpoint mechanism, which guarantees that after a restart it can resume from the last recorded position with no risk of data loss.
The status of the Replicat process is one of STOPPED (stopped normally), STARTING, RUNNING, or ABENDED (short for "abnormal end", marking abnormal termination).
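
A minimal Replicat setup might look like the following sketch; the group name, trail path, credentials, and mapping are hypothetical:

    GGSCI> ADD REPLICAT rep1, EXTTRAIL ./dirdat/rt

    -- rep1.prm (parameter file for the hypothetical group rep1)
    REPLICAT rep1
    USERID ggs_admin, PASSWORD ggs_admin
    -- Assume source and target table definitions are identical
    ASSUMETARGETDEFS
    MAP scott.emp, TARGET scott.emp;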
Trail File
To deliver database transaction information from the source to the target more efficiently and more safely, GoldenGate introduces the concept of the trail file. As mentioned earlier, when Extract captures data, GoldenGate converts the captured transaction information into files in GoldenGate's proprietary format, and the Data Pump delivers these trail files from the source to the target, so the files exist on both ends. The purpose of the trail file is to avoid a single point of failure: it persists the transaction information, and a checkpoint mechanism records the read and write positions in it, so that if a failure occurs the data can be retransmitted from the position recorded by the checkpoint. Alternatively, Extract can write a remote trail file directly on the target over TCP/IP, but as already mentioned this approach risks data loss, so it is not discussed further here.
A trail file defaults to 10 MB in size, and its name consists of a two-character prefix followed by a six-digit sequence number from 000000 to 999999, for example c:\directory\tr000001. Trail files are stored by default in GoldenGate's dirdat subdirectory. You can create different trail files for different applications or objects, but at any given time only one Extract process writes to a particular trail file.
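
The default file size can be overridden when the trail is added, for example (the prefix, path, and size here are illustrative):

    GGSCI> ADD EXTTRAIL ./dirdat/tr, EXTRACT ext1, MEGABYTES 50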

Starting with GoldenGate 10.0, a record containing information about the trail file itself is stored in the file header of each trail file, whereas versions prior to 10.0 do not store this information. Each data record in a trail file contains a header area and a data area: the header area holds transaction information, while the data area holds the actual extracted data.

How Processes Write Trail Files

To reduce the system's I/O load, the extracted data is written to the trail file in large byte blocks. For compatibility, data stored in the trail file is kept in a canonical format, one that can be translated quickly and accurately between heterogeneous databases. Depending on the needs of different applications, the data can also be stored in other formats.

By default, the Extract process writes to the trail file in append mode. When the Extract process terminates abnormally, the trail file is marked as requiring recovery; when Extract restarts, it appends recovery data after the checkpoint to the trail file. In versions prior to GoldenGate 10.0, the Extract process used overwrite mode instead: after an abnormal termination, the contents of the existing trail file following the last completely written transaction were overwritten with the new data.

The author's understanding here is not fully thorough, so the original English text is quoted below; readers are welcome to offer corrections.

By default, Extract operates in append mode, where if there is a process failure, a recovery marker is written to the trail and Extract appends recovery data to the file so that a history of all prior data is retained for recovery purposes.

In append mode, Extract initialization determines the identity of the last complete transaction that was written to the trail at startup time. With this information, Extract ends recovery when the commit record for that transaction is encountered in the data source; then it begins new data capture with the next committed transaction that qualifies for extraction and begins appending the new data to the trail. A data pump or Replicat starts reading again from that recovery point.

Overwrite mode is another version of Extract recovery that was used in versions of GoldenGate prior to version 10.0. In these versions, Extract overwrites the existing transaction data in the trail after the last write-checkpoint position, instead of appending the new data. The first transaction that is written is the first one that qualifies for extraction after the last read checkpoint position in the data source.

Checkpoint

Checkpoints are used when extraction or replication fails (for example, a system crash or a network fault) so that the Extract or Replicat process can relocate its extraction or replication starting point. In advanced synchronization configurations, multiple Extract or Replicat processes can read the same set of trail files because each maintains its own checkpoints.

The Extract process records checkpoints both in the data source and in the trail files; Replicat records checkpoints only in the trail file.

In batch mode, the Extract and Replicat processes do not record checkpoints. If a batch run fails, the whole batch is simply reprocessed.

Checkpoint information is stored by default in files under the GoldenGate dirchk subdirectory. On the target side, in addition to the checkpoint file, Replicat checkpoint information can also be stored in the database by configuring an additional checkpoint table.
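
A checkpoint table might be created and the checkpoint positions inspected roughly as follows; the schema and table name ggs_admin.ckpt and the group name are hypothetical:

    GGSCI> DBLOGIN USERID ggs_admin, PASSWORD ggs_admin
    GGSCI> ADD CHECKPOINTTABLE ggs_admin.ckpt
    -- Show detailed checkpoint positions for a process
    GGSCI> INFO REPLICAT rep1, SHOWCH

The table can also be named in the GLOBALS parameter file (CHECKPOINTTABLE ggs_admin.ckpt) so that new Replicat groups use it by default.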

Group
We can distinguish different processes by defining Extract and Replicat groups. For example, when different data sets need to be replicated in parallel, we can create two or more Replicat groups.
A process group consists of the process itself, its parameter file, its checkpoint file, and any other files associated with the process. For Replicat processes, if a checkpoint table is configured, each group also records its checkpoints in that table.
As for naming rules, a group name may be up to eight characters long and is not case-sensitive.


GGSCI
GGSCI is short for GoldenGate Software Command Interface. It provides a rich set of commands for operating on GoldenGate, such as creating, modifying, and monitoring GoldenGate processes.
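
A few commonly used GGSCI commands, shown here with hypothetical group names:

    GGSCI> INFO ALL                -- status overview of Manager, Extract, and Replicat
    GGSCI> START EXTRACT ext1      -- start a process group
    GGSCI> STATS REPLICAT rep1     -- operation statistics for a group
    GGSCI> VIEW REPORT ext1        -- view the process report file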
Commit Sequence Number
As mentioned several times above, GoldenGate guarantees data integrity at the transaction level. How, then, does GoldenGate identify a transaction? It uses the commit sequence number (CSN). The CSN is stored in the transaction log and in the trail file, and it is used during data extraction and replication. The CSN is recorded in the trail file at the start of each transaction, and it can be viewed with the @GETENV column-conversion function or with the Logdump utility. The CSN takes a different form on each database platform; on Oracle, for example, it is the system change number (SCN).
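
For example, the CSN of the current transaction can be captured into a user token in an Extract TABLE statement roughly as follows; the token name tk_csn and the table are hypothetical:

    TABLE scott.emp, TOKENS (tk_csn = @GETENV ("TRANSACTION", "CSN"));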


GoldenGate Support for Different Databases

* Can be used only as a target, not as a source. The exception is that GoldenGate can extract records from MySQL source tables as part of a GoldenGate direct load. (Since the author is not familiar with MySQL, this is only a literal translation; the original text reads: "The exception being that GoldenGate can extract records from MySQL source tables as part of a GoldenGate direct load.")
** Via the GoldenGate API tool for transactional data management; only mirror replication is supported, and data manipulation, filtering, field mapping, and so on are not supported.

References: Oracle GoldenGate Administrator's Guide; "Enterprise-level IT Operations and Maintenance with GoldenGate", Chapter 1 (Liandong Beifang).
Source: http://czmmiao.iteye.com/blog/1550877
