Brief introduction:
Greenplum is a data warehouse built on PostgreSQL with an MPP (massively parallel processing) architecture. It is suited to OLAP systems and supports storing and processing massive data sets at the 50 PB level (1 PB = 1000 TB).
Background:
A current business requirement is to synchronize base data from an Oracle database to the Greenplum data warehouse for analysis and processing.
Scale:
About 60 GB of data is produced per day, and the largest table grows by hundreds of billions of rows per day.
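To put the daily volume in perspective, here is a rough back-of-the-envelope calculation of the average change rate. It is only a sketch: it assumes decimal units (1 GB = 10^9 bytes) and a load spread evenly over 24 hours, neither of which is stated in the original, and real traffic is likely to be burstier.

```python
# Rough estimate of the average change rate implied by a daily volume.
# Assumptions (not from the original): decimal units, evenly spread load.
GB = 10**9
MB = 10**6
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_mb_per_s(gb_per_day: float) -> float:
    """Average throughput in MB/s for a given daily volume in GB."""
    return gb_per_day * GB / SECONDS_PER_DAY / MB


def backlog_gb(gb_per_day: float, delay_hours: float) -> float:
    """Data that accumulates while synchronization lags by delay_hours."""
    return gb_per_day * delay_hours / 24


if __name__ == "__main__":
    print(f"average rate: {average_rate_mb_per_s(60):.2f} MB/s")
    print(f"backlog after 3 hours: {backlog_gb(60, 3):.1f} GB")
```

Under these assumptions the sustained rate is modest, well under 1 MB/s on average, so the harder part of the problem is more likely the number of row changes than the raw byte volume.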
Solution:
1) Historical data is initialized by extracting it from Oracle and importing it into Greenplum.
2) Incremental updates:
GoldenGate parses the Oracle logs and passes the parsed results to the node where Greenplum resides.
On the Greenplum node, a program applies the log records parsed by GoldenGate incrementally to the Greenplum data warehouse.
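The original does not describe how the program on the Greenplum node applies the parsed records. One common pattern, shown here only as a minimal sketch, is to collapse each batch to the final state of every key and then apply it as "delete all touched keys, insert the surviving rows". The record shape `(operation, key, row)`, the operation codes `"I"`, `"U"`, `"D"`, and the function names are hypothetical, and the sketch assumes every insert or update record carries the full row. An in-memory dict stands in for the target table.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical change-record shape: (operation, primary_key, full_row).
# operation is "I" (insert), "U" (update) or "D" (delete); row is None for "D".
Change = Tuple[str, int, Optional[dict]]


def plan_batch(changes: Iterable[Change]) -> Tuple[List[int], Dict[int, dict]]:
    """Collapse a batch so that each key is touched only once.

    Returns the keys to delete from the target and the rows to insert
    afterwards. Later records for the same key override earlier ones.
    """
    latest: Dict[int, Tuple[str, Optional[dict]]] = {}
    for op, key, row in changes:
        latest[key] = (op, row)
    keys_to_delete = list(latest)  # every touched key is removed first
    rows_to_insert = {
        key: row
        for key, (op, row) in latest.items()
        if op != "D" and row is not None
    }
    return keys_to_delete, rows_to_insert


def apply_batch(target: Dict[int, dict], changes: Iterable[Change]) -> None:
    """Apply a batch to an in-memory stand-in for the target table."""
    keys_to_delete, rows_to_insert = plan_batch(changes)
    for key in keys_to_delete:
        target.pop(key, None)  # deleting a missing key is harmless
    target.update(rows_to_insert)


if __name__ == "__main__":
    table = {1: {"v": "a"}, 2: {"v": "b"}}
    batch = [
        ("U", 1, {"v": "a2"}),
        ("I", 3, {"v": "c"}),
        ("D", 2, None),
        ("U", 3, {"v": "c2"}),
    ]
    apply_batch(table, batch)
    print(table)
```

Batching in this way turns many single-row changes into two set-based operations per table, which tends to suit an MPP database better than row-by-row updates.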
Final result:
1. One full initialization takes about three days and loads about 5 TB of data.
2. Incremental synchronization lags by no more than 3 hours.
3. After tuning, queries on Greenplum run 10 to 100 times faster than on the Oracle database (even though the Greenplum machines have a considerably lower configuration).
4. Some large tables are compressed, which reduces storage space and I/O overhead.
5. Column storage is not used: the large tables have too many columns to suit column-oriented storage, so only compression is applied.
6. The distribution keys of some tables were adjusted, which greatly improved the efficiency of data analysis.
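Items 4 to 6 correspond to table-level storage and distribution settings in Greenplum. The sketch below builds the kind of DDL involved: a row-oriented, compressed, append-only table with an explicit distribution key, plus a statement that changes the key later. The table name, columns, the choice of `zlib`, and the compression level are illustrative assumptions, since the original names none of them, and option names can differ between Greenplum versions. The `skew_ratio` helper is also hypothetical: it shows one simple way to judge a distribution key from per-segment row counts, such as those returned by grouping on `gp_segment_id`.

```python
from typing import List, Sequence, Tuple


def create_table_sql(
    table: str,
    columns: Sequence[Tuple[str, str]],
    distribution_key: str,
    compress_level: int = 5,  # illustrative value, not from the original
) -> str:
    """Build DDL for a row-oriented, zlib-compressed, append-only table."""
    cols = ",\n    ".join(f"{name} {type_}" for name, type_ in columns)
    return (
        f"CREATE TABLE {table} (\n    {cols}\n)\n"
        f"WITH (appendonly=true, orientation=row, "
        f"compresstype=zlib, compresslevel={compress_level})\n"
        f"DISTRIBUTED BY ({distribution_key});"
    )


def redistribute_sql(table: str, new_key: str) -> str:
    """Build the statement that moves a table to a new distribution key."""
    return f"ALTER TABLE {table} SET DISTRIBUTED BY ({new_key});"


def skew_ratio(rows_per_segment: List[int]) -> float:
    """Largest segment divided by the mean; 1.0 means perfectly even."""
    mean = sum(rows_per_segment) / len(rows_per_segment)
    return max(rows_per_segment) / mean


if __name__ == "__main__":
    # Hypothetical table used only for illustration.
    print(create_table_sql(
        "orders_fact",
        [("order_id", "bigint"), ("customer_id", "bigint"), ("amount", "numeric")],
        "order_id",
    ))
    print(redistribute_sql("orders_fact", "customer_id"))
    print(skew_ratio([300, 100, 100, 100]))
```

A ratio well above 1.0 suggests that one segment holds a disproportionate share of the rows and will dominate query time, which is the usual reason to pick a different distribution key.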