Greenplum is a data warehouse database built on PostgreSQL with an MPP (massively parallel processing) architecture. It is suited to OLAP systems and supports storing and processing massive data sets at the 50 PB scale (1 PB = 1000 TB).
One current business requirement is to synchronize the base data in an Oracle database to the Greenplum data warehouse for analysis and processing.
The source system produces around 60 GB of data per day, and the largest table grows by hundreds of billions of records per day.
1) Historical data is initialized by extracting it from Oracle and importing it into Greenplum.
2) Incremental data is updated as follows:
GoldenGate parses the Oracle logs and ships the parsed records to the node where Greenplum resides.
A program on the Greenplum node then applies the GoldenGate-parsed log records incrementally to the Greenplum data warehouse.
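The text does not describe how the program on the Greenplum node applies the parsed records, so the following is only a minimal sketch of one common approach: collapse each batch of change records to the final state per primary key, then delete every touched key and re-insert the surviving rows in bulk. The `Change` record shape, the `I`/`U`/`D` operation codes, and the `collapse` function are assumptions made for illustration, not the format GoldenGate emits or the author's actual program.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Change:
    """One parsed log record (hypothetical shape, not GoldenGate's format)."""
    op: str               # assumed codes: "I" insert, "U" update, "D" delete
    key: int              # primary key of the affected row
    row: Optional[dict]   # full row image after the change; None for deletes


def collapse(changes: List[Change]) -> Tuple[List[int], List[dict]]:
    """Reduce an ordered batch to (keys to delete, rows to insert)."""
    latest: Dict[int, Change] = {}
    for change in changes:
        # Records arrive in log order, so a later record for the same key
        # replaces any earlier one.
        latest[change.key] = change
    # Every touched key is deleted first, then surviving rows are re-inserted.
    delete_keys = sorted(latest)
    insert_rows = [latest[k].row for k in delete_keys if latest[k].op != "D"]
    return delete_keys, insert_rows


batch = [
    Change("I", 1, {"id": 1, "amount": 10}),
    Change("U", 1, {"id": 1, "amount": 15}),
    Change("I", 2, {"id": 2, "amount": 7}),
    Change("D", 2, None),
]
delete_keys, insert_rows = collapse(batch)
print(delete_keys)
print(insert_rows)
```

Collapsing first means each batch turns into one bulk delete and one bulk load, which generally suits an MPP database better than replaying every change row by row.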
1. One full initialization run takes about three days and loads about 5 TB of data.
2. Incremental synchronization lags by no more than 3 hours.
3. After tuning, queries on Greenplum run 10 to 100 times faster than on the Oracle database, even though the Greenplum machines have a considerably lower hardware configuration.
4. Compression of some large tables reduces the overhead of storage space and I/O.
5. Column-oriented storage is not used: the large tables have too many columns to be suitable for columnar storage, so only compression is applied to them.
6. The distribution keys of some tables are adjusted, which greatly improves the efficiency of data analysis.
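The compression, row-orientation, and distribution-key choices in the list above are all expressed in the table definition. As a rough sketch, the helper below builds a Greenplum `CREATE TABLE` statement with an append-optimized, row-oriented, zlib-compressed storage clause and an explicit `DISTRIBUTED BY` key. The table name, columns, and the `build_create_table` helper are hypothetical. The `appendonly=true` spelling is the one used in Greenplum 6 and earlier, and the actual tables, compression type, and levels used in this project are not given in the text.

```python
from typing import List, Tuple


def build_create_table(table: str,
                       columns: List[Tuple[str, str]],
                       distributed_by: List[str],
                       compress: bool = True,
                       level: int = 5) -> str:
    """Return Greenplum DDL for a row-oriented table with a chosen distribution key."""
    col_sql = ",\n    ".join(f"{name} {sqltype}" for name, sqltype in columns)
    storage = ""
    if compress:
        # Append-optimized, row-oriented storage with zlib compression.
        storage = ("\nWITH (appendonly=true, orientation=row, "
                   f"compresstype=zlib, compresslevel={level})")
    keys = ", ".join(distributed_by)
    return (f"CREATE TABLE {table} (\n    {col_sql}\n){storage}"
            f"\nDISTRIBUTED BY ({keys});")


# Hypothetical fact table, distributed on the column most often used in joins.
ddl = build_create_table(
    "orders_fact",
    [("order_id", "bigint"), ("customer_id", "bigint"), ("amount", "numeric(12,2)")],
    distributed_by=["order_id"],
)
print(ddl)
```

A distribution key that matches the common join column and has many distinct values spreads rows evenly across segments and lets joins run locally on each segment, which is the usual reason that adjusting it improves analysis performance.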
Enables incremental synchronization of data from Oracle to Greenplum