As data volumes keep growing, there are more and more ways to synchronize data between databases or to import and export it between systems. Blindly insisting on one particular synchronization method does not guarantee high efficiency; the right choice depends on the scenario.
Scenario One: Exporting data from DB2 and synchronizing it to a Greenplum database
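For an ordinary (non-partitioned) DB2 database, the DB2 EXPORT utility runs at roughly a few dozen MB/s. For a DB2 DPF (Database Partitioning Feature) database, exporting with multiple threads, one per partition, can reach about 200 MB/s. The DB2 HPU (High Performance Unload) tool is faster still.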
On the Greenplum side, loading with COPY can reach about 280 MB/s, and loading through the gpfdist service about 350 MB/s. Several gpfdist instances can be started, and running them in parallel across multiple network cards is faster still.
This creates a performance bottleneck when data is exported from DB2 with the ordinary export method and loaded into Greenplum: the DB2 export runs at only a few dozen MB/s. For a large table of, say, several hundred GB, the DB2 export takes about 2 hours, while the actual Greenplum load needs only about 10 minutes. If the data is streamed straight through without first being written to intermediate files, the Greenplum database connection stays occupied for the whole export, which is not worth the cost.
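The arithmetic behind this bottleneck can be sketched as a two-stage pipeline. This is a minimal illustration, not a benchmark: the 200 GB table size and the 30 MB/s export rate are assumptions chosen to stand in for "several hundred GB" and "a few dozen MB/s", and 350 MB/s is the gpfdist figure quoted above. The function names are hypothetical.

```python
# Minimal sketch: estimate how long each stage of an export -> load pipeline
# takes. ASSUMPTIONS: 200 GB table, 30 MB/s DB2 export, 350 MB/s gpfdist load;
# these are illustrative inputs, not measurements.

def stage_seconds(size_gb: float, rate_mb_s: float) -> float:
    """Time in seconds to move size_gb at rate_mb_s (1 GB = 1024 MB)."""
    return size_gb * 1024 / rate_mb_s


def report(size_gb: float, export_rate: float, load_rate: float) -> dict:
    export_s = stage_seconds(size_gb, export_rate)
    load_s = stage_seconds(size_gb, load_rate)
    # Streaming without staging files: the load can only consume data as
    # fast as the export produces it, so the target connection is held for
    # roughly the duration of the slowest stage.
    streamed_s = max(export_s, load_s)
    # Staging to files first: the connection is only needed for the load.
    staged_s = load_s
    return {
        "export_min": export_s / 60,
        "load_min": load_s / 60,
        "connection_held_streamed_min": streamed_s / 60,
        "connection_held_staged_min": staged_s / 60,
    }


if __name__ == "__main__":
    r = report(size_gb=200, export_rate=30, load_rate=350)
    for name, minutes in r.items():
        print(f"{name}: {minutes:.1f} min")
```

With these assumed inputs the export stage comes out at roughly 114 minutes and the load at roughly 10 minutes, which matches the "2 hours versus 10 minutes" observation: when streaming, the Greenplum connection is tied up for the export's duration rather than the load's.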
If an error occurs midway through the DB2 export, locating it is very inconvenient, and restarting the process takes even longer.
Scenario Two: Exporting data from an Oracle database and synchronizing it to Greenplum
The situation is similar to the one above: the Oracle side is the slow one. In one test I ran in the opposite direction, moving a file of a few dozen GB from Greenplum into an Oracle database took nearly a day.
Conversely, when exporting from Oracle to Greenplum, most of the time is spent on the Oracle export.
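To put an observation like "a few dozen GB in nearly a day" next to the Greenplum load rates, it helps to convert it into an effective throughput. The sketch below is only an illustration: 50 GB and 24 hours are hypothetical stand-ins for the vague figures in the test above, the function names are hypothetical, and the comparison against the 350 MB/s gpfdist rate is rough, since that rate was quoted for loading in the other direction.

```python
# Minimal sketch: convert an observed "size moved / time taken" into an
# average rate and compare it with a reference rate.
# ASSUMPTIONS: 50 GB and 24 h are hypothetical placeholders for
# "a few dozen GB" and "nearly a day"; substitute real measurements.

def effective_rate_mb_s(size_gb: float, hours: float) -> float:
    """Average throughput in MB/s (1 GB = 1024 MB)."""
    return size_gb * 1024 / (hours * 3600)


def slowdown(observed_mb_s: float, reference_mb_s: float) -> float:
    """How many times slower the observed path is than the reference rate."""
    return reference_mb_s / observed_mb_s


if __name__ == "__main__":
    observed = effective_rate_mb_s(50, 24)
    print(f"effective rate: {observed:.2f} MB/s")
    print(f"about {slowdown(observed, 350):.0f}x slower than 350 MB/s")
```

Under these assumptions the Oracle path averages well under 1 MB/s, several hundred times slower than the quoted gpfdist rate, which is why the Oracle side dominates the total time in either direction.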
So the approach used in a project should be chosen with practical maintenance in mind, and physical resources such as memory and CPU all need to be weighed together.