1. Overall scheduling: a crontab timer on Linux runs a shell script that contains the KJB (Kettle job) execution information.
2. The xxxx_0_execute_judge transformation has two jobs. It reads a daily synchronization status value to decide whether to perform the synchronization; if the status condition is not met, it sends a notification email.
3. The xxxx_a0_connect_next job contains four jobs that execute in parallel; the Message_prepare_yes job handles the "sync status OK" email notification.
4. The four parallel jobs each synchronize the data of a different module. The overall synchronization principle: small tables are synchronized directly, while large tables are exported with bcp and mapped to the corresponding UTF-8 txt external table in GP (Greenplum).
5. After all modules have executed successfully, a sync-completed email notification is sent.
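The control flow in steps 2 to 5 can be sketched in Python as below. This is only an illustration of the gating and parallel structure, not the Kettle jobs themselves: the module names, the `run_daily_sync` function, and the `sync_module` and `notify` callables are all hypothetical stand-ins, and the behavior when a module fails (no completion email) is an assumption, since the notes do not describe it.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical module names; the notes only say there are four parallel jobs.
MODULES = ["module_a", "module_b", "module_c", "module_d"]


def run_daily_sync(status_ok, sync_module, notify, modules=MODULES):
    """Gate on the daily status, sync modules in parallel, then notify.

    status_ok   -- bool, result of the daily synchronization status check
    sync_module -- callable(name) -> bool, True when that module synced
    notify      -- callable(message), stands in for the email steps
    """
    if not status_ok:
        # Step 2: status not met, so only an email is sent.
        notify("sync status not met, synchronization skipped")
        return False

    # Step 3: status is OK, announce it and start the parallel jobs.
    notify("sync status OK, starting synchronization")
    with ThreadPoolExecutor(max_workers=len(modules)) as pool:
        # Step 4: one worker per module, all running concurrently.
        results = list(pool.map(sync_module, modules))

    # Step 5: the completion email goes out only if every module succeeded.
    if all(results):
        notify("sync completed")
        return True
    return False  # assumption: a failed module suppresses the completion email
```

Passing `notify=print` and a `sync_module` that always returns `True` is enough to watch the sequence of messages without any database involved.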
The above is only a record of the overall framework of the ETL project. The more complex parts of the whole process are the compressed transfer of large tables of more than 10 GB and the GBK-to-UTF-8 transcoding.
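One way to handle the transcoding of a large export without loading it into memory is to stream it in chunks. The sketch below assumes the bcp export is a GBK-encoded text file and writes a gzip-compressed UTF-8 copy; the function name, the chunk size, and the choice of gzip are illustrative assumptions, not details taken from the project.

```python
import gzip


def transcode_gbk_to_utf8_gz(src_path, dst_path, chunk_chars=1 << 20):
    """Stream a GBK text file into a gzip-compressed UTF-8 file.

    Reading in fixed-size chunks keeps memory use flat regardless of
    file size. Text mode uses an incremental decoder, so multi-byte GBK
    characters that straddle a chunk boundary are decoded correctly.
    newline="" leaves the original line endings untouched.
    """
    with open(src_path, "r", encoding="gbk", newline="") as src, \
            gzip.open(dst_path, "wt", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            dst.write(chunk)
```

Compressing after transcoding means the file that travels over the network is already in the encoding the UTF-8 external table expects.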
Kettle: implementing a daily synchronization schedule from SQL Server to Greenplum