ETL Zipper Algorithm Summary

Source: Internet
Author: User

Summary of the zipper algorithms:
One, 0610 algorithm (append)
1. Delete the warehouse table rows whose load date is on or after the current load date, so that the job can be re-run.
DELETE FROM xxx WHERE start_dt >= $tx_date;
2. Create a temporary table to hold the data extracted from the source table.
CREATE MULTISET VOLATILE TABLE xxx;
3. Insert the source data into the temporary table, transforming it according to the required rules.
INSERT INTO xxx SELECT ... FROM xxx;
4. Stamp the temporary table rows with the load date and insert them directly into the warehouse table.
INSERT INTO xxx SELECT ... FROM xxx;
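
The four append steps above can be sketched end to end. This is a minimal illustration, not the original Teradata job: it uses Python's built-in sqlite3 module in place of Teradata, an ordinary temporary table in place of a volatile table, and the table and column names (src_events, dw_events, stage, id, amount) are hypothetical.

```python
import sqlite3

def load_append(conn, tx_date):
    """Append-style load: rerun-safe delete, stage, then insert stamped rows."""
    cur = conn.cursor()
    # Step 1: drop rows loaded on or after tx_date so a re-run starts clean.
    cur.execute("DELETE FROM dw_events WHERE start_dt >= ?", (tx_date,))
    # Step 2: create the staging table (stand-in for a volatile table).
    cur.execute("DROP TABLE IF EXISTS temp.stage")
    cur.execute("CREATE TEMP TABLE stage (id INTEGER, amount INTEGER)")
    # Step 3: extract the source rows into the staging table.
    cur.execute("INSERT INTO stage SELECT id, amount FROM src_events")
    # Step 4: stamp each staged row with the load date and append it.
    cur.execute(
        "INSERT INTO dw_events (id, amount, start_dt) "
        "SELECT id, amount, ? FROM stage",
        (tx_date,),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_events (id INTEGER, amount INTEGER)")
conn.execute("CREATE TABLE dw_events (id INTEGER, amount INTEGER, start_dt TEXT)")
conn.executemany("INSERT INTO src_events VALUES (?, ?)", [(1, 10), (2, 20)])

load_append(conn, "2024-01-01")
load_append(conn, "2024-01-01")  # re-run of the same load date

rows = conn.execute(
    "SELECT id, amount, start_dt FROM dw_events ORDER BY id"
).fetchall()
```

Running the load twice for the same date leaves one copy of each row, which is the purpose of the delete in step 1.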
Two, 0611 algorithm (full delete, full insert)
1. Delete from the warehouse table every row whose primary key appears in the source table.
DELETE FROM xxx WHERE (id) IN (SELECT id FROM xxx);
2. Insert all rows from the source table directly into the warehouse table.
INSERT INTO xxx SELECT ... FROM xxx;
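
A small sketch of the delete-then-insert pattern, again using Python's sqlite3 module rather than Teradata. The tables src_customer and dw_customer and their columns are assumed names chosen for the example.

```python
import sqlite3

def load_replace_by_key(conn):
    """Replace warehouse rows whose key arrives from the source."""
    cur = conn.cursor()
    # Step 1: remove warehouse rows whose primary key is in the source table.
    cur.execute(
        "DELETE FROM dw_customer WHERE id IN (SELECT id FROM src_customer)"
    )
    # Step 2: insert every source row into the warehouse table.
    cur.execute(
        "INSERT INTO dw_customer (id, name) SELECT id, name FROM src_customer"
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_customer (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE dw_customer (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO dw_customer VALUES (?, ?)", [(1, "old"), (3, "keep")])
conn.executemany("INSERT INTO src_customer VALUES (?, ?)", [(1, "new"), (2, "added")])

load_replace_by_key(conn)

rows = conn.execute("SELECT id, name FROM dw_customer ORDER BY id").fetchall()
```

Warehouse rows whose keys are absent from the source (id 3 here) are left untouched, so the pattern keeps no history and only overwrites what the source delivers.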
Three, 0612 algorithm (history zipper)
1. Delete the warehouse table rows whose load date is on or after the current load date, so that the job can be re-run.
DELETE FROM xxx WHERE start_dt >= $tx_date;
2. Reset the end date of the warehouse table: rows whose end date is on or after the load date and is not the maximum date get the maximum date again, which makes them valid once more.
UPDATE xxx SET end_dt = $max_dt WHERE end_dt >= $tx_date AND end_dt <> $max_dt;
3. Create a temporary table to hold the data extracted from the source table.
CREATE MULTISET VOLATILE TABLE new;
4. Create a temporary incremental table to hold the incremental data.
CREATE MULTISET VOLATILE TABLE inc;
5. Load the source table data into the temporary table, transforming it according to the required rules.
INSERT INTO new SELECT ... FROM xxx WHERE ...;
6. Compare the temporary table with the warehouse table and put the new and changed rows into the incremental table.
INSERT INTO inc SELECT ... FROM new WHERE ... NOT IN ...;
7. Close the chain: for every valid warehouse row that also appears in the incremental table, set the end date to the load date.
UPDATE xxx SET end_dt = $tx_date WHERE ...;
8. Open the new chain: insert all rows of the incremental table into the warehouse table.
INSERT INTO xxx SELECT ... FROM inc;
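
The history zipper steps can be sketched as follows. This is an illustration under assumptions, not the original job: it runs on SQLite through Python's sqlite3 module, the names src, dw, new_rows, inc, id and val are hypothetical, "9999-12-31" is an assumed maximum date, and the comparison in step 6 uses NOT EXISTS on (id, val) where the original only says NOT IN.

```python
import sqlite3

MAX_DT = "9999-12-31"  # assumed marker for "still valid"

def zipper_load(conn, tx_date):
    cur = conn.cursor()
    # Step 1: remove chains opened on or after tx_date (re-run support).
    cur.execute("DELETE FROM dw WHERE start_dt >= ?", (tx_date,))
    # Step 2: reopen rows that were closed on or after tx_date.
    cur.execute(
        "UPDATE dw SET end_dt = ? WHERE end_dt >= ? AND end_dt <> ?",
        (MAX_DT, tx_date, MAX_DT),
    )
    # Steps 3-4: staging and incremental tables (stand-ins for volatile tables).
    for name in ("new_rows", "inc"):
        cur.execute(f"DROP TABLE IF EXISTS temp.{name}")
        cur.execute(f"CREATE TEMP TABLE {name} (id INTEGER, val TEXT)")
    # Step 5: load the source data into the staging table.
    cur.execute("INSERT INTO new_rows SELECT id, val FROM src")
    # Step 6: rows with no identical valid warehouse row are new or changed.
    cur.execute(
        "INSERT INTO inc SELECT n.id, n.val FROM new_rows n "
        "WHERE NOT EXISTS (SELECT 1 FROM dw d WHERE d.id = n.id "
        "AND d.val = n.val AND d.end_dt = ?)",
        (MAX_DT,),
    )
    # Step 7: close the chain of valid rows that have an incremental row.
    cur.execute(
        "UPDATE dw SET end_dt = ? WHERE end_dt = ? "
        "AND id IN (SELECT id FROM inc)",
        (tx_date, MAX_DT),
    )
    # Step 8: open a new chain for every incremental row.
    cur.execute(
        "INSERT INTO dw (id, val, start_dt, end_dt) "
        "SELECT id, val, ?, ? FROM inc",
        (tx_date, MAX_DT),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, val TEXT)")
conn.execute("CREATE TABLE dw (id INTEGER, val TEXT, start_dt TEXT, end_dt TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, "a"), (2, "b")])
zipper_load(conn, "2024-01-01")

conn.execute("UPDATE src SET val = 'a2' WHERE id = 1")  # changed row
conn.execute("INSERT INTO src VALUES (3, 'c')")         # new row
zipper_load(conn, "2024-01-02")
zipper_load(conn, "2024-01-02")  # re-run of the same load date

rows = conn.execute(
    "SELECT id, val, start_dt, end_dt FROM dw ORDER BY id, start_dt"
).fetchall()
```

Steps 1 and 2 together undo a previous run for the same date: the newly opened chains are deleted and the chains that run closed are reopened, so the second call produces the same table as the first.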
Four, 0614 algorithm (history zipper with deletes)
1. Delete the warehouse table rows whose load date is on or after the current load date, so that the job can be re-run.
DELETE FROM xxx WHERE start_dt >= $tx_date;
2. Reset the end date of the warehouse table: rows whose end date is on or after the load date and is not the maximum date get the maximum date again, which makes them valid once more.
UPDATE xxx SET end_dt = $max_dt WHERE end_dt >= $tx_date AND end_dt <> $max_dt;
3. Create a temporary table to hold the data extracted from the source table.
CREATE MULTISET VOLATILE TABLE new;
4. Create a temporary incremental table to hold the incremental data. Rows that were physically deleted in the source system are also stored here, marked with min_date.
CREATE MULTISET VOLATILE TABLE inc;
5. Load the source table data into the temporary table, transforming it according to the required rules.
INSERT INTO new SELECT ... FROM xxx WHERE ...;
6. Compare the temporary table with the warehouse table and put the new and changed rows into the incremental table.
INSERT INTO inc SELECT ... FROM new WHERE ... NOT IN ...;
7. Compare the primary keys of the valid warehouse rows with the primary keys in the temporary table. Keys that exist in the warehouse table but not in the temporary table were physically deleted in the source system; insert those rows into the incremental table with end_dt set to min_date (this data comes from the warehouse table).
INSERT INTO inc SELECT ... FROM xxx WHERE end_dt = $max_date AND etl_job_num = 920 AND (agt_num, agt_modif_num) NOT IN (SELECT agt_num, agt_modif_num FROM new);
8. Close the chain: for every valid warehouse row that also appears in the incremental table, set the end date to the load date.
UPDATE xxx SET end_dt = $tx_date WHERE ...;
9. Open the new chain: insert the incremental table rows whose end_dt is not min_date into the warehouse table.
INSERT INTO xxx SELECT ... FROM inc WHERE end_dt <> $min_date;
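
A sketch of how the delete handling in steps 7 to 9 fits into the zipper load. As with any illustration of this kind, it rests on assumptions: SQLite through Python's sqlite3 module stands in for Teradata, the names src, dw, new_rows, inc, id and val are hypothetical, and "0001-01-01" and "9999-12-31" are assumed minimum and maximum dates. The etl_job_num filter from the original statement is left out.

```python
import sqlite3

MIN_DT = "0001-01-01"  # assumed marker for "physically deleted at source"
MAX_DT = "9999-12-31"  # assumed marker for "still valid"

def zipper_load_with_deletes(conn, tx_date):
    cur = conn.cursor()
    # Steps 1-2: re-run support, as in the plain history zipper.
    cur.execute("DELETE FROM dw WHERE start_dt >= ?", (tx_date,))
    cur.execute(
        "UPDATE dw SET end_dt = ? WHERE end_dt >= ? AND end_dt <> ?",
        (MAX_DT, tx_date, MAX_DT),
    )
    # Steps 3-4: staging table and incremental table (with an end_dt marker).
    cur.execute("DROP TABLE IF EXISTS temp.new_rows")
    cur.execute("CREATE TEMP TABLE new_rows (id INTEGER, val TEXT)")
    cur.execute("DROP TABLE IF EXISTS temp.inc")
    cur.execute("CREATE TEMP TABLE inc (id INTEGER, val TEXT, end_dt TEXT)")
    # Step 5: load the source data.
    cur.execute("INSERT INTO new_rows SELECT id, val FROM src")
    # Step 6: new and changed rows go to inc, marked as valid.
    cur.execute(
        "INSERT INTO inc SELECT n.id, n.val, ? FROM new_rows n "
        "WHERE NOT EXISTS (SELECT 1 FROM dw d WHERE d.id = n.id "
        "AND d.val = n.val AND d.end_dt = ?)",
        (MAX_DT, MAX_DT),
    )
    # Step 7: valid warehouse keys missing from the source go to inc,
    # marked with MIN_DT to flag a physical delete.
    cur.execute(
        "INSERT INTO inc SELECT d.id, d.val, ? FROM dw d "
        "WHERE d.end_dt = ? AND d.id NOT IN (SELECT id FROM new_rows)",
        (MIN_DT, MAX_DT),
    )
    # Step 8: close the chain for every key in inc, deletes included.
    cur.execute(
        "UPDATE dw SET end_dt = ? WHERE end_dt = ? "
        "AND id IN (SELECT id FROM inc)",
        (tx_date, MAX_DT),
    )
    # Step 9: open a new chain only for rows not marked as deleted.
    cur.execute(
        "INSERT INTO dw (id, val, start_dt, end_dt) "
        "SELECT id, val, ?, ? FROM inc WHERE end_dt <> ?",
        (tx_date, MAX_DT, MIN_DT),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, val TEXT)")
conn.execute("CREATE TABLE dw (id INTEGER, val TEXT, start_dt TEXT, end_dt TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, "a"), (2, "b")])
zipper_load_with_deletes(conn, "2024-01-01")

conn.execute("UPDATE src SET val = 'a2' WHERE id = 1")  # changed row
conn.execute("DELETE FROM src WHERE id = 2")            # physical delete
zipper_load_with_deletes(conn, "2024-01-02")
zipper_load_with_deletes(conn, "2024-01-02")  # re-run of the same load date

rows = conn.execute(
    "SELECT id, val, start_dt, end_dt FROM dw ORDER BY id, start_dt"
).fetchall()
```

The deleted key ends up with a closed chain and no successor row, while the changed key gets both a closed chain and a new open one.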
Five, 0616 algorithm (economical history zipper)
1. Set the transaction isolation level to RU (read uncommitted) so that the session's reads do not wait on other transactions.
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL RU
2. Create a temporary table to hold the data extracted from the source table.
CREATE MULTISET VOLATILE TABLE new
3. Create an incremental table to hold the incremental data; only new and changed rows are stored here.
CREATE MULTISET VOLATILE TABLE inc
4. Create a delete table to hold the logically deleted rows, which carry a special marker.
CREATE MULTISET VOLATILE TABLE del
5. Insert the source data into the temporary table according to the loading rules.
INSERT INTO new
6. Compare the temporary table with the warehouse table and put the new and changed rows into the incremental table.
INSERT INTO inc SELECT ... FROM new
7. Insert the source rows that carry the special marker (generally end_dt = min_date) into the delete table.
INSERT INTO del SELECT ... FROM new WHERE end_dt = $min_date
8. Close the chain for all rows that appear in the incremental table or the delete table.
UPDATE xxx SET end_dt = $tx_date WHERE ...
9. Open a new chain for the rows in the incremental table, except for the specified items.
INSERT INTO xxx SELECT ... FROM inc WHERE seq_num <> '
Six, 0613 algorithm (history zipper with logical deletes)
0613 is almost the same as 0616; only the last step differs:
9. Open a new chain for all rows in the incremental table.
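
The logical-delete flow shared by 0616 and 0613 (steps 2 to 9, ending with the 0613 form of step 9) might look like the sketch below. It is written against SQLite with Python's sqlite3 module, and several points are assumptions rather than facts from the text: the names src, dw, new_rows, inc, del_rows, id and val, the minimum and maximum date values, the convention that non-deleted source rows carry the maximum date in end_dt, and the choice to keep marked rows out of the incremental table. The isolation-level step and the truncated seq_num filter of 0616 are not modelled.

```python
import sqlite3

MIN_DT = "0001-01-01"  # assumed logical-delete marker in the source
MAX_DT = "9999-12-31"  # assumed marker for "still valid"

def zipper_load_logical_delete(conn, tx_date):
    cur = conn.cursor()
    # Steps 2-4: staging, incremental and delete tables.
    for name in ("new_rows", "inc", "del_rows"):
        cur.execute(f"DROP TABLE IF EXISTS temp.{name}")
        cur.execute(
            f"CREATE TEMP TABLE {name} (id INTEGER, val TEXT, end_dt TEXT)"
        )
    # Step 5: load the source data, including its delete marker.
    cur.execute("INSERT INTO new_rows SELECT id, val, end_dt FROM src")
    # Step 6: unmarked rows with no identical valid warehouse row go to inc.
    cur.execute(
        "INSERT INTO inc SELECT n.id, n.val, n.end_dt FROM new_rows n "
        "WHERE n.end_dt <> ? AND NOT EXISTS (SELECT 1 FROM dw d "
        "WHERE d.id = n.id AND d.val = n.val AND d.end_dt = ?)",
        (MIN_DT, MAX_DT),
    )
    # Step 7: rows carrying the marker go to the delete table.
    cur.execute(
        "INSERT INTO del_rows SELECT id, val, end_dt FROM new_rows "
        "WHERE end_dt = ?",
        (MIN_DT,),
    )
    # Step 8: close the chain for keys in either table.
    cur.execute(
        "UPDATE dw SET end_dt = ? WHERE end_dt = ? AND "
        "(id IN (SELECT id FROM inc) OR id IN (SELECT id FROM del_rows))",
        (tx_date, MAX_DT),
    )
    # Step 9 (0613 form): open a new chain for every incremental row.
    cur.execute(
        "INSERT INTO dw (id, val, start_dt, end_dt) "
        "SELECT id, val, ?, ? FROM inc",
        (tx_date, MAX_DT),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, val TEXT, end_dt TEXT)")
conn.execute("CREATE TABLE dw (id INTEGER, val TEXT, start_dt TEXT, end_dt TEXT)")
conn.executemany(
    "INSERT INTO dw VALUES (?, ?, ?, ?)",
    [(1, "a", "2024-01-01", MAX_DT), (2, "b", "2024-01-01", MAX_DT)],
)
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?)",
    [(1, "a2", MAX_DT), (2, "b", MIN_DT), (3, "c", MAX_DT)],
)

zipper_load_logical_delete(conn, "2024-01-02")

rows = conn.execute(
    "SELECT id, val, start_dt, end_dt FROM dw ORDER BY id, start_dt"
).fetchall()
```

Unlike the 0612 and 0614 sketches, nothing here deletes or reopens rows for a re-run, which matches the step list: 0616 starts directly with the isolation level and the temporary tables.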
