A Complete Summary of ETL Zipper Algorithms

Source: Internet
Author: User

Summary of the zipper algorithms:
One, 0610 algorithm (append)
1. Delete the warehouse table rows loaded on or after the current load date, so that the job can be re-run
DELETE FROM xxx WHERE start_dt >= $tx_date;
2. Create a temporary table to hold the data extracted from the source table
CREATE MULTISET VOLATILE TABLE xxx;
3. Insert the source data into the temporary table, transforming it according to the loading rules
INSERT INTO xxx SELECT ... FROM xxx;
4. Stamp the temporary table data with the load date and insert it directly into the warehouse table
INSERT INTO xxx SELECT ... FROM xxx;
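The four steps above can be sketched as a small runnable example. This is only an illustration under stated assumptions: SQLite (through Python's sqlite3 module) stands in for the warehouse database, a TEMP table stands in for the volatile table, and the names dw, stg, id, val, and append_load are hypothetical and do not come from the original article.

```python
import sqlite3

def append_load(conn, source_rows, tx_date):
    """Append-style load of (id, val) rows, stamped with tx_date (hypothetical schema)."""
    cur = conn.cursor()
    # Step 1: delete rows loaded on or after tx_date so a re-run does not duplicate them.
    cur.execute("DELETE FROM dw WHERE start_dt >= ?", (tx_date,))
    # Step 2: temporary staging table (stand-in for a volatile table).
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS stg (id INTEGER, val TEXT)")
    cur.execute("DELETE FROM stg")
    # Step 3: load the extracted source rows into the staging table.
    cur.executemany("INSERT INTO stg VALUES (?, ?)", source_rows)
    # Step 4: stamp the staged rows with the load date and append them to the warehouse.
    cur.execute("INSERT INTO dw (id, val, start_dt) SELECT id, val, ? FROM stg", (tx_date,))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw (id INTEGER, val TEXT, start_dt TEXT)")
append_load(conn, [(1, "a"), (2, "b")], "2024-01-01")
append_load(conn, [(3, "c")], "2024-01-02")
append_load(conn, [(3, "c"), (4, "d")], "2024-01-02")  # re-run of the same load date
rows = conn.execute("SELECT id, val, start_dt FROM dw ORDER BY id").fetchall()
print(rows)
```

Because step 1 clears everything from the load date onward, the re-run of 2024-01-02 replaces that day's rows instead of appending them twice.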
Two, 0611 algorithm (full delete, full insert)
1. Delete from the warehouse table the records whose primary key appears in the source table
DELETE FROM xxx WHERE (id) IN (SELECT id FROM xxx);
2. Insert all data from the source table directly into the warehouse table
INSERT INTO xxx SELECT ... FROM xxx;
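A minimal sketch of these two steps, again assuming SQLite as a stand-in for the warehouse database. The table names dw and src, the columns id and val, and the function delete_insert_load are hypothetical names chosen for illustration.

```python
import sqlite3

def delete_insert_load(conn, source_rows):
    """Replace warehouse rows by primary key with the rows from the source (hypothetical schema)."""
    cur = conn.cursor()
    # Stand-in for the source table: a TEMP table refilled on every run.
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS src (id INTEGER, val TEXT)")
    cur.execute("DELETE FROM src")
    cur.executemany("INSERT INTO src VALUES (?, ?)", source_rows)
    # Step 1: delete warehouse rows whose primary key appears in the source.
    cur.execute("DELETE FROM dw WHERE id IN (SELECT id FROM src)")
    # Step 2: insert every source row into the warehouse.
    cur.execute("INSERT INTO dw (id, val) SELECT id, val FROM src")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw (id INTEGER, val TEXT)")
delete_insert_load(conn, [(1, "a"), (2, "b")])
delete_insert_load(conn, [(2, "B"), (3, "c")])
rows = conn.execute("SELECT id, val FROM dw ORDER BY id").fetchall()
print(rows)
```

Note that, as written in step 1, only keys present in the source are deleted, so a warehouse row whose key is missing from the source (id 1 in the second run) stays in place.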
Three, 0612 algorithm (history zipper algorithm)
1. Delete the warehouse table rows loaded on or after the current load date, so that the job can be re-run
DELETE FROM xxx WHERE start_dt >= $tx_date;
2. Reset the end date in the warehouse table: rows whose end date is on or after the load date, and is not already the maximum date, get their end date set back to the maximum date so that they are valid again
UPDATE xxx SET end_dt = $max_dt WHERE end_dt >= $tx_date AND end_dt <> $max_dt;
3. Create a temporary table to hold the data extracted from the source table
CREATE MULTISET VOLATILE TABLE new;
4. Create a temporary delta table to hold the incremental data
CREATE MULTISET VOLATILE TABLE inc;
5. Load the source table data into the temporary table according to the loading rules, as needed
INSERT INTO new SELECT ... FROM xxx WHERE ...;
6. Compare the temporary table data with the warehouse table data and put the new and changed rows into the delta table
INSERT INTO inc SELECT ... FROM new WHERE ... NOT IN (...);
7. Close the chain for every valid warehouse row that has a match in the delta table
UPDATE xxx SET end_dt = $tx_date WHERE ...;
8. Open a new chain for all data in the delta table
INSERT INTO xxx SELECT ... FROM inc;
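The eight steps above can be sketched end to end as follows. This is an illustration under assumptions, not the original job: SQLite stands in for the warehouse database, TEMP tables stand in for the volatile tables, dates are ISO strings, the maximum date is taken to be 9999-12-31, and the names dw, stg_new, stg_inc, id, val, and zipper_load are hypothetical. The "new or changed" comparison in step 6 is written with NOT EXISTS, which is one possible way to fill in the elided condition.

```python
import sqlite3

MAX_DT = "9999-12-31"  # assumed value marking a currently valid (open) chain

def zipper_load(conn, source_rows, tx_date):
    """History zipper load of a full (id, val) snapshot for tx_date."""
    cur = conn.cursor()
    # Step 1: delete chains opened on or after tx_date (re-run support).
    cur.execute("DELETE FROM dw WHERE start_dt >= ?", (tx_date,))
    # Step 2: re-open chains that were closed on or after tx_date.
    cur.execute("UPDATE dw SET end_dt = ? WHERE end_dt >= ? AND end_dt <> ?",
                (MAX_DT, tx_date, MAX_DT))
    # Steps 3-4: temporary snapshot and delta tables.
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS stg_new (id INTEGER, val TEXT)")
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS stg_inc (id INTEGER, val TEXT)")
    cur.execute("DELETE FROM stg_new")
    cur.execute("DELETE FROM stg_inc")
    # Step 5: load the source snapshot.
    cur.executemany("INSERT INTO stg_new VALUES (?, ?)", source_rows)
    # Step 6: new or changed rows are those without an identical open warehouse row.
    cur.execute("""
        INSERT INTO stg_inc
        SELECT n.id, n.val FROM stg_new n
        WHERE NOT EXISTS (SELECT 1 FROM dw d
                          WHERE d.id = n.id AND d.val = n.val AND d.end_dt = ?)
    """, (MAX_DT,))
    # Step 7: close the open chain of every key present in the delta table.
    cur.execute("UPDATE dw SET end_dt = ? WHERE end_dt = ? AND id IN (SELECT id FROM stg_inc)",
                (tx_date, MAX_DT))
    # Step 8: open a new chain for every delta row.
    cur.execute("INSERT INTO dw SELECT id, val, ?, ? FROM stg_inc", (tx_date, MAX_DT))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw (id INTEGER, val TEXT, start_dt TEXT, end_dt TEXT)")
zipper_load(conn, [(1, "a"), (2, "b")], "2024-01-01")
zipper_load(conn, [(1, "a"), (2, "B"), (3, "c")], "2024-01-02")
zipper_load(conn, [(1, "a"), (2, "B"), (3, "c")], "2024-01-02")  # re-run of the same date
rows = conn.execute("SELECT id, val, start_dt, end_dt FROM dw ORDER BY id, start_dt").fetchall()
for row in rows:
    print(row)
```

The re-run shows why steps 1 and 2 come first: step 1 removes the chains opened on 2024-01-02, step 2 re-opens the chain of id 2 that the first run had closed, and the remaining steps then rebuild the same result.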
Four, 0614 algorithm (history zipper algorithm with deletes)
1. Delete the warehouse table rows loaded on or after the current load date, so that the job can be re-run
DELETE FROM xxx WHERE start_dt >= $tx_date;
2. Reset the end date in the warehouse table: rows whose end date is on or after the load date, and is not already the maximum date, get their end date set back to the maximum date so that they are valid again
UPDATE xxx SET end_dt = $max_dt WHERE end_dt >= $tx_date AND end_dt <> $max_dt;
3. Create a temporary table to hold the data extracted from the source table
CREATE MULTISET VOLATILE TABLE new;
4. Create a temporary delta table to hold the incremental data; rows that the source system has physically deleted are marked in it with min_date
CREATE MULTISET VOLATILE TABLE inc;
5. Load the source table data into the temporary table according to the loading rules, as needed
INSERT INTO new SELECT ... FROM xxx WHERE ...;
6. Compare the temporary table data with the warehouse table data and put the new and changed rows into the delta table
INSERT INTO inc SELECT ... FROM new WHERE ... NOT IN (...);
7. Compare the primary keys of the valid warehouse rows with the primary keys in the temporary table. Keys that exist in the warehouse table but not in the temporary table were physically deleted in the source system; insert those rows into the delta table with end_dt set to min_date as the marker (this data comes from the warehouse table)
INSERT INTO inc SELECT ... FROM xxx WHERE end_dt = $max_date AND etl_job_num = 920 AND (agt_num, agt_modif_num) NOT IN (SELECT agt_num, agt_modif_num FROM new);
8. Close the chain for every valid warehouse row that has a match in the delta table
UPDATE xxx SET end_dt = $tx_date WHERE ...;
9. Open a new chain for the delta table rows whose end_dt marker is not min_date
INSERT INTO xxx SELECT ... FROM inc WHERE end_dt <> $min_date;
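A sketch of how the delete handling in steps 7 to 9 fits into the zipper load, under the same assumptions as a plain history zipper sketch would need: SQLite stands in for the warehouse database, TEMP tables stand in for the volatile tables, and the names dw, stg_new, stg_inc, id, val, and zipper_load_with_deletes are hypothetical. The values chosen for the maximum and minimum dates are also assumptions, and the etl_job_num filter from step 7 is left out for brevity.

```python
import sqlite3

MAX_DT = "9999-12-31"  # assumed marker for a currently valid (open) chain
MIN_DT = "0001-01-01"  # assumed marker for "physically deleted in the source"

def zipper_load_with_deletes(conn, source_rows, tx_date):
    """History zipper load of a full (id, val) snapshot that also closes deleted keys."""
    cur = conn.cursor()
    # Steps 1-2: re-run support (drop chains opened since tx_date, re-open closed ones).
    cur.execute("DELETE FROM dw WHERE start_dt >= ?", (tx_date,))
    cur.execute("UPDATE dw SET end_dt = ? WHERE end_dt >= ? AND end_dt <> ?",
                (MAX_DT, tx_date, MAX_DT))
    # Steps 3-4: temporary snapshot table and delta table (delta carries an end_dt marker).
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS stg_new (id INTEGER, val TEXT)")
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS stg_inc (id INTEGER, val TEXT, end_dt TEXT)")
    cur.execute("DELETE FROM stg_new")
    cur.execute("DELETE FROM stg_inc")
    # Step 5: load the source snapshot.
    cur.executemany("INSERT INTO stg_new VALUES (?, ?)", source_rows)
    # Step 6: new or changed rows go into the delta table, marked with MAX_DT.
    cur.execute("""
        INSERT INTO stg_inc
        SELECT n.id, n.val, ? FROM stg_new n
        WHERE NOT EXISTS (SELECT 1 FROM dw d
                          WHERE d.id = n.id AND d.val = n.val AND d.end_dt = ?)
    """, (MAX_DT, MAX_DT))
    # Step 7: open warehouse rows whose key is missing from the snapshot were deleted
    # at the source; copy them into the delta table, marked with MIN_DT.
    cur.execute("""
        INSERT INTO stg_inc
        SELECT d.id, d.val, ? FROM dw d
        WHERE d.end_dt = ? AND d.id NOT IN (SELECT id FROM stg_new)
    """, (MIN_DT, MAX_DT))
    # Step 8: close the open chain of every key in the delta table.
    cur.execute("UPDATE dw SET end_dt = ? WHERE end_dt = ? AND id IN (SELECT id FROM stg_inc)",
                (tx_date, MAX_DT))
    # Step 9: open new chains only for delta rows that are not delete markers.
    cur.execute("INSERT INTO dw SELECT id, val, ?, end_dt FROM stg_inc WHERE end_dt <> ?",
                (tx_date, MIN_DT))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw (id INTEGER, val TEXT, start_dt TEXT, end_dt TEXT)")
zipper_load_with_deletes(conn, [(1, "a"), (2, "b")], "2024-01-01")
zipper_load_with_deletes(conn, [(1, "A")], "2024-01-02")  # id 2 has disappeared from the source
rows = conn.execute("SELECT id, val, start_dt, end_dt FROM dw ORDER BY id, start_dt").fetchall()
for row in rows:
    print(row)
```

In this run the chain of id 2 is closed on 2024-01-02 and no new chain is opened for it, because its delta row carries the min_date marker.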
Five, 0616 algorithm (economical history zipper algorithm)
1. Set the session's transaction isolation level to RU (read uncommitted) so that the job is not blocked by other transactions
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL RU;
2. Create a temporary table to hold the data extracted from the source table
CREATE MULTISET VOLATILE TABLE new;
3. Create a delta table to hold the incremental data; it stores only new and changed rows
CREATE MULTISET VOLATILE TABLE inc;
4. Create a delete table to hold the tombstone data, which carries a special identifier
CREATE MULTISET VOLATILE TABLE del;
5. Insert data into the temporary table according to the loading rules
INSERT INTO new
6. Compare the temporary table data with the warehouse table data and add the changed rows to the delta table
INSERT INTO inc SELECT ... FROM new
7. Insert the source rows that carry the special identifier (generally end_dt = min_date) into the delete table
INSERT INTO del SELECT ... FROM new WHERE end_dt = $min_date
8. Close the chain for all data in the delta table and the delete table
UPDATE xxx SET end_dt = $tx_date WHERE ...
9. Open new chains for the data in the delta table, except for the specified items
INSERT INTO xxx SELECT ... FROM inc WHERE seq_num <> '
Six, 0613 algorithm (tombstone history zipper algorithm)
0613 is almost the same as 0616; only the final step differs
9. Open new chains for all data in the delta table
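The tombstone variant can be sketched roughly as follows, combining steps 2 to 8 of 0616 with the 0613 final step. Several things here are assumptions rather than facts from the article: SQLite stands in for the warehouse database, the isolation-level step is platform-specific and omitted, source rows are taken to be (id, val, end_dt) with end_dt equal to the minimum date marking a tombstone, and tombstone rows are assumed to go only into the delete table and not into the delta table. The names dw, stg_new, stg_inc, stg_del, and tombstone_zipper_load are hypothetical.

```python
import sqlite3

MAX_DT = "9999-12-31"  # assumed marker for a currently valid (open) chain
MIN_DT = "0001-01-01"  # assumed tombstone marker carried by the source rows

def tombstone_zipper_load(conn, source_rows, tx_date):
    """Zipper load where the source flags deleted rows itself (end_dt == MIN_DT)."""
    cur = conn.cursor()
    # Step 1 (session isolation level) is platform-specific and omitted here.
    # Steps 2-4: temporary snapshot, delta, and delete tables.
    for name in ("stg_new", "stg_inc", "stg_del"):
        cur.execute(f"CREATE TEMP TABLE IF NOT EXISTS {name} (id INTEGER, val TEXT, end_dt TEXT)")
        cur.execute(f"DELETE FROM {name}")
    # Step 5: load the source rows.
    cur.executemany("INSERT INTO stg_new VALUES (?, ?, ?)", source_rows)
    # Step 6: changed or new rows go into the delta table
    # (assumption: tombstone rows are kept out of the delta table).
    cur.execute("""
        INSERT INTO stg_inc
        SELECT n.id, n.val, n.end_dt FROM stg_new n
        WHERE n.end_dt <> ?
          AND NOT EXISTS (SELECT 1 FROM dw d
                          WHERE d.id = n.id AND d.val = n.val AND d.end_dt = ?)
    """, (MIN_DT, MAX_DT))
    # Step 7: rows carrying the tombstone marker go into the delete table.
    cur.execute("INSERT INTO stg_del SELECT id, val, end_dt FROM stg_new WHERE end_dt = ?",
                (MIN_DT,))
    # Step 8: close the open chain of every key in the delta or delete table.
    cur.execute("""
        UPDATE dw SET end_dt = ?
        WHERE end_dt = ?
          AND (id IN (SELECT id FROM stg_inc) OR id IN (SELECT id FROM stg_del))
    """, (tx_date, MAX_DT))
    # Step 9 (0613 form): open a new chain for all data in the delta table.
    cur.execute("INSERT INTO dw SELECT id, val, ?, ? FROM stg_inc", (tx_date, MAX_DT))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw (id INTEGER, val TEXT, start_dt TEXT, end_dt TEXT)")
tombstone_zipper_load(conn, [(1, "a", MAX_DT), (2, "b", MAX_DT)], "2024-01-01")
tombstone_zipper_load(conn, [(1, "A", MAX_DT), (2, "b", MIN_DT)], "2024-01-02")
rows = conn.execute("SELECT id, val, start_dt, end_dt FROM dw ORDER BY id, start_dt").fetchall()
for row in rows:
    print(row)
```

The filter that 0616 applies in its own step 9 is truncated in the text above, so it is not reproduced in this sketch.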
