ETL Zipper Algorithms: A Complete Summary

Source: Internet
Author: User
Tags: volatile

Summary of the zipper algorithms:
One, 0610 algorithm (append)
1. Delete from the warehouse table the rows whose load date is on or after the current load date, so that the job can be re-run
DELETE FROM xxx WHERE start_dt >= $tx_date;
2. Create a temporary table to hold the data extracted from the source table
CREATE MULTISET VOLATILE TABLE xxx;
3. Insert data into the temporary table, transforming it according to the required rules
INSERT INTO xxx SELECT ... FROM xxx;
4. Stamp the data in the temporary table with the load date and insert it directly into the warehouse table
INSERT INTO xxx SELECT ... FROM xxx;
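The four steps above can be sketched in runnable form. This is only an illustration under assumptions: SQLite (through Python's sqlite3 module) stands in for Teradata, a TEMP table stands in for the volatile table, and the names src_txn, dw_txn, stg, load_append and demo_append are hypothetical, not taken from the original.

```python
import sqlite3

def load_append(conn, tx_date):
    """Append-style load (0610 sketch) for one load date."""
    cur = conn.cursor()
    # Step 1: drop rows already loaded for this date or later, so a re-run cannot duplicate them.
    cur.execute("DELETE FROM dw_txn WHERE start_dt >= ?", (tx_date,))
    # Step 2: staging table (assumed stand-in for a Teradata volatile table).
    cur.execute("DROP TABLE IF EXISTS temp.stg")
    cur.execute("CREATE TEMP TABLE stg (id INTEGER, amount REAL)")
    # Step 3: extract the day's rows from the source; real jobs apply their own rules here.
    cur.execute(
        "INSERT INTO stg SELECT id, amount FROM src_txn WHERE txn_dt = ?", (tx_date,)
    )
    # Step 4: stamp the staged rows with the load date and append them to the warehouse.
    cur.execute(
        "INSERT INTO dw_txn (id, amount, start_dt) SELECT id, amount, ? FROM stg",
        (tx_date,),
    )
    conn.commit()

def demo_append():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src_txn (id INTEGER, amount REAL, txn_dt TEXT);
        CREATE TABLE dw_txn (id INTEGER, amount REAL, start_dt TEXT);
        INSERT INTO src_txn VALUES (1, 10.0, '2024-01-01'), (2, 20.0, '2024-01-02');
    """)
    load_append(conn, "2024-01-02")
    load_append(conn, "2024-01-02")  # re-run of the same load date
    return conn.execute("SELECT id, amount, start_dt FROM dw_txn").fetchall()
```

Calling demo_append() runs the same load date twice and still leaves a single warehouse row, which is the point of the delete in step 1.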
Two, 0611 algorithm (full delete, full insert)
1. Delete from the warehouse table every record whose primary key appears in the source table
DELETE FROM xxx WHERE (id) IN (SELECT id FROM xxx);
2. Insert all data from the source table directly into the warehouse table
INSERT INTO xxx SELECT ... FROM xxx;
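A minimal sketch of the delete-then-insert pattern follows, again using SQLite through Python's sqlite3 module as an assumed stand-in for Teradata. The tables src_cust and dw_cust and both function names are hypothetical.

```python
import sqlite3

def load_delete_insert(conn):
    """Full delete, full insert (0611 sketch)."""
    cur = conn.cursor()
    # Step 1: remove warehouse rows whose primary key is present in the source.
    cur.execute("DELETE FROM dw_cust WHERE id IN (SELECT id FROM src_cust)")
    # Step 2: insert every source row; no key collision is possible after step 1.
    cur.execute("INSERT INTO dw_cust (id, name) SELECT id, name FROM src_cust")
    conn.commit()

def demo_delete_insert():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src_cust (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dw_cust (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO dw_cust VALUES (1, 'old name'), (3, 'not in source');
        INSERT INTO src_cust VALUES (1, 'new name'), (2, 'brand new');
    """)
    load_delete_insert(conn)
    return conn.execute("SELECT id, name FROM dw_cust ORDER BY id").fetchall()
```

In this demo, key 1 is overwritten, key 2 is added, and key 3, which the source does not mention, is left untouched. No history is kept, which is what distinguishes this algorithm from the zipper variants.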
Three, 0612 algorithm (history zipper algorithm)
1. Delete from the warehouse table the rows whose load date is on or after the current load date, so that the job can be re-run
DELETE FROM xxx WHERE start_dt >= $tx_date;
2. Reset the end date in the warehouse table: for rows whose end date is on or after the load date but is not the maximum date, set the end date back to the maximum date, so that those rows become valid (open) again
UPDATE xxx SET end_dt = $max_dt WHERE end_dt >= $tx_date AND end_dt <> $max_dt;
3. Create a temporary table to hold the data extracted from the source table
CREATE MULTISET VOLATILE TABLE new;
4. Create a temporary delta table to hold the incremental data
CREATE MULTISET VOLATILE TABLE inc;
5. Load the source table data into the temporary table according to the required rules
INSERT INTO new SELECT ... FROM xxx WHERE ...;
6. Compare the temporary table with the warehouse table and put the new and changed rows into the delta table
INSERT INTO inc SELECT ... FROM new WHERE (...) NOT IN (...);
7. Close the chain: for every valid (open) warehouse row that also appears in the delta table, set the end date to the load date
UPDATE xxx SET end_dt = $tx_date WHERE ...;
8. Open new chains: insert all rows of the delta table into the warehouse table
INSERT INTO xxx SELECT ... FROM inc;
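The eight steps above can be sketched end to end. This is an illustration under assumptions, not the original job: SQLite (via Python's sqlite3 module) replaces Teradata, TEMP tables replace volatile tables, '9999-12-31' is an assumed maximum date, change detection compares a single val column, and the names src, dw, stg_new, stg_inc, load_zipper and demo_zipper are hypothetical.

```python
import sqlite3

MAX_DT = "9999-12-31"  # assumed sentinel meaning "chain still open"

def load_zipper(conn, tx_date):
    """History zipper load (0612 sketch) for one load date."""
    cur = conn.cursor()
    # Step 1: remove chains opened on or after this load date (re-run support).
    cur.execute("DELETE FROM dw WHERE start_dt >= ?", (tx_date,))
    # Step 2: reopen chains that an earlier run for this date had closed.
    cur.execute(
        "UPDATE dw SET end_dt = ? WHERE end_dt >= ? AND end_dt <> ?",
        (MAX_DT, tx_date, MAX_DT),
    )
    # Steps 3-4: staging tables for the snapshot and the delta.
    cur.execute("DROP TABLE IF EXISTS temp.stg_new")
    cur.execute("DROP TABLE IF EXISTS temp.stg_inc")
    cur.execute("CREATE TEMP TABLE stg_new (id INTEGER, val TEXT)")
    cur.execute("CREATE TEMP TABLE stg_inc (id INTEGER, val TEXT)")
    # Step 5: load the source snapshot.
    cur.execute("INSERT INTO stg_new SELECT id, val FROM src")
    # Step 6: delta = snapshot rows with no identical open row in the warehouse.
    cur.execute(
        """INSERT INTO stg_inc
           SELECT n.id, n.val FROM stg_new n
           WHERE NOT EXISTS (SELECT 1 FROM dw d
                             WHERE d.id = n.id AND d.val = n.val AND d.end_dt = ?)""",
        (MAX_DT,),
    )
    # Step 7: close the open chain of every key that appears in the delta.
    cur.execute(
        "UPDATE dw SET end_dt = ? WHERE end_dt = ? AND id IN (SELECT id FROM stg_inc)",
        (tx_date, MAX_DT),
    )
    # Step 8: open a new chain for every delta row.
    cur.execute(
        "INSERT INTO dw (id, val, start_dt, end_dt) SELECT id, val, ?, ? FROM stg_inc",
        (tx_date, MAX_DT),
    )
    conn.commit()

def demo_zipper():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src (id INTEGER, val TEXT);
        CREATE TABLE dw (id INTEGER, val TEXT, start_dt TEXT, end_dt TEXT);
        INSERT INTO dw VALUES (1, 'a', '2024-01-01', '9999-12-31'),
                              (2, 'b', '2024-01-01', '9999-12-31');
        INSERT INTO src VALUES (1, 'a'), (2, 'B'), (3, 'c');
    """)
    load_zipper(conn, "2024-01-02")
    load_zipper(conn, "2024-01-02")  # re-run must give the same result
    return conn.execute(
        "SELECT id, val, start_dt, end_dt FROM dw ORDER BY id, start_dt"
    ).fetchall()
```

In the demo, key 1 is unchanged and keeps its open row, key 2 has its old row closed on the load date and a new open row added, and key 3 gets its first row. Running the load twice gives the same table because steps 1 and 2 undo the previous run first.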
Four, 0614 (history zipper algorithm with deletes)
1. Delete from the warehouse table the rows whose load date is on or after the current load date, so that the job can be re-run
DELETE FROM xxx WHERE start_dt >= $tx_date;
2. Reset the end date in the warehouse table: for rows whose end date is on or after the load date but is not the maximum date, set the end date back to the maximum date, so that those rows become valid (open) again
UPDATE xxx SET end_dt = $max_dt WHERE end_dt >= $tx_date AND end_dt <> $max_dt;
3. Create a temporary table to hold the data extracted from the source table
CREATE MULTISET VOLATILE TABLE new;
4. Create a temporary delta table to hold the incremental data; rows that the source system has physically deleted are marked in it with min_date
CREATE MULTISET VOLATILE TABLE inc;
5. Load the source table data into the temporary table according to the required rules
INSERT INTO new SELECT ... FROM xxx WHERE ...;
6. Compare the temporary table with the warehouse table and store the new and changed rows in the delta table
INSERT INTO inc SELECT ... FROM new WHERE (...) NOT IN (...);
7. Compare the primary keys of the valid warehouse rows with the primary keys in the temporary table. Keys that are in the warehouse table but not in the temporary table belong to records the source system has physically deleted. Store these rows in the delta table with end_dt set to min_date as the delete marker (this data comes from the warehouse table)
INSERT INTO inc SELECT ... FROM xxx WHERE end_dt = $max_date AND etl_job_num = 920 AND (agt_num, agt_modif_num) NOT IN (SELECT agt_num, agt_modif_num FROM new);
8. Close the chain: for every valid (open) warehouse row that also appears in the delta table, set the end date to the load date
UPDATE xxx SET end_dt = $tx_date WHERE ...;
9. Open new chains for all rows in the delta table whose end_dt is not the min_date marker
INSERT INTO xxx SELECT ... FROM inc WHERE end_dt <> $min_date;
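The delete handling in steps 7 to 9 is easier to follow in a runnable sketch. The code below is an illustration under assumptions: SQLite (via Python's sqlite3 module) replaces Teradata, '9999-12-31' and '0001-01-01' are assumed values for the maximum and minimum dates, a single id column replaces the composite key, the etl_job_num filter is left out, and all table and function names are hypothetical.

```python
import sqlite3

MAX_DT = "9999-12-31"  # assumed sentinel for an open chain
MIN_DT = "0001-01-01"  # assumed marker for "physically deleted at the source"

def load_zipper_with_deletes(conn, tx_date):
    """History zipper load with physical-delete detection (0614 sketch)."""
    cur = conn.cursor()
    # Steps 1-2: undo any earlier run for this load date.
    cur.execute("DELETE FROM dw WHERE start_dt >= ?", (tx_date,))
    cur.execute(
        "UPDATE dw SET end_dt = ? WHERE end_dt >= ? AND end_dt <> ?",
        (MAX_DT, tx_date, MAX_DT),
    )
    # Steps 3-4: staging tables; the delta carries an end_dt column for the marker.
    cur.execute("DROP TABLE IF EXISTS temp.stg_new")
    cur.execute("DROP TABLE IF EXISTS temp.stg_inc")
    cur.execute("CREATE TEMP TABLE stg_new (id INTEGER, val TEXT)")
    cur.execute("CREATE TEMP TABLE stg_inc (id INTEGER, val TEXT, end_dt TEXT)")
    # Step 5: load the source snapshot.
    cur.execute("INSERT INTO stg_new SELECT id, val FROM src")
    # Step 6: new and changed rows go to the delta as ordinary rows.
    cur.execute(
        """INSERT INTO stg_inc
           SELECT n.id, n.val, ? FROM stg_new n
           WHERE NOT EXISTS (SELECT 1 FROM dw d
                             WHERE d.id = n.id AND d.val = n.val AND d.end_dt = ?)""",
        (MAX_DT, MAX_DT),
    )
    # Step 7: open warehouse keys missing from the snapshot were deleted at the source;
    # copy them from the warehouse into the delta, flagged with MIN_DT.
    cur.execute(
        """INSERT INTO stg_inc
           SELECT d.id, d.val, ? FROM dw d
           WHERE d.end_dt = ? AND d.id NOT IN (SELECT id FROM stg_new)""",
        (MIN_DT, MAX_DT),
    )
    # Step 8: close the open chain of every key in the delta, deleted keys included.
    cur.execute(
        "UPDATE dw SET end_dt = ? WHERE end_dt = ? AND id IN (SELECT id FROM stg_inc)",
        (tx_date, MAX_DT),
    )
    # Step 9: open new chains only for rows that are not delete markers.
    cur.execute(
        """INSERT INTO dw (id, val, start_dt, end_dt)
           SELECT id, val, ?, ? FROM stg_inc WHERE end_dt <> ?""",
        (tx_date, MAX_DT, MIN_DT),
    )
    conn.commit()

def demo_zipper_with_deletes():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src (id INTEGER, val TEXT);
        CREATE TABLE dw (id INTEGER, val TEXT, start_dt TEXT, end_dt TEXT);
        INSERT INTO dw VALUES (1, 'a', '2024-01-01', '9999-12-31'),
                              (2, 'b', '2024-01-01', '9999-12-31');
        INSERT INTO src VALUES (1, 'A');
    """)
    load_zipper_with_deletes(conn, "2024-01-02")
    load_zipper_with_deletes(conn, "2024-01-02")  # re-run
    return conn.execute(
        "SELECT id, val, start_dt, end_dt FROM dw ORDER BY id, start_dt"
    ).fetchall()
```

In the demo, key 2 has disappeared from the source, so its chain is closed on the load date and no new row is opened for it, while key 1 is closed and reopened with its changed value.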
Five, 0616 (economical history zipper algorithm)
1. Set the transaction isolation level to RU (read uncommitted), so that this job's reads are not blocked by other transactions
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL RU;
2. Create a temporary table to hold the data extracted from the source table
CREATE MULTISET VOLATILE TABLE new;
3. Create a delta table to hold the incremental data; only new and changed rows are stored here
CREATE MULTISET VOLATILE TABLE inc;
4. Create a delete table to hold logically deleted (tombstone) rows, which carry a special delete marker
CREATE MULTISET VOLATILE TABLE del;
5. Insert data into the temporary table according to the required loading rules
INSERT INTO new ...;
6. Compare the temporary table with the warehouse table and put the new and changed rows into the delta table
INSERT INTO inc SELECT ... FROM new ...;
7. Put the source rows that carry the special delete marker (generally end_dt = min_date) into the delete table
INSERT INTO del SELECT ... FROM new WHERE end_dt = $min_date;
8. Close the chain for every warehouse row that appears in the delta table or in the delete table
UPDATE xxx SET end_dt = $tx_date WHERE ...;
9. Open new chains for all rows in the delta table except the specified items
INSERT INTO xxx SELECT ... FROM inc WHERE seq_num <> '
Six, 0613 (tombstone history zipper algorithm)
0613 is almost the same as 0616; only the final step differs
9. Open new chains for all rows in the delta table
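Because 0616 and 0613 differ only in step 9, one sketch can cover both. It rests on several assumptions: SQLite (via Python's sqlite3 module) replaces Teradata, so the RU isolation setting of step 1 is omitted; the source marks logical deletes with end_dt = '0001-01-01'; tombstone rows are kept out of the delta table; and, since the value compared against seq_num is cut off in the source, the sketch takes it as a hypothetical parameter skip_seq_num. All table and function names are hypothetical as well.

```python
import sqlite3

MAX_DT = "9999-12-31"  # assumed sentinel for an open chain
MIN_DT = "0001-01-01"  # assumed tombstone marker delivered by the source

def load_tombstone_zipper(conn, tx_date, skip_seq_num=None):
    """Steps 2-9. skip_seq_num=None follows 0613; a value follows 0616."""
    cur = conn.cursor()
    # Steps 2-4: staging tables for the snapshot, the delta and the tombstones.
    for name in ("stg_new", "stg_inc", "stg_del"):
        cur.execute(f"DROP TABLE IF EXISTS temp.{name}")
        cur.execute(
            f"CREATE TEMP TABLE {name} (id INTEGER, val TEXT, seq_num TEXT, end_dt TEXT)"
        )
    # Step 5: load the source snapshot.
    cur.execute("INSERT INTO stg_new SELECT id, val, seq_num, end_dt FROM src")
    # Step 6: new or changed rows (assumption: tombstones are excluded here).
    cur.execute(
        """INSERT INTO stg_inc
           SELECT n.id, n.val, n.seq_num, n.end_dt FROM stg_new n
           WHERE n.end_dt <> ?
             AND NOT EXISTS (SELECT 1 FROM dw d
                             WHERE d.id = n.id AND d.val = n.val AND d.end_dt = ?)""",
        (MIN_DT, MAX_DT),
    )
    # Step 7: rows carrying the delete marker go to the delete table.
    cur.execute(
        "INSERT INTO stg_del SELECT id, val, seq_num, end_dt FROM stg_new WHERE end_dt = ?",
        (MIN_DT,),
    )
    # Step 8: close open chains for keys in the delta or the delete table.
    cur.execute(
        """UPDATE dw SET end_dt = ?
           WHERE end_dt = ?
             AND (id IN (SELECT id FROM stg_inc) OR id IN (SELECT id FROM stg_del))""",
        (tx_date, MAX_DT),
    )
    # Step 9: open new chains, for all delta rows (0613) or all but the skipped ones (0616).
    insert_sql = """INSERT INTO dw (id, val, seq_num, start_dt, end_dt)
                    SELECT id, val, seq_num, ?, ? FROM stg_inc"""
    if skip_seq_num is None:
        cur.execute(insert_sql, (tx_date, MAX_DT))
    else:
        cur.execute(insert_sql + " WHERE seq_num <> ?", (tx_date, MAX_DT, skip_seq_num))
    conn.commit()

def demo_tombstone(skip_seq_num=None):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src (id INTEGER, val TEXT, seq_num TEXT, end_dt TEXT);
        CREATE TABLE dw (id INTEGER, val TEXT, seq_num TEXT, start_dt TEXT, end_dt TEXT);
        INSERT INTO dw VALUES (1, 'a', 's1', '2024-01-01', '9999-12-31'),
                              (2, 'b', 's1', '2024-01-01', '9999-12-31'),
                              (3, 'c', 's1', '2024-01-01', '9999-12-31');
        INSERT INTO src VALUES (1, 'A', 's1', '9999-12-31'),
                               (2, 'b', 's1', '0001-01-01'),
                               (3, 'C', 's9', '9999-12-31');
    """)
    load_tombstone_zipper(conn, "2024-01-02", skip_seq_num)
    return conn.execute(
        "SELECT id, val FROM dw WHERE end_dt = ? ORDER BY id", (MAX_DT,)
    ).fetchall()
```

With no skip value (the 0613 behaviour) the demo leaves open rows for keys 1 and 3; with skip_seq_num='s9' (the 0616 behaviour) only key 1 stays open. Key 2, the tombstone, is closed in both cases. Unlike 0612 and 0614, this flow has no delete-and-reopen steps at the start, which is presumably what makes it "economical".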
