Processing of deleted data in ODI

Source: Internet
Author: User
Tags: ibm db2

I. Prerequisite knowledge
When data is extracted from a source database into a data warehouse, one of the following approaches is generally used:

    • Full extraction mode
      If the table's data volume is small, you can extract the full table every time, using truncate/insert to load the data.
    • Timestamp-based extraction mode
      If rows in the source table are never updated (for example, most transactional data) or never physically deleted (history is merely flagged as invalid), synchronize the most recently changed rows based on a change timestamp column.
    • Log analysis
      If the timestamp column is not reliably maintained, or if source rows are physically deleted, analyze the database logs to synchronize the latest changes.
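As an illustration of the timestamp-based mode, an incremental pull typically keeps the high-water mark of the last successful run and extracts only newer rows. The table and column names below are hypothetical, not from the article:

```sql
-- Illustrative incremental extraction: rows changed since the last load.
-- :last_extract_date is the high-water mark saved by the previous run;
-- after a successful load it is advanced to the current run's start time.
SELECT *
  FROM src.orders
 WHERE last_update_date > :last_extract_date;
```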

Note: with the timestamp approach, handling deleted data requires creating a trigger on the source table to capture the deleted records.
ODI provides CDC (Changed Data Capture) functionality in two forms. One adds triggers on the source tables to capture new, modified, and deleted rows into a journal (log) table; the other mines the database logs (Oracle's LogMiner, IBM DB2/400 journals). In either case, ODI CDC requires the source table to have a primary key.
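To illustrate the trigger-based form, a journalizing trigger similar in spirit to the ones ODI generates might look like the sketch below. The table and column names are hypothetical, and ODI's actual J$ triggers differ in detail:

```sql
-- Hypothetical journal table: records the primary key of each changed row.
CREATE TABLE j$orders (
  jrn_flag     VARCHAR2(1),              -- 'I' = insert/update, 'D' = delete
  jrn_date     DATE,
  jrn_consumed VARCHAR2(1) DEFAULT '0',  -- '1' once a subscriber consumes it
  order_id     NUMBER                    -- primary key of the source table
);

CREATE OR REPLACE TRIGGER trg_orders_jrn
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW
BEGIN
  IF DELETING THEN
    INSERT INTO j$orders (jrn_flag, jrn_date, order_id)
    VALUES ('D', SYSDATE, :OLD.order_id);
  ELSE
    INSERT INTO j$orders (jrn_flag, jrn_date, order_id)
    VALUES ('I', SYSDATE, :NEW.order_id);
  END IF;
END;
/
```

Only the primary key is journalized, which is why CDC requires one: the changed row's remaining columns are fetched later by joining back to the source table.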
II. How the deleted-data problem arose
When we first used ODI for data synchronization in CDC mode, everything was fine: deleted data was correctly synchronized to the target database. Then, because of a business requirement, we only needed to synchronize part of the data, so a filter was added on the source table. From then on, new and modified rows still synchronized correctly, but rows deleted from the source table remained in the target table after the interface ran.
On investigation, the interface loads data by querying the journal (log) view, which outer-joins the journal table to the source table. For a deleted record, only the primary key is present in the view; all other columns are NULL. Any filter applied on top of the journal view therefore inevitably filters the deleted rows out. Below are examples of the objects ODI creates:

Data source journal view JV$QP_LIST_LINES:

CREATE OR REPLACE VIEW soau.jv$qp_list_lines AS
SELECT DECODE(targ.ROWID, NULL, 'D', 'I') jrn_flag,
       jrn.jrn_subscriber jrn_subscriber,
       jrn.jrn_date jrn_date,
       jrn.list_line_id list_line_id,
       targ.creation_date creation_date,
       targ.created_by created_by,
       ...
  FROM (SELECT l.jrn_subscriber jrn_subscriber,
               l.list_line_id list_line_id,
               MAX(l.jrn_date) jrn_date
          FROM soau.j$qp_list_lines l
         WHERE l.jrn_consumed = '1'
         GROUP BY l.jrn_subscriber, l.list_line_id) jrn,
       soau.qp_list_lines targ
 WHERE jrn.list_line_id = targ.list_line_id (+)
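The crux is visible in this view: for a deleted row, the outer join to soau.qp_list_lines finds no match, so every targ.* column is NULL, and any business predicate on those columns can never be true. An illustrative query (not from the article) makes the SQL NULL semantics explicit:

```sql
-- For a deleted row, targ.list_line_type_code is NULL in the view, so this
-- predicate evaluates to UNKNOWN (not TRUE) and the row is filtered out.
SELECT *
  FROM soau.jv$qp_list_lines
 WHERE list_line_type_code IN ('PLL', 'PBH');
```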

Temporary view created by the ODI interface for loading data:

CREATE OR REPLACE VIEW soau.c$_0qp_list_lines
(
    C1_LIST_LINE_ID,
    C2_CREATION_DATE,
    C3_CREATED_BY,
    ...
    JRN_SUBSCRIBER,
    JRN_FLAG,
    JRN_DATE
)
AS SELECT
    qp_list_lines.list_line_id,
    qp_list_lines.creation_date,
    qp_list_lines.created_by,
    ...
    jrn_subscriber,
    jrn_flag,
    jrn_date
FROM soau.jv$qp_list_lines qp_list_lines
WHERE (1=1)
  AND (qp_list_lines.list_line_type_code IN ('PLL', 'PBH'))
  AND (qp_list_lines.end_date_active IS NULL
       OR TRUNC(qp_list_lines.end_date_active) > TRUNC(SYSDATE))
  AND (qp_list_lines.pricing_phase_id = 1)
  AND (qp_list_lines.qualification_ind IN (4, 6, 20, 22))
  AND jrn_subscriber = 'ERP-FK' /* AND jrn_date < sysdate */

When the interface performs the integration step, data is inserted from the view soau.c$_0qp_list_lines into the flow table (the intermediate table the interface processes on the target side, from which data is finally synchronized to the target table). Since the view has already filtered out the deleted rows, they never reach the flow table and so are never applied to the target.
III. The problem-solving process
Since the data was being filtered away at the source, the key was to make sure it was not filtered there. An ODI interface can apply filters at the source, the staging area, or the target, so we ran a series of tests:
1. Move the data filtering to staging
Testing showed that the deleted records were indeed included when the data was loaded, but during the integration phase, when the data was inserted into the flow table, every column except the primary key was NULL for deleted rows, so the filter removed them once again.
2. Move the data filtering to the target
Testing showed that with the filter moved to the target, the deleted data was correctly synchronized to the target table. So is moving the filter to the target enough to solve the problem?
3. A new problem appears
With the filter on the target, no filtering is performed when data is inserted into the flow table or moved from the flow table to the target table, so all changed data is synchronized to the target. The rows we needed to exclude were also written to the target table, so this approach failed as well.
4. Enable filtering on the target table in the model
In ODI Designer, we edited the target datastore under the model, added the filter there, removed the filter on the source data from the interface, and ran the interface: the data was inserted into the target table correctly. However, like the previous approach, this pulls all changed data from the source to the target before filtering, which raises performance concerns.
5. The final approach
After further research, we still felt that modifying the original LKM was the most practical solution. Only the step that creates the load view on top of the journal view needs to change: the view code is modified to UNION in the deleted records.
The modified "create view on source" code is as follows:

create or replace view <%=odiRef.getObjectName("L", "", "W")%><%=odiRef.getInfo("COLL_NAME")%>
(
    <%=odiRef.getColList("", "[CX_COL_NAME]", ",\n\t", "", "")%>
)
as select <%=odiRef.getPop("DISTINCT_ROWS")%>
    <%=odiRef.getColList("", "[EXPRESSION]", ",\n\t", "", "")%>
from <%=odiRef.getFrom()%>
where (1=1)
<%=odiRef.getFilter()%>
<%=odiRef.getJrnFilter()%>
<%=odiRef.getJoin()%>
<%=odiRef.getGrpBy()%>
<%=odiRef.getHaving()%>

<% if (!odiRef.getJrnFilter().equals("")) { %>

UNION

select <%=odiRef.getPop("DISTINCT_ROWS")%>
    <%=odiRef.getColList("", "[EXPRESSION]", ",\n\t", "", "")%>
from <%=odiRef.getFrom()%>
where (1=1)
<%=odiRef.getJrnFilter()%>
and JRN_FLAG = 'D'
<%=odiRef.getJoin()%>
<%=odiRef.getGrpBy()%>
<%=odiRef.getHaving()%>

<% } %>
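For the example interface above, the modified template would generate a view along these lines (a sketch with column lists abbreviated, not the exact generated code): the first branch applies the business filters as before, and the UNION branch brings the deleted rows back, restricted only by the journal filter and the 'D' flag.

```sql
CREATE OR REPLACE VIEW soau.c$_0qp_list_lines (...) AS
SELECT ...  -- mapped columns, as before
FROM soau.jv$qp_list_lines qp_list_lines
WHERE (1=1)
  AND (qp_list_lines.list_line_type_code IN ('PLL', 'PBH'))
  AND (qp_list_lines.pricing_phase_id = 1)
  AND jrn_subscriber = 'ERP-FK'
UNION
-- Deleted rows: only the journal filter applies, since all other columns
-- are NULL and would fail any business predicate.
SELECT ...
FROM soau.jv$qp_list_lines qp_list_lines
WHERE (1=1)
  AND jrn_subscriber = 'ERP-FK'
  AND jrn_flag = 'D'
```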
IV. Postscript
After the problem was solved, a TAR (service request) was submitted to Oracle. After several rounds of communication, Oracle finally confirmed it as a KM bug during an OWC demo.
Oracle also suggested a workaround that avoids modifying the LKM, though it seems somewhat less elegant: append "OR JRN_FLAG = 'D'" to each filter condition in the interface. This guarantees that records marked as deleted in the journal table are never filtered out.
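Applied to the example filters, Oracle's workaround would look roughly like this (a sketch of the resulting predicates, not generated code):

```sql
-- Each business filter is widened so deleted journal rows always pass.
  AND (qp_list_lines.list_line_type_code IN ('PLL', 'PBH') OR jrn_flag = 'D')
  AND (qp_list_lines.pricing_phase_id = 1 OR jrn_flag = 'D')
  AND (qp_list_lines.qualification_ind IN (4, 6, 20, 22) OR jrn_flag = 'D')
```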

