How to use Informatica to implement incremental extraction of tables

Source: Internet
Author: User
The data loading strategy discussed in this article is the general strategy used when an OLTP system is the source system and ETL loads its data into an OLAP system.
Depending on the specifics of the environment, ETL data loading generally takes one of the following four approaches:
1. Timestamp method
A time field is added as a timestamp to every business table in the OLTP system (tables that already have a suitable time field do not need a new one), and the timestamp value is modified whenever business data in the OLTP system is updated. When the ETL loads, it decides which data to extract by comparing the system time with the timestamp field.
Advantages: The ETL design is clear, and source data extraction is relatively clear, simple, and fast. Incremental loading of data can be achieved.
Disadvantages: Timestamp maintenance has to be done by the OLTP system. The business table structures in the original OLTP system must be modified, and for every table that gains a timestamp, the business system must update the timestamp field whenever the data changes. The business operation programs of the original OLTP system therefore have to be modified, which means a heavy workload, wide-ranging changes, and high risk.
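The timestamp comparison can be illustrated with a minimal sketch. It is not Informatica-specific: it uses Python's built-in sqlite3 module, and the table `orders`, the column `last_modified`, and the idea of keeping the previous load time as a watermark are all assumptions made for illustration.

```python
import sqlite3

def extract_changed_rows(conn, last_load_time):
    """Return rows whose timestamp is later than the previous load time."""
    cur = conn.execute(
        "SELECT order_id, amount, last_modified FROM orders "
        "WHERE last_modified > ? ORDER BY last_modified",
        (last_load_time,),
    )
    return cur.fetchall()

def demo_timestamp_extract():
    # Hypothetical OLTP table with a timestamp column maintained by the application.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, amount REAL, last_modified TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, 10.0, "2024-01-01 08:00:00"),
            (2, 20.0, "2024-01-02 09:30:00"),
            (3, 30.0, "2024-01-03 11:15:00"),
        ],
    )
    # Assumed watermark: the time at which the previous ETL run finished.
    return extract_changed_rows(conn, "2024-01-01 23:59:59")
```

Timestamps are stored here as ISO-formatted text, so string comparison matches chronological order; a real source system would normally use a native date/time column.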
2. Log table method
A system log table is added to the OLTP system and is maintained whenever business data changes. When the ETL loads, it reads the log table to determine which data to load and how to load it.
Advantages: The OLTP business table structures do not need to be modified, and source data extraction is clear and fairly fast. Incremental loading of data can be achieved.
Disadvantages: Log table maintenance has to be done by the OLTP system, and the OLTP business operation programs must be modified to record the log information. Maintaining the log table is troublesome and has a large impact on the original system. The workload is heavy, the changes are extensive, and there is some risk.
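As a rough sketch of how a load driven by a log table might look, the following Python example replays unprocessed log entries against a target table. The log layout (`change_log` with `pk`, `operation`, and `processed` columns), the operation codes `I`/`U`/`D`, and the table names `customer` and `customer_dim` are hypothetical; the article does not specify a log format.

```python
import sqlite3

def apply_change_log(src, tgt):
    """Replay unprocessed log entries against the target, in log order."""
    entries = src.execute(
        "SELECT log_id, pk, operation FROM change_log "
        "WHERE processed = 0 ORDER BY log_id"
    ).fetchall()
    for log_id, pk, operation in entries:
        if operation == "D":
            tgt.execute("DELETE FROM customer_dim WHERE customer_id = ?", (pk,))
        else:  # "I" (insert) or "U" (update): copy the current source row
            row = src.execute(
                "SELECT customer_id, name FROM customer WHERE customer_id = ?",
                (pk,),
            ).fetchone()
            if row is not None:
                tgt.execute(
                    "INSERT OR REPLACE INTO customer_dim (customer_id, name) "
                    "VALUES (?, ?)",
                    row,
                )
        # Mark the entry so the next run does not replay it.
        src.execute("UPDATE change_log SET processed = 1 WHERE log_id = ?", (log_id,))
    src.commit()
    tgt.commit()
    return len(entries)

def demo_log_replay():
    src = sqlite3.connect(":memory:")
    tgt = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
    src.execute(
        "CREATE TABLE change_log (log_id INTEGER PRIMARY KEY, pk INTEGER, "
        "operation TEXT, processed INTEGER DEFAULT 0)"
    )
    tgt.execute("CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, name TEXT)")
    # Target state after the previous load.
    tgt.executemany("INSERT INTO customer_dim VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
    # Source state now: row 1 renamed, row 2 deleted, row 3 added.
    src.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Alice B"), (3, "Carol")])
    src.executemany(
        "INSERT INTO change_log (pk, operation) VALUES (?, ?)",
        [(1, "U"), (2, "D"), (3, "I")],
    )
    first = apply_change_log(src, tgt)
    second = apply_change_log(src, tgt)  # nothing left to replay
    rows = tgt.execute("SELECT * FROM customer_dim ORDER BY customer_id").fetchall()
    return first, second, rows
```

Because the log records the operation type, deletions in the source can be propagated to the target, which the timestamp approach alone cannot do.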

3. Full table comparison method
During the ETL process, all source data is extracted and transformed according to the corresponding rules, and each row is compared against the target table before it is inserted or updated. Whether to insert or update is decided by the primary key value: if the primary key value already exists in the target table, the record already exists, so the remaining fields are compared and an update is performed if any of them differ; if the primary key value does not exist in the target table, the record does not exist yet and an insert is performed.
Advantages: There is no impact on the table structures of the existing system, and the business operation programs do not need to be modified. All extraction rules are handled by the ETL, so management and maintenance are unified, and incremental loading of data can be achieved. There is no risk to the existing system.
Disadvantages: The ETL comparison logic is more complex, the design is more complex, and loading is slow.
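The insert-or-update decision described above can be sketched in a few lines of Python. Rows are modeled as dictionaries and the function name `full_table_compare` is made up for this illustration; an ETL tool would normally express the same logic as a lookup followed by an update strategy.

```python
def full_table_compare(source_rows, target_rows, key):
    """Classify source rows as inserts or updates by comparing with the target."""
    target_by_key = {row[key]: row for row in target_rows}
    inserts, updates = [], []
    for row in source_rows:
        existing = target_by_key.get(row[key])
        if existing is None:
            # Primary key not in the target yet: insert.
            inserts.append(row)
        elif existing != row:
            # Primary key exists but some other field differs: update.
            updates.append(row)
        # Otherwise the row is unchanged and nothing is done.
    return inserts, updates

source = [
    {"id": 1, "name": "Alice", "city": "Paris"},
    {"id": 2, "name": "Bob", "city": "Berlin"},
    {"id": 3, "name": "Carol", "city": "Rome"},
]
target = [
    {"id": 1, "name": "Alice", "city": "Paris"},
    {"id": 2, "name": "Bob", "city": "Munich"},
]
inserts, updates = full_table_compare(source, target, key="id")
```

In this example row 3 is classified as an insert, row 2 as an update, and row 1 is left alone. Every source row is read and compared on every run, which is why this method is slower than the timestamp or log table methods.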
4. Full table delete-and-insert method
Each ETL run deletes the data in the target table and then reloads all the data.
Advantages: The ETL loading rules are simple, and loading is fast.
Disadvantages: It is not suitable for dimension tables that use surrogate keys. When the OLTP system deletes data, the OLAP layer keeps no record of the deleted historical data. Incremental loading of data cannot be achieved.
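A minimal sketch of a delete-and-insert load, again using Python's sqlite3 module, might look as follows. The table `sales_summary` and its columns are assumptions for illustration; wrapping both steps in one transaction is a design choice of this sketch, not something the article prescribes.

```python
import sqlite3

def full_reload(tgt, source_rows):
    """Empty the target table and reload it completely in one transaction."""
    with tgt:  # commits on success, rolls back if an exception is raised
        tgt.execute("DELETE FROM sales_summary")
        tgt.executemany(
            "INSERT INTO sales_summary (region, total) VALUES (?, ?)",
            source_rows,
        )

def demo_full_reload():
    tgt = sqlite3.connect(":memory:")
    tgt.execute("CREATE TABLE sales_summary (region TEXT PRIMARY KEY, total REAL)")
    # Target contents left by the previous run.
    tgt.executemany(
        "INSERT INTO sales_summary VALUES (?, ?)",
        [("north", 100.0), ("south", 50.0)],
    )
    # The source no longer contains "south", so it disappears from the target.
    full_reload(tgt, [("north", 120.0)])
    return tgt.execute("SELECT region, total FROM sales_summary").fetchall()
```

The demo also shows the drawback noted above: the row for "south" is gone after the reload, and the target keeps no history of it.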

When choosing a data loading strategy for a system, weigh the methods listed above against the existing system and consider the following:
(1) If the OLTP system being integrated is another vendor's product, the impact and risk that the ETL imposes on the existing system should be kept as small as possible. The performance impact can be addressed in two ways. Part of it can be solved by upgrading hardware: apart from reading and writing tables, all conversions are done in memory on the ETL server, so a high-specification server greatly improves ETL running speed. Part of it can be controlled through load timing: loading when the system is relatively idle, loading in parallel, and so on all reduce the impact on the production system. For such systems, the full table comparison method can therefore be used to load data incrementally as the ETL data loading rule.
(2) If the original OLTP system is a product developed in-house and the OLAP system is built on top of it, the timestamp or log table method can be considered; the two differ only in how much they affect the original system.
(3) Once incremental loading is in place, the aggregation tables and fact tables of the OLAP system are generated by a second ETL stage within the OLAP system. To ensure the integrity and accuracy of the OLAP data at that stage, the full table delete-and-insert method can be used.
