SSIS Design5: Using Staging

Source: Internet
Author: User
Tags ssis

Designing the package in a data stream, moving the core data processing to the data flow, typically reduces the creation of temporary tables, provides high processing performance, and in some cases, the use of a staging table (staging table) optimizes the package design.

1, using collection-based update operations

In large systems, data updates are usually bottleneck of the system, because SSIS cannot perform collection-based updates in data Flow. In data flow, the OLE DB Command transformation component updates the data line by row, performing an update operation on each row of data , which results in poor performance. For data streams with large volumes of updates, the solution to effectively resolve data update problems is to cache the data that needs to be updated to a staging table (staging table), and use the TSQL statement and staging table to perform collection-based update operations on the target data.

2. Restart the package from the wrong point using checkpoint

SSIS checkpoint records the execution result of the control Flow task, and if the conversion in the data Flow task fails, then checkpoint does not preserve the data state. When the package is restarted, the data flow will start from the beginning. If you save the data to the staging table, you can restart it from the staging data. Read the data from the source, load it into the staging table, and then fetch the data from the staging table and apply the transformation logic to it.

3, Increase disk IO

Staging the data into a table multiplies the disk IO, which is eventually saved to disk when using the staging table to temporarily store the data, and the data needs to be read from disk to memory. Because of the need to move a large amount of data, disk IO is usually the bottleneck of ETL, so in the ETL that does not need to update the data, we should reduce the requirement of the staging table, use the data flow to complete the same conversion operation, because the data flow mainly uses memory, the memory can be accessed faster than disk. This will not only reduce the cost of disk IO, but also reduce the time of ETL processing.

SSIS Design5: Using Staging

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.