SSIS Design6: Leveraging Data flow

Source: Internet
Author: User
Tags ssis

The data stream uses memory to buffer the data and process the data transformation in memory, and because memory accesses are very fast, SSIS data Flow conversion performance is very efficient. SSIS engine loads the data into memory in batches, and when data flow loads a new batch of data into memory, the downstream transformation component processes the already loaded into memory, and the destination component loads the processed data into the target table, reading the data, Conversion and loading are performed simultaneously, which is the advantage of data flow.

1, using data flow to reduce staging and expensive disk IO operations

Disk io is the bottlenect of system performance in a large number of data ETL systems. Compared with the memory Ram,disk IO, the reading and writing speed is very slow, the data in the memory buffer is written to the staging table, and then the data from the staging table is read into memory, which not only multiplies the Disk io, but also restricts the concurrency of the data stream. Keep in mind that the scenario for using a staging table is an update operation that has a large amount of data in the ETL system. This is because data Flow does not have a collection-based update transformation component, and the OLE DB Command transformation component updates the data line -by-row, using a staging table to take advantage of the RDBMS's collection-based update operations on the data. If the ETL system does not have an update operation for large amounts of data, choose to use the data flow to process the data.

2. Reduce dependency on RDBMS with data flow

It is very efficient to use the Data Flow transformation component to join and clean the information, because the data is read, transformed, and loaded simultaneously. RDBMS data processing is atomic, before the data is processed, it can not be processed by other SQL statements, on the contrary, the data stream may be processed in parallel to the data extraction, cleaning, joining and writing, so that the full use of data flow concurrency processing advantages, the overall data processing time will be greatly reduced.

3, using RDBMS to process data sources

When you use the source adapter to get data, sorting by using the database's ORDER BY clause will be more efficient than the sort conversion, and for query statements where the source is a multi-table join, getting the data using the RDBMS will be faster.

SSIS Design6: Leveraging Data flow

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.