The data stream uses memory to buffer the data and process the data transformation in memory, and because memory accesses are very fast, SSIS data Flow conversion performance is very efficient. SSIS engine loads the data into memory in batches, and when data flow loads a new batch of data into memory, the downstream transformation component processes the already loaded into memory, and the destination component loads the processed data into the target table, reading the data, Conversion and loading are performed simultaneously, which is the advantage of data flow.
1, using data flow to reduce staging and expensive disk IO operations
Disk io is the bottlenect of system performance in a large number of data ETL systems. Compared with the memory Ram,disk IO, the reading and writing speed is very slow, the data in the memory buffer is written to the staging table, and then the data from the staging table is read into memory, which not only multiplies the Disk io, but also restricts the concurrency of the data stream. Keep in mind that the scenario for using a staging table is an update operation that has a large amount of data in the ETL system. This is because data Flow does not have a collection-based update transformation component, and the OLE DB Command transformation component updates the data line -by-row, using a staging table to take advantage of the RDBMS's collection-based update operations on the data. If the ETL system does not have an update operation for large amounts of data, choose to use the data flow to process the data.
2. Reduce dependency on RDBMS with data flow
It is very efficient to use the Data Flow transformation component to join and clean the information, because the data is read, transformed, and loaded simultaneously. RDBMS data processing is atomic, before the data is processed, it can not be processed by other SQL statements, on the contrary, the data stream may be processed in parallel to the data extraction, cleaning, joining and writing, so that the full use of data flow concurrency processing advantages, the overall data processing time will be greatly reduced.
3, using RDBMS to process data sources
When you use the source adapter to get data, sorting by using the database's ORDER BY clause will be more efficient than the sort conversion, and for query statements where the source is a multi-table join, getting the data using the RDBMS will be faster.
SSIS Design6: Leveraging Data flow