Sort data by SSIS

Source: Internet
Author: User
Tags ole ssis

In general, there are two ways to sort data: Use the sort component, and use the TSQL command to ORDER BY. Sorting using the sort component is a blocking operation for SSIS, which means that the SSIS component must wait until all the data is loaded into memory to perform a sort operation on the data according to a specific field. However, if you use the ORDER BY clause of the TSQL command to perform a sort operation in the relational engine and then load the ordered data into memory, this is a streaming operation for SSIS, which means that the SSIS component does not have to wait until all the data is loaded into memory to begin processing the data. It not only saves the memory resources of the system, but also the process of data processing faster.

One, sort by using the sort component

Using the sort component to sort unordered data, you must set the sort key, the position of the sorted column, the type of sort (ascending or descending):

SortType: Ascending ascending, descending descending,

SortOrder: The position of the row sequence, starting from 1, increments in turn,

The sort component also has a property: Remove Wors with duplicate sort values, which specifies whether the sort component removes duplicate rows when the row sequence repeats? This differs from the distinct command, where distinct is the output of all columns that are not duplicated. If this property is checked, it is only guaranteed that the row sequence (part of the output column) is not duplicated, which can also be viewed and set from the Sort transformation Advanced Editor, the default value is False, and no duplicate values are removed:

Second, sort the data by using the ORDER BY clause of the TSQL command

In the data source, the data access mode is selected as SQL command, and the data is sorted by writing the TSQL code.

Step1: Sorts the data using the OLE DB data source component, and the data source component passes down the data in order:

STEP2, open the Advanced Editor for OLE DB and view the input and Output Properties tab

1, click OLE DB Source ouput, set the IsSorted property to True, set the property to true to not sort the data, just inform the downstream component that the output data is sorted.

If the IsSorted property is set to True, the actual data is not sorted and an error occurs when the package runs, so you must supply the sorted data (sort the data by using the ORDER BY clause in the TSQL command)

2, click Output Columns, set the SortKeyPosition property of the Order by column_list one by one

The SortKeyPosition attribute has a sort position and direction two metadata:

Positive integers are sorted in ascending order, 0 means not a sequence, negative integers are sorted in descending order, numbers represent the sequence numbers of rows

For example, for the following TSQL statement:

Select Col_1,col_2,col_3,col_4  from dbo. TableNameOrderASCdesc desc

The output columns needs to be set individually, Col_1,col_2,col_3,col_4 sortkeyposition
Since Col_1,col_2,col_3 is a row sequence, the sequence number increments from 1, and the col_4 is not a sequence, so the configuration of SortKeyPosition is as follows

    • Col_1 's sortkeyposition is 1, the first row sequence, and sorted in ascending order
    • Col_2 's SortKeyPosition is-2, second row sequence, and sorted in descending order
    • Col_3 's sortkeyposition is 3, the third row sequence, and sorted in ascending order
    • Col_4 's sortkeyposition is 0, not a row sequence.

Third, performance comparison

After testing, using a relational database to output an ordered stream of data, can take full advantage of the performance of the relational database, and accelerate the performance of the SSIS engine processing the massive data task. Therefore, when you need to load ordered data from a relational database, we recommend that you use SQL commands to sort the data.

For a data source that needs to load data from a non-relational database, this amount of data is generally small, and the time and resource consumption for sorting the data is small, which can be done quickly for the sort component.

Reference Documentation:

Sort Data for the merge and merge Join transformations

Sort data by SSIS

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.