"BI Thing" Data Flow Transformations: Multicast, Union All, Merge, Merge Join

Source: Internet
Author: User

Set up test data:

CREATE TABLE FactResults
(
    Name   VARCHAR( -),
    Course VARCHAR( -),
    Score  INT
)

INSERT INTO FactResults (Name, Course, Score)
SELECT 'Zhang San', 'language',  -
UNION ALL
SELECT 'Zhang San', 'Mathematics', the
UNION ALL
SELECT 'John Doe', 'language', About
UNION ALL
SELECT 'John Doe', 'Mathematics',  -
UNION ALL
SELECT 'John Doe', 'Physical', 94

SELECT * FROM FactResults

Multicast: A transformation that distributes a dataset to multiple outputs.

As its name suggests, Multicast sends the data from one path to multiple paths; use it whenever the same rows must feed several downstream branches. To configure it, connect it to the input source and then connect it to multiple destinations. Apart from the task name, it has no special editing options.
Note: Multicast is similar to the Conditional Split transformation. The difference is that Multicast sends every row to every output, whereas Conditional Split sends each row only to the output whose condition it matches.
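
The difference between the two transformations can be sketched in a few lines of Python. This is only an illustration of the row-level behavior, not SSIS code: the function names `multicast` and `conditional_split` and the output names `zhang_san` and `default` are assumptions of this sketch, and the Score column is left out.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def multicast(rows: List[Row], n_outputs: int) -> List[List[Row]]:
    """Send a copy of every input row to each of the n outputs."""
    return [[dict(row) for row in rows] for _ in range(n_outputs)]


def conditional_split(
    rows: List[Row],
    conditions: Dict[str, Callable[[Row], bool]],
    default: str = "default",
) -> Dict[str, List[Row]]:
    """Send each row to the first output whose condition it satisfies.

    Rows that match no condition go to the default output.
    """
    outputs: Dict[str, List[Row]] = {name: [] for name in conditions}
    outputs[default] = []
    for row in rows:
        for name, predicate in conditions.items():
            if predicate(row):
                outputs[name].append(row)
                break
        else:
            outputs[default].append(row)
    return outputs


rows = [
    {"Name": "Zhang San", "Course": "language"},
    {"Name": "Zhang San", "Course": "Mathematics"},
    {"Name": "John Doe", "Course": "language"},
    {"Name": "John Doe", "Course": "Mathematics"},
    {"Name": "John Doe", "Course": "Physical"},
]

copies = multicast(rows, 2)
split = conditional_split(rows, {"zhang_san": lambda r: r["Name"] == "Zhang San"})

print([len(c) for c in copies])               # [5, 5]
print({k: len(v) for k, v in split.items()})  # {'zhang_san': 2, 'default': 3}
```

Every Multicast output carries all five rows, while the split outputs partition them.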

Union All: A transformation that merges multiple datasets.

Union All does the opposite of Multicast: it merges multiple data sources into one result set. For example, you can merge data from two XML data sources into one output and then feed that output into the Term Extraction transformation.
To edit this transformation, connect the first data source to it and then connect the other data sources. Open the editor to check that the columns are mapped correctly; SSIS adapts the metadata automatically to make the mappings valid. For example, if a column is 20 characters wide in one input and 50 in the other, the output column will be 50 characters wide.
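
As a rough Python sketch of this behavior (the helper names `union_all` and `output_width` are made up for illustration, and rows are reduced to Name and Course): the inputs are simply appended without sorting or removing duplicates, and a string column takes the wider of the input widths, as described above. The sketch appends the inputs one after another; SSIS itself makes no promise about the output row order.

```python
from itertools import chain


def union_all(*inputs):
    """Append the rows of every input into one output (no sort, no de-duplication)."""
    return list(chain.from_iterable(inputs))


def output_width(*input_widths):
    """Width of the combined string column: the widest input wins."""
    return max(input_widths)


all_rows = [
    ("Zhang San", "language"),
    ("Zhang San", "Mathematics"),
    ("John Doe", "language"),
    ("John Doe", "Mathematics"),
    ("John Doe", "Physical"),
]
only_zhang_san = [row for row in all_rows if row[0] == "Zhang San"]

combined = union_all(all_rows, only_zhang_san)
print(len(combined))         # 7
print(output_width(20, 50))  # 50
```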

Merge: A transformation that merges two sorted datasets.

The Merge transformation combines input data from two paths into one output. It is similar to Union All, but it has some limitations:

    • The data must be sorted before merging, either with a Sort transformation or with an ORDER BY clause in the data source
    • The metadata of the merged columns must match; for example, CustomerID cannot be numeric in one path and a character type in the other
    • If there are more than two paths, you must use the Union All transformation instead

To configure this transformation, make sure the data in the two paths is consistent. When you connect a path, a dialog box asks whether it should become input 1 or input 2; if you choose input 1 for the first path, connect the other path as input 2. In the editor, columns from one path are mapped to columns from the other, and columns from either path can be ignored.
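
The requirement that both inputs are already sorted is what lets Merge interleave rows and keep the output sorted. A minimal Python sketch of that idea, using the standard library's `heapq.merge`; the function `merge_sorted` and the sample rows are illustrative assumptions, not part of SSIS:

```python
import heapq


def is_sorted(rows, key):
    """True if rows are in non-decreasing order of key."""
    return all(key(a) <= key(b) for a, b in zip(rows, rows[1:]))


def merge_sorted(input1, input2, key):
    """Interleave two inputs that are already sorted on the same key."""
    if not (is_sorted(input1, key) and is_sorted(input2, key)):
        raise ValueError("both inputs must already be sorted on the merge key")
    return list(heapq.merge(input1, input2, key=key))


def by_name(row):
    return row[0]


input1 = [("John Doe", "Mathematics"), ("Zhang San", "language")]
input2 = [("John Doe", "Physical"), ("Zhang San", "Mathematics")]

merged = merge_sorted(input1, input2, key=by_name)
for row in merged:
    print(row)
```

Unlike a plain Union All, the output here stays ordered by Name, so no additional Sort is needed downstream.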


Merge Join: A transformation that joins two datasets using a FULL, LEFT, or INNER join.

One of the goals of SSIS is to let you get work done with tasks rather than hand-written code, and Merge Join is a typical example. It joins two inputs with an inner or outer join and lets you choose which columns to output. For example, suppose one data flow path carries human resources information, including EmployeeID, and another path carries payroll information. You can join the two paths, take the employee's name from the human resources data and the salary from the payroll data, and send both down a single output path.
Note: If both input paths come from the same database, performing the join in the OLE DB source query is likely to be more efficient. Merge Join is useful when the two datasets are not in the same database or when you do not want to write code.
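
The employee example can be sketched as a sort-merge join in Python. This is a simplified illustration under stated assumptions: both inputs are already sorted by EmployeeID, the function `merge_join` is invented for this sketch, and the employee names and salaries are hypothetical sample values. Each output row is a pair of (left row, right row), with `None` standing in for a missing side.

```python
def merge_join(left, right, key, how="inner"):
    """Join two inputs that are already sorted by `key`.

    how: "inner", "left" (left outer), or "full" (full outer).
    """
    if how not in ("inner", "left", "full"):
        raise ValueError("how must be 'inner', 'left', or 'full'")
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            # Left row has no partner on the right.
            if how in ("left", "full"):
                out.append((left[i], None))
            i += 1
        elif lk > rk:
            # Right row has no partner on the left.
            if how == "full":
                out.append((None, right[j]))
            j += 1
        else:
            # Pair every left row with this key with every matching right row.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append((left[i], r))
                i += 1
            j = j_end
    if how in ("left", "full"):
        out.extend((row, None) for row in left[i:])
    if how == "full":
        out.extend((None, row) for row in right[j:])
    return out


# Hypothetical sample data, sorted by EmployeeID.
hr = [
    {"EmployeeID": 1, "Name": "Employee A"},
    {"EmployeeID": 2, "Name": "Employee B"},
    {"EmployeeID": 3, "Name": "Employee C"},
]
payroll = [
    {"EmployeeID": 2, "Salary": 5000},
    {"EmployeeID": 4, "Salary": 6000},
]

print(len(merge_join(hr, payroll, "EmployeeID", "inner")))  # 1
print(len(merge_join(hr, payroll, "EmployeeID", "left")))   # 3
print(len(merge_join(hr, payroll, "EmployeeID", "full")))   # 4
```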


Create a Data Flow task in your project with the following data flow:

The Multicast component distributes its input to one or more outputs, and every output is identical to the input.
So the first Multicast, named "Multicast dick and Harry" (the name refers to both students, Zhang San and John Doe), passes on the same rows as the source table.
For easy viewing, each component's name includes the names (Zhang San, John Doe) of the students whose rows it carries.

The "Conditional split" component is configured so that rows whose name equals "Zhang San" and rows whose name does not equal "Zhang San" are sent to two separate Multicast components.

Next, look at the "Union all" component. It stacks the rows of "Multicast dick and Harry" and "Multicast only Zhang San" on top of each other, so the expected result is 7 rows: all 5 rows of the table plus the 2 "Zhang San" rows again.

Then look at the "merge joins" component, which joins the data of "sort _ dick and Harry" and "Sort _ John Doe" after both have been sorted. The join is configured as follows:
The join type is left outer join. The left input is "sort _ dick and Harry", which is the first input.
The join condition (an equi-join) is: ON a.Name = b.Name AND a.Course = b.Course
The output columns are renamed.

In the expected output, every row of the left input appears, in the order produced by the "sort _ dick and Harry" component; the columns from the right input are filled in only for the "John Doe" rows.
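
The shape of this result can be checked with a short Python sketch. Two assumptions are labeled here and in the code: only the score 94 is legible in the test data, so `None` stands in for the other scores, and because the right input has unique (Name, Course) keys, the sketch uses a dictionary lookup instead of a true merge join. Python's default string ordering may also differ from the order the Sort component produces.

```python
# Test rows as (Name, Course, Score). Only the score 94 is legible in the
# source, so None stands in for the other scores (an assumption of this sketch).
fact_results = [
    ("Zhang San", "language", None),
    ("Zhang San", "Mathematics", None),
    ("John Doe", "language", None),
    ("John Doe", "Mathematics", None),
    ("John Doe", "Physical", 94),
]


def join_key(row):
    """The join condition: a.Name = b.Name AND a.Course = b.Course."""
    return (row[0], row[1])


# Left input: all five rows, sorted ("sort _ dick and Harry").
left = sorted(fact_results, key=join_key)
# Right input: only the John Doe rows, sorted ("Sort _ John Doe").
right = sorted((r for r in fact_results if r[0] == "John Doe"), key=join_key)

# Dictionary lookup in place of a merge; valid because right-side keys are unique.
right_by_key = {join_key(r): r for r in right}
no_match = (None, None, None)

# Left outer join: keep every left row, append the matching right row if any.
joined = [l + right_by_key.get(join_key(l), no_match) for l in left]

for row in joined:
    print(row)
print(len(joined), "rows x", len(joined[0]), "columns")  # 5 rows x 6 columns
```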

Finally, the "merge" component merges the output of the merge join with "multicast only Zhang San".
The first input is "sort _ Zhang Sanli" (5 rows and 6 columns); because it has more columns, its data structure determines the output.
The second input is "sort _ Zhang San" (2 rows and 3 columns).

As you can see, "sort _ Zhang San" does not have enough columns, so when it is merged with the table above, its missing columns are marked as ignored, that is, they carry no value. Merge performs a sorted merge, so the expected result is 7 rows in sorted order.
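
A Python sketch of this last step, under the same labeled assumptions as before: `None` stands in for the scores that are not legible in the test data, the first input is rebuilt by hand rather than taken from a real merge join, and the padding of the narrower input with `None` is this sketch's way of modeling the ignored columns.

```python
import heapq

# Test rows as (Name, Course, Score); None stands in for the scores that are
# not legible in the source (an assumption of this sketch).
fact_results = [
    ("Zhang San", "language", None),
    ("Zhang San", "Mathematics", None),
    ("John Doe", "language", None),
    ("John Doe", "Mathematics", None),
    ("John Doe", "Physical", 94),
]


def sort_key(row):
    return (row[0], row[1])


# First input: 5 rows x 6 columns, standing in for the sorted merge join output.
first = sorted(
    (r + (r if r[0] == "John Doe" else (None, None, None)) for r in fact_results),
    key=sort_key,
)
# Second input: 2 rows x 3 columns, the sorted Zhang San rows.
second = sorted((r for r in fact_results if r[0] == "Zhang San"), key=sort_key)

# The narrower input has no source for the extra columns, so they stay empty.
width = len(first[0])
padded_second = [r + (None,) * (width - len(r)) for r in second]

# Sorted merge of the two inputs.
merged = list(heapq.merge(first, padded_second, key=sort_key))

for row in merged:
    print(row)
print(len(merged), "rows x", width, "columns")  # 7 rows x 6 columns
```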

At this point, the design is complete. Now run the package! You can see the number of rows that passed through each path of the data flow.