Integration Services, Part Four: Incremental Loads - Updating Rows

Source: Internet
Author: User
Tags: OLE, SSIS

This article is the fourth chapter of the Integration Services series; please refer to the original text for details.

Reviewing Incremental Loads
Remember, there are three use cases in an SSIS incremental load:
1. New rows - add rows to the destination that have been added to the source since the previous load.
2. Updated rows - update rows in the destination that have been updated in the source since the previous load.
3. Deleted rows - remove rows from the destination that have been deleted from the source.
In this article, we will look at the updated rows portion of an SSIS incremental load - rows that have changed in the source since they were loaded into the destination.
Add update code
Before we can detect changes and update rows, we need to set up another test. As before, we will change rows in the target table (dbo.Person). Open SSMS and connect to your instance of SQL Server. Open a new query window and enter the following T-SQL statements:

USE [AdventureWorks2012]
GO
UPDATE dbo.Person
SET MiddleName = 'Ray'
WHERE MiddleName IS NULL

Press F5 to execute the statements; 8,499 rows will be updated in the target table. We changed the MiddleName value from NULL to 'Ray'.
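If you would like to confirm the test setup, you can first count the rows that now differ between source and target. This query is my addition, not part of the original article; it assumes the dbo.Person target table and the EntityId key naming used throughout this series:

-- Count rows whose MiddleName now differs between source and target.
-- 'N/A' is only a placeholder to make the comparison NULL-safe.
SELECT COUNT(*) AS ChangedRows
FROM Person.Person src
JOIN dbo.Person dest
  ON dest.EntityId = src.EntityId
WHERE ISNULL(src.MiddleName, 'N/A') <> ISNULL(dest.MiddleName, 'N/A')
-- Expect 8499.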
Open BIDS and open the My_First_SSIS_Project solution. Click the Data Flow tab; we need to make some changes to the Lookup transformation added in the previous article. Double-click the Lookup to open the Lookup Transformation Editor.
Click the Columns page. The upper part of the page contains two grids. The left one is labeled Available Input Columns and contains the list of input columns for the Lookup transformation (remember, the Lookup transformation's input is the output of the OLE DB source adapter it is connected to, so these columns come from there). The other grid is labeled Available Lookup Columns. These are the columns of the table, view, or query configured on the Connection page.
In the previous article, we mapped the EntityId column in Available Input Columns to the EntityId column in Available Lookup Columns. I think of the Lookup as a join: the line between the EntityId columns represents the ON clause of that join. This "ON" matching criterion is what drives the Lookup.
In the previous article we did not select any check boxes in Available Lookup Columns. There is a check box next to each available lookup column, and a Select All check box at the top of the grid. If the Lookup transformation is a join, these check boxes add columns from the joined table to the SELECT clause. Click the Select All check box for Available Lookup Columns.
There is a grid below Available Input Columns and Available Lookup Columns. The Output Alias column lets you modify the names of the columns returned from Available Lookup Columns. To continue the join analogy, the Output Alias is similar to using AS in a query to alias a column. I like to use a LKUP_ or Dest_ prefix to identify the columns returned by the lookup operation; this makes it easy to distinguish the columns coming from the OLE DB source from the columns added by the Lookup transformation. (If a column name is duplicated, SSIS automatically appends " (1)" to it.) Here we add the LKUP_ prefix to differentiate:

Figure 4.1 Lookup Transformation Editor - checking the Available Lookup Columns
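To make the join analogy concrete, the Lookup configured this way behaves roughly like the query below. This is a sketch of the semantics, not something SSIS literally executes, and it assumes the EntityId key naming used in this series. The match output corresponds to the rows this inner join returns; the no-match output corresponds to the source rows a LEFT JOIN would leave unmatched:

SELECT src.EntityId
     , src.FirstName
     , src.LastName
     , src.MiddleName
     , dest.EntityId   AS LKUP_EntityId
     , dest.FirstName  AS LKUP_FirstName
     , dest.LastName   AS LKUP_LastName
     , dest.MiddleName AS LKUP_MiddleName
FROM Person.Person src
JOIN dbo.Person dest
  ON dest.EntityId = src.EntityId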
Rows flow from the OLE DB source adapter into the data flow pipeline, loaded from the Person.Person table. The target table is dbo.Person, which we configured for access on the Connection page of the Lookup Transformation Editor. The Lookup transformation opens the target table and attempts to match the records in the data flow pipeline with records in the target table. When no match is found, the row is sent to the no-match output.
We have now modified the Lookup transformation so that when the EntityId columns in the source and target tables match, the EntityId, FirstName, LastName, and MiddleName columns of the target table are returned. These columns are added to the data flow records as they pass through the Lookup transformation.
Next we add Conditional Split and OLE DB Command components to the data flow canvas. Click the Lookup transformation and drag the green arrow to the Conditional Split:

Figure 4.2 Adding the Conditional Split and OLE DB Command
The only available green data flow path from the Lookup transformation is the Lookup Match Output, so this time we are not prompted to select an output; the Lookup Match Output is used by default. Beyond the data they carry, the Lookup's no-match and match outputs differ, the biggest difference being the columns they return.
Let me show you how the Lookup transformation's match output adds the lookup columns to the source rows. First let's look at the input columns. Right-click the data flow path between the OLE DB source and the Lookup transformation, and select Edit:

Figure 4.3 Editing the data flow path between the OLE DB source and the Lookup
When the Data Flow Path Editor opens, click the Metadata page; the path metadata is shown in Figure 4.4:

Figure 4.4 Metadata for the data flow path between the OLE DB source and the Lookup
These columns come from the OLE DB source. Close the Data Flow Path Editor. Right-click the data flow path between the Lookup transformation and the OLE DB destination, called Lookup No Match Output, select Edit to open the Data Flow Path Editor, and click the Metadata page:

Figure 4.5 Metadata for the data flow path between the Lookup and the OLE DB destination
We saw this metadata in the previous article. The metadata for the Lookup transformation's no-match output is exactly the same as the metadata for the Lookup transformation's input. This is by design: the Lookup passes through this output only the rows for which it cannot find a match.
Because of how we configured the Lookup transformation, things are different when a matching record is found: additional columns from the target are returned. To view these changes to the data flow, right-click the data flow path between the Lookup transformation and the Conditional Split, called Lookup Match Output:

Figure 4.6 Metadata for the data flow path between the Lookup and the Conditional Split
The extra columns added to the data flow path are easy to spot in Figure 4.6, thanks to the LKUP_ alias prefix I added when configuring the Lookup transformation's match operation. Close the Data Flow Path Editor.
In the previous article, we configured the no-match output and did not return any columns from the lookup table, because they were not required for incrementally loading new rows. Now we need them to compare the values in the source and target tables.
Change Detection
What does "match" mean in the match output? It means the EntityId column in the source table (Person.Person, loaded into the Data Flow task through the OLE DB source adapter) has the same value as the EntityId column in the target table (dbo.Person, accessed through the Lookup transformation's Connection page). We know the EntityId column values match, but we are not sure whether the other columns also match, and we need to know that.
If all the column values of the source and destination match, the record has not changed in the source, and we can exclude it from further consideration. If there are changes in the source, we want to capture those changes and apply them to the target. That is the goal of the update portion of an incremental load: first detect the differences, then apply the updates.
Applying Change Detection in SSIS
Open the Conditional Split. The Conditional Split Transformation Editor displays three sections. The upper-left section contains two virtual folders: Variables and Columns:

Figure 4.7 Variables and Columns virtual folders
We will mainly work with columns, so expand the Columns virtual folder shown in Figure 4.7. To detect differences between the source and target, we will compare the FirstName, LastName, and MiddleName columns. Why not compare EntityId as well? The EntityId columns of these rows already match; in the Lookup we used the source and target EntityId columns as the ON condition of the join.
First let's compare FirstName. Click FirstName in the Columns list and drag it to the Condition column in the lower part of the Conditional Split Transformation Editor:

Figure 4.8 Dragging FirstName to the Condition column
Validation occurs when you drop FirstName in the Condition column and click somewhere else. The text turns red because validation failed. Why does it fail? The condition must return a Boolean value - True or False - and FirstName is a string. If you click OK now to try to close the transformation editor, you will get an error message:

Figure 4.9 Error message
Click OK; we're not done yet. The upper-right section of the Conditional Split Transformation Editor contains the SSIS expression language elements.
We want to detect when FirstName and LKUP_FirstName differ. In the SSIS expression language section, expand the Operators virtual folder and find the Not Equal operator:

Figure 4.10 The Not Equal operator in the Operators virtual folder
Click the Not Equal operator and drag it to the right of the FirstName column in the condition expression, then drag LKUP_FirstName from the Columns virtual folder to the right of the Not Equal operator in the condition expression:

Figure 4.11 Adding Not Equal and LKUP_FirstName to the condition
Because the expression can now evaluate to a Boolean value, it passes validation and the font turns black. What does it check? For records whose EntityId columns match in the source and destination, it detects whether FirstName and LKUP_FirstName also match. If FirstName differs, there is a difference between the source and the target. Because the target typically lags behind the source, we assume the source has the newer, more accurate data. If that's the case, we need to bring that data into the target.
Because our test query made no changes to the FirstName column, this condition always returns False: the values are always equal. We need to extend the condition expression to catch all the changes. Let's set off this part of the change-detection condition expression by enclosing it in parentheses:

Figure 4.12 Wrapping an expression in parentheses
Next we will add MiddleName. Stop and think about what we need: we want to detect a difference in any column of the source and target. A change in just one column is enough to trigger an update of the target. So we want to check whether one column or another is not equal. The Logical OR operator meets this requirement.
From the SSIS expression language section, drag the Logical OR operator to the right of the expression:

Figure 4.13 Logical OR in the Operator virtual folder
Add parentheses after the Logical OR, as shown in Figure 4.14:

Figure 4.14 Adding parentheses after the Logical OR
From the Columns virtual folder in the upper-left section of the transformation editor, drag MiddleName into the parentheses, drag a Not Equal operator to the right of the MiddleName column, and then drag LKUP_MiddleName to the right of the Not Equal operator.
Repeat these steps to add LastName to the condition expression; the final condition expression is shown in Figure 4.15:

Figure 4.15 Final conditional expression
Before we close the transformation editor, rename the output to Updated Rows.
You may be bothered by the unnecessary parentheses in the expression; in Figure 4.16 I have cleaned out the unneeded parentheses and spaces:

Figure 4.16 Overall expression
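Based on the steps above, the cleaned-up expression in Figure 4.16 should read roughly as follows (a reconstruction, since the figure itself is not reproduced here; the parentheses are optional because != binds more tightly than || in the SSIS expression language):

FirstName != LKUP_FirstName || MiddleName != LKUP_MiddleName || LastName != LKUP_LastName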
What have we done? The Lookup transformation's match output feeds the Conditional Split transformation, which means the EntityId and LKUP_EntityId columns of these rows hold the same values (the Lookup transformation used them to perform the match). We configured a condition named Updated Rows in the Conditional Split transformation. Updated Rows separates out the rows whose FirstName, MiddleName, or LastName values differ between source and target.
Note: the Conditional Split transformation directs rows to different outputs. When we define a condition, we also create a new output - a new path data can take out of the Conditional Split transformation.
What about rows that are the same in the source and the target? Each condition specifies the criteria for directing an input row to a specific output. If an input row meets no condition, it is directed to the Conditional Split default output.

Figure 4.17 Conditional Split Transformation Editor after configuration
Click OK to close the Conditional Split Transformation Editor. Click the Conditional Split and drag the green arrow to the OLE DB Command, then select Updated Rows in the Input Output Selection dialog:

Figure 4.18 Selecting the input and output
At this point your Data Flow task should look like Figure 4.19:

Figure 4.19 Data Flow Canvas
We will perform the update of the rows in the target table with the OLE DB Command transformation.
We detect rows where the source differs from the target and send them to the OLE DB Command transformation via the Updated Rows output. Now it's time to configure the update functionality of the OLE DB Command. Right-click the OLE DB Command transformation and select Show Advanced Editor.
On the Connection Managers tab of the Advanced Editor, set the OLE DB connection manager to 127.0.0.1,5377.ADVENTUREWORKS2012.SA. Click the Component Properties tab, scroll down to Custom Properties, and click the ellipsis in the SqlCommand property's value text box to open the String Value Editor. Type the following T-SQL statement as the string value:

UPDATE dbo.Person
SET FirstName = ?
  , MiddleName = ?
  , LastName = ?
WHERE EntityId = ?


Figure 4.20 OLE DB Command component Properties
In OLE DB, a question mark (?) represents a parameter placeholder. We will map these parameters on the next tab. Click OK to close the String Value Editor, then click the Column Mappings tab:

Figure 4.21 OLE DB Command column mappings
The question marks form a zero-based array: Param_0 represents the first question mark, Param_1 the second, and so on. We map them by dragging columns from Available Input Columns and dropping them on the parameters in Available Destination Columns. The available input columns contain columns from both the source and destination tables; remember that the columns prefixed with LKUP_ are actually target data returned by the Lookup transformation. So we map only the source table's columns (because we assume the source contains the latest data). Because the first ? corresponds to FirstName, we map FirstName to Param_0, MiddleName to Param_1, LastName to Param_2, and EntityId to Param_3:

Figure 4.22 OLE DB Command column mapping
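Putting the SqlCommand and the column mappings together, the parameters line up like this (an annotated restatement of the statement above, not something new to type in):

UPDATE dbo.Person
SET FirstName = ?    -- Param_0 <- FirstName
  , MiddleName = ?   -- Param_1 <- MiddleName
  , LastName = ?     -- Param_2 <- LastName
WHERE EntityId = ?   -- Param_3 <- EntityId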
You have completed the parameter mapping and the OLE DB Command configuration. Click the OK button to close the Advanced Editor. Your Data Flow task should look like Figure 4.23:

Figure 4.23 Data Flow canvas
Let's test it; press F5 to start the debugger:

Figure 4.24 Data Flow task during execution
...It almost works. Let's check the error message on the Progress tab. I found the error, but it's hard to read:

Figure 4.25 Progress tab error message
You can, however, right-click the error message and select Copy Message Text:

[Conditional Split [127]] Error: The expression "(FirstName != LKUP_FirstName) || (MiddleName != LKUP_MiddleName) || (LastName != LKUP_LastName)" on "output "Updated Rows" (165)" evaluated to NULL, but the "component "Conditional Split" (127)" requires a Boolean result. Modify the error row disposition on the output to treat this result as False (Ignore Failure) or to redirect this row to the error output (Redirect Row). The expression results must be Boolean for a Conditional Split. A NULL expression result is an error.

The problem is in my expression: part of it evaluates to NULL. When you test equal/not-equal and one of the operands is NULL, the result is not True or False - it is NULL. The condition expression fails because NULL is not a Boolean value.
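T-SQL behaves the same way; a comparison with a NULL operand is neither true nor false, as this small demonstration (my addition) shows:

SELECT CASE
         WHEN NULL <> 'Ray' THEN 'not equal'
         WHEN NULL = 'Ray'  THEN 'equal'
         ELSE 'neither: the comparison result is unknown'
       END AS ComparisonResult
-- Returns 'neither: the comparison result is unknown'.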
I will search for NULLs in the source table, Person.Person, by executing three queries:

USE [AdventureWorks2012]
GO
SELECT * FROM Person.Person WHERE FirstName IS NULL
SELECT * FROM Person.Person WHERE MiddleName IS NULL
SELECT * FROM Person.Person WHERE LastName IS NULL

Executing these queries tells me that the MiddleName column is NULL in 8,499 rows of the source table. Remember, at the beginning we updated the rows in the target table where MiddleName was NULL - exactly 8,499 rows.
To resolve this problem, stop the BIDS debugger and open the Conditional Split Transformation Editor. We need a way to handle the NULL values so we can compare the MiddleName columns without error. We will use the ISNULL() function in the SSIS expression language. This is not the ISNULL() we know from T-SQL: the T-SQL function substitutes another value for NULL, while the SSIS expression language's ISNULL() tests for NULL and returns a Boolean result (True/False). We combine ISNULL() with the conditional operator to check for NULLs and supply a comparable value when they exist. Update the Updated Rows condition to the following form:

(FirstName != LKUP_FirstName) || ((ISNULL(MiddleName) ? "Humperdinck" : MiddleName) != (ISNULL(LKUP_MiddleName) ? "Humperdinck" : LKUP_MiddleName)) || (LastName != LKUP_LastName)


Figure 4.26 The updated Updated Rows condition expression
How are we dealing with NULLs? First we use ISNULL(MiddleName) to detect them. The ISNULL() function returns True/False and serves as the condition of the conditional operator. The conditional operator's format is <condition>?<true>:<false>. If MiddleName is NULL, the true part of the conditional operator (in this case Humperdinck) is used. If MiddleName is not NULL, the false part (in this case MiddleName itself) is used. So if MiddleName is NULL, we use Humperdinck in the comparison instead; if MiddleName is not NULL, we compare the actual MiddleName values.
The LKUP_MiddleName column uses the same logic: when LKUP_MiddleName is NULL we substitute Humperdinck, and when it is not NULL we compare the actual value. This way we avoid the NULL != NULL expression.
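Restated in T-SQL, the NULL-safe change detection we just built is roughly equivalent to the query below (a sketch under the same EntityId naming assumption; here T-SQL's two-argument ISNULL performs the substitution that ISNULL() plus the conditional operator perform in the SSIS expression):

SELECT src.EntityId
FROM Person.Person src
JOIN dbo.Person dest
  ON dest.EntityId = src.EntityId
WHERE src.FirstName <> dest.FirstName
   OR ISNULL(src.MiddleName, 'Humperdinck') <> ISNULL(dest.MiddleName, 'Humperdinck')
   OR src.LastName <> dest.LastName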
Note that Humperdinck must not be a value that could actually appear in MiddleName. Consider what happens when a name is loaded without a middle name - say, John Smith - and John later updates his information online or through a customer service representative. What if John's new MiddleName is Humperdinck? The MiddleName value stored in the target table is NULL, while the source table has been updated to Humperdinck. When the Conditional Split's Updated Rows condition is applied to this row, the MiddleName side of the inequality is not NULL, so the actual value (Humperdinck) is used. The LKUP_MiddleName side is NULL and is therefore replaced with Humperdinck. The inequality compares Humperdinck to Humperdinck, works exactly as designed, and fails to detect the difference.
Click OK to close the Conditional Split transformation editor and press F5 to re-execute the SSIS package:

Figure 4.27 Re-executing the SSIS package
I noticed some interesting things about this execution. First, it succeeded; second, it took a while - nearly 2 minutes on my computer.

Figure 4.28 SSIS package execution time
Why does execution take so long? Most of the time is spent on the UPDATE statements in the OLE DB Command. The OLE DB Command processes only one row at a time - it is the SSIS equivalent of a cursor. SQL Server does not like row-based operations; they perform poorly.
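To make the cursor analogy concrete, here is roughly what that per-row pattern looks like expressed in T-SQL (a sketch of the behavior, not what SSIS literally executes; it reuses the change-detection query from the previous sketch):

DECLARE @EntityId int, @FirstName nvarchar(50),
        @MiddleName nvarchar(50), @LastName nvarchar(50)

DECLARE changed CURSOR FOR
    SELECT src.EntityId, src.FirstName, src.MiddleName, src.LastName
    FROM Person.Person src
    JOIN dbo.Person dest ON dest.EntityId = src.EntityId
    WHERE src.FirstName <> dest.FirstName
       OR ISNULL(src.MiddleName, 'Humperdinck') <> ISNULL(dest.MiddleName, 'Humperdinck')
       OR src.LastName <> dest.LastName

OPEN changed
FETCH NEXT FROM changed INTO @EntityId, @FirstName, @MiddleName, @LastName
WHILE @@FETCH_STATUS = 0
BEGIN
    -- The same parameterized UPDATE the OLE DB Command issues, once per row:
    UPDATE dbo.Person
    SET FirstName = @FirstName, MiddleName = @MiddleName, LastName = @LastName
    WHERE EntityId = @EntityId

    FETCH NEXT FROM changed INTO @EntityId, @FirstName, @MiddleName, @LastName
END
CLOSE changed
DEALLOCATE changed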
Set-Based Updates
Is there a way to avoid row-based operations? Of course there is; let's see how. If your SSIS package is still running, stop it. In the Data Flow task, click the OLE DB Command and delete it. Drag an OLE DB destination from the Toolbox into its place, and connect the Conditional Split's Updated Rows output to the OLE DB destination:

Figure 4.29 Adding an OLE DB destination
Rename the OLE DB destination to stageupdates. Right-click the OLE DB destination and open the editor. Under Connection Manager, select 127.0.0.1,5377.adventureworks2012.sa and accept the default data access mode, Table or view - fast load. Next to the table or view name, click New:

Figure 4.30 OLE DB Destination Editor
Before we click OK, we make some adjustments. The table name comes from the name of the OLE DB destination - that's why we renamed it first. The columns of the data definition statement are read from the metadata of the data flow path, which includes converting the SSIS data types to SQL Server data types (for example, DT_WSTR becomes nvarchar and DT_I4 becomes int).
Before we execute it, we remove the LKUP_-prefixed columns; we only need to stage the data from the source in order to apply the update. The final statement, after modification:

CREATE TABLE [stageupdates] (
    [EntityId] int,
    [FirstName] nvarchar(50),
    [LastName] nvarchar(50),
    [MiddleName] nvarchar(50)
)

Click OK and the stageupdates table is created in the AdventureWorks2012 database. Note that the OK button of the OLE DB Destination Editor is grayed out, and there is a warning message at the bottom. Click the Mappings page:

Figure 4.31 OLE DB Destination Editor Mapping page
The columns map automatically because the column names and data types are identical - as they must be, since the available destination columns were built from the metadata of the available input columns. Click the OK button. Your data flow canvas is shown in Figure 4.32:

Figure 4.32 Data Flow Canvas
We are not quite ready to execute: we still need to process the rows staged in stageupdates. The data that needs to be updated is now saved in the stageupdates table, and we need to apply it to the dbo.Person table. Click the Control Flow tab and add an Execute SQL task to the canvas. Click the Data Flow task and drag the green arrow to the Execute SQL task:

Figure 4.33 Adding an Execute SQL task
We will perform a set-based update in the Execute SQL task. Instead of looping through one row at a time, a set-based operation applies all the updates to the target table at once.
Right-click the Execute SQL task and open the Execute SQL Task Editor. On the General page, change the Name property to "Apply Staged Updates" and the Connection property to 127.0.0.1,5377.adventureworks2012.sa. Click the ellipsis in the SQLStatement property to open the Enter SQL Query dialog, and type the following T-SQL statement:

UPDATE dest
SET FirstName = stage.FirstName
  , MiddleName = stage.MiddleName
  , LastName = stage.LastName
FROM dbo.Person dest
JOIN dbo.stageupdates stage
  ON stage.EntityId = dest.EntityId

This statement joins stageupdates to dbo.Person and applies the staged updates to the target. Before we execute this package, let's reset the data in the target table by executing the following T-SQL statements:

USE [AdventureWorks2012]
GO
UPDATE dbo.Person
SET MiddleName = 'Ray'
WHERE MiddleName IS NULL

Return to BIDS and press F5 to execute the SSIS package. My control flow looks like this:

Figure 4.34 Control Flow Canvas
My Data Flow task is as follows:

Figure 4.35 Data Flow task
The Progress tab in BIDS shows the improved execution time:

Figure 4.36 Improved execution time
2.792 seconds is far better than 2 minutes!
There is still more to do with the stageupdates table. As configured, the package will keep accumulating records in the stageupdates table before applying the updates. We could delete the records from stageupdates right after applying the update to the dbo.Person target table, but if something "bad" happens during execution, having those records available gives me more data points to check for clues. So instead we will empty the stageupdates table before the Data Flow task loads it, leaving the records intact after execution.
To accomplish this, first stop the BIDS debugger. Then add another Execute SQL task to the control flow canvas and connect it to the Data Flow task:

Figure 4.37 Adding an Execute SQL task
Double-click to open the Execute SQL Task Editor and configure the General page as shown in Figure 4.38:

Figure 4.38 Configuring the General page
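The article never reproduces the statement configured in this task, but given the goal of emptying the stageupdates table before the Data Flow task loads it, it is presumably a simple truncate along these lines:

TRUNCATE TABLE dbo.stageupdates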
Now we are ready to execute. Press F5 to run the SSIS package:

Figure 4.39 Control Flow Canvas
The Data Flow task performs an incremental load:

Figure 4.40 Data Flow Canvas
Summary
In this article, we started from an SSIS incremental load that assumed data never changes once it exists in the target. That assumption holds for data like the high temperature for a given date or a given day's currency exchange rate, but much data does change over time. To capture those changes, we detect the differences between the source and the target and apply updates to the target. This article demonstrated several update methods, each more efficient than the one before.
