Let's open the previous project: my_first_ssis_project_after_step_2.zip
We have already imported 19972 rows to DBO. Contact in the previous project. If the package is re-executed, it will be re-imported. let's solve this problem.
Incremental Load goals)
Before batch processing, let's clarify the incremental import objectives:
- If there are new records in the data source, we need to import these records to the destination.
- If there is a change record in the data source, We need to update the relevant records of the previous import destination.
- If the records in the data source are deleted, We need to delete the records of the previous import destination.
This article focuses on the first point: import new records to the destination.
We first delete some records in the DBO. Contact table to simulate the actual scenario: data in the data source, but not in the destination. Execute the following code
Use AdventureWorksgoDelete dbo.ContactWhere MiddleName Is NULL
We delete all values with null middlename. After the statement is executed, the following information is returned:
(8499 row (s) affected)
Now we only have 11,473 rows in the target table (originally 19,972 rows). The scenario is simulated.
The first thing to do is to check the new line. Because the data source and destination are in the same database, so we can directly write some T-SQL statements to identify these new lines. There are several ways to achieve:
Method 1: locate the data that exists in the data source (person. Contact) but is no longer in the target table (DBO. Contact). Execute the following T-SQL statement in SSMs:
Use AdventureWorksgoSelect FirstName,MiddleName,LastName,EmailAddressFrom Person.ContactWhere EmailAddress Not In (Select Email From dbo.Contact)
This query returns the email address that cannot be configured with the DBO. Contact table in the person. Contact table.
Method 2: Use join to obtain a new row:
Use AdventureWorks go Select src.FirstName ,src.MiddleName ,src.LastName ,src.EmailAddress From Person.Contact src Left Join dbo.Contact dest On src.EmailAddress = dest.Email Where dest.Email Is NULL
The second of the two methods is highly efficient.
The first query takes 6 seconds to return 8,499 rows:
Figure 9
The second query returns the same 8,499 rows, but it takes 0 seconds:
Figure 10
There are many ways to determine which query is better, but this is beyond the scope of our document. (You can refer to the related article query execution plans ).
Now let's change SSIs.
Return to my_first_ssis_project. as you may have guessed, we need to add a component in the data flow task between ole db source and contact (ole db) Destination adapters. first, we need to delete the data flow path connecting them:
Figure 11
After the data flow path is deleted, the ole db destination adapter (contact) Error indicates-red circle X:
Figure 12
The error message is displayed because we deleted the previously configured data flow path. Do you still remember to open the metadata page shown in data flow path before? It is the data flow task and the destination adapter interface.
An interface is a term used in software development. the interface is treated as the protocol between objects ("contracts" connection, the contact is more appropriate = ). delete the data flow path and the connection will be disconnected. therefore, the ole db destination reports an error.
Let's take a closer look at the error message.
Figure 13
The ole db destination adapter prompts that a field is lost and the firstname cannot be located. In fact, other fields are also lost. Why didn't all the missing fields be displayed? In fact, it is very simple, because it does not care about how many fields are lost, it knows that it needs all fields, so as long as one is lost, it will directly report an error.
Add lookup component
Drag lookup transformation and use data flow path to connect ole db source and lookup Transformation:
Figure 14
As the name implies, lookup transformation is used to view other tables, views, or queries for row matching. lookup has some concepts that we will explain one by one when using it. although it sounds simple, there are some special points.
Quirk #1: if the data stream does not match the queried table, view, or column, lookup transformation will fail by default.
Quirk #2: If Multiple matching records are found in the queried table, lookup transformation returns only the one found at the beginning.
Configure Lookup
Double-click lookup transformation, which is displayed as follows by default.
Figure 15
The cache mode data controls the lookup operation mode.
Matt Masson, Microsoft developer and SSIS expert, wrote a document. For details, refer to. lookup cache modes. Here I will give a summary:
In the no cache mode, the lookup operation is performed on one row. That is to say, when a row of data is queried, The lookup operation executes one query.
In full cache mode, The lookup operation tries to load all the queried tables, views, or query data to the memory before the data flow task is executed. you should have noticed the word "attempt (attempts. the lookup operation fails if a table, view, or query returns a large amount of data, or the memory limit (running slowly or not supported by enough memory.
If lookup transformation fails due to memory problems in full cache mode, what should I do? The first option is to use the cache mode. The second option is to use the partial cache mode. There are other methods, but they are not covered in this article.
In partial cache mode, transformation first queries the lookup cache. if no matching data is found in the cache, It is queried in the database. if matched data is queried, the data is added to the lookup cache. the same data can be directly found in the cache.
In this example, I use the full cache mode. Because we only match 19,972 records in a small amount at a time, I will monitor the performance of the SSIS package and adjust it if necessary.
Next, change "specify how to handle rows with no matching entries" to "Redirect rows to no match output ":
Figure 16
Click the connection page and select ole db Connection Manager as "(local). adventureworks". Just like the ole db source adapter, configure the interface here:
Figure 17
Similarly, like the ole db source adapter, You need to select a table or use SQL queries to access data. This example uses the following T-SQL queries:
SELECT Email ,FirstName ,LastName ,MiddleNameFROM dbo.Contact
Figure 18
Click the columns page. there are two tables above. the left side is the input field column (available input columns, that is, the data entered by the previous ole db source). The columns in this column will be placed in the input buffer of lookup transformation. the other one is the lookup field list (available lookup columns, which is the data that the lookup component connects to the manager just now ).
Click the email field in available input columns and drag it to the email field of available lookup columns. Remember that I used to join two tables for T-SQL query? The drag operation you are doing in lookup is similar to join operation.
Figure 19
Click OK and drag the green tip under lookup transformation to OLE DB destination (contact). A prompt is displayed, indicating that lookup no match output is selected:
Figure 20
Why no match? If the email address is in the source table but does not exist in the target table, it is a new row. This is exactly what we need.
Let's unveil the technical aspects of data flow tasks to test the impact of lookup transformation. Right-click the data flow path of ole db source adapter and lookup transformation, and then click Edit:
Figure 21
Click metadata page:
Figure 22
Click data flow path between lookup transformation and the ole db destination:
Figure 23
They are exactly the same! The no match output of lookup transformation directly copies the lookup transformation's input. If no match is found, the data will be directly transmitted to the next data stream. After all the operations are completed, the interface is as follows:
Figure 24
Test it. If you see the following result, it means it is successful:
Figure 25
Let's run it once! :
Target fulfillment
Connection: http://www.sqlservercentral.com/articles/Integration+Services+ (SSIS)/75331/
[Translation] stairway to integration services level 3-incremental data import