SSIS advanced Content Series 4

Source: Internet
Author: User
Tags ssis
1. How to import an Excel file and select Columns

1. You do not need to use "execl Connection Manager". Instead, you need to create "oledb Connection Manager" and select the "Microsoft JET 4.0 ole db" provider;

2. In the extended properties extended property bar, enter "Excel 8.0 ". As shown in:

3. Use the following query statement in the SQL task execution or oledb Source:

SELECT * FROM [SheetName$FromCellName:ToCellName]

For example, to query a total of 1st rows from column A to column E:

Select * from [address $ A1: e501]

2. Package deployment method selection 2.1 Advantages of saving as a file

(1) You can use source code control for management.

(2) high security when the "user key encryption" option is used

(3) not affected by network faults (stored in the Local System)

(4) deployment packages can be handed over to a third party for safekeeping, including complex files

(5) It takes only a few steps to load the file into the designer.

(6) easier access and viewing

(7) packages can be stored in the file system in a hierarchical structure.

(8) projects in Visual Studio are built on disks. Therefore, packages must be saved to the file system.

(9) ease of use during development

2.2 advantages of saving data to SQL Server

(1) database security, DTS role, and "agent" (proxy) interaction.

(2) The package will be backed up along with normal database backup operations

(3) the ability to filter packets through queries

(4) ability to store packages in a hierarchical structure through the new package folder

(5) It is more convenient to use in production.

 

1. Package Template

1. Create a package template named packagetemplate. dtsx.

2 copy to % ProgramFiles % "Microsoft Visual Studio 8" common7 "ide" privateassemblies "projectitems" datatransformationproject "datatransformationitems directory

For example, C: "Program Files" Microsoft Visual Studio 8 "common7" ide "privateassemblies" projectitems "datatransformationproject" datatransformationitems

3. Right-click solution, select "add"> "new item" in the pop-up window, and rename packagetemplate. dtsx as packagetemplate1.dtsx.

For example:

4. generate a new package ID in the newly created package packagetemplate1.dtsx, as shown in:

2. Data Synchronization policy 2.1 incremental data in the data source

1. Search for the incremental timestamp

(1) various databases: The table creation time field and the modification time field or the last modification time field;

(2) SQL SERVER: You can use the auto-increment field or timestamp;

(3) Oracle: sequence query fields are also auto-incrementing;

(4) If the preceding fields cannot be found, analyze the actual business data. For example, if the numbers are not auto-incrementing;

The data for the previous month will not change. The changes are only for this month, so this month's data is incremental data;

(5) There is an incremental change record data table that identifies which fields are insert, update, and delete

2. How to Implement insert, update, and delete

(1) split data tables with incremental change records (insert, update, and delete) so that you can directly process the data simultaneously;

(2) For incremental data:

A: The lookup component Delete can implement insert and update. The delete component can only use the outer join method;

B: The Outer Join and null keyword methods can implement insert, update, and delete, but consume performance when the data volume is large.

C: The dimension component of SCD can easily implement insert and update, and delete can only use the outer join method;

2.2 Data sources without incremental data

If you cannot find the incremental data, you can only delete all the data first and then extract all the data.

Transaction 1.1

1 transaction:

Distributed Transaction Coordinator (DTC) Transactions: one or more transactions that require DTC, which can be connected across, tasks, and packages.

Native transaction: A transaction located at the SQL Serve engine level, managed by a T-SQL transaction command using a single connection.

Note: Microsoft Distributed Transaction Coordinator (MSDTC) allows applications to expand transactions across two or more SQL Server instances. In addition, it allows applications to participate in transactions managed by the Transaction Manager compatible with the X/Open dtp xa standard.

2. Set the transactionoption attribute

Supported: if a transaction already exists in the parent object (package), the container will join the transaction;

Not supported: if a transaction exists, the container will not join the transaction;

Required: if the parent object does not have a transaction, the container starts a transaction; otherwise, it will join the transaction of the parent object.

2. Execution tree 2.1 Data Stream Conversion Type

1. Blocking

Non-blocking conversion: The logic applied in the conversion does not prevent data from being moved to the next conversion after the data rows are converted.

Stream Conversion: You can use pre-cached data to quickly apply the conversion logic and process the computing process in the engine.

Row-based conversion: rows in data conversion flow one by one.

Semi-blocking conversion: Before the memory cache flows downstream, records are retained for a period of conversion in the data stream.

Blocking conversion: receives all data from the upstream and then flows to the downstream conversion or destination.

Synchronous or asynchronous output

Asynchronous conversion output: the cache used in the input is different from the cache used in the output.

Synchronous conversion output: after the conversion logic is executed, the cache is immediately transferred to the downstream conversion.

2.2 execution tree

1. Execution tree Definition

The execution tree is a logical grouping based on the synchronization relationship between data flow components (transformations and adapters.

2. Execution tree Principle

(1) Each component in the execution tree and the remaining synchronization component in the same execution tree apply the conversion logic to the same set of cache sets.

(2) Different execution trees use different cache sets. Therefore, data in the new execution tree must be copied to the new cache allocated for the next embedded execution tree.

(3) The integration service uses one process thread for each execution tree and one process thread for each source adapter.

3. Monitor Data Flow execution
Pipelineexecutionplan: data stream processing process

Pipelineexecutiontree: Group of input and output converted from the execution tree

4. Two important parameters

Maxconcurrentececutables is set to-1 by default. This value indicates to SSIs that the maximum number of concurrent tasks is to add 2 to the total number of task handlers (CPU), and then use this value as the maximum number of concurrent tasks.

Enginethread = source Thread Tree + number of execution trees

Note:

1. In any execution tree, both input and output are converted to synchronous conversion, and vice versa.

2. If the execution tree in the data stream exceeds the available enginethread value, one workthread may be allocated to multiple execution trees.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.