Notes: How ETL (SSIS) processes Excel sources

Source: Internet
Author: User
Tags ssis

 

Recently, I encountered the need to import an Excel database as the source, so I wrote it down to help myself review and summarize it later.

Step-by-Step Excel Processing

1. Import an Excel file to the target database

This step is the easiest to implement without considering any scalability and fault tolerance processing. The disadvantage is also obvious. The source file name and file path must be fixed and cannot process multiple files in one batch.

2. process multiple files in an extended Loop

This kind of implementation adds a loop on the basis of the previous one, and cyclically imports eligible files under the same folder and processes the corresponding files.

1. First create a folder structure

Files is used to store the Excel files to be imported, and archive is used to store the Excel files already imported into the DB

 

Select foreach loop editor and loop the file elements. Considering that the path is configurable, you can configure the path through expressions. The srcfilepath variable is used to configure the absolute path of an Excel file.

The variable foreachloopfile is used to receive the Excel file name of the loop.

The script task is used to obtain the absolute path of the absolute file name and arcfile.

DTS. variables ["User: srcfilefullname"]. value = DTs. variables ["User: srcfilepath"]. value. tostring () + "\ ltfc \" + DTs. variables ["User: foreachloopfile"]. value. tostring ();
DTS. Variables ["User: arcfilepaththis"]. value = DTS. Variables ["User: arcfilepath"]. value. tostring () + "\ ltfc ";

File System Task is used to move the file to the arcfolder.

3. the above method has already completed the expansion of the Excel source, so the problem is that (difficult to prevent) if a file fails to be imported during the import of multiple files, the loop will stop, subsequent files cannot be imported into the database.

Now we need to extend the fault tolerance to handle the error files. The solution is as follows:

Create a failed folder in the archive folder to store failed files. In addition, the names of imported files and the time prefix are used to identify the imported files and time.

Modify foreach loop container's foreachexecuteresult = success to ignore the error events in the container during the loop process.

Modify failpackageonfailure = true for sequence container 1, indicating that the entire package will return an error in this container.

Modifying the above two attributes is to ensure that only errors occur during Excel processing can ensure that external loops continue to be executed. However, it is a reasonable error to perform operations such as moving and renaming files, once this occurs, the cyclic execution will be terminated.

 

Add a flag variable. The default value is 0 to control the execution flow.

Sequence container is used to import the cyclic Excel data into the DB. If the variable flag = 1 fails or succeeds, the sequence container1 will continue to be executed.

Sequence container1 is used to rename the imported file according to the success and error and put it in the corresponding folder. Flag = 0 to perform the success operation, flag = 1 to perform the failed operation, and failed to perform the reset flag = 0

 

 

Code contained in script task:

DTS. variables ["User: srcfilefullname"]. value = DTs. variables ["User: srcfilepath"]. value. tostring () + "\" + DTs. variables ["User: foreachloopfile"]. value. tostring ();

DTS. variables ["User: failedfilename"]. value = DTs. variables ["User: arcfilepath"]. value. tostring () + "\ catarget \ failed \" + datetime. now. tostring ("yyyymmddhhmmss") + "_" + DTs. variables ["User: foreachloopfile"]. value. tostring ();
DTS. variables ["User: successfilename"]. value = DTs. variables ["User: arcfilepath"]. value. tostring () + "\ catarget \" + datetime. now. tostring ("yyyymmddhhmmss") + "_" + DTs. variables ["User: foreachloopfile"]. value. tostring ();

Of course, the corresponding file conn and Excel conn are both configured through the combined expression of the cyclic Excel file name and other variables.

 

The above are some extensions of the database imported from Excel as the source. Of course, you can also create some log tables in the database to record the Excel import information, which files are imported, and how many lines are imported for each file, description of import failure.

 

Notes: How ETL (SSIS) processes Excel sources

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.