Step by Step Learning BI (3) - Creating ETL Packages with Integration Services

Source: Internet
Author: User
Tags: sql server, management, ssis

Microsoft Integration Services is a platform for building high-performance data integration solutions, including extract, transform, and load (ETL) packages for data warehousing.

Integration Services includes graphical tools and wizards for building and tuning packages; tasks for performing workflow functions (such as FTP operations), executing SQL statements, and sending e-mail messages; data sources and destinations for extracting and loading data; transformations for cleaning, aggregating, merging, and copying data; a management service, the Integration Services service, for administering package execution and storage; and application programming interfaces (APIs) for programming against the Integration Services object model.

 

Note:

This article is my notes from learning Integration Services. The basic ideas and content come from SQL Server Books Online. Since there are few BI resources in the community, I am posting them here.

This article assumes a basic understanding of Integration Services.

If you are not yet familiar with it, refer to Step by Step Learning BI (1) - Understanding Integration Services.

The database used in this article is Microsoft's AdventureWorksDW.

The text data used in this article is here

Please download these two data files first, which will be used in the experiment.

 

Objectives:

In this article, we will learn how to use SSIS Designer to create a simple Microsoft SQL Server Integration Services package. The package extracts data from a flat file, reformats the data, and then inserts the reformatted data into a fact table. We will then extend the package with looping, package configurations, logging, and an error flow.

 

Steps:
  1. Create a project and a basic package
    • In this section, we will create a simple ETL package that extracts data from a single flat file source (a .txt file), transforms the data using two Lookup transformations, and then writes the data to the FactCurrencyRate fact table in AdventureWorksDW.

      1. Create an Integration Services project and name it SSISTutorial when creating it. Rename the default Package.dtsx to lesson1.dtsx.

      2. Right-click Connection Managers at the bottom of the workspace and choose New Flat File Connection from the context menu.

      3. In the Flat File Connection Manager Editor, set the connection manager name to Sample Flat File Source Data. Click Browse and select the SampleCurrencyData.txt file in the Open dialog box.

      4. In the Flat File Connection Manager Editor, select the Advanced page to configure the properties of each column:

        • Change the Name property of Column 0 to AverageRate

        • Change the Name property of Column 1 to CurrencyID

        • Change the Name property of Column 2 to CurrencyDate

        • Change the Name property of Column 3 to EndOfDayRate

        • Click Suggest Types. Integration Services automatically suggests the most appropriate data types based on the first 100 rows of data. In the dialog box that appears, no changes are required; click OK to accept the suggested data types. (If you click Cancel instead, no suggestions are made for the column metadata and the default string [DT_STR] data type is used.)

        • However, the suggested data types for the CurrencyID and CurrencyDate columns are not compatible with the data types of the corresponding fields in the destination tables. Because DimCurrency.CurrencyAlternateKey has the data type nchar(3), CurrencyID must be changed from the string type [DT_STR] to the Unicode string type [DT_WSTR]. Likewise, because DimTime.FullDateAlternateKey is defined as a datetime column, CurrencyDate must be changed from the date type [DT_DATE] to the database timestamp type [DT_DBTIMESTAMP]. Therefore, in the Properties pane change CurrencyID from [DT_STR] to [DT_WSTR] and CurrencyDate from [DT_DATE] to [DT_DBTIMESTAMP], and then confirm. (A query for checking the destination column types is sketched below.)
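        • To see for yourself why these conversions are needed, the destination column types can be checked directly in AdventureWorksDW. The query below is only a sketch and assumes the DimCurrency and DimTime tables this tutorial uses; adjust the names if your sample database differs.

          SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
          FROM INFORMATION_SCHEMA.COLUMNS
          WHERE (TABLE_NAME = 'DimCurrency' AND COLUMN_NAME = 'CurrencyAlternateKey')
             OR (TABLE_NAME = 'DimTime' AND COLUMN_NAME = 'FullDateAlternateKey');
          -- Per the tutorial, these come back as nchar(3) and datetime, which map to DT_WSTR and DT_DBTIMESTAMP in SSIS.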

      5. Right-click Connection Managers at the bottom of the workspace, choose New OLE DB Connection from the context menu, and click New in the Configure OLE DB Connection Manager dialog box that appears. In the Connection Manager window, set the server name to localhost and the authentication method to Windows Authentication, select AdventureWorksDW as the database, and click OK.

      6. Click the Control Flow tab at the top of the main design window, drag a Data Flow Task from the Toolbox on the left onto the control flow design surface, and name it Extract Sample Currency Data. Note: it is good practice to give every component added to the design surface a unique name. For ease of use and maintainability, the name should describe the function each component performs, just as in ordinary programming.

      7. Double-click the new Extract Sample Currency Data data flow task to open the data flow design surface, drag a Flat File Source from the Toolbox onto the design surface, and rename it Extract Sample Currency Data. Double-click the Flat File Source and, in the editor that opens, select the Sample Flat File Source Data connection manager created earlier, then click OK.

      8. Now that the flat file source that extracts data from the source file is configured, the next task is to obtain the CurrencyKey and TimeKey values. A Lookup transformation joins the data in a specified input column to columns in a reference dataset. The reference dataset can be an existing table or view, a new table, or the result of a SQL statement; here an OLE DB connection manager is used to connect to the database that contains the reference data. We will add the following two Lookup transformations to the package and configure them (conceptually, each lookup behaves like the join sketched after this list):

        • One transformation looks up values in the CurrencyKey column of the DimCurrency dimension table by matching on the CurrencyID column of the flat file.
        • The other transformation looks up values in the TimeKey column of the DimTime dimension table by matching on the CurrencyDate column of the flat file.
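        • As a mental model only (not something that runs inside the package), the two lookups together behave roughly like the join below, where SourceRows is a hypothetical stand-in for the rows coming out of the flat file source:

          SELECT s.AverageRate, c.CurrencyKey, t.TimeKey, s.EndOfDayRate
          FROM SourceRows AS s                                -- hypothetical stand-in for the flat file rows
          JOIN [dbo].[DimCurrency] AS c ON c.CurrencyAlternateKey = s.CurrencyID
          JOIN [dbo].[DimTime] AS t ON t.FullDateAlternateKey = s.CurrencyDate;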
      9. Drag a Lookup transformation from the Toolbox onto the data flow design surface and name it Lookup Currency Key. Connect the two components by using the output of Extract Sample Currency Data as the input of Lookup Currency Key. Double-click Lookup Currency Key. On the General page, keep the default cache mode (Full cache) and the default connection type (OLE DB connection manager). On the Connection page, select the previously created localhost.AdventureWorksDW OLE DB connection manager, choose Use results of an SQL query, and enter:
          select * from (select * from [dbo].[DimCurrency]) as refTable
          where [refTable].[CurrencyAlternateKey] = 'ARS'
             OR [refTable].[CurrencyAlternateKey] = 'AUD'
             OR [refTable].[CurrencyAlternateKey] = 'BRL'
             OR [refTable].[CurrencyAlternateKey] = 'CAD'
             OR [refTable].[CurrencyAlternateKey] = 'CNY'
             OR [refTable].[CurrencyAlternateKey] = 'DEM'
             OR [refTable].[CurrencyAlternateKey] = 'EUR'
             OR [refTable].[CurrencyAlternateKey] = 'FRF'
             OR [refTable].[CurrencyAlternateKey] = 'GBP'
             OR [refTable].[CurrencyAlternateKey] = 'JPY'
             OR [refTable].[CurrencyAlternateKey] = 'MXN'
             OR [refTable].[CurrencyAlternateKey] = 'SAR'
             OR [refTable].[CurrencyAlternateKey] = 'USD'
             OR [refTable].[CurrencyAlternateKey] = 'VEB'
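          As a side note (not a required change), the same reference query can be written more compactly with an IN list:

          select * from [dbo].[DimCurrency] as refTable
          where [refTable].[CurrencyAlternateKey] in
                ('ARS', 'AUD', 'BRL', 'CAD', 'CNY', 'DEM', 'EUR', 'FRF', 'GBP', 'JPY', 'MXN', 'SAR', 'USD', 'VEB')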
    • On the Columns tab, drag CurrencyID from the Available Input Columns panel onto CurrencyAlternateKey in Available Lookup Columns, and select the CurrencyKey check box.
    • Drag another Lookup transformation from the Toolbox onto the data flow design surface and name it Lookup Date Key. Edit Lookup Date Key and, in the OLE DB Connection Manager dialog box, make sure localhost.AdventureWorksDW is selected. In the Use a table or a view box, select [dbo].[DimTime]. On the Columns tab, drag CurrencyDate from the Available Input Columns panel onto FullDateAlternateKey in Available Lookup Columns, and select the TimeKey check box. Connect the output of Lookup Currency Key as the input of Lookup Date Key; in the dialog box that appears, select Lookup Match Output as the output type.
    • Expand Data Flow Destinations in the Toolbox, drag an OLE DB Destination below Lookup Date Key, and name it Sample OLE DB Destination. Connect the output of Lookup Date Key as the input of Sample OLE DB Destination and select Lookup Match Output as the output type in the dialog box that appears. Double-click Sample OLE DB Destination and confirm that the OLE DB connection manager is localhost.AdventureWorksDW. In the Name of the table or the view box, select [dbo].[FactCurrencyRate]. Click Mappings on the left and verify that the AverageRate, CurrencyKey, EndOfDayRate, and TimeKey input columns are correctly mapped to the destination columns; columns mapped by the same name are correct.
    • Run the lesson1 package. If it runs correctly, 1,097 rows are imported into the destination table as expected (a quick count query is sketched below).
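    • To double-check the load from the database side, a simple count works. This is only a sketch: FactCurrencyRate already ships with rows in AdventureWorksDW, so compare the count before and after running the package rather than expecting exactly 1,097 in total.

      SELECT COUNT(*) AS RowsInFactCurrencyRate
      FROM [dbo].[FactCurrencyRate];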
  2. Add Loop
    • In the previous section we created a basic package that extracts data from a single flat file source, transforms the data with Lookup transformations, and finally loads it into the FactCurrencyRate fact table of the AdventureWorksDW sample database. However, an extract, transform, and load (ETL) process rarely uses a single flat file; a typical ETL process extracts data from multiple flat file sources, which requires an iterative control flow. One of the strengths of Microsoft Integration Services is how easily and quickly iteration, or looping, can be added to a package. Integration Services provides two containers for looping in a package: the Foreach Loop container and the For Loop container. The Foreach Loop container loops by using an enumerator, while the For Loop container typically loops by evaluating a variable expression.

    • The Foreach Loop container lets the package repeat the control flow for each member of a specified enumerator. You can use the Foreach Loop container to enumerate:

      • ADO recordset rows and schema information

      • Files and directory structures

      • System, package, and user variables

      • SQL Server Management Objects (SMO)


      1. Open the project from the previous section, select the lesson1.dtsx package file, copy it, select the SSISTutorial project node, and paste. Rename the copied package to lesson2.dtsx. This section is developed in the lesson2.dtsx package.

      2. Drag a Foreach Loop Container from the Toolbox onto the control flow surface, name it Foreach File in Folder, and double-click to open it. On the Collection page of the Foreach Loop Editor, select the Foreach File Enumerator. Under Enumerator configuration, select the prepared Sample Data folder and enter Currency_*.txt in the Files box.

      3. Select Variable Mappings on the left and choose New Variable in the Variable column. In the Add Variable dialog box that appears, enter the name varFileName. Confirm the Add Variable dialog box and then the Foreach Loop Editor.

      4. Drag the Extract Sample Currency Data data flow task into the Foreach File in Folder container.

      5. In Connection Managers, right-click the previously created Sample Flat File Source Data flat file connection and select Properties. In the Expressions row of the Properties window, click the ellipsis button to open the Property Expressions Editor window. In the drop-down list in the Property column, select ConnectionString.
      6. Click the ellipsis in the Expression column to open the Expression Builder window, expand Variables on the left of the window, and drag the varFileName variable into the Expression box. Return to the control flow design surface.
      7. Execute lesson2.dtsx; all files in the folder that match the pattern are imported into the destination database. Check the database, or look at the Output window, where the loop reports success for each file. (A query for checking per-currency row counts is sketched below.)
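      A quick way to confirm from the database that more than one currency file was loaded is to count rows per currency. This is only a sketch and assumes the FactCurrencyRate and DimCurrency tables described above:

        SELECT c.CurrencyAlternateKey, COUNT(*) AS LoadedRows
        FROM [dbo].[FactCurrencyRate] AS f
        JOIN [dbo].[DimCurrency] AS c ON c.CurrencyKey = f.CurrencyKey
        GROUP BY c.CurrencyAlternateKey
        ORDER BY c.CurrencyAlternateKey;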
  3. Add package Configuration
    • Package configurations let you set run-time properties and variables from outside the development environment. With configurations we can develop packages that are flexible and easy to deploy and distribute. Microsoft Integration Services provides the following configuration types:

      • XML configuration file

      • Environment Variable

      • Registry Key

      • Parent package Variables

      • SQL Server table

    • We will use the Package Configuration Wizard to create an XML configuration that updates the Directory property of the Foreach Loop container through a package-level variable mapped to that property. After the configuration file is created, the variable's value is modified from outside the development environment to point the property at a new sample data folder. When the package runs again, the configuration file supplies the variable's value and the variable updates the Directory property; as a result, the package iterates over the files in the new data folder instead of the folder hard-coded in the package.

      1. Copy and paste the previous package and rename the copy lesson3.dtsx. This section is developed in the lesson3.dtsx package.

      2. First, create a new package-level variable to map to the Directory property. Because the variable is package level, you must first select the package you want to work with: click the Control Flow tab of the lesson3.dtsx package so that the scope of the variable is the package.

      3. On the SSIS menu, select Variables to open the Variables pane and add a variable named varFolderName with the data type String.

      4. Return to the control flow design surface, double-click the Foreach File in Folder container, and in the Expressions section of the Collection page of the Foreach Loop Editor, map the Directory property to the package-level variable we just added. Return to the control flow surface.

      5. On the SSIS menu, select Package Configurations.... In the window that appears, enable package configurations and add a configuration. In the Package Configuration Wizard, click Next; under Configuration file name, browse to the project directory and enter SSISTutorial as the file name. Click Next; in the Objects pane under Select Properties to Export, expand Variables, expand varFolderName, expand Properties, and select Value. Click Next; on the Completing the Wizard page, name the configuration SSIS Tutorial Directory configuration and finish.

      6. Locate the SSISTutorial.dtsConfig file in the project directory. Open this XML file in Notepad, find the ConfiguredValue element under the Configuration node, and set it to the new folder path. Compared with the lesson2 package, importing text files from a different folder now only requires editing the configuration file instead of changing a path hard-coded in the package.

      7. Run the lesson3.dtsx package. The text files in the configured path are imported into the destination database in a loop, as before.

  4. Add Logging
    • Microsoft Integration Services includes logging features that let us troubleshoot and monitor packages by tracking task and container events. Logging is flexible: it can be enabled at the package level or on individual tasks and containers within the package, you can choose which events to record, and you can create multiple logs for a single package.

    • Logging is implemented by log providers. Each log provider writes the logged information in a different format and to a different destination type. Integration Services provides the following log providers (a query sketch for the SQL Server provider follows the list):

      • Text Files

      • SQL Server Profiler

      • Windows Event Log

      • SQL Server

      • XML file
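    • This tutorial uses the text file provider below, but note that the SQL Server log provider writes events to a system table that can be queried directly. The following is only a sketch: it assumes the SQL Server 2005-era log table dbo.sysdtslog90 (later versions use dbo.sysssislog), created in the database the provider's connection points to.

      SELECT event, source, starttime, endtime, message
      FROM dbo.sysdtslog90
      ORDER BY starttime DESC;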

    • In this section, building on the previous lesson, we will add and configure logging to monitor specific events during package execution.

      1. Copy and paste the previous package and rename the copy lesson4.dtsx. This section is developed in the lesson4.dtsx package.

      2. On the SSIS menu, select Logging.... In the configuration window that appears, select SSIS log provider for Text files as the provider type and click Add. (Integration Services adds a new text file log provider to the package with the default name SSIS log provider for Text files; you can now configure the new provider.) Enter the name Lesson 4 Log File. In the Configuration column, select New, set Usage type to Create file, click Browse next to File, open the project directory, and enter TutorialLog.log as the name of the log file.

      3. In the Containers pane, expand all nodes in the package container hierarchy and clear all check boxes, including the one for Extract Sample Currency Data. Then select only the Extract Sample Currency Data check box so that only events for this node are logged. On the Details tab, in the Events column, select the PipelineExecutionPlan and PipelineExecutionTrees events.

      4. On the Providers and Logs tab, in the Name column, select Lesson 4 Log File. Once a log provider has been created for the package, you can temporarily turn logging off by clearing its check box instead of deleting and re-creating the provider.

      5. Run the lesson4.dtsx package, then go to the project directory to view the log file.

  5. Add Error Flow Redirection
    • To handle errors that may occur during transformation, Microsoft Integration Services lets us decide, per component and per column, how to handle data that cannot be transformed. We can choose to ignore failures in certain columns, redirect the entire failed row, or simply let the component fail. By default, all components in Integration Services are configured to fail when an error occurs; when a component fails, the package fails and all subsequent processing stops.

    • If you do not want a failure to stop the package, a good approach is to configure the transformations so that likely errors can be handled. Although you could simply ignore the failures so that the package completes successfully, it is usually better to redirect the failed rows to another processing path, where the data and the errors can be persisted, examined, and reprocessed later.

    • In this section we will create a corrupted version of the sample data file; running the package against the corrupted file forces error handling to occur. To process the error data, we will add and configure a flat file destination that writes out every row for which the Lookup Currency Key transformation finds no match.

    • Before the error data is written to the file, the error flow passes through a Script component that uses script code to obtain the error description. The Lookup Currency Key transformation is then reconfigured to redirect any rows it cannot process to that Script transformation.

    1. Copy and paste the previous package and rename the copy lesson5.dtsx. This section is developed in the lesson5.dtsx package.

    2. Create a corrupted text file. The downloaded files for this project include a Sample Data2 folder. Open the Currency_VEB.txt file and use the text editor's find-and-replace function to replace VEB with BAD. Save the modified file as Currency_BAD.txt in the Sample Data2 folder.

    3. Delete the log file in the project directory and run lesson5.dtsx to see the error message: Error: 0xC020901E at Extract Sample Currency Data, Lookup Currency Key [26]: Row yielded no match during lookup. (The reason for the failure is sketched below.)
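    The lookup fails because the replaced currency code does not exist in the dimension table, which a quick query shows (a sketch against the AdventureWorksDW tables used above):

      SELECT CurrencyKey
      FROM [dbo].[DimCurrency]
      WHERE CurrencyAlternateKey = 'BAD';
      -- Returns no rows, so Lookup Currency Key cannot match these rows and raises the error above.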

    4. On the data flow surface, drag a Script Component from Data Flow Transformations in the Toolbox onto the panel, below and to the right of Lookup Currency Key; in the dialog box that appears, select Transformation as the component type. Drag the red arrow (the error output) of Lookup Currency Key onto the new component, and in Configure Error Output set the error handling for the error column to Redirect row.

    5. Rename the component Get Error Description. Double-click Get Error Description and, in the Script Transformation Editor, select the Input Columns page and choose the ErrorCode column as an input column.

    6. On the Inputs and Outputs page, expand Output 0, expand Output Columns, click Add Column, name it ErrorDescription, and set its data type to Unicode string [DT_WSTR].

    7. Select the Script page and click Edit Script... to open the script editor, override the Input0_ProcessInputRow method as shown below, and then confirm.

      public override void Input0_ProcessInputRow(Input0Buffer Row)
      {
          // Translate the numeric ErrorCode on the redirected row into a readable description
          Row.ErrorDescription = this.ComponentMetaData.GetErrorDescription(Row.ErrorCode);
      }
    8. Return to the data flow design surface and add a flat file destination for Get Error Description to record the error output. Drag a Flat File Destination from the Toolbox below Get Error Description, connect it, and name it Failed Rows. Double-click it to open the Flat File Destination Editor and click New... to create a flat file connection manager; in the dialog box that appears, select Delimited. Name the connection manager Error Data, click Browse, and in the project directory enter the file name ErrorOutput.txt. Return to the Flat File Destination Editor and clear the Overwrite data in the file check box. Click Mappings to confirm that all columns are correctly mapped.

    9. Run the lesson5 package; it executes successfully, and the rows that failed the lookup are recorded in the ErrorOutput.txt file. Open the file to check the results.

     

    I have finally finished writing this; it ended up a bit long.

    I hope this article will help you.

    Download the complete project file: ssistutorial.zip
