Step by Step Learning BI (1): Understanding Integration Services

Document directory
  • Source
  • Transformation
  • Destination

Integration Services is a platform for building high-performance data integration and workflow solutions, including extract, transform, and load (ETL) operations for data warehousing.

Integration Services includes graphical tools and wizards for building and debugging packages; tasks for performing workflow functions such as FTP operations, executing SQL statements, and sending e-mail messages; data sources and destinations for extracting and loading data; transformations for cleaning, aggregating, merging, and copying data; a management service, the Integration Services service, for administering Integration Services packages; and application programming interfaces (APIs) for programming the Integration Services object model.
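Before opening the designer, it helps to see the ETL pattern that Integration Services implements in miniature. The following Python sketch is a conceptual illustration only; the data and function names are hypothetical and have nothing to do with the SSIS API:

```python
# Minimal extract-transform-load sketch (hypothetical data, not an SSIS API).

def extract():
    # Source: pull raw rows from an external store (a literal list stands in).
    return [{"name": " Alice ", "amount": "10"},
            {"name": "Bob",    "amount": "32"}]

def transform(rows):
    # Transformation: clean the name and convert the amount to an integer.
    return [{"name": r["name"].strip(), "amount": int(r["amount"])}
            for r in rows]

def load(rows, store):
    # Destination: write the cleaned rows to a target store.
    store.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse)
```

The three stages map directly onto the three kinds of data flow components introduced later in this article.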

1. After installing SQL Server 2008, start SQL Server Business Intelligence Development Studio from the "Microsoft SQL Server 2008" program group.

2. After SQL Server Business Intelligence Development Studio starts, open the "New Project" dialog, select "Business Intelligence Projects" under Project types on the left, and select "Integration Services Project" under Templates on the right.

3. In the newly created Integration Services project, the directory structure in Solution Explorer is as follows:

Data Sources: contains project-level data sources that can be referenced by multiple packages.

Data Source Views: views built on top of a data source that can be referenced by sources, transformations, and destinations.

SSIS Packages: a package is an organized collection of connections, control flow elements, data flow elements, event handlers, variables, and configurations. You can assemble these objects into a package either with the graphical design tools that SQL Server Integration Services provides or programmatically, and then save the package to SQL Server, the SSIS package store, or the file system. The package is the unit of work that is retrieved, executed, and saved. When first created, a package is an empty object that does nothing; to add functionality, you add a control flow to it and, optionally, one or more data flows. For example, a simple package contains a control flow with a single Data Flow task, and that Data Flow task contains a data flow.

Miscellaneous: files other than source files, data source view files, or package files.

4. From the description of SSIS packages in step 3, we can see that the core of a package is its control flow, which contains one or more tasks and containers that execute when the package runs. To control the order in which tasks and containers run, or to define the conditions under which they run, you connect them with precedence constraints. You can also group tasks and containers into subsets and run them repeatedly as a unit within the package control flow. The Toolbox on the left shows "Control Flow Items" and "Maintenance Plan Tasks". An ETL process in Integration Services starts from a control flow task and performs its data flow operations within that task.

Control Flow Items: a task is a control flow element that defines a unit of work performed in the package control flow. A SQL Server Integration Services package consists of one or more tasks. If the package contains more than one task, the tasks are connected and sequenced in the control flow by precedence constraints.

Maintenance Plan Tasks: SQL Server Integration Services includes a set of tasks that perform database maintenance functions. These tasks are typically used in database maintenance plans, but they can also be included in SSIS packages.
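The ordering behavior of precedence constraints can be sketched as a small scheduler: each task runs only after every task constrained to precede it has completed. This is a simplified model, not the SSIS runtime, and the task names are made up:

```python
# Sketch of control-flow ordering via precedence constraints.
# Assumes the constraint graph is acyclic (as a valid control flow must be).

def run_control_flow(tasks, constraints):
    """tasks: {name: callable}; constraints: list of (before, after) pairs.
    Runs each task only once all of its predecessors have finished."""
    done, order = set(), []
    while len(done) < len(tasks):
        for name, action in tasks.items():
            if name in done:
                continue
            preds = [b for (b, a) in constraints if a == name]
            if all(p in done for p in preds):
                action()
                done.add(name)
                order.append(name)
    return order

log = []
tasks = {
    "merge warehouse":  lambda: log.append("merge"),
    "truncate staging": lambda: log.append("truncate"),
    "load staging":     lambda: log.append("load"),
}
constraints = [("truncate staging", "load staging"),
               ("load staging", "merge warehouse")]
order = run_control_flow(tasks, constraints)
print(order)
```

Even though "merge warehouse" is listed first, the constraints force it to run last, which is exactly the role precedence constraints play between tasks on the control flow surface.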

5. Switch from the Control Flow design surface to the Data Flow design surface using the tabs at the top of the main workspace.

SQL Server Integration Services provides three types of data flow components: sources, transformations, and destinations. Sources extract data from data stores such as tables and views in relational databases, files, and Analysis Services databases. Transformations modify, summarize, and clean data. Destinations load data into data stores or create in-memory datasets. Integration Services also provides paths that connect the output of one component to the input of another. A path defines the sequence of components and lets you add annotations to the data flow or view the origin of a column. For example, a simple data flow has one source, one transformation with one input and one output, and one destination; besides input columns, output columns, and external columns, these components expose inputs, outputs, and error outputs.
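The way a path chains the output of one component into the input of the next can be modeled with Python generators. This is a conceptual sketch of a data flow, not the SSIS engine, and the column names are invented:

```python
# Conceptual model of a data flow: each component consumes the rows produced
# by the previous one, and the chaining of calls plays the role of the paths.

def source():
    # Source: emit rows from an external store (literal rows stand in).
    yield {"qty": "3"}
    yield {"qty": "5"}

def convert_qty(rows):
    # Transformation: convert the qty column from string to integer.
    for row in rows:
        yield {"qty": int(row["qty"])}

def destination(rows):
    # Destination: collect the rows into an in-memory dataset.
    return list(rows)

result = destination(convert_qty(source()))
print(result)
```

Reconnecting a path in the designer corresponds here to passing a different generator into the next component.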

Source

In Integration Services, a source is the data flow component that makes data from different external data sources available to the other components in the data flow.

A data flow source usually has one regular output. The regular output contains the output columns that the source adds to the data flow.

The regular output references external columns. An external column is a column in the source. For example, the MakeFlag column of the Product table in the AdventureWorks database is an external column that can be added to the regular output. The metadata of an external column includes information such as the name, data type, and length of the source column.

The error output of a source contains the same columns as the regular output, plus two additional columns that carry information about the error. The Integration Services object model does not restrict the number of regular outputs and error outputs a source can have. Most sources in Integration Services, with the exception of the Script component, have one regular output, and many also have an error output. Custom sources can be coded to provide multiple regular outputs and error outputs.

All output columns are available as input columns to the next data flow component in the data flow.

Transformation

Transformations cover a very wide range of functionality: they can perform tasks such as updating, summarizing, cleaning, merging, and distributing data.

The inputs and outputs of a transformation define the columns of incoming and outgoing data. Depending on the operation performed on the data, some transformations have a single input and multiple outputs, while others have multiple inputs and a single output. Transformations can also include error outputs, which provide information about rows that failed, for example, string data that could not be converted to an integer. The Integration Services object model does not limit the number of inputs, regular outputs, and error outputs a transformation can contain, so you can create custom transformations with any combination of multiple inputs, regular outputs, and error outputs.

The input of a transformation is defined as one or more input columns. Some Integration Services transformations can also reference external columns as input; for example, the input of the OLE DB Command transformation includes external columns. An output column is a column that the transformation adds to the data flow. Both regular outputs and error outputs contain output columns, and these columns in turn act as input columns to the next component in the data flow, whether that is another transformation or a destination.
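The error-output behavior described above, where rows that fail a string-to-integer conversion are redirected rather than failing the whole flow, can be sketched as follows. The row data is hypothetical, and the single "error" field is a simplification of the extra error columns SSIS appends:

```python
# Sketch of a transformation with one regular output and one error output.
# Rows whose "amount" cannot be converted to an integer are redirected.

def convert_amount(rows):
    regular, error = [], []
    for row in rows:
        try:
            regular.append({**row, "amount": int(row["amount"])})
        except ValueError:
            # Failed rows go to the error output, annotated with a reason,
            # analogous to the error information columns in SSIS.
            error.append({**row, "error": "cannot convert to integer"})
    return regular, error

rows = [{"id": 1, "amount": "42"}, {"id": 2, "amount": "n/a"}]
regular, error = convert_amount(rows)
print(regular, error)
```

In the designer, the same split is configured on the component by setting the error disposition of a column to "Redirect row".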

 

Destination

A destination is the data flow component that writes the data in a data flow to a specific data store, or that creates an in-memory dataset.

An Integration Services destination must have at least one input. The input contains input columns that come from another data flow component; the input columns are mapped to columns in the destination.

Many destinations also have one error output. The error output of a destination contains output columns, which typically hold information about errors that occurred while writing data to the destination data store. Errors occur for many reasons; for example, a column may contain a null value while the destination column does not allow nulls.

The Integration Services object model does not limit the number of regular inputs and error outputs a destination can contain, so you can create custom destinations with multiple inputs and multiple error outputs.
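The null-into-a-non-nullable-column failure mentioned above can be sketched in the same style. The table schema and column names here are hypothetical:

```python
# Sketch of a destination that rejects rows violating a NOT NULL constraint,
# routing them to an error output instead of failing the entire load.

def load(rows, table, not_null_columns):
    errors = []
    for row in rows:
        if any(row.get(col) is None for col in not_null_columns):
            # Rejected rows carry an error description, analogous to the
            # error information in a destination's error output.
            errors.append({**row, "error": "null in non-nullable column"})
        else:
            table.append(row)
    return errors

table = []
rows = [{"id": 1, "name": "widget"}, {"id": 2, "name": None}]
errors = load(rows, table, not_null_columns=["name"])
print(table, errors)
```

The good row reaches the table while the offending row lands on the error output, which is exactly the behavior a destination's "Redirect row" error disposition gives you.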

Today we became familiar with the working environment and the related concepts. Next time, we will build a simple ETL project.
