Containers and Data Flow in SSIS: Data Transformations (Continued)

Source: Internet
Author: User
Tags ssis

The previous article covered some of the data transformations available in the Data Flow. This article continues with the remaining transformations.

Data Mining Query

Data mining is a very important capability in SSIS, and it is built on the data mining algorithms of Analysis Services. The Data Mining Query transformation runs a prediction query against a data mining model and sends the results into the data flow. It can also add new prediction columns. Some application scenarios are listed as follows:

The algorithms involved include:

Fuzzy Grouping and Fuzzy Lookup

The Fuzzy Grouping transformation finds rows that are probably duplicates. For example, it can detect that two rows containing "Main St." and "Main Street" refer to the same thing and merge them into one row. The Fuzzy Lookup transformation can validate incoming data and clean up dirty data. A Fuzzy Lookup is usually placed after a Lookup transformation: the Lookup handles the exact matches, and the Fuzzy Lookup then tries to match the rows that the Lookup could not.
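To make the idea concrete, here is a minimal Python sketch of both behaviors. It is not how SSIS computes similarity: SSIS uses its own token-based matching, while this sketch swaps in `difflib.SequenceMatcher.ratio()` from the Python standard library. The function names and the 0.7 and 0.6 thresholds are hypothetical choices for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] (difflib ratio, not the SSIS algorithm)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_group(values, threshold=0.7):
    """Map each value to the first earlier value it resembles (its group's canonical form)."""
    canonical = []
    mapping = {}
    for v in values:
        for c in canonical:
            if similarity(v, c) >= threshold:
                mapping[v] = c
                break
        else:
            # No close match yet: this value starts a new group.
            canonical.append(v)
            mapping[v] = v
    return mapping


def lookup_then_fuzzy(values, reference, threshold=0.6):
    """Exact match first (the Lookup step); only unmatched values fall through to fuzzy matching."""
    exact = {r.lower(): r for r in reference}
    results = []
    for v in values:
        hit = exact.get(v.lower())
        if hit is not None:
            results.append((v, hit, 1.0))
            continue
        best = max(reference, key=lambda r: similarity(v, r))
        score = similarity(v, best)
        # Below the threshold the row is reported as unmatched (None).
        results.append((v, best if score >= threshold else None, round(score, 2)))
    return results
```

For example, `fuzzy_group(["Main Street", "Main St.", "Oak Avenue"])` maps "Main St." to "Main Street" because their ratio is about 0.74, while "Oak Avenue" stays in its own group. Running the exact lookup first mirrors the Lookup-then-Fuzzy-Lookup arrangement described above: the cheap exact match handles most rows and the expensive similarity comparison only runs on the leftovers.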

Lookup

The Lookup transformation does the job that lookups did in the Data Pump task in SQL Server 2000. For example, if the ZipCode column has to be derived from the State and City columns of the imported data, you can use a Lookup transformation against a mapping table. In SQL Server 2000 this was clumsy: you had to perform the lookup with a join, which slowed processing down. Figure 4-25 shows the editor for the Lookup transformation.

 

Figure 4-25: Lookup transformation editor
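The ZipCode example can be sketched in Python as an in-memory dictionary lookup, roughly what the Lookup transformation does in its full-cache mode. This is an illustrative sketch only; the function name and the sample reference row are hypothetical, and rows that find no match are simply collected in a second list, standing in for the SSIS choice to fail, ignore, or redirect such rows.

```python
def lookup_zip(rows, reference):
    """Add ZipCode by exact match on (State, City); split matched and unmatched rows."""
    # Build the reference index once, roughly like Lookup's full-cache mode.
    index = {(r["State"], r["City"]): r["ZipCode"] for r in reference}
    matched, unmatched = [], []
    for row in rows:
        zip_code = index.get((row["State"], row["City"]))
        if zip_code is None:
            # No match: SSIS can fail, ignore, or redirect such rows; here we collect them.
            unmatched.append(row)
        else:
            matched.append({**row, "ZipCode": zip_code})
    return matched, unmatched


# Hypothetical sample data for illustration.
reference = [{"State": "WA", "City": "Redmond", "ZipCode": "98052"}]
rows = [
    {"State": "WA", "City": "Redmond", "Name": "A"},
    {"State": "OR", "City": "Salem", "Name": "B"},
]
matched, unmatched = lookup_zip(rows, reference)
```

Because the whole reference table is read once and held in memory, each incoming row costs a single hash lookup instead of a join against the database, which is the main advantage over the SQL Server 2000 approach described above.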

Merge

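The Merge transformation combines the rows from two inputs into one output. It is similar to the Union All transformation but has some restrictions: both inputs must be sorted on the same columns, the columns being combined must have compatible data types, and Merge accepts only two inputs.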

Open the editor to make sure the columns of the two inputs line up. When you connect a path to the transformation, a dialog box asks whether it should be Merge Input 1 or Merge Input 2; if you assign the first path to Input 1, the second path is connected as Input 2. In the editor, shown in Figure 4-26, you map each column of one input to a column of the other, and you can ignore columns you do not need.

 

Figure 4-26: Merge transformation editor
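What Merge does with two sorted inputs can be illustrated with Python's standard `heapq.merge`, which interleaves already-sorted sequences so that the combined output stays sorted. This is a conceptual sketch, not SSIS code; the function names are hypothetical, and the explicit sort check stands in for the requirement that both Merge inputs be sorted.

```python
import heapq
from operator import itemgetter


def check_sorted(rows, key, name):
    """Raise if rows are not sorted on key (Merge's precondition)."""
    keys = [r[key] for r in rows]
    if keys != sorted(keys):
        raise ValueError(f"{name} is not sorted on {key!r}")


def merge_sorted(input1, input2, key):
    """Combine two inputs sorted on `key` into one output that is still sorted."""
    check_sorted(input1, key, "input1")
    check_sorted(input2, key, "input2")
    return list(heapq.merge(input1, input2, key=itemgetter(key)))


a = [{"id": 1}, {"id": 4}]
b = [{"id": 2}, {"id": 3}]
merged = merge_sorted(a, b, "id")
```

A plain concatenation (`a + b`, which is closer to Union All) would give ids 1, 4, 2, 3; the merge keeps the sort order, which is why downstream components that need sorted data can follow a Merge.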

Merge Join

One goal of SSIS is to let you get work done with transformations and avoid writing code wherever possible. The Merge Join transformation is a typical example. Its two inputs, which must be sorted on the join key, can be combined with an inner join or an outer join (left or full), and you choose which columns to output. For example, one data flow carries HR information keyed by employee ID, and another carries payroll information. You can join the two paths, take each employee's name from the HR data and salary from the payroll data, and send the result down a single path. In Figure 4-27, the Merge Join is used to find the names and dates of the employees who are missing.

 

Figure 4-27: Merge Join transformation

Note: If both inputs come from the same database, performing the join in the query of the OLE DB source is likely to be more efficient. If the inputs are in different databases, performance may be affected. The Merge Join transformation is useful when the two data sources are not in the same database or when you do not want to write code.
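The HR and payroll example can be sketched as a left outer sort-merge join in Python, the classic algorithm for joining two inputs that are already sorted on the join key (which is why Merge Join requires sorted inputs). The column names and sample rows are hypothetical, and the sketch is an illustration of the technique rather than the SSIS implementation.

```python
def merge_join_left(left, right, key):
    """Left outer sort-merge join; both inputs must already be sorted on `key`."""
    # Columns to fill with None when a left row has no partner on the right.
    empty = {c: None for c in right[0] if c != key} if right else {}
    out = []
    j = 0
    for lrow in left:
        k = lrow[key]
        # Skip right rows whose key is smaller; left is sorted, so they can never match later.
        while j < len(right) and right[j][key] < k:
            j += 1
        m = j
        matched = False
        # Emit one output row per right row with an equal key (handles duplicates).
        while m < len(right) and right[m][key] == k:
            out.append({**lrow, **right[m]})
            matched = True
            m += 1
        if not matched:
            out.append({**lrow, **empty})
    return out


hr = [
    {"EmployeeID": 1, "Name": "Ann"},
    {"EmployeeID": 2, "Name": "Bob"},
    {"EmployeeID": 3, "Name": "Cy"},
]
payroll = [{"EmployeeID": 1, "Salary": 5000}, {"EmployeeID": 3, "Salary": 6000}]
joined = merge_join_left(hr, payroll, "EmployeeID")
missing_from_payroll = [r["Name"] for r in joined if r["Salary"] is None]
```

Each input is read once, front to back, so the join needs no index or hash table; the price is that both inputs must be sorted first.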

Multicast

As its name suggests, the Multicast transformation sends the data from one input to multiple outputs. To use it, connect it to an input source and then connect it to multiple destinations. Apart from the task name, the editor has no special options.

 

Figure 4-28: Multicast transformation

Note: Multicast is similar to the Conditional Split transformation. The difference is that Multicast sends every row to every output, whereas Conditional Split routes each row to an output based on conditions.
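The difference between the two transformations can be sketched in a few lines of Python. This is an illustration under assumptions, not SSIS code: `multicast` and `conditional_split` are hypothetical names, conditions are evaluated in order with the first true condition winning, and rows matching no condition go to a default output.

```python
import copy


def multicast(rows, n_outputs):
    """Every output receives every row (copied so branches can modify rows independently)."""
    return [[copy.deepcopy(r) for r in rows] for _ in range(n_outputs)]


def conditional_split(rows, conditions, default="Default"):
    """conditions: ordered list of (output_name, predicate); each row goes to one output."""
    outputs = {name: [] for name, _ in conditions}
    outputs[default] = []
    for r in rows:
        for name, predicate in conditions:
            if predicate(r):
                outputs[name].append(r)
                break
        else:
            # No condition was true: send the row to the default output.
            outputs[default].append(r)
    return outputs


rows = [{"Amount": 50}, {"Amount": 500}, {"Amount": 5000}]
copies = multicast(rows, 2)
split = conditional_split(
    rows,
    [("Large", lambda r: r["Amount"] >= 1000), ("Medium", lambda r: r["Amount"] >= 100)],
)
```

With three input rows, Multicast produces two outputs of three rows each, while Conditional Split produces three outputs that together still hold only three rows.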

OLE DB Command

The OLE DB Command transformation runs an SQL command for each row in the data flow. For example, it can update a table row by row, taking the values to update from the data flow. For a stored procedure with input parameters, you can map the parameters to columns in the data flow instead of entering the parameter values every time.
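The row-by-row behavior can be sketched with Python's built-in `sqlite3` module, which, like the OLE DB Command, uses `?` markers for parameters. SQLite here is only a stand-in for an OLE DB connection, and the `Product` table, column names, and helper function are hypothetical.

```python
import sqlite3


def run_command_per_row(conn, sql, rows, param_columns):
    """Execute one parameterized statement per row, binding columns to ? markers in order."""
    cur = conn.cursor()
    for row in rows:
        cur.execute(sql, tuple(row[c] for c in param_columns))
    conn.commit()
    return len(rows)


# Hypothetical target table and incoming data-flow rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Price REAL)")
conn.executemany("INSERT INTO Product VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
updates = [{"ProductID": 1, "NewPrice": 12.5}, {"ProductID": 2, "NewPrice": 22.0}]
run_command_per_row(
    conn, "UPDATE Product SET Price = ? WHERE ProductID = ?", updates, ["NewPrice", "ProductID"]
)
```

Because one statement is issued per row, this pattern is convenient but usually slow for large volumes; loading the rows into a staging table and running a single set-based UPDATE is a common alternative.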

Percentage Sampling and Row Sampling

The Percentage Sampling and Row Sampling transformations randomly select a subset of rows from the data source. Both produce two outputs: the rows that were selected and the rows that were not. You can send the selected data to a development or test server. These transformations are best suited to building a data mining model and then using the sample data to verify the model.

In the editor, specify the percentage or number of rows to extract, as shown in Figure 4-29. Percentage Sampling selects a percentage of the rows at random, and Row Sampling selects a fixed number of rows at random. You can name the selected and unselected outputs. The last option is the random seed. If you specify a seed, the output is the same every time. If you keep the default setting and do not specify a seed, different rows are output each time.

 

Figure 4-29: Sampling transformation editor
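The role of the seed can be shown with a short Python sketch built on `random.Random`. This is an assumed model of the behavior, not the SSIS algorithm: percentage sampling here keeps each row independently with probability `percent / 100` (so the count is approximate), row sampling picks exactly `n` rows, and both return the selected and unselected outputs.

```python
import random


def percentage_sample(rows, percent, seed=None):
    """Keep each row with probability percent/100; seed=None gives different output each run."""
    rng = random.Random(seed)
    selected, unselected = [], []
    for row in rows:
        (selected if rng.random() < percent / 100 else unselected).append(row)
    return selected, unselected


def row_sample(rows, n, seed=None):
    """Select exactly n rows at random (or all rows if there are fewer than n)."""
    rng = random.Random(seed)
    picked = set(rng.sample(range(len(rows)), min(n, len(rows))))
    selected = [r for i, r in enumerate(rows) if i in picked]
    unselected = [r for i, r in enumerate(rows) if i not in picked]
    return selected, unselected


first = row_sample(list(range(100)), 10, seed=42)
second = row_sample(list(range(100)), 10, seed=42)
```

Calling either function twice with the same seed returns the same split, which is what makes a fixed seed useful for repeatable tests; with `seed=None`, `random.Random` seeds itself from the system and each run differs.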

Pivot and Unpivot

These transformations work like the PIVOT and UNPIVOT operators in T-SQL. The Pivot transformation turns normalized data into a more compact form that is easier to read in reports, similar to the output of OLAP and Reporting Services. The following table shows each salesperson's sales for each day.

 

Pivoted data

 

The Unpivot transformation does the opposite.
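Using the salesperson-by-day example, a minimal Python sketch of both directions might look like this. The function names and sample rows are hypothetical; if the same salesperson and day appear twice, this sketch simply keeps the last value, whereas a real report might sum them instead.

```python
def pivot(rows, key, pivot_col, value_col):
    """Turn (key, pivot value, amount) rows into one row per key with a column per pivot value."""
    out = {}
    for r in rows:
        out.setdefault(r[key], {key: r[key]})[r[pivot_col]] = r[value_col]
    return list(out.values())


def unpivot(rows, key, pivot_col, value_col):
    """Inverse of pivot: every non-key column becomes its own (key, column, value) row."""
    result = []
    for r in rows:
        for col, val in r.items():
            if col != key:
                result.append({key: r[key], pivot_col: col, value_col: val})
    return result


sales = [
    {"Salesperson": "Ann", "Day": "Mon", "Amount": 100},
    {"Salesperson": "Ann", "Day": "Tue", "Amount": 150},
    {"Salesperson": "Bob", "Day": "Mon", "Amount": 80},
]
wide = pivot(sales, "Salesperson", "Day", "Amount")
```

Here `wide` holds one row per salesperson (`{"Salesperson": "Ann", "Mon": 100, "Tue": 150}` and `{"Salesperson": "Bob", "Mon": 80}`), and unpivoting it returns the original three rows.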

Row Count

The Row Count transformation simply counts the rows passing through the data flow and stores the count in a variable. It is often used to put the number of rows into an email that tells users how many rows were processed. You can also use the row count to decide which operations to perform next.
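A Python generator gives a compact sketch of the idea: rows pass through unchanged while a counter is incremented. The `Variable` class is a hypothetical stand-in for an SSIS package variable such as `User::RowCount`, and the message text is only an example of what might go into a notification email.

```python
from typing import Iterable, Iterator


class Variable:
    """Hypothetical stand-in for an SSIS package variable such as User::RowCount."""

    def __init__(self, value: int = 0):
        self.value = value


def row_count(rows: Iterable[dict], variable: Variable) -> Iterator[dict]:
    """Pass rows through unchanged while counting them into `variable`."""
    for row in rows:
        variable.value += 1
        yield row


count = Variable()
loaded = list(row_count([{"id": 1}, {"id": 2}, {"id": 3}], count))
message = f"{count.value} rows were transferred."
```

In this sketch the count is complete only after every row has passed through, which is why the message is built after `list()` has consumed the generator; the same reasoning applies to reading the variable only after the data flow has finished.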

Script Component

The Script Component lets you write code that acts as a transformation, a source, or a destination. You can use the Script Component to complete the following tasks:

A Script Component can act as a data source with multiple outputs, and its code is compiled, so it runs efficiently at run time.
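In SSIS the script itself is written in a .NET language, and a Script Component configured as a source fills its outputs from the `CreateNewOutputRows` method. The Python class below only mimics that shape to illustrate a source with multiple outputs; the class name, the output names "Valid" and "Rejected", and the comma-separated input format are all hypothetical.

```python
class ScriptSource:
    """Hypothetical stand-in for a Script Component source with two outputs."""

    def __init__(self):
        self.outputs = {"Valid": [], "Rejected": []}

    def create_new_output_rows(self, lines):
        """Parse 'Name,Qty' lines; good rows go to Valid, bad lines to Rejected."""
        for line in lines:
            parts = line.split(",")
            if len(parts) == 2 and parts[1].strip().isdigit():
                self.outputs["Valid"].append({"Name": parts[0].strip(), "Qty": int(parts[1])})
            else:
                # Keep the raw text so the bad line can be inspected downstream.
                self.outputs["Rejected"].append({"RawLine": line})
        return self.outputs


outputs = ScriptSource().create_new_output_rows(["Bolt, 5", "Nut,x", "Washer,12"])
```

Routing unparseable input to its own output, instead of failing the whole load, is one common reason to give a script source more than one output.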

Slowly Changing Dimension

The Slowly Changing Dimension transformation updates or inserts records in a dimension of a data warehouse. The Slowly Changing Dimension Wizard generates all the updates and builds the tasks that maintain the dimension. This used to be tedious work for DTS developers; now it takes only a few minutes.
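The core decision the wizard encodes can be sketched in Python. The wizard distinguishes changing attributes (overwritten in place), historical attributes (a change expires the current row and inserts a new one), and fixed attributes (not modeled here). This sketch assumes a hypothetical `IsCurrent` flag column marks the current version of each member; the function and column names are illustrative.

```python
def apply_scd(dimension, incoming, business_key, changing, historical):
    """Apply incoming rows to a dimension list using an IsCurrent flag (hypothetical schema)."""
    current = {r[business_key]: r for r in dimension if r["IsCurrent"]}
    for src in incoming:
        cur = current.get(src[business_key])
        if cur is None:
            # New member: insert it as the current version.
            new = {**src, "IsCurrent": True}
            dimension.append(new)
            current[src[business_key]] = new
        elif any(cur[c] != src[c] for c in historical):
            # Historical attribute changed: expire the old row and insert a new version.
            cur["IsCurrent"] = False
            new = {**src, "IsCurrent": True}
            dimension.append(new)
            current[src[business_key]] = new
        else:
            # Only changing attributes may differ: overwrite them in place.
            for c in changing:
                cur[c] = src[c]
    return dimension


dim = [{"CustomerID": 1, "Name": "Ann", "City": "Oslo", "IsCurrent": True}]
apply_scd(
    dim,
    [{"CustomerID": 1, "Name": "Ann B.", "City": "Oslo"}, {"CustomerID": 2, "Name": "Bob", "City": "Rome"}],
    "CustomerID", changing=["Name"], historical=["City"],
)
apply_scd(dim, [{"CustomerID": 1, "Name": "Ann B.", "City": "Bergen"}],
          "CustomerID", changing=["Name"], historical=["City"])
```

After the first call, Ann's name is overwritten in place and Bob is added; after the second, Ann's move to Bergen keeps the Oslo row as history and adds a new current row, leaving three rows in the dimension.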

Sort

The Sort transformation sorts the rows in the data flow by one or more columns. It is one of the five most commonly used transformations. Connect it to the data source and open the editor. Select only the columns you want to sort by; by default, all columns are passed through to the output. In Figure 4-30, the data is sorted by ProductID and all columns are output.

 

Figure 4-30: Sort transformation editor

In the grid at the bottom, you can set the output alias of each column and the sort type of each sort column. The Sort Order column shows whether a column is sorted first, second, third, and so on. You can also choose to remove rows that have duplicate sort values.
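A multi-column sort with a sort order, an ascending or descending direction per column, and optional removal of duplicate sort values can be sketched in Python. The technique relies on the stability of `list.sort`: sorting by the last key first and the first key last yields the combined order. The function name and sample rows are hypothetical.

```python
def multi_key_sort(rows, sort_keys, remove_duplicates=False):
    """sort_keys: list of (column, descending) in Sort Order, with sort order 1 first."""
    result = list(rows)
    # Stable sorts applied from the lowest-priority key to the highest give the combined order.
    for col, desc in reversed(sort_keys):
        result.sort(key=lambda r: r[col], reverse=desc)
    if remove_duplicates:
        seen = set()
        unique = []
        for r in result:
            k = tuple(r[c] for c, _ in sort_keys)
            if k not in seen:
                seen.add(k)
                unique.append(r)
        result = unique
    return result


rows = [
    {"ProductID": 2, "Qty": 5},
    {"ProductID": 1, "Qty": 3},
    {"ProductID": 2, "Qty": 7},
    {"ProductID": 1, "Qty": 3},
]
sorted_rows = multi_key_sort(rows, [("ProductID", False), ("Qty", True)])
deduped = multi_key_sort(rows, [("ProductID", False), ("Qty", True)], remove_duplicates=True)
```

Sorting by ProductID ascending and then Qty descending gives (1, 3), (1, 3), (2, 7), (2, 5); with duplicate removal the repeated (1, 3) row appears only once.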

Term Extraction and Term Lookup

The Term Extraction transformation extracts terms from text in the data flow. For example, you can use it to extract key terms from a series of articles, or to analyze the content of a company's internal email. Currently this transformation supports only English text.

In Term Extraction, you can specify whether to extract nouns or noun phrases. For example, "bicycle" is extracted, but "the bicycle" is not. The output has two columns: the term and a value giving the number of times the term was matched.

Term Lookup matches the text against a list of terms defined in advance and outputs the matching rows. For example, you could load messages from an email system into a database and look them up against a list of product terms to record reports of defective products automatically. Use a connection manager to point the transformation at the table that holds the reference terms.
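The Term Lookup idea can be sketched in Python with the standard `re` module: for each row, count whole-word, case-insensitive occurrences of every reference term and emit one output row per term found. This ignores the linguistic processing SSIS performs (such as recognizing nouns and noun phrases); the function name, column names, and sample terms are hypothetical.

```python
import re


def term_lookup(rows, text_col, reference_terms):
    """Emit one row per (input row, reference term) pair with the term's frequency."""
    out = []
    for row in rows:
        for term in reference_terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            freq = len(re.findall(pattern, row[text_col], flags=re.IGNORECASE))
            if freq:
                out.append({**row, "Term": term, "Frequency": freq})
    return out


messages = [
    {"ID": 1, "Body": "The bicycle brake is broken. Bicycle returned."},
    {"ID": 2, "Body": "Thanks for the fast delivery"},
]
hits = term_lookup(messages, "Body", ["bicycle", "brake"])
```

Message 1 yields two rows ("bicycle" twice, "brake" once), and message 2 yields none, so only the messages that mention a product term flow on to the table that records them.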

Union All

Union All works much like Merge: it combines multiple inputs into one result set. For example, in Figure 4-31 the data from two XML sources is combined into one output and then sent to a Term Extraction transformation.

 

Figure 4-31: Union All combining two XML sources

To configure the transformation, connect the first data source to it and then connect the other data sources. Open the editor to make sure the columns are mapped correctly. SSIS adjusts the mapping automatically. For example, if a column is 20 characters long in one input and 50 characters in the other, the output column will be at least 50 characters long.
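A minimal Python sketch of the two behaviors described above: rows from every input are appended into one result set, with columns matched by name, and each output column's width is taken as the largest width among the inputs, following the 20 versus 50 character example. The function names are hypothetical, and unlike Merge no sorting is required.

```python
from itertools import chain


def union_all(*inputs, columns):
    """Append rows from all inputs; columns missing from an input become None."""
    return [{c: row.get(c) for c in columns} for row in chain.from_iterable(inputs)]


def output_widths(*input_widths):
    """Each output column is as wide as the widest matching input column."""
    out = {}
    for widths in input_widths:
        for col, w in widths.items():
            out[col] = max(out.get(col, 0), w)
    return out


feed_a = [{"Title": "A", "Url": "x"}]
feed_b = [{"Title": "B"}]
combined = union_all(feed_a, feed_b, columns=["Title", "Url"])
widths = output_widths({"Title": 20}, {"Title": 50})
```

Making the output at least as wide as the widest input is what prevents values from the 50-character input being truncated when they flow through the combined column.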

In the next article, we will use an example to show how to use these transformations.
