SSIS component conversion-fuzzy search Conversion

Source: Internet
Author: User

The difference between fuzzy search conversion and search conversion is that fuzzy match is used. Query and convert matching records in the reference table using the same connection. It returns the exact match in the referenced table. Or no content is returned. In comparison, fuzzy search conversion uses fuzzy match to return one or more matching items in the referenced table.

The referenced data source for fuzzy search conversion must be a database of SQL2000 or higher. In fuzzy match, only dt_wstr and dt_str data types can be used as input columns. Full match applies to all data types except dt_text, dt_ntext, and dt_image. The edit box for fuzzy search conversion has three tabs. Let's introduce them one by one.

1. Reference Table Tab

Attribute

Description

Oledb Connection Manager

Use the list box to select an existing ole db Connection Manager or use the "new" button to create a new connection.

Generate new index

Specify the new index to be created during conversion for search

Name of the referenced table

Select the table to be referenced in the database

Store New Indexes

Select this option if you want to save the new search index.

New Index name

If you have selected the new storage Index option, you can set the name for this item.

Maintain storage Indexes

If you have selected the new storage Index option, you can select this option.

Use existing Indexes

Use existing indexes for search during conversion

Existing index name

Select the table to be referenced from the database

 

2. Column Tab

Attribute

Description

Available input Columns

Drag the input column to connect it to the available search column. These columns must have supported data types that match each other. Select a ing row and right-click it to edit the ing in the create link dialog box.

Name

View the names of available input Columns

Transfer

Specifies whether to include an input column in the converted output.

Available lookup Columns

Use these check boxes to select the columns for which you want to perform fuzzy search.

Search Columns

Select a search column from the list of available columns in the referenced table. Select the check box in the "available query columns" table to select a query column. When you select a column in the "available search columns" table, an output column is created, which contains the reference column values returned for each matching row.

Output alias

Enter an alias for the output of each query column. The default value is the name of the query column and a numeric index value is appended. However, you can select a unique descriptive name.

 

 

 

3. Advanced card selection

Attribute

Description

Maximum number of matching outputs for each query

Specify the maximum number of matching conversions returned for each input line. The default value is1.If the value is greater than 1, the output of the conversion may include multiple rows for each query, and some rows may be repeated.

Similarity threshold

Use the slider to set the similarity threshold at the component level. The closer this value is to 1, the closer the similarity between the search value and the source value must be to be considered a match. Because fewer candidate records need to be considered, increasing the threshold can increase the matching speed.

Delimiter

Specifies the delimiter used by conversion to split the column values into words.

The following example describes how to use fuzzy search conversion. We use the TXT text provided in the previous section as the data source.

1. Create a table using the following T-SQL statement in the adventureworks database 2. Create a package named "fuzzylookupdemo" and drag and drop a Data Flow task into the control flow. Create a flat file connector named "employdata" on the Connection Manager interface to connect "E: \ Bi materials \ integrationservices \ empolydata.txt"

3. On the regular tab, set the format attribute to the un-aligned (ragged right) on the Right and the code page attribute to 1252 (ANSI-Latin I ). For other settings, see:

4. On the column tab, click the column boundary to place two split lines at the 5th characters and 28th characters respectively.

 

 

5. Click the Advanced tab to set the title's outputcolumnwidth to 50 and save the connection.

 

6. Drag and Drop a flat file data source on the data flow tab. The connector attribute of the flat file is set to employdata.

7. Drag and Drop a fuzzy search conversion task. Link to the flat file source, and the arrow points to the fuzzy conversion task.

8. Click the new button in Connection Manager properties on the reference tab of the fuzzy search conversion task. Create an ole db Connection Manager. Used to connect to the adventureworks database. For other settings, see

9. In advanced options, the similarity threshold is set to 0.5, and others are default values. For example, Set

10. For the attribute settings on the column tab and the Advanced tab, see the following two diagrams:

11. Drag and Drop an ole db target source and set the ole db Connection Manager to the Database Connector you just created. Set the data access mode to a table or try-fast loading and click New. the dialog box is displayed.

Make the following settings for the SQL script in the box:

 

Click OK.

12. Click the ing tab to make the following settings.

 

 

13. Run the package. For example:

 

14. You can view the results in the adventureworks database:

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.