Fulltext Index 5: Fundamental Components

Source: Internet
Author: User
Tags: split words

In SQL Server 2012, you can use full-text search to run fast queries for a single term or a phrase. It relies mainly on the following fundamental components:

1, Service: SQL Full-text Filter Daemon Launcher

This service launches the full-text filter daemon process, which performs document filtering and word breaking for SQL Server full-text search. Disabling this service makes the full-text search features of SQL Server unavailable.
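One way to check whether the full-text components are present is to ask the server itself. The following is a minimal sketch, assuming you have permission to read server-level metadata (VIEW SERVER STATE for the DMV). The LIKE pattern on the service name is an assumption, since the display name includes the instance name.

```sql
-- Returns 1 if the full-text component is installed on this instance, 0 otherwise.
SELECT FULLTEXTSERVICEPROPERTY('IsFullTextInstalled') AS is_fulltext_installed;

-- Lists the services that belong to this instance together with their state.
SELECT servicename, startup_type_desc, status_desc
FROM sys.dm_server_services
WHERE servicename LIKE N'%Full-text%';   -- assumed name pattern
```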

2, Word breaker

A word breaker performs word segmentation: following the grammatical rules of a language, it finds the word boundaries and identifies the words in a statement. The word breaker also records the position of each word in the string as it splits the string.

For example, the statement "Kitty is a cute cat." is split into five words: Kitty, is, a, cute, cat. The positions of "Kitty" and "cat" are 1 and 5, respectively. With a full-text index you can therefore query for two words that lie within a certain distance of each other. The predicate CONTAINS(column_name, 'NEAR((Kitty, cat), 3)') searches column_name for text in which the two words "Kitty" and "cat" appear with a maximum distance of 3 between them. The string "Kitty is a cute cat." meets this condition.
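The proximity query above can be tried end to end with a small setup. This is only a sketch: the table dbo.Documents, its columns, the constraint name PK_Documents, and the catalog DemoCatalog are hypothetical names chosen for illustration.

```sql
-- Hypothetical table. A full-text index needs a unique, single-column,
-- non-nullable key index, which the primary key provides here.
CREATE TABLE dbo.Documents
(
    DocId INT NOT NULL CONSTRAINT PK_Documents PRIMARY KEY,
    Body  NVARCHAR(1000) NOT NULL
);

INSERT INTO dbo.Documents (DocId, Body)
VALUES (1, N'Kitty is a cute cat.');

CREATE FULLTEXT CATALOG DemoCatalog;

-- Index the Body column with the English word breaker (LCID 1033).
CREATE FULLTEXT INDEX ON dbo.Documents (Body LANGUAGE 1033)
    KEY INDEX PK_Documents
    ON DemoCatalog
    WITH STOPLIST = SYSTEM;

-- Population runs asynchronously, so run this once the index is populated.
SELECT DocId, Body
FROM dbo.Documents
WHERE CONTAINS(Body, N'NEAR((Kitty, cat), 3)');
```

According to the explanation above, the sample row should satisfy this predicate, because three terms separate "Kitty" from "cat".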

3, Stoplist

A stoplist is a list of stopwords, that is, words that are deactivated for search.

4, Stemmer and thesaurus

A stemmer extracts the root form of a given word.

A thesaurus is a synonym dictionary.

Two, Word breaker

The word breaker divides the string in a column into individual words at delimiters.

1, Use the sys.dm_fts_parser DMF to view the result of splitting a string.

sys.dm_fts_parser('query_string', lcid, stoplist_id, accent_sensitivity)

Returns the final tokenization result after applying a given word breaker, thesaurus, and stoplist combination to a query string input. The tokenization result is equivalent to the output of the Full-Text Engine for the specified query string.

stoplist_id

ID of the stoplist, if any, to be used by the word breaker identified by lcid. stoplist_id is int. If you specify NULL, no stoplist is used. If you specify 0, the system stoplist is used.

For example, view how the string "Kitty is a cute cat" is split into words.

SELECT * FROM sys.dm_fts_parser(N'"Kitty is a cute cat"', 1033, 0, 0) AS p;

display_term: the word produced by the split. keyword is the hexadecimal representation of the same word, so the two columns present the same term in different ways.

occurrence: the position of each word after the string is split, that is, the order of every term in the parsing result.

special_term: if the value is Noise Word, the term is a stopword in the stoplist. Exact Match indicates an ordinary word produced by the split.
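To see the effect of the stoplist_id argument on special_term, the same string can be parsed twice. This is a small sketch that assumes the caller is allowed to use sys.dm_fts_parser (it requires sysadmin membership) and that the English system stoplist is unchanged.

```sql
-- No stoplist (NULL): no term should be flagged as a noise word.
SELECT display_term, occurrence, special_term
FROM sys.dm_fts_parser(N'"Kitty is a cute cat"', 1033, NULL, 0);

-- System stoplist (0): terms found in the stoplist should be flagged as noise words.
SELECT display_term, occurrence, special_term
FROM sys.dm_fts_parser(N'"Kitty is a cute cat"', 1033, 0, 0);
```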

Three, Stoplist

A stoplist is a list of stopwords, which are commonly used words such as "a" and "and". Searching for these words is not meaningful, so when you create a full-text index, SQL Server discards the words in the stoplist. This keeps the full-text index from growing too large.

Refer to "Configure and Manage Stopwords and Stoplists for Full-Text Search":

To prevent a full-text index from becoming bloated, SQL Server has a mechanism that discards commonly occurring strings that do not help the search. These discarded strings are called stopwords. During index creation, the Full-Text Engine omits stopwords from the full-text index. This means that full-text queries will not search on stopwords.

1, Understanding stopwords and stoplists

A stopword can be a word with meaning in a specific language, or it can be a token that does not have linguistic meaning. For example, in the English language, words such as "a," "and," "is," and "the" are left out of the full-text index since they are known to be useless to a search.

Although it ignores the inclusion of stopwords, the full-text index does take into account their position. For example, consider the phrase, "Instructions are applicable to these Adventure Works Cycles models". The following table depicts the position of the words in the phrase:

Word            Position
Instructions    1
are             2
applicable      3
to              4
these           5
Adventure       6
Works           7
Cycles          8
models          9

The stopwords "are", "to", and "these" that are in positions 2, 4, and 5 are left out of the full-text index. However, their positional information is maintained, thereby leaving the position of the other words in the phrase unaffected.

Stopwords are managed in databases using objects called stoplists. A stoplist is a list of stopwords that, when associated with a full-text index, is applied to full-text queries on that index.
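The position behaviour described above can be inspected with sys.dm_fts_parser. This is a sketch that assumes the English system stoplist (stoplist_id 0). Which words are actually flagged depends on the stoplist contents on your instance.

```sql
-- occurrence should keep counting through the stopwords, so the
-- remaining words retain their original positions.
SELECT occurrence, display_term, special_term
FROM sys.dm_fts_parser(
         N'"Instructions are applicable to these Adventure Works Cycles models"',
         1033,   -- English
         0,      -- system stoplist
         0)      -- accent-insensitive
ORDER BY occurrence;
```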

2, Create a stoplist and add stopwords to it

Syntax for creating a stoplist and for adding or dropping stopwords:

CREATE FULLTEXT STOPLIST stoplist_name
[ FROM { [ database_name. ] source_stoplist_name } | SYSTEM STOPLIST ]
[ AUTHORIZATION owner_name ]
;

ALTER FULLTEXT STOPLIST stoplist_name
{
    ADD [N] 'stopword' LANGUAGE language_term
  | DROP
    {
        'stopword' LANGUAGE language_term
      | ALL LANGUAGE language_term
      | ALL
    }
}
;

Create a stoplist and add a stopword to it:

CREATE FULLTEXT STOPLIST stop_list_test;
ALTER FULLTEXT STOPLIST stop_list_test ADD N'cat' LANGUAGE 1033;
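The other branches of the syntax can be sketched in the same way. In the example below, stop_list_copy and dbo.Documents are hypothetical names, and dbo.Documents is assumed to already have a full-text index.

```sql
-- Start from a copy of the system stoplist instead of an empty list.
CREATE FULLTEXT STOPLIST stop_list_copy FROM SYSTEM STOPLIST;

-- Add a stopword for English (LCID 1033), then remove it again.
ALTER FULLTEXT STOPLIST stop_list_copy ADD N'kitty' LANGUAGE 1033;
ALTER FULLTEXT STOPLIST stop_list_copy DROP 'kitty' LANGUAGE 1033;

-- Associate the stoplist with an existing full-text index (hypothetical table).
ALTER FULLTEXT INDEX ON dbo.Documents SET STOPLIST stop_list_copy;
```

A stoplist only affects indexing and queries once it is associated with a full-text index, which is what the last statement does.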

3, View the stoplist

Use sys.fulltext_stoplists and sys.fulltext_stopwords to view your custom stoplists and stopwords, and use the sys.dm_fts_parser function to view the words after splitting.

SELECT * FROM sys.fulltext_stoplists WHERE name = N'stop_list_test';
-- 5 is the stoplist_id of stop_list_test in this example; use the value returned by the previous query
SELECT * FROM sys.fulltext_stopwords WHERE stoplist_id = 5;
SELECT * FROM sys.dm_fts_parser(N'"Kitty is a cute cat"', 1033, 5, 0) AS p;
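Because the stoplist_id differs between databases, it may be more convenient to look it up by name rather than hard-coding it. A minimal sketch, assuming the stoplist stop_list_test from the previous step exists:

```sql
DECLARE @stoplist_id INT;

-- Resolve the ID of the custom stoplist by its name.
SELECT @stoplist_id = stoplist_id
FROM sys.fulltext_stoplists
WHERE name = N'stop_list_test';

-- List the stopwords that belong to that stoplist.
SELECT sl.name, sw.stopword, sw.[language], sw.language_id
FROM sys.fulltext_stoplists AS sl
JOIN sys.fulltext_stopwords AS sw
    ON sw.stoplist_id = sl.stoplist_id
WHERE sl.stoplist_id = @stoplist_id;

-- Parse the sample string with the custom stoplist applied.
SELECT display_term, occurrence, special_term
FROM sys.dm_fts_parser(N'"Kitty is a cute cat"', 1033, @stoplist_id, 0);
```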

4, View the system stoplist provided by SQL Server. It contains 154 stopwords for English (language_id 1033).

SELECT * FROM sys.fulltext_system_stopwords WHERE language_id = 1033;
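To compare languages, the system stopwords can be counted per language. This sketch joins to sys.fulltext_languages only to show a readable language name. The counts you get depend on the SQL Server version.

```sql
-- Number of system stopwords for each full-text language.
SELECT l.name AS language_name,
       s.language_id,
       COUNT(*) AS stopword_count
FROM sys.fulltext_system_stopwords AS s
LEFT JOIN sys.fulltext_languages AS l
    ON l.lcid = s.language_id
GROUP BY l.name, s.language_id
ORDER BY s.language_id;
```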


Four, Stemmer and thesaurus

1, A stemmer deals with the different forms of a word that share the same root. Stemming is also described as conjugating verbs: enumerating the forms a verb takes according to number, person, tense, and so on. Use FORMSOF(INFLECTIONAL, <simple_term>) in the CONTAINS clause to use the stemmer.

A stemmer takes a word and generates inflectional forms, or conjugations. The example in Books Online, and an easy one to understand, is "run". There are various forms of "run" that we would want to consider as equivalent when performing a search. For example, you would want to consider:

    • Ran
    • Running
    • Runs
    • Runner (perhaps)

The same could be said for "lay". That would generate:

    • Lie
    • Laying
    • Lain
    • Lays

This is one of the big advantages over the LIKE predicate, in that stemmers can match these forms of the word being searched for. The index would relate all of these to the core, base word.
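The stemmer can be observed directly and then used in a query. In this sketch, dbo.Documents and its columns DocId and Body are hypothetical, and the table is assumed to have a full-text index on Body. The forms actually generated depend on the English stemmer installed on the instance.

```sql
-- Show the inflectional forms that the English stemmer generates for "run".
SELECT display_term, expansion_type, source_term
FROM sys.dm_fts_parser(N'FORMSOF(INFLECTIONAL, run)', 1033, 0, 0);

-- Match rows that contain any inflectional form of "run".
SELECT DocId, Body
FROM dbo.Documents
WHERE CONTAINS(Body, N'FORMSOF(INFLECTIONAL, run)');
```

A LIKE N'%run%' predicate, by contrast, would miss "ran" and would also match unrelated words that merely contain the letters "run".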

2, A thesaurus is a synonym dictionary. For example, we can treat "database" and "DB" as synonyms, and "author", "writer", and "journalist" as synonyms. SQL Server uses an XML file to configure the thesaurus.

Refer to "Configure and Manage Thesaurus Files for Full-Text Search":

In SQL Server, full-text queries can search for synonyms of user-specified terms through the use of a thesaurus. A SQL Server thesaurus defines a set of synonyms for a specific language. System administrators can define two forms of synonyms: expansion sets and replacement sets. By developing a thesaurus tailored to your full-text data, you can effectively broaden the scope of full-text queries on that data. Thesaurus matching occurs for all FREETEXT and FREETEXTTABLE queries and for any CONTAINS and CONTAINSTABLE queries that specify the FORMSOF THESAURUS clause.
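Once the thesaurus file has been edited, it has to be loaded before queries can use it. The sketch below assumes that the English thesaurus file already defines an expansion set containing "database" and "DB", and that a hypothetical table dbo.Documents has a full-text index on its Body column.

```sql
-- Reload the thesaurus file for English (LCID 1033) after editing it.
EXEC sys.sp_fulltext_load_thesaurus_file 1033;

-- Inspect how the term is expanded by the thesaurus.
SELECT display_term, expansion_type, source_term
FROM sys.dm_fts_parser(N'FORMSOF(THESAURUS, database)', 1033, 0, 0);

-- CONTAINS uses the thesaurus only when FORMSOF(THESAURUS, ...) is specified.
SELECT DocId, Body
FROM dbo.Documents
WHERE CONTAINS(Body, N'FORMSOF(THESAURUS, database)');

-- FREETEXT applies thesaurus matching automatically.
SELECT DocId, Body
FROM dbo.Documents
WHERE FREETEXT(Body, N'database');
```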

Reference docs:

sys.dm_fts_parser (Transact-SQL)

CREATE FULLTEXT STOPLIST (Transact-SQL)

ALTER FULLTEXT STOPLIST (Transact-SQL)

Configure and Manage Word Breakers and Stemmers for Search
