SQL Server Full-Text Search

Source: Internet
Author: User
Tags filegroup logical operators

Full-text Search for SQL Server (Full-text search) is a word-based text retrieval feature that relies on full-text indexing. A full-text index differs from the traditional balanced-tree (b-tree) index and Columnstore index, which is made up of a data table, called an inverted index (Invert index), that stores the mapping of unique keys for word breakers and rows. Reversing an index is automatically created and maintained by SQL Server when you create a full-text index or when you update a full-text index. The full-text index consists mainly of three parsers: Word breaker, stemmers (stemmer), and synonym Analyzer. The data stored in the full-text index is a word breaker and its location, and the word segmentation is based on the grammar rules of a particular language, searching for the boundaries of words according to specific symbols, breaking down the text into "words", each word being called a participle (term); The full-text index sometimes extracts the word-breaker, storing multiple derived forms of stems as a single stem, a process called stemming, is a process called extracting synonyms based on a user-supplied list of custom synonyms that convert the related word to a synonym.

A full-text index is generated by segmenting the text data in the user table (Word breaker) and extracting stemming (stemmer), converting synonyms (thesaurus), filtering out the stop words (Stopword) in the word breaker, and finally storing the processed data in the full-text index. The process of storing data in full-text data is called a fill (Populate) or crawler (Crawl) process, and the full-text index can be manually populated, populated, or incrementally populated.

One, Create full-text catalogs and unique indexes

Before you create a full-text index, you must create a full-text catalog (Full-text catalog), which is used to organize full-text indexing and is a container for full-text indexes. Each full-text index must belong to a full-text catalog. A full-text catalog is a logical structure that is identical to the schema of a database, regardless of where the full-text index is stored.

Create Fulltext CATALOG Catalog_test  as default;

In order to create a full-text index, there must be a unique (unique), single-column (Single-column), non-empty (non-nullable) index on the underlying table that the full-text engine uses to map each row of data on the underlying table to a unique index key. The inverted index stores the mapping between the index key and the word breaker.

Create Unique Index  on[dbo]. [databaselog] ([databaselogid]);

Second, create a full-text index

only one full-text index can be created per table , when you create a full-text index, you must consider the file groups that the full-text index stores, the list of inactive words associated with the full-text index, how the full-text index is updated, and the language associated with the text, which must be a text field, for example:

CreateFulltextIndex  on [dbo].[DatabaseLog]([TSQL]Language1033)Key IndexUi_dblogid on(Catalog_test,filegroup[Primary])  with(change_tracking=off, no population, stoplist=System);

1, Language (language)

Option language is optional for specifying the language of the column level, the value of which can be the name of the language or the LCID, and if no language option is specified, the default language of the instance of SQL Server is used. View the system-supported languages and their corresponding LCID and names from System View sys.fulltext_languages (Transact-SQL).

2, full-text catalog (Fulltext_catalog)

The option fulltext_catalog_name is used to specify the grouping of full-text indexes.

3, Filegroup (filegroup)

Option Filegroup filegroup_name is used to specify the filegroup where the full-text index is stored, and if no filegroup is specified, the full-text index and the underlying table are stored in the same filegroup. because updating full-text indexes is an IO-intensive operation, for faster full-text indexing, it is best to store the full-text index on a physical hard disk or filegroup that differs from the underlying table to achieve maximum IO concurrency.

4, how to populate the full-text index

As with normal indexes, when the underlying table data is updated, the full-text index must be updated automatically, which is the default behavior of the system, or it can be configured to manually update the full-text index, or to automatically update the full-text index at a specific point in time.

the option change_tracking Specifies whether the data update (Update,delete, or insert) associated with the full-text indexed column needs to be synchronized to the full-text index.

    • change_tracking = MANUAL: manual update
    • change_tracking = Auto: Automatic Updates, default settings, when the underlying table data changes, the full-text index is automatically updated,
    • change_tracking =OFF, no POPULATION: Not updated, specifies the option no POPULATION, indicating that after the full-text index is created, the SQL The server does not update (populate) The full-text index, or if no option no POPULATION is specified, SQL server updates the full-text index after the full-text index is created.

5, Stop Word (stoplist)

A Stop word (StopWord) is also known as a noise word, and each full-text index is associated with a list of inactive words, by default, the full-text index is associated with a system-disabled word (systems stoplist). The full-text engine removes the stop word from the word breaker so that the full-text index does not contain a stop word.

Stoplist [=] {OFF | SYSTEM | Stoplist_name}

Third, populate the full-text index

The

Populate full-text index is also called a crawler (crawl) process, or a fill (Population) process. because creating or populating a full-text index consumes a large amount of system (IO, memory) resources, it is best to choose to populate the full-text index when the system is idle. When creating a full-text index, specify options   change_tracking =  MANUAL , or   change_tracking off , NO POPULATION , The new full-text index is not immediately populated, and the user can select

Alter Index  on  full| | Update } Population

There are three ways to update a full-text index:

    • Full POPULATION: Fills all, fetches each row from the base table, and re-integrates the full-text index;
    • INCREMENTAL POPULATION: Incremental population, provided that the underlying table contains timestamp fields, after the last population, only the updated data is indexed to the full-text index;
    • Update POPULATION: Updates the population, re-indexing the data rows that performed the update (insert, update, or delete) operations since the last population;

When you create a full-text index, if you specify change_tracking=AUTO or change_tracking= OFF , The new full-text index immediately begins populating the process.

Four, querying full-text indexing using the CONTAINS predicate

If you want to use full-text indexing in a query, you typically use the CONTAINS predicate to invoke a full-text index to implement a more complex text-matching query than the LIKE keyword, which is a fuzzy match and does not invoke a full-text index.

For example, an exact match query that uses the CONTAINS predicate to perform a single word breaker:

 select  [    [ dbo   ". [  ]  where  contains  ([ " ,  searchword   ' , language 1033 ); 

Full-text queries are faster and more complex to support the search function than like, using the CONTAINS predicate to not only perform the exact match or word breaker prefix matching query, but also to perform root-based queries, queries based on custom synonyms, and adjacent word-segmentation queries based on distance and order. However, the CONTAINS predicate cannot perform a suffix matching query compared to like.

The CONTAINS predicate returns a Boolean value that returns True if the full-text indexed column contains the specified keyword or lookup mode (pattern), otherwise, false is returned.

Contains predicates support word queries and phrase queries, Word refers to a single word breaker, the phrase (phrase) is composed of multiple words and spaced spaces, and for phrases, you must use double quotation marks to make multiple words a phrase.

1, logical combination query

Use and, and not, or or logical operators to match multiple word or multiple phrase

CONTAINS '  ')CONTAINS")

2, prefix query

Prefix matching using the CONTAINS predicate, and like ' prefix% ' function, except that the CONTAINS predicate uses "*" as the wildcard character, "*" matches 0,1 or more characters, the prefix match is: ' "prefix*", the full-text index can only perform prefix matching.

CONTAINS '  ')CONTAINS'"chain*" OR "full*")

3, query synonym (thesaurus) or stemming (stemmer)

Stemmer (stemming), for example, according to grammatical rules, English verbs have different forms of change depending on number (singular, plural), person, and tense, and these words are homologous.

CONTAINS '  ')

Thesaurus (synonyms), which need to be imported into XML for configuration, SQL Server provides a default thesaurus file, which is empty. If the thesaurus file is configured with "Author", "Writer", "journalist" is a synonym, when you use the Fulltext Index query, the match succeeds whenever any one of the synonyms is satisfied.

CONTAINS '  ')

4, distance query

Use the near function to query the rows of data that match adjacent participles, as defined in the next function, for the query pattern that requires a distance query to be specified in query mode:

<simple_term>| <prefix_term>[] )  [ ]  []

For example, using the near function to specify the distance and matching order of adjacent participle, near ((TERM1,TERM2,TERM3), 5) means that the distance between any two term cannot exceed 5, close ((TERM1,TERM2,TERM3), 5,true), The distance representing any two term cannot exceed 5, and is present in the string in the order of TERM1,TERM2,TERM3.

--regardless of the intervening distance and regardless of orderCONTAINS(COLUMN_NAME,'Near (Term1, "Term3 term4")')--searches for "AA" and "BB", in either order, within a maximum distance of fiveCONTAINS(COLUMN_NAME,'Near ((AA,BB), 5)')--In the specified order with regardless of the distanceCONTAINS(COLUMN_NAME,'Near ((Monday, Tuesday, Wednesday), MAX, TRUE)')

For near ((TERM1,TERM2,TERM3), 5,true), there is a maximum of 5 term between Term1 and TERM5, excluding internal search participle, "term2", for example:

CONTAINS ' Near ((AA,BB,CC), 5) ')

This query will match the following text, note that the internal search word cc does not calculate the distance:

BB One and CC three four five AA

For example, in the original text, the maximum distance between the word breaker bike and control cannot exceed 10, and the word breaker bike must appear in front of the word control:

CONTAINS ' Near ((Bike,control), ten, TRUE) ')

SQL Server provides full-text search functionality, richer than the like keyword, with the primary full-text search function, fast, easy to maintain, the disadvantage is that the full-text search function is very limited, in the actual development, can cooperate with open source full-text search engine, for example, SOLR, Elasticsearch to develop more powerful full-text search capabilities.

Reference Documentation:

Full-text Search (SQL Server)

CONTAINS (Transact-SQL)

CREATE FULLTEXT CATALOG (Transact-SQL)

Get Started with Full-text Search

SQL Server Full Text Search

Improve the performance of Full-text Queries

SQL Server Full-Text Search

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.