SQL Server full-text search function introduction, full-text search function Introduction

Source: Internet
Author: User
Tags filegroup stem words

SQL Server full-text search function introduction, full-text search function Introduction

Full-Text Search of SQL Server is a Text Search function based on Word Segmentation and relies on Full-Text indexing. Full-text indexes are different from traditional B-Tree indexes and column storage indexes. They are composed of data tables and are called Invert indexes ), stores the ing between Word Segmentation and the unique key of a row. The reverse index is automatically created and maintained by SQL Server when you create a full-text index or update a full-text index. Full-text indexing mainly includes three analyzers: Word Breaker, stemmer, and synonym analyzer. The full-text index stores information such as Word Segmentation and location. Word Segmentation is based on the syntax rules of a specific language. It searches for word boundaries based on specific symbols and splits the text into "Words ", each word is called a word segmentation (term). Full-text indexing sometimes extracts the stem of Word Segmentation and stores multiple derived forms of stem as a single stem. This process is called stem extraction; convert related words into synonyms based on the user-defined synonym list. This process is called synonym extraction.

The full-text index is generated to split text data in a user table into Word breaker, extract Stemmer, convert the synonym (Thesaurus), and filter out Stopword in the Word segmentation ), finally, store the processed data to the full-text index. The process of storing data to full-text data is called a Populate or crawler process. The full-text index can be updated manually, automatically, or incrementally.

1. Create a full-text directory and a unique index

Before creating a Full-Text index, you must create a Full-Text Catalog. The Full-Text directory is used to organize the Full-Text index, which is the container of the Full-Text index. Each full-text index must belong to a full-text directory. The full-text directory is a logical structure, which is the same as the database Schema and has nothing to do with the storage location of the full-text index.

create fulltext catalog catalog_testas default;

To create a full-text index, the base table must have a unique (unique), single-column, non-null (non-nullable) index, the full text engine uses this index to map each row of data on the base table to a unique index key. The inverted index stores the ing between the index key and word segmentation.

create unique index uidx_dbLogID on [dbo].[DatabaseLog]([DatabaseLogID]);

2. Create a full-text index

Each table can only create one full-text index. When creating a full-text index, you must consider the file group of the full-text index storage, the disabled word list associated with the full-text index, and the update method of the full-text index, and the language associated with the text, the full-text index column must be a text field, for example:

create fulltext index on [dbo].[DatabaseLog]([tsql] language 1033)key index ui_dbLogIDon (catalog_test,filegroup [primary]) with(change_tracking=off ,no population ,stoplist=system);

1. language)

The language option is optional and is used to specify the language at the column level. The value of this option can be the language name or LCID. If the language option is not specified, the default language of the SQL Server instance is used. View the Supported languages and their lcids and names in the System View sys. fulltext_ages ages (Transact-SQL.

2. Full-text directory (fulltext_catalog)

The fulltext_catalog_name option is used to specify the full-text index group,

3. filegroup)

Option filegroup filegroup_name is used to specify the file group for full-text index storage. If no file group is specified, the full-text index and basic table are stored in the same file group. Because I/O-intensive operations are required to update full-text indexes, it is recommended that you store full-text indexes on physical hard disks or file groups different from the basic tables for faster updates, to achieve the maximum I/O concurrency.

4. Full-text index Filling

Similar to normal indexes, full-text indexes must be automatically updated when basic table data is updated. This is the default behavior of the system. You can also configure manual full-text index updates, or automatically update the full-text index at a specific interval.

The CHANGE_TRACKING option specifies whether data Update (Update, Delete, or Insert) related to the full-text index column needs to be synchronized to the full-text index,
• CHANGE_TRACKING = MANUAL: MANUAL update
• CHANGE_TRACKING = AUTO: automatic update, which is set by default. When the basic table data changes, the full-text index is automatically updated,
• CHANGE_TRACKING = OFF, no population: NO update. If no population is specified, the SQL Server does not update the full-text index after the full-text index is created. If NO POPULATION is specified, after creating a full-text index, SQL Server updates the full-text index.

5. stopword (STOPLIST)

StopWord is also called a noise term. Each full-text index is associated with a disabled word list. By default, the full-text index is associated with the system StopWord (system stoplist ). The full text engine removes the deprecated word from the word segmentation so that the full text index does not contain the deprecated word.

STOPLIST [ = ] { OFF | SYSTEM | stoplist_name } 

3. Full-text index Filling

Full-text indexing is also called a crawler process or a Population process. Creating or filling a full-text index will consume a lot of system (IO, memory) resources, so try to fill the full-text index when the system is idle. When creating a full-text index, specify the options CHANGE_TRACKING = MANUAL, or CHANGE_TRACKING = OFF, no population. The new full-text index is not immediately filled. You can choose, use the alter fulltext index statement to perform the fill operation. The full-text index contains the Word Segmentation Data of the base table only after the full-text index is filled.

alter fulltext index on table_namestart { full | incremental | update } population;

There are three methods to update full-text indexes:

• Full population: Fill in all, retrieve each row from the base table, and re-compile the FULL-text index;
• Incremental population: INCREMENTAL filling, provided that the basic table contains the timestamp field. After the previous filling, only the updated data is compiled into the full-text index;
• Update population: update filling. The data rows that are updated (insert, UPDATE, or delete) after the last filling are re-indexed;

When creating a full-text index, if CHANGE_TRACKING = AUTO or CHANGE_TRACKING = OFF is specified, the new full-text index starts the filling process immediately.

4. query full-text indexes using ins predicates

If you want to use full-text indexes in a query, the CONTAINS predicate is usually used to call full-text indexes to implement text matching queries that are more complex than the LIKE keyword, while the LIKE keyword is fuzzy match, full-text index is not called.

For example, use the ins predicate to perform a full match query for a single word segmentation:

select [tsql] from [dbo].[DatabaseLog] where contains([tsql], 'searchword', language 1033);

Compared with Like, full-text query is faster and supports more complex search functions. Using the contains predicate can not only perform full match of Word Segmentation or prefix match query of word segmentation, you can also perform root-based queries, custom synonym-based queries, and distance-and order-based adjacent word segmentation queries. However, compared with Like, contains predicates cannot perform suffix matching queries.

The result returned by the ins predicate is a Boolean value. If the full-text index column contains the specified keyword or pattern, TRUE is returned. Otherwise, FALSE is returned.

The ins predicate supports word query and phrase query. word refers to a single word segmentation. A phrase (phrase) is composed of multiple words and spaces. Double quotation marks must be used for phrases, combine multiple words into one phrase.

1. logical combination query

Use and, and not, or logical operators to match multiple words or phrase

CONTAINS(Name, '"Mountain" OR "Road" ')CONTAINS(Name, ' Mountain OR Road ')

2. prefix Query

Prefix matching using the ins prefix is the same as like 'prefix % ', except that the contains prefix uses "*" as the wildcard and "*" matches 0, 1 or multiple characters, prefix matching is written as '"prefix *"'. Full-text indexes can only perform prefix matching.

CONTAINS(Name, ' "Chain*" ')CONTAINS(Name, '"chain*" OR "full*"')

3. query Synonyms (thesaurus) or stem words (stemmer)

Stemmer (stem). For example, according to the syntax, English verbs have different forms of change based on numbers (singular, plural), people, and tenses. These words are of the same source.

CONTAINS(Description, ' FORMSOF (INFLECTIONAL, ride) ')

THESAURUS (Synonym), which must be imported into XML for configuration. SQL Server provides a default Thesaurus file, which is Empty. If "Author" and "Writer" are configured in the Thesaurus file, and "journalist" is a synonym, if any synonym is satisfied during fulltext index query, the match is successful.

CONTAINS(Description, ' FORMSOF (THESAURUS, author) ')

4. distance query

Use the near function to query data rows that match adjacent word segmentation. The near function is defined as follows. It is used to specify the distance query mode in the query mode:

NEAR ( ( { <simple_term> | <prefix_term> } [ ,…n ] ) [, <maximum_distance> ] [, <match_order> ] ) 

For example, the Near function is used to specify the distance and matching sequence of adjacent word segmentation. near (term1, term2, term3), 5) indicates that the distance between any two terms cannot exceed 5, near (term1, term2, term3), 5, true) indicates that the distance between any two terms cannot exceed 5, and it exists in the string in the order of term1, term2, and term3.

--regardless of the intervening distance and regardless of orderCONTAINS(column_name, 'NEAR(term1,"term3 term4")')--searches for "AA" and "BB", in either order, within a maximum distance of fiveCONTAINS(column_name, 'NEAR((AA,BB),5)')--in the specified order with regardless of the distanceCONTAINS(column_name, 'NEAR ((Monday, Tuesday, Wednesday), MAX, TRUE)')

For near (term1, term2, term3), 5, true), there are up to five terms between term1 and term5, excluding the internal search word segmentation, "term2", for example:

CONTAINS(column_name, 'NEAR((AA,BB,CC),5)')

This query will match the following text. Note that the internal search word segmentation CC has no calculation distance:

BB one two CC three four five AA

For example, in the original text, the maximum distance between the word segmentation bike and the control cannot exceed 10, and the word segmentation bike must appear before the word segmentation control:

CONTAINS(Comments , 'NEAR((bike,control), 10, TRUE)')

The full-text search function provided by SQL Server is richer than the LIKE keyword. It has the basic full-text search function, which is fast and easy to maintain. The disadvantage is that the full-text search function is very limited, in actual development, you can work with open-source full-text search engines, such as Solr and Elasticsearch, to develop more powerful full-text search functions.

Summary

The above is the full-text SQL Server search function introduced by the editor. I hope it will help you. If you have any questions, please leave a message. The editor will reply to you in time!

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.