SQL Server uses full-text indexing for page search

Source: Internet
Author: User
Tags keyword list sql server query mssql server

Original: SQL Server uses full-text indexing for page search

Tags: SQL server/mssql server/Database/dba/full-text index

Overview

The full-text engine uses information from a full-text index to compile full-text queries that can quickly search for specific words or phrases in a table. Full-text indexing stores information about important words and their locations in one or more columns of a database table. A full-text index is a special type of markup-based functional index that is generated and maintained by the SQL Server full-text engine. The process of generating a full-text index differs from building other types of indexes. Instead of constructing a B-tree structure based on values stored in a particular row, the full-text engine generates an inverted, stacked, and compressed index structure based on individual tags in the text to be indexed. In SQL Server 2008, the full-text index size is limited only by the available memory resources of the computer running the instance of SQL Server.

Recently encountered a demand, in a 1 million of the table through the keyword to a large character field to retrieve, similar to the search Baidu search engine, query out all the data containing the keyword and paging processing, and the highest matching data ranked first, the query response time is required to control about 1 seconds.

Test environment: SQL Server R2

Directory

    • Overview
    • Full-Text Indexing Concepts
    • To create a full-text index
      • Start the service
      • Create a full-text catalog
      • To create a full-text index
      • Full-text predicate
    • Demand
    • Summary
Full-Text Indexing Concepts
    • Full-text indexing is for data tables, only full-text indexes are created on tables, and full-text indexes cannot be created on the database.
    • Each database can contain no full-text catalogs or multiple full-text catalogs, and a full-text catalog can contain multiple full-text indexes, but a full-text index can only be used to compose a full-text catalog.
    • A data table can only create one full-text index, and a full-text index may contain multiple fields.
    • A table that creates a full-text index must have a unique, non-null index, and this unique, non-empty index can only be a single field, not a combined field.
    • Only one full-text index is allowed per table. To create a full-text index on a table, the table must have a unique and non-Null column. You can create full-text indexes on the following types of columns:char,varchar,nchar,nvarchar,text,ntext, image,XML,varbinary , and varbinary (max), which allows full-text search of these columns. Creating a full-text index on a column with a data type of varbinary,varbinary (max),image , or XML requires that you specify the type column. Type columns are table columns that are used to store the file name extensions (. doc,. pdf, XLS, and so on) for documents in each row.

Full-text search is supported by a full-text engine. The full-text engine has two roles: Index support and query support.

Full-Text Search architecture:

Starting with SQL Server 2008, the full-text search architecture includes the following processes:

    • SQL Server Process (sqlservr.exe)
    • The filter daemon host process (Fdhost.exe).

SQL Server Process components:

      • User table
        These tables contain the data to be full-text indexed.

      • Full-text collector
        The full-text collector uses the full-text crawl thread. It is responsible for scheduling and driving the population of full-text indexes, and is responsible for monitoring full-text catalogs.

      • Thesaurus File
        These files contain synonyms for the search term.

      • Non-indexed Word Table object
        The Stoplist object contains a list of common words that are not useful for searching.

      • SQL Server Query processor
        The query processor compiles and executes the SQL query. If the SQL query contains full-text search queries, the query is sent to the full-text engine during compilation and execution. The query results will match the full-text index.

      • Full-Text Engine
        The full-text engine in SQL Server is now fully integrated with the query processor. The full-text engine compiles and executes full-text queries. As part of the query execution, the full-text engine may receive input from the thesaurus and the Stoplist. In SQL Server 2008 and later versions, the full-text engine of SQL Server runs inside the SQL Server query processor.

      • Index writer (Indexer)
        The index writer generates a structure for storing index tags.

      • Filter Daemon Manager
        The Filter Daemon Manager is responsible for monitoring the status of the full-text Engine filter Daemon host.

Filter Daemon Host Component:

The filter daemon host is a process that is started by the full-text engine. It runs the following full-text search components that are responsible for accessing, filtering, and hyphenation of data in a table, as well as hyphenation and stemming of query input:

The components of the filter daemon host are as follows:

      • Protocol handlers
        This component extracts data from memory for further processing and accesses the data in the user table of the specified database. One of its responsibilities is to collect data from the full-text indexed column and pass the collected data to the filter daemon host, where filtering and word breakers are applied as needed by the host.

      • Filter
        Some data types require filtering before you can create full-text indexes for data in documents, including varbinary,varbinary (max),image , or XML columns. What filters are used for a given document depends on the document type. For example, Microsoft Word (. doc) documents, Microsoft Excel (. xls) documents, and XML (. xml) documents use separate filters. The filter then extracts the text chunk from the document, removes the embedded formatting and preserves the text, and, if possible, retains information about the location of the text. The results will appear in the form of a textual flow of information.

      • Word breakers and Stemmers
        Word breakers are language-specific components that look up word boundaries ("hyphenation") based on the lexical rules of a given language. Each word breaker is associated with a language-specific stemmers component that is used to combine verbs and perform warp extensions. When an index is created, the filter daemon host uses word breakers and stemmers to perform linguistic analysis on the text data in a given table column. The language associated with the table column in the full-text index determines the word breakers and stemmers to use when creating the index for the column.

To create a full-text indexStart the service

In the SQL Server Configuration Management tool, locate the ' SQL Full-text Filter Daemon Launcher ' service that is started with the local user.

Create a full-text catalog

Open the database you want to create a full-text catalog-storage-Full-text catalog-right-click New full-text catalog

Create a full-text catalog with a statement

CREATE [fd_housesearch]  with =  on  as DEFAULT AUTHORIZATION [dbo]
To create a full-text index

Right-click the table that needs to create the full-text index-full-text index-Define full-text index

1. The full-text index must have a unique non-null index, where the primary key is selected.

2. Select the column that requires full-text search, and select the word breaker language, because this field is mainly used to store Chinese, so it is also selected in Simplified Chinese.

Word breaker: The word breaker is used for linguistic analysis of full-text search data, to find the boundaries of words, that is, how to break up a very long piece of content into everyday words or characters. For example, "full-Text Search", may be broken into "full text", "Search", "full", "text", "search", "rope" and other words and phrases in line with the normal Chinese habits.

3. Select the Tracking method, where automatic tracking is selected, and the index is automatically populated when the table changes.

4. Select the full-text catalog, index file, non-indexed Word table

Non-indexed Word table: In the hyphenation just said how to break the word, here is the word is saved in a table, the place to select the system default Stoplist.

-- --Query hyphenation table SELECT TOP  + *  from Sys.dm_fts_index_keywords (db_id('object_id('))

5. Fill Plan

You can create a new fill schedule to populate the full-text index, which can be a full, incremental, and update population.

To create a full-text index with a statement

--The statement has a lot less default parameters, others by default
CREATEINDEX on the dbo. Housetest (Description)KEYINDEX pk_housetest on Fd_housesearch

Full-text predicate

Full-text queries use full-text predicates (CONTAINS and FREETEXT) and full-text functions (CONTAINSTABLE and FREETEXTTABLE). They support complex Transact-SQL syntax that supports various forms of query terms. To write full-text queries, you must understand when and how to use these predicates and functions.

The CONTAINS predicate can be searched:

      • Words or phrases.
      • The prefix of a word or phrase.
      • A word that is adjacent to another word.
      • Words generated by the inflection of another word (for example, the word drive is a stem of the drives, drove, driving, and driven inflection).
      • A synonym for another word that is determined using the thesaurus (for example, the term metal may have synonyms such as aluminum and steel).
"Mountain"
use adventureworks2008r2; GO SELECT Name, ListPrice from production.product WHERE CONTAINS ' Mountain ' ); GO

"Mountain"或 "Road "
Use ADVENTUREWORKS2008R2; Goselect namefrom production.productwhere CONTAINS (Name, ' Mountain ' OR ' Road ') GO
NameThere is at least one word in the column before the beginning of chain
Use ADVENTUREWORKS2008R2; Goselect namefrom production.productwhere CONTAINS (Name, ' chain* '); GO

The use of the FREETEXT predicate is not explained here!

Demand

Now for my recent needs, table data 1 million, the data is not out here, just to say the scheme, title is similar to the title of the article, description is the content is also the full-text index field

Program 1:like, after testing decisive elimination

Scenario 2: Use full-text search directly, sorting is expensive.

Scenario 3: Because the query needs to sort the title, the reverse index of the build Title field contains the other fields, and the last option is selected ( It is important to create a reverse index of the title field ).

--Give a partial field
CREATE TABLE [dbo].[Housetest]( [ID] [int] IDENTITY(1,1) not NULL, [Title] [varchar]( $)NULL, [Description] [nvarchar](Max) not NULL, [IsOnline] [tinyint] not NULL, CONSTRAINT [pk_housetest] PRIMARY KEY CLUSTERED ( [ID] ASC) with(Pad_index= OFF, Statistics_norecompute= OFF, Ignore_dup_key= OFF, Allow_row_locks= on, Allow_page_locks= on) on [PRIMARY]) on [PRIMARY]

Create an index

CREATE INDEX  on  DESC) INCLUDE (id,description,isonline)

Query description contains "Mei Yuan", and if the title is "Mei Garden" is ranked first, and in the form of paging, 20 records per page.

Select  * FROM (select  row_number () to (case title= ' US garden ' then 1 Else 0 End) desc) as Rowsnumber , Id,title,description  from housetest  Where  contains (Description, ' Mei Yuan ') and isonline=1) tab1  where rowsnumber between 1   and

Summary

We do not compare this with the search engine, it is certainly not comparable, because I only need to solve the demand here, so the solution for my current needs.

The full-text indexing function is similar to Baidu's search engine, but Baidu search engine has its own data dictionary, in the keyword list of keywords in order to save the keyword corresponding to the document ID, a document will only retain a few keywords, as usual to write articles to add tags, a general article on a few tags, When searching, the speed of matching is very fast, which requires a well-developed data dictionary table.

Another feature of full-text search is FileStream, which requires the addition of a file stream, enabling it in a field to store the document in a field as binary, so that large documents can be backed up with the database, and many sites store pictures as paths to the pictures. When you back up the database, the pictures are not backed up together.

The benefits of full-text indexing also have a certain impact on performance, especially when performing filtering operations on server performance, so selecting a feature requires consideration of the performance impact.

if the article to everyone has help, help point recommendation, thank you!!!

Note:

pursuer.chen

Blog:http://www.cnblogs.com/chenmh

This site all the essays are original, welcome to reprint, but reprint must indicate the source of the article, and at the beginning of the article clearly to the link, otherwise reserves the right to hold responsibility.

Welcome to the exchange of discussions

SQL Server uses full-text indexing for page search

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.