SQL Server 2008 Full-text Indexing page Search Instance Tutorial

Source: Internet
Author: User
Tags create index sql server query

Recently to develop a search engine similar to the in a 1 million table through the keyword to a large character field query, query all the data containing keywords and paging, and the highest matching data in the first place, requiring query response time control in about 1 seconds. With such a large amount of data to respond to in a short time, we use SQL Server 2008 Full-text indexing.

Test environment: SQL Server 2008 R2


Full-Text Indexing concept

Full-text indexing is for datasheets, you can only create Full-text indexes on tables, and you cannot create Full-text indexes on the database.

Each database can contain no Full-text catalogs or contain multiple Full-text catalogs, and a Full-text catalog can include multiple Full-text indexes, but a full-text index can only be used to make up a Full-text catalog.

A datasheet can only create one Full-text index, and one Full-text index may contain more than one field.

A table that creates a Full-text index must have a unique Non-empty index, and this unique Non-empty index can only be a field, not a combined field.

Only one Full-text index is allowed per table. To create a Full-text index on a table, the table must have a unique and not Null column. You can create Full-text indexes on the following types of columns: char, varchar, nchar, nvarchar, text, ntext, image, XML, varbinary, and varbinary (max), which enables full-text searching of these columns. Creating a Full-text index on a column with a data type of varbinary, varbinary (max), image, or XML requires that you specify the type column. A type column is a table column that is used to store the file name extension (. doc,. pdf, XLS, and so on) of the documents in each row.

Full-text search is supported by the Full-text engine. The Full-text engine has two roles: Index support and query support.

Full-Text Search architecture:


Starting with SQL Server 2008, the Full-text search architecture includes the following processes:

SQL Server Process (sqlservr.exe)

Filter Daemon Host process (Fdhost.exe).

SQL Server Process components:

User table

These tables contain the data to be full-text indexed.

Full-text collectors

Full-text collectors use Full-text crawl threads. It is responsible for planning and driving the population of Full-text indexes and for monitoring Full-text catalogs.

Synonym Library files

These files contain synonyms for the search term.

Non-indexed Word table objects

Non-indexed Word table objects contain a list of common words that are useless for searching.

SQL Server Query processor

The query processor compiles and executes the SQL query. If the SQL query contains a full-text search query, the query is sent to the Full-text engine during compilation and execution. The query results will match the Full-text index.

Full-Text Engine

The Full-text engine in SQL Server is now fully integrated with the query processor. Full-text engines compile and execute Full-text queries. As part of the query execution, the Full-text engine may receive input from the thesaurus and the non-indexed Word table. In SQL Server 2008 and later, the Full-text engine of SQL Server runs inside the SQL Server query processor.

Index writer (Indexer)

The index writer generates the structure used to store index tags.

Filter Daemon Manager

The Filter Daemon Manager is responsible for monitoring the status of the Full-text engine filter daemon host.

Filter Daemon Host Component:

The filter daemon host is a process initiated by the Full-text engine. It runs the following Full-text search components, which are responsible for accessing, filtering, and hyphenation data in a table, and are also responsible for hyphenation and extracting stemming from query input:

The filter daemon hosts the following components:

Protocol Handler

This component extracts data from memory for further processing and accesses data in the user table of the specified database. One of its responsibilities is to collect data from the Full-text indexed columns and pass the collected data to the filter daemon host, so that the host applies the filter and the break character as needed.

Filter

Some data types require filtering before you can create Full-text indexes for data in a document, including varbinary, varbinary (max), image, or data in an XML column. What filters are used for a given document depends on the document type. For example, a Microsoft Word (. doc) document, a Microsoft Excel (. xls) document, and an XML (. xml) document use separate filters. The filter then extracts the text block from the document, deletes the embedded formatting, preserves the text, and retains information about the text position if possible. The results will appear in the form of a text flow of information.

Break character and Stem analyzer

A hyphenation character is a language-specific component that looks up a word boundary ("hyphenation") based on the lexical rules of a given language. Each of the hyphenation characters is associated with the language-specific STEM parser component that is used to combine verbs and to perform transformation extensions. When you create an index, the filter daemon host uses the hyphenation and stemming analyzer to perform language analysis on the text data in a given table column. The language associated with a table column in a Full-text index determines the hyphenation and stemming parser to use when creating an index for the column.

To create a Full-text index

Start a service

In the SQL Server Configuration Management tool, locate the ' SQL full-text Filter Daemon Launcher ' service to start with a local user.


Create a Full-text Catalog

Open the database-storage-Full-text catalog that you want to create a Full-text catalog-right-click New Full-text Catalog


Create a Full-text catalog with a statement

CREATE fulltext CATALOG [Fd_housesearch]with accent_sensitivity = On

As DEFAULT

AUTHORIZATION [dbo]

To create a Full-text index

Right-click the table that you want to create a Full-text index-Full-text index-Define a FULL-TEXT index

1. Full-text indexing must have a unique Non-empty index, where the primary key is selected.


2. Select the column that requires Full-text search, and select the word-breaker language, because the field is used primarily for storing Chinese, so the Simplified Chinese is also selected here.

Break character: The word breaker is used to analyze the Full-text search data, to find the boundary of words, and to divide a long piece of content into everyday words or characters. For example, "full-text search" may be broken into "full text", ' Search ', ' whole ', ' text ', ' Search ', ' cable ' and other words or words that conform to Chinese normal habits.


3. Select the Tracking method, where automatic tracking is selected, which automatically fills the index when the table changes.


4. Select Full-text Catalog, index file, non-indexed Word table

Non-indexed table: In the word just said how to hyphenate, here is to save the word broken in a table, where you select the system default non-indexed Word table.

----Query Hyphenation Table

SELECT Top 1000 * sys.dm_fts_index_keywords (DB_ID ("), object_id ("))


5. Filling plan

You can create a new fill schedule to populate a Full-text index, which can be a full population, an incremental population, and an update population.



To create a Full-text index with a statement

--The statement has many default parameters, others are by default
CREATE fulltext INDEX on dbo. Housetest
(Description
)
KEY INDEX Pk_housetest
On Fd_housesearch


Full-text predicates

Full-text queries use Full-text predicates (CONTAINS and FREETEXT) and Full-text functions (CONTAINSTABLE and FREETEXTTABLE). They support complex Transact-SQL syntax, which supports various forms of query words. To write Full-text queries, you must know when and how to use these verbs and functions.

The CONTAINS predicate can search for:

Words or phrases.
The prefix of a word or phrase.
A word that is adjacent to another word.
A word produced by the inflection of another word (for example, the word drive is the stem of drives, drove, driving, and driven).
Synonyms for another word identified by using the thesaurus (for example, the word metal may have synonyms such as aluminum and steel).


---The following example finds the containing "mountain"
Use ADVENTUREWORKS2008R2;
Go
SELECT Name, ListPrice
From Production.Product
WHERE CONTAINS (Name, ' Mountain ');
Go

--The following example finds the containing "mountain" or "Road"

Use ADVENTUREWORKS2008R2;
Go
SELECT Name
From Production.Product
WHERE CONTAINS (Name, ' "Mountain" OR "Road")
Go

---The following example returns all product names with at least one word in their name column that previously chain the beginning of the

Use ADVENTUREWORKS2008R2;
Go
SELECT Name
From Production.Product
WHERE CONTAINS (Name, ' chain* ');
Go


FREETEXT predicate usage Here's the explanation!


Demand

Now for my recent needs, table data 1 million, the data here will not be out, only the scheme to say, title is similar to the titles of the article, description is content is also a full-text indexed field

Program 1:like, after testing decisively ruled out

Scenario 2: Direct use of Full-text search, the ordering of large consumption.

Scenario 3: Because the query needs to sort the title, the reverse index of the build Title field contains the other fields, and the last option (creating the reverse index of the title field is important).

--Give some of the fields
CREATE TABLE [dbo]. [Housetest] (
[ID] [int] IDENTITY (1,1) not NULL,
[Title] [varchar] (MB) NULL,
[Description] [nvarchar] (max) Not NULL,
[Isonline] [tinyint] Not NULL,
CONSTRAINT [pk_housetest] PRIMARY KEY CLUSTERED
(
[ID] ASC
With (Pad_index = off, Statistics_norecompute = off, Ignore_dup_key = off, Allow_row_locks = on, Allow_page_locks = O N) on [PRIMARY]
) on [PRIMARY]


Create an index

CREATE INDEX ix_housetest on Housetest
(Title DESC
)
INCLUDE
(ID,
Description,
Isonline
)


Query description contains "Mei Yuan", and if the title is "Beauty Park" is ranked first, and in the form of pagination, each page 20 records.

SELECT * FROM (the Select Row_number () over (in case when Title= ' Mei Yuan ' Then 1 Else 0 end) desc) as Rowsnumber,id,title, Description
From Housetest
Where contains (Description, ' Mei Yuan ') and isonline=1) tab1
where Rowsnumber between 1 and 20

Summarize


The Full-text indexing function is similar to the Baidu search engine, but Baidu this kind of search engine has its own data dictionary, in the keyword table to sort keywords, save keyword corresponding to the document ID, a document will only keep a few keywords, as usual write articles to add labels, general an article on a few labels, The speed of matching is very fast when searching, which requires a perfect data dictionary table.

Full-Text Search there is another function is FileStream, you need to add a file stream, enable this feature in the service can be in the field of the document in the form of binary saved in the field, so that large documents can also be backed up with the database, many Web sites store pictures are the path to store pictures, The picture does not back up together when you back up the database.

The benefits of Full-text indexing can also have a certain impact on performance, especially when filtering operations affect server performance, so selecting a feature requires consideration of the impact on performance.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.