Summary of full-text index of SQL Server

Source: Internet
Author: User
Personal Summary of full-text index of SQL Server from cat nest in carfield (I)

As we all know, like queries are slow. Full-text indexes are related indexes in advance to indicate which keywords can be found in which records, or even calculate rank in advance, you can first list highly relevant items during retrieval, which greatly improves the retrieval speed.

For example, you have a lot of small drawers, each of which contains some debris. If you want to find something, the original method is to flip the drawers one by one, which is no index.

If you are smart, give the drawer number (unique key), record what is in the drawer of the number on the paper, and look for something to see this paper first. This is a common index, if you want to know which drawer has something, you can quickly find the drawer number on the paper (you know this is a search tree) and get the relevant information. In this case, the normal index is very fast; but to find a specific drawer, you need to traverse the whole sheet of paper. This is like query. If you want to find two or more drawers with two or more items at the same time, like is more complicated. If a table has tens of millions of records, you can imagine the cost of query.

You can change your mind and find another piece of paper to record the drawers in which the same thing exists:

Clip: 1, 3, 4, 5, 6, 9, 12...

Coin: 2, 3, 4, 7, 12...

Pills: 1, 3, 5, 6...

In this way, it is easy to find something or a few items.

Full-text indexing differs from normal SQL Indexing in many ways:

Common SQL Index Full-text index
Storage is controlled by the database where they are located. Stored in the file system, but managed through the database.
Each table can have several common indexes. Each table can have only one full-text index.
When data is inserted, updated, or deleted as its basis, it is automatically updated. Adding data to a full-text index is called filling. A full-text index can be requested by scheduling or specific requests, or automatically generated when new data is added.
Do not group. Groups one or more full-text directories in the same database.
Use the SQL Server Enterprise Manager, wizard, or transact-SQL statement to create and remove SQL statements. Create, manage, and remove SQL Server Enterprise Manager, wizard, or stored procedure.

If you use full-text indexing, you can refer to the following post (thanks for your great efforts and lihonggen0 ):

???? How to create a full-text index in sqlserver:
???? Http://www.csdn.net/develop/Read_Article.asp? Id = 17137
???
???? How to Use the image field:
???? Http://expert.csdn.net/Expert/topic/1594/1594455.xml

The following is a frequently asked question:

The query clause only contains ignored words.

This is because some simple words, such as "yes", are used for queries.

The solution is to clear c: \ Program Files \ Microsoft SQL Server \ MSSQL \ ftdata \ sqlserver \ config \ noise. CHS

I think this method is not advisable. You can open this file and find the following words: Is, are, be, at, I, yes

These words are frequently-used words and have little significance in queries, as if there are paper scraps in almost every drawer. indexing these words is not worth the candle, therefore, the full-text engine calls these words as interfering words without indexing. I personally think it is better to filter these words in the application and then give the user a friendly prompt, instead of clearing noise. CHS treats the full-text engine rudely. For example, you can search for "of" in Google"

-------------------------------------------------------------------------

In addition, thanks to ghj, a very important thing is missing. Unlike the normal index immediate update, full-text indexes generally maintain indexes on a regular basis, so it is inappropriate for frequently updated data, the objects that require full-text indexing are generally paper webpages and so on. It is suitable for pulling!

I personally think that my database is not representative, so I will not elaborate: When indexing, the CPU and memory usage are very high and the time is also very long (my database is used throughout the night below ), after completing the query, you do not need to use a lot of system resources. When multiple full-text queries are concurrent, there is a lot of CPU consumption, but it is better than like.

The database on my system is 123 m, which is too small and does not have any special advantage in using full-text indexes. However, you can think about the massive data volumes like Google, it is unimaginable to use like: Of course, no one else uses relational databases.

Summary of SQL Server full-text index (II)-Chinese Word Segmentation

In the process of using SQL search, I also found a problem: it separates Chinese characters by words. I will explain it below:

For example, in the 'blog Hall members are mostly MVPs 'sentence, if words are indexed one by one, it is better to use 'blog tang', 'members ', MVP's indexes generated by several words are much larger, which not only wastes space, but also affects the efficiency and accuracy of indexes. If English is indexed by letters rather than words, it is estimated that there is no full-text index or Google in the world.

However, in terms of word segmentation, Chinese words have a natural barrier than English ones, and there is a gap between English words. However, Chinese words are not. Computers must use artificial intelligence to separate sentences into words. Sometimes, the sentence itself is not enough. It can be determined only by context or some daily knowledge. For example, when the table tennis racket/sold/finished and the Table Tennis/auction/finished, how can the computer know what it means and correct word segmentation!

Based on the results, SQL search uses word segmentation for Chinese (probably because it was originally an English engine). For example, if you want to query 'mark ', it will also bring 'mar' to you.

I have a m database with a full-text index of 55 m, and each full-text query is slow (of course the machine is also very slow ).

Bytes --------------------------------------------------------------------------------------------------

Word Segmentation:

It should be more appropriate to say "Yi Hongyu". Let's look at this sentence:

Can the operating system rewrite the statement to limit the usage of each port?

In order to verify word segmentation, the incorrect word segmentation is intentionally used. If the sentence can be indexed, it indicates word segmentation. For example, use 'yonghue '? The query can also find sentences, so I got the sqlserver word segmentation conclusion. I did not perform further checks, But now I find that the 'write limited' method cannot be used, it proves that SQL Server still has simple word segmentation, but the word segmentation result is not ideal.

In addition, sqlserver can use third-party products to enhance word segmentation.

Bytes --------------------------------------------------------------------------------------------------

If you are interested in Word Segmentation, here are someCodeYou can see that the use of Word Segmentation accuracy is still very high, but to register can be obtained: http://www.nlp.org.cn/project/project.php? Proj_id = 6

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.