Full-text indexing and retrieval explained simply

Last Update:2018-07-26 Source: Internet

Author: User

We all know that LIKE queries are very slow. A full-text index builds the relevant index in advance, recording which keywords can be found in which records, and can even compute the rank in advance so that the most relevant results are listed first at search time. This greatly improves retrieval speed.

For example, suppose you have many small drawers, each holding some odds and ends. If you want to find something, the most primitive way is to open the drawers one by one. This is the situation with no index.

A smarter way is to give each drawer a number (a unique key) and write down on a sheet of paper what each numbered drawer contains. You consult the paper first; this is a normal index. If you want to know what a given drawer holds, you can quickly find the drawer number on the paper (as you know, it uses a search tree) and then get the relevant information. A normal index is very fast for that, but to find which drawers hold a specific thing you have to go through the whole sheet. This is a LIKE query. If you want to find which drawers hold two or more kinds of items at the same time, LIKE becomes even more cumbersome. If a table has tens of millions of records, you can imagine the cost of such a query.

You can change your approach: take another sheet of paper and record, for each thing, which drawers contain it:

Clip: 1,3,4,5,6,9,12 ...

Coin: 2,3,4,7,12 ...

Pill: 1,3,5,6 ...

Now it is easy to find one thing, or several things at once.
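The drawer analogy can be sketched in a few lines of Python. This is only an illustration under assumptions: the drawer contents and the helper names `scan`, `build_inverted_index`, and `lookup_all` are made up for the example, and a real full-text engine stores far more (positions, ranks) than a set of drawer numbers.

```python
from collections import defaultdict

# Hypothetical drawer contents: drawer number -> items inside.
drawers = {
    1: ["clip", "pill"],
    2: ["coin"],
    3: ["clip", "coin", "pill"],
    4: ["clip", "coin"],
    5: ["clip", "pill"],
}

def scan(drawers, item):
    """No usable index: open every drawer, which is what LIKE has to do."""
    return sorted(n for n, items in drawers.items() if item in items)

def build_inverted_index(drawers):
    """The second sheet of paper: item -> set of drawer numbers."""
    index = defaultdict(set)
    for number, items in drawers.items():
        for item in items:
            index[item].add(number)
    return index

def lookup_all(index, *items):
    """Drawers holding every requested item: intersect the per-item sets."""
    postings = [index.get(item, set()) for item in items]
    return sorted(set.intersection(*postings)) if postings else []

index = build_inverted_index(drawers)
print(lookup_all(index, "clip"))          # [1, 3, 4, 5]
print(lookup_all(index, "clip", "coin"))  # [3, 4]
```

The scan touches every drawer on every query, while the lookup only reads the lists for the requested items, which is why finding drawers that hold several items at once stays cheap.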

There are many differences between full-text indexing and normal SQL indexes:

Normal SQL index	Full-text index
Stored under the control of the database in which it is defined.	Stored in the file system, but managed through the database.
Several normal indexes are allowed per table.	Only one full-text index is allowed per table.
Updated automatically when the underlying data is inserted, updated, or deleted.	Adding data to a full-text index is called population; it can be requested through a schedule or a specific request, or it can occur automatically when new data is added.
Not grouped.	Grouped into one or more full-text catalogs within the same database.
Created and dropped using SQL Server Enterprise Manager, wizards, or Transact-SQL statements.	Created, managed, and dropped using SQL Server Enterprise Manager, wizards, or stored procedures.

If you use full-text indexing, the following posts are worth reading (thanks to Lihonggen0 and the others for their efforts):

How to establish a full-text index in SQL Server:
http://www.csdn.net/develop/Read_Article.asp?Id=17137

How to use the image field:
Http://expert.csdn.net/Expert/topic/1594/1594455.xml

One frequently asked question concerns the following message:

The query clause contains only the words that are ignored

This happens when the query uses only very simple words, such as 'yes'.

The usual suggested solution is simply to empty C:/Program Files/microsoft SQL server/mssql/ftdata/sqlserver/config/noise.chs

I do not think this method is advisable. If we open this file, we find words like these inside: Is,are,be,at, I, is

These are all very high-frequency words that carry little meaning in a query, much as if almost every drawer held the same scraps of paper, so indexing them is not worthwhile. The full-text engine therefore calls them noise words and does not index them. Personally, I think it is better to filter these words out in the application and give users a friendly hint than to treat the full-text engine rudely by emptying noise.chs. For example, you can look at Google search for ""
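Filtering in the application could look roughly like the sketch below. It is an assumption-laden illustration, not SQL Server's own behavior: the `NOISE_WORDS` set only contains the sample words quoted above, and the function names and hint texts are invented for the example.

```python
# Assumed noise words, taken from the sample words quoted from noise.chs above.
NOISE_WORDS = {"is", "are", "be", "at", "i"}

def prepare_query(text):
    """Split a search string into (terms to search for, noise words dropped)."""
    words = text.lower().split()
    kept = [w for w in words if w not in NOISE_WORDS]
    dropped = [w for w in words if w in NOISE_WORDS]
    return kept, dropped

def search_hint(text):
    """Return a user-facing hint, or None when the query can be used as typed."""
    kept, dropped = prepare_query(text)
    if not kept:
        return "Your search contains only very common words; please add a more specific term."
    if dropped:
        return "Ignored common words: " + ", ".join(dropped)
    return None

print(prepare_query("Is it at home"))  # (['it', 'home'], ['is', 'at'])
print(search_hint("is are be"))
```

Only the kept terms would be passed on to the full-text query, so the engine never sees a clause made up entirely of ignored words.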

-------------------------------------------------------------------------

In addition, thanks to Ghj for pointing out something very important that I missed: unlike a normal index, which is updated immediately, a full-text index is generally maintained on a schedule, so it is not well suited to frequently updated data. The objects that need full-text indexing are generally things like papers and web pages, which suit it well.

I feel my database is not representative, so I will not go into detail: while the index is being built, CPU and memory use are very high and it takes a long time (for my database below, a whole evening). Once it is complete, it no longer needs many system resources. Multiple concurrent full-text queries also consume a lot of CPU, but they still do better than LIKE.

The database on my system is 123M, which is too small, so full-text indexing does not feel particularly advantageous. But you can imagine that with data as massive as Google's, using LIKE would be unthinkable. Of course, they do not use a relational database either.

While using SQL search I also found a problem: for Chinese, it segments text character by character. Let me explain:

Take the sentence 'Many of the members of the blog hall are MVPs'. If it is indexed character by character, the index is much larger than if it is indexed by the words 'blog hall', 'members', and 'MVP'. That not only wastes space but also hurts the efficiency and accuracy of the index. If English were indexed by letters rather than by words, there would probably be no full-text indexes or Google in the world today.

But compared with English, Chinese has a natural barrier to word segmentation. English words are separated by spaces, but Chinese words are not, so the computer must use artificial intelligence to divide a sentence into words. Sometimes the sentence itself is not enough, and the split must be judged from context or from everyday knowledge. For example, the same sentence can be read as table tennis racket/sold/out or as table tennis/auction/finished. How can a computer know which meaning, and which segmentation, is correct?
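The ambiguity is easy to reproduce with a dictionary-based splitter. The sketch below is hypothetical: the Chinese sentence is my assumed original of the translated table tennis example, the dictionary is a toy one, and `segmentations` simply enumerates every split whose pieces are all dictionary words.

```python
def segmentations(text, dictionary):
    """Return every way to split text into words found in dictionary."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in dictionary:
            # Try to segment the remainder after taking this word.
            for rest in segmentations(text[end:], dictionary):
                results.append([word] + rest)
    return results

# Assumed original sentence and a toy dictionary (both are illustrative).
sentence = "乒乓球拍卖完了"
dictionary = {"乒乓球", "乒乓球拍", "拍卖", "卖", "完了"}

for words in segmentations(sentence, dictionary):
    print("/".join(words))
# 乒乓球/拍卖/完了   (table tennis / auction / finished)
# 乒乓球拍/卖/完了   (table tennis racket / sold / out)
```

Both splits use only valid words, so the dictionary alone cannot pick one; choosing between them needs context or everyday knowledge, as described above.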

Judging from the results, SQL search appears to segment Chinese character by character (probably because it is an English engine). For example, if you search for 'mark', it also returns 'Marx'.

For my 123M database, the full-text index takes 55M, and each full-text query is rather slow (of course, the machine is also quite poor).
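A small sketch shows why character-level indexing produces this kind of match. Everything here is an assumption for illustration: the two strings are my guess at the Chinese originals of 'mark' and 'Marx', `by_word` is a fixed lookup standing in for a real segmenter, and the search ignores token positions.

```python
from collections import defaultdict

def build_index(docs, tokenize):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, tokens):
    """Documents that contain every query token (positions are ignored)."""
    postings = [index.get(t, set()) for t in tokens]
    return sorted(set.intersection(*postings)) if postings else []

def by_character(text):
    """Character-by-character segmentation."""
    return list(text)

# Hypothetical word segmenter: a fixed lookup stands in for a real one.
WORDS = {"马克": ["马克"], "马克思": ["马克思"]}

def by_word(text):
    return WORDS[text]

docs = {1: "马克", 2: "马克思"}  # assumed originals of 'mark' and 'Marx'

char_index = build_index(docs, by_character)
word_index = build_index(docs, by_word)

print(search(char_index, by_character("马克")))  # [1, 2]
print(search(word_index, by_word("马克")))       # [1]
```

With one token per character, the query's characters all occur in the longer name, so both records come back; with one token per word, only the exact word matches.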

--------------------------------------------------------------------------------------------------

About word segmentation:

Red Childe's explanation is probably the more accurate one. Take a look at this sentence:

Can the operating system limit its usage to each port by rewriting the language of the remittance scam?

To test the segmentation, I deliberately used wrong characters in it: if every character can be used to find the sentence, then indexing is character by character. For example, a query for 'sinks' also finds the sentence, which led me to conclude that SQL Server segments by character, and I did not check further. But I have now found that queries for 'write limit' and for 'EC' do not find it, which shows that SQL Server does perform some simple word segmentation, though the results are not ideal.

In addition, SQL Server can use third-party products to enhance its word segmentation capabilities.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
