Using Lucene.Net for an efficient WildcardQuery: implementing a Baidu-style search suggestion drop-down


Open Baidu and start typing: a drop-down list of suggested keywords appears.

Our in-site search implements a similar feature. The most basic approach is to write a method that queries a search-history summary table, Keywordsearch. Every searched keyword is first recorded in this table, together with the number of times it has been searched and how many results the last search returned.

The SQL statement is roughly: SELECT keyword, searchcount, xxxx FROM table WHERE keyword LIKE 'Accounting%'
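To make the lookup concrete, here is a minimal sketch of the same prefix query. It uses Python's built-in sqlite3 module with an in-memory database, not the database from this post, and the column name resultcount and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# Assumed schema: the post names the table (Keywordsearch) and describes what
# it stores, but the exact column names and sample rows below are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Keywordsearch ("
    "keyword TEXT PRIMARY KEY, searchcount INTEGER, resultcount INTEGER)"
)
conn.executemany(
    "INSERT INTO Keywordsearch VALUES (?, ?, ?)",
    [
        ("accounting", 120, 3400),
        ("accounting software", 45, 800),
        ("management", 300, 9100),
    ],
)


def suggest(prefix, limit=10):
    """Return (keyword, searchcount) rows whose keyword starts with prefix."""
    # Escape LIKE metacharacters typed by the user, then add the trailing %.
    escaped = (
        prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    cursor = conn.execute(
        "SELECT keyword, searchcount FROM Keywordsearch "
        "WHERE keyword LIKE ? ESCAPE '\\' "
        "ORDER BY searchcount DESC LIMIT ?",
        (escaped + "%", limit),
    )
    return cursor.fetchall()


print(suggest("acc"))
```

Ordering by searchcount is a choice made for this sketch so that the most popular keywords come first; the post does not say how its results were ordered.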

Once the Keywordsearch table holds several million to tens of millions of rows, LIKE obviously cannot respond in time. A suggestion list that arrives late is a bad experience: before the suggestions come back, the user has already finished typing the whole query, and at that point the feature is pointless.

So this time, the idea was to use Lucene.Net.

I rolled up my sleeves and soon had a demo: a 2 GB index searched with a WildcardQuery (rough code, so go easy on me).

In testing, results came back in about 1 second. That is not fast enough; the delay is clearly noticeable.

Damn. I had not expected to be left without a good fix. I then tried RAMDirectory, but that did not help either. After all, RAMDirectory only reads the index into memory once and saves the warm-up step, so the bottleneck appears to be the WildcardQuery itself. (Does anyone know a more efficient way to run this kind of query in Lucene.Net? If you do, please tell me in the comments. Thank you.)

At that point, I thought my only option was to throw some keywords away, for example by keeping only the keywords from the last year, to make the index smaller.

But over the weekend, an idea came to me.

If the goal is to reduce the size of a single index, why not split the index?

First, when building the index, the first character of the keyword decides which index it goes into. For example, "accounting" (kuaiji in pinyin) goes into the index under D:\LuceneIndex\Searchkeyword\k\, and "management" (guanli in pinyin) goes into the index under D:\LuceneIndex\Searchkeyword\g\.
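The mapping from keyword to index directory might look like the sketch below. It is written in Python rather than with Lucene.Net, and several things in it are assumptions: the two-entry PINYIN_INITIAL table is hand-written for illustration (a real system needs a full character-to-pinyin table or library), the Chinese keywords are guessed from the English examples, and the "other" bucket for unmapped characters is not something the post mentions.

```python
from pathlib import PureWindowsPath

# Root directory taken from the post; one sub-directory per initial letter.
INDEX_ROOT = PureWindowsPath(r"D:\LuceneIndex\Searchkeyword")

# Hypothetical, hand-written table covering two characters only.
# 会 is read "kuai" in 会计 (accounting); 管 is read "guan" in 管理 (management).
PINYIN_INITIAL = {"会": "k", "管": "g"}


def shard_key(keyword):
    """Return the letter that selects the index for this keyword."""
    first = keyword.strip()[:1].lower()
    if first.isascii() and first.isalnum():
        # Latin letters and digits map to themselves.
        return first
    # Unknown characters fall into a catch-all bucket (an assumption).
    return PINYIN_INITIAL.get(first, "other")


def shard_dir(keyword):
    """Return the index directory for this keyword."""
    return INDEX_ROOT / shard_key(keyword)


print(shard_dir("会计"))
print(shard_dir("管理"))
```

Keying on a single leading character keeps the routing rule trivial, but the indexes will not be equal in size, because some initials are far more common than others.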

Then, at query time, I search only the directory that corresponds to the keyword the user has typed. This should solve the problem.
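The routing idea can be shown without Lucene at all. In the sketch below, each "index" is just a sorted Python list and the prefix match is done with bisect instead of a WildcardQuery, so it illustrates only the partitioning, not the search engine. The class name ShardedSuggester and the sample keywords are made up for this example, and shard_key uses the plain first character rather than a pinyin initial.

```python
import bisect
from collections import defaultdict


def shard_key(text):
    # Simplification: first character, lower-cased. The post uses the
    # pinyin initial of the first Chinese character instead.
    return text[:1].lower()


class ShardedSuggester:
    """Keeps one sorted keyword list per shard and searches only one shard."""

    def __init__(self):
        self._shards = defaultdict(list)  # shard key -> sorted keywords

    def add(self, keyword):
        keyword = keyword.lower()
        shard = self._shards[shard_key(keyword)]
        pos = bisect.bisect_left(shard, keyword)
        if pos == len(shard) or shard[pos] != keyword:
            shard.insert(pos, keyword)  # keep the list sorted, skip duplicates

    def suggest(self, prefix, limit=10):
        prefix = prefix.lower()
        if not prefix:
            return []
        # Only the shard chosen by the prefix is touched; all others are skipped.
        shard = self._shards.get(shard_key(prefix), [])
        start = bisect.bisect_left(shard, prefix)
        matches = []
        for keyword in shard[start:start + limit]:
            if not keyword.startswith(prefix):
                break  # sorted order: no later entry can match
            matches.append(keyword)
        return matches


suggester = ShardedSuggester()
for word in ["accounting", "accountant", "management", "manager", "audit"]:
    suggester.add(word)

print(suggester.suggest("acc"))
print(suggester.suggest("man"))
```

Because the user's input is always a prefix, its first character is known, and that is what makes it safe to ignore every other shard. A query with a leading wildcard could not be routed this way.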

No sooner said than done: I started changing the code.

Because there is quite a lot of code, I will only describe the core parts. (Again, please forgive my rough code.)

The Getindexwriter method returns the IndexWriter to use, based on the pinyin initial of the keyword's first Chinese character.

The Bllindexwriter class uses a Dictionary<string, string> object to hold every letter and its corresponding index path.

It then uses a Dictionary<string, IndexWriter> object to hold every letter and its corresponding IndexWriter object.

After all the indexes have been built, every IndexWriter object is traversed, optimized, and then closed.

In the end, each letter's index sits in its own directory, one to one.
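The bookkeeping described above (one map from letter to path, one map from letter to writer, then a final pass over all writers) can be sketched as follows. This is Python, not the post's C# code: StubWriter is a stand-in that only records calls and is not the Lucene.Net IndexWriter API, and WriterRegistry, get_writer and finish are hypothetical names. Creating writers lazily on first use is also a choice made here; the post may well create them all up front.

```python
class StubWriter:
    """Stand-in for an index writer; records calls instead of writing files."""

    def __init__(self, path):
        self.path = path
        self.docs = []
        self.calls = []

    def add_document(self, keyword):
        self.docs.append(keyword)

    def optimize(self):
        self.calls.append("optimize")

    def close(self):
        self.calls.append("close")


class WriterRegistry:
    """Maps each letter to an index path and to a writer for that path."""

    def __init__(self, root, letters):
        # letter -> index directory, like the Dictionary<string, string>
        self.paths = {letter: root + "\\" + letter for letter in letters}
        # letter -> writer, like the Dictionary<string, IndexWriter>
        self.writers = {}

    def get_writer(self, letter):
        # Lazily create one writer per letter and reuse it afterwards.
        if letter not in self.writers:
            self.writers[letter] = StubWriter(self.paths[letter])
        return self.writers[letter]

    def finish(self):
        # Optimize first, then close: a closed writer cannot be optimized.
        for writer in self.writers.values():
            writer.optimize()
            writer.close()


registry = WriterRegistry(
    r"D:\LuceneIndex\Searchkeyword", "abcdefghijklmnopqrstuvwxyz"
)
registry.get_writer("k").add_document("accounting")
registry.get_writer("g").add_document("management")
registry.finish()
```

Keeping the writers in a dictionary means each directory is opened once for the whole indexing run instead of once per keyword.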

The figure shows the original index and the current indexes.

With the keywords spread out like this, a search returns results almost immediately, because each index is only a few dozen MB to about 100 MB. At that size, Lucene's wildcard query copes without trouble.

For this kind of query, you could also split the database table in the same way and keep using LIKE, as long as you are willing to stay with the database.

Still, does anyone know a better way to solve this problem?

I am recording this here. I will not upload the rough code, but if anyone needs to write a feature like this and really cannot work out the code, I can send you a copy.
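A minimal sketch of that database variant, assuming one table per initial letter. It uses Python's sqlite3 with an in-memory database; the table naming scheme keywordsearch_<letter>, the helper names and the sample rows are all assumptions made for illustration.

```python
import sqlite3
import string

LETTERS = string.ascii_lowercase
conn = sqlite3.connect(":memory:")

# One small table per initial letter instead of one very large table.
for letter in LETTERS:
    conn.execute(
        f"CREATE TABLE keywordsearch_{letter} "
        "(keyword TEXT PRIMARY KEY, searchcount INTEGER)"
    )


def table_for(text):
    """Pick the table from the first character; reject anything else."""
    first = text[:1].lower()
    if len(first) != 1 or first not in LETTERS:
        raise ValueError(f"no table for input {text!r}")
    # Safe to interpolate: first is a single validated a-z letter.
    return f"keywordsearch_{first}"


def record(keyword, searchcount):
    conn.execute(
        f"INSERT INTO {table_for(keyword)} VALUES (?, ?)",
        (keyword, searchcount),
    )


def suggest(prefix, limit=10):
    # Note: LIKE metacharacters (% and _) in prefix are not escaped here.
    cursor = conn.execute(
        f"SELECT keyword FROM {table_for(prefix)} "
        "WHERE keyword LIKE ? ORDER BY searchcount DESC LIMIT ?",
        (prefix + "%", limit),
    )
    return [row[0] for row in cursor]


record("accounting", 120)
record("accountant", 30)
record("management", 300)
print(suggest("acc"))
```

Table names cannot be passed as bound parameters, which is why table_for validates the letter before it is placed into the SQL string.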
