Java implementation of the "Did you mean" feature of Google and Baidu

Source: Internet
Author: User

Background:

When using search engines and e-commerce sites, we have all run into this situation: I want to search for Blog Park, but I accidentally mistype the name. There is no need to worry about missing the results you want, because search engines, backed by large amounts of data, automatically correct the query for you. For this example, Google and Baidu returned the following to me:

"Showing results for: Blog Park" and "Did you mean: Blog Park". Both have corrected the error automatically. I wrote an article about automatic error correction before, using my own implementation of an N-gram model, but the results were not very good, mainly because the accuracy of the algorithm varies from corpus to corpus. I wanted to try a different algorithm. The mainstream way to compute edit distance (which can also be read, conversely, as similarity) is Levenshtein. When I set out to implement it, I discovered that Lucene has already done this, so let's stand on the shoulders of giants.
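To make the idea of edit distance concrete, here is a minimal sketch of the Levenshtein distance in plain Java. It is only an illustration of the technique, not Lucene's code; the class name EditDistanceSketch and the similarity normalization (1 minus distance divided by the longer length) are assumptions made for this example.

```java
// Illustrative sketch only: not Lucene's implementation.
final class EditDistanceSketch {

    /** Minimum number of single-character insertions, deletions and substitutions turning a into b. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1]; // distances for the previous row (prefix of a)
        int[] curr = new int[b.length() + 1]; // distances for the row being filled
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1], where 1 means the strings are identical (assumed normalization). */
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) distance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
        System.out.println(similarity("blog", "blag"));    // prints 0.75
    }
}
```

A smaller distance (or a similarity closer to 1) means the candidate is a more plausible correction of the typed query.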

Required packages:

lucene-core-3.1.0.jar + lucene-spellchecker-3.1.0.jar, which you can get here

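Usage example:

Add the following code to the main method of class SpellCorrecter. It needs imports for java.io.File, org.apache.lucene.store.Directory, org.apache.lucene.store.FSDirectory, org.apache.lucene.search.spell.SpellChecker and org.apache.lucene.search.spell.PlainTextDictionary.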

// Create the directory that holds the spell-check index
File dict = new File("");
Directory directory = FSDirectory.open(dict);

// Instantiate the spelling checker
SpellChecker sp = new SpellChecker(directory);

// Load the dictionary file
File dictionary = new File(SpellCorrecter.class.getResource("Dictionary.txt").getFile());

// Index the dictionary
sp.indexDictionary(new PlainTextDictionary(dictionary));

// Search string with a typo
String search = "very undisturbed";

// The number of suggestions; here I only want the closest one, but you can set it to another number, such as 3
int suggestionNumber = 1;

// Get the suggested keywords
String[] suggestions = sp.suggestSimilar(search, suggestionNumber);

// Display the results
System.out.println("Search: " + search);

for (String word : suggestions) {
    System.out.println("Did you mean: " + word);
}
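To illustrate what "the closest suggestion" means, the following is a brute-force sketch in plain Java that ranks every dictionary entry by edit distance and returns the nearest ones. It is a simplified stand-in written for this example, not how Lucene's SpellChecker works internally; the class SuggestSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative brute-force stand-in; the names here are hypothetical, not Lucene API.
final class SuggestSketch {

    /** Levenshtein distance computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns up to n dictionary entries, nearest to the query first; ties keep dictionary order. */
    static List<String> suggest(String query, List<String> dictionary, int n) {
        List<String> sorted = new ArrayList<>(dictionary);
        sorted.sort(Comparator.comparingInt((String w) -> editDistance(query, w)));
        int limit = Math.min(Math.max(n, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("Prairie Wolf Jazz", "Save Private Ryan", "Xiaoxiang Road");
        // The query is missing one letter, so its distance to the correct title is 1.
        System.out.println(suggest("Save Privat Ryan", dictionary, 1)); // prints [Save Private Ryan]
    }
}
```

Scanning the whole dictionary like this is only practical for small corpora, which is one reason to rely on an indexed spell checker for real data.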

Note: you need a corpus before you start. Mine is a file of correct video titles, one per line, in the following format:

Beauty
of Blood and tears ice fire general passion in the hands of the
enemy
Wind competition boat King second
fishing beetle
Xiaoxiang Road A
play outside the second season
Prairie Wolf Jazz
Save Private Ryan
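If your own corpus is not yet in this one-title-per-line shape, a small clean-up step along these lines may help. This is only a sketch under assumptions: the class DictionaryFileSketch and its methods are hypothetical, and the rules (trim, drop blank lines, drop duplicates) are choices made for this example rather than anything the original code requires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for preparing a dictionary file; not part of the original code.
final class DictionaryFileSketch {

    /** Trims each title, drops blank entries and duplicates, and keeps first-seen order. */
    static List<String> normalize(List<String> rawTitles) {
        Set<String> unique = new LinkedHashSet<>();
        for (String title : rawTitles) {
            String trimmed = title.trim();
            if (!trimmed.isEmpty()) {
                unique.add(trimmed);
            }
        }
        return new ArrayList<>(unique);
    }

    /** Renders the titles as file content with one entry per line. */
    static String toFileContent(List<String> titles) {
        return String.join("\n", titles) + "\n";
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(" Save Private Ryan ", "", "Prairie Wolf Jazz", "Save Private Ryan");
        // Prints the two unique titles, each on its own line.
        System.out.print(toFileContent(normalize(raw)));
    }
}
```

The resulting text can then be saved as the dictionary file that the spell checker indexes.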

OK, let's run it directly; the result is shown in the following figure:

The complete code and a dictionary are available here (for work-related reasons, the dictionary retains only some of the movie titles; you can use your own corpus).
