Lucene 4.x spellcheck instructions for use

Last Update:2015-04-02 Source: Internet

Author: User

SpellChecker is a feature of the newer Lucene versions. Before introducing it, we need to understand that it supports several data sources. SpellChecker's indexDictionary method takes its source as an implementation of the Dictionary interface:

    package org.apache.lucene.search.spell;

    /*
     * Licensed to the Apache Software Foundation (ASF) under one or more
     * contributor license agreements.  See the NOTICE file distributed with
     * this work for additional information regarding copyright ownership.
     * The ASF licenses this file to You under the Apache License, Version 2.0
     * (the "License"); you may not use this file except in compliance with
     * the License.  You may obtain a copy of the License at
     *
     *     http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
     */

    import java.io.IOException;

    import org.apache.lucene.search.suggest.InputIterator;

    /**
     * A simple interface representing a Dictionary. A Dictionary
     * here is a list of entries, where every entry consists of
     * term, weight and payload.
     */
    public interface Dictionary {

        /**
         * Returns an iterator over all the entries
         * @return Iterator
         */
        InputIterator getEntryIterator() throws IOException;
    }

The commonly used Dictionary implementations fall mainly into two kinds: those built from a text file and those built from an existing Lucene index (such as LuceneDictionary).

Here is a piece of code that I tested. It covers both building the index and querying it:

    package com.tianditu.com.search;

    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.search.spell.LuceneDictionary;
    import org.apache.lucene.search.spell.SpellChecker;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.MMapDirectory;
    import org.apache.lucene.util.Version;

    public class GlobalSuggest {

        // Directory that holds the index built by the spell checker
        private final String SPELL_CHECK_FOLDER = "c:\\spellcheck\\";

        // Existing index that the spell-check index is built from
        private final String GLOBAL_PINYIN_SUGGEST = "o:\\searchwork_custom\\data_index\\pinyin2008\\";

        // Build the spell-check index from the pinyin2008 index
        public void testIndexPinyin2008() throws IOException {
            long start = System.currentTimeMillis();
            // Beijing Jiwei Times Software Co., Ltd.
            // String indexDir = "O:\\searchwork_custom\\data_index\\GlobalIndex\\";
            Directory direct = new MMapDirectory(new File(GLOBAL_PINYIN_SUGGEST));
            DirectoryReader reader = DirectoryReader.open(direct);
            // Use the terms of the "name" field as the dictionary
            LuceneDictionary ld = new LuceneDictionary(reader, "name");
            Directory spd = FSDirectory.open(new File(SPELL_CHECK_FOLDER));
            SpellChecker sc = new SpellChecker(spd);
            // No analyzer is passed (null), as in the tested code
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_30, null);
            // Write the index to the spellcheck directory
            sc.indexDictionary(ld, iwc, true);
            sc.close();
            reader.close();
            long end = System.currentTimeMillis();
            System.out.println("Index completed, time consuming: " + (end - start) + " ms");
        }

        // Build the spell-check index from the GlobalIndex index
        public void testIndex() throws IOException {
            long start = System.currentTimeMillis();
            // Beijing Jiwei Times Software Co., Ltd.
            String indexDir = "O:\\searchwork_custom\\data_index\\GlobalIndex\\";
            Directory direct = new MMapDirectory(new File(indexDir));
            DirectoryReader reader = DirectoryReader.open(direct);
            LuceneDictionary ld = new LuceneDictionary(reader, "name");
            Directory spd = FSDirectory.open(new File(SPELL_CHECK_FOLDER));
            SpellChecker sc = new SpellChecker(spd);
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_30, null);
            sc.indexDictionary(ld, iwc, true);
            sc.close();
            reader.close();
            long end = System.currentTimeMillis();
            System.out.println("Index completed, time consuming: " + (end - start) + " ms");
        }

        public void testSearch(String wd) throws IOException {
            // Open the spell-check index directory
            Directory spd = FSDirectory.open(new File(SPELL_CHECK_FOLDER));
            // Instantiate the SpellChecker component
            SpellChecker sc = new SpellChecker(spd);
            // Get the n closest suggestions for the input keyword. The third
            // argument is the accuracy: the larger it is, the closer the match
            // must be. In practice it needs to be tuned.
            String[] suggests = sc.suggestSimilar(wd, 10, 0.6f);
            if (suggests != null) {
                for (String word : suggests) {
                    System.out.println("Did you mean: " + word);
                }
            }
            sc.close();
        }

        /**
         * @param args
         * @throws IOException
         */
        public static void main(String[] args) throws IOException {
            GlobalSuggest spellCheck = new GlobalSuggest();
            // spellCheck.testIndexPinyin2008();
            spellCheck.testSearch("Beijing Peking Duck");
            // spellCheck.testSearch("Beijng");
        }
    }

The index-building code is:

    // Build index
    public void testIndexPinyin2008() throws IOException {
        long start = System.currentTimeMillis();
        // Beijing Jiwei Times Software Co., Ltd.
        // String indexDir = "o:\\searchwork_custom\\data_index\\globalindex\\";
        Directory direct = new MMapDirectory(new File(GLOBAL_PINYIN_SUGGEST));
        DirectoryReader reader = DirectoryReader.open(direct);
        // Use the terms of the "name" field as the dictionary
        LuceneDictionary ld = new LuceneDictionary(reader, "name");
        Directory spd = FSDirectory.open(new File(SPELL_CHECK_FOLDER));
        SpellChecker sc = new SpellChecker(spd);
        // No analyzer is passed (null), as in the tested code
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_30, null);
        // Write the index to the spellcheck directory
        sc.indexDictionary(ld, iwc, true);
        sc.close();
        reader.close();
        long end = System.currentTimeMillis();
        System.out.println("Index completed, time consuming: " + (end - start) + " ms");
    }

This code builds the index that the spell checker needs from an existing index.

The code snippet that queries the spellcheck index is as follows:

    // Open the spell-check index directory
    Directory spd = FSDirectory.open(new File(SPELL_CHECK_FOLDER));
    // Instantiate the SpellChecker component
    SpellChecker sc = new SpellChecker(spd);
    // Get the n closest suggestions for the input keyword. The third
    // argument is the accuracy: the larger it is, the closer the match
    // must be. In practice it needs to be tuned.
    String[] suggests = sc.suggestSimilar(wd, 10, 0.6f);
    if (suggests != null) {
        for (String word : suggests) {
            System.out.println("Did you mean: " + word);
        }
    }

Similarity algorithm: the default is LevensteinDistance (Lucene's spelling of the class name).
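To make the 0.6f accuracy argument of suggestSimilar more concrete, here is a minimal plain-Java sketch of an edit-distance similarity score. It is not Lucene's own class: the names EditSimilaritySketch, distance and similarity are hypothetical, and the normalization used (1 minus the edit distance divided by the longer length) is my understanding of how such a score is usually scaled to the 0..1 range, so treat it as an assumption.

```java
// Sketch only: a plain-Java edit-distance similarity, not Lucene's implementation.
final class EditSimilaritySketch {

    /** Classic Levenshtein distance: insert, delete and substitute each cost 1. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed normalization: 1.0 means identical, 0.0 means nothing in common. */
    static float similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        if (max == 0) {
            return 1.0f;
        }
        return 1.0f - (float) distance(a, b) / max;
    }

    public static void main(String[] args) {
        // One missing letter out of seven still scores well above 0.6
        System.out.println(similarity("Beijng", "Beijing"));
    }
}
```

Under this scoring, a candidate one edit away from a seven-letter word passes a 0.6 threshold, while a short word with a single typo may not, which is one reason the accuracy value needs tuning per data set.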

Query samples:

1. Querying Chinese characters that contain a typo:

2. Querying pinyin:

3. Querying a mix of pinyin and Chinese characters:

(Note: this exposes a problem. Input that mixes pinyin and Chinese characters is not handled well; if you want to support it, some extra processing is needed.)

4. Querying a long string of Chinese characters with typos in the middle:

Summary: SpellChecker's ability still seems limited. If you need more than it offers, you may have to modify it yourself.
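For the mixed pinyin and Chinese case noted in sample 3, one possible kind of extra processing is to split the query into runs of Han characters and runs of everything else, and then check each run separately. The sketch below is only an illustration of that idea, not something the tested code does; ScriptSplitSketch and splitByScript are hypothetical names, and whether splitting helps depends on how the dictionary index was built.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: ScriptSplitSketch and splitByScript are hypothetical names,
// not part of Lucene or of the article's tested code.
final class ScriptSplitSketch {

    /** Splits a query into runs of Han characters and runs of everything else. */
    static List<String> splitByScript(String query) {
        List<String> runs = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        boolean currentIsHan = false;
        for (int i = 0; i < query.length(); ) {
            int cp = query.codePointAt(i);
            i += Character.charCount(cp);
            boolean isSpace = Character.isWhitespace(cp);
            boolean isHan = Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
            // Close the current run on whitespace or when the script changes
            if (current.length() > 0 && (isSpace || isHan != currentIsHan)) {
                runs.add(current.toString());
                current.setLength(0);
            }
            if (!isSpace) {
                current.appendCodePoint(cp);
                currentIsHan = isHan;
            }
        }
        if (current.length() > 0) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) {
        // "beijing" followed by the two Han characters U+70E4 and U+9E2D
        for (String run : splitByScript("beijing\u70e4\u9e2d")) {
            System.out.println(run);
        }
    }
}
```

Each returned run could then be passed to suggestSimilar on its own, with the suggestions recombined by the caller.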

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
