Step by Step with Me to Learn Lucene (10): The Principle and Application of Suggest, the Associative-Word (Autocomplete) Hints of Lucene Search

Source: Internet
Author: User

Yesterday we covered the spell package of the suggest module, mainly the spell checker and similar-word query hints.

Today we look at associative words (autocomplete suggestions). Lucene's suggesters live in the org.apache.lucene.search.suggest package, which supports auto-completion and suggestion hints.

InputIterator description

InputIterator is an interface that enumerates term, weight and payload triples for use by suggesters. Currently only three suggesters support payloads: AnalyzingSuggester, FuzzySuggester and AnalyzingInfixSuggester.

InputIterator has several implementation classes:

BufferedInputIterator: buffers its input in memory and iterates over the binary (BytesRef) entries;

DocumentInputIterator: iterates over the stored fields of documents in an index;

FileIterator: reads a file one line at a time; the fields on each line are separated by \t, with at most two tabs per line (term, weight, payload);

HighFrequencyIterator: iterates over the terms of an indexed field, skipping terms whose document frequency is below a configured threshold;

InputIteratorWrapper: wraps a BytesRefIterator; the entries it returns carry no payload and all have weight 1;

SortedInputIterator: iterates over binary input sorted with the specified comparator;
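To make the FileIterator line layout concrete, here is a plain-Java sketch with no Lucene dependency that parses lines in the term \t weight \t payload form. The SuggestEntry and TabLineParser names are hypothetical, and defaulting a missing weight to 1 is an assumption of this sketch, not a statement about Lucene's FileDictionary.

```java
import java.util.ArrayList;
import java.util.List;

/** One suggestion entry (hypothetical type for this sketch). */
final class SuggestEntry {
    final String term;
    final long weight;
    final String payload; // null when the line has no payload field

    SuggestEntry(String term, long weight, String payload) {
        this.term = term;
        this.weight = weight;
        this.payload = payload;
    }
}

/** Parses lines laid out as "term[\tweight[\tpayload]]". */
final class TabLineParser {
    static SuggestEntry parse(String line) {
        String[] fields = line.split("\t", -1); // -1 keeps empty trailing fields
        if (fields.length > 3) {
            // More than two tabs: the layout allows only term, weight, payload.
            throw new IllegalArgumentException("more than 2 tabs in: " + line);
        }
        // Assumption of this sketch: a missing weight defaults to 1.
        long weight = fields.length > 1 ? Long.parseLong(fields[1]) : 1L;
        String payload = fields.length > 2 ? fields[2] : null;
        return new SuggestEntry(fields[0], weight, payload);
    }

    /** Parses every non-empty line, one entry per line. */
    static List<SuggestEntry> parseAll(List<String> lines) {
        List<SuggestEntry> entries = new ArrayList<>();
        for (String line : lines) {
            if (!line.isEmpty()) {
                entries.add(parse(line));
            }
        }
        return entries;
    }
}
```

Reading one line per entry keeps memory use flat for large dictionary files, which is the same reason the real iterator polls the file line by line.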


InputIterator provides the following methods:

weight(): returns the weight of the current term; the higher the weight, the higher the suggestion ranks;

payload(): returns arbitrary binary metadata attached to the current suggestion. To use it, convert an object (or one of its properties) to a BytesRef; the suggester's lookup then returns this payload with each result;

hasPayloads(): whether the iterator provides payloads;

contexts(): returns the contexts of the current term, which can be used to filter suggestions; it may return null if the entries have no contexts;

hasContexts(): whether the iterator provides contexts;
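The key point of this contract is that weight(), payload() and contexts() describe the term most recently returned by next() (InputIterator extends BytesRefIterator, whose next() returns the next term, or null at the end). The stdlib-only sketch below mirrors that contract with a hypothetical SimpleInputIterator interface; it is not Lucene's interface and uses byte[] and String where Lucene uses BytesRef.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;

/** Hypothetical plain-Java mirror of the InputIterator methods (not Lucene's interface). */
interface SimpleInputIterator {
    byte[] next();          // next term as UTF-8 bytes, or null when exhausted
    long weight();          // weight of the term returned by the last next()
    byte[] payload();       // payload of the current term
    boolean hasPayloads();
    Set<String> contexts(); // contexts of the current term; may be null
    boolean hasContexts();
}

/** Array-backed implementation: per-term methods read the entry at the current position. */
final class ArrayInputIterator implements SimpleInputIterator {
    private final String[] terms;
    private final long[] weights;
    private int pos = -1; // before the first call to next()

    ArrayInputIterator(String[] terms, long[] weights) {
        if (terms.length != weights.length) {
            throw new IllegalArgumentException("terms and weights must have equal length");
        }
        this.terms = terms;
        this.weights = weights;
    }

    @Override public byte[] next() {
        pos++;
        return pos < terms.length ? terms[pos].getBytes(StandardCharsets.UTF_8) : null;
    }

    // Only valid after next() has returned a non-null term.
    @Override public long weight() { return weights[pos]; }

    @Override public byte[] payload() { return null; }   // this sketch carries no payloads
    @Override public boolean hasPayloads() { return false; }
    @Override public Set<String> contexts() { return null; } // no contexts in this sketch
    @Override public boolean hasContexts() { return false; }
}
```

A suggester consuming such an iterator calls next() and then immediately reads weight(), payload() and contexts() for that same entry before advancing again.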

The suggester query tool: Lookup class description

Lookup is the abstract base class of the suggesters; given an input string, it returns suggestions (completions) for it.

Lookup provides CHARSEQUENCE_COMPARATOR, a comparator that sorts CharSequence values char by char;

The nested LookupResult class holds one suggest result (its key, weight value and optional payload); LookupResult objects compare by key using CHARSEQUENCE_COMPARATOR;

The nested LookupPriorityQueue stores LookupResult objects, keeping a fixed number of the highest-weighted ones;
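The two ideas above, a char-by-char CharSequence comparator and a bounded queue that keeps only the num highest-weighted results, can be sketched with the JDK alone. ScoredKey and TopResults are hypothetical names; Lucene's LookupPriorityQueue is built on Lucene's own PriorityQueue class rather than java.util.PriorityQueue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a lookup result: a key plus its weight. */
final class ScoredKey {
    final String key;
    final long value;

    ScoredKey(String key, long value) {
        this.key = key;
        this.value = value;
    }
}

final class TopResults {
    /** Char-by-char comparison of two CharSequences; a shorter prefix sorts first. */
    static final Comparator<CharSequence> CHAR_ORDER = (a, b) -> {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            int diff = a.charAt(i) - b.charAt(i);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length() - b.length();
    };

    /** Keeps the num highest-weighted results in a bounded min-heap, then returns them best first. */
    static List<ScoredKey> top(List<ScoredKey> candidates, int num) {
        PriorityQueue<ScoredKey> heap =
                new PriorityQueue<>(Comparator.comparingLong((ScoredKey r) -> r.value));
        for (ScoredKey r : candidates) {
            heap.offer(r);
            if (heap.size() > num) {
                heap.poll(); // evict the current lowest weight
            }
        }
        List<ScoredKey> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingLong((ScoredKey r) -> r.value).reversed());
        return out;
    }
}
```

A min-heap of size num costs O(n log num) over n candidates, which is why a bounded queue is preferred over sorting every match when only a handful of suggestions are shown.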


Methods provided by Lookup

build(Dictionary dict): builds the lookup from the specified dictionary;

load(InputStream input): wraps the InputStream in a DataInput and calls load(DataInput);

store(OutputStream output): wraps the OutputStream in a DataOutput and calls store(DataOutput);

getCount(): returns the number of entries the lookup was built with;

build(InputIterator inputIterator): builds the lookup from the specified InputIterator;

lookup(CharSequence key, boolean onlyMorePopular, int num): looks up suggestions for key and returns at most num of them as a List<LookupResult>;
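How these methods fit together can be shown with a deliberately naive, stdlib-only class. NaivePrefixLookup is hypothetical: it keeps a sorted TreeMap instead of an FST or index, ignores onlyMorePopular, and uses DataOutputStream/DataInputStream where Lucene uses its own DataOutput/DataInput, but the build, getCount, lookup, store and load roles match the list above.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory prefix lookup mirroring the Lookup methods. */
final class NaivePrefixLookup {
    private final TreeMap<String, Long> entries = new TreeMap<>();

    /** Like build(InputIterator): replaces the contents with term -> weight pairs. */
    void build(Map<String, Long> input) {
        entries.clear();
        entries.putAll(input);
    }

    /** Like getCount(): number of entries the lookup was built with. */
    long getCount() {
        return entries.size();
    }

    /** Like lookup(key, onlyMorePopular, num) minus onlyMorePopular: best-weighted keys with the prefix. */
    List<String> lookup(String prefix, int num) {
        List<Map.Entry<String, Long>> hits = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // keys are sorted, so no later key can match
            }
            hits.add(e);
        }
        hits.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(num, hits.size()); i++) {
            out.add(hits.get(i).getKey());
        }
        return out;
    }

    /** Like store(OutputStream): wraps the stream and writes a count, then key/weight pairs. */
    void store(OutputStream output) throws IOException {
        DataOutputStream out = new DataOutputStream(output);
        out.writeInt(entries.size());
        for (Map.Entry<String, Long> e : entries.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeLong(e.getValue());
        }
        out.flush();
    }

    /** Like load(InputStream): wraps the stream and reads back what store wrote. */
    void load(InputStream input) throws IOException {
        DataInputStream in = new DataInputStream(input);
        int n = in.readInt();
        entries.clear();
        for (int i = 0; i < n; i++) {
            entries.put(in.readUTF(), in.readLong());
        }
    }
}
```

Usage: build once from the source data, store the result to a file, and on the next start call load instead of rebuilding.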


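The related implementations of Lookup include, among others, AnalyzingSuggester, FuzzySuggester, AnalyzingInfixSuggester, BlendedInfixSuggester, FreeTextSuggester, FSTCompletionLookup, WFSTCompletionLookup and TSTLookup.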

Writing your own suggest module

Note: when using suggest, you also need lucene-misc-5.1.0.jar on the classpath; otherwise you will get an error that the class SortingMergePolicy cannot be found.

First, we define our own entity class:

```java
package com.lucene.suggest;

import java.io.Serializable;

/** A product whose name is suggested; the whole object travels as the payload. */
public class Product implements Serializable {
    private static final long serialVersionUID = 1L;

    private String name;
    private String image;
    private String[] regions;
    private int numberSold;

    public Product(String name, String image, String[] regions, int numberSold) {
        this.name = name;
        this.image = image;
        this.regions = regions;
        this.numberSold = numberSold;
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getImage() { return image; }
    public void setImage(String image) { this.image = image; }
    public String[] getRegions() { return regions; }
    public void setRegions(String[] regions) { this.regions = regions; }
    public int getNumberSold() { return numberSold; }
    public void setNumberSold(int numberSold) { this.numberSold = numberSold; }
}
```

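Next we define an InputIterator that consumes an iterator over a List<Product>. Each product's name becomes the suggestion term, its numberSold the weight, its regions the contexts, and the serialized Product object the payload: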

```java
package com.lucene.suggest;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

import org.apache.lucene.search.suggest.InputIterator;
import org.apache.lucene.util.BytesRef;

public class ProductIterator implements InputIterator {
    private final Iterator<Product> productIterator;
    private Product currentProduct;

    ProductIterator(Iterator<Product> productIterator) {
        this.productIterator = productIterator;
    }

    /** Every product carries regions, which are used as contexts. */
    @Override
    public boolean hasContexts() {
        return true;
    }

    /** Whether payload information is set. */
    @Override
    public boolean hasPayloads() {
        return true;
    }

    /** No particular term order is promised. */
    public Comparator<BytesRef> getComparator() {
        return null;
    }

    /** Advances to the next product and returns its name as the suggestion term. */
    @Override
    public BytesRef next() {
        if (productIterator.hasNext()) {
            currentProduct = productIterator.next();
            return new BytesRef(currentProduct.getName().getBytes(StandardCharsets.UTF_8));
        }
        return null;
    }

    /** Serializes the whole current Product into the payload. */
    @Override
    public BytesRef payload() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(currentProduct);
            out.close();
            return new BytesRef(bos.toByteArray());
        } catch (IOException e) {
            throw new RuntimeException("Could not serialize product", e);
        }
    }

    /** The product's regions become its contexts. */
    @Override
    public Set<BytesRef> contexts() {
        Set<BytesRef> regions = new HashSet<>();
        for (String region : currentProduct.getRegions()) {
            regions.add(new BytesRef(region.getBytes(StandardCharsets.UTF_8)));
        }
        return regions;
    }

    /** The sales count is the weight: better sellers rank higher. */
    @Override
    public long weight() {
        return currentProduct.getNumberSold();
    }
}
```
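When reading a payload back, keep in mind that a BytesRef is a view (bytes, offset, length) into a possibly larger array, so reading bytesRef.bytes from index 0 can pick up unrelated bytes. This stdlib-only sketch (PayloadCodec is a hypothetical name) encodes a value the same way ProductIterator.payload() does and decodes only the referenced slice; the same slice-based read applies to the payload of each LookupResult.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class PayloadCodec {
    /** Java-serializes a value into a fresh byte array. */
    static byte[] encode(Serializable value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(value);
        }
        return bos.toByteArray();
    }

    /** Deserializes only bytes[offset, offset + length), like a BytesRef's bytes/offset/length. */
    static Object decode(byte[] bytes, int offset, int length)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes, offset, length))) {
            return in.readObject();
        }
    }
}
```

Java serialization is convenient here but ties the payload to the Product class version; a compact custom encoding would be smaller and more robust if the class evolves.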

Writing the test class
```java
package com.lucene.suggest;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.suggest.Lookup.LookupResult;
import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

public class SuggestProducts {

    private static void lookup(AnalyzingInfixSuggester suggester, String name, String region) throws IOException {
        // Only suggestions whose contexts contain this region are returned.
        Set<BytesRef> contexts = new HashSet<>();
        contexts.add(new BytesRef(region.getBytes(StandardCharsets.UTF_8)));
        // At most 2 results; allTermsRequired = true, doHighlight = false.
        List<LookupResult> results = suggester.lookup(name, contexts, 2, true, false);
        System.out.println("-- \"" + name + "\" (" + region + "):");
        for (LookupResult result : results) {
            System.out.println(result.key);
            BytesRef bytesRef = result.payload;
            Product product;
            // A BytesRef may point into a larger array, so honor offset and length.
            try (ObjectInputStream is = new ObjectInputStream(
                    new ByteArrayInputStream(bytesRef.bytes, bytesRef.offset, bytesRef.length))) {
                product = (Product) is.readObject();
            } catch (ClassNotFoundException e) {
                throw new IOException("Cannot deserialize payload", e);
            }
            System.out.println("product-name: " + product.getName());
            System.out.println("product-regions: " + Arrays.toString(product.getRegions()));
            System.out.println("product-image: " + product.getImage());
            System.out.println("product-numberSold: " + product.getNumberSold());
        }
        System.out.println();
    }

    public static void main(String[] args) throws IOException {
        Directory indexDir = FSDirectory.open(Paths.get("suggestPath"));
        StandardAnalyzer analyzer = new StandardAnalyzer();
        // Closing the suggester releases its index resources.
        try (AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(indexDir, analyzer)) {
            List<Product> products = new ArrayList<>();
            products.add(new Product("Electric Guitar", "http://images.example/electric-guitar.jpg", new String[] {"US", "CA"}, 100));
            products.add(new Product("Electric Train", "http://images.example/train.jpg", new String[] {"US", "CA"}, 100));
            products.add(new Product("Acoustic Guitar", "http://images.example/acoustic-guitar.jpg", new String[] {"US", "ZA"}, 80));
            products.add(new Product("Guarana Soda", "http://images.example/soda.jpg", new String[] {"ZA", "IE"}, 130));

            suggester.build(new ProductIterator(products.iterator()));

            lookup(suggester, "Gu", "US");
            lookup(suggester, "Gu", "ZA");
            lookup(suggester, "Gui", "CA");
            lookup(suggester, "Electric guit", "US");
        }
    }
}
```
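To reason about what the four lookup calls in main should return, here is a simplified, stdlib-only model of infix matching with a context filter. It is a sketch under stated assumptions, not AnalyzingInfixSuggester's implementation: InfixModel is a hypothetical name, it splits on whitespace and lowercases instead of running StandardAnalyzer, requires earlier query tokens to match whole tokens and the last one as a prefix of any token (in the spirit of allTermsRequired = true), and ranks matches by weight.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical simplified model of infix suggestion with context filtering. */
final class InfixModel {
    static final class Item {
        final String name;
        final Set<String> regions;
        final long weight;

        Item(String name, Set<String> regions, long weight) {
            this.name = name;
            this.regions = regions;
            this.weight = weight;
        }
    }

    /** Earlier query tokens must appear as whole tokens; the last must prefix some token. */
    static boolean matches(String name, String query) {
        List<String> tokens = Arrays.asList(name.toLowerCase(Locale.ROOT).split("\\s+"));
        String[] q = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int i = 0; i < q.length - 1; i++) {
            if (!tokens.contains(q[i])) {
                return false;
            }
        }
        String last = q[q.length - 1];
        for (String t : tokens) {
            if (t.startsWith(last)) {
                return true;
            }
        }
        return false;
    }

    /** Filters by region (the context), then returns up to num names, highest weight first. */
    static List<String> lookup(List<Item> items, String query, String region, int num) {
        List<Item> hits = new ArrayList<>();
        for (Item item : items) {
            if (item.regions.contains(region) && matches(item.name, query)) {
                hits.add(item);
            }
        }
        hits.sort(Comparator.comparingLong((Item x) -> x.weight).reversed());
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(num, hits.size()); i++) {
            out.add(hits.get(i).name);
        }
        return out;
    }
}
```

Under this model, "Gu"/US yields Electric Guitar and Acoustic Guitar, "Gu"/ZA yields Guarana Soda and Acoustic Guitar, and both "Gui"/CA and "Electric guit"/US yield only Electric Guitar; Guarana Soda never appears for US because the region context filters it out.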

The code will be released tomorrow.

"Step by Step with Me to Learn Lucene" is a summary of my recent work with Lucene indexing. If you have questions, contact me on QQ: 891922381. I have also created a new QQ group, 106570134 (Lucene, Solr, Netty, Hadoop); you are very welcome to join so we can discuss together. I aim to publish one post a day and hope you will keep following; there will be surprises along the way.



