Today we implement a simple tokenizer. This is only a demo, and it provides the following functions:
1. Split text into tokens on spaces, hyphens, and periods;
2. Support synonym queries for hi and hello;
3. Highlight hi and hello as synonyms of each other;
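Before looking at the Lucene classes, the splitting and synonym rules listed above can be sketched in plain Java. This is only an illustration under assumptions: `SplitSketch` and its `split` method are hypothetical names that do not appear in the original code, and the sketch does not use Lucene at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, independent of Lucene: splits text on ' ', '-' and '.'
// and optionally adds "hi" and "hello" as synonyms of each other.
class SplitSketch {
    static List<String> split(String text, boolean expandSynonyms) {
        List<String> out = new ArrayList<>();
        int begin = 0;
        for (int end = 0; end <= text.length(); end++) {
            // The end of the text is treated like a delimiter so the last word is flushed.
            boolean atDelimiter = end == text.length()
                    || text.charAt(end) == ' '
                    || text.charAt(end) == '-'
                    || text.charAt(end) == '.';
            if (!atDelimiter) continue;
            if (end > begin) { // skip empty words between adjacent delimiters
                String word = text.substring(begin, end);
                out.add(word);
                if (expandSynonyms && word.equals("hi")) out.add("hello");
                if (expandSynonyms && word.equals("hello")) out.add("hi");
            }
            begin = end + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("look hello on", true)); // prints [look, hello, hi, on]
    }
}
```

With `expandSynonyms` set to `false`, the same input yields only the three original words.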
MyAnalyzer implementation code:
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;

public class MyAnalyzer extends Analyzer {
    // 0 = expand hi/hello synonyms, 1 = no synonym expansion
    private int analyzerType;

    public MyAnalyzer(int type) {
        super();
        analyzerType = type;
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        MyTokenizer tokenizer = new MyTokenizer(fieldName, reader, analyzerType);
        return new TokenStreamComponents(tokenizer);
    }
}
MyTokenizer implementation code:
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Iterator;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;

public class MyTokenizer extends Tokenizer {
    // One token: its text plus the offset and length of the source text it covers.
    public class WordUnit {
        WordUnit(String word, int start, int length) {
            this.word = word;
            this.start = start;
            this.length = length;
            // System.out.println("\tWordUnit: " + word + "|" + start + "|" + length);
        }
        String word;
        int start;
        int length;
    }

    private int analyzerType;
    private int endPosition;
    private Iterator<WordUnit> it;
    private ArrayList<WordUnit> words;
    private final CharTermAttribute termAtt;
    private final OffsetAttribute offsetAtt;

    public MyTokenizer(String fieldName, Reader in, int type) {
        super(in);
        it = null;
        endPosition = 0;
        analyzerType = type;
        offsetAtt = addAttribute(OffsetAttribute.class);
        termAtt = addAttribute(CharTermAttribute.class);
        addAttribute(PayloadAttribute.class);
    }

    @Override
    public boolean incrementToken() throws IOException {
        clearAttributes();
        char[] inputBuf = new char[1024];
        if (it == null) {
            // First call: read the input once (at most 1024 characters) and split it.
            int bufSize = input.read(inputBuf);
            if (bufSize <= 0) return false;
            int beginIndex = 0;
            int endIndex = 0;
            words = new ArrayList<WordUnit>();
            for (endIndex = 0; endIndex < bufSize; endIndex++) {
                if (inputBuf[endIndex] != '-' && inputBuf[endIndex] != ' ' && inputBuf[endIndex] != '.')
                    continue;
                addWord(inputBuf, beginIndex, endIndex);
                beginIndex = endIndex + 1;
            }
            addWord(inputBuf, beginIndex, endIndex); // add the last word
            if (words.isEmpty()) return false;
            it = words.iterator();
        }
        if (it != null && it.hasNext()) {
            // Emit the next buffered word together with its offsets.
            WordUnit word = it.next();
            termAtt.append(word.word);
            termAtt.setLength(word.word.length());
            endPosition = word.start + word.length;
            offsetAtt.setOffset(word.start, endPosition);
            return true;
        }
        return false;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        it = null;
        endPosition = 0;
    }

    @Override
    public final void end() {
        int finalOffset = correctOffset(this.endPosition);
        offsetAtt.setOffset(finalOffset, finalOffset);
    }

    private void addWord(char[] inputBuf, int begin, int end) {
        if (end <= begin) return;
        String word = new String(inputBuf, begin, end - begin);
        words.add(new WordUnit(word, begin, end - begin));
        // For type 0, add the synonym with the offsets of the original word,
        // so that highlighting marks the text that is actually in the document.
        if (analyzerType == 0 && word.equals("hi")) words.add(new WordUnit("hello", begin, 2));
        if (analyzerType == 0 && word.equals("hello")) words.add(new WordUnit("hi", begin, 5));
    }
}
Analyzer type used when indexing: analyzerType = 0 (synonyms are added);
Analyzer type used when searching: analyzerType = 1 (no synonyms are added);
Analyzer type used when highlighting: analyzerType = 0 (synonyms are added);
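Why the query side can skip synonym expansion may be easier to see with a toy inverted index in plain Java. This is a sketch under assumptions, not the Lucene indexing code: `IndexSearchSketch`, its methods, the OR-style matching, and the sample documents in the usage note are all hypothetical. Documents are analyzed with type 0, so a document containing hi is also stored under hello; the query is analyzed with type 1 and looked up as typed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index, independent of Lucene.
class IndexSearchSketch {
    // analyzerType 0 adds hi/hello synonyms, analyzerType 1 does not.
    static List<String> analyze(String text, int analyzerType) {
        List<String> terms = new ArrayList<>();
        for (String word : text.split("[ .-]+")) {
            if (word.isEmpty()) continue; // a leading delimiter yields an empty first piece
            terms.add(word);
            if (analyzerType == 0 && word.equals("hi")) terms.add("hello");
            if (analyzerType == 0 && word.equals("hello")) terms.add("hi");
        }
        return terms;
    }

    // Index time: analyze every document with type 0 and record term -> document ids.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyze(docs.get(id), 0)) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Search time: analyze the query with type 1; a document matches if it contains any query term.
    static Set<Integer> search(Map<String, Set<Integer>> index, String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyze(query, 1)) {
            hits.addAll(index.getOrDefault(term, Collections.<Integer>emptySet()));
        }
        return hits;
    }
}
```

For example, with the two made-up documents "look hello on" and "i am hi", `search(buildIndex(docs), "hello")` returns both document ids, because the second document was indexed under hello as well as hi.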
Searching for hello gives the following result:
Score Doc 0 hightlight to: look <em>hello</em> on
Score Doc 1 hightlight to: i am <em>hi</em> China Chinese
You can see that the document containing hi is also matched and highlighted.
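The reason hi ends up wrapped in `<em>` when the query is hello is that the synonym token carries the offsets of the original word. A minimal plain-Java sketch of that idea follows. It is not Lucene's highlighter, and `HighlightSketch`, `Token`, and `highlight` are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical offset-based highlighter, independent of Lucene.
class HighlightSketch {
    static final class Token {
        final String term;
        final int start; // offset of the first character in the source text
        final int end;   // offset just past the last character
        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    // Type-0 style analysis: a synonym token reuses the offsets of the word it was derived from.
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        int begin = 0;
        for (int end = 0; end <= text.length(); end++) {
            boolean atDelimiter = end == text.length() || " -.".indexOf(text.charAt(end)) >= 0;
            if (!atDelimiter) continue;
            if (end > begin) {
                String word = text.substring(begin, end);
                tokens.add(new Token(word, begin, end));
                if (word.equals("hi")) tokens.add(new Token("hello", begin, end));
                if (word.equals("hello")) tokens.add(new Token("hi", begin, end));
            }
            begin = end + 1;
        }
        return tokens;
    }

    // Wrap the source text covered by every token whose term equals the query term.
    static String highlight(String text, String queryTerm) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (Token t : analyze(text)) {
            if (!t.term.equals(queryTerm) || t.start < last) continue;
            sb.append(text, last, t.start)
              .append("<em>").append(text, t.start, t.end).append("</em>");
            last = t.end;
        }
        sb.append(text, last, text.length());
        return sb.toString();
    }
}
```

Here `highlight("i am hi", "hello")` returns `i am <em>hi</em>`: the match is found through the synonym token hello, but the marked text is taken from the document itself by offset.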
Implementing a custom tokenizer in Lucene (synonym query and highlighting)