Rant:
1. This week I have been dog-tired busy and neglected my studies. I am still copying other people's code, and for some reason that feels a bit awkward ...
Description:
1. Data flow of tokenization: Reader -> Tokenizer -> one or more TokenFilters -> TokenStream
2. Chinese synonym support requires the mmseg4j jar. I mainly use its existing tokenizer (the MMSegTokenizer class) as is, then write a custom TokenFilter, and finally a custom Chinese analyzer, MySynonymAnalyzer.
3. The implementation needs a dictionary directory, such as D:\test\dictory in the code, which stores the Chinese word library. The dictionary is read when the MMSegTokenizer object is constructed. Suitable dictionary files can be found on the Internet.
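The data flow in point 1 can be pictured with plain Java iterators. This is only a toy model and does not use Lucene: `tokenize`, `lowerCase`, and `stripPunctuation` are hypothetical stand-ins for a Tokenizer and two TokenFilters, chosen just to show how each stage wraps the previous one.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

// Toy model of Reader -> Tokenizer -> TokenFilters -> TokenStream.
// None of these names come from Lucene; they are illustrative assumptions.
class PipelineSketch {

    // "Tokenizer": turns a Reader into a stream of whitespace-separated tokens.
    static Iterator<String> tokenize(Reader reader) {
        return new Scanner(reader); // Scanner implements Iterator<String>
    }

    // "TokenFilter" 1: wraps a token stream and lower-cases each token.
    static Iterator<String> lowerCase(final Iterator<String> input) {
        return new Iterator<String>() {
            @Override public boolean hasNext() { return input.hasNext(); }
            @Override public String next() { return input.next().toLowerCase(Locale.ROOT); }
        };
    }

    // "TokenFilter" 2: wraps a token stream and strips ASCII punctuation.
    static Iterator<String> stripPunctuation(final Iterator<String> input) {
        return new Iterator<String>() {
            @Override public boolean hasNext() { return input.hasNext(); }
            @Override public String next() { return input.next().replaceAll("\\p{Punct}", ""); }
        };
    }

    // Builds the chain and drains the final "TokenStream" into a list.
    static List<String> analyze(String text) {
        Iterator<String> stream =
                stripPunctuation(lowerCase(tokenize(new StringReader(text))));
        List<String> tokens = new ArrayList<String>();
        while (stream.hasNext()) {
            tokens.add(stream.next());
        }
        return tokens;
    }
}
```

Calling `PipelineSketch.analyze("Hello, Lucene World!")` yields the tokens `hello`, `lucene`, `world`. In the real code the tokenizer role is played by MMSegTokenizer and the filter role by the custom synonym filter.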
Code:
1. Custom token filter
package synonym;

import java.io.IOException;
import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.util.AttributeSource;

/**
 * Synonym filter
 *
 * @version 2014-8-25 PM 01:56:08
 */
public class MySynonymFilter extends TokenFilter {
    private CharTermAttribute cta;
    private PositionIncrementAttribute pia;
    private AttributeSource.State current;
    // Queue of synonyms still waiting to be emitted
    private Queue<String> synonymQueue;
    // Custom interface used to look up synonyms
    private SynonymContext ynonymContext;

    protected MySynonymFilter(TokenStream input, SynonymContext ynonymContext) {
        super(input);
        cta = this.addAttribute(CharTermAttribute.class);
        pia = this.addAttribute(PositionIncrementAttribute.class);
        synonymQueue = new LinkedBlockingQueue<String>();
        this.ynonymContext = ynonymContext;
    }

    @Override
    public boolean incrementToken() throws IOException {
        // If synonyms are pending, emit the next one instead of advancing the input
        if (synonymQueue.size() > 0) {
            String sysnonymStr = synonymQueue.poll();
            // Restore the state captured for the original token first
            restoreState(current);
            // Replace the term text with the synonym
            cta.setEmpty();
            cta.append(sysnonymStr);
            // Place the synonym at the same position as the original token
            pia.setPositionIncrement(0);
            return true;
        }
        if (!this.input.incrementToken()) {
            return false;
        } else {
            if (addSynonym(cta.toString())) {
                // If synonyms exist, save the current state
                current = captureState();
            }
            return true;
        }
    }

    // Queues all synonyms of source; returns true if at least one was found
    private boolean addSynonym(String source) {
        String[] allStrs = ynonymContext.getSameWords(source);
        if (allStrs != null && allStrs.length > 0) {
            for (String string : allStrs) {
                synonymQueue.add(string);
            }
            return true;
        }
        return false;
    }
}
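The filter above depends on a SynonymContext type, and the analyzer in the next part creates a SimpleSynonymContext, but neither is shown in this post. The following is a minimal sketch of what they might look like. The only thing the filter code confirms is a `getSameWords(String)` method returning `String[]`; the map-based storage and the sample entries are assumptions for illustration, and the `package synonym;` line is left out so the snippet stands alone.

```java
import java.util.HashMap;
import java.util.Map;

// Assumed shape of the lookup interface: only getSameWords(String) -> String[]
// is confirmed by the filter code; everything else here is hypothetical.
interface SynonymContext {
    String[] getSameWords(String name);
}

// Hypothetical in-memory implementation with made-up sample entries.
class SimpleSynonymContext implements SynonymContext {
    private final Map<String, String[]> synonyms = new HashMap<String, String[]>();

    public SimpleSynonymContext() {
        synonyms.put("中国", new String[] {"天朝", "大陆"});
        synonyms.put("我", new String[] {"咱", "俺"});
    }

    @Override
    public String[] getSameWords(String name) {
        // Returns null when the word has no synonyms; the filter checks for null.
        return synonyms.get(name);
    }
}
```

With this sketch, a token such as 中国 would be followed by 天朝 and 大陆 at position increment 0, so a search for any of the three terms could match the same position in the indexed text.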
2. Custom synonym analyzer
package synonym;

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;

import com.chenlb.mmseg4j.Dictionary;
import com.chenlb.mmseg4j.MaxWordSeg;
import com.chenlb.mmseg4j.analysis.MMSegTokenizer;

/**
 * Simple synonym analyzer
 *
 * @version 2014-8-25 AM 10:46:21
 */
public class MySynonymAnalyzer extends Analyzer {
    // Dictionary loaded from the dictionary directory, e.g. D:\test\dictory
    protected Dictionary dic;

    public MySynonymAnalyzer(String path) {
        dic = Dictionary.getInstance(path);
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName,
            Reader reader) {
        // Reader -> Tokenizer -> one or more TokenFilters -> TokenStream
        Tokenizer source = new MMSegTokenizer(new MaxWordSeg(dic), reader);
        TokenStream filter = new MySynonymFilter(source, new SimpleSynonymContext());
        return new TokenStreamComponents(source, filter);
    }