1. Custom Analyzer:
import com.dys.lucene.filter.SameWordTokenFilter;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class SameWordAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Source of the chain: splits the text into tokens
        StandardTokenizer standardTokenizer = new StandardTokenizer();
        // Filter: injects synonyms after each matching token from the tokenizer
        SameWordTokenFilter sameWordTokenFilter = new SameWordTokenFilter(standardTokenizer);
        // Source (receives the text) and sink (read by the caller)
        return new TokenStreamComponents(standardTokenizer, sameWordTokenFilter);
    }
}
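SameWordAnalyzer passes both the StandardTokenizer and the SameWordTokenFilter to TokenStreamComponents because the two ends of the chain play different roles: the Analyzer feeds the text into the source (the tokenizer, via setReader), and the caller pulls tokens out of the sink (the last filter). The following is a minimal Lucene-free model of that wiring, for illustration only. MiniStream, MiniTokenizer, LowerCaseMiniFilter and MiniComponents are hypothetical names, and splitting on whitespace stands in for StandardTokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of a token stream: returns the next term, or null at the end.
interface MiniStream {
    String next();
}

// Source stage: the only stage that sees the raw text (the role of Lucene's Tokenizer).
final class MiniTokenizer implements MiniStream {
    private String[] parts = new String[0];
    private int pos = 0;

    void setInput(String text) {
        String trimmed = text.trim();
        parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        pos = 0;
    }

    @Override
    public String next() {
        return pos < parts.length ? parts[pos++] : null;
    }
}

// Filter stage: wraps another stream and transforms each term it pulls from it.
final class LowerCaseMiniFilter implements MiniStream {
    private final MiniStream input;

    LowerCaseMiniFilter(MiniStream input) {
        this.input = input;
    }

    @Override
    public String next() {
        String term = input.next();
        return term == null ? null : term.toLowerCase(Locale.ROOT);
    }
}

// Plays the role of TokenStreamComponents: keeps the source (to feed it text)
// and the sink (to read tokens from).
final class MiniComponents {
    private final MiniTokenizer source;
    private final MiniStream sink;

    MiniComponents(MiniTokenizer source, MiniStream sink) {
        this.source = source;
        this.sink = sink;
    }

    static MiniComponents create() {
        MiniTokenizer tokenizer = new MiniTokenizer();
        return new MiniComponents(tokenizer, new LowerCaseMiniFilter(tokenizer));
    }

    List<String> analyze(String text) {
        source.setInput(text);                      // text goes in at the source...
        List<String> terms = new ArrayList<>();
        for (String t = sink.next(); t != null; t = sink.next()) {
            terms.add(t);                           // ...tokens come out of the sink
        }
        return terms;
    }
}
```

MiniComponents.create().analyze("Little Devil and Yankee") returns [little, devil, and, yankee]. Calling analyze again on the same object feeds new text into the same chain, which loosely mirrors how an Analyzer reuses its components instead of rebuilding them for every call.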
2. Custom TokenFilter:
package com.dys.lucene.filter;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Stack;

public class SameWordTokenFilter extends TokenFilter {

    // Synonym table, built once instead of on every token
    private static final Map<String, String[]> SAME_WORDS = new HashMap<>();
    static {
        SAME_WORDS.put("Beauty", new String[]{"Beautiful", "good looking"});
        SAME_WORDS.put("Flower", new String[]{"Flowers", "Flowers"});
    }

    private final CharTermAttribute charTermAttribute;
    private final PositionIncrementAttribute positionIncrementAttribute;
    // Attribute snapshot of the token whose synonyms are still pending
    private State state;
    // Synonyms not yet emitted for the current token
    private final Stack<String> stack = new Stack<>();

    public SameWordTokenFilter(TokenStream input) {
        super(input);
        this.charTermAttribute = addAttribute(CharTermAttribute.class);
        this.positionIncrementAttribute = addAttribute(PositionIncrementAttribute.class);
    }

    @Override
    public final boolean incrementToken() throws IOException {
        // First drain pending synonyms: restore the original token's attributes,
        // replace the term, and keep it at the same position (increment 0)
        if (!stack.isEmpty()) {
            restoreState(state);
            charTermAttribute.setEmpty();
            charTermAttribute.append(stack.pop());
            positionIncrementAttribute.setPositionIncrement(0);
            return true;
        }
        if (!input.incrementToken()) {
            return false;
        }
        String term = charTermAttribute.toString();
        if (getSameWords(term)) {
            // Remember this token so its synonyms can reuse its offsets, type, etc.
            state = captureState();
        }
        return true;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        // Drop leftovers in case the previous stream was not fully consumed
        stack.clear();
        state = null;
    }

    // Pushes the synonyms of name onto the stack; returns true if there were any
    private boolean getSameWords(String name) {
        String[] words = SAME_WORDS.get(name);
        if (words == null) {
            return false;
        }
        for (String word : words) {
            stack.push(word);
        }
        return true;
    }
}
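The order in which SameWordTokenFilter emits tokens is easy to get wrong, so here is a Lucene-free replay of the same logic over a plain list of terms. It is a sketch under assumptions: SynonymStackDemo, Tok and expand are hypothetical names, posInc stands in for PositionIncrementAttribute, the original token is assumed to arrive with an increment of 1, and an ArrayDeque replaces Stack (both pop the most recently pushed word first).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SynonymStackDemo {

    // A term plus its position increment (stand-in for CharTermAttribute + PositionIncrementAttribute).
    static final class Tok {
        final String term;
        final int posInc;

        Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return term + "(+" + posInc + ")";
        }
    }

    // Same table as in SameWordTokenFilter.
    private static final Map<String, String[]> SAME_WORDS = new HashMap<>();
    static {
        SAME_WORDS.put("Beauty", new String[]{"Beautiful", "good looking"});
        SAME_WORDS.put("Flower", new String[]{"Flowers", "Flowers"});
    }

    // Each loop iteration corresponds to one incrementToken() call on the filter.
    static List<Tok> expand(List<String> terms) {
        List<Tok> out = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        int i = 0;
        while (true) {
            if (!pending.isEmpty()) {
                // Buffered synonym: same position as the original token
                out.add(new Tok(pending.pop(), 0));
                continue;
            }
            if (i == terms.size()) {
                break; // like input.incrementToken() returning false
            }
            String term = terms.get(i++);
            out.add(new Tok(term, 1)); // assumed: the tokenizer advanced by one position
            String[] words = SAME_WORDS.get(term);
            if (words != null) {
                for (String w : words) {
                    pending.push(w);
                }
            }
        }
        return out;
    }
}
```

For [Beauty, is, here] this produces [Beauty(+1), good looking(+0), Beautiful(+0), is(+1), here(+1)]: the synonyms come out in reverse order of declaration and share Beauty's position, so position-aware queries treat them as alternatives for the same word.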
3. Using the custom Analyzer and custom TokenFilter:
import java.util.ArrayList;
import java.util.Arrays;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

ArrayList<String> strings = new ArrayList<>(Arrays.asList("Little devil", "Yankee"));
// CustomStandardAnalyzer is not defined in this section; it takes the custom stop words
Analyzer analyzer = new CustomStandardAnalyzer(strings);
String content = "Little Devil and Yankee are playing together!";
TokenStream tokenStream = analyzer.tokenStream("myField", content);
CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
tokenStream.reset();
while (tokenStream.incrementToken()) {
    // The custom stop words have been filtered out
    // Output: Playing together
    System.out.println(charTermAttribute.toString());
}
tokenStream.end();
tokenStream.close();
analyzer.close();
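The consuming code above follows the workflow that TokenStream requires: reset(), then incrementToken() until it returns false, then end() and close(). Lucene's Tokenizer throws an IllegalStateException reporting a contract violation if, for example, tokens are read without calling reset() first. The class below is a hypothetical, simplified model of that state machine (not Lucene code) that makes the order explicit and testable; ContractStream and consume are illustrative names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model of the TokenStream consumer contract (not a Lucene class).
final class ContractStream {
    private enum Phase { READY, STREAMING, ENDED, CLOSED }

    private final List<String> terms;
    private Iterator<String> it;
    private String current;
    private Phase phase = Phase.READY;

    ContractStream(List<String> terms) {
        this.terms = terms;
    }

    void reset() {
        // Allowed on a fresh stream, or again after close() for reuse
        if (phase != Phase.READY && phase != Phase.CLOSED) {
            throw new IllegalStateException("reset() called again without close()");
        }
        it = terms.iterator();
        phase = Phase.STREAMING;
    }

    boolean incrementToken() {
        if (phase != Phase.STREAMING) {
            throw new IllegalStateException("incrementToken() requires reset() first");
        }
        if (!it.hasNext()) {
            return false;
        }
        current = it.next();
        return true;
    }

    String term() {
        return current;
    }

    void end() {
        if (phase != Phase.STREAMING) {
            throw new IllegalStateException("end() requires reset() first");
        }
        phase = Phase.ENDED;
    }

    void close() {
        phase = Phase.CLOSED;
    }

    // Same order as the consumer loop: reset, drain, end, close.
    static List<String> consume(ContractStream ts) {
        List<String> out = new ArrayList<>();
        ts.reset();
        try {
            while (ts.incrementToken()) {
                out.add(ts.term());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return out;
    }
}
```

Putting close() in a finally block mirrors the usual advice to release a TokenStream even when consumption fails; in real code, a try-with-resources statement on the TokenStream achieves the same.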
4. Code interpretation: to see how the Analyzer and the TokenFilter are connected, step through the code with Eclipse's debugger and follow the calls.
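When tracing in Eclipse, one thing stands out: the CharTermAttribute that SameWordTokenFilter reads is the very same object that StandardTokenizer writes, and the same one the caller gets from addAttribute. That is because TokenFilter's constructor passes input up to the AttributeSource constructor, which shares the upstream stage's attribute map. The sketch below models only that sharing; MiniAttributeSource, TermAttr and AttributeSharingDemo are hypothetical names, and a Supplier replaces Lucene's AttributeFactory.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model of attribute sharing between chain stages (not Lucene's AttributeSource).
class MiniAttributeSource {
    private final Map<Class<?>, Object> attributes;

    // A source stage (like a Tokenizer) owns a fresh attribute map.
    MiniAttributeSource() {
        this.attributes = new HashMap<>();
    }

    // A filter stage reuses its input's map, as TokenFilter's super(input) does.
    MiniAttributeSource(MiniAttributeSource input) {
        this.attributes = input.attributes;
    }

    // Returns the existing instance for this type, creating it only on first request.
    <T> T addAttribute(Class<T> type, Supplier<T> factory) {
        return type.cast(attributes.computeIfAbsent(type, k -> factory.get()));
    }
}

// Stand-in for CharTermAttribute.
final class TermAttr {
    final StringBuilder buffer = new StringBuilder();
}

final class AttributeSharingDemo {
    // The tokenizer writes a term; the filter reads it through its own handle.
    static String demo() {
        MiniAttributeSource tokenizer = new MiniAttributeSource();
        MiniAttributeSource filter = new MiniAttributeSource(tokenizer);

        TermAttr written = tokenizer.addAttribute(TermAttr.class, TermAttr::new);
        TermAttr seenByFilter = filter.addAttribute(TermAttr.class, TermAttr::new);

        written.buffer.append("Beauty");
        return seenByFilter.buffer.toString();
    }

    // True when both stages hold the very same attribute object.
    static boolean sameInstance() {
        MiniAttributeSource tokenizer = new MiniAttributeSource();
        MiniAttributeSource filter = new MiniAttributeSource(tokenizer);
        return tokenizer.addAttribute(TermAttr.class, TermAttr::new)
                == filter.addAttribute(TermAttr.class, TermAttr::new);
    }
}
```

Because the instance is shared, the filter never copies the term out of the tokenizer; it reads and overwrites the shared buffer. This is also why restoreState() and setEmpty() in SameWordTokenFilter directly change what the caller sees.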
Lucene 7.2.1 Custom Analyzer and TokenFilter