If the IK analyzer (Chinese word segmenter) is configured like this:
<fieldType name="text_ik" class="solr.TextField">
  <analyzer type="index" isMaxWordLength="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
  <analyzer type="query" isMaxWordLength="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>
then in my tests the text is segmented correctly, but the synonyms and the extended dictionary are not applied.
Various sources online say this is a bug in the IK analyzer and that you have to modify the jar file yourself. So I looked up the IK source code; the source of IKAnalyzer is as follows:
package org.wltea.analyzer.lucene;

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;

/**
 * IK analyzer, an implementation of the Lucene Analyzer interface.
 * Compatible with Lucene 4.0.
 */
public final class IKAnalyzer extends Analyzer {

    private boolean useSmart;

    public boolean useSmart() {
        return useSmart;
    }

    public void setUseSmart(boolean useSmart) {
        this.useSmart = useSmart;
    }

    /**
     * Uses the fine-grained segmentation algorithm by default.
     */
    public IKAnalyzer() {
        this(false);
    }

    /**
     * @param useSmart when true, the analyzer uses smart segmentation
     */
    public IKAnalyzer(boolean useSmart) {
        super();
        this.useSmart = useSmart;
    }

    /**
     * Overrides the Analyzer method that builds the tokenizing components.
     */
    @Override
    protected TokenStreamComponents createComponents(String fieldName, final Reader in) {
        Tokenizer _IKTokenizer = new IKTokenizer(in, this.useSmart());
        return new TokenStreamComponents(_IKTokenizer);
    }
}
I added an IKAnalyzerSolrFactory class; the code is as follows:
package org.wltea.analyzer.lucene;

import java.io.Reader;
import java.util.Map;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.util.TokenizerFactory;
import org.apache.lucene.util.AttributeSource.AttributeFactory;

public class IKAnalyzerSolrFactory extends TokenizerFactory {

    private boolean useSmart;

    public boolean useSmart() {
        return useSmart;
    }

    public void setUseSmart(boolean useSmart) {
        this.useSmart = useSmart;
    }

    public IKAnalyzerSolrFactory(Map<String, String> args) {
        super(args);
        assureMatchVersion();
        // Read the useSmart attribute from schema.xml; a missing attribute means false.
        this.setUseSmart("true".equals(args.get("useSmart")));
    }

    @Override
    public Tokenizer create(AttributeFactory factory, Reader input) {
        // Wrap IKTokenizer so Solr can use it inside a <tokenizer> element.
        Tokenizer _IKTokenizer = new IKTokenizer(input, this.useSmart);
        return _IKTokenizer;
    }
}
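The factory reads its useSmart setting from the attribute map that Solr passes to the constructor. Below is a minimal standalone sketch of that parsing step, with no Lucene or Solr dependency. The class name UseSmartArg and the method parseUseSmart are hypothetical names chosen for this illustration and are not part of IK or Solr. Note one assumption: the sketch uses Boolean.parseBoolean, which accepts "TRUE" as well as "true", whereas a plain equals("true") comparison is case-sensitive.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of IK or Solr.
final class UseSmartArg {

    // Returns true only when the "useSmart" attribute is present and equals
    // "true" (ignoring case). A missing attribute yields false instead of
    // throwing a NullPointerException.
    static boolean parseUseSmart(Map<String, String> args) {
        String raw = args.get("useSmart");
        return raw != null && Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] unused) {
        Map<String, String> args = new HashMap<>();
        System.out.println("attribute missing: " + parseUseSmart(args));

        args.put("useSmart", "true");
        System.out.println("useSmart=\"true\":  " + parseUseSmart(args));

        args.put("useSmart", "false");
        System.out.println("useSmart=\"false\": " + parseUseSmart(args));
    }
}
```

Falling back to false when the attribute is absent matches the default of the no-argument IKAnalyzer constructor, which selects fine-grained segmentation.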
This lets you configure IKAnalyzerSolrFactory as a tokenizer in the configuration file.
The specific configuration steps are as follows:
1. Modify the IK jar file and add IKAnalyzerSolrFactory (if you cannot make the change yourself, ask me on QQ 632132852).
2. Modify the solrconfig.xml file to add
<lib dir="/contrib/analysis-extras/lib" regex=".*\.jar"/>
3. Modify the schema.xml file to add
<!-- IK analyzer -->
<fieldType name="text_ik" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="org.wltea.analyzer.lucene.IKAnalyzerSolrFactory" useSmart="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.wltea.analyzer.lucene.IKAnalyzerSolrFactory" useSmart="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
4. In the classes directory under Solr's WEB-INF (create it if it does not exist), copy in the needed files from the IK archive, including the ext.dic used in the next step.
5. Configure the custom dictionary in ext.dic: words that should not be split go here. Synonyms are written in synonyms.txt, one comma-separated group per line. Format: notification, announcement
Note that every change to the dictionary or the synonyms requires a restart of the service.
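To make the synonyms.txt format concrete, here is a rough standalone sketch of what a comma-separated line means when the filter is configured with expand="true" and ignoreCase="true": every term in a group maps to all terms in that group. This is only a simplified model written for illustration, not Solr's implementation. The class name SynonymExpansionSketch and the method expand are hypothetical, and the sketch deliberately skips explicit "=>" mappings.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of comma-separated synonym groups.
final class SynonymExpansionSketch {

    // Builds a lookup table in which each term of a group maps to the whole group.
    static Map<String, Set<String>> expand(List<String> lines) {
        Map<String, Set<String>> table = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines, comments, and explicit mappings (not modeled here).
            if (trimmed.isEmpty() || trimmed.startsWith("#") || trimmed.contains("=>")) {
                continue;
            }
            Set<String> group = new LinkedHashSet<>();
            for (String term : trimmed.split(",")) {
                // Lower-casing mimics ignoreCase="true".
                String normalized = term.trim().toLowerCase(Locale.ROOT);
                if (!normalized.isEmpty()) {
                    group.add(normalized);
                }
            }
            for (String term : group) {
                table.computeIfAbsent(term, key -> new LinkedHashSet<>()).addAll(group);
            }
        }
        return table;
    }

    public static void main(String[] unused) {
        List<String> lines = Arrays.asList("# sample synonyms", "Notification, announcement");
        Map<String, Set<String>> table = expand(lines);
        for (Map.Entry<String, Set<String>> entry : table.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

With the schema shown earlier, the synonym filter runs only in the query analyzer, so a search for either word in a group also matches documents containing the other.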
Original address: http://www.cnblogs.com/wudi521/p/5558880.html
Integrating the IK analyzer with Solr 4.7: synonyms, custom dictionary words, and stop words