Integrating the IK analyzer with Solr 4.7: synonyms, custom dictionary words, and stop words

Source: Internet
Author: User
Tags: solr

Suppose the IK analyzer is configured directly as the analyzer class, like this:

<fieldType name="text_ik" class="solr.TextField">
    <analyzer type="index" isMaxWordLength="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
    <analyzer type="query" isMaxWordLength="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>

In my tests, text was segmented correctly with this setup, but synonyms and the extension dictionary had no effect.

Various sources online say this is a bug in the IK analyzer and that you have to modify the jar file yourself. So I looked at the IK source code. It contains only the IKAnalyzer class, whose source is as follows:

package org.wltea.analyzer.lucene;

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;

/**
 * IK analyzer, an implementation of the Lucene Analyzer interface.
 * Compatible with Lucene 4.0.
 */
public final class IKAnalyzer extends Analyzer {

    private boolean useSmart;

    public boolean useSmart() {
        return useSmart;
    }

    public void setUseSmart(boolean useSmart) {
        this.useSmart = useSmart;
    }

    /**
     * IK analyzer, Lucene Analyzer implementation.
     *
     * Uses the fine-grained segmentation algorithm by default.
     */
    public IKAnalyzer() {
        this(false);
    }

    /**
     * IK analyzer, Lucene Analyzer implementation.
     *
     * @param useSmart when true, the analyzer uses smart segmentation
     */
    public IKAnalyzer(boolean useSmart) {
        super();
        this.useSmart = useSmart;
    }

    /**
     * Overrides the Analyzer method to build the tokenization components.
     */
    @Override
    protected TokenStreamComponents createComponents(String fieldName, final Reader in) {
        Tokenizer _IKTokenizer = new IKTokenizer(in, this.useSmart());
        return new TokenStreamComponents(_IKTokenizer);
    }
}

  

I added an IKAnalyzerSolrFactory class. The code is as follows:

package org.wltea.analyzer.lucene;

import java.io.Reader;
import java.util.Map;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.util.TokenizerFactory;
import org.apache.lucene.util.AttributeSource.AttributeFactory;

/**
 * Tokenizer factory that lets Solr build an IKTokenizer from schema.xml,
 * so that filters can be chained after it.
 */
public class IKAnalyzerSolrFactory extends TokenizerFactory {

    private boolean useSmart;

    public boolean useSmart() {
        return useSmart;
    }

    public void setUseSmart(boolean useSmart) {
        this.useSmart = useSmart;
    }

    public IKAnalyzerSolrFactory(Map<String, String> args) {
        super(args);
        assureMatchVersion();
        // Null-safe comparison: a missing useSmart attribute means false
        // instead of a NullPointerException.
        this.setUseSmart("true".equals(args.get("useSmart")));
    }

    @Override
    public Tokenizer create(AttributeFactory factory, Reader input) {
        Tokenizer _IKTokenizer = new IKTokenizer(input, this.useSmart);
        return _IKTokenizer;
    }
}
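The factory constructor reads the useSmart setting from the argument map it receives, which Solr fills from the attributes of the <tokenizer> element. The sketch below isolates that lookup without any Solr or IK dependency. The class UseSmartArgs and both method names are hypothetical and exist only for illustration. It shows that calling a method directly on the result of args.get("useSmart") throws when the attribute is omitted, while comparing with "true".equals(...) simply yields false.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of IK or Solr: mimics how a factory reads "useSmart". */
final class UseSmartArgs {

    /** Null-safe lookup: a missing attribute is treated as false. */
    static boolean parseUseSmart(Map<String, String> args) {
        return "true".equals(args.get("useSmart"));
    }

    /** Fragile lookup: dereferences the result, so a missing key throws. */
    static boolean parseUseSmartUnsafe(Map<String, String> args) {
        return args.get("useSmart").toString().equals("true");
    }

    public static void main(String[] argv) {
        Map<String, String> withAttr = new HashMap<String, String>();
        withAttr.put("useSmart", "true");
        Map<String, String> withoutAttr = new HashMap<String, String>();

        System.out.println(parseUseSmart(withAttr));
        System.out.println(parseUseSmart(withoutAttr));
        try {
            parseUseSmartUnsafe(withoutAttr);
        } catch (NullPointerException e) {
            System.out.println("missing useSmart attribute: NullPointerException");
        }
    }
}
```

The map keys are case-sensitive, so the attribute in schema.xml has to be spelled exactly as the key the factory looks up.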

With this class in place, IKAnalyzerSolrFactory can be referenced as a tokenizer in the configuration file.

The specific configuration steps are:

1. Modify the IK jar file to add IKAnalyzerSolrFactory (if you cannot make the change yourself, ask me on QQ: 632132852).

2. Modify the solrconfig.xml file to add:

<lib dir="/contrib/analysis-extras/lib" regex=".*\.jar"/>

3. Modify the schema.xml file to add:

<!-- IK analyzer -->
<fieldType name="text_ik" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="org.wltea.analyzer.lucene.IKAnalyzerSolrFactory" useSmart="true"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="org.wltea.analyzer.lucene.IKAnalyzerSolrFactory" useSmart="true"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    </analyzer>
</fieldType>

4. In the classes directory under Solr's WEB-INF (create it if it does not exist), add the required files from the IK archive, including the ext.dic dictionary used in the next step.

5. Configure the custom dictionary in ext.dic: words that should be kept whole rather than split go here. Synonyms are written in synonyms.txt as comma-separated groups, for example: notifications,announcements
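To see why adding a word to a custom dictionary keeps it from being split, here is a toy dictionary-driven segmenter. It uses simple forward maximum matching, which is an assumption made for illustration and is not IK's actual algorithm. The class ToySegmenter, the sample words, and the ASCII input are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy forward-maximum-matching segmenter. NOT IK's algorithm, only an illustration. */
final class ToySegmenter {

    /** Splits text greedily into the longest dictionary words; unknown characters become single tokens. */
    static List<String> segment(String text, Set<String> dictionary) {
        int maxLen = 1;
        for (String word : dictionary) {
            maxLen = Math.max(maxLen, word.length());
        }
        List<String> tokens = new ArrayList<String>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxLen);
            // Try the longest candidate first and shrink until a dictionary word matches.
            while (end > i + 1 && !dictionary.contains(text.substring(i, end))) {
                end--;
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] argv) {
        Set<String> base = new HashSet<String>(Arrays.asList("new", "york", "city"));
        System.out.println(segment("newyorkcity", base));

        // Simulates adding one entry to a custom dictionary such as ext.dic.
        Set<String> extended = new HashSet<String>(base);
        extended.add("newyork");
        System.out.println(segment("newyorkcity", extended));
    }
}
```

With only the base words the input is split into three tokens. Once the longer word is in the dictionary it is matched first and kept whole.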

Note that every change to the dictionary or the synonyms file requires a restart of the service.
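The effect of a comma-separated synonym group with ignoreCase="true" and expand="true" can be sketched in plain Java. This is a simplified model, not Solr's SynonymFilter. It handles only single-word, comma-separated groups and '#' comment lines, and it flattens the expanded terms into a list, whereas the real filter emits synonyms at the same token position. The class SynonymSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Simplified model of comma-separated synonym groups. NOT Solr's implementation. */
final class SynonymSketch {

    /** Maps every term of a group to the whole group (the expand="true" behaviour). */
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> synonyms = new HashMap<String, List<String>>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            List<String> group = new ArrayList<String>();
            for (String term : line.split(",")) {
                String normalized = term.trim().toLowerCase(Locale.ROOT); // ignoreCase
                if (!normalized.isEmpty()) {
                    group.add(normalized);
                }
            }
            for (String term : group) {
                synonyms.put(term, group);
            }
        }
        return synonyms;
    }

    /** Replaces each token that has synonyms with its whole group; other tokens pass through. */
    static List<String> expand(List<String> tokens, Map<String, List<String>> synonyms) {
        List<String> out = new ArrayList<String>();
        for (String token : tokens) {
            List<String> group = synonyms.get(token.toLowerCase(Locale.ROOT));
            if (group == null) {
                out.add(token);
            } else {
                out.addAll(group);
            }
        }
        return out;
    }

    public static void main(String[] argv) {
        Map<String, List<String>> synonyms =
                parse(Arrays.asList("# sample synonyms.txt content", "notifications,announcements"));
        System.out.println(expand(Arrays.asList("announcements", "board"), synonyms));
    }
}
```

In this model a query for either term of the group matches documents containing the other one, which is the behaviour the query analyzer in step 3 is configured for.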

Original address: http://www.cnblogs.com/wudi521/p/5558880.html
