mmseg4j is a good Chinese word segmenter, and integrating it with Solr is straightforward. The steps are as follows:
Step One: Download the mmseg4j jar packages. They are easy to find online; for example, here is a link on CSDN: http://download.csdn.net/detail/nrs12345/6986585
Step Two: Copy the three downloaded jar packages, mmseg4j-analysis-1.9.1.jar, mmseg4j-core-1.9.1.jar, and mmseg4j-solr-2.2.0.jar, to the webapps/solr/WEB-INF/lib directory under Tomcat.
Step Three: Locate the configuration file. In the Solr home directory, open the directory of one of the cores, for example core0, then open the schema.xml file under core0/conf.
Step Four: Insert the following definitions alongside the existing fieldType elements in schema.xml:
<fieldType name="textComplex" class="solr.TextField">
  <analyzer>
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="dic/"/>
  </analyzer>
</fieldType>
<fieldType name="textMaxWord" class="solr.TextField">
  <analyzer>
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word" dicPath="dic/"/>
  </analyzer>
</fieldType>
<fieldType name="textSimple" class="solr.TextField">
  <analyzer>
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple" dicPath="dic/"/>
  </analyzer>
</fieldType>
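A field type has no effect until a field in the same schema.xml refers to it. As a minimal sketch, assuming a hypothetical field named content (this name is not part of the original configuration), a declaration might look like the following. The type attribute must match the name of one of the fieldType definitions above exactly, including case.

```xml
<!-- Hypothetical field "content"; adjust the name to your own schema. -->
<!-- The type value must equal the name attribute of a fieldType defined above. -->
<field name="content" type="textComplex" indexed="true" stored="true"/>
```

Fields declared this way are tokenized with mmseg4j at index time and at query time, since the analyzer above does not distinguish between the two.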
Step Five: Unzip mmseg4j-all-1.8.4-with-dic.war into a temporary folder, then copy the three dictionary files chars.dic, units.dic, and words.dic from its data folder to the tomcat_home/solr_home/core0/dic directory. This is the directory that dicPath="dic/" in the configuration refers to.
Step Six: Start Tomcat and open http://localhost:8080/solr/admin/analysis.jsp. Select type in the Field drop-down, enter the name of the field type defined in schema.xml (for example textComplex) in the box next to it, paste some Chinese text into Field value, and click Analyze. The page then shows the mmseg4j segmentation result.
Solr Series II: The Integration of Solr and mmseg4j