Solr 6.5: Configuring the IKAnalyzer Chinese Word Segmenter and the pinyinAnalyzer Pinyin Segmenter (2)
The previous article, Installing and Configuring Solr 6.5 on CentOS 6 (1), covered installing Solr 6.5. This article explains how to create a Solr core and configure IKAnalyzer Chinese word segmentation and pinyin search.
1. Create a Core:
1. First, create a mycore directory under solrhome (the solrhome path was configured in Solr's web.xml; see Installing and Configuring Solr 6.5 on CentOS 6 (1)):
[root@localhost down]# mkdir /down/apache-tomcat-8.5.12/solrhome/mycore
[root@localhost down]# cd /down/apache-tomcat-8.5.12/solrhome/mycore
[root@localhost mycore]#
2. Copy all files under solr-6.5.0/example/example-DIH/solr/solr to the /down/apache-tomcat-8.5.12/solrhome/mycore directory:
[root@localhost mycore]# cp -R /down/solr-6.5.0/example/example-DIH/solr/solr/* ./
[root@localhost mycore]# ls
conf  core.properties
[root@localhost mycore]#
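The copied core.properties file controls the core's name and paths. If you want the core to appear in the admin UI as mycore, you can set the name explicitly; this is a minimal sketch, and the file copied from example-DIH may contain different values:

```
# core.properties - core discovery file for this core
# 'name' is optional; if omitted, Solr uses the directory name (mycore)
name=mycore
```

Solr discovers any directory containing a core.properties file under solrhome as a core at startup.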
3. Restart Tomcat:
[root@localhost down]# /down/apache-tomcat-8.5.12/bin/shutdown.sh
[root@localhost down]# /down/apache-tomcat-8.5.12/bin/startup.sh
4. Enter http://localhost:8080/solr/index.html in the browser to open the Solr admin interface.
2. Configure Solr's built-in Chinese word segmentation:
1. To enable Solr 6.5's built-in Chinese word segmentation, copy solr-6.5.0/contrib/analysis-extras/lucene-libs/lucene-analyzers-smartcn-6.5.0.jar to the apache-tomcat-8.5.12/webapps/solr/WEB-INF/lib/ directory:
[root@localhost down]# cp /down/solr-6.5.0/contrib/analysis-extras/lucene-libs/lucene-analyzers-smartcn-6.5.0.jar /down/apache-tomcat-8.5.12/webapps/solr/WEB-INF/lib/
2. Add Chinese word segmentation support to the core by editing the managed-schema file under conf in mycore:
[root@localhost conf]# cd /down/apache-tomcat-8.5.12/solrhome/mycore/conf
[root@localhost conf]# vi managed-schema
Add the following fieldType definition:
<fieldType name="text_smartcn" class="solr.TextField" positionIncrementGap="0">
  <analyzer type="index">
    <tokenizer class="org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory"/>
  </analyzer>
</fieldType>
Restart Tomcat and enter http://localhost:8080/solr/index.html#/mycore/analysis in the browser.
Enter some Chinese text in the Field Value (Index) box, select text_smartcn under Analyse Fieldname/FieldType, and view the segmentation result.
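Defining a fieldType only registers the analyzer; to index documents with it, a field must reference the type. A minimal sketch for managed-schema, assuming a hypothetical field name content_smartcn:

```
<!-- hypothetical field using the text_smartcn type defined above -->
<field name="content_smartcn" type="text_smartcn" indexed="true" stored="true"/>
```

Any document field named content_smartcn will then be tokenized by the smartcn analyzer at index and query time.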
3. Configure the Chinese word segmentation of IKAnalyzer:
1. Download IKAnalyzer (a build that works with Solr 6.5).
After decompression, there are four files:
[root@localhost ikanalyzer-solr5]# ls
ext.dic  IKAnalyzer.cfg.xml  ik-analyzer-solr5-5.x.jar  stopword.dic
ext.dic is the extended dictionary, stopword.dic is the stop-word dictionary, IKAnalyzer.cfg.xml is the configuration file, and ik-analyzer-solr5-5.x.jar is the word segmentation jar package.
2. Copy the three files IKAnalyzer.cfg.xml, ext.dic, and stopword.dic to the webapps/solr/WEB-INF/classes directory, then edit IKAnalyzer.cfg.xml:
[root@localhost ikanalyzer-solr5]# cp ext.dic IKAnalyzer.cfg.xml stopword.dic /down/apache-tomcat-8.5.12/webapps/solr/WEB-INF/classes/
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer Extension Configuration</comment>
  <!-- Configure your own extended dictionary here -->
  <entry key="ext_dict">ext.dic;</entry>
  <!-- Configure your own extended stop-word dictionary here -->
  <entry key="ext_stopwords">stopword.dic;</entry>
</properties>
3. Add your own custom terms to ext.dic.
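ext.dic is a plain-text file with one term per line, saved as UTF-8. A hypothetical example adding two domain terms so IK treats them as single tokens:

```
云计算
区块链
```

Terms listed here will no longer be split into their component characters or sub-words during analysis.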
4. Copy ik-analyzer-solr5-5.x.jar to the /down/apache-tomcat-8.5.12/webapps/solr/WEB-INF/lib/ directory:
[root@localhost down]# cp /down/ikanalyzer-solr5/ik-analyzer-solr5-5.x.jar /down/apache-tomcat-8.5.12/webapps/solr/WEB-INF/lib/
5. Add the following configuration to solrhome/mycore/conf/managed-schema, before the closing </schema> tag:
<!-- IK Chinese word segmentation -->
<fieldType name="text_ik" class="solr.TextField">
  <analyzer type="index" isMaxWordLength="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
  <analyzer type="query" isMaxWordLength="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>
Note: Remember to save stopword.dic and ext.dic as UTF-8 without BOM.
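As with text_smartcn, the new type only takes effect on fields that reference it. A sketch for managed-schema, assuming a hypothetical field name content_ik:

```
<!-- hypothetical field analyzed by IKAnalyzer at index and query time -->
<field name="content_ik" type="text_ik" indexed="true" stored="true"/>
```

You can then compare text_smartcn and text_ik side by side on the same input in the Analysis page.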
Restart Tomcat and check the word segmentation result on the Analysis page.
4. Configure pinyin search:
1. Preparation: you need two jar packages, pinyin4j-2.5.0.jar and pinyinAnalyzer.jar.
2. Copy pinyin4j-2.5.0.jar and pinyinAnalyzer.jar to the /down/apache-tomcat-8.5.12/webapps/solr/WEB-INF/lib/ directory:
[root@localhost down]# cp pinyin4j-2.5.0.jar pinyinAnalyzer4.3.1.jar /down/apache-tomcat-8.5.12/webapps/solr/WEB-INF/lib/
3. Add the following configuration to solrhome/mycore/conf/managed-schema, before the closing </schema> tag:
<fieldType name="text_pinyin" class="solr.TextField" positionIncrementGap="0">
  <analyzer type="index">
    <tokenizer class="org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory"/>
    <filter class="com.shentong.search.analyzers.PinyinTransformTokenFilterFactory" minTermLenght="2"/>
    <filter class="com.shentong.search.analyzers.PinyinNGramTokenFilterFactory" minGram="1" maxGram="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory"/>
    <filter class="com.shentong.search.analyzers.PinyinTransformTokenFilterFactory" minTermLenght="2"/>
    <filter class="com.shentong.search.analyzers.PinyinNGramTokenFilterFactory" minGram="1" maxGram="20"/>
  </analyzer>
</fieldType>
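To make pinyin search work against an existing Chinese field, text_pinyin is typically applied to a separate field that is populated automatically with copyField. A sketch with hypothetical field names:

```
<!-- hypothetical source field holding the original Chinese text -->
<field name="content" type="text_ik" indexed="true" stored="true"/>
<!-- pinyin shadow field; stored="false" since queries return the source field -->
<field name="content_pinyin" type="text_pinyin" indexed="true" stored="false"/>
<!-- copy the Chinese text into the pinyin field at index time -->
<copyField source="content" dest="content_pinyin"/>
```

Queries can then match content_pinyin with pinyin input (full spellings or initials, thanks to the n-gram filter) while displaying the original Chinese from content.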
Restart tomcat to view the pinyin search results.
Here we used Solr's built-in Chinese word segmentation (smartcn) together with pinyin4j.
Related files:
ikanalyzer-solr5.zip
pinyin.zip