Solr 6.5: Configure Chinese Word Segmentation with IKAnalyzer and Pinyin Word Segmentation with pinyinAnalyzer (2)


The previous article, Installing and Configuring Solr 6.5 on CentOS 6 (1), covered the Solr 6.5 installation. This article explains how to create a Solr core and how to configure IKAnalyzer Chinese word segmentation and pinyin search.

1. Create a Core:

1. First, create the mycore directory under solrhome (solrhome is the directory configured in the solr web.xml; see Installing and Configuring Solr 6.5 on CentOS 6 (1)):

[root@localhost down]# mkdir /down/apache-tomcat-8.5.12/solrhome/mycore
[root@localhost down]# cd /down/apache-tomcat-8.5.12/solrhome/mycore
[root@localhost mycore]#

2. Copy all files under solr-6.5.0/example/example-DIH/solr/solr to the /down/apache-tomcat-8.5.12/solrhome/mycore directory:

[root@localhost mycore]# cp -R /down/solr-6.5.0/example/example-DIH/solr/solr/* ./
[root@localhost mycore]# ls
conf  core.properties
[root@localhost mycore]#
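The core.properties file is what makes Solr treat this directory as a core (core discovery). The original article does not show its content; as a minimal sketch, assuming the core should simply be named after its directory, it can be as short as:

name=mycore

If the copied core.properties already sets a different name, adjusting it to mycore keeps it consistent with the /mycore/analysis URL used later in this article.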

3. Restart tomcat;

[root@localhost down]# /down/apache-tomcat-8.5.12/bin/shutdown.sh
[root@localhost down]# /down/apache-tomcat-8.5.12/bin/startup.sh

4. Enter http://localhost:8080/solr/index.html in the browser to open the Solr admin interface.

2. Configure the Chinese word segmentation that comes with Solr:

1. To configure the Chinese word segmentation that ships with Solr 6.5, copy solr-6.5.0/contrib/analysis-extras/lucene-libs/lucene-analyzers-smartcn-6.5.0.jar to the apache-tomcat-8.5.12/webapps/solr/WEB-INF/lib/ directory:

[root@localhost down]# cp /down/solr-6.5.0/contrib/analysis-extras/lucene-libs/lucene-analyzers-smartcn-6.5.0.jar /down/apache-tomcat-8.5.12/webapps/solr/WEB-INF/lib/

2. Add Chinese word segmentation support to the core by editing the managed-schema file under conf in mycore:

[root@localhost conf]# cd /down/apache-tomcat-8.5.12/solrhome/mycore/conf
[root@localhost conf]# vi managed-schema

Add the following fieldType:

<fieldType name="text_smartcn" class="solr.TextField" positionIncrementGap="0">    <analyzer type="index">      <tokenizer class="org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory"/>    </analyzer>    <analyzer type="query">       <tokenizer class="org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory"/>    </analyzer></fieldType>

Restart tomcat and enter http://localhost:8080/solr/index.html#/mycore/analysis in the browser.

Enter some Chinese text in the Field Value (Index) box, select text_smartcn under Analyse Fieldname/FieldType, and view the Chinese word segmentation result.

 

3. Configure IKAnalyzer Chinese word segmentation:

1. Download IKAnalyzer (ikanalyzer-solr5.zip, listed under Related files at the end of this article); this version supports the latest Solr 6.5.

After decompression, there will be four files.

[root@localhost ikanalyzer-solr5]# ls
ext.dic  IKAnalyzer.cfg.xml  ik-analyzer-solr5-5.x.jar  stopword.dic

ext.dic is the extended dictionary, stopword.dic is the stop-word dictionary, IKAnalyzer.cfg.xml is the configuration file, and ik-analyzer-solr5-5.x.jar is the word segmentation jar package.

2. Copy IKAnalyzer.cfg.xml, ext.dic, and stopword.dic from the folder to the apache-tomcat-8.5.12/webapps/solr/WEB-INF/classes directory (create the classes directory if it does not exist), and modify IKAnalyzer.cfg.xml as shown below:
[root@localhost ikanalyzer-solr5]# cp ext.dic IKAnalyzer.cfg.xml stopword.dic /down/apache-tomcat-8.5.12/webapps/solr/WEB-INF/classes/
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer Extension Configuration</comment>
    <!-- You can configure your own extended dictionary here -->
    <entry key="ext_dict">ext.dic;</entry>
    <!-- You can configure your own extended stop-word dictionary here -->
    <entry key="ext_stopwords">stopword.dic;</entry>
</properties>

3. Add your own extended terms to ext.dic.
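The original article does not show the dictionary format; as a sketch, ext.dic is a plain UTF-8 text file with one term per line, and the entries below are placeholders only:

云计算
分布式搜索

Terms listed here are kept as whole tokens by IKAnalyzer instead of being split further.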

4. Copy ik-analyzer-solr5-5.x.jar to the /down/apache-tomcat-8.5.12/webapps/solr/WEB-INF/lib/ directory:

[root@localhost down]# cp /down/ikanalyzer-solr5/ik-analyzer-solr5-5.x.jar /down/apache-tomcat-8.5.12/webapps/solr/WEB-INF/lib/

5. Add the following configuration to the solrhome/mycore/conf/managed-schema file, before the closing </schema> tag:

<!-- IK word segmentation I added -->
<fieldType name="text_ik" class="solr.TextField">
    <analyzer type="index" isMaxWordLength="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
    <analyzer type="query" isMaxWordLength="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>
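As with text_smartcn earlier, a field has to use the new type before the IK analyzer takes effect; a minimal sketch (content_ik is an example name, not from the original article):

<!-- example field using the IK fieldType; the name is hypothetical -->
<field name="content_ik" type="text_ik" indexed="true" stored="true"/>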

 

Note: remember to save stopword.dic and ext.dic as UTF-8 without BOM.
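One way to check the encoding and, if needed, convert it on CentOS is sketched below (this assumes the dictionaries were originally saved as GBK; adjust the source encoding and paths if yours differ):

[root@localhost classes]# file ext.dic stopword.dic
[root@localhost classes]# iconv -f GBK -t UTF-8 ext.dic -o ext.dic.utf8 && mv ext.dic.utf8 ext.dic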

Restart tomcat and check the word segmentation effect of text_ik on the Analysis page.

 

 
4. Configure pinyin search:

1. Preparation: two jar packages are needed, pinyin4j-2.5.0.jar and pinyinAnalyzer.jar (see pinyin.zip under Related files at the end of this article).

2. Copy pinyin4j-2.5.0.jar and pinyinAnalyzer.jar to the /down/apache-tomcat-8.5.12/webapps/solr/WEB-INF/lib/ directory:

[root@localhost down]# cp pinyin4j-2.5.0.jar pinyinAnalyzer4.3.1.jar /down/apache-tomcat-8.5.12/webapps/solr/WEB-INF/lib/

3. Add the following configuration to the solrhome/mycore/conf/managed-schema file, before the closing </schema> tag:

<fieldType name="text_pinyin" class="solr.TextField" positionIncrementGap="0">    <analyzer type="index">        <tokenizer class="org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory"/>        <filter class="com.shentong.search.analyzers.PinyinTransformTokenFilterFactory" minTermLenght="2" />        <filter class="com.shentong.search.analyzers.PinyinNGramTokenFilterFactory" minGram="1" maxGram="20" />    </analyzer>    <analyzer type="query">        <tokenizer class="org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory"/>        <filter class="com.shentong.search.analyzers.PinyinTransformTokenFilterFactory" minTermLenght="2" />        <filter class="com.shentong.search.analyzers.PinyinNGramTokenFilterFactory" minGram="1" maxGram="20" />    </analyzer></fieldType>

Restart tomcat to view the pinyin search results.

Here the pinyin configuration reuses the Chinese word segmentation that comes with Solr (smartcn) together with pinyin4j.

 

Related files:

ikanalyzer-solr5.zip

pinyin.zip
