Adding the IK Chinese Word Segmenter to Solr and Configuring a Custom Dictionary

Last Update:2015-05-08 Source: Internet

Author: User

Tags solr


Solr is a Lucene-based Java search server. It provides faceted search and hit highlighting, and supports multiple output formats (including XML/XSLT and JSON). It is easy to install and configure, and comes with an HTTP-based administration interface. Solr has been used on a number of large sites and is mature and stable. Solr wraps and extends Lucene, so it largely follows Lucene's terminology. More importantly, an index created by Solr is fully compatible with the Lucene search library. With appropriate configuration (and, in some cases, some coding), Solr can read and use indexes built by other Lucene applications. In addition, many Lucene tools (such as Nutch and Luke) can also use an index created by Solr.

Out of the box, Solr is not configured for Chinese word segmentation, so a Chinese word segmenter has to be configured manually. Here we use the IK Analyzer Chinese word segmenter.

IK Analyzer download: https://code.google.com/p/ik-analyzer/downloads/list

This article assumes that Solr has already been downloaded and unzipped. We use Solr 4.10.4.

Test environment: CentOS 6.5, JDK 1.7

Integration steps

1: Unzip the downloaded IKAnalyzer_2012_FF_hf1.zip package and copy IKAnalyzer2012FF_u1.jar to the solr-4.10.4/example/solr-webapp/webapp/WEB-INF/lib directory.

2: Create a classes directory under solr-4.10.4/example/solr-webapp/webapp/WEB-INF, then copy IKAnalyzer.cfg.xml and stopword.dic into the newly created classes directory.

3: Modify the Solr core's schema file (by default solr-4.10.4/example/solr/collection1/conf/schema.xml) and add the following configuration:

<!-- the fieldType name "text_ik" is an example; any unused name can be chosen -->
<fieldType name="text_ik" class="solr.TextField">
  <!-- analyzer used at index time -->
  <analyzer type="index" isMaxWordLength="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
  <!-- analyzer used at query time -->
  <analyzer type="query" isMaxWordLength="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>

4: Start Solr: bin/solr start

5: Open the Solr web interface at http://localhost:8983/solr and confirm that the configuration took effect.
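Steps 1 and 2 above amount to a few file copies, which can be sketched as a small shell function. This is only a sketch: the function name install_ik and its two arguments are made up for illustration, and the file names (IKAnalyzer2012FF_u1.jar, IKAnalyzer.cfg.xml, stopword.dic) are assumed to match what the unpacked IK Analyzer package contains.

```shell
#!/bin/sh
# Sketch of steps 1 and 2: copy the IK Analyzer jar and its config files
# into the Solr example webapp. install_ik is a hypothetical helper.
# Usage: install_ik <unpacked_ik_dir> <solr_dir>
install_ik() {
    ik_dir=$1      # directory where the IK Analyzer zip was unpacked
    solr_dir=$2    # Solr directory, e.g. solr-4.10.4
    webinf="$solr_dir/example/solr-webapp/webapp/WEB-INF"

    # The lib directory is expected to exist already; stop if it does not.
    if [ ! -d "$webinf/lib" ]; then
        echo "not found: $webinf/lib" >&2
        return 1
    fi

    # Step 1: the analyzer jar goes into WEB-INF/lib.
    cp "$ik_dir/IKAnalyzer2012FF_u1.jar" "$webinf/lib/" || return 1

    # Step 2: create WEB-INF/classes and copy the config and stop-word files.
    mkdir -p "$webinf/classes" || return 1
    cp "$ik_dir/IKAnalyzer.cfg.xml" "$ik_dir/stopword.dic" "$webinf/classes/" || return 1
}
```

With the package unpacked into a directory named ik next to the Solr directory, this would be called as install_ik ./ik ./solr-4.10.4 before starting Solr.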

Solr is now integrated with the IK Analyzer Chinese word segmenter.
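The result can also be checked from the command line instead of the web interface. The sketch below assumes that the example solrconfig.xml still registers the field analysis handler at /analysis/field, that the core is collection1, and that the field type name passed in (for example text_ik) matches the one used in schema.xml. The function name analyze_text is made up for illustration.

```shell
#!/bin/sh
# Ask Solr how a given field type tokenizes a piece of text.
# analyze_text is a hypothetical helper; the handler path and core name
# are assumptions based on the Solr 4.x example configuration.
# Usage: analyze_text <field_type> <text> [base_url]
analyze_text() {
    field_type=$1
    text=$2
    base_url=${3:-http://localhost:8983/solr/collection1}

    # -G turns the url-encoded parameters into a GET query string,
    # so Chinese text can be passed without encoding it by hand.
    curl -s -G "$base_url/analysis/field" \
        --data-urlencode "analysis.fieldtype=$field_type" \
        --data-urlencode "analysis.fieldvalue=$text" \
        --data-urlencode "wt=json"
}
```

Calling analyze_text text_ik "some Chinese sentence" should return a JSON document listing the tokens produced by the index-time analyzer, provided the handler is enabled.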

To make the IK word segmenter recognize custom words, you also need to configure an extension dictionary.

Operation Steps:

1: Modify the IKAnalyzer.cfg.xml configuration file in the solr-4.10.4/example/solr-webapp/webapp/WEB-INF/classes directory and add an entry that registers the extension dictionary file.

2: Create a new ext.dic file and add the following content (note: ext.dic must be encoded as UTF-8 without BOM, otherwise the custom dictionary will not be recognized):

Superman College

3: Restart Solr.

4: In the Solr web interface, analyze a text that contains the custom word. If the word is kept as a single token, the configuration succeeded.
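Steps 1 and 2 of the dictionary setup can be sketched in shell as follows. Several things here are assumptions rather than facts from the article: the helper names write_ext_dict and has_bom are invented, ext.dic is assumed to live in WEB-INF/classes next to IKAnalyzer.cfg.xml, and the ext_dict and ext_stopwords keys are the ones the IK Analyzer sample configuration is believed to use. Note that the sketch overwrites IKAnalyzer.cfg.xml with a minimal file instead of editing it in place.

```shell
#!/bin/sh
# Sketch of dictionary steps 1 and 2. write_ext_dict is a hypothetical helper.
# Usage: write_ext_dict <classes_dir> <term> [<term> ...]
write_ext_dict() {
    classes_dir=$1
    shift

    # One term per line. printf writes the bytes exactly as given,
    # so no byte order mark is added to the file.
    printf '%s\n' "$@" > "$classes_dir/ext.dic" || return 1

    # Register ext.dic alongside the stop-word dictionary.
    # The key names are assumed from the IK Analyzer sample config.
    cat > "$classes_dir/IKAnalyzer.cfg.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <entry key="ext_dict">ext.dic;</entry>
    <entry key="ext_stopwords">stopword.dic;</entry>
</properties>
EOF
}

# Succeeds if the file starts with the UTF-8 BOM bytes EF BB BF.
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}
```

If a dictionary was created in a desktop editor instead, running has_bom on it is a quick way to check whether a BOM was written before restarting Solr.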

For more information, visit http://bbs.superwu.cn or follow Superman Academy: Bj-crxy
