Lucene Full-text Search Word Breaker: Using the IK Analyzer Chinese Word Breaker (Modifying the IK Analyzer Source to Support Lucene 5.5.x)

Source: Internet
Author: User

Note: based on Lucene 5.5.x.

I. A brief introduction to IK Analyzer

IK Analyzer is the work of linliangyi2007, with thanks to him. His blog: http://linliangyi2007.iteye.com/

IK Analyzer supports two segmentation modes: finest-grained segmentation (recommended, and IK's default) and smart segmentation (in my tests, smart segmentation was not even as accurate as Lucene's built-in segmentation).

II. Solving IK Analyzer compatibility problems

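The current release of IKAnalyzer only supports Lucene 4.x and Solr 4.x, so we need to modify the IKAnalyzer source code to make it work with Lucene 5.5. One incompatibility is that Lucene 5 changed the analysis API: Analyzer.createComponents no longer receives a Reader, and a Tokenizer now gets its input through setReader instead of through its constructor.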

The Lucene 5.x-compatible version of IK Analyzer that I modified is available here: http://download.csdn.net/detail/eguid_1/9576005

Note: this is based on Lucene 5.5.2 in a JDK 1.7 environment; for Lucene 6.x, please use JDK 1.8. The Lucene 5.5.x API differs only slightly from earlier versions.


III. Why use a Chinese analyzer

Back to the topic: why use a Chinese word breaker? Lucene's built-in StandardAnalyzer does support Chinese, but its segmentation is not precise: it splits Chinese text into individual characters, so even obvious Chinese words are not recognized as words.
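To illustrate why single-character tokens are a problem, here is a rough, Lucene-free simulation. It assumes only that each Han character becomes its own token (which is how StandardAnalyzer treats Chinese ideographs); unlike StandardAnalyzer, this sketch simply drops non-Han characters, and matchesAll is a hypothetical helper that models a query requiring every query token, not Lucene's actual matching or scoring.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Rough simulation (not Lucene code) of character-level tokenization of Chinese. */
class UnigramDemo {
    /** Every Han character becomes its own token; non-Han characters are dropped in this sketch. */
    static List<String> unigrams(String text) {
        List<String> out = new ArrayList<>();
        text.codePoints()
            .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN)
            .forEach(cp -> out.add(new String(Character.toChars(cp))));
        return out;
    }

    /** Hypothetical matcher: true if every query token occurs somewhere in the document's tokens. */
    static boolean matchesAll(String query, String doc) {
        Set<String> docTokens = new HashSet<>(unigrams(doc));
        return docTokens.containsAll(unigrams(query));
    }

    public static void main(String[] args) {
        System.out.println(unigrams("北京大学")); // [北, 京, 大, 学]
        // The document mentions 北方 (the north) and 南京 (Nanjing), not 北京 (Beijing),
        // yet a character-level query for 北京 still matches it.
        System.out.println(matchesAll("北京", "他从北方来到南京")); // true
    }
}
```

A dictionary-based word breaker such as IK keeps 北京 as one token, so this kind of false match does not arise from tokenization alone.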


IV. How to use the Chinese analyzer

I pulled the analyzer out into its own class and handle it independently (the big benefit is that I can easily switch to a new word breaker).

The rest of the source code stays completely unchanged; only the analyzer service (analyzerserv) needs to change.
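The benefit of isolating the analyzer is that indexing and searching both obtain it from one place, so swapping word breakers is a one-line change and both sides are guaranteed to use the same one. The author's analyzerserv code is not shown in this article, so the following is only a minimal sketch of the idea: AnalyzerServ, WordBreaker, IndexingCode and SearchCode are hypothetical names, and WordBreaker stands in for Lucene's Analyzer so the sketch runs without Lucene on the classpath. In a real project, getAnalyzer would return an Analyzer such as new IKAnalyzer(false).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for org.apache.lucene.analysis.Analyzer. */
interface WordBreaker {
    List<String> tokenize(String text);
}

/** Hypothetical analyzer service: the only place that decides which word breaker is used. */
final class AnalyzerServ {
    private AnalyzerServ() {}

    static WordBreaker getAnalyzer() {
        // Change only this body to switch word breakers for indexing and searching alike.
        // A trivial whitespace breaker stands in for IKAnalyzer here.
        return text -> {
            List<String> out = new ArrayList<>();
            for (String t : text.trim().split("\\s+")) {
                if (!t.isEmpty()) out.add(t);
            }
            return out;
        };
    }
}

/** Hypothetical callers: neither one names a concrete word breaker. */
class IndexingCode {
    static List<String> index(String doc) { return AnalyzerServ.getAnalyzer().tokenize(doc); }
}

class SearchCode {
    static List<String> parse(String query) { return AnalyzerServ.getAnalyzer().tokenize(query); }
}

class AnalyzerServDemo {
    public static void main(String[] args) {
        System.out.println(IndexingCode.index("full text search")); // [full, text, search]
        System.out.println(SearchCode.parse(" text  search "));      // [text, search]
    }
}
```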

By default, the IK word breaker uses three configuration files:

ext.dic (extension dictionary);

IKAnalyzer.cfg.xml (configures the extension dictionaries and stop-word dictionaries);

stopword.dic (stop words).
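As far as I know, IKAnalyzer.cfg.xml is a standard Java properties XML file in which the keys ext_dict and ext_stopwords list dictionary files separated by semicolons, and the .dic files are plain UTF-8 text with one word per line. The sketch below reads such a file with java.util.Properties.loadFromXML; the sample XML and the IkConfigSketch class are assumptions for illustration, not copied from the IK distribution.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Sketch: read an IKAnalyzer.cfg.xml-style file (assumed keys: ext_dict, ext_stopwords). */
class IkConfigSketch {
    // Assumed sample contents of IKAnalyzer.cfg.xml.
    static final String SAMPLE_CFG =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
        + "<properties>\n"
        + "  <comment>IK Analyzer extension configuration</comment>\n"
        + "  <entry key=\"ext_dict\">ext.dic;</entry>\n"
        + "  <entry key=\"ext_stopwords\">stopword.dic;</entry>\n"
        + "</properties>\n";

    /** Parses properties XML; I/O errors are rethrown unchecked to keep callers simple. */
    static Properties load(String xml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Splits a ';'-separated list of dictionary paths, ignoring blank entries. */
    static List<String> paths(Properties props, String key) {
        List<String> out = new ArrayList<>();
        String value = props.getProperty(key);
        if (value == null) return out;
        for (String p : value.split(";")) {
            if (!p.trim().isEmpty()) out.add(p.trim());
        }
        return out;
    }

    public static void main(String[] args) {
        Properties props = load(SAMPLE_CFG);
        System.out.println(paths(props, "ext_dict"));      // [ext.dic]
        System.out.println(paths(props, "ext_stopwords")); // [stopword.dic]
    }
}
```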


(1) At indexing time:

false = finest-grained segmentation; true = smart segmentation


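import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.wltea.analyzer.lucene.IKAnalyzer;

Analyzer analyzer = new IKAnalyzer(false);
IndexWriterConfig config = new IndexWriterConfig(analyzer);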


(2) At search time:

false = finest-grained segmentation; true = smart segmentation


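import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.util.QueryBuilder;
import org.wltea.analyzer.lucene.IKAnalyzer;

Analyzer analyzer = new IKAnalyzer(false);
QueryBuilder parser = new QueryBuilder(analyzer);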
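The false/true flag passed to IKAnalyzer selects between the two segmentation modes described at the start of this article. To give a rough feel for the difference, here is a toy dictionary segmenter in plain Java. It is NOT IK's algorithm: here "fine-grained" simply emits every dictionary word found at every position (overlaps allowed), and "smart" is plain forward maximum matching, whereas IK's real smart mode does its own ambiguity resolution. The ToySegmenter class and the tiny dictionary are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy dictionary segmenter (not IK's implementation) contrasting two segmentation styles. */
class ToySegmenter {
    private final Set<String> dict;
    private final int maxLen; // length of the longest dictionary word

    ToySegmenter(Set<String> dict) {
        this.dict = dict;
        int m = 1;
        for (String w : dict) m = Math.max(m, w.length());
        this.maxLen = m;
    }

    /** "Fine-grained": every dictionary word at every start position, overlaps allowed. */
    List<String> fineGrained(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (int j = i + 1; j <= Math.min(text.length(), i + maxLen); j++) {
                String cand = text.substring(i, j);
                if (dict.contains(cand)) out.add(cand);
            }
        }
        return out;
    }

    /** "Smart": forward maximum matching, no overlaps; unknown characters become 1-char tokens. */
    List<String> smart(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = i + 1; // fall back to a single character
            for (int j = Math.min(text.length(), i + maxLen); j > i + 1; j--) {
                if (dict.contains(text.substring(i, j))) { end = j; break; }
            }
            out.add(text.substring(i, end));
            i = end;
        }
        return out;
    }
}

class ToySegmenterDemo {
    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList(
                "中华", "中华人民", "人民", "共和", "共和国", "中华人民共和国"));
        ToySegmenter seg = new ToySegmenter(dict);
        System.out.println(seg.fineGrained("中华人民共和国"));
        // [中华, 中华人民, 中华人民共和国, 人民, 共和, 共和国]
        System.out.println(seg.smart("中华人民共和国"));
        // [中华人民共和国]
    }
}
```

The fine-grained output contains more, overlapping terms, which tends to help recall; the smart output keeps only the longest match, which is why finest-grained is the safer default for indexing.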

