This article describes how to use IKAnalyzer (hereinafter "IK") in Lucene; the background and features of Lucene and the IK word segmenter are not covered here. I have to marvel at how fast Lucene versions change — it is already at 4.9.0. I believe this pace is inevitable for any growing technology. This article uses Lucene 4.0, and the IK version is IK Analyzer 2012FF.
Lucene can be downloaded from its official website. IK can be downloaded from:
http://code.google.com/p/ik-analyzer/downloads/list
After downloading IK, copy it into the project. The directory structure is shown in the figure:
The src directory contains three configuration files: the extended dictionary file ext.dic, the stopword dictionary file stopword.dic, and IKAnalyzer.cfg.xml. IKAnalyzer.cfg.xml specifies the paths of the extended dictionary file and the stopword dictionary file. By default, IKAnalyzer.cfg.xml is loaded from the root of the classpath; you can modify the source code to change its location.
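As a reference, a minimal IKAnalyzer.cfg.xml might look like the sketch below. The `ext_dict` and `ext_stopwords` entry keys are the ones used by IK Analyzer 2012FF; the dictionary file names here are just the ones mentioned above and can be replaced with your own (multiple files are separated by semicolons):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extended configuration</comment>
    <!-- extended dictionary files, relative to the classpath root -->
    <entry key="ext_dict">ext.dic;</entry>
    <!-- extended stopword dictionary files -->
    <entry key="ext_stopwords">stopword.dic;</entry>
</properties>
```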
Using IK in a program is simple: you only need to create an IKAnalyzer object, because IKAnalyzer extends Lucene's Analyzer.
IK's no-argument constructor uses the fine-grained splitting algorithm by default:
Analyzer analyzer = new IKAnalyzer(); // fine-grained splitting
You can also enable the smart splitting algorithm via a constructor argument:
Analyzer analyzer = new IKAnalyzer(true); // smart splitting
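To see what the two modes actually produce, you can feed the analyzer a sample string and print the tokens it emits. The sketch below assumes the Lucene 4.0 and IK Analyzer 2012FF jars are on the classpath; it uses Lucene's standard TokenStream consumption workflow (reset / incrementToken / end / close) with CharTermAttribute to read each token's text:

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class SplitModeDemo {

    // Prints every token the analyzer produces for the given text
    static void printTokens(Analyzer analyzer, String text) throws IOException {
        TokenStream ts = analyzer.tokenStream("text", new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.print("[" + term.toString() + "] ");
        }
        ts.end();
        ts.close();
        System.out.println();
    }

    public static void main(String[] args) throws IOException {
        String text = "IK Analyzer是一个开源的中文分词工具包";
        printTokens(new IKAnalyzer(), text);     // fine-grained: more, possibly overlapping tokens
        printTokens(new IKAnalyzer(true), text); // smart: fewer tokens after disambiguation
    }
}
```

Fine-grained mode typically outputs more (and overlapping) candidate terms, which favors recall at index time; smart mode disambiguates to fewer terms, which is often preferred for parsing queries.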
Demo example:
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;

/**
 * Demo of using IKAnalyzer for Lucene indexing and search.
 * 2012-3-2
 * Written against the Lucene 4.0 API.
 */
public class LuceneIndexAndSearchDemo {

    /**
     * Simulation: create an index for a single record and search for it.
     * @param args
     */
    public static void main(String[] args) {
        // Lucene document field name
        String fieldName = "text";
        // Content to index
        String text = "IK Analyzer is an open-source Chinese word segmentation toolkit that combines dictionary-based and grammar-based segmentation. It uses a new forward-iteration fine-grained splitting algorithm.";

        // Instantiate IKAnalyzer (smart mode)
        Analyzer analyzer = new IKAnalyzer(true);

        Directory directory = null;
        IndexWriter iwriter = null;
        IndexReader ireader = null;
        IndexSearcher isearcher = null;
        try {
            // Create an in-memory index
            directory = new RAMDirectory();

            // Configure the IndexWriter
            IndexWriterConfig iwConfig = new IndexWriterConfig(Version.LUCENE_40, analyzer);
            iwConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
            iwriter = new IndexWriter(directory, iwConfig);

            // Write a document to the index
            Document doc = new Document();
            doc.add(new StringField("ID", "10000", Field.Store.YES));
            doc.add(new TextField(fieldName, text, Field.Store.YES));
            iwriter.addDocument(doc);
            iwriter.close();

            // ********** Search process **********
            // Instantiate the searcher
            ireader = DirectoryReader.open(directory);
            isearcher = new IndexSearcher(ireader);

            String keyword = "Chinese word segmentation toolkit";
            // Use QueryParser with the analyzer to construct a Query object
            QueryParser qp = new QueryParser(Version.LUCENE_40, fieldName, analyzer);
            qp.setDefaultOperator(QueryParser.AND_OPERATOR);
            Query query = qp.parse(keyword);
            System.out.println("Query = " + query);

            // Retrieve the 5 records with the highest similarity
            TopDocs topDocs = isearcher.search(query, 5);
            System.out.println("Hits: " + topDocs.totalHits);
            // Print the results
            ScoreDoc[] scoreDocs = topDocs.scoreDocs;
            for (int i = 0; i < topDocs.totalHits; i++) {
                Document targetDoc = isearcher.doc(scoreDocs[i].doc);
                System.out.println("Content: " + targetDoc.toString());
            }
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        } finally {
            if (ireader != null) {
                try {
                    ireader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            if (directory != null) {
                try {
                    directory.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
As the code shows, IK really is easy to use. The code for this example is in the org/wltea/analyzer/sample package of the IK distribution. For more information about Lucene, see the following article:
http://www.52jialy.com/article/showArticle?articleId=402881e546d8b14b0146d8e638640008
Notes on using IKAnalyzer for Chinese word segmentation in Lucene