Lucene uses IKAnalyzer Chinese word segmentation notes

Source: Internet
Author: User

This article describes how to use IKAnalyzer (hereinafter "IK") with Lucene. The background and features of Lucene and the IK tokenizer are not covered here. I have to marvel at how quickly Lucene versions have changed; it has now reached 4.9.0, a process that is probably inevitable as any technology develops and matures. This article uses Lucene 4.0 and IKAnalyzer 2012FF.

Lucene can be downloaded from its official website. IK is available here:

http://code.google.com/p/ik-analyzer/downloads/list

After the IK download is complete, copy the jar and its configuration files into the project's classpath.

The src directory contains three configuration files: the extended dictionary ext.dic, the stopword dictionary stopword.dic, and IKAnalyzer.cfg.xml. IKAnalyzer.cfg.xml specifies the paths of the extended dictionary file and the stopword dictionary file. By default, IKAnalyzer.cfg.xml is loaded from the root of the classpath; you can modify the source code to change its location.
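For reference, the IKAnalyzer.cfg.xml shipped with IK Analyzer 2012 is a standard Java properties XML file along the following lines. The ext_dict and ext_stopwords entries point at the dictionary files named above; treat the exact contents as a sketch and check the file in your own download:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- extension dictionaries, relative to the classpath root; multiple files separated by ';' -->
    <entry key="ext_dict">ext.dic;</entry>
    <!-- extension stopword dictionaries -->
    <entry key="ext_stopwords">stopword.dic;</entry>
</properties>
```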

Using IK in a program is simple: you only need to create an IKAnalyzer object, because IKAnalyzer extends Lucene's Analyzer.

The no-argument IK constructor uses the fine-grained splitting algorithm by default:

Analyzer analyzer = new IKAnalyzer(); // fine-grained splitting

You can also pass a constructor argument to enable the smart splitting algorithm:

Analyzer analyzer = new IKAnalyzer(true); // smart splitting
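To see what the two modes actually produce, you can run the same sentence through both analyzers and print the tokens. The sketch below assumes the Lucene 4.0 and IKAnalyzer 2012FF jars are on the classpath; IKModeDemo and printTokens are illustrative names, not part of either library:

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class IKModeDemo {

    // consume a TokenStream and print each term, following the
    // reset() / incrementToken() / end() / close() contract
    static void printTokens(Analyzer analyzer, String text) throws IOException {
        TokenStream ts = analyzer.tokenStream("text", new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.print(term.toString() + " | ");
        }
        ts.end();
        ts.close();
        System.out.println();
    }

    public static void main(String[] args) throws IOException {
        String text = "IK Analyzer是一个开源的中文分词工具包";
        // fine-grained mode: emits more (possibly overlapping) tokens
        printTokens(new IKAnalyzer(), text);
        // smart mode: disambiguates and emits fewer, longer tokens
        printTokens(new IKAnalyzer(true), text);
    }
}
```

In general the fine-grained mode favors recall (good for indexing), while the smart mode favors precision (often preferred for query analysis).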

 

Demo example:

    import java.io.IOException;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.CorruptIndexException;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.IndexWriterConfig.OpenMode;
    import org.apache.lucene.queryparser.classic.ParseException;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.LockObtainFailedException;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;
    import org.wltea.analyzer.lucene.IKAnalyzer;

    /**
     * Demo of using IKAnalyzer for Lucene indexing and querying.
     * 2012-3-2
     * Written against the Lucene 4.0 API.
     */
    public class LuceneIndexAndSearchDemo {

        /**
         * Simulation: create an index for a single record and search for it.
         * @param args
         */
        public static void main(String[] args) {
            // Lucene document field name
            String fieldName = "text";
            // content to index
            String text = "IK Analyzer is an open-source Chinese word segmentation toolkit "
                    + "that combines dictionary-based and grammar-based segmentation. "
                    + "It uses a new forward-iteration, fine-grained splitting algorithm.";

            // instantiate IKAnalyzer (true = smart mode)
            Analyzer analyzer = new IKAnalyzer(true);

            Directory directory = null;
            IndexWriter iwriter = null;
            IndexReader ireader = null;
            IndexSearcher isearcher = null;
            try {
                // create an in-memory index
                directory = new RAMDirectory();

                // configure the IndexWriterConfig
                IndexWriterConfig iwConfig = new IndexWriterConfig(Version.LUCENE_40, analyzer);
                iwConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
                iwriter = new IndexWriter(directory, iwConfig);

                // write the index
                Document doc = new Document();
                doc.add(new StringField("ID", "10000", Field.Store.YES));
                doc.add(new TextField(fieldName, text, Field.Store.YES));
                iwriter.addDocument(doc);
                iwriter.close();

                // ********** search process **********
                // instantiate the searcher
                ireader = DirectoryReader.open(directory);
                isearcher = new IndexSearcher(ireader);

                String keyword = "Chinese word segmentation toolkit";
                // use QueryParser with the analyzer to construct a Query object
                QueryParser qp = new QueryParser(Version.LUCENE_40, fieldName, analyzer);
                qp.setDefaultOperator(QueryParser.AND_OPERATOR);
                Query query = qp.parse(keyword);
                System.out.println("Query = " + query);

                // search for the 5 highest-scoring records
                TopDocs topDocs = isearcher.search(query, 5);
                System.out.println("Hits: " + topDocs.totalHits);

                // output the results
                ScoreDoc[] scoreDocs = topDocs.scoreDocs;
                for (int i = 0; i < topDocs.totalHits; i++) {
                    Document targetDoc = isearcher.doc(scoreDocs[i].doc);
                    System.out.println("Content: " + targetDoc.toString());
                }
            } catch (CorruptIndexException e) {
                e.printStackTrace();
            } catch (LockObtainFailedException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } catch (ParseException e) {
                e.printStackTrace();
            } finally {
                if (ireader != null) {
                    try {
                        ireader.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
                if (directory != null) {
                    try {
                        directory.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }
    }

As the code shows, IK is really easy to use. The code for this example ships with IK in the org/wltea/analyzer/sample package. For more information about Lucene, see the following article:

http://www.52jialy.com/article/showArticle?articleId=402881e546d8b14b0146d8e638640008
