This article describes how to use IKAnalyzer (hereinafter "IK") in Lucene; the background and features of Lucene and the IK word segmenter are not covered here. I have to marvel at how fast Lucene versions change — it is already at 4.9.0. I believe this pace is inevitable for any growing technology. This article uses Lucene 4.0, and the IK version is IK Analyzer 2012FF.
Lucene can be downloaded from its official website. IK can be downloaded from:
http://code.google.com/p/ik-analyzer/downloads/list
After downloading IK, copy it into the project. The directory structure is shown in the figure:
The src directory contains three configuration files: the extended dictionary file ext.dic, the stopword dictionary file stopword.dic, and IKAnalyzer.cfg.xml. IKAnalyzer.cfg.xml specifies the paths of the extended dictionary file and the stopword dictionary file. By default, IKAnalyzer.cfg.xml is loaded from the root of the classpath; you can modify the source code to change its location.
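As a reference, a minimal IKAnalyzer.cfg.xml might look like the sketch below. The `ext_dict` and `ext_stopwords` entry keys are the ones used by IK Analyzer 2012FF; the dictionary file names here are just the ones mentioned above and can be replaced with your own (multiple files are separated by semicolons):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extended configuration</comment>
    <!-- extended dictionary files, relative to the classpath root -->
    <entry key="ext_dict">ext.dic;</entry>
    <!-- extended stopword dictionary files -->
    <entry key="ext_stopwords">stopword.dic;</entry>
</properties>
```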
Using IK in a program is simple: you only need to create an IKAnalyzer object, because IKAnalyzer extends Lucene's Analyzer.
IK's no-argument constructor uses the fine-grained splitting algorithm by default:
Analyzer analyzer = new IKAnalyzer(); // fine-grained splitting
You can also enable the smart splitting algorithm via a constructor argument:
Analyzer analyzer = new IKAnalyzer(true); // smart splitting
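To see what the two modes actually produce, you can feed the analyzer a sample string and print the tokens it emits. The sketch below assumes the Lucene 4.0 and IK Analyzer 2012FF jars are on the classpath; it uses Lucene's standard TokenStream consumption workflow (reset / incrementToken / end / close) with CharTermAttribute to read each token's text:

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class SplitModeDemo {

    // Prints every token the analyzer produces for the given text
    static void printTokens(Analyzer analyzer, String text) throws IOException {
        TokenStream ts = analyzer.tokenStream("text", new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.print("[" + term.toString() + "] ");
        }
        ts.end();
        ts.close();
        System.out.println();
    }

    public static void main(String[] args) throws IOException {
        String text = "IK Analyzer是一个开源的中文分词工具包";
        printTokens(new IKAnalyzer(), text);     // fine-grained: more, possibly overlapping tokens
        printTokens(new IKAnalyzer(true), text); // smart: fewer tokens after disambiguation
    }
}
```

Fine-grained mode typically outputs more (and overlapping) candidate terms, which favors recall at index time; smart mode disambiguates to fewer terms, which is often preferred for parsing queries.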
Demo example:
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;

/**
 * Demo of using IKAnalyzer for Lucene indexing and search.
 * 2012-3-2
 * Written against the Lucene 4.0 API.
 */
public class LuceneIndexAndSearchDemo {

    /**
     * Simulation: create an index for a single record and search for it.
     * @param args
     */
    public static void main(String[] args) {
        // Lucene document field name
        String fieldName = "text";
        // Content to index
        String text = "IK Analyzer is an open-source Chinese word segmentation toolkit that combines dictionary-based and grammar-based segmentation. It uses a new forward-iteration fine-grained splitting algorithm.";

        // Instantiate IKAnalyzer (smart mode)
        Analyzer analyzer = new IKAnalyzer(true);

        Directory directory = null;
        IndexWriter iwriter = null;
        IndexReader ireader = null;
        IndexSearcher isearcher = null;
        try {
            // Create an in-memory index
            directory = new RAMDirectory();

            // Configure the IndexWriter
            IndexWriterConfig iwConfig = new IndexWriterConfig(Version.LUCENE_40, analyzer);
            iwConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
            iwriter = new IndexWriter(directory, iwConfig);

            // Write a document to the index
            Document doc = new Document();
            doc.add(new StringField("ID", "10000", Field.Store.YES));
            doc.add(new TextField(fieldName, text, Field.Store.YES));
            iwriter.addDocument(doc);
            iwriter.close();

            // ********** Search process **********
            // Instantiate the searcher
            ireader = DirectoryReader.open(directory);
            isearcher = new IndexSearcher(ireader);

            String keyword = "Chinese word segmentation toolkit";
            // Use QueryParser with the analyzer to construct a Query object
            QueryParser qp = new QueryParser(Version.LUCENE_40, fieldName, analyzer);
            qp.setDefaultOperator(QueryParser.AND_OPERATOR);
            Query query = qp.parse(keyword);
            System.out.println("Query = " + query);

            // Retrieve the 5 records with the highest similarity
            TopDocs topDocs = isearcher.search(query, 5);
            System.out.println("Hits: " + topDocs.totalHits);
            // Print the results
            ScoreDoc[] scoreDocs = topDocs.scoreDocs;
            for (int i = 0; i < topDocs.totalHits; i++) {
                Document targetDoc = isearcher.doc(scoreDocs[i].doc);
                System.out.println("Content: " + targetDoc.toString());
            }
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        } finally {
            if (ireader != null) {
                try {
                    ireader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            if (directory != null) {
                try {
                    directory.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
As the code shows, IK really is easy to use. The code for this example is in the org/wltea/analyzer/sample package of the IK distribution. For more information about Lucene, see the following article:
http://www.52jialy.com/article/showArticle?articleId=402881e546d8b14b0146d8e638640008
Notes on using IKAnalyzer for Chinese word segmentation in Lucene