An Illustrated Guide to Chinese Word Segmentation with Lucene

Source: Internet
Author: User

This document records how to use Lucene together with the Paoding analyzer for Chinese word segmentation.

First, download Lucene (official site: http://archive.apache.org/dist/lucene/java/). This article uses version 2.9.4. After downloading and unzipping, Lucene needs the following basic jar files:
lucene-core-2.9.4.jar — Lucene core
lucene-analyzers-2.9.4.jar — Lucene analyzers (tokenizers)
lucene-highlighter-2.9.4.jar — Lucene highlighting

Second, because Lucene's built-in Chinese tokenization cannot achieve what we need, download the third-party Paoding package (named after the idiom "Paoding carves the ox"; official site: http://code.google.com/p/paoding/). The latest version is paoding-analysis-2.0.4-beta.zip. After unzipping, Lucene needs the following Paoding files:
paoding-analysis.jar — the jar Lucene needs for Chinese segmentation
commons-logging.jar — logging
{paoding_home}/dic — the dictionary directory ({paoding_home} is the directory Paoding was extracted into)

Third, open Eclipse and create a Java project (neither the project name nor the project path may contain spaces). In this example the project is named paoding.
1_1: Create a folder named lib in the paoding project (to hold all the jars), copy the jar files listed above into it, and add every jar under lib to the project classpath.
1_2: Copy the {paoding_home}/dic directory into the paoding project's src folder. The full project layout is shown in the diagram below.

Fourth, create the TestFileIndex.java class. It reads every file matching D:\data\*.txt into memory and writes an index to the index directory (D:\luceneindex).

TestFileIndex.java

package com.lixing.paoding.index;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;

import net.paoding.analysis.analyzer.PaodingAnalyzer;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class TestFileIndex {
    public static void main(String[] args) throws Exception {
        String dataDir = "D:/data";
        String indexDir = "D:/luceneindex";

        File[] files = new File(dataDir).listFiles();
        System.out.println(files.length);

        Analyzer analyzer = new PaodingAnalyzer();
        Directory dir = FSDirectory.open(new File(indexDir));
        IndexWriter writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);

        for (int i = 0; i < files.length; i++) {
            StringBuffer strBuffer = new StringBuffer();
            String line = "";
            FileInputStream is = new FileInputStream(files[i].getCanonicalPath());
            // The source files are GB2312-encoded, so decode them explicitly.
            BufferedReader reader = new BufferedReader(new InputStreamReader(is, "gb2312"));
            line = reader.readLine();
            while (line != null) {
                strBuffer.append(line);
                strBuffer.append("\n");
                line = reader.readLine();
            }

            Document doc = new Document();
            doc.add(new Field("FileName", files[i].getName(), Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("Contents", strBuffer.toString(), Field.Store.YES, Field.Index.ANALYZED));
            writer.addDocument(doc);
            reader.close();
            is.close();
        }

        writer.optimize();
        writer.close();
        dir.close();
        System.out.println("ok");
    }
}

Fifth, create TestFileSearcher.java. Its function is to search the index and read back the indexed contents.

TestFileSearcher.java

package com.lixing.paoding.index;

import java.io.File;

import net.paoding.analysis.analyzer.PaodingAnalyzer;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class TestFileSearcher {
    public static void main(String[] args) throws Exception {
        String indexDir = "D:/luceneindex";
        Analyzer analyzer = new PaodingAnalyzer();
        Directory dir = FSDirectory.open(new File(indexDir));
        IndexSearcher searcher = new IndexSearcher(dir, true);
        QueryParser parser = new QueryParser(Version.LUCENE_29, "Contents", analyzer);
        // In the original article the query string is a Chinese word
        // (rendered in translation as "Cry for Help").
        Query query = parser.parse("Cry for Help");
        // Term term = new Term("FileName", "university");
        // TermQuery query = new TermQuery(term);

        TopDocs docs = searcher.search(query, 1000);
        ScoreDoc[] hits = docs.scoreDocs;
        System.out.println(hits.length);
        for (int i = 0; i < hits.length; i++) {
            Document doc = searcher.doc(hits[i].doc);
            System.out.print(doc.get("FileName") + "--:\n");
            System.out.println(doc.get("Contents") + "\n");
        }

        searcher.close();
        dir.close();
    }
}
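A note on the dictionary directory: step 1_2 above copies {paoding_home}/dic into src so that the dictionaries end up on the classpath. Paoding can also be pointed at the dictionary directory explicitly, either through the PAODING_DIC_HOME environment variable or through a paoding-dic-home.properties file on the classpath. A minimal sketch of that properties file, assuming the dictionaries sit in a dic folder under the project (the exact path is your choice):

# paoding-dic-home.properties — place on the classpath (e.g. in src/)
# Tells Paoding where its dictionary directory lives; the
# PAODING_DIC_HOME environment variable is an alternative.
paoding.dic.home=dic

If neither is set and the dic folder is not found, PaodingAnalyzer will fail at construction time complaining that the dictionary home cannot be located, so it is worth configuring this before running TestFileIndex.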

This article is from the "Li Xin Blog"; please keep this source link: http://kinglixing.blog.51cto.com/3421535/702663
