Helloworld for Lucene full-text search

Source: Internet
Author: User

1. Download Lucene 4.4 and decompress it.
2. Create a Java project named HelloLucene.
3. Create a new lib folder and copy the required jar files into it. The jar files required for this project are as follows:


[Figure]

Add these jar files to the build path.
3. Create a new package com.njupt.zhb and a new class, HelloLucene.java. The code is as follows:

[Java code]

package com.njupt.zhb;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

/*
 * @author: ZhengHaibo
 * web:     http://blog.csdn.net/nuptboyzhb
 * mail:    zhb931706659@126.com
 * 2013-7-05 Nanjing, njupt, China
 */
public class HelloLucene {

    /**
     * Index all text files under a directory.
     * String indexPath = "index"; // path for storing the index
     * String docsPath = "";       // path of the documents to be indexed
     */
    public void index(String indexPath, String docsPath) {
        try {
            // 1. Create the Directory
            Directory dir = FSDirectory.open(new File(indexPath)); // stored on the hard disk
            // 2. Create the IndexWriter
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_44);
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44, analyzer);
            iwc.setOpenMode(OpenMode.CREATE_OR_APPEND); // create the index, or append to an existing one
            IndexWriter writer = new IndexWriter(dir, iwc);
            final File docDir = new File(docsPath);
            indexDocs(writer, docDir);
            writer.close();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    public void indexDocs(IndexWriter writer, File file) throws IOException {
        if (file.canRead()) {
            if (file.isDirectory()) {
                // If it is a folder, traverse all the files in it
                String[] files = file.list();
                // an IO error could occur
                if (files != null) {
                    for (int i = 0; i < files.length; i++) {
                        indexDocs(writer, new File(file, files[i]));
                    }
                }
            } else {
                // If it is a file
                FileInputStream fis;
                try {
                    fis = new FileInputStream(file);
                } catch (FileNotFoundException fnfe) {
                    return;
                }
                try {
                    // 3. Create the Document object
                    Document doc = new Document();
                    // 4. Add fields
                    // Add the path of the file as a field named "path". Use a
                    // field that is indexed (i.e. searchable), but don't tokenize
                    // the field into separate words and don't index term
                    // frequency or positional information:
                    Field pathField = new StringField("path", file.getPath(), Field.Store.YES);
                    doc.add(pathField); // add it to the document
                    // Create an index field from the file name
                    doc.add(new StringField("filename", file.getName(), Field.Store.YES));
                    // Add the last modified date of the file as a field named
                    // "modified". Use a LongField that is indexed (i.e. efficiently
                    // filterable with NumericRangeFilter). This indexes to
                    // millisecond resolution, which is often too fine. You could
                    // instead create a number based on
                    // year/month/day/hour/minutes/seconds, down to the resolution
                    // you require. For example, the long value 2011021714 would
                    // mean February 17, 2011, 2-3 pm.
                    doc.add(new LongField("modified", file.lastModified(), Field.Store.YES));
                    // Add the contents of the file to a field named "contents".
                    // Specify a Reader, so that the text of the file is tokenized
                    // and indexed, but not stored. Note that the file is expected
                    // to be in UTF-8 encoding. If that's not the case, searching
                    // for special characters will fail.
                    // Create an index field from the file content
                    doc.add(new TextField("contents",
                            new BufferedReader(new InputStreamReader(fis, "UTF-8"))));
                    if (writer.getConfig().getOpenMode() == OpenMode.CREATE) {
                        // New index, so we just add the document (no old
                        // document can be there):
                        System.out.println("adding " + file);
                        writer.addDocument(doc); // write the document to the newly created index
                    } else {
                        // Existing index (an old copy of this document may have
                        // been indexed), so we use updateDocument instead to
                        // replace the old one matching the exact path, if present:
                        System.out.println("Updating " + file);
                        writer.updateDocument(new Term("path", file.getPath()), doc); // write to the index in append mode
                    }
                } finally {
                    fis.close();
                }
            }
        }
    }

    /**
     * Search
     * http://blog.csdn.net/nuptboyzhb
     */
    public void searcher(String indexPath) {
        try {
            IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(indexPath)));
            IndexSearcher searcher = new IndexSearcher(reader);
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_44);
            String field = "contents"; // the field to search
            QueryParser parser = new QueryParser(Version.LUCENE_44, field, analyzer);
            Query query = parser.parse("Nanjing"); // the search keyword
            TopDocs tds = searcher.search(query, 10); // search for the top 10 hits
            ScoreDoc[] sds = tds.scoreDocs;
            for (ScoreDoc sd : sds) {
                // Visit each document whose content contains the keyword "Nanjing"
                Document document = searcher.doc(sd.doc);
                // Print the score, file name, path and modification time of the hit
                System.out.println("score:" + sd.score
                        + "--filename:" + document.get("filename")
                        + "--path:" + document.get("path")
                        + "--time" + document.get("modified"));
            }
            reader.close();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (ParseException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
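The comments in indexDocs point out that millisecond resolution is often too fine for the "modified" field and suggest a coarser number such as 2011021714. The following is only a minimal sketch of that conversion, not part of the original project. It assumes Java 8 or later (java.time) and uses UTC; the class name HourResolution and the method toHourResolution are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the original HelloLucene class.
class HourResolution {
    // The pattern keeps only year, month, day and hour.
    private static final DateTimeFormatter HOUR_FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHH");

    // Converts epoch milliseconds (for example File.lastModified())
    // into a long such as 2011021714.
    // Assumption: UTC is used; a real project may prefer the local time zone.
    static long toHourResolution(long epochMillis) {
        ZonedDateTime time = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
        return Long.parseLong(time.format(HOUR_FORMAT));
    }

    public static void main(String[] args) {
        long millis = ZonedDateTime.of(2011, 2, 17, 14, 30, 0, 0, ZoneOffset.UTC)
                .toInstant().toEpochMilli();
        System.out.println(toHourResolution(millis)); // prints 2011021714
    }
}
```

The resulting value could be passed to new LongField("modified", value, Field.Store.YES) in place of file.lastModified(), if hour resolution is enough for filtering.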

4. For the purpose of the experiment, I created a folder D:\lucene, which contains three files with the following content:

lucene1.txt

Nanjing University Of Posts & Telec

lucene2.txt

Nanjing University of Posts and Telecommunications, Nanjing Road, Haidian District, Beijing

lucene3.txt

2014 Nanjing Youth Olympic Games

5. Create a JUnit test class to test the index and searcher methods. The code is as follows:

[Java code]

package com.njupt.zhb;

import org.junit.Test;

/*
 * @author: ZhengHaibo
 * web:     http://blog.csdn.net/nuptboyzhb
 * mail:    zhb931706659@126.com
 * 2013-7-05 Nanjing, njupt, China
 */
public class TestJunit {

    @Test
    public void TestIndex() {
        HelloLucene hLucene = new HelloLucene();
        hLucene.index("index", "D:\\lucene");
    }

    @Test
    public void TestSearcher() {
        HelloLucene hLucene = new HelloLucene();
        hLucene.searcher("index");
    }
}

Run the TestIndex method as a JUnit test. The running result is as follows:
Updating D:\lucene\lucene1.txt
Updating D:\lucene\lucene2.txt
Updating D:\lucene\lucene3.txt
Index created!
In the index directory under the project directory, the following index files are generated:


[Figure]
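To give a rough idea of what such an index stores, here is a toy inverted index in plain Java. It is only a sketch under simplifying assumptions: it maps each lowercased term to the names of the documents containing it, and it is not how Lucene actually lays out its index files. The class ToyInvertedIndex and the three documents are hypothetical and unrelated to the sample files used in this article.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index for illustration only; Lucene's real index is far more elaborate.
class ToyInvertedIndex {
    // term -> names of the documents that contain the term
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Splits the text on anything that is not a letter or digit
    // and records each lowercased term for the given document.
    void add(String docName, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docName);
            }
        }
    }

    // Returns the documents containing the term, or an empty set if there are none.
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        // Hypothetical documents
        index.add("a.txt", "Hello Lucene");
        index.add("b.txt", "Full-text search with Lucene");
        index.add("c.txt", "Hello world");
        System.out.println(index.search("lucene")); // prints [a.txt, b.txt]
        System.out.println(index.search("hello"));  // prints [a.txt, c.txt]
    }
}
```

Looking up a term is a single map access instead of a scan over every file, which is the main reason for building an index before searching.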

6. Run the TestSearcher method to test the search. The running result is as follows:

score:0.53033006--filename:lucene3.txt--path:D:\lucene\lucene3.txt--time1376828819375
score:0.48666292--filename:lucene2.txt--path:D:\lucene\lucene2.txt--time1376828783791

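As you can see, each hit is printed together with its score. The code does no sorting of its own: Lucene already returns the hits ordered by relevance, highest score first. The higher the score, the better the document matches the query.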
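IndexSearcher.search(query, n) returns its top hits ordered from highest to lowest score by default, which matches the order of the two result lines above. The sketch below reproduces that ordering in plain Java so that the meaning of the score is easier to see. The classes Hit and RankByScore are hypothetical stand-ins, not Lucene classes; only the two scores are taken from the output above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a search hit; this is not Lucene's ScoreDoc.
class Hit {
    final String filename;
    final float score;

    Hit(String filename, float score) {
        this.filename = filename;
        this.score = score;
    }
}

class RankByScore {
    // Returns a new list ordered from the highest score to the lowest.
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("lucene2.txt", 0.48666292f));
        hits.add(new Hit("lucene3.txt", 0.53033006f));
        // The best match (lucene3.txt) is printed first.
        for (Hit h : rank(hits)) {
            System.out.println(h.filename + " " + h.score);
        }
    }
}
```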
Source code download: http://download.csdn.net/detail/nuptboyzhb/5971331
Not to be used for commercial purposes.
