Hello World for Lucene full-text search
1. Download Lucene 4.4 and decompress it.
2. Create a Java project named HelloLucene.
3. Create a new lib folder and copy the required jar files into it. The jar files required for this project are as follows:
[Figure]
Add these jar files to the build path.
3. Create a new package com.njupt.zhb and a new class HelloLucene.java. The code is as follows:
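Before writing any Lucene code, it can help to confirm that the jars really ended up on the build path. The following is a minimal sketch of such a check. The class names are taken from the import list of HelloLucene. The class LuceneClasspathCheck itself is hypothetical, and the assumption that these three classes correspond to the core, analyzers-common and queryparser jars is based on the usual Lucene 4.x module layout, not on the figure above.

```java
// Sketch: check that the Lucene classes used in this tutorial can be loaded.
class LuceneClasspathCheck {

    // Class names copied from the imports of HelloLucene. Which jar each one
    // lives in is an assumption (core, analyzers-common, queryparser).
    static final String[] REQUIRED_CLASSES = {
        "org.apache.lucene.index.IndexWriter",
        "org.apache.lucene.analysis.standard.StandardAnalyzer",
        "org.apache.lucene.queryparser.classic.QueryParser"
    };

    // Returns true if the named class is visible on the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, LuceneClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : REQUIRED_CLASSES) {
            System.out.println((isOnClasspath(name) ? "found   " : "MISSING ") + name);
        }
    }
}
```

If any line is reported as MISSING, the corresponding jar is probably not on the build path yet.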
[Java code]
package com.njupt.zhb;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

/*
 * @author: ZhengHaibo
 * web: http://blog.csdn.net/nuptboyzhb
 * mail: zhb931706659@126.com
 * 2013-7-05 Nanjing, njupt, China
 */
public class HelloLucene {

    /**
     * Index all text files under a directory.
     * String indexPath = "index"; // path for storing the index
     * String docsPath = "";       // path of the documents to be indexed
     */
    public void index(String indexPath, String docsPath) {
        try {
            // 1. Create the Directory
            Directory dir = FSDirectory.open(new File(indexPath)); // stored on the hard disk
            // 2. Create the IndexWriter
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_44);
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44, analyzer);
            iwc.setOpenMode(OpenMode.CREATE_OR_APPEND); // create a new index or append to an existing one
            IndexWriter writer = new IndexWriter(dir, iwc);
            final File docDir = new File(docsPath);
            indexDocs(writer, docDir);
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void indexDocs(IndexWriter writer, File file) throws IOException {
        if (file.canRead()) {
            if (file.isDirectory()) {
                // If it is a folder, traverse all the files in it
                String[] files = file.list();
                // an IO error could occur
                if (files != null) {
                    for (int i = 0; i < files.length; i++) {
                        indexDocs(writer, new File(file, files[i]));
                    }
                }
            } else {
                // If it is a file
                FileInputStream fis;
                try {
                    fis = new FileInputStream(file);
                } catch (FileNotFoundException fnfe) {
                    return;
                }
                try {
                    // 3. Create the Document object
                    Document doc = new Document();
                    // 4. Add fields
                    // Add the path of the file as a field named "path". Use a
                    // field that is indexed (i.e. searchable), but don't tokenize
                    // the field into separate words and don't index term frequency
                    // or positional information:
                    Field pathField = new StringField("path", file.getPath(), Field.Store.YES);
                    doc.add(pathField); // add it to the document
                    // Create an index field from the file name
                    doc.add(new StringField("filename", file.getName(), Field.Store.YES));
                    // Add the last modified date of the file as a field named
                    // "modified". Use a LongField that is indexed (i.e. efficiently
                    // filterable with NumericRangeFilter). This indexes to
                    // millisecond resolution, which is often too fine. You could
                    // instead create a number based on
                    // year/month/day/hour/minutes/seconds, down to the resolution
                    // you require. For example the long value 2011021714 would mean
                    // February 17, 2011, 2-3 PM.
                    doc.add(new LongField("modified", file.lastModified(), Field.Store.YES));
                    // Add the contents of the file to a field named "contents".
                    // Specify a Reader, so that the text of the file is tokenized
                    // and indexed, but not stored. Note that the reader below
                    // expects the file to be in UTF-8 encoding. If that's not the
                    // case, searching for special characters will fail.
                    doc.add(new TextField("contents",
                            new BufferedReader(new InputStreamReader(fis, "UTF-8"))));
                    if (writer.getConfig().getOpenMode() == OpenMode.CREATE) {
                        // New index, so we just add the document (no old
                        // document can be there):
                        System.out.println("adding " + file);
                        writer.addDocument(doc);
                    } else {
                        // Existing index (an old copy of this document may have
                        // been indexed), so we use updateDocument instead to
                        // replace the old one matching the exact path, if present:
                        System.out.println("Updating " + file);
                        writer.updateDocument(new Term("path", file.getPath()), doc);
                    }
                } finally {
                    fis.close();
                }
            }
        }
    }

    /**
     * Search
     * http://blog.csdn.net/nuptboyzhb
     */
    public void searcher(String indexPath) {
        try {
            IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(indexPath)));
            IndexSearcher searcher = new IndexSearcher(reader);
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_44);
            String field = "contents"; // the field to search
            QueryParser parser = new QueryParser(Version.LUCENE_44, field, analyzer);
            Query query = parser.parse("Nanjing"); // the keyword to search for
            TopDocs tds = searcher.search(query, 10); // retrieve the top 10 hits
            ScoreDoc[] sds = tds.scoreDocs;
            for (ScoreDoc sd : sds) {
                // Visit each document whose contents contain the keyword "Nanjing"
                Document document = searcher.doc(sd.doc);
                // Print the score, file name, path and modification time of the hit
                System.out.println("score:" + sd.score
                        + "--filename:" + document.get("filename")
                        + "--path:" + document.get("path")
                        + "--time" + document.get("modified"));
            }
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }
}
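The comment on the "modified" field suggests storing a coarser number than the raw millisecond timestamp, for example 2011021714 for February 17, 2011, 2-3 PM. A minimal sketch of that encoding follows. The class CoarseTimestamp and the method toHourResolution are hypothetical names, UTC is chosen here only to make the result deterministic, and the code assumes Java 8 or later for java.time.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Sketch: reduce a millisecond timestamp to hour resolution (yyyyMMddHH).
class CoarseTimestamp {

    // Encodes epoch milliseconds as the long value yyyyMMddHH, evaluated in UTC.
    static long toHourResolution(long epochMillis) {
        ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
        return t.getYear() * 1000000L
                + t.getMonthValue() * 10000L
                + t.getDayOfMonth() * 100L
                + t.getHour();
    }

    public static void main(String[] args) {
        // February 17, 2011, 14:30 UTC, the example from the code comment
        long millis = ZonedDateTime.of(2011, 2, 17, 14, 30, 0, 0, ZoneOffset.UTC)
                .toInstant().toEpochMilli();
        System.out.println(toHourResolution(millis)); // prints 2011021714
    }
}
```

In the indexing code, such a value could be passed to the LongField in place of file.lastModified(), at the cost of no longer being able to tell apart changes made within the same hour.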
4. For this experiment, I created a folder named lucene on drive D containing three files with the following content:
lucene1.txt
Nanjing University Of Posts & Telec
lucene2.txt
Nanjing University of Posts and Telecommunications, Nanjing Road, Haidian District, Beijing
lucene3.txt
2014 Nanjing Youth Olympic Games
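Because the indexing code reads every file as UTF-8, it can be convenient to generate the test files from code so that the encoding is known. The sketch below writes the three files listed above. The class SampleDocs is a hypothetical helper, and it writes to a temporary directory rather than D:\lucene so that it runs on any machine. The text of lucene1.txt is copied exactly as it is shown above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: create the three sample documents as UTF-8 text files.
class SampleDocs {

    // Writes the sample files into dir (creating it if needed) and returns dir.
    static Path writeSamples(Path dir) throws IOException {
        Files.createDirectories(dir);
        write(dir, "lucene1.txt", "Nanjing University Of Posts & Telec");
        write(dir, "lucene2.txt", "Nanjing University of Posts and Telecommunications, "
                + "Nanjing Road, Haidian District, Beijing");
        write(dir, "lucene3.txt", "2014 Nanjing Youth Olympic Games");
        return dir;
    }

    // Writes one file, encoding the text explicitly as UTF-8.
    private static void write(Path dir, String name, String text) throws IOException {
        Files.write(dir.resolve(name), text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path dir = writeSamples(Files.createTempDirectory("lucene-docs"));
        System.out.println("Sample documents written to " + dir);
    }
}
```

The printed directory can then be passed as the docsPath argument of the index method.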
5. Create a JUnit test class to test the index and searcher methods. The code is as follows:
[Java code]
package com.njupt.zhb;

import org.junit.Test;

/*
 * @author: ZhengHaibo
 * web: http://blog.csdn.net/nuptboyzhb
 * mail: zhb931706659@126.com
 * 2013-7-05 Nanjing, njupt, China
 */
public class TestJunit {

    @Test
    public void TestIndex() {
        // Index every file under D:\lucene into the "index" directory
        HelloLucene hLucene = new HelloLucene();
        hLucene.index("index", "D:\\lucene");
    }

    @Test
    public void TestSearcher() {
        // Search the index created by TestIndex
        HelloLucene hLucene = new HelloLucene();
        hLucene.searcher("index");
    }
}
Run the TestIndex method as a JUnit test. The output is as follows:
Updating D:\lucene\lucene1.txt
Updating D:\lucene\lucene2.txt
Updating D:\lucene\lucene3.txt
Index created!
The following index files are generated in the index directory under the project directory:
[Figure]
6. Run the TestSearcher method to test searching. The output is as follows:
score:0.53033006--filename:lucene3.txt--path:D:\lucene\lucene3.txt--time1376828819375
score:0.48666292--filename:lucene2.txt--path:D:\lucene\lucene2.txt--time1376828783791
As you can see, we simply print each hit's score without applying any custom sorting. Lucene returns the hits in descending order of score, and a higher score means a closer match to the query.
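The time value in each result line is the stored "modified" field, that is, the value of File.lastModified() in milliseconds since January 1, 1970 UTC. A minimal sketch of turning it back into a readable date is shown below. The class ModifiedTime and the method toIsoUtc are hypothetical names, and the code assumes Java 8 or later for java.time.

```java
import java.time.Instant;

// Sketch: convert the stored "modified" value back into a readable timestamp.
class ModifiedTime {

    // Parses epoch milliseconds from the stored string and formats them
    // as an ISO-8601 instant in UTC.
    static String toIsoUtc(String storedModified) {
        return Instant.ofEpochMilli(Long.parseLong(storedModified)).toString();
    }

    public static void main(String[] args) {
        // Values copied from the search output above
        System.out.println(toIsoUtc("1376828819375"));
        System.out.println(toIsoUtc("1376828783791"));
    }
}
```

In the searcher method, the same conversion could be applied to document.get("modified") before printing.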
Source code download: http://download.csdn.net/detail/nuptboyzhb/5971331 (not allowed for commercial purposes)