Simple Lucene Application

Last Update: 2014-09-18 Source: Internet

Author: User


I recently used Lucene to build a relatively simple site search, and I would like to share the experience here. The data for full-text retrieval generally comes from one of two kinds of source: a database, or generated files (DOC, HTML, TXT, and so on).

Whichever source is used, the implementation principle is the same. There are two major steps:

1. Convert the data source into Lucene index files and save them to a specified directory.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Date;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

private static String filePath = "D:\\rookie\\date\\"; // file storage path
private static String indexPath = "D:\\rookie\\source"; // index storage path

public static void main(String[] args) throws Exception {
    /* The folder whose files will be indexed. Here it is a folder on drive D */
    File fileDir = new File(filePath);
    /* The index files are placed here */
    File indexDir = new File(indexPath);

    Analyzer luceneAnalyzer = new StandardAnalyzer();
    // Note: true creates a new index and overwrites any existing one;
    // pass false to append to an existing index (for example, when updating it)
    IndexWriter indexWriter = new IndexWriter(indexDir, luceneAnalyzer, true);
    File[] textFiles = fileDir.listFiles();
    long startTime = new Date().getTime();
    // Add documents to the index
    for (int i = 0; i < textFiles.length; i++) {

        // Only files whose names end in .txt are indexed here
        if (textFiles[i].isFile() && textFiles[i].getName().endsWith(".txt")) {
            String temp = fileReaderAll(textFiles[i].getCanonicalPath(), "GBK");
            Document document = new Document();

            // Storing an ID field with every document is strongly recommended
            Field fieldId = new Field("ID", "12345", Field.Store.YES, Field.Index.UN_TOKENIZED);
            Field fieldPath = new Field("path", textFiles[i].getPath(), Field.Store.YES, Field.Index.UN_TOKENIZED);
            Field fieldBody = new Field("contents", temp, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.WITH_POSITIONS_OFFSETS);

            document.add(fieldId);
            document.add(fieldPath);
            document.add(fieldBody);
            indexWriter.addDocument(document);
        }

    }
    // The optimize() method is used to optimize the index.
    indexWriter.optimize();
    indexWriter.close();

    // Measure the indexing time
    long endTime = new Date().getTime();
    System.out.println("The documents have been added to the index. Total time: " + (endTime - startTime) + " milliseconds. The index path is: " + indexDir.getPath());
}

/**
 * Function: read the whole content of a text file (HTML, TXT, ...) in the given charset.
 * Lines are joined without a separator.
 * @author rookie_d
 */
public static String fileReaderAll(String fileName, String charset)
        throws IOException {
    BufferedReader reader = new BufferedReader(new InputStreamReader(
            new FileInputStream(fileName), charset));
    String line;
    String temp = "";

    while ((line = reader.readLine()) != null) {
        temp += line;
    }
    reader.close();
    return temp;
}
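
When reading text for indexing, it matters whether line breaks are kept: if lines are joined with no separator, the last word of one line fuses with the first word of the next, which can change what a tokenizer sees for space-delimited languages. The following is a minimal sketch, using only the JDK, of a reader that keeps a newline between lines and closes the file even when an exception is thrown. The names FileText and readAll are illustrative and not part of the original code, and the demo uses UTF-8 rather than GBK only so that it runs on any JRE.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative helper; FileText and readAll are hypothetical names.
class FileText {

    /** Reads a whole text file in the given charset, keeping '\n' between lines. */
    static String readAll(Path file, String charsetName) throws IOException {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader reader = Files.newBufferedReader(file, Charset.forName(charsetName))) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                if (!first) {
                    text.append('\n'); // keep words on adjacent lines apart
                }
                text.append(line);
                first = false;
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lucene-demo", ".txt");
        Files.write(tmp, "good morning\ngood night".getBytes(Charset.forName("UTF-8")));
        System.out.println(readAll(tmp, "UTF-8"));
        Files.delete(tmp);
    }
}
```

The returned string could be passed as the value of the "contents" field in place of the output of fileReaderAll().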

2. Search the Lucene index files.

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

/**
 * Function: search the index for the query string and return the paths of all matching files.
 * @author rookie_d
 */
public static List<String> luceneSearcher() {

    String queryString = "good"; // string to be retrieved
    String indexPath = "D:\\rookie\\source"; // index storage path
    Hits hits = null;
    Query query = null;
    IndexSearcher searcher;
    List<String> list = new ArrayList<String>();
    try {
        searcher = new IndexSearcher(indexPath);
        Analyzer analyzer = new StandardAnalyzer();
        QueryParser qp = new QueryParser("contents", analyzer);
        System.out.println(qp.getField());
        try {
            query = qp.parse(queryString);
            System.out.println(query);
        } catch (org.apache.lucene.queryParser.ParseException e) {
            e.printStackTrace();
        }
        // query stays null if parsing failed, so check it before searching
        if (query != null) {
            hits = searcher.search(query);
            System.out.println(hits.length());
            if (hits != null && hits.length() > 0) {
                System.out.println("Found " + hits.length() + " results in total!");
                for (int i = 0; i < hits.length(); i++) {
                    Document document = hits.doc(i);
                    String path = document.get("path");
                    File file = new File(path);
                    list.add(file.getPath());
                }
            } else {
                System.out.println("***** no results found *****");
            }

        }
        searcher.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return list;
}
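
The two steps can also be illustrated without Lucene. The toy class below is only a sketch of the underlying idea, an inverted index that maps each term to the documents containing it. It is not how Lucene is implemented, and the names ToyIndex, add, and search are made up for illustration. Tokenization here is a naive lowercase split on non-alphanumeric characters, which is far simpler than what StandardAnalyzer does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of "index, then search"; not Lucene's implementation.
class ToyIndex {
    // term -> paths of the documents that contain it, in insertion order
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Step 1: split the text into terms and record the path under each term. */
    void add(String path, String contents) {
        for (String term : contents.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(path);
            }
        }
    }

    /** Step 2: look a single term up and return the matching paths. */
    List<String> search(String term) {
        Set<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        if (hits == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("a.txt", "Good morning, world");
        index.add("b.txt", "A good day");
        index.add("c.txt", "Nothing relevant here");
        System.out.println(index.search("good")); // prints [a.txt, b.txt]
    }
}
```

The point of the split is that the expensive work (reading and tokenizing every file) happens once at indexing time, while each search is only a lookup.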

During development I ran into a small problem with updating indexes. The following code is reposted from another source; as a rookie, I found it useful.

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class UpdateDocument {

    private static String path = "D:/Index";

    public static void main(String[] args) {
        // addIndex();
        updateIndex();
        search("Li Si");
        search("Wang Wu");
    }

    public static void addIndex() {
        try {
            // true: create a new index
            IndexWriter write = new IndexWriter(path, new StandardAnalyzer(), true);

            Document doc = new Document();
            doc.add(new Field("ID", "123456", Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("username", "Zhang San", Field.Store.YES, Field.Index.TOKENIZED));
            doc.add(new Field("comefrom", "Beijing", Field.Store.YES, Field.Index.TOKENIZED));

            write.addDocument(doc);

            write.close();

        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void updateIndex() {
        try {

            // false: open the existing index instead of recreating it
            IndexWriter write = new IndexWriter(path, new StandardAnalyzer(), false);
            Document docNew = new Document();
            docNew.add(new Field("ID", "123456", Field.Store.YES, Field.Index.UN_TOKENIZED));
            docNew.add(new Field("username", "Wang Wu", Field.Store.YES, Field.Index.TOKENIZED));
            Term term = new Term("ID", "123456");
            /**
             * Call updateDocument and pass it the term and a new document to update the data.
             * With Term term = new Term("ID", "123456"), Lucene first looks in the index for
             * documents whose ID is 123456. If any exist, they are replaced by the new document
             * (if there were several, only one remains after the update). If none exists, the
             * new document is simply added.
             * A database update can change a single column, whereas Lucene can only replace
             * a whole document (the equivalent of a whole row).
             */
            write.updateDocument(term, docNew);

            write.close(); // note: the writer must be closed here

        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static Query queryParser(String str) {
        QueryParser queryParser = new QueryParser("username", new StandardAnalyzer());
        try {
            Query query = queryParser.parse(str);
            return query;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    public static void search(String str) {
        try {
            IndexSearcher search = new IndexSearcher(path);

            Query query = queryParser(str);

            Hits hits = search.search(query);
            if (hits == null) {
                return;
            }
            if (hits.length() == 0) {
                System.out.println("not found '" + str + "'");
                return;
            }
            for (int i = 0; i < hits.length(); i++) {
                Document doc = hits.doc(i);
                System.out.println("ID = " + hits.id(i)); // Lucene's internal document number
                System.out.println("own id = " + doc.get("ID"));
                System.out.println("username = " + doc.get("username"));
                System.out.println("come from = " + doc.get("comefrom"));
                System.out.println("");
            }

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}
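
The update behavior described in the comment inside updateIndex() can be pictured as delete-then-add. The sketch below models it with a plain list of field maps instead of a real index. ToyStore and its methods are hypothetical names, and the model only illustrates the semantics (every matching document is removed, then the new one is appended), not Lucene's internals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of update-by-term; not Lucene code.
class ToyStore {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void add(Map<String, String> doc) {
        docs.add(doc);
    }

    /** Removes every document whose field equals value, then appends the replacement. */
    void update(String field, String value, Map<String, String> replacement) {
        docs.removeIf(d -> value.equals(d.get(field)));
        docs.add(replacement);
    }

    int size() {
        return docs.size();
    }

    /** Convenience factory for a two-field document. */
    static Map<String, String> doc(String id, String username) {
        Map<String, String> d = new HashMap<>();
        d.put("ID", id);
        d.put("username", username);
        return d;
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.add(doc("123456", "Zhang San"));
        store.add(doc("123456", "Li Si"));                  // duplicate ID on purpose
        store.update("ID", "123456", doc("123456", "Wang Wu"));
        System.out.println(store.size());                   // prints 1
        store.update("ID", "999", doc("999", "Zhao Liu"));  // no match: behaves like add
        System.out.println(store.size());                   // prints 2
    }
}
```

Because the whole document is replaced, any field that the new document omits (such as "comefrom" in the example class) is no longer stored after the update.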

The following code deletes the index:

// Delete the corresponding document from the Lucene index
File indexDir = new File(indexPath); /* the index files are placed here */
File[] textFiles = indexDir.listFiles();
Analyzer luceneAnalyzer = new StandardAnalyzer();
// Create a new index only if the index folder is missing or empty
boolean create = false;
if (textFiles == null || textFiles.length <= 0) {
    create = true;
}
IndexWriter indexWriter = new IndexWriter(indexDir, luceneAnalyzer, create);
// news is the record being removed; it is assumed to be defined by the surrounding code
Term term = new Term("ID", news.getId());
indexWriter.deleteDocuments(term);
indexWriter.optimize(); // The optimize() method is used to optimize the index.
indexWriter.close(); // close the writer

When deleting from or updating an existing index, make sure that the last argument of new IndexWriter(indexDir, luceneAnalyzer, false) is false. Passing true recreates the index and discards its existing contents.
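
The delete snippet above chooses the create flag by checking whether the index folder already contains files. A minimal sketch of that check as a reusable helper follows, using only the JDK. The names IndexDirs and shouldCreate are illustrative, and the helper only inspects the folder; it does not verify that the files in it form a valid Lucene index.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper; the class and method names are not from the original article.
class IndexDirs {

    /** Returns true when the folder is missing or empty, so a fresh index is needed. */
    static boolean shouldCreate(File indexDir) {
        File[] files = indexDir.listFiles(); // null if indexDir is not a readable directory
        return files == null || files.length == 0;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("index-demo").toFile();
        System.out.println(shouldCreate(dir)); // prints true: the folder is empty

        File segment = new File(dir, "segments");
        segment.createNewFile();
        System.out.println(shouldCreate(dir)); // prints false: the folder has a file

        segment.delete();
        dir.delete();
    }
}
```

The result could then be passed as the third argument of the IndexWriter constructor, so that an existing index is opened rather than overwritten.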

There is still a lot to learn about full-text search. I wrote this article to help beginners, and myself, get familiar with Lucene. I hope you find it helpful!


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
