Simple Lucene Application

Source: Internet
Author: User

I recently used Lucene to build a fairly simple search for a website and would like to share the experience here. Full-text retrieval sources fall into two types: databases and generated files (DOC, HTML, TXT, and so on).

Whichever type of source you index, the principle is the same and consists of two major steps:

1. Convert the data source into a Lucene index and save it to a designated directory.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Date;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

private static String filePath = "D:\\rookie\\date\\"; // directory containing the files to index
private static String indexPath = "D:\\rookie\\source"; // directory where the index is stored

public static void main(String[] args) throws Exception {
    /* Folder whose files will be indexed; here it is a folder on drive D */
    File fileDir = new File(filePath);
    /* Folder where the index files are placed */
    File indexDir = new File(indexPath);

    Analyzer luceneAnalyzer = new StandardAnalyzer();
    // Note: true creates a new index, overwriting any existing one; false opens the
    // existing index and adds to it (so use false when updating an index)
    IndexWriter indexWriter = new IndexWriter(indexDir, luceneAnalyzer, true);
    File[] textFiles = fileDir.listFiles();
    long startTime = new Date().getTime();
    // Add the documents to the index
    for (int i = 0; i < textFiles.length; i++) {
        // Only .txt files are handled here; HTML files could be added with a similar check
        if (textFiles[i].isFile() && textFiles[i].getName().endsWith(".txt")) {
            String temp = fileReaderAll(textFiles[i].getCanonicalPath(), "GBK");
            Document document = new Document();

            // Storing an ID with every document is strongly recommended; in practice it
            // should be unique per document (e.g. a database primary key), not a constant
            Field fieldId = new Field("id", "12345", Field.Store.YES, Field.Index.UN_TOKENIZED);
            Field fieldPath = new Field("path", textFiles[i].getPath(), Field.Store.YES, Field.Index.UN_TOKENIZED);
            Field fieldBody = new Field("contents", temp, Field.Store.YES, Field.Index.TOKENIZED,
                    Field.TermVector.WITH_POSITIONS_OFFSETS);

            document.add(fieldId);
            document.add(fieldPath);
            document.add(fieldBody);
            indexWriter.addDocument(document);
        }
    }
    // optimize() merges the index segments into one, which can make searches faster
    indexWriter.optimize();
    indexWriter.close();

    // Report how long indexing took
    long endTime = new Date().getTime();
    System.out.println("Documents have been added to the index in " + (endTime - startTime)
            + " milliseconds. The index path is: " + indexDir.getPath());
}

/**
 * Function: read the whole content of an HTML, TXT, ... file
 * @author rookie_d
 */
public static String fileReaderAll(String fileName, String charset) throws IOException {
    BufferedReader reader = new BufferedReader(new InputStreamReader(
            new FileInputStream(fileName), charset));
    StringBuilder temp = new StringBuilder();
    try {
        String line;
        while ((line = reader.readLine()) != null) {
            // keep a line break so the last word of one line and the first word
            // of the next are not glued into a single token
            temp.append(line).append('\n');
        }
    } finally {
        reader.close();
    }
    return temp.toString();
}
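To make the indexing step less of a black box, here is a toy in-memory inverted index in plain Java. It does not use Lucene, and the class `ToyIndex`, its methods, and the regex tokenizer are hypothetical; the tokenizer is only a rough stand-in for `StandardAnalyzer`. It mirrors the three fields above: the ID and path are indexed as single terms (like `Field.Index.UN_TOKENIZED`), the contents are split into lower-case words (like `Field.Index.TOKENIZED`), and all values are kept as-is (like `Field.Store.YES`).

```java
import java.util.*;

/**
 * Toy inverted index (NOT Lucene's API): every term points to the numbers
 * of the documents that contain it, and field values are stored as-is.
 */
class ToyIndex {
    // "field:term" -> numbers of the documents containing that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // stored field values, indexed by document number
    private final List<Map<String, String>> storedDocs = new ArrayList<>();

    /** Crude analyzer stand-in: lower-case, then split on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Indexes one document and returns its document number (0, 1, 2, ...). */
    int addDocument(String id, String path, String contents) {
        int docNum = storedDocs.size();
        addTerm("id", id, docNum);       // untokenized: the whole value is one term
        addTerm("path", path, docNum);   // untokenized
        for (String token : tokenize(contents)) {
            addTerm("contents", token, docNum); // tokenized: one term per word
        }
        Map<String, String> fields = new HashMap<>();
        fields.put("id", id);
        fields.put("path", path);
        fields.put("contents", contents);
        storedDocs.add(fields);
        return docNum;
    }

    private void addTerm(String field, String term, int docNum) {
        postings.computeIfAbsent(field + ":" + term, k -> new TreeSet<>()).add(docNum);
    }

    /** Numbers of the documents whose field contains exactly this term. */
    Set<Integer> docsWith(String field, String term) {
        return postings.getOrDefault(field + ":" + term, Collections.emptySet());
    }

    /** Stored (original) value of a field, similar in spirit to Document.get(field). */
    String stored(int docNum, String field) {
        return storedDocs.get(docNum).get(field);
    }
}
```

For example, after indexing a file containing "This is a good day.", `docsWith("contents", "good")` contains that document's number, while `docsWith("contents", "Good")` is empty because only lower-cased terms were recorded. This is why the query text must go through the same analyzer at search time.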

2. Search the Lucene index

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

/**
 * Function: query the index for all documents that match the search string.
 * @author rookie_d
 */
public static List<String> luceneSearcher() {

    String queryString = "good"; // string to search for
    String indexPath = "D:\\rookie\\source"; // index storage path
    Hits hits = null;
    Query query = null;
    IndexSearcher searcher = null;
    List<String> list = new ArrayList<String>();
    try {
        searcher = new IndexSearcher(indexPath);
        Analyzer analyzer = new StandardAnalyzer();
        QueryParser qp = new QueryParser("contents", analyzer); // search the "contents" field
        System.out.println(qp.getField());
        try {
            query = qp.parse(queryString);
            System.out.println(query);
        } catch (ParseException e) {
            e.printStackTrace();
        }
        if (query != null) {
            hits = searcher.search(query);
            System.out.println(hits.length());
            if (hits != null && hits.length() > 0) {
                System.out.println("Found " + hits.length() + " results in total!");
                for (int i = 0; i < hits.length(); i++) {
                    Document document = hits.doc(i);
                    String path = document.get("path"); // stored path of the matching file
                    File file = new File(path);
                    list.add(file.getPath());
                }
            } else {
                System.out.println("***** no result found *****");
            }
        }
        searcher.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return list;
}
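The search step can be pictured the same way. The sketch below is plain Java, not Lucene: the class `ToySearch` and its method are hypothetical, and ranking by the number of matched words is only a crude stand-in for Lucene's real scoring. It does reflect one real behavior of `QueryParser`: with the default OR operator, a query such as `good day` matches documents that contain either word.

```java
import java.util.*;

/**
 * Toy query evaluation (NOT Lucene): split the query into lower-case words,
 * collect every document containing at least one of them (OR semantics),
 * and rank by how many distinct query words each document contains.
 */
class ToySearch {
    /**
     * @param postings term -> numbers of the documents containing it (a single field)
     * @return matching document numbers, best first; ties broken by document number
     */
    static List<Integer> search(Map<String, Set<Integer>> postings, String queryText) {
        Set<String> words = new LinkedHashSet<>();
        for (String w : queryText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        // document number -> how many distinct query words it contains
        Map<Integer, Integer> matchedWords = new HashMap<>();
        for (String w : words) {
            for (int doc : postings.getOrDefault(w, Collections.emptySet())) {
                matchedWords.merge(doc, 1, Integer::sum);
            }
        }
        List<Integer> hits = new ArrayList<>(matchedWords.keySet());
        hits.sort((a, b) -> {
            int byCount = Integer.compare(matchedWords.get(b), matchedWords.get(a));
            return byCount != 0 ? byCount : Integer.compare(a, b);
        });
        return hits;
    }
}
```

With postings `good -> {0, 2}` and `day -> {2}`, searching for "Good day" returns `[2, 0]`: document 2 contains both words and is listed first, and document 0 still matches because of the OR semantics.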

 

During development I ran into a small problem with updating the index. The following code is also reposted from elsewhere; as a beginner, I found it quite useful.

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class UpdateDocument {

    private static String path = "D:/Index";

    public static void main(String[] args) {
        // addIndex(); // run this once first to create the index with the original document
        updateIndex();
        search("Li Si");
        search("Wang Wu");
    }

    public static void addIndex() {
        try {
            // true: create a new index, overwriting any existing one
            IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), true);

            Document doc = new Document();
            doc.add(new Field("id", "123456", Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("username", "Zhang San", Field.Store.YES, Field.Index.TOKENIZED));
            doc.add(new Field("comefrom", "Beijing", Field.Store.YES, Field.Index.TOKENIZED));

            writer.addDocument(doc);

            writer.close();

        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void updateIndex() {
        try {
            // false: open the existing index instead of overwriting it
            IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), false);
            Document docNew = new Document();
            docNew.add(new Field("id", "123456", Field.Store.YES, Field.Index.UN_TOKENIZED));
            docNew.add(new Field("username", "Wang Wu", Field.Store.YES, Field.Index.TOKENIZED));
            Term term = new Term("id", "123456");
            /*
             * updateDocument(term, docNew) updates the data by passing in a new document.
             * With term = new Term("id", "123456"), every document whose id is 123456 is
             * deleted and docNew is added, so even if there were several matches, only one
             * remains afterwards. If there is no match, docNew is simply added.
             * Unlike a database UPDATE, which can change a single column, Lucene can only
             * replace a whole row (document): the "comefrom" field is not carried over here.
             */
            writer.updateDocument(term, docNew);

            writer.close(); // the writer must be closed here so the change is committed

        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static Query queryParser(String str) {
        QueryParser parser = new QueryParser("username", new StandardAnalyzer());
        try {
            return parser.parse(str);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    public static void search(String str) {
        try {
            IndexSearcher searcher = new IndexSearcher(path);

            Query query = queryParser(str);

            Hits hits = searcher.search(query);
            if (hits == null || hits.length() == 0) {
                System.out.println("not found '" + str + "'");
            } else {
                for (int i = 0; i < hits.length(); i++) {
                    Document doc = hits.doc(i);
                    System.out.println("id = " + hits.id(i)); // Lucene's internal document number
                    System.out.println("own id = " + doc.get("id"));
                    System.out.println("username = " + doc.get("username"));
                    System.out.println("come from = " + doc.get("comefrom"));
                    System.out.println("");
                }
            }
            searcher.close();

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}
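The comment in `updateIndex()` boils down to "delete by term, then add". The following plain-Java sketch models those semantics on a list of field maps; the class `ToyUpdate` is hypothetical and no Lucene code is involved. Lucene performs the delete and the add inside the index itself, so treat this only as an illustration of the outcome.

```java
import java.util.*;

/**
 * Toy model (NOT Lucene) of IndexWriter.updateDocument(term, doc):
 * delete every document whose field equals the term's value, then add the new one.
 * Each document is a map of field name -> value.
 */
class ToyUpdate {
    /** Returns how many old documents were deleted (0 means the call was a plain add). */
    static int updateDocument(List<Map<String, String>> docs, String field, String value,
                              Map<String, String> newDoc) {
        int before = docs.size();
        docs.removeIf(d -> value.equals(d.get(field))); // like deleteDocuments(new Term(field, value))
        int deleted = before - docs.size();
        docs.add(newDoc);                                // like addDocument(newDoc)
        return deleted;
    }
}
```

Replaying the example above: starting from the Zhang San / Beijing document, updating id 123456 with the Wang Wu document deletes one document, and the surviving document has no `comefrom` value. That is why searching for "Wang Wu" shows null for "come from": the whole row was replaced, not a single column.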

The following code deletes a document from the index:

// Delete the corresponding document from the Lucene index
File indexDir = new File(indexPath); /* the index files are stored here */
File[] textFiles = indexDir.listFiles();
Analyzer luceneAnalyzer = new StandardAnalyzer();
// Create a new index only if the directory is missing or empty; otherwise open the existing one
boolean create = false;
if (textFiles == null || textFiles.length <= 0) {
    create = true;
}
IndexWriter indexWriter = new IndexWriter(indexDir, luceneAnalyzer, create);
Term term = new Term("id", news.getId()); // news: the author's own data object (not shown) holding the record's id
indexWriter.deleteDocuments(term);
indexWriter.optimize(); // optimize() merges the index segments
indexWriter.close(); // close the writer so the deletion is committed

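When deleting or updating, pay attention to the last argument of new IndexWriter(indexDir, luceneAnalyzer, ...): it must be false whenever an index already exists, because true creates a new, empty index and discards everything indexed so far.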
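The create logic in the delete snippet can be captured in a tiny helper. This is a hypothetical plain-Java sketch (the class `IndexFlags` and method `shouldCreate` are made up, and only `java.io.File` is used): it returns true only when the index directory is missing or empty, so there is nothing that create = true could wipe out, and false otherwise, which is what deleting and updating need.

```java
import java.io.File;

/** Hypothetical helper (not part of Lucene) mirroring the create-flag check above. */
class IndexFlags {
    /**
     * True: no index files yet, so a new index should be created.
     * False: the directory already has files, so the existing index must be opened.
     */
    static boolean shouldCreate(File indexDir) {
        File[] files = indexDir.listFiles(); // null if the path does not exist or is not a directory
        return files == null || files.length == 0;
    }
}
```

The check is a heuristic: it only looks at whether the directory has any files, so a directory holding unrelated files would also be treated as an existing index.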

There is still much to learn about full-text search. I hope this article helps other beginners, and myself, get familiar with Lucene.
