Full-text Lucene search application example and code analysis
Lucene is a sub-project of the Apache Software Foundation's Jakarta project. It is an open-source full-text search engine toolkit and framework that provides a complete query engine and indexing engine, implements several common word-segmentation algorithms, and exposes many analyzer interfaces. This article uses the full-text retrieval code from the myrss.easyjf.com website system as an example to briefly demonstrate how Lucene is applied in a real project.
To use Lucene for full-text search, follow these three steps:
1. Create the index: build a Lucene index from the existing data in the website's news database.
2. Search the index: once the index exists, use the standard analyzer (or another analyzer) to run full-text queries against it.
3. Maintain the index: records in the website's news database are constantly added, modified, and deleted, and those changes must be carried over into the Lucene index. The code from myrss.easyjf.com follows; first, a quick sketch of the three steps.
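Before diving into the project code, a minimal, self-contained sketch of these three steps may help. It uses the same era of Lucene API as the rest of this article (IndexWriter, Hits, StandardAnalyzer); the index path and field names here are made up for illustration and are not part of the myrss.easyjf.com code:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class LuceneQuickStart {
    public static void main(String[] args) throws Exception {
        String indexDir = "/tmp/demo-index"; // hypothetical path, for illustration only

        // 1. Create the index: true = build a brand-new index in indexDir
        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
        Document doc = new Document();
        doc.add(new Field("title", "Hello Lucene", Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("content", "Full-text search with Lucene.", Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.optimize();
        writer.close();

        // 2. Search the index with the same analyzer that was used for indexing
        IndexSearcher searcher = new IndexSearcher(indexDir);
        QueryParser parser = new QueryParser("content", new StandardAnalyzer());
        Query query = parser.parse("lucene");
        Hits hits = searcher.search(query);
        System.out.println("Matches: " + hits.length());
        searcher.close();

        // 3. Maintain the index: reopen the writer in append mode (create = false)
        IndexWriter appender = new IndexWriter(indexDir, new StandardAnalyzer(), false);
        // ... add new documents, or delete and re-add changed ones ...
        appender.close();
    }
}

The key point is that indexing and searching should use the same analyzer, and maintenance simply means reopening the IndexWriter in append mode and adding or replacing documents.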
I. Index management (creation and maintenance)
The index management class MyRssIndexManage creates and maintains the Lucene index from the data in the website's information database. Because indexing takes a while, the class implements the Runnable interface so that it can be run in a new thread (a usage sketch follows the class).
package com.easyjf.lucene;

import java.util.Date;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

import com.easyjf.dbo.EasyJDB;
import com.easyjf.news.business.NewsDir;
import com.easyjf.news.business.NewsDoc;
import com.easyjf.news.business.NewsUtil;
import com.easyjf.web.tools.IPageList;

public class MyRssIndexManage implements Runnable {
    private String indexDir;
    private String indexType = "add";

    public void run() {
        // TODO Auto-generated method stub
        if ("add".equals(indexType))
            normalIndex();
        else if ("init".equals(indexType))
            reIndexAll();
    }

    // Incrementally index the documents that have not been indexed yet
    public void normalIndex() {
        try {
            Date start = new Date();
            int num = 0;
            // false: append to the existing index instead of creating a new one
            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
            // NewsDir dir = NewsDir.readBySn();
            String scope = "(needIndex<2) or (needIndex is null)";
            IPageList pList = NewsUtil.pageList(scope, 1, 50);
            for (int p = 0; p < pList.getPages(); p++) {
                pList = NewsUtil.pageList(scope, p, 100);
                List list = pList.getResult();
                for (int i = 0; i < list.size(); i++) {
                    NewsDoc doc = (NewsDoc) list.get(i);
                    writer.addDocument(newsDoc2LuceneDoc(doc));
                    num++;
                }
            }
            writer.optimize();
            writer.close();
            // Mark the processed records as indexed
            EasyJDB.getInstance().execute("update NewsDoc set needIndex=2 where " + scope);
            Date end = new Date();
            System.out.print("Newly indexed " + num + " documents, took "
                    + (end.getTime() - start.getTime()) / 60000 + " minutes!");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Rebuild the whole index from scratch
    public void reIndexAll() {
        try {
            Date start = new Date();
            int num = 0;
            // true: create a brand-new index, overwriting any existing one
            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
            NewsDir dir = NewsDir.readBySn("easyjf");
            IPageList pList = NewsUtil.pageList(dir, 1, 50);
            for (int p = 0; p < pList.getPages(); p++) {
                pList = NewsUtil.pageList(dir, p, 100);
                List list = pList.getResult();
                for (int i = 0; i < list.size(); i++) {
                    NewsDoc doc = (NewsDoc) list.get(i);
                    writer.addDocument(newsDoc2LuceneDoc(doc));
                    num++;
                }
            }
            writer.optimize();
            writer.close();
            EasyJDB.getInstance().execute("update NewsDoc set needIndex=2 where dirPath like 'easyjf%'");
            Date end = new Date();
            System.out.print("Re-indexed everything, processed " + num + " documents, took "
                    + (end.getTime() - start.getTime()) / 60000 + " minutes!");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Convert a NewsDoc business object into a Lucene Document
    private Document newsDoc2LuceneDoc(NewsDoc doc) {
        Document lDoc = new Document();
        lDoc.add(new Field("title", doc.getTitle(), Field.Store.YES, Field.Index.TOKENIZED));
        lDoc.add(new Field("content", doc.getContent(), Field.Store.YES, Field.Index.TOKENIZED));
        lDoc.add(new Field("url", doc.getRemark(), Field.Store.YES, Field.Index.NO));
        lDoc.add(new Field("cid", doc.getCid(), Field.Store.YES, Field.Index.NO));
        lDoc.add(new Field("source", doc.getSource(), Field.Store.YES, Field.Index.NO));
        lDoc.add(new Field("inputTime", doc.getInputTime().toString(), Field.Store.YES, Field.Index.NO));
        return lDoc;
    }

    public String getIndexDir() {
        return indexDir;
    }

    public void setIndexDir(String indexDir) {
        this.indexDir = indexDir;
    }

    public String getIndexType() {
        return indexType;
    }

    public void setIndexType(String indexType) {
        this.indexType = indexType;
    }
}
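Because MyRssIndexManage implements Runnable, the indexing job can be started in its own thread so that it does not block the caller. A minimal usage sketch (the index path below is hypothetical):

// Launching the index manager in a background thread (sketch; the path is hypothetical).
MyRssIndexManage indexManage = new MyRssIndexManage();
indexManage.setIndexDir("/path/to/WEB-INF/index");
indexManage.setIndexType("add");   // "add" = incremental indexing; "init" = rebuild everything
new Thread(indexManage).start();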
II. Using Lucene for full-text search
The following is the source code of the MyRssSearch class. It mainly uses Lucene's Searcher and QueryParser to look up keywords in the index.
package com.easyjf.lucene;

import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

import com.easyjf.search.MyRssUtil;
import com.easyjf.search.SearchContent;
import com.easyjf.web.tools.IPageList;
import com.easyjf.web.tools.PageList;

public class MyRssSearch {
    private String indexDir;
    IndexReader ir;
    Searcher search;

    public IPageList search(String key, int pageSize, int currentPage) {
        IPageList pList = new PageList(new HitsQuery(doSearch(key)));
        pList.doList(pageSize, currentPage, "", "", null);
        if (pList != null) {
            List list = pList.getResult();
            if (list != null) {
                for (int i = 0; i < list.size(); i++) {
                    list.set(i, lucene2SearchObj((Document) list.get(i), key));
                }
            }
        }
        try {
            if (search != null) search.close();
            if (ir != null) ir.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return pList;
    }

    // Convert a Lucene Document from the result list into a SearchContent business object,
    // highlighting the keyword in the title
    private SearchContent lucene2SearchObj(Document doc, String key) {
        SearchContent searchObj = new SearchContent();
        String title = doc.getField("title").stringValue();
        searchObj.setTitle(title.replaceAll(key, "<font color=red>" + key + "</font>"));
        searchObj.setTvalue(doc.getField("cid").stringValue());
        searchObj.setUrl(doc.getField("url").stringValue());
        searchObj.setSource(doc.getField("source").stringValue());
        searchObj.setLastUpdated(doc.getField("inputTime").stringValue());
        searchObj.setIntro(MyRssUtil.content2Intro(doc.getField("content").stringValue(), key));
        return searchObj;
    }

    // Parse the keyword against the title and content fields and run the query
    public Hits doSearch(String key) {
        Hits hits = null;
        try {
            ir = IndexReader.open(indexDir);
            search = new IndexSearcher(ir);
            String[] fields = {"title", "content"};
            QueryParser parser = new MultiFieldQueryParser(fields, new StandardAnalyzer());
            Query query = parser.parse(key);
            hits = search.search(query);
        } catch (Exception e) {
            e.printStackTrace();
        }
        // System.out.println("Search result: " + hits.length());
        return hits;
    }

    public String getIndexDir() {
        return indexDir;
    }

    public void setIndexDir(String indexDir) {
        this.indexDir = indexDir;
    }
}

In the code above, the search method returns an IPageList that encapsulates the paged query results. IPageList is the paging engine in the EasyJWeb Tools business engine; for its usage, see my article "Design and implementation of service-engine paging in EasyJWeb Tools". We wrote a HitsQuery class to adapt Lucene's Hits result structure to that paging engine. The code is as follows:
package com.easyjf.lucene;

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.apache.lucene.search.Hits;

import com.easyjf.web.tools.IQuery;

public class HitsQuery implements IQuery {
    private int begin = 0;
    private int max = 0;
    private Hits hits;

    public HitsQuery() {
    }

    public HitsQuery(Hits hits) {
        if (hits != null) {
            this.hits = hits;
            this.max = hits.length();
        }
    }

    public int getRows(String arg0) {
        // TODO Auto-generated method stub
        return (hits == null ? 0 : hits.length());
    }

    public List getResult(String arg0) {
        // TODO Auto-generated method stub
        List list = new ArrayList();
        // Return only the documents in the current page window [begin, begin + max)
        for (int i = begin; i < (begin + max) && i < hits.length(); i++) {
            try {
                list.add(hits.doc(i));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        return list;
    }

    public void setFirstResult(int begin) {
        // TODO Auto-generated method stub
        this.begin = begin;
    }

    public void setMaxResults(int max) {
        // TODO Auto-generated method stub
        this.max = max;
    }

    public void setParaValues(Collection arg0) {
        // TODO Auto-generated method stub
    }

    public List getResult(String condition, int begin, int max) {
        // TODO Auto-generated method stub
        if ((begin >= 0) && (begin < max)) this.begin = begin;
        if (!(max > hits.length())) this.max = max;
        return getResult(condition);
    }
}
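To make the wiring clearer, this is roughly how MyRssSearch combines HitsQuery with the EasyJWeb paging classes. The sketch is inferred only from the calls already shown above (PageList and IPageList belong to EasyJWeb Tools, not Lucene), and the index path is hypothetical:

// Sketch: paging Lucene Hits through the EasyJWeb paging engine (inferred from the code above).
MyRssSearch searcher = new MyRssSearch();
searcher.setIndexDir("/path/to/WEB-INF/index");   // hypothetical path
Hits hits = searcher.doSearch("easyjf");          // run the Lucene query
IPageList pList = new PageList(new HitsQuery(hits));
pList.doList(15, 1, "", "", null);                // 15 results per page, page 1
List page = pList.getResult();                    // a List of org.apache.lucene.document.Document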
III. Calling the search from the web
The following shows how the web layer calls the full-text search of the business-logic layer. Here is the source code of the search part of the request action:
package com.easyjf.news.action;

// Imports are omitted in the original excerpt
public class SearchAction implements IWebAction {
    public Page doSearch(WebForm form, Module module) throws Exception {
        String key = CommUtil.null2String(form.get("v"));
        // Re-decode the request parameter as UTF-8
        key = URLDecoder.decode(URLEncoder.encode(key, "ISO8859_1"), "UTF-8");
        form.set("v", key);
        form.addResult("v2", URLEncoder.encode(key, "UTF-8"));
        if (key.getBytes().length > 2) {
            String orderBy = CommUtil.null2String(form.get("order"));
            int currentPage = CommUtil.null2Int(form.get("page"));
            int pageSize = CommUtil.null2Int(form.get("pageSize"));
            if (currentPage < 1) currentPage = 1;
            if (pageSize < 1) pageSize = 15;
            SearchEngine search = new SearchEngine(key, orderBy, pageSize, currentPage);
            search.getLuceneSearch().setIndexDir(Globals.APP_BASE_DIR + "/WEB-INF/index");
            search.doSearchByLucene();
            IPageList pList = search.getResult();
            if (pList != null && pList.getRowCount() > 0) {
                form.addResult("list", pList.getResult());
                form.addResult("pages", new Integer(pList.getPages()));
                form.addResult("rows", new Integer(pList.getRowCount()));
                form.addResult("page", new Integer(pList.getCurrentPage()));
                form.addResult("gotoPageHtml", CommUtil.showPageHtml(pList.getCurrentPage(), pList.getPages()));
            } else {
                form.addResult("notFound", "true"); // No matching data was found
            }
        } else {
            form.addResult("errMsg", "The keyword you entered is too short!");
        }
        form.addResult("hotSearch", SearchEngine.getHotSearch(20));
        return null;
    }
}

The Lucene-related source code of the SearchEngine class called above:
public class SearchEngine {
    private MyRssSearch luceneSearch = new MyRssSearch();

    // Excerpt: the fields key, pageSize, currentPage, result, updateStatus and the
    // readCache() method are defined elsewhere in the class.
    public void doSearchByLucene() {
        SearchKey keyObj = readCache();
        if (keyObj != null) {
            result = luceneSearch.search(key, pageSize, currentPage);
            if (updateStatus) {
                keyObj.setReadTimes(new Integer(keyObj.getReadTimes().intValue() + 1));
                keyObj.update();
            }
        } else {
            // The keyword is not in the cache: record it, then generate the search result
            keyObj = new SearchKey();
            keyObj.setTitle(key);
            keyObj.setLastUpdated(new Date());
            keyObj.setReadTimes(new Integer(1));
            keyObj.setStatus(new Integer(0));
            keyObj.setSequence(new Integer(1));
            keyObj.setVdate(new Date());
            keyObj.save();
            result = luceneSearch.search(key, pageSize, currentPage);
        }
    }
}
IV. Demo
This code runs on myrss.easyjf.com, which provides Java information search on the official website of the EasyJF team.