Full-text Lucene search application example and code analysis

Lucene is a sub-project of the Apache Software Foundation's Jakarta project: an open-source full-text search engine toolkit that provides a complete query engine and index engine, implements several common text-analysis (word segmentation) algorithms, and exposes analyzer interfaces so custom tokenizers can be plugged in. This article uses the full-text retrieval code from the myrss.easyjf.com website as an example to briefly demonstrate how Lucene is applied in a real project.
To use Lucene for full-text search, three steps are involved:
1. Create the index: build a Lucene index file from the existing data in the site's news database.
2. Search the index: once the index exists, run full-text queries through it, using the standard analyzer or another analyzer.
3. Maintain the index: records in the news database are constantly added, modified, and deleted, and those changes must be propagated to the Lucene index file.

The code below is taken from myrss.easyjf.com.

I. Index management (creation and maintenance)

The index management class MyRssIndexManage creates and maintains the index from the data in the site's information store. Because indexing takes some time, the class implements the Runnable interface so that the program can run it in a separate thread.
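Because the class implements Runnable, the web application can launch indexing off the request thread. The following is a minimal, self-contained sketch of that pattern; the job body is a stub standing in for the real indexing code, and the class names are hypothetical, not from the project:

```java
// Hypothetical stand-in for MyRssIndexManage: any Runnable can be
// launched on a background thread the same way.
public class IndexJobDemo {
    static class StubIndexJob implements Runnable {
        volatile boolean done = false;

        public void run() {
            // the real class would build or update the Lucene index here
            done = true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StubIndexJob job = new StubIndexJob();
        Thread t = new Thread(job);
        t.start();   // indexing proceeds off the caller's thread
        t.join();    // a web app would normally not block like this
        System.out.println("index job finished: " + job.done);
    }
}
```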
package com.easyjf.lucene;

import java.util.Date;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

import com.easyjf.dbo.EasyJDB;
import com.easyjf.news.business.NewsDir;
import com.easyjf.news.business.NewsDoc;
import com.easyjf.news.business.NewsUtil;
import com.easyjf.web.tools.IPageList;

public class MyRssIndexManage implements Runnable {

    private String indexDir;
    private String indexType = "add";

    public void run() {
        if ("add".equals(indexType))
            normalIndex();
        else if ("init".equals(indexType))
            reIndexAll();
    }

    public void normalIndex() {
        try {
            Date start = new Date();
            int num = 0;
            // third argument false: append to the existing index
            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
            // NewsDir dir = NewsDir.readBySn();
            String scope = "(needIndex<2) or (needIndex is null)";
            IPageList pList = NewsUtil.pageList(scope, 1, 50);
            for (int p = 0; p < pList.getPages(); p++) {
                pList = NewsUtil.pageList(scope, p, 100);
                List list = pList.getResult();
                for (int i = 0; i < list.size(); i++) {
                    NewsDoc doc = (NewsDoc) list.get(i);
                    writer.addDocument(newsDoc2LuceneDoc(doc));
                    num++;
                }
            }
            writer.optimize();
            writer.close();
            EasyJDB.getInstance().execute("update NewsDoc set needIndex=2 where " + scope);
            Date end = new Date();
            System.out.print("Newly indexed " + num + " documents, total time: "
                    + (end.getTime() - start.getTime()) / 60000 + " minutes!");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void reIndexAll() {
        try {
            Date start = new Date();
            int num = 0;
            // third argument true: rebuild the index from scratch
            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
            NewsDir dir = NewsDir.readBySn("easyjf");
            IPageList pList = NewsUtil.pageList(dir, 1, 50);
            for (int p = 0; p < pList.getPages(); p++) {
                pList = NewsUtil.pageList(dir, p, 100);
                List list = pList.getResult();
                for (int i = 0; i < list.size(); i++) {
                    NewsDoc doc = (NewsDoc) list.get(i);
                    writer.addDocument(newsDoc2LuceneDoc(doc));
                    num++;
                }
            }
            writer.optimize();
            writer.close();
            EasyJDB.getInstance().execute("update NewsDoc set needIndex=2 where dirPath like 'easyjf%'");
            Date end = new Date();
            System.out.print("Full re-index done, processed " + num + " documents, total time: "
                    + (end.getTime() - start.getTime()) / 60000 + " minutes!");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private Document newsDoc2LuceneDoc(NewsDoc doc) {
        Document lDoc = new Document();
        lDoc.add(new Field("title", doc.getTitle(), Field.Store.YES, Field.Index.TOKENIZED));
        lDoc.add(new Field("content", doc.getContent(), Field.Store.YES, Field.Index.TOKENIZED));
        lDoc.add(new Field("url", doc.getRemark(), Field.Store.YES, Field.Index.NO));
        lDoc.add(new Field("CID", doc.getCid(), Field.Store.YES, Field.Index.NO));
        lDoc.add(new Field("Source", doc.getSource(), Field.Store.YES, Field.Index.NO));
        lDoc.add(new Field("inputtime", doc.getInputTime().toString(), Field.Store.YES, Field.Index.NO));
        return lDoc;
    }

    public String getIndexDir() {
        return indexDir;
    }

    public void setIndexDir(String indexDir) {
        this.indexDir = indexDir;
    }

    public String getIndexType() {
        return indexType;
    }

    public void setIndexType(String indexType) {
        this.indexType = indexType;
    }
}

II. Using Lucene for full-text search
The following is the source code of the MyRssSearch class, which uses Lucene's Searcher and QueryParser to run keyword queries against the index library.
package com.easyjf.lucene;

import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

import com.easyjf.search.MyRssUtil;
import com.easyjf.search.SearchContent;
import com.easyjf.web.tools.IPageList;
import com.easyjf.web.tools.PageList;

public class MyRssSearch {

    private String indexDir;
    IndexReader ir;
    Searcher search;

    public IPageList search(String key, int pageSize, int currentPage) {
        IPageList pList = new PageList(new HitsQuery(doSearch(key)));
        pList.doList(pageSize, currentPage, "", "", null);
        if (pList != null) {
            List list = pList.getResult();
            if (list != null) {
                for (int i = 0; i < list.size(); i++) {
                    list.set(i, lucene2SearchObj((Document) list.get(i), key));
                }
            }
        }
        try {
            if (search != null) search.close();
            if (ir != null) ir.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return pList;
    }

    private SearchContent lucene2SearchObj(Document doc, String key) {
        SearchContent searchObj = new SearchContent();
        String title = doc.getField("title").stringValue();
        searchObj.setTitle(title.replaceAll(key, "<font color=red>" + key + "</font>"));
        searchObj.setTvalue(doc.getField("CID").stringValue());
        searchObj.setUrl(doc.getField("url").stringValue());
        searchObj.setSource(doc.getField("Source").stringValue());
        searchObj.setLastUpdated(doc.getField("inputtime").stringValue());
        searchObj.setIntro(MyRssUtil.content2Intro(doc.getField("content").stringValue(), key));
        return searchObj;
    }

    public Hits doSearch(String key) {
        Hits hits = null;
        try {
            ir = IndexReader.open(indexDir);
            search = new IndexSearcher(ir);
            String[] fields = {"title", "content"};
            QueryParser parser = new MultiFieldQueryParser(fields, new StandardAnalyzer());
            Query query = parser.parse(key);
            hits = search.search(query);
        } catch (Exception e) {
            e.printStackTrace();
        }
        // System.out.println("Search result: " + hits.length());
        return hits;
    }

    public String getIndexDir() {
        return indexDir;
    }

    public void setIndexDir(String indexDir) {
        this.indexDir = indexDir;
    }
}

In the code above, the search method returns an IPageList that encapsulates the paged query results. IPageList is the paging engine in the EasyJWeb Tools business engine; for details on its usage, see my article "Design and implementation of service-engine paging in EasyJWeb Tools". To adapt Lucene's Hits result set to that engine, we wrote a HitsQuery class. The code is as follows:
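Before the listing, the windowing contract that HitsQuery implements (setFirstResult/setMaxResults select a slice of the full hit list) can be illustrated without any Lucene dependency. The helper below is a hypothetical stand-in that pages over a plain java.util.List:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the begin/max window HitsQuery applies
// to Lucene's Hits; a plain List stands in for the Hits object.
public class WindowDemo {
    static <T> List<T> window(List<T> all, int begin, int max) {
        // clamp the window to the list bounds, like the loop in getResult
        int from = Math.max(0, Math.min(begin, all.size()));
        int to = Math.min(all.size(), from + max);
        return all.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("d0", "d1", "d2", "d3", "d4");
        // page 2 with pageSize 2 -> begin = 2, max = 2
        System.out.println(window(docs, 2, 2)); // prints [d2, d3]
    }
}
```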
package com.easyjf.lucene;

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.apache.lucene.search.Hits;

import com.easyjf.web.tools.IQuery;

public class HitsQuery implements IQuery {

    private int begin = 0;
    private int max = 0;
    private Hits hits;

    public HitsQuery() {
    }

    public HitsQuery(Hits hits) {
        if (hits != null) {
            this.hits = hits;
            this.max = hits.length();
        }
    }

    public int getRows(String arg0) {
        return (hits == null ? 0 : hits.length());
    }

    public List getResult(String arg0) {
        List list = new ArrayList();
        // copy the window [begin, begin + max) out of the Hits object
        for (int i = begin; (i < begin + max) && (i < hits.length()); i++) {
            try {
                list.add(hits.doc(i));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        return list;
    }

    public void setFirstResult(int begin) {
        this.begin = begin;
    }

    public void setMaxResults(int max) {
        this.max = max;
    }

    public void setParaValues(Collection arg0) {
        // not needed for index-based paging
    }

    public List getResult(String condition, int begin, int max) {
        if ((begin >= 0) && (begin < max)) this.begin = begin;
        if (!(max > hits.length())) this.max = max;
        return getResult(condition);
    }
}

III. Web invocation
The following shows how the web layer calls the full-text search in the business logic. This is the search-related source of the request action:
package com.easyjf.news.action;

public class SearchAction implements IWebAction {

    public Page doSearch(WebForm form, Module module) throws Exception {
        String key = CommUtil.null2String(form.get("v"));
        key = URLDecoder.decode(URLEncoder.encode(key, "iso8859_1"), "UTF-8");
        form.set("v", key);
        form.addResult("v2", URLEncoder.encode(key, "UTF-8"));
        if (key.getBytes().length > 2) {
            String orderBy = CommUtil.null2String(form.get("order"));
            int currentPage = CommUtil.null2Int(form.get("page"));
            int pageSize = CommUtil.null2Int(form.get("pageSize"));
            if (currentPage < 1) currentPage = 1;
            if (pageSize < 1) pageSize = 15;
            SearchEngine search = new SearchEngine(key, orderBy, pageSize, currentPage);
            search.getLuceneSearch().setIndexDir(Globals.APP_BASE_DIR + "/WEB-INF/index");
            search.doSearchByLucene();
            IPageList pList = search.getResult();
            if (pList != null && pList.getRowCount() > 0) {
                form.addResult("list", pList.getResult());
                form.addResult("pages", new Integer(pList.getPages()));
                form.addResult("rows", new Integer(pList.getRowCount()));
                form.addResult("page", new Integer(pList.getCurrentPage()));
                form.addResult("gotoPageHtml",
                        CommUtil.showPageHtml(pList.getCurrentPage(), pList.getPages()));
            } else {
                form.addResult("notFound", "true"); // no matching data
            }
        } else {
            form.addResult("errMsg", "The keyword you entered is too short!");
        }
        form.addResult("hotSearch", SearchEngine.getHotSearch(20));
        return null;
    }
}

The Lucene-related source in the SearchEngine class that the action calls:
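One line in the action above deserves a note: URLDecoder.decode(URLEncoder.encode(key, "iso8859_1"), "UTF-8") repairs a keyword whose UTF-8 bytes the container decoded as ISO-8859-1. The sketch below shows the same byte-level round-trip in self-contained form (class name hypothetical, no servlet dependency):

```java
import java.nio.charset.StandardCharsets;

public class CharsetFixDemo {
    // Re-interpret a string whose characters are really the ISO-8859-1
    // reading of UTF-8 bytes; equivalent in effect to the
    // URLEncoder/URLDecoder round-trip used in doSearch.
    static String fix(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café"
        // simulate the container decoding UTF-8 bytes as ISO-8859-1
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(fix(garbled).equals(original)); // prints true
    }
}
```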
public class SearchEngine {

    private MyRssSearch luceneSearch = new MyRssSearch();

    public void doSearchByLucene() {
        SearchKey keyObj = readCache();
        if (keyObj != null) {
            result = luceneSearch.search(key, pageSize, currentPage);
            if (updateStatus) {
                keyObj.setReadTimes(new Integer(keyObj.getReadTimes().intValue() + 1));
                keyObj.update();
            }
        } else {
            // the keyword is not in the cache yet: record it, then search
            keyObj = new SearchKey();
            keyObj.setTitle(key);
            keyObj.setLastUpdated(new Date());
            keyObj.setReadTimes(new Integer(1));
            keyObj.setStatus(new Integer(0));
            keyObj.setSequence(new Integer(1));
            keyObj.setVdate(new Date());
            keyObj.save();
            result = luceneSearch.search(key, pageSize, currentPage);
        }
    }
}

IV. Program demo
This is the running result on myrss.easyjf.com, the Java information search service on the EasyJF team's official website.
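A final hardening note on the MyRssSearch class shown earlier: String.replaceAll treats the search keyword as a regular expression, so a keyword such as "C++" would throw a PatternSyntaxException during highlighting. A hypothetical sketch of a safer variant using Pattern.quote and Matcher.quoteReplacement:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hardening of the keyword highlighting in lucene2SearchObj:
// quote the keyword so regex metacharacters are matched literally.
public class HighlightDemo {
    static String highlight(String text, String key) {
        return text.replaceAll(Pattern.quote(key),
                Matcher.quoteReplacement("<font color=red>" + key + "</font>"));
    }

    public static void main(String[] args) {
        // "C++" contains regex metacharacters but is handled safely
        System.out.println(highlight("learning C++ fast", "C++"));
        // prints: learning <font color=red>C++</font> fast
    }
}
```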
