Lucene full-text search: application example and code analysis

Last Update:2013-12-12 Source: Internet

Author: User

Lucene is a sub-project of the Apache Software Foundation's Jakarta project. It is an open-source toolkit and architecture for full-text search that provides a complete query engine and indexing engine, implements several common word-segmentation (tokenization) algorithms, and exposes many interfaces for plugging in other analyzers. This article uses the full-text search code of the myrss.easyjf.com website as an example to briefly show how Lucene is applied in a real project.
To use Lucene for full-text search, follow these three steps:
1. Create the index: build the Lucene index files from the existing data in the website's news database.
2. Search the index: once the index exists, run full-text queries against it using the standard analyzer or another analyzer.
3. Maintain the index: the news database changes constantly as items are added, modified, and deleted, and every such change must also be reflected in the Lucene index files.
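To make the three steps concrete without tying them to a particular Lucene version, here is a minimal sketch of a toy in-memory inverted index in plain Java. It is not Lucene's implementation: the class name ToyInvertedIndex, its methods, and the crude tokenizer (lowercase, split on non-alphanumeric characters, no stop-word removal) are hypothetical stand-ins for an analyzer and an index. Query terms are ORed together, mirroring QueryParser's default operator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy in-memory inverted index (hypothetical names; not Lucene's implementation)
class ToyInvertedIndex {
    // term -> ids of the documents that contain it (sorted for stable output)
    private final Map<String, SortedSet<String>> postings = new HashMap<>();
    // id -> original text, kept so a document's postings can be removed later
    private final Map<String, String> stored = new HashMap<>();

    // Crude analyzer: lowercase, split on runs of non-letter/non-digit characters
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Step 1: create the index
    void addDocument(String id, String text) {
        stored.put(id, text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // Step 2: search; query terms are ORed together
    SortedSet<String> search(String query) {
        SortedSet<String> hits = new TreeSet<>();
        for (String term : tokenize(query)) {
            SortedSet<String> ids = postings.get(term);
            if (ids != null) hits.addAll(ids);
        }
        return hits;
    }

    // Step 3: maintain the index when the source data changes
    void deleteDocument(String id) {
        String text = stored.remove(id);
        if (text == null) return;
        for (String term : tokenize(text)) {
            SortedSet<String> ids = postings.get(term);
            if (ids != null) {
                ids.remove(id);
                if (ids.isEmpty()) postings.remove(term);
            }
        }
    }

    void updateDocument(String id, String newText) {
        deleteDocument(id); // an update is a delete followed by a re-add
        addDocument(id, newText);
    }

    public static void main(String[] args) {
        ToyInvertedIndex idx = new ToyInvertedIndex();
        idx.addDocument("1", "Lucene full-text search");
        idx.addDocument("2", "EasyJF news system");
        System.out.println(idx.search("search news")); // [1, 2]
        idx.updateDocument("2", "EasyJF portal");
        System.out.println(idx.search("news"));        // []
        idx.deleteDocument("1");
        System.out.println(idx.search("lucene"));      // []
    }
}
```

Modeling an update as delete-then-add matches how IndexWriter.updateDocument is documented to behave in Lucene 2.1 and later; the code below predates that method and instead re-indexes by rebuilding or appending.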
The code used on myrss.easyjf.com follows.

I. Index management (creation and maintenance)
The index management class MyRssIndexManage creates and maintains the index from the data in the website's news database. Because indexing takes some time, the class implements the Runnable interface so that the program can run it in a separate thread.
package com.easyjf.lucene;

import java.util.Date;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

import com.easyjf.dbo.EasyJDB;
import com.easyjf.news.business.NewsDir;
import com.easyjf.news.business.NewsDoc;
import com.easyjf.news.business.NewsUtil;
import com.easyjf.web.tools.IPageList;

public class MyRssIndexManage implements Runnable {
    // Use one page size both for counting pages and for fetching them
    // (the original counted pages with 50 rows but fetched 100 rows per page)
    private static final int PAGE_SIZE = 100;

    private String indexDir;
    // "add" = index only new records, "init" = rebuild the whole index
    private String indexType = "add";

    public void run() {
        if ("add".equals(indexType))
            normalIndex();
        else if ("init".equals(indexType))
            reIndexAll();
    }

    // Incremental indexing: append records that have not been indexed yet
    public void normalIndex() {
        try {
            Date start = new Date();
            int num = 0;
            // create = false opens the existing index and appends to it
            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
            String scope = "(needIndex < 2) or (needIndex is null)";
            IPageList pList = NewsUtil.pageList(scope, 1, PAGE_SIZE);
            // The loop bounds were lost in the source: getPages() and 1-based
            // page numbers are assumptions about the EasyJF IPageList API
            int pages = pList.getPages();
            for (int p = 1; p <= pages; p++) {
                pList = NewsUtil.pageList(scope, p, PAGE_SIZE);
                List list = pList.getResult();
                for (int i = 0; i < list.size(); i++) {
                    NewsDoc doc = (NewsDoc) list.get(i);
                    writer.addDocument(newsdoc2lucenedoc(doc));
                    num++;
                }
            }
            writer.optimize();
            writer.close();
            // Flag the records just indexed so the next run skips them
            EasyJDB.getInstance().execute("update NewsDoc set needIndex = 2 where " + scope);
            Date end = new Date();
            System.out.println("Indexed " + num + " new documents in "
                    + (end.getTime() - start.getTime()) / 60000 + " minutes.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Full rebuild: create = true discards any existing index in indexDir
    public void reIndexAll() {
        try {
            Date start = new Date();
            int num = 0;
            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
            NewsDir dir = NewsDir.readBySn("easyjf");
            IPageList pList = NewsUtil.pageList(dir, 1, PAGE_SIZE);
            int pages = pList.getPages(); // same assumption as in normalIndex()
            for (int p = 1; p <= pages; p++) {
                pList = NewsUtil.pageList(dir, p, PAGE_SIZE);
                List list = pList.getResult();
                for (int i = 0; i < list.size(); i++) {
                    NewsDoc doc = (NewsDoc) list.get(i);
                    writer.addDocument(newsdoc2lucenedoc(doc));
                    num++;
                }
            }
            writer.optimize();
            writer.close();
            EasyJDB.getInstance().execute("update NewsDoc set needIndex = 2 where dirPath like 'easyjf%'");
            Date end = new Date();
            System.out.println("Full re-index complete: " + num + " documents processed in "
                    + (end.getTime() - start.getTime()) / 60000 + " minutes.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Map a news record to a Lucene document: title and content are tokenized so
    // they can be searched; the remaining fields are stored only, for display
    private Document newsdoc2lucenedoc(NewsDoc doc) {
        Document lDoc = new Document();
        lDoc.add(new Field("title", doc.getTitle(), Field.Store.YES, Field.Index.TOKENIZED));
        lDoc.add(new Field("content", doc.getContent(), Field.Store.YES, Field.Index.TOKENIZED));
        lDoc.add(new Field("url", doc.getRemark(), Field.Store.YES, Field.Index.NO));
        lDoc.add(new Field("cid", doc.getCid(), Field.Store.YES, Field.Index.NO));
        lDoc.add(new Field("source", doc.getSource(), Field.Store.YES, Field.Index.NO));
        lDoc.add(new Field("inputTime", doc.getInputTime().toString(), Field.Store.YES, Field.Index.NO));
        return lDoc;
    }

    public String getIndexDir() {
        return indexDir;
    }

    public void setIndexDir(String indexDir) {
        this.indexDir = indexDir;
    }

    public String getIndexType() {
        return indexType;
    }

    public void setIndexType(String indexType) {
        this.indexType = indexType;
    }
}
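The threading and paging pattern used by MyRssIndexManage can be tried out without a database or Lucene. The following is a minimal sketch under stated assumptions: NewsRecord, BackgroundIndexer, the in-memory list standing in for the NewsDoc table, and the list of titles standing in for the Lucene index are all hypothetical. It shows the Runnable dispatching on indexType, paging with a single page size, and setting needIndex to 2 only after the batch has been indexed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a NewsDoc row with only the fields this sketch needs
class NewsRecord {
    final String title;
    int needIndex; // 2 = already indexed; any other value = still pending
    NewsRecord(String title, int needIndex) {
        this.title = title;
        this.needIndex = needIndex;
    }
}

// Sketch of the MyRssIndexManage pattern: the in-memory "database" and "index"
// are assumptions standing in for EasyJDB and a Lucene IndexWriter
class BackgroundIndexer implements Runnable {
    static final int PAGE_SIZE = 2; // tiny so the demo spans several pages
    private final List<NewsRecord> database;
    private final List<String> index;
    private final String indexType; // "add" or "init", as in the original class

    BackgroundIndexer(List<NewsRecord> database, List<String> index, String indexType) {
        this.database = database;
        this.index = index;
        this.indexType = indexType;
    }

    @Override
    public void run() {
        if ("add".equals(indexType)) indexRecords(false);
        else if ("init".equals(indexType)) indexRecords(true);
    }

    private void indexRecords(boolean rebuild) {
        if (rebuild) index.clear(); // like opening IndexWriter with create = true
        List<NewsRecord> scope = new ArrayList<>();
        for (NewsRecord r : database) {
            if (rebuild || r.needIndex < 2) scope.add(r);
        }
        // Count pages with the same page size that is used to fetch them
        int pages = (scope.size() + PAGE_SIZE - 1) / PAGE_SIZE;
        for (int p = 1; p <= pages; p++) { // 1-based page numbers
            int from = (p - 1) * PAGE_SIZE;
            int to = Math.min(from + PAGE_SIZE, scope.size());
            for (NewsRecord r : scope.subList(from, to)) index.add(r.title);
        }
        // Flag records only after the whole batch has been indexed
        for (NewsRecord r : scope) r.needIndex = 2;
    }

    public static void main(String[] args) throws InterruptedException {
        List<NewsRecord> db = new ArrayList<>(Arrays.asList(
                new NewsRecord("a", 2), new NewsRecord("b", 0),
                new NewsRecord("c", 1), new NewsRecord("d", 0)));
        List<String> index = new ArrayList<>(Arrays.asList("a")); // "a" was indexed earlier
        Thread worker = new Thread(new BackgroundIndexer(db, index, "add"));
        worker.start();
        worker.join(); // a web app would keep serving requests instead of waiting
        System.out.println(index); // [a, b, c, d]
    }
}
```

Because the indexer is a Runnable, the caller can start it on its own thread and keep handling requests; join() is used here only so the demo can print the finished index.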

II. Using Lucene for full-text search
The source code of the MyRssSearch class follows. This class mainly uses Lucene's Searcher and QueryParser to look up keywords in the index.
package com.easyjf.lucene;

import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
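The MyRssSearch listing breaks off after its imports, so the following hypothetical sketch only illustrates the idea behind searching several fields at once, which is what MultiFieldQueryParser is for: a document matches when a query term occurs in any of the searchable fields. It does not use Lucene and does no relevance scoring, and its names (MultiFieldSearchSketch, search, doc, the example.com URLs) are made up. The url field is stored for display but never searched, mirroring Field.Index.NO in newsdoc2lucenedoc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy multi-field search (hypothetical names; no Lucene, no relevance scoring)
class MultiFieldSearchSketch {
    // Searchable fields, like those indexed with Field.Index.TOKENIZED;
    // "url" is stored for display only, like Field.Index.NO
    static final List<String> SEARCHABLE = Arrays.asList("title", "content");

    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Returns the stored url of each document in which any query term
    // occurs in any searchable field (terms and fields are ORed)
    static List<String> search(List<Map<String, String>> docs, String query) {
        Set<String> queryTerms = tokens(query);
        List<String> urls = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            for (String field : SEARCHABLE) {
                if (!Collections.disjoint(tokens(doc.getOrDefault(field, "")), queryTerms)) {
                    urls.add(doc.get("url"));
                    break; // one matching field is enough; avoid duplicate hits
                }
            }
        }
        return urls;
    }

    static Map<String, String> doc(String title, String content, String url) {
        Map<String, String> d = new HashMap<>();
        d.put("title", title);
        d.put("content", content);
        d.put("url", url);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = Arrays.asList(
                doc("Lucene tips", "Indexing news articles", "http://example.com/1"),
                doc("Search engines", "An overview of Lucene", "http://example.com/2"),
                doc("Sports", "Final score", "http://example.com/lucene"));
        // The third document mentions "lucene" only in its url, which is not searched
        System.out.println(search(docs, "lucene")); // [http://example.com/1, http://example.com/2]
    }
}
```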

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
