Full-text search with Lucene: application example and code analysis

Source: Internet
Author: User

Lucene is a subproject of the Jakarta project of the Apache Software Foundation. It is an open-source full-text search toolkit and framework that provides a complete query engine and indexing engine, implements several common word segmentation (tokenization) algorithms, and exposes interfaces for plugging in additional lexical analyzers. This article uses the full-text retrieval code from the myrss.easyjf.com website system as an example to briefly demonstrate how Lucene is applied in a real project.
To use Lucene for full-text search, follow these three steps:
1. Create the index: build a Lucene index from the existing data in the website's news database.
2. Search the index: once the index exists, run full-text searches against it using the standard lexical analyzer or a custom analyzer implementation.
3. Maintain the index: records in the news database are continually added, modified, and deleted, and these changes must be reflected in the Lucene index.
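The three steps above can be illustrated without Lucene at all. The following is a minimal sketch of a toy in-memory inverted index in plain Java; ToyIndex and its methods are hypothetical names chosen for illustration and are not Lucene's API, and the tokenizer is deliberately naive.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy in-memory inverted index; NOT Lucene's API, only an illustration of the three steps. */
class ToyIndex {
    // term -> ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // id -> original text, kept so a document can be removed or replaced later
    private final Map<Integer, String> stored = new HashMap<>();

    // Very naive "analyzer": lower-case, then split on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Steps 1 and 3: add a document, or replace it if the id is already indexed.
    public void addOrUpdate(int id, String text) {
        delete(id);
        stored.put(id, text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // Step 3: remove a document from every posting list it appears in.
    public void delete(int id) {
        String old = stored.remove(id);
        if (old == null) return;
        for (String term : tokenize(old)) {
            Set<Integer> ids = postings.get(term);
            if (ids != null) {
                ids.remove(id);
                if (ids.isEmpty()) postings.remove(term);
            }
        }
    }

    // Step 2: return the ids of the documents that contain every term of the query.
    public Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> ids = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) result = new TreeSet<>(ids);
            else result.retainAll(ids);
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

In this sketch, addOrUpdate covers both the initial build and later modifications, delete handles removed records, and search intersects the posting lists of the query terms. A real engine such as Lucene adds scoring, persistent storage, and configurable analyzers on top of the same basic idea.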
The following is the code used by myrss.easyjf.com.
 
I. Index management (creation and maintenance)
The index management class MyRssIndexManage creates and maintains the index based on the data in the website's news database. Because indexing takes some time, the class implements the Runnable interface so that the program can run it in a separate thread.
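A minimal sketch of how a Runnable index job might be started in a background thread. IndexJob and IndexJobDemo are hypothetical stand-ins, not part of the myrss.easyjf.com code, and the sleep only simulates slow indexing work.

```java
/** Hypothetical stand-in for an index job; the sleep simulates slow indexing work. */
class IndexJob implements Runnable {
    private final String indexType;              // "add" = incremental, "init" = full rebuild
    private volatile String lastAction = "none"; // volatile so the starting thread sees the result

    IndexJob(String indexType) {
        this.indexType = indexType;
    }

    @Override
    public void run() {
        try {
            Thread.sleep(50); // placeholder for the real indexing work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        if ("add".equals(indexType)) lastAction = "incremental";
        else if ("init".equals(indexType)) lastAction = "rebuild";
    }

    String getLastAction() {
        return lastAction;
    }
}

class IndexJobDemo {
    public static void main(String[] args) throws InterruptedException {
        IndexJob job = new IndexJob("init");
        Thread worker = new Thread(job, "index-worker");
        worker.start();   // returns immediately; the caller is not blocked
        System.out.println("indexing started in background");
        worker.join();    // only for the demo: wait so the outcome can be printed
        System.out.println("finished: " + job.getLastAction());
    }
}
```

In a web application the caller would normally not join the worker thread; it would start the job and return to serving requests.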
package com.easyjf.lucene;

import java.util.Date;
import java.util.List;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;
import com.easyjf.dbo.EasyJDB;
import com.easyjf.news.business.NewsDir;
import com.easyjf.news.business.NewsDoc;
import com.easyjf.news.business.NewsUtil;
import com.easyjf.web.tools.IPageList;

public class MyRssIndexManage implements Runnable {
    private String indexDir;
    // "add" = index only new records, "init" = rebuild the whole index
    private String indexType = "add";

    public void run() {
        if ("add".equals(indexType))
            normalIndex();
        else if ("init".equals(indexType))
            reIndexAll();
    }
    // Incremental indexing: adds the records that have not been indexed yet.
    public void normalIndex() {
        try {
            Date start = new Date();
            int num = 0;
            // false = append to the existing index instead of recreating it
            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
            // NewsDir dir = NewsDir.readBySn();
            String scope = "(needIndex<2) or (needIndex is null)";
            IPageList pList = NewsUtil.pageList(scope, 1, 50);
            // Loop bounds are assumed: the page count from pList.getPages() and list.size().
            for (int p = 0; p < pList.getPages(); p++) {
                pList = NewsUtil.pageList(scope, p, 100);
                List list = pList.getResult();
                for (int i = 0; i < list.size(); i++) {
                    NewsDoc doc = (NewsDoc) list.get(i);
                    writer.addDocument(newsdoc2lucenedoc(doc));
                    num++;
                }
            }
            writer.optimize();
            writer.close();
            // Mark the processed records as indexed.
            EasyJDB.getInstance().execute("update NewsDoc set needIndex=2 where " + scope);
            Date end = new Date();
            System.out.print("Newly indexed " + num + " records, time taken: " + (end.getTime() - start.getTime()) / 60000 + " minutes!");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    // Full rebuild: recreates the index from every record under the "easyjf" directory.
    public void reIndexAll() {
        try {
            Date start = new Date();
            int num = 0;
            // true = create a new index, replacing any existing one
            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
            NewsDir dir = NewsDir.readBySn("easyjf");
            IPageList pList = NewsUtil.pageList(dir, 1, 50);
            // Loop bounds are assumed: the page count from pList.getPages() and list.size().
            for (int p = 0; p < pList.getPages(); p++) {
                pList = NewsUtil.pageList(dir, p, 100);
                List list = pList.getResult();
                for (int i = 0; i < list.size(); i++) {
                    NewsDoc doc = (NewsDoc) list.get(i);
                    writer.addDocument(newsdoc2lucenedoc(doc));
                    num++;
                }
            }
            writer.optimize();
            writer.close();
            EasyJDB.getInstance().execute("update NewsDoc set needIndex=2 where dirPath like 'easyjf%'");
            Date end = new Date();
            System.out.print("All records re-indexed, processed " + num + " records, time taken: " + (end.getTime() - start.getTime()) / 60000 + " minutes!");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    // Converts a news record into a Lucene document.
    private Document newsdoc2lucenedoc(NewsDoc doc) {
        Document lDoc = new Document();
        // title and content are tokenized and searchable; the other fields are stored only
        lDoc.add(new Field("title", doc.getTitle(), Field.Store.YES, Field.Index.TOKENIZED));
        lDoc.add(new Field("content", doc.getContent(), Field.Store.YES, Field.Index.TOKENIZED));
        lDoc.add(new Field("url", doc.getRemark(), Field.Store.YES, Field.Index.NO));
        lDoc.add(new Field("cid", doc.getCid(), Field.Store.YES, Field.Index.NO));
        lDoc.add(new Field("source", doc.getSource(), Field.Store.YES, Field.Index.NO));
        lDoc.add(new Field("inputTime", doc.getInputTime().toString(), Field.Store.YES, Field.Index.NO));
        return lDoc;
    }

    public String getIndexDir() {
        return indexDir;
    }

    public void setIndexDir(String indexDir) {
        this.indexDir = indexDir;
    }

    public String getIndexType() {
        return indexType;
    }

    public void setIndexType(String indexType) {
        this.indexType = indexType;
    }
}
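The incremental pass in normalIndex follows a common pattern: select the records whose needIndex flag marks them as pending, process them page by page, and set the flag only after the pass completes. A minimal plain-Java sketch of that pattern follows; NewsRecord and IncrementalIndexer are hypothetical stand-ins for NewsDoc and the database layer, and a list of titles stands in for the Lucene index.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical record type standing in for NewsDoc; only the fields this sketch needs. */
class NewsRecord {
    final String title;
    Integer needIndex; // null or < 2 means "not yet indexed", 2 means "indexed"

    NewsRecord(String title, Integer needIndex) {
        this.title = title;
        this.needIndex = needIndex;
    }
}

class IncrementalIndexer {
    /** Indexes pending records page by page and returns how many were processed. */
    static int indexPending(List<NewsRecord> table, int pageSize, List<String> index) {
        // Snapshot the pending set first so that flag updates cannot shift page boundaries.
        List<NewsRecord> pending = new ArrayList<>();
        for (NewsRecord r : table) {
            if (r.needIndex == null || r.needIndex < 2) pending.add(r);
        }
        int num = 0;
        for (int from = 0; from < pending.size(); from += pageSize) {
            int to = Math.min(from + pageSize, pending.size());
            for (NewsRecord r : pending.subList(from, to)) {
                index.add(r.title); // stand-in for writer.addDocument(...)
                num++;
            }
        }
        // Mark the records as indexed only after the whole pass has finished.
        for (NewsRecord r : pending) r.needIndex = 2;
        return num;
    }
}
```

Deferring the flag update until the end mirrors the listing above, where the single update statement runs after writer.close(). A second call on the same data finds nothing pending and processes zero records.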
 
II. Using Lucene for full-text search
The following is the source code of the MyRssSearch class. It mainly uses Lucene's Searcher and QueryParser to search the index for keywords.
package com.easyjf.lucene;

import java.util.List;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
