A detailed look at highlighting in Lucene


Lucene's org.apache.lucene.search.highlight package provides tools for highlighting search keywords. When you search with Baidu or Google, the terms that match your keywords are highlighted in the summary of each result; both engines mark the matching terms in red.

With the highlighting tools that Lucene provides, you can implement the same feature easily.

Highlighting works as follows: find the documents that match the keywords entered by the user, extract the summary text of each matching document, and wrap every term in the summary that is the same as or similar to a keyword in the highlight markup. When the summary is displayed on a web page, the terms related to the keywords appear highlighted.

In Lucene, the org.apache.lucene.search.highlight.SimpleHTMLFormatter class constructs a highlight format. The simplest way to construct one is, for example:

SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("<font color='red'>", "</font>");

The constructor is declared as public SimpleHTMLFormatter(String preTag, String postTag). Because the highlight format targets web pages, and HTML marks up text with tags, the constructor takes an opening tag (preTag) and a closing tag (postTag).

The format constructed above displays the keywords in the summary in red, to set them apart from the surrounding text.

Next, use the formatter to construct an org.apache.lucene.search.highlight.Highlighter instance. Given the text of a field obtained from a search result (here, the summary text), the highlighter locates the terms that are the same as or similar to the search keywords, wraps them in the highlight markup, and returns a new, formatted summary string that can be displayed on a web page.
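
The process described above can be sketched with the JDK alone. This is a simplified illustration under stated assumptions, not Lucene's implementation: the class HighlightSketch and its highlight method are hypothetical names, tokens are plain runs of ASCII word characters rather than analyzer output, and a token matches only if it equals a query term after lowercasing.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HighlightSketch {

    // Hypothetical helper: wraps every word of text that appears in queryTerms
    // (compared in lowercase) with preTag and postTag. All other characters,
    // including punctuation and whitespace, are copied through unchanged.
    public static String highlight(String text, Set<String> queryTerms,
                                   String preTag, String postTag) {
        Matcher m = Pattern.compile("\\w+").matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());   // text between tokens
            String token = m.group();
            if (queryTerms.contains(token.toLowerCase(Locale.ROOT))) {
                out.append(preTag).append(token).append(postTag);
            } else {
                out.append(token);
            }
            last = m.end();
        }
        out.append(text.substring(last));        // trailing text after the last token
        return out.toString();
    }

    public static void main(String[] args) {
        String summary = "Because a lonely fish likes the color of the flame.";
        System.out.println(
            highlight(summary, Set.of("because"), "<font color='red'>", "</font>"));
        // prints: <font color='red'>Because</font> a lonely fish likes the color of the flame.
    }
}
```

The two tag parameters play the same role as preTag and postTag in SimpleHTMLFormatter: the matching logic never needs to know which markup is used.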

The following is a simple example to show the process of highlighted display.

The test class is as follows:

package org.shirdrn.lucene.learn.highlight;

import java.io.IOException;
import java.io.StringReader;

import net.teamhot.lucene.ThesaurusAnalyzer;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleFragmenter;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;

public class MyHighlighter {

    private String indexPath = "F://index";
    private Analyzer analyzer;
    private IndexSearcher searcher;

    public MyHighlighter() {
        analyzer = new ThesaurusAnalyzer();
    }

    public void createIndex() throws IOException { // builds the index

        IndexWriter writer = new IndexWriter(indexPath, analyzer, true);
        Document docA = new Document();
        String fileTextA = "because Alibaba Cloud is always burning and disappearing from the sun to the horizon, then there is a peaceful and natural sounds. No one will feel sad in the lenses of such time, because the splendor gives people a quiet comfort.";
        Field fieldA = new Field("contents", fileTextA, Field.Store.YES, Field.Index.TOKENIZED);
        docA.add(fieldA);

        Document docB = new Document();
        String fileTextB = "because the beautiful scenery at the cost of scars is always disturbing, and no one will be able to feel comfortable and comfortable immediately after the attack, whether it is a pain or a disaster, because blur makes people scream.";
        Field fieldB = new Field("contents", fileTextB, Field.Store.YES, Field.Index.TOKENIZED);
        docB.add(fieldB);

        Document docC = new Document();
        String fileTextC = "I like traveling alone, burning in the connecting zone between my dream and the ocean. " +
                "Because, a lonely fish like the color of the flame, it's really not logical.";
        Field fieldC = new Field("contents", fileTextC, Field.Store.YES, Field.Index.TOKENIZED);
        docC.add(fieldC);

        writer.addDocument(docA);
        writer.addDocument(docB);
        writer.addDocument(docC);
        writer.optimize();
        writer.close();
    }

    public void search(String fieldName, String keyword) throws CorruptIndexException, IOException, ParseException { // searches the index and highlights the results

        searcher = new IndexSearcher(indexPath);
        QueryParser queryParse = new QueryParser(fieldName, analyzer); // construct a QueryParser to parse the keywords entered by the user

        Query query = queryParse.parse(keyword);
        Hits hits = searcher.search(query);
        for (int i = 0; i < hits.length(); i++) {
            Document doc = hits.doc(i);
            String text = doc.get(fieldName);
            SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("<font color='red'>", "</font>");
            Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
            if (text != null) {
                // Fragment size = full text length, so the whole text is one fragment.
                // Set inside the null check, because text.length() fails on null.
                highlighter.setTextFragmenter(new SimpleFragmenter(text.length()));
                TokenStream tokenStream = analyzer.tokenStream(fieldName, new StringReader(text));
                String highlightText = highlighter.getBestFragment(tokenStream, text);
                System.out.println("★Highlighted search result " + (i + 1) + ":");
                System.out.println(highlightText);
            }
        }
        searcher.close();
    }

    public static void main(String[] args) { // test entry point

        MyHighlighter mhl = new MyHighlighter();
        try {
            mhl.createIndex();
            mhl.search("contents", "because");
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }

}

Program description:

1. The createIndex() method uses ThesaurusAnalyzer to build an index over the given texts. Each Document has a single Field named contents. In practice, you could add another Field named path to record the location of the retrieved file (a local path or a network path).

2. Searching runs against the index that was built. The keywords entered by the user must be parsed first. If QueryParser is used, it must be given the same analyzer that was used to build the index; otherwise there is no guarantee that the parsed Query (built from the analyzed terms) retrieves a reasonable result set.

3. The search is run with the parsed Query, and the result set is stored in a Hits object. The program iterates over it, extracts the content of each matching Document, and treats that content directly as the summary to highlight. A real application should include a step that extracts a summary (or fetches, from a database, the stored summary of each file in the result set). Once you have the summary, you can add the highlight markup to it.

4. The length of the returned summary is controlled by the fragment size passed to SimpleFragmenter. To limit the summary to about n characters, pass n in highlighter.setTextFragmenter(new SimpleFragmenter(n)). The example passes text.length(), so the whole text is returned as the summary.
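
The effect of the fragment size in item 4 can be illustrated with a JDK-only sketch. It rests on loud assumptions: fixed-size character chunks and a keyword-count score are simplifications chosen for illustration, and the names FragmentSketch, fragments, score and bestFragment are hypothetical. Lucene's SimpleFragmenter and QueryScorer operate on analyzer tokens and are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FragmentSketch {

    // Cuts text into consecutive pieces of at most fragmentSize characters.
    public static List<String> fragments(String text, int fragmentSize) {
        if (fragmentSize <= 0) {
            throw new IllegalArgumentException("fragmentSize must be positive");
        }
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < text.length(); start += fragmentSize) {
            parts.add(text.substring(start, Math.min(text.length(), start + fragmentSize)));
        }
        return parts;
    }

    // Counts non-overlapping, case-insensitive occurrences of keyword in fragment.
    // A keyword split across a fragment boundary is not counted.
    public static int score(String fragment, String keyword) {
        String haystack = fragment.toLowerCase(Locale.ROOT);
        String needle = keyword.toLowerCase(Locale.ROOT);
        if (needle.isEmpty()) {
            return 0;
        }
        int count = 0;
        int i = haystack.indexOf(needle);
        while (i >= 0) {
            count++;
            i = haystack.indexOf(needle, i + needle.length());
        }
        return count;
    }

    // Returns the first fragment with the highest score, or null if no fragment matches.
    public static String bestFragment(String text, String keyword, int fragmentSize) {
        String best = null;
        int bestScore = 0;
        for (String fragment : fragments(text, fragmentSize)) {
            int s = score(fragment, keyword);
            if (s > bestScore) {
                best = fragment;
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String text = "I like traveling alone. Because a lonely fish likes the color of the flame.";
        // Fragment size equal to the text length: the whole text is a single fragment.
        System.out.println(bestFragment(text, "because", text.length()));
        // A smaller fragment size yields a shorter summary.
        System.out.println(bestFragment(text, "because", 40));
    }
}
```

With the fragment size set to the text length there is only one candidate, which is why the test program prints each document in full.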

Run the program. The result is as follows:

The dictionary has not been initialized. initialize the dictionary.
The initialization dictionary ends. Time: 3906 milliseconds;
A total of 195574 words were added.
★Highlighted search result 1:
<font color='red'>because</font> Alibaba Cloud is always burning and disappearing from the sun to the horizon, then there is a peaceful and natural sounds. No one will feel sad in the lenses of such time, <font color='red'>because</font> the splendor gives people a quiet comfort.
★Highlighted search result 2:
<font color='red'>because</font> the beautiful scenery at the cost of scars is always disturbing, and no one will be able to feel comfortable and comfortable immediately after the attack, whether it is a pain or a disaster, <font color='red'>because</font> blur makes people scream.
★Highlighted search result 3:
I like traveling alone, burning in the connecting zone between my dream and the ocean. <font color='red'>Because</font>, a lonely fish like the color of the flame, it's really not logical.

When this output is rendered in an HTML page, the keyword "because" appears in red.
