Strategy for highlighting in Lucene or SOLR

Last Update:2015-07-29 Source: Internet

Author: User


One: Functional background

I recently had to implement highlighting for search results. I had done this before, so it was not difficult, but the original implementation used Lucene and it now has to be replaced with Solr. In earlier articles I analyzed how to implement search highlighting in Lucene 4.x, where there are three ways to do it. For the details, please refer to the two previous articles:
First: How to achieve highlighting in Lucene4.3
http://qindongliang.iteye.com/blog/1953409
Second: How to highlight the service side in Solr4.3
http://qindongliang.iteye.com/blog/2034270

Two: Survey of approaches

Overall, there are two main ways to implement highlighting. In the first, the front end highlights the displayed data with JavaScript. In the second, the server highlights the results and returns them to the front end already marked up.

Back-end highlighting process (figure not included):

Front-end highlighting process (figure not included):

Three: Pros and cons analysis

Back-end highlighting:
Performance: Under high concurrency, highlighting on the server may have some impact on server performance.
Reliability: High. Highlighting still displays correctly when JavaScript is disabled in the browser.
Front-end highlighting:
Performance: Rendering is done by the client, so performance is comparatively slightly better.
Reliability: Low. When JavaScript is disabled in the browser, highlighting fails.
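
To make the front-end variant concrete: once the page has the tokens of the query, highlighting reduces to wrapping each occurrence of a token in a markup tag. The following is a minimal sketch of that replacement step, written in Java for consistency with the rest of this article (in the browser the same logic would be written in JavaScript). The class name `TokenHighlighter`, the method `highlight`, and the `<em>` tag are assumptions made for illustration, not part of the original implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenHighlighter {

    /** Wraps every occurrence of any token in the text with <em>...</em>. */
    static String highlight(String text, List<String> tokens) {
        // Drop null or empty tokens; an empty alternative would match everywhere.
        List<String> usable = new ArrayList<>();
        for (String t : tokens) {
            if (t != null && !t.isEmpty()) {
                usable.add(t);
            }
        }
        if (usable.isEmpty()) {
            return text;
        }
        // Longer tokens first, so the longest token wins when two tokens overlap.
        usable.sort((a, b) -> Integer.compare(b.length(), a.length()));

        // One alternation pattern: the text is scanned once, so tags that
        // were already inserted are never matched again.
        StringBuilder alternation = new StringBuilder();
        for (String t : usable) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(t));
        }
        Pattern pattern = Pattern.compile(alternation.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

        Matcher m = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Keep the matched text in its original case inside the tag.
            m.appendReplacement(out, Matcher.quoteReplacement("<em>" + m.group() + "</em>"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Solr and Lucene highlighting",
                Arrays.asList("solr", "highlighting")));
        // prints: <em>Solr</em> and Lucene <em>highlighting</em>
    }
}
```

The sketch does not escape HTML in the input text. A real page would need to do that before inserting tags.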

Four: Precautions

For front-end highlighting, the sentence must first be tokenized, and the resulting tokens returned to the front-end JavaScript so that it can do the replacement. To tokenize the sentence you can use either Lucene or Solr, as follows:
In Lucene:

Java code

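import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * Tokenizes the text with the given analyzer and prints each token.
 *
 * @param analyzer the analyzer (tokenizer) to use
 * @param text the sentence to tokenize
 * @throws Exception if tokenization fails
 */
public static void analyzer(Analyzer analyzer, String text) throws Exception {
    // "name" is the field name passed to the analyzer
    TokenStream ts = analyzer.tokenStream("name", text);
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
        System.out.println(term.toString());
    }
    ts.end();
    ts.close();
}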


In Solr, Mode 1:

Java code

import java.util.ArrayList;
import java.util.List;
import com.google.common.collect.Lists;
import org.apache.solr.client.solrj.request.FieldAnalysisRequest;
import org.apache.solr.client.solrj.response.AnalysisResponseBase.AnalysisPhase;
import org.apache.solr.client.solrj.response.AnalysisResponseBase.TokenInfo;
import org.apache.solr.client.solrj.response.FieldAnalysisResponse;
import org.apache.solr.client.solrj.response.FieldAnalysisResponse.Analysis;

/**
 * Tokenizes the text using a field type and collects the resulting tokens.
 *
 * @param text the sentence to tokenize
 */
public static void showAnalysisType(String text) throws Exception {
    String fieldType = "ik"; // field type whose analyzer does the tokenizing
    // Call the field analysis service
    FieldAnalysisRequest request = new FieldAnalysisRequest("/analysis/field");
    // Set the field type
    request.addFieldType(fieldType);
    // Set the sentence to tokenize
    request.setFieldValue(text);
    // sc is declared as:
    // private static HttpSolrClient sc = new HttpSolrClient("http://localhost:8983/solr/one");
    // Get the response
    FieldAnalysisResponse response = request.process(sc);
    // Get the analysis for the field type
    Analysis as = response.getFieldTypeAnalysis(fieldType);
    List<String> results = new ArrayList<String>();
    // Use the Guava library to convert the Iterator to a List
    List<AnalysisPhase> list = Lists.newArrayList(as.getIndexPhases().iterator());
    // Take the tokens produced by one phase. A field type is likely to be
    // configured with several filters, and the output after each filter is
    // different, so you must choose which phase to read the tokens from.
    // list.size() - 1 selects the last phase here; this index is not fixed.
    for (TokenInfo token : list.get(list.size() - 1).getTokens()) {
        // Collect the token text
        results.add(token.getText());
    }
}
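
The comments in the code above note that each filter in a field type's analysis chain produces a different token list, which is why the code has to pick one phase explicitly. The following self-contained sketch imitates that behavior with plain Java collections and does not call Solr. The class `PhaseDemo`, the three simulated phases (whitespace split, lowercasing, stop-word removal), and the stop-word set are all hypothetical. They only show how the token list changes from phase to phase and what `list.get(list.size() - 1)` selects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class PhaseDemo {

    // Hypothetical stop words, for illustration only.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "in"));

    /** Returns the token list after each simulated phase, in chain order. */
    static List<List<String>> runPhases(String text) {
        List<List<String>> phases = new ArrayList<>();

        // Phase 0: tokenizer, splits on whitespace.
        List<String> tokenized = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokenized.add(t);
            }
        }
        phases.add(tokenized);

        // Phase 1: lowercase filter.
        List<String> lowered = new ArrayList<>();
        for (String t : tokenized) {
            lowered.add(t.toLowerCase(Locale.ROOT));
        }
        phases.add(lowered);

        // Phase 2: stop-word filter.
        List<String> filtered = new ArrayList<>();
        for (String t : lowered) {
            if (!STOP_WORDS.contains(t)) {
                filtered.add(t);
            }
        }
        phases.add(filtered);

        return phases;
    }

    /** Mirrors list.get(list.size() - 1): the output of the final phase. */
    static List<String> lastPhase(String text) {
        List<List<String>> phases = runPhases(text);
        return phases.get(phases.size() - 1);
    }

    public static void main(String[] args) {
        List<List<String>> phases = runPhases("Highlighting in The Solr");
        for (int i = 0; i < phases.size(); i++) {
            System.out.println("phase " + i + ": " + phases.get(i));
        }
    }
}
```

In this simulated chain the last phase yields lowercased tokens with stop words removed. If the front end must match the text exactly as displayed, an earlier phase may be the better choice. Which index is right depends on the chain configured for the field type.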


In Solr, Mode 2:

Java code

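import java.util.ArrayList;
import java.util.List;
import com.google.common.collect.Lists;
import org.apache.solr.client.solrj.request.FieldAnalysisRequest;
import org.apache.solr.client.solrj.response.AnalysisResponseBase.AnalysisPhase;
import org.apache.solr.client.solrj.response.AnalysisResponseBase.TokenInfo;
import org.apache.solr.client.solrj.response.FieldAnalysisResponse;
import org.apache.solr.client.solrj.response.FieldAnalysisResponse.Analysis;

/**
 * Tokenizes the text using a field name and prints the resulting tokens.
 *
 * @param text the sentence to tokenize
 */
public static void showAnalysis(String text) throws Exception {
    // The field name
    String fieldName = "Cpyname";
    // Fixed form of the request
    FieldAnalysisRequest request = new FieldAnalysisRequest("/analysis/field");
    // Add the field
    request.addFieldName(fieldName);
    // Set the sentence to tokenize
    request.setFieldValue(text);
    // Ask the Solr service for the results (sc is the HttpSolrClient instance)
    FieldAnalysisResponse response = request.process(sc);
    // Holds the results so they can be returned for later business processing
    List<String> results = new ArrayList<String>();
    // Get the analysis for the field name
    Analysis as = response.getFieldNameAnalysis(fieldName);
    // Use the Guava library to convert the Iterator to a List
    List<AnalysisPhase> list = Lists.newArrayList(as.getIndexPhases().iterator());
    // Print the tokens of the last analysis phase
    for (TokenInfo token : list.get(list.size() - 1).getTokens()) {
        System.out.println(token.getText());
    }
}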


