Lucene05---Highlighter

Source: Internet
Author: User

The previous post covered analyzers (word breakers). Search results usually show one more effect, though: the matched keywords are highlighted, and only a short piece of the text is displayed. This post introduces the Highlighter, which provides both.

Highlighter:

The highlighter extracts a fragment of the text and marks each keyword in it by wrapping the keyword in a prefix and a suffix that you specify. Because the result is displayed in a web page, specifying <font color='red'> as the prefix and </font> as the suffix makes the keywords appear in red on the page.
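The prefix/suffix idea can be sketched in plain Java without Lucene. The names below (HighlightSketch, highlight, leadingSnippet) are hypothetical, and the logic is a simplified stand-in only: Lucene's Highlighter works on analyzed tokens and scores fragments, which this sketch does not attempt.

```java
// Simplified illustration of the idea; NOT how Lucene's Highlighter is implemented.
final class HighlightSketch {

    /**
     * Returns a fragment around the first case-insensitive occurrence of keyword,
     * with the keyword wrapped in prefix/suffix, or null if the keyword is absent.
     * "context" is the number of characters kept on each side of the match.
     */
    static String highlight(String text, String keyword,
                            String prefix, String suffix, int context) {
        for (int hit = 0; hit + keyword.length() <= text.length(); hit++) {
            if (text.regionMatches(true, hit, keyword, 0, keyword.length())) {
                int hitEnd = hit + keyword.length();
                int start = Math.max(0, hit - context);
                int end = Math.min(text.length(), hitEnd + context);
                return text.substring(start, hit)
                        + prefix + text.substring(hit, hitEnd) + suffix
                        + text.substring(hitEnd, end);
            }
        }
        return null; // no match: the caller decides what to show instead
    }

    /** Fallback when nothing matched: at most the first maxLength characters. */
    static String leadingSnippet(String text, int maxLength) {
        return text.substring(0, Math.min(maxLength, text.length()));
    }

    public static void main(String[] args) {
        String text = "Welcome to my iteye blog about Lucene";
        String fragment = highlight(text, "Iteye", "<font color='red'>", "</font>", 5);
        if (fragment == null) {
            fragment = leadingSnippet(text, 20);
        }
        System.out.println(fragment); // prints: o my <font color='red'>iteye</font> blog
    }
}
```

Returning null when nothing matches, and falling back to a leading snippet, is the same pattern the full example in this post uses around getBestFragment.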

FirstLucene03ByHighlighter.java:

    package com.iflytek.lucene;

    import java.io.File;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryParser.MultiFieldQueryParser;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.highlight.Formatter;
    import org.apache.lucene.search.highlight.Fragmenter;
    import org.apache.lucene.search.highlight.Highlighter;
    import org.apache.lucene.search.highlight.QueryScorer;
    import org.apache.lucene.search.highlight.Scorer;
    import org.apache.lucene.search.highlight.SimpleFragmenter;
    import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    /**
     * @author xudongwang 2012-2-10
     *
     *         Email:xdwangiflytek@gmail.com
     */
    public class FirstLucene03ByHighlighter {

        /**
         * Source file path
         */
        private String filePath01 = "F:\\Workspaces\\workspaceSE\\BlogDemo\\luceneDatasource\\HelloLucene01.txt";

        /**
         * Index path
         */
        private String indexPath = "F:\\Workspaces\\workspaceSE\\BlogDemo\\luceneIndex";

        /**
         * Analyzer. Here we use the default StandardAnalyzer (there are several
         * analyzers; this one does not handle Chinese well).
         */
        private Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);

        private Directory ramDir = null;

        /**
         * Search
         *
         * @param queryStr
         *            search keywords
         * @throws Exception
         */
        public void search(String queryStr) throws Exception {

            // 1. Parse the search text into a Query object
            String[] fields = { "name", "content" };
            QueryParser queryParser = new MultiFieldQueryParser(Version.LUCENE_35, fields, analyzer);
            Query query = queryParser.parse(queryStr);

            // 2. Run the query
            IndexReader indexReader = IndexReader.open(ramDir);
            IndexSearcher indexSearcher = new IndexSearcher(indexReader);
            Filter filter = null;
            TopDocs topDocs = indexSearcher.search(query, filter, 10000);
            // Note: the hit count is the number of matching documents,
            // not the number of keyword occurrences inside those documents
            System.out.println("There are a total of \"" + topDocs.totalHits + "\" matching results");

            // Prepare the highlighter
            Formatter formatter = new SimpleHTMLFormatter("<font color='red'>", "</font>");
            Scorer scorer = new QueryScorer(query);
            Highlighter highlighter = new Highlighter(formatter, scorer);

            Fragmenter fragmenter = new SimpleFragmenter(10); // fragment size: 10 characters
            highlighter.setTextFragmenter(fragmenter); // controls whether a summary is generated and how long it is

            // 3. Fetch the data and print the results
            for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
                int docSn = scoreDoc.doc; // internal document number
                Document document = indexSearcher.doc(docSn); // fetch the document by its number

                // Highlight. getBestFragment returns the highlighted fragment,
                // or null if the keyword does not occur in this field's value
                String highlighterStr = highlighter.getBestFragment(analyzer, "content", document.get("content"));

                if (highlighterStr == null) {
                    String content = document.get("content");
                    int endIndex = Math.min(20, content.length());
                    highlighterStr = content.substring(0, endIndex); // at most the first 20 characters
                }
                document.getField("content").setValue(highlighterStr);
                // File2Document is a helper class in the same package (not shown here)
                File2Document.printDocumentInfo(document); // print the document information
            }

            // Release the searcher and the underlying reader
            indexSearcher.close();
            indexReader.close();
        }

        /**
         * Optimized index creation: an in-memory index is used together with the
         * index on disk.
         *
         * @throws Exception
         */
        public void createIndexByYouHua() throws Exception {
            File indexFile = new File(indexPath);
            Directory fsDir = FSDirectory.open(indexFile);

            // 1. On startup, load the index from disk into memory
            ramDir = new RAMDirectory(fsDir);
            IndexWriterConfig ramConf = new IndexWriterConfig(Version.LUCENE_35, analyzer);

            // While the program runs, operate on the in-memory index
            IndexWriter ramIndexWriter = new IndexWriter(ramDir, ramConf);
            Document document = File2Document.file2Document(filePath01);
            ramIndexWriter.addDocument(document);
            ramIndexWriter.close();

            // 2. On exit, save the in-memory index to disk
            IndexWriterConfig fsConf = new IndexWriterConfig(Version.LUCENE_35, analyzer);
            IndexWriter fsIndexWriter = new IndexWriter(fsDir, fsConf);
            fsIndexWriter.addIndexes(ramDir); // merge all index data from the other index directories into this index
            fsIndexWriter.commit();
            // fsIndexWriter.optimize(); // deprecated in Lucene 3.5 in favor of forceMerge
            fsIndexWriter.forceMerge(1); // merge down to one segment to reduce I/O when searching
            fsIndexWriter.close();
        }

        public static void main(String[] args) throws Exception {
            FirstLucene03ByHighlighter lucene = new FirstLucene03ByHighlighter();
            lucene.createIndexByYouHua();
            lucene.search("Iteye");
        }

    }


Output:

There are a total of "1" matching results

name-->HelloLucene01.txt

content-->in <font color='red'>iteye</font> blog

path-->F:\Workspaces\workspaceSE\BlogDemo\luceneDatasource\HelloLucene01.txt

size-->84
