Building a Search Engine with Heritrix + Lucene (5): The Search Part

Source: Internet
Author: User

Lucene provides retrieval tools. Searching with Lucene mainly involves the following classes (or interfaces):
1) IndexSearcher: the most basic retrieval tool in Lucene. Every search goes through an IndexSearcher.
2) Query: the query itself. Lucene supports term, fuzzy, phrase, range, wildcard, and combined (Boolean) queries through subclasses such as TermQuery, BooleanQuery, RangeQuery, and WildcardQuery.
3) QueryParser: a tool that parses user input. It scans the string the user typed and produces Query objects from it.
4) Hits: a search is only complete once its results are returned and shown to the user. In Lucene, the set of search results is represented by an instance of the Hits class. (Hits belongs to the Lucene 1.x/2.x API; it was deprecated in 2.9 and removed in 3.0.)
5) Analyzer: splits the text the search engine handles into terms, both when documents are indexed and when queries are parsed. Commonly used analyzers include StandardAnalyzer, StopAnalyzer, and WhitespaceAnalyzer.
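
To make the Analyzer entry more concrete, here is a toy, standard-library-only Java sketch of roughly what two of the analyzers named above do to the same sentence. It is not Lucene code: the class ToyAnalyzers and its methods are hypothetical, the stop-word set is a small illustrative subset rather than Lucene's actual list, and splitting on non-letters only approximates Lucene's letter-based tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy imitation (not Lucene code) of two analyzer strategies.
class ToyAnalyzers {

    // Small illustrative subset of English stop words (not Lucene's full list).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "of", "or", "the", "to"));

    // Roughly like WhitespaceAnalyzer: split on whitespace only,
    // keeping case and punctuation attached to the tokens.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Roughly like StopAnalyzer: split on runs of non-letters,
    // lower-case each token, and drop stop words.
    static List<String> stop(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            String lower = t.toLowerCase(Locale.ROOT);
            if (!lower.isEmpty() && !STOP_WORDS.contains(lower)) tokens.add(lower);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The Heritrix crawler, and the Lucene index";
        System.out.println(whitespace(text)); // [The, Heritrix, crawler,, and, the, Lucene, index]
        System.out.println(stop(text));       // [heritrix, crawler, lucene, index]
    }
}
```

The same text yields quite different terms, which is why the analyzer used at search time should match the one used when the index was built.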

The following Search.java runs searches against the index created earlier:

    package gesearcher.search;

    import java.io.IOException;

    import jeasy.analysis.MMAnalyzer;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer; // used by the commented-out alternative
    import org.apache.lucene.index.Term;                         // used by the commented-out alternative
    import org.apache.lucene.queryParser.MultiFieldQueryParser;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.TermQuery;                   // used by the commented-out alternative

    import gesearcher.systemconfig.Config;

    public class Search {

        private String indexPath;
        private Hits hits;
        private String searchKey;

        public Search(String searchKey) {
            // this.indexPath = indexPath;
            this.searchKey = searchKey;
            this.indexPath = new Config().getIndexPath(); // directory where the index files are located
            doSearch();
        }

        public void doSearch() {
            try {
                IndexSearcher searcher = new IndexSearcher(indexPath); // retrieval tool
                // Lower-case the input, drop standalone "or"/"and" words (\\b keeps
                // words such as "orange" intact), then join the remaining terms with
                // the QueryParser operator AND, which must be written in upper case.
                String keyVal = searchKey.toLowerCase().replaceAll("\\b(or|and)\\b", "")
                        .trim().replaceAll("\\s+", " AND ");
                Analyzer analyzer = new MMAnalyzer(); // word segmentation (tokenization) tool
                // Alternative: Analyzer analyzer = new StandardAnalyzer();
                String[] fields = {"title", "content"};
                MultiFieldQueryParser mt = new MultiFieldQueryParser(fields, analyzer);
                Query query = mt.parse(keyVal); // the query over both fields
                // Alternative: exact single-field term query
                // Term t = new Term("content", keyVal);
                // Query query = new TermQuery(t);
                hits = searcher.search(query, Sort.RELEVANCE); // matching documents sorted by score
                // The searcher is deliberately left open: Hits loads documents lazily through it.
            } catch (ParseException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        public Hits getHits() {
            return hits;
        }
    }

In the code above, the query is built by MultiFieldQueryParser, which searches several fields at once (here title and content); this is how the multi-field query is implemented.
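
As an illustration of what "multi-field" means here, the hypothetical helper below (plain Java, no Lucene) builds the textual shape that the MultiFieldQueryParser Javadoc describes for a query whose terms are all required, such as `lucene AND search`: each term becomes a group of per-field clauses, and every group is required. It assumes each term passes through the analyzer unchanged, which is a simplification, since MMAnalyzer may split Chinese input into several terms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: mimics the toString() shape of a multi-field query
// in which every term is required, e.g. "+(title:a content:a) +(title:b content:b)".
class MultiFieldShape {

    static String expand(String[] fields, String[] terms) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms) {
            List<String> perField = new ArrayList<>();
            for (String field : fields) {
                perField.add(field + ":" + term); // optional clause: term in this field
            }
            // "+" marks the group as required: the term must occur in at least one field.
            clauses.add("+(" + String.join(" ", perField) + ")");
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        String[] fields = {"title", "content"};
        System.out.println(expand(fields, "lucene AND search".split(" AND ")));
        // prints: +(title:lucene content:lucene) +(title:search content:search)
    }
}
```

So a document matches when each search term appears in its title or its content, not necessarily in the same field.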

The analyzer is an MMAnalyzer instance from the je-analysis (jeasy.analysis) Chinese word segmentation package.

The line that builds keyVal lower-cases the user's input, removes the connector words "or" and "and" that users may type into the search box, and joins the remaining terms with the QueryParser operator AND, so that every term must match.
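
The keyVal step can be tried on its own. A regex of the form `(or|and)` with no word boundaries also strips those letters from inside ordinary words, so the sketch below compares that form with a variant bounded by `\b`, which removes only standalone connector words. The class KeyNormalizer and its method names are hypothetical; it uses only java.lang.String and java.util.regex behavior, no Lucene.

```java
import java.util.Locale;

// Hypothetical stand-alone version of the keyVal clean-up step.
class KeyNormalizer {

    // No word boundaries: "or"/"and" are removed even inside words
    // such as "orange" or "android".
    static String naive(String searchKey) {
        return searchKey.toLowerCase(Locale.ROOT).replaceAll("(or|and)", "")
                .trim().replaceAll("\\s+", " AND ");
    }

    // Word-boundary variant: only standalone "or"/"and" are removed.
    static String wordBoundary(String searchKey) {
        return searchKey.toLowerCase(Locale.ROOT).replaceAll("\\b(or|and)\\b", "")
                .trim().replaceAll("\\s+", " AND ");
    }

    public static void main(String[] args) {
        System.out.println(naive("orange and android"));        // ange AND roid
        System.out.println(wordBoundary("orange and android")); // orange AND android
        System.out.println(wordBoundary("Lucene OR Heritrix")); // lucene AND heritrix
    }
}
```

Lower-casing first is what lets "OR" typed in capitals be caught by the lower-case pattern; the upper-case " AND " is inserted afterwards so QueryParser still reads it as an operator.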

The search result is an instance of the Hits class, which encapsulates information about the set of matching documents.

These classes (interfaces) are enough to implement a simple search. More powerful and more intelligent retrieval will, of course, require further research and development.
