Lucene search function

Source: Internet
Author: User

[Figure: search process]

Main API:
    • IndexSearcher: all searches go through IndexSearcher, by calling one of the overloaded search() methods of that class
    • Query: each concrete subclass encapsulates a particular type of query; a Query instance is passed to the IndexSearcher search() method
    • QueryParser: turns a query expression entered by the user into concrete Query objects
    • TopDocs: holds the highest-scoring documents returned by IndexSearcher.search()
    • ScoreDoc: provides access to each individual search result in a TopDocs

Below, we will discuss these types of APIs separately.

Using the IndexSearcher class

To create an instance of the IndexSearcher class:

    Directory dir = FSDirectory.open(new File("/path/to/indices"));
    IndexReader reader = IndexReader.open(dir);
    IndexSearcher searcher = new IndexSearcher(reader);

Searching is done through the overloaded IndexSearcher.search() methods:

    TopDocs search(Query query, int n)                                 // return the n highest-scoring documents
    TopDocs search(Query query, Filter filter, int n)                  // search only the subset of documents accepted by the filter
    TopFieldDocs search(Query query, Filter filter, int n, Sort sort)  // return the results in a custom sort order
    void search(Query query, Collector results)                        // visit matching documents with a custom collection strategy
    void search(Query query, Filter filter, Collector results)         // same as above, restricted by the filter

Using the TopDocs class

Calling IndexSearcher.search() returns a TopDocs object.

    topDocs.totalHits        // number of documents matching the search criteria
    topDocs.scoreDocs        // array of ScoreDoc objects containing the search results
    topDocs.getMaxScore()    // the maximum score, available when the results were sorted by relevance

Using the Query class

Using the Query class means building queries directly with Lucene's query APIs.

Subclasses of Query can be instantiated either directly or through the QueryParser class. QueryParser converts free text into the appropriate Query types, as described in the "Using the QueryParser class" section.

TermQuery:

A TermQuery is case-sensitive, so make sure the case of the search term matches the case of the indexed term before searching.

    Term t = new Term("contents", "java");
    Query query = new TermQuery(t);

TermRangeQuery:

Terms in the index are stored in lexicographic (dictionary) order, which lets Lucene's TermRangeQuery search directly for terms that fall within a given range.

Two boolean parameters indicate whether the start and end points of the range are included.

    // Search for documents whose title field holds a term from "d" to "j", both ends included
    TermRangeQuery query = new TermRangeQuery("title", "d", "j", true, true);
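
The effect of the two boolean flags can be illustrated without Lucene. The following is a minimal stand-alone sketch, assuming terms are compared with plain `String.compareTo`; the class and method names (`TermRangeSketch`, `inRange`) are hypothetical and are not part of the Lucene API.

```java
/** Stand-alone sketch of an inclusive/exclusive term range check (not Lucene code). */
class TermRangeSketch {

    /**
     * Returns true when term lies between lower and upper in String.compareTo order.
     * The two flags decide whether each end point itself counts as a match.
     */
    static boolean inRange(String term, String lower, String upper,
                           boolean includeLower, boolean includeUpper) {
        int lo = term.compareTo(lower);
        int hi = term.compareTo(upper);
        boolean aboveLower = includeLower ? lo >= 0 : lo > 0;
        boolean belowUpper = includeUpper ? hi <= 0 : hi < 0;
        return aboveLower && belowUpper;
    }

    public static void main(String[] args) {
        System.out.println(inRange("extreme", "d", "j", true, true)); // true
        System.out.println(inRange("d", "d", "j", false, true));      // false: start point excluded
        System.out.println(inRange("java", "d", "j", true, true));    // false: "java" sorts after "j"
    }
}
```

Note the last case: a range that ends at "j" does not cover every word starting with j, because longer terms such as "java" sort after the single letter.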

NumericRangeQuery:

As with TermRangeQuery, two boolean parameters indicate whether the start and end points of the range are included.

    // field, min and max are placeholders: the original arguments were lost from this listing
    Query query = NumericRangeQuery.newIntRange(field, min, max, true, true);

PrefixQuery:

Searches for documents that contain terms beginning with the specified string.

    Term term = new Term("category", "/technology/computers/programming");
    Query prefixQuery = new PrefixQuery(term);  // programming books, including subcategories (subdirectories)
    Query termQuery = new TermQuery(term);      // programming books, excluding subcategories (subdirectories)
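
The difference between the two queries comes down to prefix matching versus exact matching on the category term. Here is a small stand-alone sketch of that difference; the class `PrefixMatchSketch`, its methods, and the sample category values are made up for illustration and do not use Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-alone sketch contrasting prefix matching with exact term matching (not Lucene code). */
class PrefixMatchSketch {

    /** Keeps the terms that start with the given prefix, like a prefix query would. */
    static List<String> withPrefix(List<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    /** Keeps only the terms equal to the given value, like a single-term query would. */
    static List<String> exact(List<String> terms, String value) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.equals(value)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical category values, one per indexed book.
        List<String> categories = Arrays.asList(
                "/technology/computers/programming",
                "/technology/computers/programming/education",
                "/technology/computers/ai");
        String category = "/technology/computers/programming";
        System.out.println(withPrefix(categories, category).size()); // 2
        System.out.println(exact(categories, category).size());      // 1
    }
}
```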

BooleanQuery:

BooleanQuery combines queries of various types into a single query.

    TermQuery searchingBooks = new TermQuery(new Term("subject", "search"));
    // field, min and max are placeholders: the original arguments were lost from this listing
    Query books2010 = NumericRangeQuery.newIntRange(field, min, max, true, true);
    BooleanQuery searchingBooks2010 = new BooleanQuery();
    searchingBooks2010.add(searchingBooks, BooleanClause.Occur.MUST);
    searchingBooks2010.add(books2010, BooleanClause.Occur.MUST);

PhraseQuery:

The PhraseQuery class uses the position information of terms to find documents in which the terms occur within a certain distance (the slop) of each other.

    PhraseQuery query = new PhraseQuery();
    query.setSlop(slop);
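
To make the idea of slop concrete, the following stand-alone sketch computes how much slop a two-term phrase needs, given the positions of the two terms in a document. It is a simplification limited to two terms and reflects my understanding of sloppy matching, not Lucene's actual implementation; `PhraseSlopSketch` and its methods are hypothetical names.

```java
/** Stand-alone sketch of slop for a two-term phrase (a simplification, not Lucene code). */
class PhraseSlopSketch {

    /**
     * Number of position moves needed so that the term at secondPos directly
     * follows the term at firstPos, for the phrase "first second".
     */
    static int requiredSlop(int firstPos, int secondPos) {
        return Math.abs(secondPos - firstPos - 1);
    }

    /** A phrase matches when the moves it needs do not exceed the allowed slop. */
    static boolean matches(int firstPos, int secondPos, int slop) {
        return requiredSlop(firstPos, secondPos) <= slop;
    }

    public static void main(String[] args) {
        // Document: "the quick brown fox" -> positions the=0, quick=1, brown=2, fox=3
        System.out.println(requiredSlop(1, 2)); // 0: "quick brown" is an exact phrase
        System.out.println(requiredSlop(1, 3)); // 1: "quick fox" has one word in between
        System.out.println(requiredSlop(3, 1)); // 3: "fox quick" is in reverse order
    }
}
```

A slop of 0 therefore means an exact phrase, and matching terms in reverse order costs more than skipping a word in between.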

WildcardQuery:

Searches with an incomplete term in which some letters are missing: ? stands for exactly one character and * for zero or more characters.

    Query query = new WildcardQuery(new Term("contents", "?ild*"));
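
The wildcard semantics can be tried out without an index. The sketch below translates a wildcard pattern into a `java.util.regex.Pattern`, assuming that `?` stands for exactly one character and `*` for zero or more, and that the pattern must cover the whole term. `WildcardSketch` is a hypothetical helper, and this is not how Lucene evaluates wildcards internally.

```java
import java.util.regex.Pattern;

/** Stand-alone sketch of wildcard term matching via regular expressions (not Lucene code). */
class WildcardSketch {

    /** Translates ? to "any one character" and * to "any run of characters". */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else if (c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** True when the whole term matches the wildcard pattern. */
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("?ild*", "wild"));   // true
        System.out.println(matches("?ild*", "mildew")); // true
        System.out.println(matches("?ild*", "child"));  // false: two characters precede "ild"
    }
}
```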

FuzzyQuery:

Matches terms that are similar to the specified term (for example, three and tree have an edit distance of 1).

    Query query = new FuzzyQuery(new Term("contents", "wuzza"));
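
Edit distance here means the number of single-character insertions, deletions, or substitutions needed to turn one term into the other (the Levenshtein distance). A minimal stand-alone sketch of that computation follows; `EditDistanceSketch` is a hypothetical class, the comparison words are chosen for illustration, and how Lucene turns the distance into a similarity score is not shown.

```java
/** Stand-alone sketch of Levenshtein edit distance (not Lucene code). */
class EditDistanceSketch {

    /** Classic dynamic-programming edit distance, keeping only two rows of the table. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("three", "tree"));  // 1
        System.out.println(distance("wuzza", "wuzzy")); // 1
        System.out.println(distance("wuzza", "fuzzy")); // 2
    }
}
```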

MatchAllDocsQuery:

Matches every document in the index.

    Query query = new MatchAllDocsQuery();   // every matched document gets the same constant score
    query = new MatchAllDocsQuery(field);    // documents are scored according to the specified field

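Using the QueryParser class

A QueryParser is created with a matchVersion, a default field name, and an analyzer that splits the input text into terms:

    // field is a placeholder: the original field name was lost from this listing
    QueryParser parser = new QueryParser(Version.LUCENE_30, field, new SimpleAnalyzer());

QueryParser application example

    IndexSearcher searcher = new IndexSearcher(dir);
    // field is a placeholder: the original field name was lost from this listing
    QueryParser parser = new QueryParser(Version.LUCENE_30, field, new SimpleAnalyzer());
    Query query = parser.parse("+junit +ant -mock");  // parse the expression into a Query object
    TopDocs docs = searcher.search(query, 10);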
How QueryParser parses expressions

1. Query.toString:

    • After a query expression has been parsed, the toString() method lets you inspect the resulting query.

    BooleanQuery query = new BooleanQuery();
    query.add(new FuzzyQuery(new Term("field", "kountry")), BooleanClause.Occur.MUST);
    assertEquals("+kountry~0.5", query.toString("field"));

2. Term query:

    • By default, a single word that is not recognized as part of a longer expression of another query type is parsed as a TermQuery.

    QueryParser parser = new QueryParser(Version.LUCENE_30, "subject", analyzer);  // "subject" is the default field
    Query query = parser.parse("computers");
    System.out.println("term: " + query);  // prints term: subject:computers

3. Term range query:

    • A range query on text or dates is written with brackets; the start and end terms of the range are joined by TO (uppercase).
    • Square brackets [ ] mean the end points are included in the range.
    • Curly braces { } mean the end points are excluded from the range.
    • Unlike a TermRangeQuery or NumericRangeQuery built in code, a parsed range cannot mix the two: the end points are either both included or both excluded.

    QueryParser parser = new QueryParser(Version.LUCENE_30, "subject", analyzer);
    Query query = parser.parse("title2:[Q TO V]");
    assertTrue(query instanceof TermRangeQuery);
    query = parser.parse("title2:{Q TO \"Tapestry in Action\"}");

4. Phrase query:

    • Terms enclosed in double quotes in a query expression are converted to a PhraseQuery. Wildcard characters inside the quotes have no effect.
    • A phrase that consists of a single term is converted to a TermQuery.

    QueryParser parser = new QueryParser(Version.LUCENE_30, "field", analyzer);
    Query q = parser.parse("\"This is Some Phrase*\"");
    assertEquals("analyzed", "\"? ? some phrase\"", q.toString("field"));
    q = parser.parse("\"term\"");
    assertTrue(q instanceof TermQuery);

5. Boolean operators:

    • You can use AND, OR, and NOT; Boolean operators must be written in uppercase. If no operator is given between terms, the default is OR.
    • a AND b has the shortcut syntax +a +b
    • a OR b has the shortcut syntax a b
    • a AND NOT b has the shortcut syntax +a -b

    QueryParser parser = new QueryParser(Version.LUCENE_30, "contents", analyzer);
    parser.setDefaultOperator(QueryParser.AND_OPERATOR);  // use AND instead of OR as the default operator
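
The shortcut syntax maps onto three kinds of clause: required (+), optional (no prefix), and prohibited (-). The following stand-alone sketch decides whether a document, reduced to a set of terms, satisfies such a combination. It illustrates the matching rules as I understand them and ignores scoring; `BooleanClauseSketch` and its parameters are hypothetical and are not Lucene classes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Stand-alone sketch of required / optional / prohibited clause matching (not Lucene code). */
class BooleanClauseSketch {

    static boolean matches(Set<String> docTerms, List<String> must,
                           List<String> should, List<String> mustNot) {
        for (String term : mustNot) {
            if (docTerms.contains(term)) {
                return false; // a prohibited term rules the document out
            }
        }
        for (String term : must) {
            if (!docTerms.contains(term)) {
                return false; // every required term has to be present
            }
        }
        if (!must.isEmpty()) {
            return true; // with required terms satisfied, optional terms are not needed to match
        }
        for (String term : should) {
            if (docTerms.contains(term)) {
                return true; // without required terms, one optional term is enough
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(Arrays.asList("junit", "ant"));
        List<String> none = Collections.emptyList();
        // +junit +ant -mock
        System.out.println(matches(doc, Arrays.asList("junit", "ant"), none, Arrays.asList("mock"))); // true
        // junit maven (either term is enough)
        System.out.println(matches(doc, none, Arrays.asList("junit", "maven"), none)); // true
        // +junit -ant
        System.out.println(matches(doc, Arrays.asList("junit"), none, Arrays.asList("ant"))); // false
    }
}
```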

6. Prefix query and wildcard query:

    • If a term contains an asterisk (*) or a question mark (?), it is treated as a WildcardQuery.
    • When the only wildcard is a single asterisk at the end of the term, QueryParser optimizes the query to a PrefixQuery.
    • For both prefix and wildcard queries the term is converted to lowercase by default (this is configurable).

    QueryParser parser = new QueryParser(Version.LUCENE_30, "field", analyzer);
    Query q = parser.parse("PrefixQuery*");
    assertEquals("lowercased", "prefixquery*", q.toString("field"));

7. Numeric range search and date range search:

    • QueryParser does not create NumericRangeQuery objects.

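8. Fuzzy query:

    • A tilde (~) after a term creates a fuzzy query; an optional number after the tilde sets the minimum similarity.

    QueryParser parser = new QueryParser(Version.LUCENE_30, "subject", analyzer);
    Query query = parser.parse("kountry~");
    System.out.println("fuzzy: " + query);    // fuzzy: subject:kountry~0.5
    query = parser.parse("kountry~0.7");
    System.out.println("fuzzy 2: " + query);  // fuzzy 2: subject:kountry~0.7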

9. MatchAllDocsQuery:

    • The input *:* is parsed as a MatchAllDocsQuery.

10. Grouping:

    • QueryParser supports nested BooleanQuery clauses: parentheses group sub-expressions within a query expression.

    QueryParser parser = new QueryParser(Version.LUCENE_30, "subject", analyzer);
    Query query = parser.parse("(agile OR extreme) AND methodology");

11. Field selection:

    • The default field name is set when the QueryParser is created.
    • With the field selector notation (field:term) you can search for terms in fields other than the default.

12. Boosting a subquery:

    • Append a caret (^) followed by a floating-point number to a subquery to set its boost factor.
    • junit^2.0 testing sets the boost of the junit TermQuery to 2.0 and leaves the testing TermQuery at the default boost of 1.0.

Examples of query expressions:

    java                                  // documents containing the term java in the default field
    java junit                            // documents containing one or both of java and junit in the default field
    java OR junit                         // same as above
    +java +junit                          // documents containing both java and junit in the default field
    title:ant                             // documents containing the term ant in the title field
    title:extreme -subject:sports         // documents with extreme in the title field and without sports in the subject field
    title:extreme AND NOT subject:sports  // same as above
    (agile OR extreme) AND methodology    // documents containing methodology and one or both of agile and extreme in the default field
    title:"junit in action"               // documents whose title field contains the phrase junit in action
    title:"junit action"~5                // documents with junit and action within 5 positions of each other in the title field
    java*                                 // documents containing terms that begin with java
    java~                                 // documents containing terms similar to java
    lastmodified:[1/1/09 TO 12/31/09]     // documents whose lastmodified value falls within this date range

QueryParser special characters

QueryParser uses a backslash to escape special characters inside terms. The characters that need to be escaped are:

\ + - ! ( ) : ^ [ ] { } ~ * ?
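
Escaping by hand is easy to get wrong, so it helps to see it as code. The sketch below is a stand-alone illustration that puts a backslash in front of each special character, assuming the set of special characters is `\ + - ! ( ) : ^ [ ] { } ~ * ?`. `QueryEscapeSketch` is a hypothetical class; in a real application, Lucene's own static `QueryParser.escape(String)` helper is the better choice, and its character set may differ from the one assumed here.

```java
/** Stand-alone sketch of backslash-escaping query syntax characters (not Lucene code). */
class QueryEscapeSketch {

    // Assumed set of characters with special meaning in a query expression.
    private static final String SPECIAL = "\\+-!():^[]{}~*?";

    /** Prefixes every special character in text with a backslash. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("(1+1):2")); // prints \(1\+1\)\:2
        System.out.println(escape("plain"));   // prints plain
    }
}
```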

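Paging through search results

There are two common approaches:

    • Collect several pages of results during the first search, and keep the resulting ScoreDocs and the IndexSearcher instance available while the user navigates.
    • Re-run the query every time the user moves to a new page.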
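
With either approach, displaying a page means cutting a window out of the ranked list of hits. The following stand-alone sketch computes that window from the total number of hits, a 1-based page number, and a page size. `PagingSketch` and `pageBounds` are hypothetical names; when re-running the query, one would ask the searcher for at least page × pageSize top hits and then apply the same bounds to the returned array.

```java
/** Stand-alone sketch of computing the slice of hits shown on one page (not Lucene code). */
class PagingSketch {

    /**
     * Returns {startInclusive, endExclusive} indexes into the ranked hits
     * for a 1-based page number. Pages past the end yield an empty range.
     */
    static int[] pageBounds(int totalHits, int page, int pageSize) {
        if (page < 1 || pageSize < 1 || totalHits < 0) {
            throw new IllegalArgumentException("page and pageSize must be >= 1, totalHits >= 0");
        }
        // long arithmetic avoids overflow for large page numbers
        int start = (int) Math.min((long) (page - 1) * pageSize, totalHits);
        int end = (int) Math.min((long) start + pageSize, totalHits);
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        int[] third = pageBounds(25, 3, 10);
        System.out.println(third[0] + ".." + third[1]); // 20..25: the last, partial page
        int[] fourth = pageBounds(25, 4, 10);
        System.out.println(fourth[0] + ".." + fourth[1]); // 25..25: empty, past the end
    }
}
```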

Near real-time search

Near real-time search lets you quickly search changes made to the index through an open IndexWriter, without first closing the writer or committing its changes.
