Various Lucene Search Examples

Source: Internet
Author: User

The second step in the search process is to build a query. This section introduces the Query class and how to construct one.

When a user enters a keyword, the search engine does not immediately pass it to the back end for retrieval. It first analyzes and processes the keyword into a form the back end can understand; this improves retrieval efficiency and yields more relevant results. In Lucene, this processing step amounts to building a Query object.

Query itself is an abstract class in Lucene's search package, with many subclasses that represent different types of retrieval. For example, the commonly used TermQuery encapsulates a single keyword, and BooleanQuery represents a Boolean combination of queries.

The search method of the IndexSearcher object always requires a Query object (that is, an instance of some Query subclass). This section describes the various Query classes.

11.4.1 Search by term - TermQuery

TermQuery is the simplest and most commonly used Query. It can be understood as a "term search": the most basic operation in a search engine is looking up a term in the index, and TermQuery does exactly that.

In Lucene, the term is the most basic unit of search. In essence, a term is a name/value pair: the name is a field name, and the value is a keyword contained in that field.

To search with TermQuery, first construct a Term object, as in the following sample code:

Term aterm = new Term("contents", "Java");

Then pass the aterm object as the argument when constructing a TermQuery object:

Query query = new TermQuery(aterm);

With this query, every document whose "contents" field contains the term "Java" is returned as a result that meets the query criteria.

Code 11.4 below shows a concrete implementation of a TermQuery search.
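To make the name/value idea concrete, here is a minimal sketch in plain Java of how a term lookup can work against an inverted index. It does not use Lucene: the class TermLookupSketch and its methods addDocument and search are hypothetical names invented for this illustration, and the whitespace-plus-lowercase tokenization is only an assumed stand-in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy index for illustration only; not a Lucene class. */
class TermLookupSketch {
    // field name -> (term text -> titles of the documents containing that term)
    private final Map<String, Map<String, List<String>>> index = new HashMap<>();

    /** Splits text on whitespace, lowercases it, and records each token as a term. */
    void addDocument(String title, String field, String text) {
        Map<String, List<String>> terms = index.computeIfAbsent(field, f -> new HashMap<>());
        for (String token : text.trim().toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input produces one empty token
            }
            List<String> titles = terms.computeIfAbsent(token, t -> new ArrayList<>());
            if (!titles.contains(title)) {
                titles.add(title);
            }
        }
    }

    /** Exact lookup of one (field, value) pair, which is what a term search does. */
    List<String> search(String field, String value) {
        Map<String, List<String>> terms =
                index.getOrDefault(field, Collections.<String, List<String>>emptyMap());
        return terms.getOrDefault(value, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        TermLookupSketch idx = new TermLookupSketch();
        idx.addDocument("doc1", "contents", "Learning Java quickly");
        idx.addDocument("doc2", "contents", "Learning Lucene");
        System.out.println(idx.search("contents", "java"));     // [doc1]
        System.out.println(idx.search("contents", "learning")); // [doc1, doc2]
        System.out.println(idx.search("title", "java"));        // []
    }
}
```

Note that the lookup is an exact match on the stored term: because this toy index lowercases tokens when documents are added, searching for "Java" with a capital letter finds nothing. A real index behaves similarly whenever the analyzer used at indexing time normalizes case.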

Code 11.4 TermQueryTest.java

package ch11;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class TermQueryTest {
    public static void main(String[] args) throws Exception {
        // Generate a Document object
        Document doc1 = new Document();
        // Add the contents of the "name" field
        doc1.add(Field.Text("name", "word1 word2 word3"));
        // Add the contents of the "title" field
        doc1.add(Field.Keyword("title", "doc1"));
        // Generate the index writer
        IndexWriter writer = new IndexWriter("C://index", new StandardAnalyzer(), true);
        // Add the document to the index
        writer.addDocument(doc1);
        // Close the index
        writer.close();

        // Query object
        Query query = null;
        // Hits object that holds the returned results
        Hits hits = null;
        // Generate the searcher
        IndexSearcher searcher = new IndexSearcher("C://index");

        // Construct a TermQuery object. StandardAnalyzer lowercases tokens
        // at indexing time, so the term text must be lowercase to match.
        query = new TermQuery(new Term("name", "word1"));
        // Start the retrieval and store the results in hits
        hits = searcher.search(query);
        // Output information about the search results
        printResult(hits, "word1");

        // Construct a TermQuery object again, this time on the "title" field
        query = new TermQuery(new Term("title", "doc1"));
        // Start the second retrieval and store the results in hits
        hits = searcher.search(query);
        // Output information about the search results
        printResult(hits, "doc1");
    }

    public static void printResult(Hits hits, String key) throws Exception {
        System.out.println("Find \"" + key + "\":");
        if (hits != null) {
            if (hits.length() == 0) {
                System.out.println("No results found");
            } else {
                System.out.println("Found " + hits.length() + " results");
                for (int i = 0; i < hits.length(); i++) {
                    Document d = hits.doc(i);
                    String dname = d.get("title");
                    System.out.print(dname + " ");
                }
                System.out.println();
                System.out.println();
            }
        }
    }
}

Figure 11-9 BooleanQuery test 1

Figure 11-10 BooleanQuery test 2

Because Boolean queries can be nested, they can express combinations of multiple conditions. However, too many clauses can reduce search efficiency, so Lucene imposes a default limit: a Boolean query may contain at most 1024 clauses.

11.4.3 Search within a range - RangeQuery

Sometimes a user needs to find documents within a range, such as all documents from a given time period. Lucene provides a class called RangeQuery to meet this requirement.

RangeQuery represents a search condition over a range: it searches from a start term to an end term, and the start and end terms themselves can be either included in or excluded from the results. It is used as follows:

RangeQuery query = new RangeQuery(begin, end, included);

The last argument is a Boolean value that indicates whether the boundary values themselves are included. When it is true, the boundary values are included, which can be written as "[begin TO end]". When it is false, the boundary values are excluded, which can be written as "{begin TO end}".

Code 11.6 below shows how RangeQuery is used.
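The effect of the Boolean flag can be sketched in plain Java without Lucene. The class RangeBoundarySketch and its methods are hypothetical names for this illustration, and the sample values are arbitrary. The sketch assumes that terms are compared as strings, not as numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of inclusive vs. exclusive range bounds; not a Lucene class. */
class RangeBoundarySketch {
    /** Returns true if value lies between begin and end in string (lexicographic) order. */
    static boolean inRange(String value, String begin, String end, boolean inclusive) {
        int low = value.compareTo(begin);
        int high = value.compareTo(end);
        return inclusive ? (low >= 0 && high <= 0) : (low > 0 && high < 0);
    }

    /** Keeps only the terms that fall inside the range. */
    static List<String> filter(List<String> terms, String begin, String end, boolean inclusive) {
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            if (inRange(term, begin, end, inclusive)) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("200001", "200002", "200003", "200004", "200005");
        // Exclusive bounds drop both end points.
        System.out.println(filter(terms, "200001", "200005", false)); // [200002, 200003, 200004]
        // Inclusive bounds keep them.
        System.out.println(filter(terms, "200001", "200005", true).size()); // 5
    }
}
```

Because the comparison is on strings, "10" sorts before "9". Values intended for range searches are therefore usually stored with a fixed width, such as the six-digit year-and-month strings above.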

Code 11.6 RangeQueryTest.java

package ch11;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.RangeQuery;

public class RangeQueryTest {
    public static void main(String[] args) throws Exception {
        // Generate the Document objects
        Document doc1 = new Document();
        // Add the contents of the "time" field (likewise for the documents below)
        doc1.add(Field.Text("time", "200001"));
        // Add the contents of the "title" field (likewise for the documents below)
        doc1.add(Field.Keyword("title", "doc1"));

        Document doc2 = new Document();
        doc2.add(Field.Text("time", "200002"));
        doc2.add(Field.Keyword("title", "doc2"));

        Document doc3 = new Document();
        doc3.add(Field.Text("time", "200003"));
        doc3.add(Field.Keyword("title", "doc3"));

        Document doc4 = new Document();
        doc4.add(Field.Text("time", "200004"));
        doc4.add(Field.Keyword("title", "doc4"));

        Document doc5 = new Document();
        doc5.add(Field.Text("time", "200005"));
        doc5.add(Field.Keyword("title", "doc5"));

        // Generate the index writer
        IndexWriter writer = new IndexWriter("C://index", new StandardAnalyzer(), true);
        // Use the compound index file format
        writer.setUseCompoundFile(true);
        // Add the Document objects to the index
        writer.addDocument(doc1);
        writer.addDocument(doc2);
        writer.addDocument(doc3);
        writer.addDocument(doc4);
        writer.addDocument(doc5);
        // Close the index
        writer.close();

        // Generate the index searcher
        IndexSearcher searcher = new IndexSearcher("C://index");
        // Construct the boundary terms
        Term beginTime = new Term("time", "200001");
        Term endTime = new Term("time", "200005");
        // Used to save the search results
        Hits hits = null;
        // RangeQuery object, initialized to null
        RangeQuery query = null;

        // Construct a RangeQuery whose search condition excludes the boundary values
        query = new RangeQuery(beginTime, endTime, false);
        // Start the retrieval and store the results
        hits = searcher.search(query);
        // Output information about the search results
        printResult(hits, "Documents from 200001~200005, excluding 200001 and 200005");

        // Construct a RangeQuery whose search condition includes the boundary values
        query = new RangeQuery(beginTime, endTime, true);
        // Start the second retrieval
        hits = searcher.search(query);
        // Output information about the search results
        printResult(hits, "Documents from 200001~200005, including 200001 and 200005");
    }

    public static void printResult(Hits hits, String key) throws Exception {
        System.out.println("Find \"" + key + "\":");
        if (hits != null) {
            if (hits.length() == 0) {
                System.out.println("No results found");
            } else {
                System.out.print("Found ");
                for (int i = 0; i < hits.length(); i++) {
                    Document d = hits.doc(i);
                    String dname = d.get("title");
                    System.out.print(dname + " ");
                }
                System.out.println();
                System.out.println();
            }
        }
    }
}

In the code above, two Term objects are constructed first and then passed as arguments to the RangeQuery constructor. As mentioned earlier, these two constructor parameters are the "start term" and the "end term", and the query finds all documents that fall between them.

The "time" field values of the documents built above range from 200001 to 200005. The results of running the code are shown in Figure 11-11.

Figure 11-11 RangeQuery test results

As Figure 11-11 shows, code 11.6 performs two searches with RangeQuery: the first search condition excludes the boundary values, and the second includes them.

The first RangeQuery, constructed with the false argument, excludes the two boundary values, so only 3 documents are returned. The second RangeQuery, constructed with the true argument, includes both boundary values, so all 5 documents are returned.

11.4.4 Search using prefixes - PrefixQuery

PrefixQuery searches by prefix. Typically, you first define a Term that holds the name of the field to search and the prefix of the keyword, and then construct a PrefixQuery object from that Term. The query then matches terms that begin with that prefix.
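The idea behind prefix matching can be sketched in plain Java over a sorted set of terms. This is not Lucene code: PrefixLookupSketch and termsWithPrefix are hypothetical names, and the sketch only assumes that the terms of a field are available in sorted order. In a sorted list, all terms sharing a prefix sit next to each other, so the scan can start at the prefix and stop at the first term that no longer matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical illustration of prefix matching over sorted terms; not a Lucene class. */
class PrefixLookupSketch {
    /** Collects every term that starts with the given prefix. */
    static List<String> termsWithPrefix(TreeSet<String> sortedTerms, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet(prefix) begins at the first term >= prefix; terms with the
        // prefix are contiguous from there, so stop at the first mismatch.
        for (String term : sortedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> terms =
                new TreeSet<>(Arrays.asList("jav", "java", "javascript", "jazz", "lucene"));
        System.out.println(termsWithPrefix(terms, "java")); // [java, javascript]
        System.out.println(termsWithPrefix(terms, "x"));    // []
    }
}
```

The documents to return are then the ones that contain any of the matched terms in the chosen field.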

Below is an example of code 11.7 using prefix
