An Introduction to Lucene Query Methods

Source: Internet
Author: User

This article first introduces the main Lucene classes, then focuses on several of Lucene's query types.

1. Analyzer: Tokenizer

Lucene includes several built-in analyzers, such as WhitespaceAnalyzer, which splits text on whitespace; StopAnalyzer, which additionally filters out stop words; and StandardAnalyzer, the most commonly used one.
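The difference between whitespace tokenization and stop-word filtering can be illustrated with a minimal sketch in plain Java. This is not Lucene's analyzer code: the class name, the method names, and the stop-word list are all assumptions made for this example, and the real analyzers tokenize and normalize text in more detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Conceptual sketch only: plain Java, not Lucene's analyzer classes.
class AnalyzerSketch {
    // Hypothetical stop-word list chosen for this example.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the"));

    // Split on runs of whitespace and keep every token unchanged.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Lowercase each token, then drop the ones found in the stop-word list.
    static List<String> stopFilteredTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : whitespaceTokens(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The History of Lucene";
        System.out.println(whitespaceTokens(text));   // [The, History, of, Lucene]
        System.out.println(stopFilteredTokens(text)); // [history, lucene]
    }
}
```

The choice of analyzer matters at query time as well: a term that was lowercased or removed during indexing cannot be found under its original spelling.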

2. Document: Document

A Document is the structure that encapsulates our source data. We divide the source data into different fields and put them into the Document; when searching, we can specify which fields to search.

3. Directory: Directory

This is an abstraction of where the index is stored. It can be a directory on the file system (FSDirectory), a block of memory (RAMDirectory), or an index accessed through memory mapping (MMapDirectory). Keeping the index in memory avoids disk I/O; choose the implementation according to your needs.

4. IndexWriter: the index writer, the class that maintains the index through write operations such as adding and deleting documents.

5. IndexReader: the index reader, used to read the index in a specified directory.

6. IndexSearcher: the index searcher, the class that searches the index for what the user entered.

Note that a search returns document numbers from the index (in a TopDocs), not the actual articles.

7. Query: the search statement. We need to wrap the query string in a Query before it can be handed to the searcher. The smallest unit of a query is the Term. Lucene has many kinds of Query; choose among them according to your needs.

I. TermQuery:

If you want to run a query such as "find the documents that contain lucene in the content field", you can use TermQuery:

// Term is in org.apache.lucene.index; the query classes are in org.apache.lucene.search
Term t = new Term("content", "lucene");
Query query = new TermQuery(t);
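Conceptually, a term query is an exact lookup of one (field, term) pair in an inverted index. The following is a minimal sketch of that idea in plain Java, not Lucene code; the class `TermLookupSketch` and its methods are hypothetical names invented for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Conceptual sketch only: a toy inverted index, not Lucene's index format.
class TermLookupSketch {
    // field name -> term -> sorted document numbers containing that term
    private final Map<String, Map<String, SortedSet<Integer>>> postings = new HashMap<>();

    // Record that the given document contains these terms in this field.
    void add(int docId, String field, String... terms) {
        Map<String, SortedSet<Integer>> byTerm =
                postings.computeIfAbsent(field, f -> new HashMap<>());
        for (String term : terms) {
            byTerm.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Exact lookup: the term must match the indexed form character for character.
    SortedSet<Integer> search(String field, String term) {
        Map<String, SortedSet<Integer>> byTerm = postings.get(field);
        if (byTerm == null || !byTerm.containsKey(term)) {
            return new TreeSet<>();
        }
        return byTerm.get(term);
    }

    public static void main(String[] args) {
        TermLookupSketch index = new TermLookupSketch();
        index.add(0, "content", "lucene", "search");
        index.add(1, "content", "java");
        index.add(2, "title", "lucene");
        System.out.println(index.search("content", "lucene")); // [0]
        System.out.println(index.search("content", "Lucene")); // []
    }
}
```

The second lookup returns nothing because the lookup is case-sensitive and the term was indexed in lowercase.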

II. BooleanQuery: combining multiple queries with AND/OR relationships

If you want a query such as "find the documents that contain java or perl in the content field", you can create two TermQuery objects and combine them with a BooleanQuery:

TermQuery termQuery1 = new TermQuery(new Term("content", "java"));
TermQuery termQuery2 = new TermQuery(new Term("content", "perl"));
// Constructor style of older Lucene releases; newer releases build the query with BooleanQuery.Builder
BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.add(termQuery1, BooleanClause.Occur.SHOULD);
booleanQuery.add(termQuery2, BooleanClause.Occur.SHOULD);
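In terms of result sets, two SHOULD clauses behave like a union of the documents matched by each clause, while two MUST clauses behave like an intersection. The sketch below shows only this set logic in plain Java (scoring is ignored); `BooleanSketch` and the document numbers are hypothetical.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

// Conceptual sketch only: set logic behind SHOULD and MUST, without scoring.
class BooleanSketch {
    // Both clauses SHOULD: a document matches if at least one clause matches.
    static SortedSet<Integer> should(SortedSet<Integer> a, SortedSet<Integer> b) {
        SortedSet<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    // Both clauses MUST: a document matches only if every clause matches.
    static SortedSet<Integer> must(SortedSet<Integer> a, SortedSet<Integer> b) {
        SortedSet<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical document numbers matched by each term.
        SortedSet<Integer> java = new TreeSet<>(Arrays.asList(1, 2, 5));
        SortedSet<Integer> perl = new TreeSet<>(Arrays.asList(2, 3));
        System.out.println(should(java, perl)); // [1, 2, 3, 5]
        System.out.println(must(java, perl));   // [2]
    }
}
```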

III. WildcardQuery: wildcard query

To run a wildcard query on a word, use WildcardQuery. The wildcard characters are '?', which matches any single character, and '*', which matches zero or more characters. For example, a search for 'use*' may find 'useful' or 'useless':

Query query = new WildcardQuery(new Term("content", "use*"));
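The matching rule for the two wildcard characters can be sketched by translating a pattern into a regular expression. This is an illustration in plain Java and not how Lucene evaluates wildcard queries; `WildcardSketch` is a hypothetical name, and escape sequences in the pattern are not handled.

```java
import java.util.regex.Pattern;

// Conceptual sketch only: wildcard semantics expressed as a regular expression.
class WildcardSketch {
    // '*' becomes ".*", '?' becomes ".", every other character is matched literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // The whole term has to match the pattern, not just a part of it.
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("use*", "useful")); // true
        System.out.println(matches("use*", "abuse"));  // false
        System.out.println(matches("use?", "used"));   // true
        System.out.println(matches("use?", "use"));    // false
    }
}
```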

IV. PhraseQuery: query for terms that appear within a specified distance of each other

Suppose you are interested in Sino-Japanese relations and want to find articles in which the characters 中 (China) and 日 (Japan) appear close together, within a distance of 5 characters; anything farther apart is not considered. You can write:

// Mutable style of older Lucene releases; newer releases build the query with PhraseQuery.Builder
PhraseQuery query = new PhraseQuery();
query.setSlop(5);
query.add(new Term("content", "中"));
query.add(new Term("content", "日"));

This may find "Sino-Japanese cooperation ..." and "China and Japan ...", but not a sentence such as "a senior Chinese leader said Japan ...", in which the two characters are too far apart.
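How slop limits the distance can be sketched for the two-term case. The model below is a simplification in plain Java, under the assumption that the two terms are different: the second term is expected one position after the first, and the slop needed is how far it is from that expected position. `SlopSketch` and the token lists are hypothetical; Lucene's own phrase matching is more general.

```java
import java.util.Arrays;
import java.util.List;

// Conceptual sketch only: slop for a phrase of two distinct terms.
class SlopSketch {
    // Smallest slop that lets (first, second) match in tokens, or -1 if a term is missing.
    static int minSlop(List<String> tokens, String first, String second) {
        int best = -1;
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) {
                continue;
            }
            for (int j = 0; j < tokens.size(); j++) {
                if (!tokens.get(j).equals(second)) {
                    continue;
                }
                // The second term is expected at position i + 1.
                int moves = Math.abs(j - (i + 1));
                if (best < 0 || moves < best) {
                    best = moves;
                }
            }
        }
        return best;
    }

    static boolean matches(List<String> tokens, String first, String second, int slop) {
        int needed = minSlop(tokens, first, second);
        return needed >= 0 && needed <= slop;
    }

    public static void main(String[] args) {
        List<String> near = Arrays.asList("china", "side", "and", "japan", "side");
        List<String> far = Arrays.asList("china", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "japan");
        System.out.println(minSlop(near, "china", "japan"));    // 2
        System.out.println(matches(near, "china", "japan", 5)); // true
        System.out.println(matches(far, "china", "japan", 5));  // false
    }
}
```

In this model two adjacent terms in reversed order need a slop of 2, which is why a slop of 0 only matches the exact phrase.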

V. PrefixQuery: query for words that begin with given characters

If you want to search for words that start with 中, you can use PrefixQuery:

PrefixQuery query = new PrefixQuery(new Term("content", "中"));
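A prefix search can be pictured as a scan over a sorted list of terms: all terms that share a prefix sit next to each other, starting at the prefix itself. The sketch below shows this in plain Java with a `TreeSet`; it is an illustration rather than Lucene's term dictionary, and `PrefixSketch` and the sample terms are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Conceptual sketch only: prefix scan over a sorted set of terms.
class PrefixSketch {
    static List<String> termsWithPrefix(TreeSet<String> sortedTerms, String prefix) {
        List<String> result = new ArrayList<>();
        // Start at the first term >= prefix and stop at the first term without it.
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            result.add(term);
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(
                Arrays.asList("a", "use", "useful", "useless", "user", "usual"));
        System.out.println(termsWithPrefix(terms, "use")); // [use, useful, useless, user]
    }
}
```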

VI. FuzzyQuery: similarity search

FuzzyQuery searches for terms similar to a given term, using the Levenshtein (edit distance) algorithm. If you want to search for words similar to 'wuzza', you can write:

Query query = new FuzzyQuery(new Term("content", "wuzza"));

You may get 'fuzzy' and 'wuzzy'.
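The Levenshtein distance counts the single-character insertions, deletions, and substitutions needed to turn one string into another. The following is a standard dynamic-programming sketch in plain Java that shows why both results count as similar to 'wuzza'; it only demonstrates the distance measure and is not how Lucene evaluates a FuzzyQuery. The class name `LevenshteinSketch` is hypothetical.

```java
// Conceptual sketch only: classic two-row Levenshtein distance.
class LevenshteinSketch {
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // The current row becomes the previous row for the next character of a.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("wuzza", "wuzzy")); // 1
        System.out.println(distance("wuzza", "fuzzy")); // 2
    }
}
```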

VII. TermRangeQuery: range search

If you want to search for documents whose time field falls between 20060101 and 20060130, you can use TermRangeQuery:

TermRangeQuery query2 = TermRangeQuery.newStringRange("time", "20060101", "20060130", true, true);

The two trailing true arguments make the lower bound and the upper bound inclusive, that is, a closed interval.
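A string range compares terms as text rather than as numbers, which works for dates only because every value has the same fixed-width yyyyMMdd form. The sketch below shows the comparison and the effect of the two inclusive flags in plain Java, using `String.compareTo` as a stand-in for term order; `RangeSketch` and its parameter names are hypothetical.

```java
// Conceptual sketch only: string range check with inclusive/exclusive bounds.
class RangeSketch {
    static boolean inRange(String value, String lower, String upper,
                           boolean includeLower, boolean includeUpper) {
        int lo = value.compareTo(lower);
        int hi = value.compareTo(upper);
        boolean aboveLower = includeLower ? lo >= 0 : lo > 0;
        boolean belowUpper = includeUpper ? hi <= 0 : hi < 0;
        return aboveLower && belowUpper;
    }

    public static void main(String[] args) {
        System.out.println(inRange("20060115", "20060101", "20060130", true, true));  // true
        System.out.println(inRange("20060130", "20060101", "20060130", true, true));  // true
        System.out.println(inRange("20060130", "20060101", "20060130", true, false)); // false
        System.out.println(inRange("20060201", "20060101", "20060130", true, true));  // false
    }
}
```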


8. TopDocs: the result set returned by a searcher. It contains a number of ScoreDoc objects, and the doc member of each ScoreDoc is the document ID.

To get an article, use this ID to fetch it: the searcher provides a method for retrieving a Document by ID, and once you have the Document you have the data.
