Understanding the core classes in the Lucene indexing and search process

Source: Internet
Author: User

Understanding the core classes in the indexing process

The classes needed to perform simple indexing are:

IndexWriter, Directory, Analyzer, Document, Field

1. IndexWriter

IndexWriter is the core component of the indexing process. It creates a new index or opens an existing one, and it adds, deletes, or updates documents in that index. It provides write access only: it cannot be used to read or search the index. IndexWriter needs somewhere to store the index, and that storage is provided by a Directory.

2. Directory

/** A Directory is a flat list of files.  Files may be written once, when they
 * are created.  Once a file is created it may only be opened for read, or
 * deleted.  Random access is permitted both when reading and writing.
 *
 * <p> Java's i/o APIs not used directly, but rather all i/o is
 * through this API.  This permits things such as: <ul>
 * <li> implementation of RAM-based indices;
 * <li> implementation indices stored in a database, via JDBC;
 * <li> implementation of an index as a single file;
 * </ul>
 *
 * Directory locking is implemented by an instance of {@link
 * LockFactory}, and can be changed for each Directory
 * instance using {@link #setLockFactory}.
 *
 */

Directory describes where the index is stored. It is an abstract class; its subclasses determine the actual storage, for example a path in the file system or a region of memory.
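The "flat list of files, written once" contract described in the comment above can be illustrated with a small in-memory sketch. This is plain Java and not Lucene's Directory class: the name ToyRamDirectory and its four methods are assumptions made up for this illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the "flat list of files" contract.
// NOT Lucene's Directory API; all names here are illustrative assumptions.
class ToyRamDirectory {
    // One flat namespace: file name -> content, no subdirectories.
    private final Map<String, byte[]> files = new TreeMap<>();

    // A file is written exactly once, at the moment it is created.
    void createFile(String name, byte[] content) {
        if (files.containsKey(name)) {
            throw new IllegalStateException("file already exists: " + name);
        }
        files.put(name, content.clone());
    }

    // After creation a file may be opened for reading ...
    byte[] readFile(String name) {
        byte[] content = files.get(name);
        if (content == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        return content.clone();
    }

    // ... or deleted, but never rewritten in place.
    void deleteFile(String name) {
        if (files.remove(name) == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
    }

    // Lists all file names in sorted order.
    String[] listAll() {
        return files.keySet().toArray(new String[0]);
    }
}
```

Because every access goes through these few methods, the storage behind them (memory here, a file system or a database elsewhere) can be swapped without changing the caller, which is the point the comment makes.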

3. Analyzer

The Analyzer is passed to the IndexWriter constructor. It extracts the terms to be indexed from the text before that text is written to the index. Analyzer is an abstract class; the actual work is done by its subclasses.
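To make "extracting terms from text" concrete, here is a minimal sketch in plain Java. It is not Lucene's Analyzer API: the class name ToyAnalyzer, the stop-word list, and the splitting rule are assumptions chosen for illustration, and real analyzers do considerably more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer: turns raw text into index terms.
// NOT Lucene's Analyzer class, only an illustration of the idea.
class ToyAnalyzer {
    // Assumed stop-word list, kept tiny on purpose.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "of", "in");

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Lowercase, then split on any run of characters that are not letters or digits.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            // Skip empty pieces produced by leading separators, and skip stop words.
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }
}
```

With this sketch, ToyAnalyzer.analyze("The core classes of Lucene") yields the terms core, classes, and lucene. The lowercasing step is why a search term usually has to be lowercase to match.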

4. Document

A Document represents a collection of fields (Field). Lucene works on text, so the text must first be extracted from binary documents and then added to the Document as Field instances.

5. Field

A document contains different kinds of information that can be indexed separately, such as title, time, body, and author. Each of these is stored in its own field.
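Why it matters that fields are indexed separately can be sketched with a toy inverted index in plain Java. Nothing here is Lucene API: ToyFieldIndex, its methods, and the whitespace-only tokenization are assumptions for illustration. A document is passed as a map from field name to text, and each term is recorded under the field it came from.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical per-field inverted index: "field:term" -> ids of documents containing it.
// NOT Lucene's Document/Field API, only an illustration of per-field indexing.
class ToyFieldIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Adds one document (field name -> text) and returns the id assigned to it.
    int addDocument(Map<String, String> fields) {
        int docId = nextDocId++;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Simplistic tokenization (an assumption): lowercase and split on whitespace.
            for (String term : field.getValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                // The field name is part of the key, so each field has its own term space.
                postings.computeIfAbsent(field.getKey() + ":" + term, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Returns the ids of documents that contain the term in the given field.
    Set<Integer> lookup(String field, String term) {
        return postings.getOrDefault(field + ":" + term, Set.of());
    }
}
```

In this sketch a document whose title contains "lucene" is found by lookup("title", "lucene") but not by lookup("author", "lucene"), because the same word in a different field is a different index entry.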

Understanding the core classes in the search process

The search interface provided by Lucene is simple and easy to understand. Its core classes are:

IndexSearcher, Term, Query, TermQuery, TopDocs

1. IndexSearcher

IndexSearcher searches an index created by IndexWriter. It needs a Directory instance that holds the previously created index, and it then offers a number of search methods. The simplest one takes a Query object and an int topN count as parameters and returns a TopDocs object. A typical use of this method looks like this:

Directory dir = FSDirectory.open(new File("/tmp/index"));
IndexSearcher searcher = new IndexSearcher(dir);
Query q = new TermQuery(new Term("contents", "lucene"));
TopDocs hits = searcher.search(q, 10);
searcher.close();

2. Term

A Term object is the basic unit of search. It consists of a field name and the text to look for in that field. When searching, you can create a Term object and use it together with a TermQuery object:

Query q = new TermQuery(new Term("contents", "lucene"));
TopDocs hits = searcher.search(q, 10);

This code finds the top 10 documents that contain the term lucene in the contents field and returns them in descending order of relevance.

3. Query

Lucene contains a number of concrete Query subclasses, including TermQuery, BooleanQuery, PhraseQuery, PrefixQuery, PhrasePrefixQuery, TermRangeQuery, NumericRangeQuery, FilteredQuery, and SpanQuery.

4. TermQuery

TermQuery is the most basic query type in Lucene. It matches documents that contain a specific term in the specified field.

5. TopDocs

The TopDocs class is a simple container of pointers to the top N ranked search results, that is, the best-ranked documents that match the query.
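The idea of keeping only the top N ranked results can be sketched in plain Java with a bounded min-heap, one common way to select the best N items. This is an illustration and not Lucene's TopDocs class: ToyTopDocs, its fields, and the score map it takes as input are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical container for the best n hits of a search.
// NOT Lucene's TopDocs class; names and structure are illustrative assumptions.
class ToyTopDocs {
    final int totalHits;         // number of all matching documents
    final List<Integer> docIds;  // ids of the best n matches, highest score first

    private ToyTopDocs(int totalHits, List<Integer> docIds) {
        this.totalHits = totalHits;
        this.docIds = docIds;
    }

    // scores maps document id -> relevance score for every matching document.
    static ToyTopDocs topN(Map<Integer, Double> scores, int n) {
        Comparator<Map.Entry<Integer, Double>> byScore = Map.Entry.comparingByValue();
        // Min-heap of at most n entries: the weakest hit kept so far sits at the head.
        PriorityQueue<Map.Entry<Integer, Double>> heap = new PriorityQueue<>(byScore);
        for (Map.Entry<Integer, Double> hit : scores.entrySet()) {
            heap.offer(hit);
            if (heap.size() > n) {
                heap.poll(); // drop the current lowest score
            }
        }
        // The heap drains in ascending score order; inserting at the front reverses it.
        List<Integer> best = new ArrayList<>();
        while (!heap.isEmpty()) {
            best.add(0, heap.poll().getKey());
        }
        return new ToyTopDocs(scores.size(), best);
    }
}
```

The container holds only document ids rather than the documents themselves, which mirrors the "pointers" wording above: the caller fetches a full document only for the hits it actually wants to display.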
