Understanding the core classes in the indexing process
The classes you need to build a simple index are:
IndexWriter, Directory, Analyzer, Document, Field
1. IndexWriter
IndexWriter is the core component of the indexing process. It is responsible for creating a new index or opening an existing one, and for adding, deleting, and updating documents in that index; it does not read or search the index. IndexWriter needs a place to store the index, and that place is provided by a Directory.
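To make the add/delete/update responsibilities concrete, here is a minimal in-memory sketch. It is not Lucene's IndexWriter: the class `ToyIndexWriter` and its methods are hypothetical, and each document is assumed to be a plain string that is lowercased and split on whitespace. The `updateDocument` method follows the delete-then-add behaviour of Lucene's own `IndexWriter.updateDocument`. Like the real writer, it offers no search method.

```java
import java.util.*;

// Hypothetical toy writer, NOT Lucene's IndexWriter: it only illustrates
// the add / delete / update responsibilities described above.
class ToyIndexWriter {
    // Stored documents keyed by an internal document id.
    private final Map<Integer, String> docs = new HashMap<>();
    // Inverted index: term -> ids of the documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int nextId = 0;

    int addDocument(String text) {
        int id = nextId++;
        docs.put(id, text);
        for (String term : text.toLowerCase().split("\\s+")) {
            if (term.isEmpty()) continue;
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Removes every document containing the given (already lowercased) term.
    void deleteDocuments(String term) {
        Set<Integer> ids = postings.remove(term);
        if (ids == null) return;
        for (int id : ids) {
            String text = docs.remove(id);
            // Drop this id from the postings of the document's other terms too.
            for (String t : text.toLowerCase().split("\\s+")) {
                Set<Integer> p = postings.get(t);
                if (p != null) {
                    p.remove(id);
                    if (p.isEmpty()) postings.remove(t);
                }
            }
        }
    }

    // Update = delete the matching documents, then add the new version.
    int updateDocument(String term, String newText) {
        deleteDocuments(term);
        return addDocument(newText);
    }

    int numDocs() {
        return docs.size();
    }

    // Only for inspecting the demo; a real writer exposes no search API.
    Set<Integer> postingsFor(String term) {
        return postings.getOrDefault(term, Set.of());
    }

    public static void main(String[] args) {
        ToyIndexWriter w = new ToyIndexWriter();
        w.addDocument("id1 Lucene in Action");
        w.addDocument("id2 Lucene basics");
        w.updateDocument("id1", "id1 Lucene in Action second edition");
        System.out.println(w.numDocs());              // prints 2
        System.out.println(w.postingsFor("second"));  // prints [2]
    }
}
```

Using an id-like term (`id1`) as the update key mirrors the common practice of giving each document a unique identifier field so it can be replaced later.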
2. Directory
/** A Directory is a flat list of files. Files may be written once, when they
* are created. Once a file is created it may only be opened for read, or
* deleted. Random access is permitted both when reading and writing.
*
* <p> Java's I/O APIs are not used directly, but rather all I/O is
* through this API. This permits things such as: <ul>
* <li> implementation of RAM-based indices;
* <li> implementation of indices stored in a database, via JDBC;
* <li> implementation of an index as a single file;
* </ul>
*
* Directory locking is implemented by an instance of {@link
* LockFactory}, and can be changed for each Directory
* instance using {@link #setLockFactory}.
*
*/
Directory describes where the index is stored. It is an abstract class, and its subclasses determine the concrete storage location of the index (for example, FSDirectory stores it under a file-system path).
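The write-once contract from the Javadoc above can be illustrated with a small in-memory stand-in. `ToyRamDirectory` and its methods are hypothetical and do not reproduce Lucene's Directory API (the real class hands out `IndexOutput`/`IndexInput` objects); the sketch only models a flat namespace of files that are written once and afterwards only read or deleted.

```java
import java.util.*;

// Hypothetical in-memory "directory": a flat map from file name to bytes.
// It models the write-once rule only; it is not Lucene's Directory API.
class ToyRamDirectory {
    private final Map<String, byte[]> files = new TreeMap<>();

    // A file may be written exactly once, when it is created.
    void createFile(String name, byte[] contents) {
        if (files.containsKey(name)) {
            throw new IllegalStateException("file already exists: " + name);
        }
        files.put(name, contents.clone());
    }

    // Random-access read of a single byte at the given position.
    byte readByte(String name, int position) {
        byte[] data = files.get(name);
        if (data == null) throw new NoSuchElementException("no such file: " + name);
        return data[position];
    }

    void deleteFile(String name) {
        if (files.remove(name) == null) {
            throw new NoSuchElementException("no such file: " + name);
        }
    }

    // The namespace is flat: listing returns plain file names, no sub-directories.
    List<String> listAll() {
        return new ArrayList<>(files.keySet());
    }

    public static void main(String[] args) {
        ToyRamDirectory dir = new ToyRamDirectory();
        dir.createFile("_0.dat", new byte[] {10, 20, 30});
        System.out.println(dir.readByte("_0.dat", 2)); // prints 30
        try {
            dir.createFile("_0.dat", new byte[] {1}); // a second write is rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints file already exists: _0.dat
        }
        dir.deleteFile("_0.dat");
        System.out.println(dir.listAll()); // prints []
    }
}
```

Because callers only ever see this small interface, the storage behind it (memory, disk, a database, a single compound file) can change without touching the indexing code, which is the flexibility the Javadoc describes.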
3. Analyzer
The Analyzer is specified when the IndexWriter is constructed and is responsible for extracting terms from the text being indexed. Analyzer is an abstract class; its subclasses implement the actual analysis.
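As a rough picture of what an analyzer does, the sketch below lowercases text, splits it on anything that is not a letter or digit, and drops a few stop words. `ToyAnalyzer` and its stop-word list are hypothetical; Lucene's real analyzers (for example StandardAnalyzer) chain a tokenizer with token filters and handle many more cases.

```java
import java.util.*;

// Hypothetical analyzer: NOT a Lucene class. It only shows the idea of
// turning raw text into the list of terms that will be indexed.
class ToyAnalyzer {
    // A tiny, assumed stop-word list for illustration.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "in");

    List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Split on any run of characters that are not letters or digits.
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue;
            String term = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(term)) terms.add(term);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(new ToyAnalyzer().analyze("The Quick-Start guide to Lucene, 2nd edition"));
        // prints [quick, start, guide, to, lucene, 2nd, edition]
    }
}
```

Lowercasing at index time is why queries are normally built from lowercased terms too: the index only contains what the analyzer produced.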
4. Document
A Document represents a collection of fields (Field). Lucene does not process binary documents directly: it only indexes text that has already been extracted from them and supplied as Field instances.
5. Field
A document carries different kinds of information, such as title, date, body, and author, and each of these can be stored in its own field and indexed separately.
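The Document/Field relationship amounts to a named collection of values. This sketch uses hypothetical `ToyField` and `ToyDocument` types, not Lucene's classes, and assumes the text has already been extracted from the original file, since that step happens outside Lucene. The `stored` and `indexed` flags loosely echo the per-field choices Lucene offers (whether to keep the original value and whether to make it searchable).

```java
import java.util.*;

// Hypothetical field: a name/value pair plus two flags loosely modelled on
// Lucene's per-field choices (keep the original value? make it searchable?).
record ToyField(String name, String value, boolean stored, boolean indexed) {}

// Hypothetical document: just an ordered collection of fields.
class ToyDocument {
    private final List<ToyField> fields = new ArrayList<>();

    ToyDocument add(ToyField field) {
        fields.add(field);
        return this;
    }

    // Returns the first stored value for the field name, or null if none.
    String get(String name) {
        for (ToyField f : fields) {
            if (f.name().equals(name) && f.stored()) return f.value();
        }
        return null;
    }

    List<ToyField> getFields() {
        return Collections.unmodifiableList(fields);
    }

    public static void main(String[] args) {
        // Placeholder text, assumed to be already extracted from the source file.
        ToyDocument doc = new ToyDocument()
            .add(new ToyField("title", "Sample title", true, true))
            .add(new ToyField("author", "Jane Doe", true, true))
            .add(new ToyField("body", "Full text extracted from the source file", false, true));
        System.out.println(doc.get("title")); // prints Sample title
        System.out.println(doc.get("body"));  // prints null (indexed but not stored)
    }
}
```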
Understanding the core classes in the search process
The search API that Lucene provides is simple and easy to understand. Its core classes are:
IndexSearcher, Term, Query, TermQuery, TopDocs
1. IndexSearcher
IndexSearcher searches an index created by IndexWriter. It needs a Directory instance pointing to the previously created index, after which it offers a large number of search methods. The simplest one takes a single Query object and an int topN count as parameters and returns a TopDocs object. A typical use of this method looks like this:
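Directory dir = FSDirectory.open(new File("/tmp/index"));
IndexSearcher searcher = new IndexSearcher(dir);
// Term text is lowercase, assuming the analyzer lowercased it at index time.
Query q = new TermQuery(new Term("contents", "lucene"));
TopDocs hits = searcher.search(q, 10);
searcher.close();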
2. Term
The Term object is the basic unit of searching. Like a Field, it consists of a field name and a text value. During a search you can create a Term object and use it with a TermQuery object:
Query q = new TermQuery(new Term("contents", "lucene"));
TopDocs hits = searcher.search(q, 10);
This code looks for the top 10 documents whose contents field contains the word Lucene and returns them sorted by relevance score in descending order.
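A Term pairs a field name with a text value, so the same word in two different fields counts as two different terms. The sketch below models that with a hypothetical `ToyTerm` record used as the key of a tiny inverted index; it is not Lucene's Term class, and it stops at finding the matching document ids (ranking is left out).

```java
import java.util.*;

// Hypothetical term: a (field, text) pair. Records get value-based
// equals/hashCode, so two terms are equal only if BOTH parts match.
record ToyTerm(String field, String text) {}

class ToyTermIndex {
    // Inverted index keyed by term: term -> ids of matching documents.
    private final Map<ToyTerm, SortedSet<Integer>> postings = new HashMap<>();

    void add(int docId, String field, String... words) {
        for (String w : words) {
            postings.computeIfAbsent(new ToyTerm(field, w), t -> new TreeSet<>()).add(docId);
        }
    }

    SortedSet<Integer> docsWith(ToyTerm term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }

    public static void main(String[] args) {
        ToyTermIndex index = new ToyTermIndex();
        index.add(0, "contents", "lucene", "search");
        index.add(1, "title", "lucene");
        index.add(2, "contents", "lucene", "index");
        System.out.println(index.docsWith(new ToyTerm("contents", "lucene"))); // prints [0, 2]
        System.out.println(index.docsWith(new ToyTerm("title", "lucene")));    // prints [1]
    }
}
```

Document 1 does not match the contents query even though it contains the word, because the word sits in a different field.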
3. Query
Lucene provides a number of concrete Query subclasses, including TermQuery, BooleanQuery, PhraseQuery, PrefixQuery, PhrasePrefixQuery, TermRangeQuery, NumericRangeQuery, FilteredQuery, and SpanQuery.
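What these subclasses share is one abstraction: a query decides which documents match. A minimal sketch of that idea uses a hypothetical `ToyQuery` interface with two implementations, loosely analogous to TermQuery and to a BooleanQuery whose clauses are all required. Real Lucene queries also compute relevance scores and work against the inverted index rather than testing each document one by one.

```java
import java.util.*;

// Hypothetical query abstraction: NOT Lucene's Query class.
// A document is modelled as a map from field name to the set of terms in it.
interface ToyQuery {
    boolean matches(Map<String, Set<String>> doc);
}

// Analogous to TermQuery: the field must contain the exact term.
record ToyTermQuery(String field, String text) implements ToyQuery {
    public boolean matches(Map<String, Set<String>> doc) {
        return doc.getOrDefault(field, Set.of()).contains(text);
    }
}

// Analogous to a BooleanQuery in which every clause is required (AND).
record ToyAllOfQuery(List<ToyQuery> clauses) implements ToyQuery {
    public boolean matches(Map<String, Set<String>> doc) {
        for (ToyQuery q : clauses) {
            if (!q.matches(doc)) return false;
        }
        return true;
    }
}

class ToyQueryDemo {
    public static void main(String[] args) {
        Map<String, Set<String>> doc = Map.of(
            "title", Set.of("lucene", "action"),
            "contents", Set.of("search", "library"));
        ToyQuery q = new ToyAllOfQuery(List.of(
            new ToyTermQuery("title", "lucene"),
            new ToyTermQuery("contents", "search")));
        System.out.println(q.matches(doc)); // prints true
        System.out.println(new ToyTermQuery("contents", "lucene").matches(doc)); // prints false
    }
}
```

Because composite queries hold other queries as clauses, new query types can be combined freely, which is the same reason Lucene models every query kind as a subclass of one Query base class.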
4. TermQuery
TermQuery is the most basic query type in Lucene. It matches documents that contain a specific term in a specified field.
5. TopDocs
The TopDocs class is a simple container of pointers. The pointers usually refer to the top N ranked search results, that is, the documents that match the query.
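Keeping only the top N results is typically done with a bounded priority queue, and Lucene's result collectors follow the same idea. The sketch below is a hypothetical stand-in: `ToyScoreDoc` and `topN` are not Lucene APIs, and the scores are made-up inputs rather than real relevance scores. It keeps a min-heap of at most N hits, then returns them in descending score order.

```java
import java.util.*;

// Hypothetical hit: a document id plus its score (loosely like Lucene's ScoreDoc).
record ToyScoreDoc(int doc, float score) {}

class ToyTopDocs {
    // Returns the n highest-scoring hits, best first.
    static List<ToyScoreDoc> topN(float[] scores, int n) {
        // Min-heap ordered by score: the weakest kept hit sits at the head.
        PriorityQueue<ToyScoreDoc> heap =
            new PriorityQueue<>(Comparator.comparingDouble(ToyScoreDoc::score));
        for (int doc = 0; doc < scores.length; doc++) {
            if (heap.size() < n) {
                heap.add(new ToyScoreDoc(doc, scores[doc]));
            } else if (n > 0 && scores[doc] > heap.peek().score()) {
                heap.poll(); // evict the current weakest hit
                heap.add(new ToyScoreDoc(doc, scores[doc]));
            }
        }
        List<ToyScoreDoc> result = new ArrayList<>(heap);
        // Best hit first, as search results are normally presented.
        result.sort(Comparator.comparingDouble(ToyScoreDoc::score).reversed());
        return result;
    }

    public static void main(String[] args) {
        float[] scores = {0.2f, 1.5f, 0.7f, 3.0f, 0.9f};
        List<ToyScoreDoc> top = topN(scores, 3);
        System.out.println(top.stream().map(ToyScoreDoc::doc).toList()); // prints [3, 1, 4]
    }
}
```

The heap never holds more than N entries, so memory stays bounded no matter how many documents match; only the final N hits are sorted.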