Now that we can create indexes with Lucene, the following sections cover several features that build on the basics:
1. Index format
An index directory can actually be stored in two formats. In the multifile format, each segment of the index is written as a set of separate files alongside the configuration files (the larger number of open files can slow searching). In the compound format, those files are combined into a single file per segment, which is generally faster.
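Which format IndexWriter produces is controlled by a single switch. A minimal sketch, assuming the 2.x-era Lucene API that the rest of this article's code uses (setUseCompoundFile):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;

IndexWriter writer = new IndexWriter(new RAMDirectory(), new StandardAnalyzer(), true);
// true  -> compound format: one combined file per segment (fewer open file handles)
// false -> multifile format: many separate files per segment
writer.setUseCompoundFile(true);
boolean compound = writer.getUseCompoundFile();
writer.close();
```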
2. Index location
An index can be stored in two places: on the hard disk or in memory. Use FSDirectory for the hard disk and RAMDirectory for memory, but a RAM-based index is gone once the JVM shuts down.
FSDirectory.getDirectory(File file, boolean create)
FSDirectory.getDirectory(String path, boolean create)
These two factory methods return a Directory; for a memory-based index, use new RAMDirectory().
The Directory is then passed to IndexWriter(Directory d, Analyzer a, boolean create).
For example:
IndexWriter indexWriter = new IndexWriter(FSDirectory.getDirectory("C:\\index", true), new StandardAnalyzer(), true);
IndexWriter indexWriter = new IndexWriter(new RAMDirectory(), new StandardAnalyzer(), true);
3. Merging indexes
IndexWriter.addIndexes(Directory[] dirs) merges the indexes found in the given directories into the writer's own index.
Let's look at an example:
public void uniteIndex() throws IOException
{
    IndexWriter writerDisk = new IndexWriter(FSDirectory.getDirectory("C:\\indexDisk", true), new StandardAnalyzer(), true);
    Document docDisk = new Document();
    docDisk.add(new Field("name", "programmer's house", Field.Store.YES, Field.Index.TOKENIZED));
    writerDisk.addDocument(docDisk);

    RAMDirectory ramDir = new RAMDirectory();
    IndexWriter writerRam = new IndexWriter(ramDir, new StandardAnalyzer(), true);
    Document docRam = new Document();
    docRam.add(new Field("name", "Programmer Magazine", Field.Store.YES, Field.Index.TOKENIZED));
    writerRam.addDocument(docRam);

    writerRam.close(); // important: this writer must be closed before its index is merged
    writerDisk.addIndexes(new Directory[] { ramDir });
    writerDisk.close();
}
public void uniteSearch() throws ParseException, IOException
{
    QueryParser queryParser = new QueryParser("name", new StandardAnalyzer());
    Query query = queryParser.parse("programmer");
    IndexSearcher indexSearcher = new IndexSearcher("C:\\indexDisk");
    Hits hits = indexSearcher.search(query);
    System.out.println("found " + hits.length() + " results");
    for (int i = 0; i < hits.length(); i++)
    {
        Document doc = hits.doc(i);
        System.out.println(doc.get("name"));
    }
}
This example merges the in-memory index into the on-disk index.
Note: before merging, you must call close() on the IndexWriter whose index is being merged in.
4. Other index operations
The IndexReader class is used to operate on an existing index; it supports operations such as deleting documents and reading their stored fields.
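A minimal sketch of deleting documents through IndexReader, again assuming the 2.x-era API; the field name "name" and its value are illustrative only:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;

RAMDirectory dir = new RAMDirectory();
IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
Document doc = new Document();
// UN_TOKENIZED so the value is indexed as one exact term we can delete by
doc.add(new Field("name", "obsolete", Field.Store.YES, Field.Index.UN_TOKENIZED));
writer.addDocument(doc);
writer.close();

IndexReader reader = IndexReader.open(dir);
int deleted = reader.deleteDocuments(new Term("name", "obsolete")); // marks matches as deleted
reader.close(); // the deletion is committed when the reader is closed
```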
The next topic is full-text search.
Full-text search mainly involves IndexSearcher, Query (and its many subclasses), Hits, and Document, plus QueryParser in some cases.
Main steps:
1. new QueryParser(field, analyzer)
2. query = queryParser.parse("the string to search for"); at this point you can use the reflection API to see which concrete Query type was produced.
3. new IndexSearcher(indexDirectory).search(query); this returns a Hits object.
4. Use hits.doc(n) to iterate over the matching documents.
5. Use each Document to read the details of its fields.
In fact, steps 1 and 2 exist only to obtain a Query instance; the concrete type depends on the query string and the analyzer. Using the earlier example:
QueryParser queryParser = new QueryParser("name", new StandardAnalyzer());
Query query = queryParser.parse("programmer");
/* org.apache.lucene.search.PhraseQuery */
IndexSearcher indexSearcher = new IndexSearcher("C:\\indexDisk");
Hits hits = indexSearcher.search(query);
Whatever the input, what comes back is always some subclass of Query. We could skip these two steps and construct a Query subclass directly, but going through QueryParser is the most common practice: for multi-word input it can return a PhraseQuery, a very powerful subclass that supports multi-word search, and QueryParser also lets you configure the relationship between keywords.
IndexSearcher:
Internally, an IndexSearcher holds an IndexReader that reads the index. IndexSearcher has a close() method, but it does not so much shut down the searcher as close this built-in IndexReader.
With QueryParser you can set the relationship (AND or OR) between keywords via its operator setter; the parser automatically splits the query string into keywords on whitespace.
Note: when searching through QueryParser, the analyzer must be the same one used when the index was built.
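The operator setter is named setDefaultOperator in the 2.x-era API (older releases called it setOperator); a minimal sketch under that assumption:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;

QueryParser parser = new QueryParser("name", new StandardAnalyzer());
// the default operator is OR: "programmer magazine" would match either word
parser.setDefaultOperator(QueryParser.AND_OPERATOR); // now both words are required
Query query = parser.parse("programmer magazine");
```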
Query:
The Lucene 2.0 Javadoc lists many subclasses:
BooleanQuery, ConstantScoreQuery, ConstantScoreRangeQuery, DisjunctionMaxQuery, FilteredQuery, MatchAllDocsQuery, MultiPhraseQuery, MultiTermQuery, PhraseQuery, PrefixQuery, RangeQuery, SpanQuery
See the documentation for the usage of each.
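As a taste of building one of these subclasses directly instead of going through QueryParser, here is a hedged sketch combining two TermQuery clauses in a BooleanQuery (field name and terms are illustrative):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

BooleanQuery query = new BooleanQuery();
// "programmer" is required; "magazine" only boosts the score when present
query.add(new TermQuery(new Term("name", "programmer")), BooleanClause.Occur.MUST);
query.add(new TermQuery(new Term("name", "magazine")), BooleanClause.Occur.SHOULD);
```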
The following section describes Lucene analyzers:
An Analyzer consists of a tokenizer and filters. For English, the tokenizer splits the text into words on whitespace, and filters remove stop words such as "the", "to", and "of" from both searching and indexing.
StandardAnalyzer is Lucene's standard analyzer; it chains together several of the built-in tokenizers and filters.
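You can watch this tokenizer-plus-filter pipeline at work by pulling tokens from the analyzer directly; a sketch assuming the 2.x-era TokenStream API, where stop words are dropped and text is lowercased:

```java
import java.io.StringReader;
import java.util.ArrayList;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

Analyzer analyzer = new StandardAnalyzer();
TokenStream stream = analyzer.tokenStream("name", new StringReader("The Programmers of the House"));
ArrayList tokens = new ArrayList();
Token token;
while ((token = stream.next()) != null) { // 2.x-era API: next() returns the next Token
    tokens.add(token.termText());         // "the" and "of" are filtered out
}
```

With StandardAnalyzer the input above yields just two tokens, "programmers" and "house".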
Last part: advanced Lucene search
1. Sorting
Lucene provides built-in sorting via IndexSearcher.search(query, sort), but when that is not flexible enough we can define the sort order ourselves.
To do that, implement the two interfaces ScoreDocComparator and SortComparatorSource,
then call IndexSearcher.search(query, new Sort(new SortField(String field, SortComparatorSource source)));
Let's look at an example:
This is an example of creating an index:
public void indexSort() throws IOException
{
    IndexWriter writer = new IndexWriter("C:\\indexStore", new StandardAnalyzer(), true);
    String[] values = { "1", "4", "3", "5", "9", "6", "7" };
    for (int i = 0; i < values.length; i++)
    {
        Document doc = new Document();
        doc.add(new Field("sort", values[i], Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
    }
    writer.close();
}
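The example above only builds the index; the custom comparator itself is missing. Below is a hedged sketch of the two interfaces, assuming the 2.x-era API: the class name is my own, and for brevity it compares the stored field value as a string on every comparison (a real implementation would cache all field values once in newComparator, since re-reading a stored document per comparison is slow):

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.ScoreDocComparator;
import org.apache.lucene.search.SortComparatorSource;
import org.apache.lucene.search.SortField;

class StringFieldComparatorSource implements SortComparatorSource {
    public ScoreDocComparator newComparator(final IndexReader reader, final String field)
            throws IOException {
        return new ScoreDocComparator() {
            // fetch the stored field value of one hit (slow but simple)
            private String value(ScoreDoc d) {
                try {
                    return reader.document(d.doc).get(field);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }
            public int compare(ScoreDoc a, ScoreDoc b) {
                return value(a).compareTo(value(b));
            }
            public Comparable sortValue(ScoreDoc d) {
                return value(d);
            }
            public int sortType() {
                return SortField.STRING;
            }
        };
    }
}
```

It would then be plugged in as:
IndexSearcher.search(query, new Sort(new SortField("sort", new StringFieldComparatorSource())));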