The caching mechanism and implementation scheme of Lucene

Source: Internet
Author: User


Contents

Overview

1. Filter cache

2. Field cache

3. Conclusions

4. Caching solution

 

 

Overview

Lucene's caching can be divided into two categories: the filter cache and the field cache.

The implementation class for the filter cache is CachingWrapperFilter, which caches the query results (bit sets) produced by other filters.

The implementation class for the field cache is FieldCache, which caches the values of the fields used for sorting.

In simple terms, the filter cache is used for query caching, and the field cache is used for sorting.

The lifetime of both caches is tied to an IndexReader instance, so the key to improving Lucene query performance is to maintain and reuse the same IndexReader (and hence the same IndexSearcher).

Filter Cache

Strictly speaking, Lucene does not cache query data the way a database server does. Lucene's filter cache implementation class is CachingWrapperFilter, which caches the BitSet a filter produces. In addition, Lucene provides FilterManager, a singleton object that caches the filters themselves.

The following is the implementation of CachingWrapperFilter:

public class CachingWrapperFilter extends Filter {

  protected Filter filter;

  protected transient Map cache; // the Map used as the cache

  public CachingWrapperFilter(Filter filter) {
    this.filter = filter;
  }

  public BitSet bits(IndexReader reader) throws IOException {
    if (cache == null) {
      cache = new WeakHashMap(); // a WeakHashMap lets the JVM reclaim entries together with their keys
    }

    synchronized (cache) { // check the cache
      BitSet cached = (BitSet) cache.get(reader); // key is the IndexReader, value is the BitSet, so the cache lives exactly as long as the IndexReader
      if (cached != null) {
        return cached;
      }
    }

    // not found in the cache, so compute the bits
    final BitSet bits = filter.bits(reader);

    synchronized (cache) { // update the cache
      cache.put(reader, bits);
    }

    return bits;
  }
}
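The core pattern above can be shown without any Lucene classes. The following is a minimal sketch, using a plain Object as a hypothetical stand-in for IndexReader and a BitSet as the cached result: entries are keyed on the reader object in a WeakHashMap, so they vanish automatically once the reader is garbage-collected.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Sketch of the CachingWrapperFilter idea: compute a BitSet once per reader,
// then reuse it for every later query against the same reader.
class BitCache {
    private final Map<Object, BitSet> cache = new WeakHashMap<>();

    // 'compute' plays the role of the wrapped filter's bits() method
    public synchronized BitSet bits(Object reader, Function<Object, BitSet> compute) {
        BitSet cached = cache.get(reader);
        if (cached != null) {
            return cached;          // cache hit: reuse the bits for this reader
        }
        BitSet bits = compute.apply(reader); // cache miss: compute and store
        cache.put(reader, bits);
        return bits;
    }
}
```

Because the key is the reader itself, opening a new reader for every query defeats the cache entirely, which is why the article insists on sharing one IndexReader.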

In FilterManager, filter.hashCode() is used as the cache key, so custom filter classes should override hashCode() (and equals()).

Example: Filter filter = FilterManager.getInstance().getFilter(new CachingWrapperFilter(new MyFilter())); — if an equal filter already exists in FilterManager, its cached instance (with bit caching) is returned; otherwise the argument itself is returned (without bit caching yet).

FilterManager also runs a timed thread that periodically cleans up the cache to guard against memory overflow errors.
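A minimal sketch of this hashCode-keyed caching, with Strings standing in for filters (the class name SimpleFilterManager and the details are assumptions, not FilterManager's real code; the real class also evicts old entries on a timer):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Filters are cached by hashCode(), so two equal filters (same class,
// same parameters) resolve to the same cached instance.
class SimpleFilterManager {
    private static final SimpleFilterManager INSTANCE = new SimpleFilterManager();
    private final Map<Integer, Object> filters = new ConcurrentHashMap<>();

    public static SimpleFilterManager getInstance() { return INSTANCE; }

    // Returns the cached filter equal to the argument, or caches and returns the argument itself.
    public Object getFilter(Object filter) {
        Object existing = filters.putIfAbsent(filter.hashCode(), filter);
        return existing != null ? existing : filter;
    }
}
```

This is why overriding hashCode() matters: without it, every new filter object gets a distinct identity hash and the cache never produces a hit.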

Field Cache

The field cache is used for sorting. Lucene reads the fields to be sorted into memory, so memory consumption depends on the field sizes and the number of documents. Many of the memory-overflow problems people hit when sorting with Lucene arise because a new searcher instance is created for each query; under high concurrency, multiple searcher instances each load their own copy of the sort fields, exhausting memory.

The implementation class for the field cache is FieldCacheImpl. Let's look at how the field cache is used for sorting.

In the IndexSearcher class, sorted queries are routed to this method:

public TopFieldDocs search(Weight weight, Filter filter, final int nDocs, Sort sort) throws IOException {

  TopFieldDocCollector collector =
      new TopFieldDocCollector(reader, sort, nDocs); // the sort is implemented by TopFieldDocCollector

  search(weight, filter, collector); // run the query; each hit is passed back through collector.collect(), which performs the sorting

  return (TopFieldDocs) collector.topDocs(); // TopFieldDocs differs from TopDocs in that it carries the sort information (field names and values); its ScoreDoc[] entries are actually FieldDoc[]
}
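The callback flow in search() can be sketched in plain Java. In the sketch below, CollectorDemo and its fixed hits are assumptions for illustration: a "search" loop pushes every hit into collect(doc, score), and the collector alone decides what to keep.

```java
import java.util.ArrayList;
import java.util.List;

// Toy version of the search/collect callback pattern: the search loop
// produces (doc, score) pairs; the collector filters and retains them.
class CollectorDemo {
    interface Collector { void collect(int doc, float score); }

    // stands in for the real query loop; hits here are hard-coded
    static void search(Collector collector) {
        collector.collect(0, 1.5f);
        collector.collect(3, 0.2f);
        collector.collect(7, 2.0f);
    }

    // keep only docs scoring above 1.0, in collection order
    public static List<Integer> topDocs() {
        List<Integer> hits = new ArrayList<>();
        search((doc, score) -> { if (score > 1.0f) hits.add(doc); });
        return hits;
    }
}
```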

Here is how TopFieldDocCollector.collect() is implemented:

public void collect(int doc, float score) {

  if (score > 0.0f) {

    totalHits++;

    if (reusableFD == null)
      reusableFD = new FieldDoc(doc, score);
    else {
      reusableFD.score = score;
      reusableFD.doc = doc;
    }

    reusableFD = (FieldDoc) hq.insertWithOverflow(reusableFD); // hq is a FieldSortedHitQueue, a subclass of PriorityQueue; insertWithOverflow() maintains a fixed-size sorted queue, squeezing out the element that falls out of the top N
  }
}
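A minimal sketch of the insertWithOverflow() behavior, using a JDK PriorityQueue of Integers (the class BoundedQueue is an illustration, not Lucene's PriorityQueue): the queue keeps the best N elements, and the displaced element is handed back so the caller can reuse it, exactly as collect() reuses reusableFD.

```java
import java.util.PriorityQueue;

// Fixed-capacity top-N queue: the smallest kept element sits at the head.
class BoundedQueue {
    private final PriorityQueue<Integer> heap;
    private final int capacity;

    public BoundedQueue(int capacity) {
        this.capacity = capacity;
        this.heap = new PriorityQueue<>(capacity);
    }

    // Returns the element squeezed out of the queue (possibly the argument itself),
    // or null while the queue is still filling up.
    public Integer insertWithOverflow(Integer element) {
        if (heap.size() < capacity) {
            heap.add(element);
            return null;
        }
        if (element <= heap.peek()) {
            return element; // no better than the current worst: rejected
        }
        Integer evicted = heap.poll(); // drop the current worst
        heap.add(element);
        return evicted;
    }

    public PriorityQueue<Integer> heap() { return heap; }
}
```

Because the displaced object is returned rather than discarded, the collector can mutate and re-insert a single FieldDoc instead of allocating one per hit.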

FieldSortedHitQueue does the actual ordering by overriding the lessThan() method:

protected boolean lessThan(final Object a, final Object b) {

  final ScoreDoc docA = (ScoreDoc) a;
  final ScoreDoc docB = (ScoreDoc) b;

  // run the comparators
  final int n = comparators.length;
  int c = 0;

  for (int i = 0; i < n && c == 0; ++i) {
    c = (fields[i].reverse) ? comparators[i].compare(docB, docA)
                            : comparators[i].compare(docA, docB); // the sort is driven by comparators[]; what remains is to see how these comparators are constructed and how they use the FieldCache
  }

  // avoid random sort order that could lead to duplicates (bug #31241)
  if (c == 0)
    return docA.doc > docB.doc;

  return c > 0;
}
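The comparator chain with per-field reverse flags and the doc-id tiebreak can be sketched as follows. Doc, with two hard-coded int sort fields, is a hypothetical stand-in for ScoreDoc plus its comparators:

```java
// Sketch of lessThan(): compare field by field, honoring reverse flags,
// and break a full tie deterministically on the document id.
class SortChain {
    static class Doc {
        final int doc, field0, field1;
        Doc(int doc, int field0, int field1) {
            this.doc = doc; this.field0 = field0; this.field1 = field1;
        }
    }

    // negative: a sorts before b; positive: after; fields checked in order
    public static int compare(Doc a, Doc b, boolean reverse0, boolean reverse1) {
        int c = reverse0 ? Integer.compare(b.field0, a.field0)
                         : Integer.compare(a.field0, b.field0);
        if (c == 0) {
            c = reverse1 ? Integer.compare(b.field1, a.field1)
                         : Integer.compare(a.field1, b.field1);
        }
        if (c == 0) {
            // tie on every sort field: fall back to doc id (the bug #31241 fix)
            return Integer.compare(a.doc, b.doc);
        }
        return c;
    }
}
```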

comparators is created in the FieldSortedHitQueue constructor:

public FieldSortedHitQueue(IndexReader reader, SortField[] fields, int size) throws IOException {

  final int n = fields.length;

  comparators = new ScoreDocComparator[n];

  this.fields = new SortField[n];

  for (int i = 0; i < n; ++i) {

    String fieldname = fields[i].getField();

    comparators[i] = getCachedComparator(reader, fieldname, fields[i].getType(),
        fields[i].getLocale(), fields[i].getFactory()); // getCachedComparator() returns a cached comparator; each comparator is a ScoreDocComparator instance

    if (comparators[i].sortType() == SortField.STRING) {
      this.fields[i] = new SortField(fieldname, fields[i].getLocale(), fields[i].getReverse());
    } else {
      this.fields[i] = new SortField(fieldname, comparators[i].sortType(), fields[i].getReverse());
    }
  }

  initialize(size);
}

Here is the implementation of getCachedComparator():

static final FieldCacheImpl.Cache comparators = new FieldCacheImpl.Cache() {
  ...
};

static ScoreDocComparator getCachedComparator(IndexReader reader, String field, int type,
    Locale locale, SortComparatorSource factory) throws IOException {

  // the following two sort types do not need to read any field
  if (type == SortField.DOC) return ScoreDocComparator.INDEXORDER; // sorted by index order
  if (type == SortField.SCORE) return ScoreDocComparator.RELEVANCE; // sorted by relevance

  FieldCacheImpl.Entry entry = (factory != null)
      ? new FieldCacheImpl.Entry(field, factory)
      : new FieldCacheImpl.Entry(field, type, locale);

  // other sort types require reading the field into the cache
  return (ScoreDocComparator) comparators.get(reader, entry); // comparators is a FieldCache-style cache instance
}

The comparators.get() method returns different ScoreDocComparator implementations depending on the sort-field type. Let's look at the String implementation to see exactly where the FieldCache is consulted:

static ScoreDocComparator comparatorString(final IndexReader reader, final String fieldname)
    throws IOException {

  final String field = fieldname.intern();

  // The following call reads the cache to get the mapping between field values and document ids,
  // reading the index files only if the cache entry does not yet exist. The cache's life cycle is
  // that of the IndexReader, so having different queries share the same searcher guarantees a
  // single sort cache and avoids memory overflow.
  final FieldCache.StringIndex index = FieldCache.DEFAULT.getStringIndex(reader, field);

  return new ScoreDocComparator() {

    public final int compare(final ScoreDoc i, final ScoreDoc j) {
      final int fi = index.order[i.doc]; // index.order[] holds each document's rank by field value, indexed by Lucene doc id; see getStringIndex() for how these values are read
      final int fj = index.order[j.doc];
      if (fi < fj) return -1;
      if (fi > fj) return 1;
      return 0;
    }

    public Comparable sortValue(final ScoreDoc i) {
      return index.lookup[index.order[i.doc]];
    }

    public int sortType() {
      return SortField.STRING;
    }
  };
}
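The StringIndex structure that getStringIndex() produces can be sketched directly (the class below is an illustration built from an in-memory array of per-document values, not Lucene's index-reading code): lookup[] holds the distinct field values in sorted order, and order[doc] is the position of doc's value within lookup[]. Comparing two documents then costs two array reads and an int comparison, never a string comparison.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Sketch of FieldCache.StringIndex: precompute ordinals so that sorting
// compares ints instead of strings.
class StringIndexSketch {
    public final String[] lookup; // ordinal -> field value, sorted
    public final int[] order;     // doc id -> ordinal of that doc's value

    public StringIndexSketch(String[] valueByDoc) {
        lookup = new TreeSet<>(Arrays.asList(valueByDoc)).toArray(new String[0]);
        order = new int[valueByDoc.length];
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            order[doc] = Arrays.binarySearch(lookup, valueByDoc[doc]);
        }
    }

    // the comparator body from comparatorString(), expressed on ordinals
    public int compare(int docI, int docJ) {
        return Integer.compare(order[docI], order[docJ]);
    }
}
```

Building these arrays is the expensive step that must not be repeated per query, which is the whole point of caching the StringIndex per reader.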

Conclusions

These two caching mechanisms let Lucene handle most caching needs. Solr, which is built on top of Lucene, adds further caches of its own, but arguably they add little value while making the code considerably more complex.

Caching Solutions

The lifetime of the Lucene caches is tied to an IndexReader instance, so the key to improving Lucene query performance is to maintain and reuse the same IndexReader (i.e. IndexSearcher).

So we write a new class, SingleIndexSearcher (source code below), which extends IndexSearcher and implements the singleton pattern for it.

LuceneBase adds the SingleIndexSearcher class and uses its getInstance() method to obtain the IndexSearcher object.

Cached filter usage:

Filter filter = new CachingWrapperFilter(new FieldFilter(field, value));

or

Filter filter = FilterManager.getInstance().getFilter(new CachingWrapperFilter(new FieldFilter(field, value)));

/**
 * Implements the singleton pattern for IndexSearcher, to take advantage of the Lucene caches
 * while preventing multiple IndexSearcher objects from causing memory overflow and concurrency
 * problems.
 *
 * @author Lu Weijie
 * @version 1.0, 2010-8-4
 * @see IndexSearcher
 */
public class SingleIndexSearcher extends IndexSearcher {

  /** the private static SingleIndexSearcher instance */
  private static IndexSearcher instance;

  static {
    try {
      instance = new SingleIndexSearcher(Configure.getProperties().getProperty("Zkanalyzerpath"));
      System.out.println("construction");
    } catch (CorruptIndexException e) {
      e.printStackTrace();
    } catch (IOException e) {
      e.printStackTrace();
    }
  }

  private SingleIndexSearcher(String path) throws CorruptIndexException, IOException {
    super(path);
  }

  /** returns the single shared IndexSearcher instance */
  public static IndexSearcher getInstance() {
    return instance;
  }
}
