Solr Cache Usage: Introduction and Analysis


This article describes how caches are used in Solr queries and how they are implemented. The core class for Solr queries is SolrIndexSearcher. At any given time, each core normally has a single current SolrIndexSearcher serving the upper-layer handlers (while searchers are being switched, two may serve requests at the same time). Solr's caches are attached to a SolrIndexSearcher: they live as long as that searcher does, and they are emptied when the searcher is closed. The caches Solr uses include filterCache, queryResultCache, and documentCache. All of them are implementations of SolrCache and member variables of SolrIndexSearcher. Each has its own logic and purpose, and they are introduced and analyzed one by one below.

1. SolrCache Interface Implementations

Solr provides two implementations of the SolrCache interface: solr.search.LRUCache and solr.search.FastLRUCache. FastLRUCache was introduced in version 1.4 and is generally faster than LRUCache.

The main methods of the SolrCache interface are annotated below:

    public interface SolrCache {
      /**
       * When Solr parses the configuration file to build a SolrConfig instance,
       * it initializes the various CacheConfig objects. When a SolrIndexSearcher
       * is constructed, each SolrCache is created from the SolrConfig instance
       * (newInstance), and this init method is invoked.
       *
       * args: the parameter map specific to the implementation
       *   (LRUCache or FastLRUCache).
       * persistence: a global object. LRUCache and FastLRUCache use it to keep
       *   cache-access statistics (the cache is bound to a single
       *   SolrIndexSearcher, so cumulative statistics need a globally
       *   injected object).
       * regenerator: defines how autowarming reloads the cache. The
       *   CacheRegenerator interface has a single method, which is called
       *   back from SolrCache.warm:
       *     boolean regenerateItem(SolrIndexSearcher newSearcher,
       *         SolrCache newCache, SolrCache oldCache,
       *         Object oldKey, Object oldVal)
       */
      public Object init(Map args, Object persistence, CacheRegenerator regenerator);

      /** :TODO: copy from Map */
      public int size();

      /** :TODO: copy from Map */
      public Object put(Object key, Object value);

      /** :TODO: copy from Map */
      public Object get(Object key);

      /** :TODO: copy from Map */
      public void clear();

      /**
       * Autowarms a newly created SolrIndexSearcher. The implementation iterates
       * over an appropriate range of the existing (old) cache (usually not every
       * old item is reloaded) and, for each item, calls the regenerator's
       * regenerateItem method to load the corresponding entry into the new
       * searcher's cache.
       */
      void warm(SolrIndexSearcher searcher, SolrCache old) throws IOException;

      /** Frees any non-memory resources */
      public void close();
    }

1.1 solr.search.LRUCache
1.1, Solr.search.LRUCache

LRUCache has the following configurable parameters:

1) size: the maximum number of items the cache can hold. The default is 1024.

2) initialSize: the initial size of the cache. The default is 1024.

3) autowarmCount: when searchers are switched, the newly created SolrIndexSearcher can be autowarmed (preheated). autowarmCount is the number of items taken from the old SolrIndexSearcher's cache and regenerated in the new one. How they are regenerated is up to the CacheRegenerator implementation. In the current version (1.4), autowarmCount can only be an absolute number of items. The future 4.0 version allows it to be given as a percentage of the number of cache entries, to better balance the cost and benefit of autowarming. If this parameter is not specified, no autowarming is done.
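LRUCache keeps its entries in a LinkedHashMap in access order. The following Java sketch uses a hypothetical SimpleLruCache class that is not Solr's own code. It bounds the cache at size with removeEldestEntry, guards every operation with one lock, and exposes the most recently accessed keys. Those keys are the candidates an autowarm of autowarmCount items would regenerate.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical LRUCache-style sketch; not Solr's actual class. */
class SimpleLruCache<K, V> {
  private final int limit;                 // plays the role of the "size" parameter
  private final LinkedHashMap<K, V> map;

  SimpleLruCache(int limit, int initialSize) {
    this.limit = limit;
    // accessOrder = true: iteration runs from least to most recently accessed.
    // initialSize is only the map's initial capacity, not the limit.
    this.map = new LinkedHashMap<K, V>(initialSize, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently accessed entry once the limit is exceeded.
        return size() > SimpleLruCache.this.limit;
      }
    };
  }

  // One global lock guards every read and write.
  synchronized V get(K key) { return map.get(key); }

  synchronized V put(K key, V value) { return map.put(key, value); }

  synchronized int size() { return map.size(); }

  /** Keys of the n most recently accessed entries, most recent last. */
  synchronized List<K> mostRecentKeys(int n) {
    List<K> all = new ArrayList<>(map.keySet());
    return new ArrayList<>(all.subList(Math.max(0, all.size() - n), all.size()));
  }
}
```

Because get() reorders the map, even reads need the same lock as writes. That is the source of the weaker concurrency.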

In terms of implementation, LRUCache caches data directly in a LinkedHashMap. initialSize is the map's initial capacity, and size is the limit, enforced by evicting entries through LinkedHashMap's built-in LRU mechanism. Every read and write takes a global lock on the map, so LRUCache performs somewhat worse under concurrent load.

1.2 solr.search.FastLRUCache

In addition to the LRUCache parameters, FastLRUCache can optionally take the following parameters:

1) minSize: when the cache reaches its maximum number of items, eviction reduces it to minSize. The default is 0.9 * size.

2) acceptableSize: eviction aims to reduce the cache to minSize but may not manage to. In that case it settles for reducing it to acceptableSize. The default is 0.95 * size.

3) cleanupThread: LRUCache evicts synchronously inside the put operation. FastLRUCache can instead do eviction in a separate thread, which you enable by configuring cleanupThread. When the cache is large, each eviction pass may take a long time. That is inappropriate for a thread serving query requests, so the work is better done by a dedicated background thread.
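The eviction behavior above can be approximated as follows. This is a hypothetical sketch; SketchConcurrentLruCache is not Solr's ConcurrentLRUCache.

- Entries live in a ConcurrentHashMap with a logical access timestamp, and reads take no global lock.
- When a put pushes the cache past size, a sweep removes the least recently used entries down to minSize.
- The sweep runs in the calling thread, or in a background thread when cleanupThread is set.

For simplicity the sketch always reaches minSize with a full sort, so acceptableSize is not modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical FastLRUCache-style sketch; not Solr's ConcurrentLRUCache. */
class SketchConcurrentLruCache<K, V> {
  private static final class Node<T> {
    final T value;
    volatile long lastAccess;              // logical timestamp of the latest access
    Node(T value, long stamp) { this.value = value; this.lastAccess = stamp; }
  }

  private final ConcurrentHashMap<K, Node<V>> map = new ConcurrentHashMap<>();
  private final AtomicLong clock = new AtomicLong();
  private final AtomicBoolean sweeping = new AtomicBoolean(false);
  private final int maxSize;               // the "size" parameter
  private final int minSize;               // eviction target
  private final ExecutorService cleaner;   // null means evict in the caller's thread

  SketchConcurrentLruCache(int maxSize, boolean cleanupThread) {
    this.maxSize = maxSize;
    this.minSize = (int) (maxSize * 0.9);  // default described above: 0.9 * size
    this.cleaner = cleanupThread
        ? Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);             // do not keep the JVM alive
            return t;
          })
        : null;
  }

  V get(K key) {
    Node<V> n = map.get(key);
    if (n == null) return null;
    n.lastAccess = clock.incrementAndGet(); // lock-free read path
    return n.value;
  }

  V put(K key, V value) {
    Node<V> old = map.put(key, new Node<>(value, clock.incrementAndGet()));
    // Only one sweep at a time; other writers simply continue.
    if (map.size() > maxSize && sweeping.compareAndSet(false, true)) {
      if (cleaner != null) cleaner.execute(this::sweep); else sweep();
    }
    return old == null ? null : old.value;
  }

  int size() { return map.size(); }

  /** Removes the least recently accessed entries until about minSize remain. */
  private void sweep() {
    try {
      List<Map.Entry<K, Node<V>>> entries = new ArrayList<>(map.entrySet());
      entries.sort(Comparator.comparingLong(e -> e.getValue().lastAccess));
      int toRemove = entries.size() - minSize;
      for (int i = 0; i < toRemove; i++) {
        Map.Entry<K, Node<V>> victim = entries.get(i);
        // Remove only if the key still maps to the same node (not re-put meanwhile).
        map.remove(victim.getKey(), victim.getValue());
      }
    } finally {
      sweeping.set(false);
    }
  }
}
```

Sorting every entry on each sweep costs O(n log n) in the cache size. This shows why an eviction pass over a large cache can be slow enough to move off the query thread.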

In terms of implementation, FastLRUCache stores its data in a ConcurrentLRUCache, which is essentially a ConcurrentHashMap with an LRU eviction policy. Its concurrency is therefore much better, and this is a typical way to implement a cache in Java.

2. filterCache

The filterCache stores unordered sets of Lucene document IDs and has three uses:

1) The filterCache stores the sets of document IDs matched by filter queries (the "fq" parameter). Solr has two kinds of query parameters, q and fq. If fq is present, Solr evaluates the fq first, then intersects the fq result with the q result. There can be several fq parameters, in which case their results are intersected. In this process, the filterCache uses a single fq (of type Query) as the key and the set of matching document IDs (of type DocSet) as the value. The filterCache is especially valuable when the fq is a range query.

2) The filterCache can also be used for facet queries (http://wiki.apache.org/solr/SolrFacetingOverview). Facet counts are obtained by processing the set of document IDs that match the query, which may involve the filterCache. Computing facet counts can touch every doc ID in the index, so the filterCache needs to be large enough to accommodate the number of indexed documents.

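3) If <useFilterForSortedQuery/> is set in solrconfig.xml, the filterCache is also used when a query comes with a filter. Here the filter is a DocSet to filter against, not an fq. When the requested sort does not include score, Solr checks the filterCache for a DocSet matching the query. If one is found, it is used as the source of document IDs and the sort is then applied to it.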
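The fq flow in use 1) can be sketched as follows. FilterCacheSketch is hypothetical, and real Solr code paths differ; this only illustrates the caching pattern.

- A query is just a string mapped to a predicate over doc IDs.
- A DocSet is a java.util.BitSet.
- Each fq's DocSet is cached under its own key, and the fq sets are intersected.
- The result is then intersected with the documents matching q. In this sketch q itself is not put in the filterCache.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

/** Hypothetical sketch of fq handling with a filter cache; not Solr's code. */
class FilterCacheSketch {
  private final int maxDoc;
  private final Map<String, IntPredicate> matchers;           // toy index: query -> matcher
  private final Map<String, BitSet> filterCache = new HashMap<>();
  int computed = 0;                                           // cache misses so far

  FilterCacheSketch(int maxDoc, Map<String, IntPredicate> matchers) {
    this.maxDoc = maxDoc;
    this.matchers = matchers;
  }

  /** DocSet for a single filter query; key = one fq, value = matching doc IDs. */
  BitSet getDocSet(String fq) {
    BitSet cached = filterCache.get(fq);
    if (cached != null) return cached;
    computed++;
    IntPredicate m = matchers.get(fq);
    BitSet bits = new BitSet(maxDoc);
    for (int doc = 0; doc < maxDoc; doc++) if (m.test(doc)) bits.set(doc);
    filterCache.put(fq, bits);
    return bits;
  }

  /** Intersects all fq DocSets, then keeps only documents that also match q. */
  BitSet search(String q, List<String> fqs) {
    BitSet allowed = new BitSet(maxDoc);
    allowed.set(0, maxDoc);                                   // start from all documents
    for (String fq : fqs) allowed.and(getDocSet(fq));         // mutates only 'allowed'
    IntPredicate main = matchers.get(q);
    BitSet hits = new BitSet(maxDoc);
    for (int doc = allowed.nextSetBit(0); doc >= 0; doc = allowed.nextSetBit(doc + 1)) {
      if (main.test(doc)) hits.set(doc);
    }
    return hits;
  }
}
```

Repeating the same fq values costs only BitSet intersections, because each fq's DocSet is computed once and then reused.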

Here is a sample filterCache configuration:

    <!-- Internal cache used by SolrIndexSearcher for filters (DocSets),
         unordered sets of *all* documents that match a query.
         When a new searcher is opened, its caches may be prepopulated
         or "autowarmed" using data from caches in the old searcher.
         autowarmCount is the number of items to prepopulate. For LRUCache,
         the prepopulated items will be the most recently accessed items.
      -->
    <filterCache class="solr.LRUCache"
      size="16384"
      initialSize="4096"
      autowarmCount="4096"/>
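To get a feel for what a size like 16384 can mean in memory, here is a rough worst-case estimate. It assumes every cached DocSet is stored as a bitset with one bit per document in the index. Solr can store small sets more compactly, so real usage is usually lower. The index size of 10 million documents is an assumed example, and FilterCacheMemory is a hypothetical helper.

```java
/** Hypothetical worst-case estimate of filterCache memory. */
class FilterCacheMemory {
  /** Bytes needed if every cached DocSet is a bitset with one bit per document. */
  static long worstCaseBytes(long cacheEntries, long maxDoc) {
    long bytesPerDocSet = (maxDoc + 7) / 8;   // one bit per doc, rounded up to whole bytes
    return cacheEntries * bytesPerDocSet;
  }

  public static void main(String[] args) {
    // size="16384" from the sample configuration; 10 million docs is an assumed index size.
    long bytes = worstCaseBytes(16384, 10_000_000L);
    System.out.println(bytes / (1024 * 1024) + " MiB");
  }
}
```

Under these assumptions the program prints 19531 MiB (about 19 GiB). A large filterCache should therefore be sized against the actual index and workload.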

How to use the filterCache and how large to make it must be decided from the application's characteristics, statistics, observed effects, and experience. For applications that rely on fq and faceting, tuning the filterCache is essential.

3. queryResultCache

As the name implies, the queryResultCache caches query results. For a given query it holds the fully ordered list of matching document IDs; the SolrIndexSearcher caches the document IDs, not the documents themselves. Here is a sample configuration:

    <!-- queryResultCache caches results of searches: ordered lists of
         document IDs (DocList) based on a query, a sort, and the range of
         documents requested.
      -->
    <queryResultCache class="solr.LRUCache"
      size="16384"
      initialSize="4096"
      autowarmCount="1024"/>

What does the cache key look like? It is the following class, QueryResultKey; the key's hashCode is its member variable hc:

    public QueryResultKey(Query query, List<Query> filters, Sort sort, int nc_flags) {
      this.query = query;
      this.sort = sort;
      this.filters = filters;
      this.nc_flags = nc_flags;

      int h = query.hashCode();

      if (filters != null) h ^= filters.hashCode();

      sfields = (this.sort != null) ? this.sort.getSort() : defaultSort;
      for (SortField sf : sfields) {
        // mix the bits so that sortFields are position dependent,
        // so that a,b won't hash to the same value as b,a
        h ^= (h << 8) | (h >>> 25);   // reversible hash

        if (sf.getField() != null) h += sf.getField().hashCode();
        h += sf.getType();
        if (sf.getReverse()) h = ~h;
        if (sf.getLocale() != null) h += sf.getLocale().hashCode();
        if (sf.getFactory() != null) h += sf.getFactory().hashCode();
      }

      hc = h;
    }
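The mixing step `h ^= (h << 8) | (h >>> 25)` makes the key depend on the order of the sort fields. Here is a minimal demonstration with a hypothetical SortHashDemo class that hashes only field names; the other SortField attributes are omitted. With plain addition, sorting by a,b and by b,a hash to the same value. With the mixing step they do not.

```java
import java.util.List;

/** Hypothetical demo of the order-dependent mixing step used above. */
class SortHashDemo {
  /** Order-dependent: applies the same mixing step as the loop in QueryResultKey. */
  static int mixedHash(int seed, List<String> sortFields) {
    int h = seed;
    for (String field : sortFields) {
      h ^= (h << 8) | (h >>> 25);
      h += field.hashCode();
    }
    return h;
  }

  /** Order-independent for comparison: addition is commutative. */
  static int naiveHash(int seed, List<String> sortFields) {
    int h = seed;
    for (String field : sortFields) h += field.hashCode();
    return h;
  }
}
```

For example, `mixedHash(0, List.of("a", "b"))` and `mixedHash(0, List.of("b", "a"))` give 25027 and 25283, while `naiveHash` returns 195 for both orders.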
Because the query parameters include start and rows, a QueryResultKey may hit the cache even though the requested start/rows range lies outside the cached list of document IDs. The larger the cached list, the more likely such hits are to be usable, but the more memory is wasted. The parameter queryResultWindowSize therefore specifies the size of the cached document ID list. Its default in Solr is 50 and it is configurable. The explanation on the wiki is clear and to the point:
    <!-- An optimization for use with the queryResultCache. When a search
         is requested, a superset of the requested number of document IDs is
         collected. For example, if a search for a particular query requests
         matching documents 10 through 19, and queryWindowSize is 50,
         then documents 0 through 49 will be collected and cached. Any further
         requests in that range can be satisfied via the cache.
      -->
    <queryResultWindowSize>50</queryResultWindowSize>
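The window logic can be sketched as follows. The sketch assumes the number of collected documents is rounded up to the next multiple of queryResultWindowSize. ResultWindow.docsToCollect is a hypothetical name, not a Solr API.

```java
/** Hypothetical sketch of queryResultWindowSize rounding; not Solr's code. */
class ResultWindow {
  /** Number of top-ranked documents to collect and cache for a start/rows request. */
  static int docsToCollect(int start, int rows, int windowSize) {
    int maxDocRequested = start + rows;
    if (maxDocRequested <= windowSize) return windowSize;   // fits in the first window
    // Otherwise round up to the next multiple of windowSize.
    return ((maxDocRequested - 1) / windowSize + 1) * windowSize;
  }
}
```

For example, `docsToCollect(10, 10, 50)` returns 50, so documents 0 through 49 are collected and the next page (start=20) is served from the cache. `docsToCollect(40, 20, 50)` returns 100.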

The queryResultCache uses less memory than the filterCache, but its benefit is hard to predict. On the indexing side, we usually store only the application's primary key ID in the index and fetch the other required fields from a data source such as a database. The query process then becomes three steps:

- get the set of document IDs from Solr;
- use Solr to map them to application IDs;
- complete the query results from the external data source.

If freshness requirements on query results are not strict, the full query results can be cached outside Solr with timed invalidation, and the queryResultCache is then not really necessary. Otherwise, consider using it. And if query overlap turns out to be very low during the queryResultCache's lifetime, there is no need to enable it at all.

4. documentCache

As the name implies, the documentCache stores <doc_id, Document> pairs. If you use it, make it as large as possible, and at least larger than <max_results> * <max_concurrent_queries>. Otherwise cache eviction may force a document to be retrieved again within a single request. Also keep an eye on the number of stored fields per document to avoid heavy memory consumption.
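The sizing rule above is simple arithmetic. The hypothetical DocumentCacheSizing helper below applies it to assumed numbers: 20 rows per page and 100 concurrent queries give a lower bound of 2000 entries. The sample size of 16384 comfortably exceeds that.

```java
/** Hypothetical helper applying the documentCache sizing rule of thumb. */
class DocumentCacheSizing {
  /** Lower bound suggested above: max_results * max_concurrent_queries. */
  static long minimumSize(int maxResults, int maxConcurrentQueries) {
    return (long) maxResults * maxConcurrentQueries;   // long avoids int overflow
  }

  /** True if the configured size is at least that lower bound. */
  static boolean isLargeEnough(long configuredSize, int maxResults, int maxConcurrentQueries) {
    return configuredSize >= minimumSize(maxResults, maxConcurrentQueries);
  }
}
```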

5. User/Generic Caches

Solr supports custom caches. If you want them autowarmed, you only need to implement a custom regenerator. Here is a sample configuration:

    <!-- Example of a generic cache. These caches may be accessed by name
         through SolrIndexSearcher.getCache(), cacheLookup(), and cacheInsert().
         The purpose is to enable easy caching of user/application level data.
         The regenerator argument should be specified as an implementation
         of solr.search.CacheRegenerator if autowarming is desired.
      -->
    <!--
    <cache name="yourCacheNameHere"
      class="solr.LRUCache"
      size="4096"
      initialSize="2048"
      autowarmCount="4096"
      regenerator="org.foo.bar.YourRegenerator"/>
      -->
    


6. The Lucene FieldCache

Lucene has a lower-level FieldCache that Solr does not manage, so Lucene's FieldCache is still handled by Lucene's IndexSearcher.

7. Autowarm

As mentioned in the autowarm discussion above, autowarming is triggered at two points: when the first searcher is created (firstSearcher), and when a new searcher (newSearcher) is created to replace the current one. Before a searcher starts serving requests, each of its caches can be warmed, usually through the SolrCache warm method. Different caches use different warming strategies.

1) filterCache: the filterCache registers the following CacheRegenerator. It uses each old key (a Query) to query the new index and puts the resulting value into the new cache:

    solrConfig.filterCacheConfig.setRegenerator(
        new CacheRegenerator() {
          public boolean regenerateItem(SolrIndexSearcher newSearcher,
                                        SolrCache newCache, SolrCache oldCache,
                                        Object oldKey, Object oldVal)
              throws IOException {
            // Re-run the old filter query on the new searcher; cacheDocSet()
            // computes its DocSet on the new index and caches it in the new filterCache.
            newSearcher.cacheDocSet((Query) oldKey, null, false);
            return true;
          }
        }
    );
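Document IDs in a new index can differ from those in the old one, so the regenerator cannot simply copy old values; it re-runs each old key against the new searcher. Here is a hypothetical sketch of that pattern; FilterWarmSketch is not Solr's API.

- The toy index is a String array whose position is the doc ID.
- A query matches the documents whose value is equal to it.
- Warming recomputes the DocSet for the last autowarmCount keys of the old cache, with insertion order standing in for recency.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of regenerating filter-cache entries for a new index. */
class FilterWarmSketch {
  /** Toy "query": doc IDs whose value equals term. */
  static BitSet docSet(String[] index, String term) {
    BitSet bits = new BitSet(index.length);
    for (int doc = 0; doc < index.length; doc++) if (index[doc].equals(term)) bits.set(doc);
    return bits;
  }

  /** Builds a new cache from the last autowarmCount keys of oldCache. */
  static Map<String, BitSet> warm(LinkedHashMap<String, BitSet> oldCache,
                                  int autowarmCount, String[] newIndex) {
    Map<String, BitSet> newCache = new LinkedHashMap<>();
    List<String> keys = new ArrayList<>(oldCache.keySet());
    for (String key : keys.subList(Math.max(0, keys.size() - autowarmCount), keys.size())) {
      // Like the filterCache regenerator: ignore the old value, re-run the query.
      newCache.put(key, docSet(newIndex, key));
    }
    return newCache;
  }
}
```

Copying the old value for "a" ({0, 2}) into the new cache would be wrong once the index has changed. Recomputing it on a new index of {"b", "a", "a", "c"} yields {1, 2}.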


2) queryResultCache: the queryResultCache's autowarming described here is not done in SolrCache's warm method, which would iterate over the existing queryResultCache keys and re-run those queries. Instead it relies on the void newSearcher(SolrIndexSearcher newSearcher, SolrIndexSearcher currentSearcher) method of the SolrEventListener interface. That method runs specific queries listed in the configuration, which has the effect of explicitly warming the Lucene FieldCache.

A sample warming configuration for the queryResultCache looks like this:

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <!-- seed common sort fields -->
        <lst>
          <str name="q">anything</str>
          <str name="sort">name desc, price desc, popularity desc</str>
        </lst>
      </arr>
    </listener>
    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <!-- seed common sort fields -->
        <lst>
          <str name="q">anything</str>
          <str name="sort">name desc, price desc, popularity desc</str>
        </lst>
        <!-- seed common facets and filter queries -->
        <lst>
          <str name="q">anything</str>
          <str name="facet.field">category</str>
          <str name="fq">inStock:true</str>
          <str name="fq">price:[0 TO 100]</str>
        </lst>
      </arr>
    </listener>

3) documentCache: in the new index the mapping between document IDs and indexed documents changes, so the documentCache has no warming step. It simply starts out completely empty.

Autowarming is very useful, but watch its cost. In practice you need to weigh how expensive warming is, and also how often searchers are switched, so that warming and switching do not get in the way of the searcher's normal query service.


This article was reposted from: http://www.cnblogs.com/phinecos/archive/2012/05/24/2517018.html
