The implementation mechanism of the HBase block cache

Source: Internet
Author: User

This article analyzes the block cache implementation mechanism based on the HBase 0.94.1 source code, and summarizes the core ideas of its cache design.

1. Overview

HBase regionserver memory is divided into two parts: the Memstore, used mainly for writes, and the Blockcache, used mainly for reads.

Write requests are written to the memstore first. The regionserver gives each region a memstore; when a memstore fills to 64MB, it is flushed to disk. When the total size of all memstores exceeds the limit (heapsize * hbase.regionserver.global.memstore.upperLimit * 0.9), a flush is forced, starting from the largest memstore, until the total drops below the limit.

A read request first checks the Memstore; if the data is not found there, it checks the Blockcache; if still not found, it reads from disk and places the result into the Blockcache. Since the Blockcache uses an LRU strategy, when it reaches its upper limit (heapsize * hfile.block.cache.size * 0.85), the eviction mechanism starts and evicts the oldest batch of data.
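As a worked example of the two thresholds above, the following sketch computes them for an illustrative heap size (the 8 GB heap and the 0.4/0.2 fractions are example values, not mandated defaults):

```java
public class CacheThresholds {
    public static void main(String[] args) {
        long heapSize = 8L * 1024 * 1024 * 1024;  // example: 8 GB regionserver heap

        // Forced-flush threshold for the total memstore size:
        // heapsize * hbase.regionserver.global.memstore.upperLimit * 0.9
        double memstoreUpperLimit = 0.4;
        long memstoreForceFlushAt = (long) (heapSize * memstoreUpperLimit * 0.9);

        // Eviction threshold for the block cache:
        // heapsize * hfile.block.cache.size * 0.85
        double blockCacheFraction = 0.2;
        long blockCacheEvictAt = (long) (heapSize * blockCacheFraction * 0.85);

        System.out.println("force flush above: " + memstoreForceFlushAt + " bytes");
        System.out.println("start eviction at: " + blockCacheEvictAt + " bytes");
    }
}
```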

A regionserver has one blockcache and N memstores. Their combined size cannot be greater than or equal to heapsize * 0.8, otherwise hbase will not start normally.

By default, the Blockcache fraction is 0.2 and the Memstore fraction is 0.4. In scenarios that focus on read response time, you can set the Blockcache larger and the Memstore smaller to increase the cache hit rate.
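In a read-heavy cluster, such a shift could be expressed in hbase-site.xml roughly as follows (the values are illustrative, chosen to stay under the combined 0.8 limit mentioned above):

```xml
<!-- Illustrative fragment: enlarge the block cache, shrink the memstore -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.4</value>
</property>
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.35</value>
</property>
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.3</value>
</property>
```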

The HBase regionserver maintains three levels of block priority queues:

Single: if a block is accessed for the first time, it is placed in this priority queue;

Multi: if a block is accessed again while in the Single queue, it is moved to the Multi queue;

InMemory: if a block belongs to a column family marked in-memory, it is placed in this queue.
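The promotion behavior across these three queues can be sketched as follows. This is an illustrative model with invented names, not the HBase source: a block enters the Single tier on first access, is promoted to Multi on a repeat access, and in-memory blocks go straight to the InMemory tier.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch of single/multi/in-memory tier promotion (not HBase source). */
public class TierPromotionSketch {
    public enum Tier { SINGLE, MULTI, IN_MEMORY }

    private final Map<String, Tier> tiers = new HashMap<>();

    public Tier access(String blockKey, boolean inMemory) {
        Tier t = tiers.get(blockKey);
        if (t == null) {
            t = inMemory ? Tier.IN_MEMORY : Tier.SINGLE;  // first access
        } else if (t == Tier.SINGLE) {
            t = Tier.MULTI;                               // promoted on re-access
        }
        tiers.put(blockKey, t);
        return t;
    }

    public static void main(String[] args) {
        TierPromotionSketch cache = new TierPromotionSketch();
        System.out.println(cache.access("block-1", false));   // SINGLE
        System.out.println(cache.access("block-1", false));   // MULTI
        System.out.println(cache.access("meta-block", true)); // IN_MEMORY
    }
}
```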

The advantages of this tiered cache design are:

First, through the InMemory cache type, in-memory column families can be selectively kept resident in regionserver memory, such as the meta table's metadata;

Second, by distinguishing between the Single and Multi cache types, blocks that are accessed only once are evicted first, which prevents the cache thrashing that scan operations would otherwise cause.

By default, the Blockcache memory is split among the Single, Multi, and InMemory tiers in the proportions 0.25, 0.50, and 0.25.
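Applying those default proportions to an illustrative 1 GB block cache gives the per-tier byte budgets (the cache size here is an example, not a default):

```java
public class TierBudgets {
    public static void main(String[] args) {
        long blockCacheSize = 1L << 30;                     // example: 1 GB block cache
        long singleBytes = (long) (blockCacheSize * 0.25);  // Single tier budget
        long multiBytes  = (long) (blockCacheSize * 0.50);  // Multi tier budget
        long memoryBytes = (long) (blockCacheSize * 0.25);  // InMemory tier budget
        System.out.println(singleBytes);  // 268435456
        System.out.println(multiBytes);   // 536870912
        System.out.println(memoryBytes);  // 268435456
    }
}
```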

Note that the InMemory queue is also where hbase keeps the meta table's metadata, so if you mark a user table with a large amount of data as in-memory, the meta table's cache entries may be pushed out, which can degrade the performance of the entire cluster.

2. Source Code Analysis

The following is an analysis of the related source code (org.apache.hadoop.hbase.io.hfile.LruBlockCache) in HBase 0.94.1.

2.1 Adding a block to the cache

```java
/** Concurrent map (the cache) */
private final ConcurrentHashMap<BlockCacheKey,CachedBlock> map;

/**
 * Cache the block with the specified name and buffer.
 * <p>
 * It is assumed this will NEVER be called on an already cached block. If
 * that is done, an exception will be thrown.
 * @param cacheKey block's cache key
 * @param buf block buffer
 * @param inMemory if block is in-memory
 */
public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory) {
  CachedBlock cb = map.get(cacheKey);
  if (cb != null) {
    throw new RuntimeException("Cached an already cached block");
  }
  cb = new CachedBlock(cacheKey, buf, count.incrementAndGet(), inMemory);
  long newSize = updateSizeMetrics(cb, false);
  map.put(cacheKey, cb);
  elements.incrementAndGet();
  if (newSize > acceptableSize() && !evictionInProgress) {
    runEviction();
  }
}

/**
 * Cache the block with the specified name and buffer.
 * <p>
 * It is assumed this will NEVER be called on an already cached block. If
 * that is done, it is assumed that you are reinserting the same exact
 * block due to a race condition and will update the buffer but not modify
 * the size of the cache.
 * @param cacheKey block's cache key
 * @param buf block buffer
 */
public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf) {
  cacheBlock(cacheKey, buf, false);
}
```
