The implementation mechanism of the HBase block cache

Source: Internet
Author: User

This article analyzes the block cache implementation mechanism based on the HBase 0.94.1 source code, and summarizes the core ideas of its cache design.

1. Overview

HBase regionserver memory is divided into two parts: the Memstore, used mainly for writes, and the Blockcache, used mainly for reads.

Write requests are written to the memstore first. The regionserver gives each region a memstore; when a memstore fills to 64MB, it is flushed to disk. When the total size of all memstores exceeds the limit (heapsize * hbase.regionserver.global.memstore.upperLimit * 0.9), a flush is forced, starting from the largest memstore, until the total drops below the limit.

A read request first checks the Memstore; if the data is not found there, it checks the Blockcache; if still not found, it reads from disk and places the result into the Blockcache. Since the Blockcache uses an LRU strategy, when it reaches its upper limit (heapsize * hfile.block.cache.size * 0.85), the eviction mechanism starts and evicts the oldest batch of data.
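As a worked example of the two thresholds above, the following sketch computes them for an illustrative heap size (the 8 GB heap and the 0.4/0.2 fractions are example values, not mandated defaults):

```java
public class CacheThresholds {
    public static void main(String[] args) {
        long heapSize = 8L * 1024 * 1024 * 1024;  // example: 8 GB regionserver heap

        // Forced-flush threshold for the total memstore size:
        // heapsize * hbase.regionserver.global.memstore.upperLimit * 0.9
        double memstoreUpperLimit = 0.4;
        long memstoreForceFlushAt = (long) (heapSize * memstoreUpperLimit * 0.9);

        // Eviction threshold for the block cache:
        // heapsize * hfile.block.cache.size * 0.85
        double blockCacheFraction = 0.2;
        long blockCacheEvictAt = (long) (heapSize * blockCacheFraction * 0.85);

        System.out.println("force flush above: " + memstoreForceFlushAt + " bytes");
        System.out.println("start eviction at: " + blockCacheEvictAt + " bytes");
    }
}
```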

A regionserver has one blockcache and N memstores. Their combined size cannot be greater than or equal to heapsize * 0.8, otherwise hbase will not start normally.

By default, the Blockcache fraction is 0.2 and the Memstore fraction is 0.4. In scenarios that focus on read response time, you can set the Blockcache larger and the Memstore smaller to increase the cache hit rate.
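In a read-heavy cluster, such a shift could be expressed in hbase-site.xml roughly as follows (the values are illustrative, chosen to stay under the combined 0.8 limit mentioned above):

```xml
<!-- Illustrative fragment: enlarge the block cache, shrink the memstore -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.4</value>
</property>
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.35</value>
</property>
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.3</value>
</property>
```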

The HBase regionserver maintains three levels of block priority queues:

Single: if a block is accessed for the first time, it is placed in this priority queue;

Multi: if a block is accessed again while in the Single queue, it is moved to the Multi queue;

InMemory: if a block belongs to a column family marked in-memory, it is placed in this queue.
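The promotion behavior across these three queues can be sketched as follows. This is an illustrative model with invented names, not the HBase source: a block enters the Single tier on first access, is promoted to Multi on a repeat access, and in-memory blocks go straight to the InMemory tier.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch of single/multi/in-memory tier promotion (not HBase source). */
public class TierPromotionSketch {
    public enum Tier { SINGLE, MULTI, IN_MEMORY }

    private final Map<String, Tier> tiers = new HashMap<>();

    public Tier access(String blockKey, boolean inMemory) {
        Tier t = tiers.get(blockKey);
        if (t == null) {
            t = inMemory ? Tier.IN_MEMORY : Tier.SINGLE;  // first access
        } else if (t == Tier.SINGLE) {
            t = Tier.MULTI;                               // promoted on re-access
        }
        tiers.put(blockKey, t);
        return t;
    }

    public static void main(String[] args) {
        TierPromotionSketch cache = new TierPromotionSketch();
        System.out.println(cache.access("block-1", false));   // SINGLE
        System.out.println(cache.access("block-1", false));   // MULTI
        System.out.println(cache.access("meta-block", true)); // IN_MEMORY
    }
}
```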

The advantages of this tiered cache design are:

First, through the InMemory cache type, in-memory column families can be selectively kept resident in regionserver memory, such as the meta table's metadata;

Second, by distinguishing between the Single and Multi cache types, blocks that are accessed only once are evicted first, which prevents the cache thrashing that scan operations would otherwise cause.

By default, the Blockcache memory is split among the Single, Multi, and InMemory tiers in the proportions 0.25, 0.50, and 0.25.
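Applying those default proportions to an illustrative 1 GB block cache gives the per-tier byte budgets (the cache size here is an example, not a default):

```java
public class TierBudgets {
    public static void main(String[] args) {
        long blockCacheSize = 1L << 30;                     // example: 1 GB block cache
        long singleBytes = (long) (blockCacheSize * 0.25);  // Single tier budget
        long multiBytes  = (long) (blockCacheSize * 0.50);  // Multi tier budget
        long memoryBytes = (long) (blockCacheSize * 0.25);  // InMemory tier budget
        System.out.println(singleBytes);  // 268435456
        System.out.println(multiBytes);   // 536870912
        System.out.println(memoryBytes);  // 268435456
    }
}
```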

Note that the InMemory queue is also where hbase keeps the meta table's metadata, so if you mark a user table with a large amount of data as in-memory, the meta table's cache entries may be pushed out, which can degrade the performance of the entire cluster.

2. Source Code Analysis

The following is an analysis of the related source code (org.apache.hadoop.hbase.io.hfile.LruBlockCache) in HBase 0.94.1.

2.1 Adding a block to the cache

```java
/** Concurrent map (the cache) */
private final ConcurrentHashMap<BlockCacheKey,CachedBlock> map;

/**
 * Cache the block with the specified name and buffer.
 * <p>
 * It is assumed this will NEVER be called on an already cached block. If
 * that is done, an exception will be thrown.
 * @param cacheKey block's cache key
 * @param buf block buffer
 * @param inMemory if block is in-memory
 */
public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory) {
  CachedBlock cb = map.get(cacheKey);
  if (cb != null) {
    throw new RuntimeException("Cached an already cached block");
  }
  cb = new CachedBlock(cacheKey, buf, count.incrementAndGet(), inMemory);
  long newSize = updateSizeMetrics(cb, false);
  map.put(cacheKey, cb);
  elements.incrementAndGet();
  if (newSize > acceptableSize() && !evictionInProgress) {
    runEviction();
  }
}

/**
 * Cache the block with the specified name and buffer.
 * <p>
 * It is assumed this will NEVER be called on an already cached block. If
 * that is done, it is assumed that you are reinserting the same exact
 * block due to a race condition and will update the buffer but not modify
 * the size of the cache.
 * @param cacheKey block's cache key
 * @param buf block buffer
 */
public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf) {
  cacheBlock(cacheKey, buf, false);
}
```
