HBase split source analysis: the split policy


I ran into region splits at work, so I read through this part of the source code. I started with the split policy, which is today's topic. Follow-up posts will cover the rest of the split source and the compaction source.

I read a lot of other people's blog posts on this. Many are reposts, and even the original ones often do not say which version they cover, which confuses many readers. This analysis is based on HBase 0.98.13. Note that this part of the source code is likely to differ between versions.

The split policy used in this release is IncreasingToUpperBoundRegionSplitPolicy, which is the policy used from version 0.94 onward. The class is org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java.
First look at the configureForRegion method. Its main purpose is to initialize initialSize, which is used later.

    @Override
    protected void configureForRegion(HRegion region) {
      super.configureForRegion(region);
      Configuration conf = getConf();
      // If hbase.increasing.policy.initial.size is set, use the user's value
      this.initialSize = conf.getLong("hbase.increasing.policy.initial.size", -1);
      if (this.initialSize > 0) {
        return;
      }
      // If not set, check whether the table sets a memstore flush size.
      // If it does, initialSize = 2 * that flush size.
      // If not, use hbase.hregion.memstore.flush.size, whose default is
      // 1024*1024*128L (128M), again multiplied by 2.
      HTableDescriptor desc = region.getTableDesc();
      if (desc != null) {
        this.initialSize = 2 * desc.getMemStoreFlushSize();
      }
      if (this.initialSize <= 0) {
        this.initialSize = 2 * conf.getLong(HConstants.HREGION_MEMSTORE_FLUSH_SIZE,
          HTableDescriptor.DEFAULT_MEMSTORE_FLUSH_SIZE);
      }
    }

With the default configuration, this method initializes initialSize to 2 * hbase.hregion.memstore.flush.size.
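
The fallback order can be sketched in standalone Java. This is only an illustration, not HBase code: the class InitialSizeSketch, the method resolveInitialSize and its three parameters are hypothetical stand-ins for the policy setting, the table-level flush size and the configured flush size, and a non-positive value is assumed to mean "not set".

```java
// A minimal standalone sketch of the initialSize fallback chain.
// Not HBase code: every name here is a hypothetical stand-in.
class InitialSizeSketch {
    // Default memstore flush size quoted in the article: 128 MB.
    static final long DEFAULT_FLUSH_SIZE = 1024L * 1024L * 128L;

    /**
     * @param policyInitialSize hbase.increasing.policy.initial.size, or -1 if unset
     * @param tableFlushSize    flush size set on the table, or <= 0 if unset
     * @param confFlushSize     hbase.hregion.memstore.flush.size, or <= 0 if unset
     */
    static long resolveInitialSize(long policyInitialSize, long tableFlushSize, long confFlushSize) {
        if (policyInitialSize > 0) {
            return policyInitialSize;           // an explicit setting wins
        }
        long initialSize = 2 * tableFlushSize;  // table-level flush size, doubled
        if (initialSize <= 0) {
            long flush = confFlushSize > 0 ? confFlushSize : DEFAULT_FLUSH_SIZE;
            initialSize = 2 * flush;            // configured or default flush size, doubled
        }
        return initialSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(resolveInitialSize(-1, -1, -1) / mb + " MB");          // 256 MB
        System.out.println(resolveInitialSize(64 * mb, 128 * mb, -1) / mb + " MB"); // 64 MB
    }
}
```

With nothing configured, the sketch returns 2 * 128 MB = 256 MB, which is the starting value used in the worked example later in this post.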
Among the other methods there is one called shouldSplit. As the name implies, it decides whether the region should split.

    @Override
    protected boolean shouldSplit() {
      if (region.shouldForceSplit()) return true;
      boolean foundABigStore = false;
      // Get count of regions that have the same common table as this.region
      int tableRegionsCount = getCountOfCommonTableRegions();
      // Get size to check (the split threshold)
      long sizeToCheck = getSizeToCheck(tableRegionsCount);

      for (Store store : region.getStores().values()) {
        // If any of the stores is unable to split (eg they contain reference files)
        // then don't split
        if (!store.canSplit()) {
          return false;
        }

        // Mark if any store is big enough
        long size = store.getSize();
        if (size > sizeToCheck) {
          LOG.debug("ShouldSplit because " + store.getColumnFamilyName() +
            " size=" + size + ", sizeToCheck=" + sizeToCheck +
            ", regionsWithCommonTable=" + tableRegionsCount);
          foundABigStore = true;
        }
      }

      return foundABigStore;
    }

The line long sizeToCheck = getSizeToCheck(tableRegionsCount); is the important one. Follow it:

    protected long getSizeToCheck(final int tableRegionsCount) {
      // safety check for 100 to avoid numerical overflow in extreme cases
      return tableRegionsCount == 0 || tableRegionsCount > 100 ? getDesiredMaxFileSize() :
        Math.min(getDesiredMaxFileSize(),
          this.initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount);
    }

This is a ternary expression. If the number of regions of the table is 0 or greater than 100, the threshold comes from getDesiredMaxFileSize(). Otherwise it is the smaller of getDesiredMaxFileSize() and initialSize * tableRegionsCount cubed. Next, look at the getDesiredMaxFileSize method:

    long getDesiredMaxFileSize() {
      return desiredMaxFileSize;
    }

The getter alone gives no clue. This class extends ConstantSizeRegionSplitPolicy, so look at that class, which contains the following code:

    private long desiredMaxFileSize;

    @Override
    protected void configureForRegion(HRegion region) {
      super.configureForRegion(region);
      Configuration conf = getConf();
      HTableDescriptor desc = region.getTableDesc();
      if (desc != null) {
        this.desiredMaxFileSize = desc.getMaxFileSize();
      }
      // Fall back to hbase.hregion.max.filesize, whose default is 10G
      if (this.desiredMaxFileSize <= 0) {
        this.desiredMaxFileSize = conf.getLong(HConstants.HREGION_MAX_FILESIZE,
          HConstants.DEFAULT_MAX_FILE_SIZE);
      }
      // If hbase.hregion.max.filesize.jitter is set, apply jitter to desiredMaxFileSize
      float jitter = conf.getFloat("hbase.hregion.max.filesize.jitter", Float.NaN);
      if (!Float.isNaN(jitter)) {
        this.desiredMaxFileSize += (long) (desiredMaxFileSize * (RANDOM.nextFloat() - 0.5D) * jitter);
      }
    }

If hbase.hregion.max.filesize.jitter is set, the threshold becomes desiredMaxFileSize + desiredMaxFileSize * (random - 0.5) * jitter, where random is a random float in [0, 1). In the code above jitter has no default: the lookup falls back to Float.NaN, in which case no jitter is applied. HREGION_MAX_FILESIZE is the hbase.hregion.max.filesize setting, whose default is 10G. As for why jitter exists, some people say it prevents a burst of major compactions when a RegionServer restarts. I don't understand that yet, so I'll set it aside for now.
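
To get a feel for how far the jitter can move the threshold, here is a small standalone Java sketch of the same arithmetic. It is not HBase code: the class JitterSketch and the method applyJitter are hypothetical, and the jitter value 0.5 and the fixed random seed are assumptions chosen only for the demonstration.

```java
import java.util.Random;

// Standalone sketch of the jitter arithmetic; all names are hypothetical.
class JitterSketch {
    // Same expression as the jitter line in configureForRegion:
    // size + size * (random - 0.5) * jitter, with random in [0, 1).
    static long applyJitter(long desiredMaxFileSize, float jitter, Random random) {
        return desiredMaxFileSize
            + (long) (desiredMaxFileSize * (random.nextFloat() - 0.5D) * jitter);
    }

    public static void main(String[] args) {
        long tenGb = 10L * 1024 * 1024 * 1024;
        Random random = new Random(42); // fixed seed, only to make the demo repeatable
        for (int i = 0; i < 3; i++) {
            System.out.println(applyJitter(tenGb, 0.5f, random));
        }
    }
}
```

Because (random - 0.5) lies in [-0.5, 0.5), a jitter of j shifts the threshold by at most j/2 of its value in either direction. With the assumed j = 0.5 and a 10G base, every result falls between 7.5G and 12.5G.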

Back in the shouldSplit method, let's see what the canSplit method does.

    @Override
    public boolean canSplit() {
      this.lock.readLock().lock();
      try {
        // Not split-able if we find a reference store file present in the store.
        boolean result = !hasReferences();
        if (!result && LOG.isDebugEnabled()) {
          LOG.debug("Cannot split region due to reference files being there");
        }
        return result;
      } finally {
        this.lock.readLock().unlock();
      }
    }



Very simple: it checks whether the store has reference files. If it does, the store cannot be split; if not, it can. Back in the shouldSplit method again, you can see that if the size of any store is greater than the threshold just calculated, the method returns true, meaning the region should split.

OK, here's a summary.

HBase splits a region under these conditions:
1. If the user requested the split, the region is split no matter what.
2. If the split was not requested by the user and any store in the region contains reference files, the region is not split.
3. If the split was not requested by the user and there are no reference files, the size of each store is checked, and the region is split as soon as one store is larger than the threshold described above.
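
The three conditions can be modelled in a few lines of standalone Java. This is a simplified sketch, not the HBase implementation: the StoreInfo class is a hypothetical stand-in for a store that carries only a size and a reference-file flag, and the threshold is passed in directly instead of being computed.

```java
import java.util.Arrays;
import java.util.List;

// Standalone model of the split decision; all names are hypothetical.
class ShouldSplitSketch {
    // Stand-in for a store: only its size and whether it has reference files.
    static final class StoreInfo {
        final long size;
        final boolean hasReferences;

        StoreInfo(long size, boolean hasReferences) {
            this.size = size;
            this.hasReferences = hasReferences;
        }
    }

    static boolean shouldSplit(boolean forceSplit, List<StoreInfo> stores, long sizeToCheck) {
        if (forceSplit) {
            return true;                 // 1. user-requested split
        }
        boolean foundABigStore = false;
        for (StoreInfo store : stores) {
            if (store.hasReferences) {
                return false;            // 2. any store with reference files blocks the split
            }
            if (store.size > sizeToCheck) {
                foundABigStore = true;   // 3. at least one store exceeds the threshold
            }
        }
        return foundABigStore;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<StoreInfo> stores = Arrays.asList(
            new StoreInfo(300 * mb, false),
            new StoreInfo(10 * mb, false));
        System.out.println(shouldSplit(false, stores, 256 * mb)); // true
    }
}
```

Note that the reference-file check returns immediately, so a store with reference files blocks the split even if an earlier store in the loop was already over the threshold.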

The point of this policy
Before version 0.94 the policy was ConstantSizeRegionSplitPolicy, which allows a split only when a store grows past an essentially fixed threshold. The current policy allows a split when the store size exceeds a threshold that changes with the number of regions. For example, with none of the split-related HBase properties configured, a newly created table has only 1 region by default, so a split is allowed once a store in that region exceeds 1*1*1*flushSize*2 = 128M*2 = 256M. After that split there are two regions, and a split is allowed once one of their stores exceeds 2*2*2*flushSize*2 = 2048M. The calculation continues this way until the computed value exceeds hbase.hregion.max.filesize (plus the jitter adjustment, if hbase.hregion.max.filesize.jitter is set). From then on the threshold is essentially fixed. As a rough approximation, hbase.hregion.max.filesize can be treated as the final threshold. Its default is 10G, so once the threshold reaches 10G it basically stops changing.
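
The progression described above can be tabulated with a short standalone Java sketch. It is not HBase code: the class SplitThresholdSketch and the method sizeToCheck are hypothetical names that mirror the ternary in getSizeToCheck, and the inputs assume the defaults quoted in this post (128M flush size, 10G maximum file size, no jitter).

```java
// Standalone sketch of the threshold formula; all names are hypothetical.
class SplitThresholdSketch {
    static final long MB = 1024L * 1024L;

    // Mirrors getSizeToCheck: cap initialSize * count^3 at the maximum file size.
    static long sizeToCheck(int tableRegionsCount, long initialSize, long desiredMaxFileSize) {
        if (tableRegionsCount == 0 || tableRegionsCount > 100) {
            return desiredMaxFileSize;
        }
        long cubed = initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount;
        return Math.min(desiredMaxFileSize, cubed);
    }

    public static void main(String[] args) {
        long initialSize = 2 * 128 * MB;    // 2 * default flush size
        long maxFileSize = 10 * 1024 * MB;  // default hbase.hregion.max.filesize, jitter ignored
        for (int regions = 1; regions <= 5; regions++) {
            System.out.println(regions + " region(s): "
                + sizeToCheck(regions, initialSize, maxFileSize) / MB + " MB");
        }
    }
}
```

Under these assumptions the thresholds are 256 MB, 2048 MB and 6912 MB for 1, 2 and 3 regions, and from 4 regions onward the 10G cap (10240 MB) applies.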

The idea is that a table goes through a few splits before the threshold settles at its essentially fixed value. Those early splits involve very little data, so their impact on HBase is small. They also act as a kind of "pre-splitting" while data is first being imported, which to some extent reduces hot spots later.





