In HBase 0.94.0, region splitting gained a very handy SplitPolicy mechanism, through which you can actively control how a region is split. The org.apache.hadoop.hbase.regionserver package contains several split policies: ConstantSizeRegionSplitPolicy, IncreasingToUpperBoundRegionSplitPolicy, and KeyPrefixRegionSplitPolicy.
The three types of split strategies can be distinguished from their names:
ConstantSizeRegionSplitPolicy: splits regions at a fixed size. The fixed size is first taken from the table's "MAX_FILESIZE" value; if that property is not set, the hbase.hregion.max.filesize value configured in hbase-site.xml is used. In version 0.94 the default for this value was changed to 10 * 1024 * 1024 * 1024L, i.e. 10G. The many online articles that give 1G as the default for hbase.hregion.max.filesize are presumably based on HBase 0.92, so check this against the specific HBase version in use. This policy was the default before version 0.94: when the size of a store in a region of a table exceeds the predetermined fixed maximum, the region is split. The split point is chosen on the "halve the data" principle: the rowkey at the midpoint of the largest store in the region is used as the split point.
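The size lookup and split check described above can be sketched roughly as follows. This is a minimal illustration in plain Java, not HBase's own code: the class ConstantSizeSplitSketch, its method names, and the use of plain maps in place of the table descriptor and hbase-site.xml are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the fixed-size split rule; not HBase source code.
final class ConstantSizeSplitSketch {
    // Default described in the text for 0.94: 10 * 1024 * 1024 * 1024L (10G).
    static final long DEFAULT_MAX_FILESIZE = 10L * 1024 * 1024 * 1024;

    // The table-level "MAX_FILESIZE" wins; otherwise fall back to
    // hbase.hregion.max.filesize from the site config, then to the default.
    static long desiredMaxFileSize(Map<String, String> tableAttrs,
                                   Map<String, String> siteConf) {
        String value = tableAttrs.get("MAX_FILESIZE");
        if (value == null) {
            value = siteConf.get("hbase.hregion.max.filesize");
        }
        return value == null ? DEFAULT_MAX_FILESIZE : Long.parseLong(value);
    }

    // A region is split once any one of its stores exceeds the fixed size.
    static boolean shouldSplit(long[] storeSizes, long maxFileSize) {
        for (long size : storeSizes) {
            if (size > maxFileSize) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> tableAttrs = new HashMap<>();
        Map<String, String> siteConf = new HashMap<>();
        long max = desiredMaxFileSize(tableAttrs, siteConf);
        long[] storeSizes = {2L * 1024 * 1024 * 1024, 11L * 1024 * 1024 * 1024};
        System.out.println("max=" + max + " split=" + shouldSplit(storeSizes, max));
    }
}
```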
IncreasingToUpperBoundRegionSplitPolicy: splits regions according to the number of regions. This is the default policy in HBase 0.94, and the regions it produces are not of equal size: the split size grows as the number of regions grows. The growth rule is: min(R^2 * ("MEMSTORE_FLUSHSIZE" || "hbase.hregion.memstore.flush.size"), "hbase.hregion.max.filesize"), where R is the number of regions of this table on the current RegionServer. MEMSTORE_FLUSHSIZE is the size specified when the table was created; if the table specifies this property, hbase.hregion.memstore.flush.size is ignored.
hbase.hregion.memstore.flush.size is the flush size set in hbase-site.xml, default 128M.
hbase.hregion.max.filesize is the size of a single region set in hbase-site.xml, default 10G.
The split size used for each region is the smaller of the two sizes in the formula above.
Assuming hbase.hregion.memstore.flush.size is 128M and hbase.hregion.max.filesize is 10G, the split size grows with the region count roughly as: 512M, 1152M, 2G, 3.2G, 4.6G, 6.2G, etc. Once the region count reaches 9, 9 * 9 * 128M / 1024 = 10.125G > 10G, so from then on the region split size is fixed at 10G.
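The growth rule above is easy to check with a few lines of Java. This is only a sketch of the formula as stated in this article (min of R^2 * flush size and max file size), assuming the 128M and 10G defaults; the class and method names are made up for illustration and are not HBase APIs.

```java
// Hypothetical sketch of the min(R^2 * flushSize, maxFileSize) rule; not HBase source code.
final class IncreasingSplitSizeSketch {
    static final long MB = 1024L * 1024;
    static final long GB = 1024L * MB;

    // Split threshold in bytes when the table has regionCount regions
    // on the current RegionServer.
    static long sizeToCheck(int regionCount, long flushSize, long maxFileSize) {
        long grown = (long) regionCount * regionCount * flushSize;
        return Math.min(grown, maxFileSize);
    }

    public static void main(String[] args) {
        long flushSize = 128 * MB;   // hbase.hregion.memstore.flush.size
        long maxFileSize = 10 * GB;  // hbase.hregion.max.filesize
        for (int r = 1; r <= 10; r++) {
            long size = sizeToCheck(r, flushSize, maxFileSize);
            System.out.println("R=" + r + " -> " + (size / MB) + "M");
        }
    }
}
```

With these values the threshold is 512M at R=2, 1152M at R=3, 2048M at R=4, and is capped at 10240M (10G) from R=9 onward.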
KeyPrefixRegionSplitPolicy: splits regions according to a rowkey prefix of a given length. The length is read from the table's prefix_split_key_policy.prefix_length property, a numeric value that gives the prefix length; at split time, the split point is truncated to this length. My understanding is that regions are only divided where rowkey prefixes differ, so rows sharing the same prefix stay in the same region. This policy is best suited to rowkeys with a fixed-length prefix. When the prefix_split_key_policy.prefix_length property is not set on the table, or its value is not an integer, this policy behaves the same as IncreasingToUpperBoundRegionSplitPolicy.
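The truncation step can be illustrated with a small standalone sketch. It only shows the idea of cutting a candidate split point down to the configured prefix length; the class name, method name, and sample rowkey are assumptions for the example, not HBase code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of truncating a split point to a rowkey prefix; not HBase source code.
final class KeyPrefixSplitPointSketch {
    // Keeps at most prefixLength leading bytes of the candidate split point.
    static byte[] truncateToPrefix(byte[] splitPoint, int prefixLength) {
        return Arrays.copyOf(splitPoint, Math.min(prefixLength, splitPoint.length));
    }

    public static void main(String[] args) {
        // Sample rowkey with a 6-byte prefix ("user42"); purely illustrative.
        byte[] candidate = "user42_20160401".getBytes(StandardCharsets.UTF_8);
        byte[] actual = truncateToPrefix(candidate, 6);
        System.out.println(new String(actual, StandardCharsets.UTF_8)); // prints "user42"
    }
}
```

Because the split point is cut to the prefix, every row starting with "user42" sorts at or after it and therefore lands in the same region.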
[Figure: SplitPolicy class inheritance relationship]
Below is code for specifying the SplitPolicy when creating or modifying a table:
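import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.regionserver.KeyPrefixRegionSplitPolicy;
import org.apache.hadoop.hbase.util.Bytes;

// Update the split policy of an existing table ("conf" is an existing HBase Configuration)
HBaseAdmin admin = new HBaseAdmin(conf);
HTable hTable = new HTable(conf, "test");
HTableDescriptor htd = hTable.getTableDescriptor();
HTableDescriptor newHtd = new HTableDescriptor(htd);
newHtd.setValue(HTableDescriptor.SPLIT_POLICY, KeyPrefixRegionSplitPolicy.class.getName()); // specify the policy
newHtd.setValue("prefix_split_key_policy.prefix_length", ""); // prefix length goes here, as a numeric string
newHtd.setValue("MEMSTORE_FLUSHSIZE", "5242880"); // 5M
admin.disableTable("test"); // the table must be disabled before it is modified
admin.modifyTable(Bytes.toBytes("test"), newHtd);
admin.enableTable("test");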
The HBase 1.0.1.1 release we currently run uses IncreasingToUpperBoundRegionSplitPolicy as its region split policy.
To verify this, look at the tdc_tweets_201604 table through the HBase web front end: the table has been split into 18 regions, as follows:
[Image: 11.png]
Checking the size of each region with the Hadoop command shows that the largest is 7.4G and the smallest is 88M, which is consistent with the region split logic, as follows:
[Image: 12.png]
HBase Region Split Strategy