Hash and pre-partitioning design of HBase rowkeys


Transferred from: http://www.cnblogs.com/bdifn/p/3801737.html

Questions guide:
1. How do we prevent write hotspots?
2. How do we pre-partition a table?
Extended:
Why do write hotspots occur in the first place?

In HBase, a table is divided into 1..N regions, which are hosted on RegionServers. A region has two important attributes, startKey and endKey, which define the range of rowkeys that the region is responsible for. When we read or write data, if the rowkey falls within a region's start-end key range, the request is routed to that region and the data is read from or written to it there. A rough analogy is queueing people by age bracket: 1-15 are children, 16-39 are young adults, 40-64 are middle-aged, 65 and up are seniors (these numbers are made up, just for illustration); whoever arrives looks at which range their age falls into and joins that queue. (A bit of nonsense...)
By default, when we create a table by simply passing an HTableDescriptor to HBaseAdmin, there is only one region. It starts out in a boundless state: its start and end keys are empty, so any rowkey is accepted and everything goes into this one region. As more and more data arrives, the region grows, and once it crosses a certain threshold HBase decides it is no longer appropriate to keep pouring data into it. It finds a midKey and splits the region in two, a process called a region split. The midKey is the boundary between the two regions: the left region N has no lower bound, the right region M has no upper bound; rowkeys less than the midKey go to region N, and rowkeys greater than or equal to it go to region M.
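For intuition, the routing is just a lexicographic range check on the rowkey bytes. A minimal sketch (not HBase's actual lookup, which goes through the hbase:meta table, but the same comparison, using the standard Bytes utility):

// Illustrative only: does a rowkey fall into a region's [startKey, endKey) range?
// An empty startKey means "no lower bound"; an empty endKey means "no upper bound".
boolean inRegion(byte[] rowkey, byte[] startKey, byte[] endKey) {
    boolean afterStart = startKey.length == 0 || Bytes.compareTo(rowkey, startKey) >= 0;
    boolean beforeEnd  = endKey.length == 0 || Bytes.compareTo(rowkey, endKey) < 0;
    return afterStart && beforeEnd;
}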

How is the midKey found? There is quite a bit involved, which I won't go into here; in the simplest terms you can think of it as the rowkey of the row sitting at about the middle of the region (total row count / 2), although in reality it is a little more complicated than that.
If we just build the table with these defaults and keep putting data into it, and, worse, our rowkeys increase monotonically, the drawbacks become quite obvious.


First, write hotspots: we always write into the region with the largest start key, because each new rowkey is larger than all previous ones and HBase keeps rowkeys in ascending order, so writes are always routed to the region with no upper bound.
Second, because of that write hotspot, records always go to the region with the largest start key; the regions produced by earlier splits never receive another write and sit there half full, which is an unfavorable distribution.
Third, in write-heavy scenarios where data grows quickly, splits happen frequently. We don't want that, because splitting is time-consuming and resource-intensive.
............


Seeing these drawbacks, we know that in a cluster environment, to get good parallelism, we want good load balancing so that every node handles an equal share of requests. We also want regions not to split too often, because a split pauses the region for a while. How can we achieve this?
Random hashing plus pre-partitioning. The combination of the two is fairly complete: pre-partitioning builds a batch of regions up front, each maintained by its own start-end key range, and random hashing spreads the writes evenly across these pre-built regions. Together they address the drawbacks above and greatly improve performance.

Two approaches are available: hash and partition.
Hash means prefixing the rowkey with a random-looking string, which can be generated with SHA, MD5, and so on; as long as the start-end key ranges managed by the regions are reasonably random, the write hotspot problem is solved.

long currentId = 1L;
byte[] rowkey = Bytes.add(MD5Hash.getMD5AsHex(Bytes.toBytes(currentId)).substring(0, 8).getBytes(),
        Bytes.toBytes(currentId));

Suppose the rowkey was originally an auto-incrementing long. We can hash the id, convert the hash to bytes, and append the bytes of the id itself to form the rowkey, which gives us a randomized rowkey. Given rowkeys designed this way, how do we pre-partition?
1. Sample: randomly generate a certain number of rowkeys and collect the samples into a sorted (ascending) set.
2. Based on the number of regions to pre-split, divide the whole set evenly to obtain the splitKeys.
3. HBaseAdmin.createTable(HTableDescriptor tableDescriptor, byte[][] splitKeys) accepts the pre-split splitKeys, i.e. the rowkey boundaries between regions.

1. Create the split-key calculator, which derives suitable splitKeys from the sampled data:

public class HashChoreWoker implements SplitKeysCalculator {
    // Number of rowkeys to sample
    private int baseRecord;
    // Rowkey generator
    private RowKeyGenerator rkGen;
    // Sampling interval: number of samples divided by the number of regions
    private int splitKeysBase;
    // Number of split keys
    private int splitKeysNumber;
    // Split keys calculated from the sampled data
    private byte[][] splitKeys;

    public HashChoreWoker(int baseRecord, int prepareRegions) {
        this.baseRecord = baseRecord;
        // Instantiate the rowkey generator
        rkGen = new HashRowKeyGenerator();
        splitKeysNumber = prepareRegions - 1;
        splitKeysBase = baseRecord / prepareRegions;
    }

    public byte[][] calcSplitKeys() {
        splitKeys = new byte[splitKeysNumber][];
        // Use a TreeSet to keep the sampled rowkeys sorted
        TreeSet<byte[]> rows = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
        for (int i = 0; i < baseRecord; i++) {
            rows.add(rkGen.nextId());
        }
        int pointer = 0;
        Iterator<byte[]> rowKeyIter = rows.iterator();
        int index = 0;
        while (rowKeyIter.hasNext()) {
            byte[] tempRow = rowKeyIter.next();
            rowKeyIter.remove();
            if ((pointer != 0) && (pointer % splitKeysBase == 0)) {
                if (index < splitKeysNumber) {
                    splitKeys[index] = tempRow;
                    index++;
                }
            }
            pointer++;
        }
        rows.clear();
        rows = null;
        return splitKeys;
    }
}
The RowKeyGenerator interface and its implementation:

// Rowkey generator interface
public interface RowKeyGenerator {
    byte[] nextId();
}

// Hash-based implementation
public class HashRowKeyGenerator implements RowKeyGenerator {
    private long currentId = 1;
    private long currentTime = System.currentTimeMillis();
    private Random random = new Random();

    public byte[] nextId() {
        try {
            currentTime += random.nextInt(1000);
            byte[] lowT = Bytes.copy(Bytes.toBytes(currentTime), 4, 4);
            byte[] lowU = Bytes.copy(Bytes.toBytes(currentId), 4, 4);
            // Prefix: first 8 hex chars of the MD5 of (low id bytes + low time bytes); suffix: the raw id
            return Bytes.add(MD5Hash.getMD5AsHex(Bytes.add(lowU, lowT)).substring(0, 8).getBytes(),
                    Bytes.toBytes(currentId));
        } finally {
            currentId++;
        }
    }
}

Unit test case:

@Test
public void testHashAndCreateTable() throws Exception {
    HashChoreWoker worker = new HashChoreWoker(1000000, 10);
    byte[][] splitKeys = worker.calcSplitKeys();

    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    TableName tableName = TableName.valueOf("hash_split_table");

    if (admin.tableExists(tableName)) {
        try {
            admin.disableTable(tableName);
        } catch (Exception e) {
        }
        admin.deleteTable(tableName);
    }

    HTableDescriptor tableDesc = new HTableDescriptor(tableName);
    HColumnDescriptor columnDesc = new HColumnDescriptor(Bytes.toBytes("info"));
    columnDesc.setMaxVersions(1);
    tableDesc.addFamily(columnDesc);
    admin.createTable(tableDesc, splitKeys);
    admin.close();
}

View the table result by executing: scan 'hbase:meta'

Above we only show part of the region information; you can see that the regions' start-end keys are already fairly random hashes. You can also look at the directory structure on HDFS, which matches the expected number of pre-split regions:
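If you prefer to check the boundaries programmatically instead of scanning hbase:meta, a minimal sketch using the same client classes as the test above (it assumes HBaseAdmin.getTableRegions is available in your client version) could look like this:

// Sketch: print each region's start/end key for the pre-split table.
HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
for (HRegionInfo region : admin.getTableRegions(TableName.valueOf("hash_split_table"))) {
    System.out.println(Bytes.toStringBinary(region.getStartKey())
            + " - " + Bytes.toStringBinary(region.getEndKey()));
}
admin.close();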

At this point the partitions have been pre-built according to the hash approach; later, when inserting data, rowkeys are generated with the same RowKeyGenerator. If you are interested, you can also run some experiments: insert some data and see how it is distributed.
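For example, a minimal write experiment might look like the sketch below. It assumes the hash_split_table and info family created in the test above; the row count and column qualifier are made up for illustration, and the old HTable client API is used as in the rest of this article.

// Sketch: write sample rows with HashRowKeyGenerator, then check how they
// spread across the pre-split regions (e.g. via per-region request counts).
RowKeyGenerator rkGen = new HashRowKeyGenerator();
HTable table = new HTable(HBaseConfiguration.create(), "hash_split_table");
for (int i = 0; i < 100000; i++) {
    Put put = new Put(rkGen.nextId());
    put.add(Bytes.toBytes("info"), Bytes.toBytes("c1"), Bytes.toBytes(i));
    table.put(put);
}
table.close();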

Partition means exactly what the name suggests: partitioning, somewhat like the Partitioner in MapReduce. Each region gets a long integer as its partition number and manages the data for its own bucket. When a rowkey is generated, the id is taken modulo the partition count, and the resulting partition number is prepended to the full id to form the rowkey. This is simpler: no sampling is needed, and the splitKeys are trivial, they are just the partition numbers. Straight to the code:

public class PartitionRowKeyManager implements RowKeyGenerator, SplitKeysCalculator {

    // Default number of partitions (the simulator report at the end of this
    // article shows 20 regions when this default is used)
    public static final int DEFAULT_PARTITION_AMOUNT = 20;

    private long currentId = 1;
    private int partition = DEFAULT_PARTITION_AMOUNT;

    public void setPartition(int partition) {
        this.partition = partition;
    }

    public byte[] nextId() {
        try {
            long partitionId = currentId % partition;
            // Rowkey = partition number + full id
            return Bytes.add(Bytes.toBytes(partitionId),
                    Bytes.toBytes(currentId));
        } finally {
            currentId++;
        }
    }

    public byte[][] calcSplitKeys() {
        byte[][] splitKeys = new byte[partition - 1][];
        for (int i = 1; i < partition; i++) {
            splitKeys[i - 1] = Bytes.toBytes((long) i);
        }
        return splitKeys;
    }
}

The calcSplitKeys method is very simple: the split keys are just the partition numbers. Let's look at the test class:

@Test
public void testPartitionAndCreateTable() throws Exception {
    PartitionRowKeyManager rkManager = new PartitionRowKeyManager();
    // Pre-build only 10 partitions
    rkManager.setPartition(10);

    byte[][] splitKeys = rkManager.calcSplitKeys();

    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    TableName tableName = TableName.valueOf("partition_split_table");

    if (admin.tableExists(tableName)) {
        try {
            admin.disableTable(tableName);
        } catch (Exception e) {
        }
        admin.deleteTable(tableName);
    }

    HTableDescriptor tableDesc = new HTableDescriptor(tableName);
    HColumnDescriptor columnDesc = new HColumnDescriptor(Bytes.toBytes("info"));
    columnDesc.setMaxVersions(1);
    tableDesc.addFamily(columnDesc);
    admin.createTable(tableDesc, splitKeys);
    admin.close();
}

Likewise, we can check the meta table and the HDFS directories; the result is similar to the hash case, the regions are split into well-defined buckets, so I won't show it here.

Partitioning likewise gives us load-balanced writes; of course, rowkey generation has to take the id modulo the current partition count. Again, you can run some experiments and look at how the data is distributed after inserting; a read-side sketch follows below.
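On the read side the rowkey can be rebuilt the same way. A minimal sketch (it assumes the partition_split_table created above and that you know the partition count used at write time; the id value is illustrative):

// Sketch: reconstruct the rowkey for a known id and fetch the row.
long id = 12345L;                 // illustrative id
int partition = 10;               // must match the value used when writing
long partitionId = id % partition;
byte[] rowkey = Bytes.add(Bytes.toBytes(partitionId), Bytes.toBytes(id));

HTable table = new HTable(HBaseConfiguration.create(), "partition_split_table");
Result result = table.get(new Get(rowkey));
table.close();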
One more note: if the original id grows sequentially, you can keep the current id in a database, either a traditional one or Redis, and fetch a block of about 1000 ids at a time; the id then grows in memory, and once more than 1000 have been handed out you go load the next block, somewhat like a sequence in Oracle.
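A minimal sketch of that idea, with the external store hidden behind a hypothetical fetchNextBlockStart() hook (which would be a Redis INCRBY, a database update, or similar):

// Sketch of an Oracle-sequence-like allocator: reserve a block of ids from an
// external store, hand them out from memory, and reserve the next block once
// the current one is used up. fetchNextBlockStart() is a hypothetical hook.
public class BlockIdAllocator {
    private static final long BLOCK_SIZE = 1000;
    private long current;    // next id to hand out
    private long blockEnd;   // exclusive end of the currently reserved block

    public synchronized long nextId() {
        if (current >= blockEnd) {
            current = fetchNextBlockStart(BLOCK_SIZE);
            blockEnd = current + BLOCK_SIZE;
        }
        return current++;
    }

    // Hypothetical: atomically reserves 'size' ids in the external store
    // and returns the first id of the reserved range.
    private long fetchNextBlockStart(long size) {
        throw new UnsupportedOperationException("wire this to your database or Redis");
    }
}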

Random distribution plus pre-partitioning is not a once-and-for-all solution. Because the data keeps growing, the regions we split up front may eventually be unable to hold any more; they can of course split further, but that again costs performance. So we still need to plan for the rate of data growth, watch the data and do regular maintenance, analyze on demand whether to split further by hand, or, in more serious cases, create a new table with a larger set of pre-split regions and migrate the data. Xiao Wu is just a rookie; on the operations side this is only my own thinking, offered as a simple reference. If the data cannot be bulk-loaded, the partition-style pre-split fares a bit worse when regions are left to split naturally: rows in the same bucket share the same partition number, so after computing the partitionId the writes within each bucket are effectively sequential again, and some write hotspots will appear. If you use the partition approach to generate primary keys, you will have to keep adjusting the partitions as the data grows, for example by adding pre-split regions or introducing sub-partition numbers (our partition number is a long, so it can serve as a multi-level partition).

OK, that about covers using pre-partitioning to prevent write hotspots and frequent splits. But rowkey design goes far beyond this. Take rowkey length: HBase does cap it (at Short.MAX_VALUE bytes), but from the KeyValue analysis I wrote earlier we know that data is stored as KeyValues in the MemStore and in HFiles, and every KeyValue carries the full rowkey. If the rowkey is too large, say 128 bytes, then a table with 10 columns per row and 1,000,000 rows spends 128 bytes x 10 KeyValues x 1,000,000 rows, roughly 1.2 GB+, on rowkeys alone. So the length should not be too long either; as with the rest of the design, it depends on your requirements.

One final digression: I'd like to share a project I created on GitHub, where I want to build some HBase tools: https://github.com/bdifn/hbase-tools. If you have Git installed locally, you can run: git clone https://github.com/bdifn/hbase-tools.git. It currently contains a region-helper sub-project (the only one so far), managed with Maven. Its main purpose is to serve as a reference when designing rowkeys, for example for testing our randomized-write and pre-partition designs: it provides a sampling function and a randomized-write check that, for the current rowkey design, writes N records, counts the records landing in each region, and shows the proportions.
The test simulation module, which I call simulator, mainly simulates HBase region behavior. The implementation is simple and only covers what was mentioned above: predicting how the written data will be distributed across the pre-split regions for a given rowkey design. A more realistic simulation would keep statistics while writing the data and, following the hbase-site.xml settings, simulate MemStore and HFile behavior, finally producing a report showing things like per-partition data size and whether splits occurred, as a reference when designing an HBase table. Unfortunately, due to time constraints I have only sketched the frame in my spare time and have not taken it further; I will improve it when I have time, and of course everyone is welcome to join and learn together.

The project is managed with Maven. To make testing easier, some components are instantiated via Java SPI. Download the source, and if you want to test your own RowKeyGenerator, open the com.bdifn.hbasetools.regionhelper.rowkey.RowKeyGenerator provider file and replace the entry with your own id generator. The hashing, sampling, and testing code can then be reused as-is.
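For reference, a custom generator is just another implementation of the interface; a minimal sketch (the service-file path below follows the package name mentioned above, so treat the exact wiring as an assumption about that repo):

// Sketch: a custom RowKeyGenerator that the region-helper project could load via SPI.
public class MyRowKeyGenerator implements RowKeyGenerator {
    private long currentId = 1;

    public byte[] nextId() {
        // your own rowkey scheme goes here; this sketch just uses the raw id
        return Bytes.toBytes(currentId++);
    }
}

To register it, put the fully qualified class name into the provider file META-INF/services/com.bdifn.hbasetools.regionhelper.rowkey.RowKeyGenerator.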

For example, the test code:

public class HbaseSimulatorTest {

    // Obtain the HbaseSimulator instance via SPI; the bundled implementation is simple
    private HbaseSimulator hbase = BeanFactory.getInstance().getBeanInstance(HbaseSimulator.class);

    // Obtain the RowKeyGenerator instance via SPI; here it is HashRowKeyGenerator
    private RowKeyGenerator rkGen = BeanFactory.getInstance().getBeanInstance(RowKeyGenerator.class);

    // First sample 1,000,000 rowkeys and compute a set of splitKeys
    HashChoreWoker worker = new HashChoreWoker(1000000, 10);

    @Test
    public void testHash() {
        byte[][] splitKeys = worker.calcSplitKeys();
        hbase.createTable("user", splitKeys);

        // Insert 100 million records and look at the data distribution
        TableName tableName = TableName.valueOf("user");
        for (int i = 0; i < 100000000; i++) {
            Put put = new Put(rkGen.nextId());
            hbase.put(tableName, put);
        }
        hbase.report(tableName);
    }

    @Test
    public void testPartition() {
        // Default number of partitions
        PartitionRowKeyManager rkManager = new PartitionRowKeyManager();
        byte[][] splitKeys = rkManager.calcSplitKeys();

        hbase.createTable("person", splitKeys);
        TableName tableName = TableName.valueOf("person");

        // Insert 100 million records and look at the data distribution
        for (int i = 0; i < 100000000; i++) {
            Put put = new Put(rkManager.nextId());
            hbase.put(tableName, put);
        }
        hbase.report(tableName);
    }
}

Execution Result:

Execution report: [startRowKey : put requests : (put ratio)]
 : 9973569 : (1.0015434)
1986344a\x00\x00\x00\x00\x00\x01\x0e\xae : 9999295 : (1.0041268)
331ee65f\x00\x00\x00\x00\x00\x0f)g : 10012532 : (1.005456)
4cbfd4f6\x00\x00\x00\x00\x00\x00o0 : 9975842 : (1.0017716)
664c6388\x00\x00\x00\x00\x00\x02\x1du : 10053337 : (1.0095537)
800945e0\x00\x00\x00\x00\x00\x01\xadv : 9998719 : (1.0040689)
99a158d9\x00\x00\x00\x00\x00\x0bz\xf3 : 10000563 : (1.0042541)
b33a2223\x00\x00\x00\x00\x00\x07\xc6\xe6 : 9964921 : (1.000675)
ccbcf370\x00\x00\x00\x00\x00\x00*\xe2 : 9958200 : (1.0)
e63b8334\x00\x00\x00\x00\x00\x03g\xc1 : 10063022 : (1.0105262)
Total requests: 100000000

Execution report: [startRowKey : put requests : (put ratio)]
 : 5000000 : (1.0)
\x00\x00\x00\x00\x00\x00\x00\x01 : 5000000 : (1.0)
\x00\x00\x00\x00\x00\x00\x00\x02 : 5000000 : (1.0)
\x00\x00\x00\x00\x00\x00\x00\x03 : 5000000 : (1.0)
\x00\x00\x00\x00\x00\x00\x00\x04 : 5000000 : (1.0)
\x00\x00\x00\x00\x00\x00\x00\x05 : 5000000 : (1.0)
\x00\x00\x00\x00\x00\x00\x00\x06 : 5000000 : (1.0)
\x00\x00\x00\x00\x00\x00\x00\x07 : 5000000 : (1.0)
\x00\x00\x00\x00\x00\x00\x00\x08 : 5000000 : (1.0)
\x00\x00\x00\x00\x00\x00\x00\x09 : 5000000 : (1.0)
\x00\x00\x00\x00\x00\x00\x00\x0a : 5000000 : (1.0)
\x00\x00\x00\x00\x00\x00\x00\x0b : 5000000 : (1.0)
\x00\x00\x00\x00\x00\x00\x00\x0c : 5000000 : (1.0)
\x00\x00\x00\x00\x00\x00\x00\x0d : 5000000 : (1.0)
\x00\x00\x00\x00\x00\x00\x00\x0e : 5000000 : (1.0)
\x00\x00\x00\x00\x00\x00\x00\x0f : 5000000 : (1.0)
\x00\x00\x00\x00\x00\x00\x00\x10 : 5000000 : (1.0)
\x00\x00\x00\x00\x00\x00\x00\x11 : 5000000 : (1.0)
\x00\x00\x00\x00\x00\x00\x00\x12 : 5000000 : (1.0)
\x00\x00\x00\x00\x00\x00\x00\x13 : 5000000 : (1.0)
Total requests: 100000000

Original post address: http://www.cnblogs.com/bdifn/p/3801737.html. Please credit the source when reprinting.
