HBase Development Practice

This manual was compiled in September 2014 for my company, mainly as a summary of practice based on HBase: The Definitive Guide; it is for reference only.

Please credit the source when redistributing. Baidu Library link: http://wenku.baidu.com/view/c39c1456482fb4daa48d4b3e

Preface

HBase is a distributed, column-oriented open-source database. Unlike a typical relational database, HBase is suited to storing unstructured data, and it is organized by columns rather than by rows. HBase, the Hadoop database, is a highly reliable, high-performance, column-oriented, scalable distributed storage system that can be used to build large structured-storage clusters on inexpensive commodity servers.

In the regulation layer, a regulation file can contain multiple sub-rules, so the system splits each file into several regulation sub-rules, which are used for tariff-matching queries. Because of an imperfect design, both the regulation files and the sub-rules are saved in HBase, which drags in HBase paging and sorting problems. Those problems were solved, but the regulation files themselves did not need to be in HBase at all, because they do not participate in tariff-matching queries.

Hint: Try not to use HBase for relational data operations.

Part 1: Data Structures

Table definition

Table name: currently defined as "T_" plus the entity name in uppercase.

Column family: define no more than three column families per table. Currently only one column family, "baseinfo", is defined.

Data structure definition

Attribute type: strings are recommended. In a traditional database, values of different sizes are stored as int, short, long, String, or double, and each column definition must declare a size. In HBase, no size needs to be declared for the stored data.
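One consequence of string storage is that HBase compares values byte by byte, so unpadded numbers sort in an unexpected order. The small stand-alone sketch below (hypothetical class name, plain Java, no HBase dependency) shows why fixed-width, zero-padded fields are used.

```java
public class StringOrderDemo {
    // Byte-wise (lexicographic) comparison, the same ordering HBase
    // applies to rowkeys and string-encoded values
    public static boolean sortsBefore(String a, String b) {
        return a.compareTo(b) < 0;
    }

    public static void main(String[] args) {
        // Numerically 9 < 10, but as strings "10" sorts before "9"
        System.out.println(sortsBefore("10", "9"));  // prints true
        // Zero-padding to a fixed width restores numeric order
        System.out.println(sortsBefore("09", "10")); // prints true
    }
}
```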

Property name: uppercase.

Primary key

Primary key: table-name prefix + yyyyMMdd + a 4-digit serial number.
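A minimal sketch of this rowkey rule; the class and method names are hypothetical, not the project's actual HbaseRowKeyUtil:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class RowKeyDemo {
    // Rowkey = table-name prefix + yyyyMMdd + 4-digit zero-padded serial number
    public static String rowKey(String prefix, LocalDate day, int serial) {
        return prefix
                + day.format(DateTimeFormatter.ofPattern("yyyyMMdd"))
                + String.format("%04d", serial);
    }

    public static void main(String[] args) {
        System.out.println(rowKey("X", LocalDate.of(2014, 1, 11), 1)); // prints X201401110001
    }
}
```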

Note: the serial number is fetched from an auto-increment table and is reset to 0 each day.

Initialization

The table name and the column family must be initialized before the table can be used. Example (HBase shell):

create 'T_xcontrol', 'baseinfo'

Part 2: Table Initialization

Code reference: InitHbaseTable.properties and the InitHbaseTable class.

Initialization policy

Keep only the latest version

Only the latest version of the data needs to be kept: HColumnDescriptor.setMaxVersions(1).

Compression strategy

Whether to enable a compression policy.

Automatic expiry

Data can expire automatically: HColumnDescriptor.setTimeToLive(int timeToLive) sets the storage lifetime of the data in the table.

Pre-partitioning

Background reading: HBase region split strategies.

Whether to use a pre-partitioning policy.

In-memory

Whether the data needs to be kept in memory.

Part 3: Code Development

CRUD operations

For simple CRUD operations, refer to HBase: The Definitive Guide (Chinese edition, PDF). Below are the basic HBase CRUD operations after object-oriented encapsulation. All DAO layers that use HBase as the storage database inherit from the HbaseDaoImpl class, as the following examples do.

Create

public String add(XControl control) throws Exception {
    String id = HbaseRowKeyUtil.getRowKey(controlTableName);
    control.setId(id);
    control.setStatus(Status.ADD.getValue());
    PutDelete pd = HbaseConvetorUtil.convetor(control, id);
    super.savePutDelete(controlTableName, pd);
    return id;
}

Update

public String update(XControl control) throws Exception {
    String id = control.getId();
    PutDelete pd = HbaseConvetorUtil.convetor(control, id);
    super.savePutDelete(controlTableName, pd);
    return id;
}

Query

public XControl getXControl(String id) throws Exception {
    return super.get(XControl.class, controlTableName, id);
}

Delete

public void delete(String id) throws IOException {
    delete(controlTableName, id);
}

Table instance pool

Creating an HTable instance is a time-consuming operation that can take several seconds. In a resource-intensive environment with thousands of requests per second, creating a separate HTable instance for each request is simply not feasible. Instances should be created up front and reused for the lifetime of the client.

However, reusing HTable instances in a multithreaded environment brings problems of its own.

Clients can solve this with the HTablePool class, which has a single purpose: to provide a pool of client connections to the HBase cluster.

For HTablePool usage, see the HtablePoolUtil table-instance-pool helper class.

Get an instance: HtablePoolUtil.getHtablePoolUtil().getHtable(tableName);

Close an instance: HtablePoolUtil.getHtablePoolUtil().putHtable(tableName);
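The idea behind such a pool can be sketched in plain Java (a hypothetical KeyedPool, not the real HTablePool or HtablePoolUtil): instances are created lazily per table name and handed back to the pool instead of being closed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class KeyedPool<T> {
    private final Map<String, Deque<T>> idle = new HashMap<>();
    private final Function<String, T> factory;

    public KeyedPool(Function<String, T> factory) {
        this.factory = factory;
    }

    // Borrow an instance for the given key, creating one only when none is idle
    public synchronized T borrow(String key) {
        Deque<T> q = idle.computeIfAbsent(key, k -> new ArrayDeque<>());
        return q.isEmpty() ? factory.apply(key) : q.pop();
    }

    // Return an instance to the pool instead of destroying it
    public synchronized void giveBack(String key, T instance) {
        idle.computeIfAbsent(key, k -> new ArrayDeque<>()).push(instance);
    }

    public synchronized int idleCount(String key) {
        Deque<T> q = idle.get(key);
        return q == null ? 0 : q.size();
    }
}
```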

Getting a row count

Getting totals uses the HBase coprocessor feature.

1. Configuration

Add a configuration entry to $HBASE_HOME/conf/hbase-site.xml. For the 0.94 release the implementation class is AggregateImplementation:

<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>

If this entry was not previously configured, HBase must be restarted for the change to take effect.

2. Client-side code example

Get the total number of matching results:

public long getTotal(String tableName, Filter valueFilter) {
    Scan scan = new Scan();
    if (null != valueFilter) {
        scan.setFilter(valueFilter);
    }
    AggregationClient aggregationClient = new AggregationClient(conf);
    long rowCount = 0;
    try {
        // This line (or an addFamily() call) is required; otherwise an
        // exception is thrown (its message contains CI****)
        scan.addColumn(Bytes.toBytes("baseinfo"), null);
        rowCount = aggregationClient.rowCount(Bytes.toBytes(tableName), null, scan);
    } catch (Throwable e) {
        e.printStackTrace();
    }
    return rowCount;
}

List paging

HBase's paging implementation is relatively involved. The core idea is to combine the paging filter PageFilter(pageSize) with setting the scan start row via Scan.setStartRow(lastRow), where lastRow is the rowkey of the last row returned by the previous query. Note that the rowkey is a byte array whose layout corresponds to several stored fields.

Different logged-in users produce different lastRow values, so lastRow is stored in the session; see PageLastRowCache.

To achieve decoupling, the lastRow handling is encapsulated in HbaseDaoImpl, so ordinary development code does not need to care about lastRow.
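One detail worth noting: if the next scan starts exactly at lastRow, that row is returned again. A common trick is to start at lastRow with a single zero byte appended, which is the smallest rowkey strictly greater than lastRow in lexicographic order. A stand-alone sketch (hypothetical helper, not the project's HbaseDaoImpl):

```java
import java.util.Arrays;

public class PagingKeys {
    // The smallest rowkey strictly greater than lastRow is lastRow followed
    // by a single 0x00 byte; starting the next scan there skips lastRow itself
    public static byte[] nextStartRow(byte[] lastRow) {
        return Arrays.copyOf(lastRow, lastRow.length + 1); // copyOf pads with 0x00
    }
}
```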

public PageInfo searchXControl(QueryControlRuleQO qo, Integer pageSize, Integer currteIndex) throws Exception {
    // Conditional filter
    FilterList filterList = new QueryControlRuleFilterList(qo).getFilterList();
    // Get the total number of matching results
    long total = getTotal(controlTableName, filterList);
    // Filter collection
    FilterList fl = new FilterList();
    // Paging filter
    Filter filter = new PageFilter(pageSize);
    fl.addFilter(filterList);
    fl.addFilter(filter);
    // Encapsulate the result set
    List<XControl> list = getList(XControl.class, controlTableName, fl, currteIndex);
    log.info("---------------------total:" + list.size());
    // Return the result set
    PageInfo page = new PageInfo(total, list);
    return page;
}

List sorting

HBase rows are sorted in lexicographic (dictionary) order, which is ascending, so the earliest data (the smallest rowkey) is displayed at the front of the page.

The current solution is to add a foreign-key association table for the primary key. The foreign-key generation rule is:

400000000000 minus the numeric part of the primary key. For example, if the primary key is X201401110001, the corresponding foreign key is X198598889999. To display the newest records first, the entity is saved with X198598889999 as its rowkey, and a paged query then looks up X201401110001 from the association table via X198598889999.
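The inversion is easy to check with a few lines of arithmetic (class and method names are illustrative only):

```java
public class InvertedKeyDemo {
    private static final long BASE = 400000000000L;

    // Foreign key = same one-letter prefix + (400000000000 - numeric part of the primary key)
    public static String foreignKey(String primaryKey) {
        long n = Long.parseLong(primaryKey.substring(1));
        return primaryKey.charAt(0) + String.valueOf(BASE - n);
    }

    public static void main(String[] args) {
        System.out.println(foreignKey("X201401110001")); // prints X198598889999
        // A newer primary key yields a smaller foreign key, so scanning the
        // association table in ascending rowkey order returns newest records first
    }
}
```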

Note: the association must be maintained in the create, delete, and query operations.

Example:

public String add(XControl control) throws Exception {
    // ...
    pkControlDao.addXControlFK(id);
    // ...
}

public void delete(String id) throws Exception {
    // ...
    pkControlDao.deleteXControlFK(id);
    // ...
}

public PageInfo searchXControl(QueryControlRuleQO qo, Integer pageSize, Integer currteIndex) throws Exception {
    // Match primary keys from the foreign-key lookup
    if (StringUtils.isNotBlank(qo.getId())) {
        qo.setPks(pkControlDao.getXControlPks(qo.getId()));
    }
    // ...
}

Code references: HbaseRowKeyUtil and PkXControlDaoHbaseImpl.

Part 4: HBase Queries

Comparison operators

Note: every comparison operator compares the database value against the set value, not the set value against the database value.

LESS

Matches values less than the set value.

LESS_OR_EQUAL

Matches values less than or equal to the set value.

EQUAL

Matches values equal to the set value.

NOT_EQUAL

Matches values not equal to the set value.

GREATER_OR_EQUAL

Matches values greater than or equal to the set value.

GREATER

Matches values greater than the set value.

NO_OP

Excludes all values.

Introduction to comparators

BinaryComparator

Uses Bytes.compareTo() to compare the current value with the threshold.

BinaryPrefixComparator

Similar to the above, also using Bytes.compareTo(), but matches only a prefix starting from the left end.

NullComparator

Does not compare against a value; it only checks whether the current value is null.

BitComparator

Performs bitwise comparisons using the AND, OR, and XOR operations provided by the BitwiseOp class.

RegexStringComparator

Matches the data in the table against a regular expression supplied when the comparator is instantiated.

SubstringComparator

Treats both the threshold and the table data as strings and matches them with a contains() check.
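The behavior of the last two comparators can be imitated with plain Java string operations (java.util.regex standing in for RegexStringComparator, contains() for SubstringComparator); this is only an illustration of the matching semantics, not HBase code:

```java
import java.util.regex.Pattern;

public class ComparatorDemo {
    // RegexStringComparator-style: does the pattern match anywhere in the value?
    public static boolean regexMatch(String pattern, String value) {
        return Pattern.compile(pattern).matcher(value).find();
    }

    // SubstringComparator-style: a simple contains() check
    public static boolean substringMatch(String needle, String value) {
        return value.contains(needle);
    }

    public static void main(String[] args) {
        System.out.println(regexMatch("^X2014\\d{8}$", "X201401110001")); // prints true
        System.out.println(substringMatch("20140111", "X201401110001"));  // prints true
    }
}
```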

Of the CompareFilter-based filters in HBase, the control project uses BinaryComparator, NullComparator, and RegexStringComparator. Their usage is described in detail below.

BinaryComparator

Can be used with all comparison operators, so use it for equality, inequality, or range matching.

NullComparator

Use this comparator to test whether a value is null or not null.

When using NullComparator, pay attention to how HBase defines null. An example:

In row1, the endArea column exists but holds no value; HBase treats this as empty.

In row2, the endArea column does not exist at all; HBase treats this as null.

RegexStringComparator

Similar to the SubstringComparator, it is often used for string matching together with the EQUAL and NOT_EQUAL comparison operators; it cannot be combined with the range operators (LESS, GREATER, and so on).

Common filters

HBase provides many filters; for details, refer to HBase: The Definitive Guide (Chinese edition, PDF). In this project we mainly use SingleColumnValueFilter, plus PageFilter for paging.

Filter application examples

Note: the filter examples given here filter on a single column value.

Range filters: less than, less than or equal to, greater than, greater than or equal to, within a range (greater than or equal to and less than), outside a range (less than or greater than)

Value filters: equal to, not equal to

String filters: matches, does not match

Null filters: null, not null

Code reference: the FilterHelper filter helper class.

Using filter sets

Traditional database queries often use where a=? and b=?, or where a like ? or b=?.

To implement this in HBase, use FilterList. Examples:

where a like ? and b=? can be written as:

FilterList andList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
andList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
andList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));

where a like ? or b=? can be written as:

FilterList orList = new FilterList(FilterList.Operator.MUST_PASS_ONE);
orList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
orList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));

Complex query filtering

The queries above are simple, but real business scenarios often need more complex ones, for example where (a like ? and b=?) or (a like ? or b=?), which has one more level of nesting than the examples above.

In HBase, FilterList instances can also be nested to implement such complex queries:

FilterList andList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
andList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
andList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));

FilterList orList = new FilterList(FilterList.Operator.MUST_PASS_ONE);
orList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
orList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));

FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ONE);
list.addFilter(andList);
list.addFilter(orList);

The regulation project uses this kind of FilterList nesting extensively; the nesting depth varies with the business logic.

Part 5: HBase Performance Optimization

Query caching

The default value of the scan caching property is 1, meaning the scanner fetches one record per round trip to the region server. Caching can be set much higher than 1; for example, a value of 500 fetches 500 rows per round trip. Be aware that the larger the value, the higher the memory overhead on the server.
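The effect of the caching value can be quantified: a scan over N rows needs roughly ceil(N / caching) round trips. A quick stand-alone calculation (illustrative helper, not an HBase API):

```java
public class ScanCachingMath {
    // Approximate number of scanner round trips needed to pull totalRows rows
    public static long roundTrips(long totalRows, int caching) {
        return (totalRows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(10000, 1));   // prints 10000 (the default caching)
        System.out.println(roundTrips(10000, 500)); // prints 20
    }
}
```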

HTableInterface htable = getHtable(tableName);
Scan scan = new Scan();
// Set the scan cache
scan.setCaching(StaticConfig.getIcontrolHbaseCache());
ResultScanner scanner = htable.getScanner(scan);

Multithreading configuration

hbase.regionserver.handler.count

The number of RPC listener instances on a region server. For the master, this property sets the number of handler threads the master uses. The default value is 10.

In the regulation layer's business scenario, one tariff-matching query produces 4 concurrent HBase queries, so 20 simultaneous tariffs may produce 80 concurrent queries, which is a considerable load. Besides tuning this parameter appropriately, concurrent processing capacity also depends directly on the number of nodes in the cluster and the server configuration: the more nodes, and the more CPU cores per server, the more concurrency can be handled.

Pre-partitioning

The HRegion is the smallest unit of distributed storage and load balancing in HBase. "Smallest unit" means that different HRegions can be distributed across different region servers, but a single HRegion is never split across multiple servers.

hbase.hregion.max.filesize

The maximum size of an HStoreFile. If any store file of a region exceeds this limit, the region is split in two. Default: 268435456 (256 * 1024 * 1024), i.e. 256 MB.

Our regulation files are relatively small; it would take many more of them to reach the 256 MB split threshold. To increase concurrency before that threshold is reached, multiple HRegions must be created up front to store and process the data, which is where HBase's pre-splitting feature is used.

Example:

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes(tableName));
HColumnDescriptor coldef = new HColumnDescriptor(Bytes.toBytes(colFamily));
desc.addFamily(coldef);
admin.createTable(desc, Bytes.toBytes(1L), Bytes.toBytes(10L), 10);

// Split regions by the first character of the rowkey
desc.setValue(HTableDescriptor.SPLIT_POLICY, KeyPrefixRegionSplitPolicy.class.getName());
desc.setValue("prefix_split_key_policy.prefix_length", "1");

Appendix: for the code, see Baidu Library: http://wenku.baidu.com/view/c39c1456482fb4daa48d4b3e
