HBase Development Practices

This manual was compiled in September 2014, while I was at my company, and summarizes our practical experience; the main reference is HBase: The Definitive Guide. For reference only!

Please credit the source when reposting. Baidu Library link: http://wenku.baidu.com/view/c39c1456482fb4daa48d4b3e

Preface

HBase is a distributed, column-oriented, open-source database. Unlike a typical relational database, it is suited to storing unstructured data, and it organizes data by column families rather than by rows. HBase, the Hadoop database, is a highly reliable, high-performance, column-oriented, scalable distributed storage system that can be used to build large structured-storage clusters on inexpensive commodity servers.

In the regulatory-layer development, a control file can be split into several regulatory sub-rules, which are used for freight-rate matching queries. Because the early design was imperfect, both the control files and the regulatory sub-rules were stored in HBase, which raised HBase paging and sorting problems. Although those problems were solved, storing the control files in HBase was probably unnecessary, since they do not participate in the rate-matching query.

Tip: for relational-style data operations, try not to use HBase as the database.

Part I: Data structure

Table definition

Table name: currently defined as "T_" + the entity name in uppercase.

Column families: define no more than three column families per table. Currently only one column family, "baseinfo", is defined.

Data structure definition

Property type: String is recommended. In a traditional database, int, short, long, String, double, and so on are chosen according to the size of the data, and each column definition needs a size; in HBase there is no need to define the size of the storage space.

Property name: uppercase.

Primary key

Primary key: table-name prefix + yyyyMMdd + a 4-digit sequence number.
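As a sketch, the rowkey rule above can be expressed in plain Java. This standalone version is only illustrative (the project's actual helper is HbaseRowKeyUtil), and the sequence argument stands in for the daily-reset counter described in the note below:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class RowKeyBuilder {

    // Rowkey rule: table-name prefix + yyyyMMdd + zero-padded 4-digit sequence.
    // The sequence value is assumed to come from a self-incrementing counter
    // table that resets to 0 each day.
    public static String build(String tablePrefix, LocalDate date, int sequence) {
        String day = date.format(DateTimeFormatter.ofPattern("yyyyMMdd"));
        return tablePrefix + day + String.format("%04d", sequence);
    }
}
```

For example, build("X", LocalDate.of(2014, 1, 11), 1) produces the rowkey X201401110001.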

Note: the sequence number is obtained from a self-incrementing table and is reset to 0 each day.

Initialization

The table name and column family must be initialized before the table can be used. Example: create 'T_XCONTROL', 'baseinfo'

Part II: Table initialization

Code reference: InitHbaseTable.properties, InitHbaseTable

Initialization policies

Keep only the latest version

If only the latest version of the data needs to be saved: HColumnDescriptor.setMaxVersions(1)

Compression policy

Whether to use a compression policy.

Automatic expiration

Whether data should expire automatically: HColumnDescriptor.setTimeToLive(int timeToLive) sets the storage time-to-live.

Pre-partitioning of data in the table

Description: HBase region split strategy. Whether to use a pre-partitioning policy.

Write to memory

Whether to write to memory.

Part III: Code development

CRUD operations

For simple CRUD operations, refer to HBase: The Definitive Guide (Chinese edition, PDF). Below are object-oriented CRUD wrappers for the basic HBase operations. Every DAO layer that uses HBase as its storage database inherits from the HbaseDaoImpl class; usage examples follow.

Add operation

    public String add(XControl control) throws Exception {
        String id = HbaseRowKeyUtil.getRowKey(controlTableName);
        control.setId(id);
        control.setStatus(Status.ADD.getValue());
        PutDelete pd = HbaseConvetorUtil.convetor(control, id);
        super.savePutDelete(controlTableName, pd);
        return id;
    }

Update operation

    public String update(XControl control) throws Exception {
        String id = control.getId();
        PutDelete pd = HbaseConvetorUtil.convetor(control, id);
        super.savePutDelete(controlTableName, pd);
        return id;
    }

Query operation

    public XControl getXControl(String id) throws Exception {
        return super.get(XControl.class, controlTableName, id);
    }

Delete operation

    public void delete(String id) throws IOException {
        delete(controlTableName, id);
    }

Table instance pool

Creating an HTable instance is a time-consuming operation that can take several seconds. In a resource-intensive environment with thousands of requests per second, creating an HTable instance for each request is simply not viable. Users should create instances once at startup and then reuse them throughout the client's life cycle.

However, reusing HTable instances in a multithreaded environment raises other issues. The client can solve this with the HTablePool class, whose sole purpose is to provide a pool of client connections to the HBase cluster. For use of the HTablePool class, refer to the HtablePoolUtil helper.

Table instance pool usage

Get an instance: HtablePoolUtil.getHtablePoolUtil().getHtable(tableName);

Close an instance: HtablePoolUtil.getHtablePoolUtil().putHtable(tableName);
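The borrow-and-return pattern that HtablePoolUtil wraps can be sketched with a small generic pool in plain Java. This is only an illustration of the idea, not the real HBase HTablePool API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

// Minimal sketch of the pooling idea: expensive handles (like HTable) are
// created once, borrowed per request, and returned for reuse. Illustrative
// only; the real HTablePool API differs.
public class SimpleTablePool<T> {
    private final Map<String, ConcurrentLinkedQueue<T>> pool = new ConcurrentHashMap<>();
    private final Function<String, T> factory;

    public SimpleTablePool(Function<String, T> factory) {
        this.factory = factory;
    }

    // Borrow a pooled handle for the named table, creating one if none is free.
    public T getTable(String name) {
        T t = queue(name).poll();
        return (t != null) ? t : factory.apply(name);
    }

    // Return a handle so later callers reuse it instead of re-creating it.
    public void putTable(String name, T table) {
        queue(name).offer(table);
    }

    private ConcurrentLinkedQueue<T> queue(String name) {
        return pool.computeIfAbsent(name, k -> new ConcurrentLinkedQueue<>());
    }
}
```

After a put, the next get for the same table name returns the pooled handle rather than invoking the factory again, which is exactly why per-request HTable creation can be avoided.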

Get row count

Counting rows uses HBase's coprocessor capability.

1. Configuration

Add a configuration item to $HBASE_HOME/conf/hbase-site.xml. With version 0.94 the implementation class is AggregateImplementation, as follows:

<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>

If this was not configured before, HBase must be restarted after the configuration for it to take effect.

2. Client code example

Get the total number of matching results:

    public Long getTotal(String tableName, Filter valueFilter) {
        Scan scan = new Scan();
        if (null != valueFilter) {
            scan.setFilter(valueFilter);
        }
        AggregationClient aggregationClient = new AggregationClient(conf);
        Long rowCount = 0L;
        try {
            // This call (or addFamily()) is required; otherwise an exception is thrown
            scan.addColumn(Bytes.toBytes("baseinfo"), null);
            rowCount = aggregationClient.rowCount(Bytes.toBytes(tableName), null, scan);
        } catch (Throwable e) {
            e.printStackTrace();
        }
        return rowCount;
    }

List pagination

The paging implementation in HBase is relatively complex. The core idea is to combine the paging filter PageFilter(pageSize) with setting the scan's start row via Scan.setStartRow(lastRow), where lastRow is the rowkey of the last row returned by the previous query; note that the rowkey is a byte array.

Different logged-in users produce different lastRow values, so we store lastRow in the session; refer to PageLastRowCache.

To decouple the code, the lastRow handling is also encapsulated in HbaseDaoImpl, so application code does not need to manage lastRow itself.
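The lastRow mechanism can be simulated locally with a sorted map standing in for an HBase table (whose rows are sorted by rowkey). This is a hypothetical sketch, not the real scan API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simulates HBase paging: PageFilter(pageSize) caps the page, and the next
// page's scan starts just after lastRow, the final rowkey of the previous page.
public class PagingSketch {
    public static List<String> page(TreeMap<String, String> table, String lastRow, int pageSize) {
        // tailMap(lastRow, false) plays the role of Scan.setStartRow()
        // positioned just past lastRow
        NavigableMap<String, String> view = (lastRow == null) ? table : table.tailMap(lastRow, false);
        List<String> keys = new ArrayList<>();
        for (String key : view.keySet()) {
            keys.add(key);
            if (keys.size() == pageSize) {
                break; // like PageFilter(pageSize)
            }
        }
        return keys;
    }
}
```

Calling page(table, null, 2) returns the first two rowkeys, and passing the last key of that page as lastRow returns the next two, mirroring how successive scans page through a table.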

    public PageInfo searchXControl(QueryControlRuleQO qo, Integer pageSize, Integer currteIndex) throws Exception {
        // Conditional filters
        FilterList filterList = new QueryControlRuleFilterList(qo).getFilterList();
        // Get the total number of matching results
        Long total = getTotal(controlTableName, filterList);
        // Filter set
        FilterList fl = new FilterList();
        // Paging filter
        Filter filter = new PageFilter(pageSize);
        fl.addFilter(filterList);
        fl.addFilter(filter);
        // Build the result set
        List<XControl> list = getList(XControl.class, controlTableName, fl, currteIndex);
        log.info("---------------------total:" + list.size());
        // Return the result set
        PageInfo page = new PageInfo(total, list);
        return page;
    }

List sorting

HBase sorts rows by rowkey in dictionary (lexicographic) order, that is, ascending, so on a page the earliest data (the smallest rowkey) appears first.

The current solution is to add a foreign-key association table for the primary key. The foreign-key generation rule is 400000000000 minus the numeric part of the primary key. For example, for the primary key X201401110001 the corresponding foreign key is X198598889999. To sort with the newest data first, the entity is saved with X198598889999 as its rowkey, and page queries then look up X201401110001 from the association table via X198598889999.
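The complement-key arithmetic above can be sketched and checked in plain Java (a hypothetical helper, not the project's actual DAO code):

```java
public class ReverseKey {
    // The base the numeric part of the primary key is subtracted from,
    // per the rule above.
    private static final long BASE = 400_000_000_000L;

    // X201401110001 -> X198598889999. Subtracting from a fixed base inverts
    // the ordering, so keys for newer dates sort first in HBase's ascending scan.
    public static String toForeignKey(String primaryKey) {
        String prefix = primaryKey.substring(0, 1);
        long numericPart = Long.parseLong(primaryKey.substring(1));
        return prefix + (BASE - numericPart);
    }
}
```

Because a larger date component yields a smaller complement, a rowkey generated from a later day sorts lexicographically before one from an earlier day, which is the newest-first behaviour the association table relies on.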

Note: the add, delete, and query operations must maintain this association table as well.

Example:

    public String add(XControl control) throws Exception {
        pkControlDao.addXControlFK(id);
    }

    public void delete(String id) throws Exception {
        pkControlDao.deleteXControlFK(id);
    }

    public PageInfo searchXControl(QueryControlRuleQO qo, Integer pageSize, Integer currteIndex) throws Exception {
        // Query the matching primary keys from the foreign key
        if (StringUtils.isNotBlank(qo.getId())) {
            qo.setPks(pkControlDao.getXControlPks(qo.getId()));
        }
    }

Code reference: HbaseRowKeyUtil, PkXControlDaoHbaseImpl

Part IV: HBase queries

Comparison operators

Note: for all comparison operators, the match compares the database value against the set value, not the set value against the database value.

LESS: matches values less than the set value

LESS_OR_EQUAL: matches values less than or equal to the set value

EQUAL: matches values equal to the set value

NOT_EQUAL: matches values not equal to the set value

GREATER_OR_EQUAL: matches values greater than or equal to the set value

GREATER: matches values greater than the set value

NO_OP: excludes all values

Comparator introduction

BinaryComparator: uses Bytes.compareTo() to compare the current value with the threshold.

BinaryPrefixComparator: similar to the above, uses Bytes.compareTo(), but matches the prefix from the left.

NullComparator: does not compare values; only checks whether the current value is null.

BitComparator: bitwise comparison using the AND, OR, and XOR operations provided by the BitwiseOp class.

RegexStringComparator: matches the data in the table against a regular expression given when the comparator is instantiated.

SubstringComparator: treats both the threshold and the table data as strings and matches via the contains() operation.

These comparators are used with HBase's CompareFilter-based filters. In the control project we use BinaryComparator, NullComparator, and RegexStringComparator; the use of BinaryComparator and NullComparator is detailed below.

BinaryComparator

It can be used with all comparison operators, so it works for equality, inequality, and range matches.

NullComparator

This comparator is used to test whether a value is null or not null. When using NullComparator it is important to understand how HBase defines null. To illustrate:

Row1's Endarea column has no value; in HBase this counts as null.

Row2 does not have the Endarea column at all; in HBase this also counts as null.

RegexStringComparator

Similar to the SubstringComparator, it is often used for string matching with the EQUAL and NOT_EQUAL comparison operators; it cannot be used with the range operators (LESS, GREATER, ...).

Common filters
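As a plain-Java analogy (not the HBase API), the matching behaviour of these comparators corresponds to familiar string operations, which makes the EQUAL-only restriction easier to see: substring and regex matches answer only "does it match", so range operators have no meaning for them.

```java
import java.util.regex.Pattern;

// Plain-Java analogies for the comparators above; illustrative only.
public class ComparatorDemo {
    // BinaryComparator with EQUAL: exact comparison of the stored value
    public static boolean binaryEqual(String cell, String threshold) {
        return cell.compareTo(threshold) == 0;
    }

    // BinaryPrefixComparator: match the prefix from the left
    public static boolean prefixMatch(String cell, String threshold) {
        return cell.startsWith(threshold);
    }

    // SubstringComparator: contains() on the string form of the value
    public static boolean substringMatch(String cell, String threshold) {
        return cell.contains(threshold);
    }

    // RegexStringComparator: regular-expression match
    public static boolean regexMatch(String cell, String regex) {
        return Pattern.compile(regex).matcher(cell).find();
    }
}
```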

HBase offers many filters; see HBase: The Definitive Guide (Chinese edition, PDF) for details. In this project we use SingleColumnValueFilter, plus PageFilter for paging.

Filter application examples

Note: the filter examples provided here filter on single-column values.

Range filtering: less than, less than or equal to, greater than, greater than or equal to

Value filtering: equals, not equals

String filtering: matching, not matching

Null filtering: null, not null

Code reference: the FilterHelper filtering helper class.

FilterList usage

Traditional database queries often use where A like ... and B = ..., or where A like ... or B = .... To implement this in HBase, a FilterList is needed. Example:

where A like ... and B = ... can be written as:

    FilterList andList = new FilterList(Operator.MUST_PASS_ALL);
    andList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
    andList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));

where A like ... or B = ... can be written as:

    FilterList orList = new FilterList(Operator.MUST_PASS_ONE);
    orList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
    orList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));

Complex query filtering

The queries above are relatively simple, but real business logic often requires more complex ones, for example: where (A like ... and B = ...) or (A like ... or B = ...). Compared with the examples above, this just adds one more level of nesting.

In HBase we can likewise nest FilterLists to implement such a complex query:

    FilterList andList = new FilterList(Operator.MUST_PASS_ALL);
    andList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
    andList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));

    FilterList orList = new FilterList(Operator.MUST_PASS_ONE);
    orList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
    orList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));

    FilterList list = new FilterList(Operator.MUST_PASS_ONE);
    list.addFilter(andList);
    list.addFilter(orList);

In the regulatory project we use many such nested FilterLists; the nesting depth varies with the business logic.

Part V: HBase performance optimization

Query caching

The default value of a scan's caching property is 1, meaning the scanner fetches 1 record at a time from the region server. We can set caching to a value much larger than 1; for example, setting it to 500 fetches 500 rows per round trip. Be aware that the larger the value, the higher the memory overhead.

    HTableInterface htable = getHtable(tableName);
    Scan scan = new Scan();
    /* Set the cache */
    scan.setCaching(StaticConfig.getIcontrol_hbase_cache());
    ResultScanner scanner = htable.getScanner(scan);

Multithreading configuration

hbase.regionserver.handler.count

The number of RPC listener instances in the RegionServer. The same property sets the number of handler threads for the master. The default value is 10.

In the regulatory layer's business scenario, one freight-rate matching query produces 4 concurrent HBase queries, so 20 simultaneous queries may produce 80 concurrent requests, which is considerable. Besides raising this parameter appropriately to increase concurrent processing capacity, concurrency also depends directly on the cluster size and server configuration: the more nodes in the cluster and the more CPU cores per server, the greater the concurrent processing capacity.

Pre-partitioning

HRegion is the smallest unit of distributed storage and load balancing in HBase: different HRegions can be distributed across different region servers, but a single HRegion is never split across multiple servers.

hbase.hregion.max.filesize

The maximum size of an HStoreFile. If the store file of any column family in a region exceeds this limit, the region is split in two. Default: 268435456 (256 * 1024 * 1024), i.e. 256 MB.

Our control files are relatively small, so it would take many of them to reach the 256 MB split threshold. To increase concurrency, we need multiple HRegions to store and process the data before that threshold is reached, which is where HBase's pre-partitioning (pre-splitting) feature comes in.

Example:

    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes(tablename));
    HColumnDescriptor coldef = new HColumnDescriptor(Bytes.toBytes(colfamily));
    desc.addFamily(coldef);
    admin.createTable(desc, Bytes.toBytes(1L), Bytes.toBytes(10L), 10);

    // Split regions by the first character of the rowkey
    desc.setValue(HTableDescriptor.SPLIT_POLICY,
        KeyPrefixRegionSplitPolicy.class.getName());
    desc.setValue("prefix_split_key_policy.prefix_length", "1");

Appendix: for the code, see Baidu Library: http://wenku.baidu.com/view/c39c1456482fb4daa48d4b3e
