HBase Development Practices

This manual was compiled in September 2014, while I was at my company, and summarizes our practical experience; the main reference is HBase: The Definitive Guide. For reference only!

Please credit the source when reposting. Baidu Library link: http://wenku.baidu.com/view/c39c1456482fb4daa48d4b3e

Preface

HBase is a distributed, column-oriented, open-source database. Unlike a typical relational database, it is suited to storing unstructured data, and it organizes data by column families rather than by rows. HBase, the Hadoop database, is a highly reliable, high-performance, column-oriented, scalable distributed storage system that can be used to build large structured-storage clusters on inexpensive commodity servers.

In the regulatory-layer development, a control file can be split into several regulatory sub-rules, which are used for freight-rate matching queries. Because the early design was imperfect, both the control files and the regulatory sub-rules were stored in HBase, which raised HBase paging and sorting problems. Although those problems were solved, storing the control files in HBase was probably unnecessary, since they do not participate in the rate-matching query.

Tip: for relational-style data operations, try not to use HBase as the database.

Part I: Data structure

Table definition

Table name: currently defined as "T_" + the entity name in uppercase.

Column families: define no more than three column families per table. Currently only one column family, "baseinfo", is defined.

Data structure definition

Property type: String is recommended. In a traditional database, int, short, long, String, double, and so on are chosen according to the size of the data, and each column definition needs a size; in HBase there is no need to define the size of the storage space.

Property name: uppercase.

Primary key

Primary key: table-name prefix + yyyyMMdd + a 4-digit sequence number.
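As a sketch, the rowkey rule above can be expressed in plain Java. This standalone version is only illustrative (the project's actual helper is HbaseRowKeyUtil), and the sequence argument stands in for the daily-reset counter described in the note below:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class RowKeyBuilder {

    // Rowkey rule: table-name prefix + yyyyMMdd + zero-padded 4-digit sequence.
    // The sequence value is assumed to come from a self-incrementing counter
    // table that resets to 0 each day.
    public static String build(String tablePrefix, LocalDate date, int sequence) {
        String day = date.format(DateTimeFormatter.ofPattern("yyyyMMdd"));
        return tablePrefix + day + String.format("%04d", sequence);
    }
}
```

For example, build("X", LocalDate.of(2014, 1, 11), 1) produces the rowkey X201401110001.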

Note: the sequence number is obtained from a self-incrementing table and is reset to 0 each day.

Initialization

The table name and column family must be initialized before the table can be used. Example: create 'T_XCONTROL', 'baseinfo'

Part II: Table initialization

Code reference: InitHbaseTable.properties, InitHbaseTable

Initialization policies

Keep only the latest version

If only the latest version of the data needs to be saved: HColumnDescriptor.setMaxVersions(1)

Compression policy

Whether to use a compression policy.

Automatic expiration

Whether data should expire automatically: HColumnDescriptor.setTimeToLive(int timeToLive) sets the storage time-to-live.

Pre-partitioning of data in the table

Description: HBase region split strategy. Whether to use a pre-partitioning policy.

Write to memory

Whether to write to memory.

Part III: Code development

CRUD operations

For simple CRUD operations, refer to HBase: The Definitive Guide (Chinese edition, PDF). Below are object-oriented CRUD wrappers for the basic HBase operations. Every DAO layer that uses HBase as its storage database inherits from the HbaseDaoImpl class; usage examples follow.

Add operation

    public String add(XControl control) throws Exception {
        String id = HbaseRowKeyUtil.getRowKey(controlTableName);
        control.setId(id);
        control.setStatus(Status.ADD.getValue());
        PutDelete pd = HbaseConvetorUtil.convetor(control, id);
        super.savePutDelete(controlTableName, pd);
        return id;
    }

Update operation

    public String update(XControl control) throws Exception {
        String id = control.getId();
        PutDelete pd = HbaseConvetorUtil.convetor(control, id);
        super.savePutDelete(controlTableName, pd);
        return id;
    }

Query operation

    public XControl getXControl(String id) throws Exception {
        return super.get(XControl.class, controlTableName, id);
    }

Delete operation

    public void delete(String id) throws IOException {
        delete(controlTableName, id);
    }

Table instance pool

Creating an HTable instance is a time-consuming operation that can take several seconds. In a resource-intensive environment with thousands of requests per second, creating an HTable instance for each request is simply not viable. Users should create instances once at startup and then reuse them throughout the client's life cycle.

However, reusing HTable instances in a multithreaded environment raises other issues. The client can solve this with the HTablePool class, whose sole purpose is to provide a pool of client connections to the HBase cluster. For use of the HTablePool class, refer to the HtablePoolUtil helper.

Table instance pool usage

Get an instance: HtablePoolUtil.getHtablePoolUtil().getHtable(tableName);

Close an instance: HtablePoolUtil.getHtablePoolUtil().putHtable(tableName);
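The borrow-and-return pattern that HtablePoolUtil wraps can be sketched with a small generic pool in plain Java. This is only an illustration of the idea, not the real HBase HTablePool API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

// Minimal sketch of the pooling idea: expensive handles (like HTable) are
// created once, borrowed per request, and returned for reuse. Illustrative
// only; the real HTablePool API differs.
public class SimpleTablePool<T> {
    private final Map<String, ConcurrentLinkedQueue<T>> pool = new ConcurrentHashMap<>();
    private final Function<String, T> factory;

    public SimpleTablePool(Function<String, T> factory) {
        this.factory = factory;
    }

    // Borrow a pooled handle for the named table, creating one if none is free.
    public T getTable(String name) {
        T t = queue(name).poll();
        return (t != null) ? t : factory.apply(name);
    }

    // Return a handle so later callers reuse it instead of re-creating it.
    public void putTable(String name, T table) {
        queue(name).offer(table);
    }

    private ConcurrentLinkedQueue<T> queue(String name) {
        return pool.computeIfAbsent(name, k -> new ConcurrentLinkedQueue<>());
    }
}
```

After a put, the next get for the same table name returns the pooled handle rather than invoking the factory again, which is exactly why per-request HTable creation can be avoided.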

Get row count

Counting rows uses HBase's coprocessor capability.

1. Configuration

Add a configuration item to $HBASE_HOME/conf/hbase-site.xml. With version 0.94 the implementation class is AggregateImplementation, as follows:

<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>

If this was not configured before, HBase must be restarted after the configuration for it to take effect.

2. Client code example

Get the total number of matching results:

    public Long getTotal(String tableName, Filter valueFilter) {
        Scan scan = new Scan();
        if (null != valueFilter) {
            scan.setFilter(valueFilter);
        }
        AggregationClient aggregationClient = new AggregationClient(conf);
        Long rowCount = 0L;
        try {
            // This call (or addFamily()) is required; otherwise an exception is thrown
            scan.addColumn(Bytes.toBytes("baseinfo"), null);
            rowCount = aggregationClient.rowCount(Bytes.toBytes(tableName), null, scan);
        } catch (Throwable e) {
            e.printStackTrace();
        }
        return rowCount;
    }

List pagination

The paging implementation in HBase is relatively complex. The core idea is to combine the paging filter PageFilter(pageSize) with setting the scan's start row via Scan.setStartRow(lastRow), where lastRow is the rowkey of the last row returned by the previous query; note that the rowkey is a byte array.

Different logged-in users produce different lastRow values, so we store lastRow in the session; refer to PageLastRowCache.

To decouple the code, the lastRow handling is also encapsulated in HbaseDaoImpl, so application code does not need to manage lastRow itself.
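The lastRow mechanism can be simulated locally with a sorted map standing in for an HBase table (whose rows are sorted by rowkey). This is a hypothetical sketch, not the real scan API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simulates HBase paging: PageFilter(pageSize) caps the page, and the next
// page's scan starts just after lastRow, the final rowkey of the previous page.
public class PagingSketch {
    public static List<String> page(TreeMap<String, String> table, String lastRow, int pageSize) {
        // tailMap(lastRow, false) plays the role of Scan.setStartRow()
        // positioned just past lastRow
        NavigableMap<String, String> view = (lastRow == null) ? table : table.tailMap(lastRow, false);
        List<String> keys = new ArrayList<>();
        for (String key : view.keySet()) {
            keys.add(key);
            if (keys.size() == pageSize) {
                break; // like PageFilter(pageSize)
            }
        }
        return keys;
    }
}
```

Calling page(table, null, 2) returns the first two rowkeys, and passing the last key of that page as lastRow returns the next two, mirroring how successive scans page through a table.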

    public PageInfo searchXControl(QueryControlRuleQO qo, Integer pageSize, Integer currteIndex) throws Exception {
        // Conditional filters
        FilterList filterList = new QueryControlRuleFilterList(qo).getFilterList();
        // Get the total number of matching results
        Long total = getTotal(controlTableName, filterList);
        // Filter set
        FilterList fl = new FilterList();
        // Paging filter
        Filter filter = new PageFilter(pageSize);
        fl.addFilter(filterList);
        fl.addFilter(filter);
        // Build the result set
        List<XControl> list = getList(XControl.class, controlTableName, fl, currteIndex);
        log.info("---------------------total:" + list.size());
        // Return the result set
        PageInfo page = new PageInfo(total, list);
        return page;
    }

List sorting

HBase sorts rows by rowkey in dictionary (lexicographic) order, that is, ascending, so on a page the earliest data (the smallest rowkey) appears first.

The current solution is to add a foreign-key association table for the primary key. The foreign-key generation rule is 400000000000 minus the numeric part of the primary key. For example, for the primary key X201401110001 the corresponding foreign key is X198598889999. To sort with the newest data first, the entity is saved with X198598889999 as its rowkey, and page queries then look up X201401110001 from the association table via X198598889999.
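The complement-key arithmetic above can be sketched and checked in plain Java (a hypothetical helper, not the project's actual DAO code):

```java
public class ReverseKey {
    // The base the numeric part of the primary key is subtracted from,
    // per the rule above.
    private static final long BASE = 400_000_000_000L;

    // X201401110001 -> X198598889999. Subtracting from a fixed base inverts
    // the ordering, so keys for newer dates sort first in HBase's ascending scan.
    public static String toForeignKey(String primaryKey) {
        String prefix = primaryKey.substring(0, 1);
        long numericPart = Long.parseLong(primaryKey.substring(1));
        return prefix + (BASE - numericPart);
    }
}
```

Because a larger date component yields a smaller complement, a rowkey generated from a later day sorts lexicographically before one from an earlier day, which is the newest-first behaviour the association table relies on.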

Note: the add, delete, and query operations must maintain this association table as well.

Example:

    public String add(XControl control) throws Exception {
        pkControlDao.addXControlFK(id);
    }

    public void delete(String id) throws Exception {
        pkControlDao.deleteXControlFK(id);
    }

    public PageInfo searchXControl(QueryControlRuleQO qo, Integer pageSize, Integer currteIndex) throws Exception {
        // Query the matching primary keys from the foreign key
        if (StringUtils.isNotBlank(qo.getId())) {
            qo.setPks(pkControlDao.getXControlPks(qo.getId()));
        }
    }

Code reference: HbaseRowKeyUtil, PkXControlDaoHbaseImpl

Part IV: HBase queries

Comparison operators

Note: for all comparison operators, the match compares the database value against the set value, not the set value against the database value.

LESS: matches values less than the set value

LESS_OR_EQUAL: matches values less than or equal to the set value

EQUAL: matches values equal to the set value

NOT_EQUAL: matches values not equal to the set value

GREATER_OR_EQUAL: matches values greater than or equal to the set value

GREATER: matches values greater than the set value

NO_OP: excludes all values

Comparator introduction

BinaryComparator: uses Bytes.compareTo() to compare the current value with the threshold.

BinaryPrefixComparator: similar to the above, uses Bytes.compareTo(), but matches the prefix from the left.

NullComparator: does not compare values; only checks whether the current value is null.

BitComparator: bitwise comparison using the AND, OR, and XOR operations provided by the BitwiseOp class.

RegexStringComparator: matches the data in the table against a regular expression given when the comparator is instantiated.

SubstringComparator: treats both the threshold and the table data as strings and matches via the contains() operation.

These comparators are used with HBase's CompareFilter-based filters. In the control project we use BinaryComparator, NullComparator, and RegexStringComparator; the use of BinaryComparator and NullComparator is detailed below.

BinaryComparator

It can be used with all comparison operators, so it works for equality, inequality, and range matches.

NullComparator

This comparator is used to test whether a value is null or not null. When using NullComparator it is important to understand how HBase defines null. To illustrate:

Row1's Endarea column has no value; in HBase this counts as null.

Row2 does not have the Endarea column at all; in HBase this also counts as null.

RegexStringComparator

Similar to the SubstringComparator, it is often used for string matching with the EQUAL and NOT_EQUAL comparison operators; it cannot be used with the range operators (LESS, GREATER, ...).

Common filters
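As a plain-Java analogy (not the HBase API), the matching behaviour of these comparators corresponds to familiar string operations, which makes the EQUAL-only restriction easier to see: substring and regex matches answer only "does it match", so range operators have no meaning for them.

```java
import java.util.regex.Pattern;

// Plain-Java analogies for the comparators above; illustrative only.
public class ComparatorDemo {
    // BinaryComparator with EQUAL: exact comparison of the stored value
    public static boolean binaryEqual(String cell, String threshold) {
        return cell.compareTo(threshold) == 0;
    }

    // BinaryPrefixComparator: match the prefix from the left
    public static boolean prefixMatch(String cell, String threshold) {
        return cell.startsWith(threshold);
    }

    // SubstringComparator: contains() on the string form of the value
    public static boolean substringMatch(String cell, String threshold) {
        return cell.contains(threshold);
    }

    // RegexStringComparator: regular-expression match
    public static boolean regexMatch(String cell, String regex) {
        return Pattern.compile(regex).matcher(cell).find();
    }
}
```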

HBase offers many filters; see HBase: The Definitive Guide (Chinese edition, PDF) for details. In this project we use SingleColumnValueFilter, plus PageFilter for paging.

Filter application examples

Note: the filter examples provided here filter on single-column values.

Range filtering: less than, less than or equal to, greater than, greater than or equal to

Value filtering: equals, not equals

String filtering: matching, not matching

Null filtering: null, not null

Code reference: the FilterHelper filtering helper class.

FilterList usage

Traditional database queries often use where A like ... and B = ..., or where A like ... or B = .... To implement this in HBase, a FilterList is needed. Example:

where A like ... and B = ... can be written as:

    FilterList andList = new FilterList(Operator.MUST_PASS_ALL);
    andList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
    andList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));

where A like ... or B = ... can be written as:

    FilterList orList = new FilterList(Operator.MUST_PASS_ONE);
    orList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
    orList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));

Complex query filtering

The queries above are relatively simple, but real business logic often requires more complex ones, for example: where (A like ... and B = ...) or (A like ... or B = ...). Compared with the examples above, this just adds one more level of nesting.

In HBase we can likewise nest FilterLists to implement such a complex query:

    FilterList andList = new FilterList(Operator.MUST_PASS_ALL);
    andList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
    andList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));

    FilterList orList = new FilterList(Operator.MUST_PASS_ONE);
    orList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
    orList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));

    FilterList list = new FilterList(Operator.MUST_PASS_ONE);
    list.addFilter(andList);
    list.addFilter(orList);

In the regulatory project we use many such nested FilterLists; the nesting depth varies with the business logic.

Part V: HBase performance optimization

Query caching

The default value of a scan's caching property is 1, meaning the scanner fetches 1 record at a time from the region server. We can set caching to a value much larger than 1; for example, setting it to 500 fetches 500 rows per round trip. Be aware that the larger the value, the higher the memory overhead.

    HTableInterface htable = getHtable(tableName);
    Scan scan = new Scan();
    /* Set the cache */
    scan.setCaching(StaticConfig.getIcontrol_hbase_cache());
    ResultScanner scanner = htable.getScanner(scan);

Multithreading configuration

hbase.regionserver.handler.count

The number of RPC listener instances in the RegionServer. The same property sets the number of handler threads for the master. The default value is 10.

In the regulatory layer's business scenario, one freight-rate matching query produces 4 concurrent HBase queries, so 20 simultaneous queries may produce 80 concurrent requests, which is considerable. Besides raising this parameter appropriately to increase concurrent processing capacity, concurrency also depends directly on the cluster size and server configuration: the more nodes in the cluster and the more CPU cores per server, the greater the concurrent processing capacity.

Pre-partitioning

HRegion is the smallest unit of distributed storage and load balancing in HBase: different HRegions can be distributed across different region servers, but a single HRegion is never split across multiple servers.

hbase.hregion.max.filesize

The maximum size of an HStoreFile. If the store file of any column family in a region exceeds this limit, the region is split in two. Default: 268435456 (256 * 1024 * 1024), i.e. 256 MB.

Our control files are relatively small, so it would take many of them to reach the 256 MB split threshold. To increase concurrency, we need multiple HRegions to store and process the data before that threshold is reached, which is where HBase's pre-partitioning (pre-splitting) feature comes in.

Example:

    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes(tablename));
    HColumnDescriptor coldef = new HColumnDescriptor(Bytes.toBytes(colfamily));
    desc.addFamily(coldef);
    admin.createTable(desc, Bytes.toBytes(1L), Bytes.toBytes(10L), 10);

    // Split regions by the first character of the rowkey
    desc.setValue(HTableDescriptor.SPLIT_POLICY,
        KeyPrefixRegionSplitPolicy.class.getName());
    desc.setValue("prefix_split_key_policy.prefix_length", "1");

Appendix: for the code, see Baidu Library: http://wenku.baidu.com/view/c39c1456482fb4daa48d4b3e
