This manual was compiled in September 2014 for my company. The main reference is HBase: The Definitive Guide, and the content summarizes our own practice; it is for reference only.
Please indicate the source when reposting. Baidu Library link: http://wenku.baidu.com/view/c39c1456482fb4daa48d4b3e

Preface
HBase is a distributed, column-oriented open source database. Unlike a typical relational database, HBase is suited to storing unstructured data, and it is organized by columns rather than by rows. HBase (the Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed storage system; with it, large structured storage clusters can be built on inexpensive PC servers.
In the development of the regulation layer, a regulation file can contain multiple sub-rules, so the system splits each file into several sub-rules, which are used for tariff-matching queries. Due to an imperfect design, both the regulation files and the sub-rules were saved in HBase, which introduced HBase paging and HBase sorting problems. Although those problems were solved, the regulation files themselves did not need to be stored in HBase, because they do not participate in tariff-matching queries.
Hint: For relational data operations, try not to use HBase as the database.

Part I: Data structures

Table definition
Table name: the current table-name convention is "T_" + entity name in uppercase.
Column family: define no more than three column families. Currently only one column family, "baseinfo", is defined.

Data structure definition
Attribute type: string is recommended. In a traditional database, columns are declared as int, short, long, String, or double depending on the size of the data, and the data format definition must specify a size. In HBase, there is no need to define the size of the storage space.
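Since HBase stores every cell value as an uninterpreted byte array, a plain-Java sketch (no HBase required; the values are illustrative) shows how values of different types and sizes all end up as byte arrays whose length the client controls, with no declared column size:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BytesSketch {
    public static void main(String[] args) {
        // a string value: its length is simply however many bytes the text needs
        byte[] s = "T_XCONTROL".getBytes(StandardCharsets.UTF_8);
        // numeric values: the client picks the encoding and width
        byte[] i = ByteBuffer.allocate(4).putInt(42).array();
        byte[] l = ByteBuffer.allocate(8).putLong(42L).array();
        // HBase itself only ever sees byte[]; no column size is declared anywhere
        System.out.println(s.length + " " + i.length + " " + l.length); // 10 4 8
    }
}
```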
Property name: uppercase.

Primary key
Primary key: table-name prefix + yyyyMMdd + 4-digit serial number.
Note: the serial number is taken from an auto-increment table and is reset to 0 each day.
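The primary-key rule above (table-name prefix + yyyyMMdd + 4-digit serial number) can be sketched in plain Java. The AtomicInteger here is only a stand-in for the daily-reset auto-increment table; the class and method names are hypothetical:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.atomic.AtomicInteger;

public class RowKeySketch {
    // stand-in for the auto-increment table described above (reset to 0 daily)
    private static final AtomicInteger serial = new AtomicInteger(0);

    static String getRowKey(String tablePrefix, Date day) {
        String date = new SimpleDateFormat("yyyyMMdd").format(day);
        // 4-digit, zero-padded serial number
        return tablePrefix + date + String.format("%04d", serial.incrementAndGet());
    }

    public static void main(String[] args) {
        System.out.println(getRowKey("X", new Date())); // e.g. X201409150001
    }
}
```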
Initialization

The table and its column family must be initialized before the table can be used. Example:

create 'T_xcontrol', 'baseinfo'

Part II: Table initialization
Code reference: InitHbaseTable.properties, InitHbaseTable

Initialization policy

Keep only the latest version

Only the latest version of the data needs to be saved: HColumnDescriptor.setMaxVersions(1)

Compression strategy

Whether to use a compression policy.

Automatic expiration

Data can expire automatically: HColumnDescriptor.setTimeToLive(int timeToLive) sets the storage lifetime of the data in the table.

Pre-partitioning

Introduction: HBase region split strategies. Whether to use a pre-partitioning policy.

Write to memory

Whether the data needs to be written to memory.
Part III: Code development

CRUD operations
For simple CRUD operations, refer to HBase: The Definitive Guide (Chinese edition, PDF). Below are the basic HBase CRUD operations after object-oriented encapsulation. All DAO layers that use HBase as the storage database inherit from the HbaseDaoImpl class; the following examples use it.

Add operation
public String add(Xcontrol control) throws Exception {
    String id = HbaseRowKeyUtil.getRowKey(controlTableName);
    control.setId(id);
    control.setStatus(Status.ADD.getValue());
    PutDelete pd = HbaseConvetorUtil.convetor(control, id);
    super.savePutDelete(controlTableName, pd);
    return id;
}

Update operation
public String update(Xcontrol control) throws Exception {
    String id = control.getId();
    PutDelete pd = HbaseConvetorUtil.convetor(control, id);
    super.savePutDelete(controlTableName, pd);
    return id;
}

Query operation
public Xcontrol getXcontrol(String id) throws Exception {
    return super.get(Xcontrol.class, controlTableName, id);
}

Delete operation
public void delete(String id) throws IOException {
    delete(controlTableName, id);
}

Table instance pool
Creating an HTable instance is a time-consuming operation that can take several seconds. In a resource-constrained environment handling thousands of requests per second, creating a separate HTable instance for each request is simply not feasible. Users should create instances once at the outset and reuse them for the lifetime of the client.
However, reusing HTable instances in a multithreaded environment brings other problems.
Clients can solve this with the HTablePool class. Its sole purpose is to provide a pool of client connections to the HBase cluster.
For usage of the HTablePool class, see the HTablePoolUtil table-instance-pool helper.
Get an instance: HTablePoolUtil.getHTablePoolUtil().getHTable(tablename);
Return an instance: HTablePoolUtil.getHTablePoolUtil().putHTable(tablename);
Get row counts

Getting the total row count leverages the HBase coprocessor feature.
1. Configuration

Add a configuration entry in $HBASE_HOME/conf/hbase-site.xml. I use the 0.94 version's implementation, AggregateImplementation, as follows:
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>
If this entry was not previously configured, you will need to restart HBase after completing the configuration for it to take effect.
2. Client code example

Get the total number of results matching the filter:
public long getTotal(String tablename, Filter valueFilter) {
    Scan scan = new Scan();
    if (null != valueFilter) {
        scan.setFilter(valueFilter);
    }
    AggregationClient aggregationClient = new AggregationClient(conf);
    long rowCount = 0;
    try {
        // this call (or addFamily()) is required; otherwise an exception containing "CI****" is thrown
        scan.addColumn(Bytes.toBytes("baseinfo"), null);
        rowCount = aggregationClient.rowCount(Bytes.toBytes(tablename), null, scan);
    } catch (Throwable e) {
        e.printStackTrace();
    }
    return rowCount;
}
List paging

HBase's paging implementation is relatively complex. The core idea is to combine the paging filter PageFilter(pageSize) with setting the query's start row, scan.setStartRow(lastRow), where lastRow is the rowkey of the last row of the previous query. Note that the rowkey is a byte array corresponding to the storage layout of multiple fields.
Different logged-in users produce different lastRow values, so we store lastRow in the session; see PageLastRowCache.
To keep things decoupled, we encapsulate the lastRow operations inside HbaseDaoImpl, so application code does not need to care about lastRow when it is written.
public PageInfo searchXcontrol(QueryControlRuleQO qo, Integer pageSize, Integer currteIndex) throws Exception {
    // conditional filter
    FilterList filterList = new QueryControlRuleFilterList(qo).getFilterList();
    // total number of matching results
    long total = getTotal(controlTableName, filterList);
    // filter collection
    FilterList fl = new FilterList();
    // paging filter
    Filter filter = new PageFilter(pageSize);
    fl.addFilter(filterList);
    fl.addFilter(filter);
    // build the result set
    List<Xcontrol> list = getList(Xcontrol.class, controlTableName, fl, currteIndex);
    log.info("---------------------total:" + list.size());
    // return the result set
    PageInfo page = new PageInfo(total, list);
    return page;
}
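The setStartRow(lastRow) + PageFilter(pageSize) idea can be illustrated without a cluster. The sketch below is plain Java: the TreeMap stands in for a table sorted by rowkey and all names are hypothetical. Each page takes at most pageSize rows starting strictly after lastRow, which mirrors how the next page is fetched:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class PagingSketch {
    // one page: rows strictly after lastRow (or from the start if lastRow is null)
    static List<String> page(NavigableMap<String, String> table, String lastRow, int pageSize) {
        List<String> out = new ArrayList<>();
        NavigableMap<String, String> view =
                (lastRow == null) ? table : table.tailMap(lastRow, false);
        for (String rowKey : view.keySet()) {
            out.add(rowKey);
            if (out.size() == pageSize) break;
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= 5; i++) table.put(String.format("X20140111%04d", i), "row" + i);
        List<String> page1 = page(table, null, 2);
        // the last rowkey of page 1 becomes lastRow for page 2
        List<String> page2 = page(table, page1.get(page1.size() - 1), 2);
        System.out.println(page1); // [X201401110001, X201401110002]
        System.out.println(page2); // [X201401110003, X201401110004]
    }
}
```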
List sorting

HBase rows are sorted by rowkey in lexicographic (dictionary) order, which is ascending: the earliest data (the smallest rowkey) is displayed at the front of the page.
The current solution is to add a foreign-key association table for the primary key. The foreign-key generation rule is:
400000000000 minus the numeric part of the primary key. For example, if the primary key is X201401110001, the corresponding foreign key is X198598889999. To display the latest data first, the entity is saved with X198598889999 as its key; a paged query then uses X198598889999 to look up X201401110001 in the association table.
Note: the association must be maintained in the add, delete, and query operations.
Example:

public String add(Xcontrol control) throws Exception {
    // ... normal add logic ...
    pkControlDao.addXControlFk(id);
    // ...
}

public void delete(String id) throws Exception {
    // ... normal delete logic ...
    pkControlDao.deleteXControlFk(id);
}

public PageInfo searchXcontrol(QueryControlRuleQO qo, Integer pageSize, Integer currteIndex) throws Exception {
    // match primary keys via the foreign-key association when querying by id
    if (StringUtils.isNotBlank(qo.getId())) {
        qo.setPks(pkControlDao.getXControlPks(qo.getId()));
    }
    // ...
}
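The foreign-key rule above (400000000000 minus the numeric part of the primary key) can be sketched in plain Java; the method name is hypothetical, and the numeric part is assumed to keep 12 digits:

```java
public class ReverseKeySketch {
    // e.g. X201401110001 -> X198598889999; later (larger) primary keys get
    // smaller foreign keys, so the newest rows sort to the front
    static String toForeignKey(String primaryKey) {
        String prefix = primaryKey.substring(0, 1);
        long n = Long.parseLong(primaryKey.substring(1));
        return prefix + (400000000000L - n);
    }

    public static void main(String[] args) {
        System.out.println(toForeignKey("X201401110001")); // X198598889999
    }
}
```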
Code reference: HbaseRowKeyUtil, PkXControlDaoHbaseImpl

Part IV: HBase queries

Comparison operators
Note: all comparison operators compare the database value against the set value, not the set value against the database value.
LESS | matches values less than the set value
LESS_OR_EQUAL | matches values less than or equal to the set value
EQUAL | matches values equal to the set value
NOT_EQUAL | matches values not equal to the set value
GREATER_OR_EQUAL | matches values greater than or equal to the set value
GREATER | matches values greater than the set value
NO_OP | excludes all values
Introduction to comparators
BinaryComparator | uses Bytes.compareTo() to compare the current value with the threshold
BinaryPrefixComparator | similar to the above, using Bytes.compareTo(), but matches on the prefix from the left end
NullComparator | does not compare; only checks whether the current value is null
BitComparator | bitwise comparison using the AND, OR, and XOR operations provided by the BitwiseOp class
RegexStringComparator | matches the data in the table against a regular expression given when the comparator is instantiated
SubstringComparator | treats both the threshold and the table data as strings and matches with a contains() check
Of HBase's CompareFilter-based filters, the regulation project uses BinaryComparator, NullComparator, and RegexStringComparator. Their usage is described in detail below.

BinaryComparator

Can be used with all comparison operators, so use it for equality, inequality, and range matching.

NullComparator

Use this comparator to check whether a value is null or not null.
When using NullComparator, pay attention to what HBase considers null. An example:
In row1, the endArea column holds no value; HBase treats it as null.
In row2, the endArea column does not exist at all; HBase also treats this as null.
RegexStringComparator

Similar to the SubstringComparator comparator, it is often used for string matching together with the EQUAL and NOT_EQUAL comparison operators; it cannot be used with the range operators (LESS, GREATER, ...).
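The difference between substring matching (SubstringComparator's contains() check) and regex matching (RegexStringComparator) can be illustrated with plain java.util.regex; this only demonstrates the matching semantics, not the HBase comparator API, and the pattern is an illustrative example:

```java
import java.util.regex.Pattern;

public class MatchSketch {
    public static void main(String[] args) {
        String value = "X201401110001";
        // substring semantics: the threshold appears anywhere in the value
        System.out.println(value.contains("140111")); // true
        // regex semantics: the whole value matches the pattern (any key from January 2014)
        System.out.println(Pattern.compile("X201401\\d{6}").matcher(value).matches()); // true
    }
}
```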
Common filters

HBase provides many filters; for details, refer to HBase: The Definitive Guide (Chinese edition, PDF). In this project we mainly use SingleColumnValueFilter, plus PageFilter for paging.

Filter application examples

Note: the filter examples provided here filter on a single column value.
Range filters: less than, less than or equal to, greater than, greater than or equal to, within a range, outside a range
Value filters: equal to, not equal to
String filters: match, no match
Null filters: null, not null

Code reference: the FilterHelper filter helper class

Using filter sets
In traditional database queries, clauses such as where a=.. and b=.., or where a like .. or b=.., are often used.
To implement this in HBase, you need a FilterList. Example:
where a like .. and b=.. can be written like this:
FilterList andList = new FilterList(Operator.MUST_PASS_ALL);
andList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
andList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));
where a like .. or b=.. can be written like this:
FilterList orList = new FilterList(Operator.MUST_PASS_ONE);
orList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
orList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));
Complex query filtering

The queries above are simple, but more complex queries often arise in real business, for example: where (a like .. and b=..) or (a like .. or b=..), which has one more level of nesting than the examples above.
In HBase we can nest FilterList objects to implement such complex queries:
FilterList andList = new FilterList(Operator.MUST_PASS_ALL);
andList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
andList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));
FilterList orList = new FilterList(Operator.MUST_PASS_ONE);
orList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
orList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));
FilterList list = new FilterList(Operator.MUST_PASS_ONE);
list.addFilter(andList);
list.addFilter(orList);
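The MUST_PASS_ALL / MUST_PASS_ONE nesting composes exactly like boolean AND/OR. A plain-Java sketch of the same (a like .. and b=..) or (a like .. or b=..) structure using Predicate, purely to illustrate the nesting logic (the row layout and field values are hypothetical):

```java
import java.util.function.Predicate;

public class FilterNestingSketch {
    public static void main(String[] args) {
        // row[0] = field A, row[1] = field B
        Predicate<String[]> aLike = row -> row[0].contains("foo"); // stands in for the regex filter on A
        Predicate<String[]> bEq   = row -> row[1].equals("bar");   // stands in for the equality filter on B

        Predicate<String[]> andList = aLike.and(bEq);     // MUST_PASS_ALL
        Predicate<String[]> orList  = aLike.or(bEq);      // MUST_PASS_ONE
        Predicate<String[]> nested  = andList.or(orList); // outer MUST_PASS_ONE

        System.out.println(nested.test(new String[]{"xfoox", "nope"})); // true: passes the inner OR
        System.out.println(nested.test(new String[]{"zzz", "nope"}));   // false: fails both branches
    }
}
```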
We use this kind of FilterList nesting extensively in the regulation project; the nesting depth varies with the business logic.

Part V: HBase performance optimization

Query caching
The default value of the scan caching property is 1, meaning the scanner fetches 1 record per round trip from the region server. We can set caching to a value much larger than 1; for example, setting it to 500 fetches 500 rows at a time. Note that the larger the value, the greater the memory overhead on the server.
HTableInterface htable = getHTable(tablename);
Scan scan = new Scan();
/* set the cache size */
scan.setCaching(StaticConfig.getIcontrolHbaseCache());
ResultScanner scanner = htable.getScanner(scan);

Multithreading configuration
hbase.regionserver.handler.count

The number of RPC listener instances on the region server. For the master, this property sets the number of handler threads the master uses. The default value is 10.
In the regulation layer's business scenario, one tariff-matching query produces 4 concurrent HBase queries, so 20 such requests may generate 80 concurrent queries, which is a considerable load. Tuning this parameter appropriately can increase concurrent processing capacity, but capacity is also directly related to cluster size and server configuration: the more nodes, and the more CPU cores per server, the more concurrency can be handled.

Pre-partitioning
An HRegion is the smallest unit of distributed storage and load balancing in HBase. "Smallest unit" means that different HRegions can be distributed across different region servers, but a single HRegion is never split across multiple servers.
hbase.hregion.max.filesize

The maximum size of an HStoreFile. If any store file of any column family in a region exceeds this limit, the region is split in two. Default: 268435456 (256 x 1024 x 1024), i.e. 256 MB.
Our regulation files are relatively small, so reaching the 256 MB split threshold would require a large number of them. To increase concurrency without reaching the split threshold, we need multiple HRegions to store and process the data; this is where HBase's pre-split feature is used.
Example:

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes(tablename));
HColumnDescriptor coldef = new HColumnDescriptor(Bytes.toBytes(colfamily));
desc.addFamily(coldef);
// pre-split into 10 regions over the key range 1L..10L
admin.createTable(desc, Bytes.toBytes(1L), Bytes.toBytes(10L), 10);
// alternatively, split regions by the first character of the rowkey
desc.setValue(HTableDescriptor.SPLIT_POLICY,
        KeyPrefixRegionSplitPolicy.class.getName());
desc.setValue("prefix_split_key_policy.prefix_length", "1");

Appendix: for the code, see the Baidu Library link: http://wenku.baidu.com/view/c39c1456482fb4daa48d4b3e