This manual was compiled in September 2014 for internal use at our company. It summarizes our hands-on practice with HBase, with HBase: The Definitive Guide as the main reference, and is provided for reference only.
Please credit the source when redistributing. Baidu Library link: http://wenku.baidu.com/view/c39c1456482fb4daa48d4b3e

Preface
HBase is a distributed, column-oriented, open-source database. It differs from typical relational databases in that it is suited to storing unstructured data, and in that it follows a column-based rather than row-based model. HBase, the Hadoop database, is a highly reliable, high-performance, column-oriented, scalable distributed storage system; with it, large-scale structured storage clusters can be built on inexpensive PC servers.
In the development of the regulatory layer, a control file can be broken into several regulatory sub-rules, so the system splits it into sub-rules that are used for freight-rate matching queries. Because of imperfect up-front design, both the control file and the regulatory sub-rules were stored in HBase, which raised HBase paging and HBase sorting problems. Although those problems were solved, in hindsight the control file did not need to be in HBase at all, because it does not participate in the rate-matching query.
Tip: for relational data operations, try not to use HBase as the database.

Part I: Data Structure

Table definition
Table name: currently defined as "T_" + entity name in uppercase.
Column families: define no more than three column families. Currently only one column family, "baseinfo", is defined.

Data structure definition
Property type: String is recommended. Traditional databases use int, short, long, String, double, and so on depending on data size, and the data format must be given a size when defined; in HBase there is no need to define the size of the storage space.
Property name: uppercase.
Primary key: table name prefix + yyyyMMdd + 4-digit sequence number.
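This convention can be sketched in plain Java. The class and method names below are illustrative only; the real project obtains the sequence number from HbaseRowKeyUtil and a self-increment table:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Illustrative sketch of the rowkey convention: prefix + yyyyMMdd + 4-digit sequence.
class RowKeySketch {
    static String rowKey(String prefix, LocalDate date, int seq) {
        String day = date.format(DateTimeFormatter.BASIC_ISO_DATE); // e.g. "20140111"
        return prefix + day + String.format("%04d", seq);           // zero-pad to 4 digits
    }

    public static void main(String[] args) {
        System.out.println(rowKey("X", LocalDate.of(2014, 1, 11), 1)); // X201401110001
    }
}
```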
Note: the sequence number is obtained from a self-increment table and resets to 0 each day.

Initialization
The table name and column family must be initialized before the table can be used. Example: create 'T_xcontrol', 'baseinfo'

Part II: Table Initialization
Code reference: InitHbaseTable.properties, InitHbaseTable

Initialization policies:
- Keep only the latest version: only the most recent version of the data needs to be saved: HColumnDescriptor.setMaxVersions(1)
- Compression policy: whether to use compression
- Automatic expiry: data is expired automatically by setting a storage time-to-live with HColumnDescriptor.setTimeToLive(int timeToLive)
- Pre-partitioning: whether to pre-partition the data in the table (see the HBase region split strategy)
- In-memory: whether to keep the column family in memory
Part III: Code Development

CRUD operations

For simple CRUD operations, refer to HBase: The Definitive Guide (Chinese PDF edition). Below are object-oriented CRUD wrappers around the basic HBase operations. Every DAO layer that uses HBase as its storage database inherits from the HbaseDaoImpl class; usage examples follow.

Add operation
public String add(XControl control) throws Exception {
    String id = HbaseRowKeyUtil.getRowKey(controlTableName);
    control.setId(id);
    control.setStatus(Status.ADD.getValue());
    PutDelete pd = HbaseConvetorUtil.convetor(control, id);
    super.savePutDelete(controlTableName, pd);
    return id;
}

Update operation
public String update(XControl control) throws Exception {
    String id = control.getId();
    PutDelete pd = HbaseConvetorUtil.convetor(control, id);
    super.savePutDelete(controlTableName, pd);
    return id;
}

Query operation
public XControl getXControl(String id) throws Exception {
    return super.get(XControl.class, controlTableName, id);
}

Delete operation
public void delete(String id) throws IOException {
    delete(controlTableName, id);
}

Table instance pool
Creating an HTable instance is a time-consuming operation that can take several seconds. In a resource-constrained environment with thousands of requests per second, creating an HTable instance for each request is simply not viable. Users should create instances once at startup and reuse them throughout the client's life cycle. However, reusing HTable instances in a multithreaded environment raises other issues. The client can solve this with the HTablePool class, whose sole purpose is to provide a pool of client connections to the HBase cluster.
For usage of the HTablePool class, refer to HTablePoolUtil.
Get an instance: HTablePoolUtil.getHTablePoolUtil().getHTable(tableName);
Close an instance: HTablePoolUtil.getHTablePoolUtil().putHTable(tableName);
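The idea behind such a pool can be sketched with a minimal generic class (an illustration only, not the real HBase or project code): borrow a cached handle if one exists, otherwise create one, and return handles to the pool instead of closing them.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal sketch of a table-handle pool in the spirit of HTablePool.
class SimpleTablePool<T> {
    private final Map<String, Deque<T>> pool = new HashMap<>();
    private final Function<String, T> factory; // creates a handle for a table name

    SimpleTablePool(Function<String, T> factory) { this.factory = factory; }

    /** Borrow a handle: reuse a pooled one if available, otherwise create it. */
    synchronized T getTable(String tableName) {
        Deque<T> q = pool.computeIfAbsent(tableName, k -> new ArrayDeque<>());
        return q.isEmpty() ? factory.apply(tableName) : q.pop();
    }

    /** Return a handle to the pool instead of closing it. */
    synchronized void putTable(String tableName, T table) {
        pool.computeIfAbsent(tableName, k -> new ArrayDeque<>()).push(table);
    }
}
```

HTablePoolUtil's getHTable/putHTable pair follows the same borrow/return pattern around the real HTablePool.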
Get totals

Getting totals takes advantage of HBase's coprocessor capability.
1. Configuration
Add a configuration item to $HBASE_HOME/conf/hbase-site.xml. In the 0.94 release we use AggregateImplementation as the implementation, as follows:
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>
If this was not configured previously, HBase must be restarted after the change for the configuration to take effect.
2. Client code example

Get the total number of results matching a filter:
public long getTotal(String tableName, Filter valueFilter) {
    Scan scan = new Scan();
    if (null != valueFilter) {
        scan.setFilter(valueFilter);
    }
    AggregationClient aggregationClient = new AggregationClient(conf);
    long rowCount = 0;
    try {
        // This line (or addFamily()) is required; otherwise an exception is thrown.
        scan.addColumn(Bytes.toBytes("baseinfo"), null);
        rowCount = aggregationClient.rowCount(Bytes.toBytes(tableName), null, scan);
    } catch (Throwable e) {
        e.printStackTrace();
    }
    return rowCount;
}
List pagination
HBase paging is relatively complex to implement. The core idea is to combine the paging filter PageFilter(pageSize) with setting the scan's start row, Scan.setStartRow(lastRow), where lastRow is the last rowkey returned by the previous query; note that the rowkey is a byte array corresponding to the storage locations of multiple fields. Different logged-in users produce different lastRow values, so we store lastRow in the session; refer to PageLastRowCache. To keep things decoupled, the lastRow handling is also encapsulated in HbaseDaoImpl, so calling code does not need to care about lastRow at all.
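The startRow + PageFilter idea can be sketched in memory with a TreeMap standing in for HBase's sorted rows (names illustrative; note that the real Scan.setStartRow is inclusive, so implementations typically skip the first returned row, while this sketch simply excludes lastRow):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// In-memory sketch of HBase-style paging: scan from a start row, take pageSize rows.
class PagingSketch {
    /** Returns up to pageSize rowkeys strictly after lastRow (null = from the beginning). */
    static List<String> page(TreeMap<String, String> rows, String lastRow, int pageSize) {
        List<String> result = new ArrayList<>();
        // tailMap(lastRow, false): keys after the last row of the previous page.
        Iterable<String> keys = (lastRow == null)
                ? rows.keySet()
                : rows.tailMap(lastRow, false).keySet();
        for (String k : keys) {
            if (result.size() == pageSize) break;
            result.add(k);
        }
        return result;
    }
}
```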
public PageInfo searchXControl(QueryControlRuleQO qo, Integer pageSize, Integer currteIndex) throws Exception {
    // Conditional filters
    FilterList filterList = new QueryControlRuleFilterList(qo).getFilterList();
    // Get the total number of matching results
    long total = getTotal(controlTableName, filterList);
    // Filter set
    FilterList fl = new FilterList();
    // Paging filter
    Filter filter = new PageFilter(pageSize);
    fl.addFilter(filterList);
    fl.addFilter(filter);
    // Encapsulate the result set
    List<XControl> list = getList(XControl.class, controlTableName, fl, currteIndex);
    log.info("---------------------total:" + list.size());
    // Return the result set
    PageInfo page = new PageInfo(total, list);
    return page;
}

List sorting
HBase stores rows sorted lexicographically by rowkey, i.e. in ascending order; on a page this puts the earliest data (the smallest rowkey) first.
The current solution is to add a foreign-key association table for the primary key. The foreign key is generated as 400000000000 minus the numeric part of the primary key: for primary key X201401110001, the corresponding foreign key is X198598889999. To reverse the natural sort order, the entity is saved with X198598889999 as its rowkey, and page queries then retrieve X201401110001 from the association table by X198598889999.
Note: the corresponding add, delete, and query operations must be updated accordingly.
Example:
public String add(XControl control) throws Exception {
    pkControlDao.addXControlFK(id);
}
public void delete(String id) throws Exception {
    pkControlDao.deleteXControlFK(id);
}
public PageInfo searchXControl(QueryControlRuleQO qo, Integer pageSize, Integer currteIndex) throws Exception {
    // Query the matching primary keys by foreign key
    if (StringUtils.isNotBlank(qo.getId())) {
        qo.setPks(pkControlDao.getXControlPks(qo.getId()));
    }
    // ...
}
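The foreign-key derivation described above (400000000000 minus the numeric part of the primary key) can be sketched as follows. The class name is illustrative; note that applying the subtraction twice returns the original key, which is what makes the round trip through the association table possible:

```java
// Sketch of the reverse-key trick: subtracting the 12-digit key number from
// 400000000000 inverts the lexicographic ordering of the rowkeys.
class ReverseKeySketch {
    static String foreignKey(String primaryKey) {
        String prefix = primaryKey.substring(0, 1);      // e.g. "X"
        long n = Long.parseLong(primaryKey.substring(1)); // e.g. 201401110001
        long complement = 400000000000L - n;
        return prefix + String.format("%012d", complement);
    }
}
```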
Code reference: HbaseRowKeyUtil, PkXControlDaoHbaseImpl

Part IV: HBase Queries

Comparison operators
Note: for all comparison operators, the match compares the stored database value against the supplied value (database value OP supplied value), not the other way around.
Operator | Description
LESS | Matches values less than the given value
LESS_OR_EQUAL | Matches values less than or equal to the given value
EQUAL | Matches values equal to the given value
NOT_EQUAL | Matches values not equal to the given value
GREATER_OR_EQUAL | Matches values greater than or equal to the given value
GREATER | Matches values greater than the given value
NO_OP | Excludes all values
Comparator introduction

Comparator | Description
BinaryComparator | Uses Bytes.compareTo() to compare the current value with the threshold
BinaryPrefixComparator | Similar to the above, using Bytes.compareTo(), but matches only the prefix, from the left
NullComparator | Does not compare; only checks whether the current value is null
BitComparator | Performs bitwise comparison using the AND, OR, and XOR operations provided by the BitwiseOp class
RegexStringComparator | Matches table data against a regular expression supplied when the comparator is instantiated
SubstringComparator | Treats both the threshold and the table data as String instances and matches via contains()
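The matching semantics of the last two comparators in the table can be illustrated with plain Java string operations (illustrative only; the real comparators operate on cell bytes):

```java
import java.util.regex.Pattern;

// Plain-Java illustration of RegexStringComparator- and SubstringComparator-style matching.
class ComparatorSemantics {
    // RegexStringComparator-style: the cell value matches a regular expression.
    static boolean regexMatch(String pattern, String cellValue) {
        return Pattern.compile(pattern).matcher(cellValue).find();
    }

    // SubstringComparator-style: the threshold is contained in the cell value.
    static boolean substringMatch(String threshold, String cellValue) {
        return cellValue.contains(threshold);
    }
}
```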
These comparators are used by HBase's CompareFilter-based filters. In the control project we use BinaryComparator, NullComparator, and RegexStringComparator; the use of BinaryComparator and NullComparator is detailed below.

BinaryComparator

It can be used with all comparison operators, so it works for equality, inequality, and range matches.

NullComparator

This comparator is used to test whether a value is null or not null.
When using NullComparator, it is important to note how HBase defines null. To illustrate:
- Row1's endArea column contains no value; in HBase this counts as null.
- Row2 does not have the endArea column at all; in HBase this also counts as null.
RegexStringComparator

Similar to SubstringComparator, it is often used for string matching with the EQUAL and NOT_EQUAL comparison operators; it cannot be used with the range operators (LESS, GREATER, ...).

Common filters
HBase offers many filters; for details refer to HBase: The Definitive Guide (Chinese PDF edition). In this project we use SingleColumnValueFilter, plus PageFilter for paging.

Filter application examples
Note: the filter examples given here filter on single-column values.
- Range filtering: less than, less than or equal to, greater than, greater than or equal to
- Value filtering: equals, not equals
- String filtering: matches, does not match
- Null filtering: null, not null
Code reference: the FilterHelper filtering helper class.

Combining filters
Traditional database queries often use where A like ... and B = ..., or where A like ... or B = ...
HBase implements this with FilterList. Example:
where A like ... and B = ... can be written as:
FilterList andList = new FilterList(Operator.MUST_PASS_ALL);
andList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
andList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));
where A like ... or B = ... can be written as:
FilterList orList = new FilterList(Operator.MUST_PASS_ONE);
orList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
orList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));
Complex query filtering

The queries above are relatively simple, but real business logic often requires more complex ones, for example: where (A like ... and B = ...) or (A like ... or B = ...). Compared with the examples above, this just adds one more level of nesting.
In HBase we can nest FilterLists to implement such complex queries:
FilterList andList = new FilterList(Operator.MUST_PASS_ALL);
andList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
andList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));
FilterList orList = new FilterList(Operator.MUST_PASS_ONE);
orList.addFilter(FilterHelper.getRegexStringFilter(FIELD_A, FIELD_A_VALUE));
orList.addFilter(FilterHelper.getEqualFilter(FIELD_B, FIELD_B_VALUE));
FilterList list = new FilterList(Operator.MUST_PASS_ONE);
list.addFilter(andList);
list.addFilter(orList);
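MUST_PASS_ALL and MUST_PASS_ONE behave like boolean AND and OR, so nested FilterLists mirror nested boolean expressions. A plain-predicate illustration (the field checks here are made up):

```java
import java.util.function.Predicate;

// Nested FilterList(MUST_PASS_ALL / MUST_PASS_ONE) mirrors nested boolean AND/OR.
class FilterNestingSketch {
    static boolean matches(String rowKey) {
        Predicate<String> aLike = s -> s.startsWith("X2014"); // stands in for "A like ..."
        Predicate<String> bEq   = s -> s.endsWith("0001");    // stands in for "B = ..."
        // (A and B) or (A or B): outer MUST_PASS_ONE over an AND list and an OR list.
        return aLike.and(bEq).or(aLike.or(bEq)).test(rowKey);
    }
}
```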
In the regulatory project we use many such nested FilterLists; the nesting depth depends on the business logic.

Part V: HBase Performance Optimization

Query caching
The caching property of Scan defaults to 1, meaning the scanner fetches 1 record per round trip to the region server. We can set caching much higher than 1, for example 500 to fetch 500 rows per round trip; be aware that the larger the value, the higher the memory overhead on the server.
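The effect of caching is essentially on the number of RPC round trips: fetching N rows with caching c costs roughly ceil(N / c) calls. A quick sketch:

```java
// Rough sketch of how scanner caching affects RPC round trips.
class ScanCachingMath {
    static long roundTrips(long totalRows, int caching) {
        return (totalRows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(10000, 1));   // 10000 round trips with the default
        System.out.println(roundTrips(10000, 500)); // 20 round trips with caching = 500
    }
}
```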
HTableInterface htable = getHTable(tableName);
Scan scan = new Scan();
/* Set the cache size */
scan.setCaching(StaticConfig.getIcontrol_hbase_cache());
ResultScanner scanner = htable.getScanner(scan);

Multithreaded configuration
hbase.regionserver.handler.count
The number of RPC listener instances on the RegionServer. For the master, this property sets the number of handler threads the master accepts. Default: 10.
In the regulatory layer's business scenario, one freight-rate matching query produces 4 concurrent HBase queries; with 20 rate queries there may be 80 concurrent HBase queries, which is considerable. Besides tuning this parameter to increase concurrent processing capacity, concurrency is also directly tied to cluster size and server configuration: the more nodes and the more CPU cores per server, the more concurrency the cluster can handle.

Pre-partitioning
HRegion is the smallest unit of distributed storage and load balancing in HBase. "Smallest unit" means that different HRegions can be distributed on different HRegion servers, but one HRegion is never split across multiple servers.
hbase.hregion.max.filesize
The maximum HStoreFile size. If the storage files of any column family in a region exceed this limit, the region is split in two. Default: 268435456 (256 x 1024 x 1024), i.e. 256 MB.
Our control files are relatively small, so reaching the 256 MB split threshold would require many of them. To increase concurrency, we want multiple HRegions to store and process the data even before the split threshold is reached; this is where HBase's pre-partitioning (pre-splitting) feature comes in.
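Choosing split keys for createTable(desc, startKey, endKey, numRegions) amounts to picking numRegions - 1 evenly spaced boundaries between the start and end keys. A sketch over plain longs (illustrative; real split keys are byte arrays):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of evenly spaced split points: numRegions regions need numRegions - 1 boundaries.
class PreSplitSketch {
    static List<Long> splitPoints(long start, long end, int numRegions) {
        List<Long> splits = new ArrayList<>();
        long range = end - start;
        for (int i = 1; i < numRegions; i++) {
            splits.add(start + range * i / numRegions);
        }
        return splits;
    }
}
```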
Example:
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes(tablename));
HColumnDescriptor coldef = new HColumnDescriptor(Bytes.toBytes(colfamily));
desc.addFamily(coldef);
// Split regions by the first character of the rowkey
desc.setValue(HTableDescriptor.SPLIT_POLICY, KeyPrefixRegionSplitPolicy.class.getName());
desc.setValue("prefix_split_key_policy.prefix_length", "1");
admin.createTable(desc, Bytes.toBytes(1L), Bytes.toBytes(10L), 10);

Appendix: for the code, see Baidu Library: http://wenku.baidu.com/view/c39c1456482fb4daa48d4b3e