HBase table data Paging

Source: Internet
Author: User
Tags: columnar database

HBase is a key technology in the Hadoop big data ecosystem. It is a column-oriented database for distributed storage of big data. For more background on HBase, readers can search online; I also plan to write a dedicated series on HBase in the coming days, so stay tuned if you are interested. This article focuses on paging through HBase table data.

First, consider an indicator that paging can hardly avoid: the total number of records. In a relational database it is easy to count the rows, but in HBase this is a real problem. For now at least, do not expect to compute the row count of a table with something like "select count(*) from table". The row count function that HBase provides is a MapReduce job, which is extremely time-consuming. Therefore, when paging through HBase table data, we have to do without the total record count.

Because the total number of records is unknown, the total number of pages is unknown too, and so is whether a next page exists. These consequences need special attention when paging through HBase table data.
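One way to answer "is there a next page?" without a total count is to read one row more than the page size. The following is only a minimal sketch of that idea, not HBase code: a sorted in-memory `TreeMap` stands in for an HBase table, and the names `LookAheadPaging`, `Page`, and `fetchPage` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Assumption: a sorted in-memory map stands in for an HBase table (rows ordered by row key).
final class LookAheadPaging {

    /** One page of row keys plus a flag saying whether more rows follow. */
    static final class Page {
        final List<String> rowKeys;
        final boolean hasNextPage;

        Page(List<String> rowKeys, boolean hasNextPage) {
            this.rowKeys = rowKeys;
            this.hasNextPage = hasNextPage;
        }
    }

    /**
     * Reads up to pageSize rows starting at startKey (inclusive; null means the first row).
     * One extra row is looked at so that hasNextPage is known without a total row count.
     */
    static Page fetchPage(NavigableMap<String, String> table, String startKey, int pageSize) {
        NavigableMap<String, String> tail =
                (startKey == null) ? table : table.tailMap(startKey, true);
        List<String> keys = new ArrayList<>();
        boolean hasNext = false;
        for (String key : tail.keySet()) {
            if (keys.size() == pageSize) {
                hasNext = true; // the look-ahead row exists, so there is another page
                break;
            }
            keys.add(key);
        }
        return new Page(keys, hasNext);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= 4; i++) {
            table.put("row" + i, "value" + i);
        }
        Page page = fetchPage(table, null, 2);
        System.out.println(page.rowKeys + " hasNextPage=" + page.hasNextPage);
    }
}
```

With this approach a final page that happens to be completely full is still reported correctly as the last one, at the cost of reading one extra row per request.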

1. HBase table data paging model class

import java.io.Serializable;
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Result;

/**
 * Description: The paging model class for HBase table data. <br>
 * This class can be used to manage multiple HBaseQualifierModel objects.
 * Copyright: Copyright (c) 2014 <br>
 * Company: Smart Grid Institute of Henan Electric Power Research Institute <br>
 * @author shangbingbing
 * @version 1.0
 */
public class HBasePageModel implements Serializable {
    private static final long serialVersionUID = 330347716100946538L;
    private int pageSize = 100;
    private int pageIndex = 0;
    private int prevPageIndex = 1;
    private int nextPageIndex = 1;
    private int pageCount = 0;
    private int pageFirstRowIndex = 1;
    private byte[] pageStartRowKey = null;
    private byte[] pageEndRowKey = null;
    private boolean hasNextPage = true;
    private int queryTotalCount = 0;
    private long startTime = System.currentTimeMillis();
    private long endTime = System.currentTimeMillis();
    private List<Result> resultList = new ArrayList<Result>();

    public HBasePageModel(int pageSize) {
        this.pageSize = pageSize;
    }
    /**
     * Get the number of records per page.
     * @return the page size
     */
    public int getPageSize() {
        return pageSize;
    }
    /**
     * Set the number of records per page.
     * @param pageSize
     */
    public void setPageSize(int pageSize) {
        this.pageSize = pageSize;
    }
    /**
     * Get the index of the current page.
     * @return the current page index
     */
    public int getPageIndex() {
        return pageIndex;
    }
    /**
     * Set the index of the current page.
     * @param pageIndex
     */
    public void setPageIndex(int pageIndex) {
        this.pageIndex = pageIndex;
    }
    /**
     * Get the total number of pages.
     * @return the page count
     */
    public int getPageCount() {
        return pageCount;
    }
    /**
     * Set the total number of pages.
     * @param pageCount
     */
    public void setPageCount(int pageCount) {
        this.pageCount = pageCount;
    }
    /**
     * Get the sequence number of the first row on the current page.
     * @return the 1-based index of the first row
     */
    public int getPageFirstRowIndex() {
        this.pageFirstRowIndex = (this.getPageIndex() - 1) * this.getPageSize() + 1;
        return pageFirstRowIndex;
    }
    /**
     * Get the start row key of the current page.
     * @return the start row key
     */
    public byte[] getPageStartRowKey() {
        return pageStartRowKey;
    }
    /**
     * Set the start row key of the current page.
     * @param pageStartRowKey
     */
    public void setPageStartRowKey(byte[] pageStartRowKey) {
        this.pageStartRowKey = pageStartRowKey;
    }
    /**
     * Get the end row key of the current page.
     * @return the end row key
     */
    public byte[] getPageEndRowKey() {
        return pageEndRowKey;
    }
    /**
     * Set the end row key of the current page.
     * @param pageEndRowKey
     */
    public void setPageEndRowKey(byte[] pageEndRowKey) {
        this.pageEndRowKey = pageEndRowKey;
    }
    /**
     * Get the index of the previous page.
     * @return the previous page index (never less than 1)
     */
    public int getPrevPageIndex() {
        if (this.getPageIndex() > 1) {
            this.prevPageIndex = this.getPageIndex() - 1;
        } else {
            this.prevPageIndex = 1;
        }
        return prevPageIndex;
    }
    /**
     * Get the index of the next page.
     * @return the next page index
     */
    public int getNextPageIndex() {
        this.nextPageIndex = this.getPageIndex() + 1;
        return nextPageIndex;
    }
    /**
     * Check whether a next page exists.
     * @return true if the current page is full
     */
    public boolean isHasNextPage() {
        // This check is not rigorous: the remaining data may fill exactly one page,
        // in which case the next page turns out to be empty.
        if (this.getResultList().size() == this.getPageSize()) {
            this.hasNextPage = true;
        } else {
            this.hasNextPage = false;
        }
        return hasNextPage;
    }
    /**
     * Get the total number of records retrieved so far.
     */
    public int getQueryTotalCount() {
        return queryTotalCount;
    }
    /**
     * Set the total number of records retrieved so far.
     * @param queryTotalCount
     */
    public void setQueryTotalCount(int queryTotalCount) {
        this.queryTotalCount = queryTotalCount;
    }
    /**
     * Initialize the start time (ms).
     */
    public void initStartTime() {
        this.startTime = System.currentTimeMillis();
    }
    /**
     * Initialize the end time (ms).
     */
    public void initEndTime() {
        this.endTime = System.currentTimeMillis();
    }
    /**
     * Get the elapsed time in milliseconds.
     * @return the elapsed time as text
     */
    public String getTimeIntervalByMilli() {
        return String.valueOf(this.endTime - this.startTime) + " milliseconds";
    }
    /**
     * Get the elapsed time in seconds.
     * @return the elapsed time as text
     */
    public String getTimeIntervalBySecond() {
        double interval = (this.endTime - this.startTime) / 1000.0;
        DecimalFormat df = new DecimalFormat("#.##");
        return df.format(interval) + " seconds";
    }
    /**
     * Print the time information.
     */
    public void printTimeInfo() {
        // LogInfoUtil is the author's own logging utility class (not shown here).
        LogInfoUtil.printLog("Start time: " + this.startTime);
        LogInfoUtil.printLog("End time: " + this.endTime);
        LogInfoUtil.printLog("Time consumed: " + this.getTimeIntervalBySecond());
    }
    /**
     * Get the HBase search result set.
     * @return the results of the current page
     */
    public List<Result> getResultList() {
        return resultList;
    }
    /**
     * Set the HBase search result set.
     * @param resultList
     */
    public void setResultList(List<Result> resultList) {
        this.resultList = resultList;
    }
}

In summary, the model does not compute the total number of records or the total number of pages; the total record count is replaced by the "number of records retrieved so far". The model also records the time each retrieval takes, which helps developers debug and measure retrieval efficiency.

2. HBase table data paging retrieval method

As with a relational database such as Oracle, data retrieval usually comes with a number of search conditions, and HBase is no exception. The conditions for retrieving HBase table data include the row key range (the whole table if no range is given), filters, and the number of data versions. A general-purpose paging retrieval method therefore has to take all of these conditions into account.

/**
 * Retrieve table data by page. <br>
 * (If a non-default namespace was specified when the table was created, the namespace must be included in the form [namespace:tablename].)
 * @param tableName table name (required).
 * @param startRowKey start row key (may be null; if null, retrieval starts from the first row of the table).
 * @param endRowKey end row key (may be null).
 * @param filterList the set of search condition filters (excluding the paging filter; may be null).
 * @param maxVersions the maximum number of versions. If it is Integer.MAX_VALUE, all versions are retrieved; if it is Integer.MIN_VALUE, only the latest version is retrieved; otherwise the specified number of versions is retrieved.
 * @param pageModel paging model (required).
 * @return the HBasePageModel paging object.
 */
public static HBasePageModel scanResultByPageFilter(String tableName, byte[] startRowKey, byte[] endRowKey, FilterList filterList, int maxVersions, HBasePageModel pageModel) {
    if (pageModel == null) {
        pageModel = new HBasePageModel(10);
    }
    if (maxVersions <= 0) {
        // By default, only the latest data version is retrieved.
        maxVersions = Integer.MIN_VALUE;
    }
    pageModel.initStartTime();
    pageModel.initEndTime();
    if (StringUtils.isBlank(tableName)) {
        return pageModel;
    }
    HTable table = null;

    try {
        // Obtain the HTable object from the table name, using the author's own table management class.
        table = HBaseTableManageUtil.getHBaseTable(tableName);
        int tempPageSize = pageModel.getPageSize();
        boolean isEmptyStartRowKey = false;
        if (startRowKey == null) {
            // Read the first row of the table, using the author's own table data utility class.
            Result firstResult = HBaseTableDataUtil.selectFirstResultRow(tableName, filterList);
            if (firstResult == null || firstResult.isEmpty()) {
                return pageModel;
            }
            startRowKey = firstResult.getRow();
        }
        if (pageModel.getPageStartRowKey() == null) {
            isEmptyStartRowKey = true;
            pageModel.setPageStartRowKey(startRowKey);
        } else {
            if (pageModel.getPageEndRowKey() != null) {
                pageModel.setPageStartRowKey(pageModel.getPageEndRowKey());
            }
            // From the second page on, fetch one extra record, because the first record
            // (the last row of the previous page) is discarded.
            tempPageSize += 1;
        }

        Scan scan = new Scan();
        scan.setStartRow(pageModel.getPageStartRowKey());
        if (endRowKey != null) {
            scan.setStopRow(endRowKey);
        }
        // PageFilter is applied per region server, so the client-side limit in scanner.next() is still needed.
        PageFilter pageFilter = new PageFilter(pageModel.getPageSize() + 1);
        if (filterList != null) {
            // Note: this appends the paging filter to the caller's FilterList.
            filterList.addFilter(pageFilter);
            scan.setFilter(filterList);
        } else {
            scan.setFilter(pageFilter);
        }
        if (maxVersions == Integer.MAX_VALUE) {
            scan.setMaxVersions();
        } else if (maxVersions == Integer.MIN_VALUE) {
            // Keep the scan default, which returns only the latest version.
        } else {
            scan.setMaxVersions(maxVersions);
        }
        ResultScanner scanner = table.getScanner(scan);
        List<Result> resultList = new ArrayList<Result>();
        int index = 0;
        for (Result rs : scanner.next(tempPageSize)) {
            if (!isEmptyStartRowKey && index == 0) {
                // Skip the last row of the previous page.
                index += 1;
                continue;
            }
            if (!rs.isEmpty()) {
                resultList.add(rs);
            }
            index += 1;
        }
        scanner.close();
        pageModel.setResultList(resultList);
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            if (table != null) {
                table.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    int pageIndex = pageModel.getPageIndex() + 1;
    pageModel.setPageIndex(pageIndex);
    if (pageModel.getResultList().size() > 0) {
        // Record the row keys of the first and last rows of this page.
        byte[] pageStartRowKey = pageModel.getResultList().get(0).getRow();
        byte[] pageEndRowKey = pageModel.getResultList().get(pageModel.getResultList().size() - 1).getRow();
        pageModel.setPageStartRowKey(pageStartRowKey);
        pageModel.setPageEndRowKey(pageEndRowKey);
    }
    int queryTotalCount = pageModel.getQueryTotalCount() + pageModel.getResultList().size();
    pageModel.setQueryTotalCount(queryTotalCount);
    pageModel.initEndTime();
    pageModel.printTimeInfo();
    return pageModel;
}
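The method above starts each later page at the last row key of the previous page and then discards that repeated row. A commonly used alternative, which is not what the author does, is to start the next scan at the previous end row key with a single zero byte appended: since HBase orders row keys as unsigned bytes, that is the smallest key that sorts strictly after the previous one. The sketch below shows only the byte arithmetic in plain Java; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RowKeySuccessor {

    /** Returns the smallest row key that sorts strictly after lastRowKey: lastRowKey + 0x00. */
    static byte[] nextStartRowKey(byte[] lastRowKey) {
        // Arrays.copyOf pads the extra position with a zero byte.
        return Arrays.copyOf(lastRowKey, lastRowKey.length + 1);
    }

    /** Compares row keys as unsigned bytes, left to right; a strict prefix sorts first. */
    static int compareRowKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] last = "row-0010".getBytes(StandardCharsets.UTF_8);
        byte[] next = nextStartRowKey(last);
        // A negative result means the previous row sorts before the new start key.
        System.out.println(compareRowKeys(last, next));
    }
}
```

Passing such a key as the inclusive start row would make the extra fetched record and the skip logic unnecessary, at the price of building a new key for every page.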

 

Incidentally, here is the interface method for retrieving the first row of an HBase table.

/**
 * Retrieve the first row of the specified table. <br>
 * (If a non-default namespace was specified when the table was created, the namespace must be included in the form [namespace:tablename].)
 * @param tableName table name (required).
 * @param filterList a collection of filters; may be null.
 * @return the first row, or null if there is none or the table cannot be read.
 */
public static Result selectFirstResultRow(String tableName, FilterList filterList) {
    if (StringUtils.isBlank(tableName)) return null;
    HTable table = null;
    try {
        table = HBaseTableManageUtil.getHBaseTable(tableName);
        Scan scan = new Scan();
        if (filterList != null) {
            scan.setFilter(filterList);
        }
        ResultScanner scanner = table.getScanner(scan);
        Iterator<Result> iterator = scanner.iterator();
        int index = 0;
        while (iterator.hasNext()) {
            Result rs = iterator.next();
            if (index == 0) {
                scanner.close();
                return rs;
            }
        }
        // No rows matched; release the scanner before returning.
        scanner.close();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            if (table != null) {
                table.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return null;
}

3. HBase table data paging retrieval usage example

HBasePageModel pageModel = new HBasePageModel(pageSize);
// Arguments follow the six-parameter method shown above: no row key range, no filters, latest version only.
pageModel = scanResultByPageFilter("DLQX:SZYB_DATA", null, null, null, 0, pageModel);
if (pageModel.getResultList().size() == 0) {
    // This page contains no data, which means the previous page was the last one.
    return;
}
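To read a whole table, the call above is simply repeated until a page comes back empty. The following sketch imitates that loop in plain Java under stated assumptions: a `TreeMap` stands in for the HBase table, each later page re-reads the last row of the previous page and skips it, and the names `PagingLoopSketch`, `readPage`, and `readAllPages` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Assumption: a sorted in-memory map stands in for an HBase table.
final class PagingLoopSketch {

    /**
     * Reads one page. lastRowKey is the final row key of the previous page (null for the
     * first page); that row is read again and skipped.
     */
    static List<String> readPage(NavigableMap<String, String> table, String lastRowKey, int pageSize) {
        NavigableMap<String, String> tail =
                (lastRowKey == null) ? table : table.tailMap(lastRowKey, true);
        // Later pages fetch one extra row to make up for the skipped one.
        int limit = (lastRowKey == null) ? pageSize : pageSize + 1;
        List<String> page = new ArrayList<>();
        int index = 0;
        for (String key : tail.keySet()) {
            if (index >= limit) {
                break;
            }
            boolean isRepeatedRow = lastRowKey != null && index == 0;
            if (!isRepeatedRow) {
                page.add(key);
            }
            index++;
        }
        return page;
    }

    /** Walks every page and returns the cumulative number of rows retrieved. */
    static int readAllPages(NavigableMap<String, String> table, int pageSize) {
        int queryTotalCount = 0;
        String lastRowKey = null;
        while (true) {
            List<String> page = readPage(table, lastRowKey, pageSize);
            if (page.isEmpty()) {
                break; // an empty page means the previous page was the last one
            }
            queryTotalCount += page.size();
            lastRowKey = page.get(page.size() - 1);
        }
        return queryTotalCount;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= 5; i++) {
            table.put("row" + i, "value" + i);
        }
        System.out.println("rows retrieved: " + readAllPages(table, 2));
    }
}
```

The running total plays the role of queryTotalCount in the paging model: it is only known to equal the table's row count once the empty page has been reached.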
