HBase Scanners and Filters


Scanners

HBase reads table data through a scanner. HTable returns one from getScanner(Scan): you create a Scan instance, set the start and stop rows and any other filter conditions on it, and pass it in. The results come back through an iterator (a ResultScanner), which is not the most convenient interface, but it is straightforward to use.

One detail that is easy to overlook: with the default configuration, the returned scanner iterator contacts the RegionServer for every call to next(), fetching a single record per round trip. When the network is slow this has a large impact on performance, so it is recommended to configure the scanner cache.
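The Scanner Demo below shows the full connection setup; as a compact sketch of just the pattern described above (start/stop rows plus a caching hint), assuming an existing HTableInterface named table and that the caller handles IOException, with hypothetical row keys:

Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes("row100"));   // hypothetical start row (inclusive)
scan.setStopRow(Bytes.toBytes("row200"));    // hypothetical stop row (exclusive)
scan.setCaching(100);                        // fetch 100 rows per RPC instead of the default 1

ResultScanner scanner = table.getScanner(scan);
try {
    for (Result result : scanner) {          // without caching, each next() can mean a round trip
        System.out.println(Bytes.toString(result.getRow()));
    }
} finally {
    scanner.close();                         // always release the server-side scanner
}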

Scanner Cache

The configuration item hbase.client.scanner.caching sets how many rows the HBase scanner fetches from the server per round trip; the default is one row at a time. Setting it to a reasonable value reduces the time spent in next() during a scan, at the cost of the client holding the cached rows in memory.
There are three places to configure it:
1. In HBase's configuration file, via the hbase.client.scanner.caching property.
2. By calling HTable.setScannerCaching(int scannerCaching).
3. By calling Scan.setCaching(int caching).
Priority increases from 1 to 3: a value set on the Scan overrides the table-level setting, which in turn overrides the configuration file. A sketch of all three follows.
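A minimal sketch of the three levels, using the same old-style client API as the demos below; the table name and the caching values are only illustrative:

// 1. Configuration: sets hbase.client.scanner.caching for scans created from this conf
//    (the same property can be set in hbase-site.xml)
Configuration conf = HBaseConfiguration.create();
conf.setInt("hbase.client.scanner.caching", 100);

// 2. Table level: overrides the configuration value for scans on this HTable instance
HTable table = new HTable(conf, "students");     // hypothetical table name
table.setScannerCaching(200);

// 3. Scan level: highest priority, applies to this scan only
Scan scan = new Scan();
scan.setCaching(500);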

Scanner Demo
package scanner;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class Scanner {

    private String rootDir;
    private String zkServer;
    private String port;
    private Configuration conf;
    private HConnection hConn = null;

    private Scanner(String rootDir, String zkServer, String port) throws IOException {
        this.rootDir = rootDir;
        this.zkServer = zkServer;
        this.port = port;

        conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", rootDir);
        conf.set("hbase.zookeeper.quorum", zkServer);
        conf.set("hbase.zookeeper.property.clientPort", port);

        hConn = HConnectionManager.createConnection(conf);
    }

    public void scanTable(String tableName) {
        Scan scan = new Scan();
        // Set the scanner cache so each RPC fetches a batch of rows
        scan.setCaching(100);
        try {
            HTableInterface table = hConn.getTable(tableName);
            ResultScanner scanner = table.getScanner(scan);
            for (Result result : scanner) {
                format(result);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void format(Result result) {
        // Row key
        String rowKey = Bytes.toString(result.getRow());
        // Return the cells of a Result as an array of KeyValues
        KeyValue[] kvs = result.raw();
        for (KeyValue kv : kvs) {
            // Column family name
            String family = Bytes.toString(kv.getFamily());
            // Column name
            String qualifier = Bytes.toString(kv.getQualifier());
            String value = Bytes.toString(result.getValue(Bytes.toBytes(family), Bytes.toBytes(qualifier)));
            System.out.println("rowkey->" + rowKey + ", family->" + family + ", qualifier->" + qualifier);
            System.out.println("value->" + value);
        }
    }

    // Command line: scan 'students'
    public static void main(String[] args) throws IOException {
        String rootDir = "hdfs://hadoop1:8020/hbase";
        String zkServer = "hadoop1";
        String port = "2181";
        // Initialize
        Scanner conn = new Scanner(rootDir, zkServer, port);
        conn.scanTable("students");
    }
}
Filters

1. Filters make table operations more efficient. Both of HBase's read paths, get() and scan(), support filters. On their own, get() and scan() support direct access by row key and access by a specified start/stop row key, but they lack fine-grained filtering (for example, filtering row keys or values with a regular expression); filters add that capability.
2. You can use the predefined filters or implement custom ones.
3. A filter is created on the client, sent to the server over RPC, and executed on the server side; only the filtered data is returned to the client. A short sketch of attaching a filter to a Get and a Scan follows.
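As a sketch of point 1, the same filter object can be attached to either a Get or a Scan. This assumes an existing HTableInterface named table; the row key, the comparison value, and the choice of ValueFilter are placeholders:

// Keep only cells whose value equals "Tom"
Filter filter = new ValueFilter(CompareFilter.CompareOp.EQUAL,
        new BinaryComparator(Bytes.toBytes("Tom")));

// get(): filter the cells of a single row
Get get = new Get(Bytes.toBytes("row1"));
get.setFilter(filter);
Result single = table.get(get);

// scan(): apply the same filter to every row the scan touches
Scan scan = new Scan();
scan.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);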

Filter Classification

1. Comparison Filters

RowFilter
FamilyFilter
QualifierFilter
ValueFilter
DependentColumnFilter

2. Dedicated Filters

SingleColumnValueFilter
SingleColumnValueExcludeFilter
PrefixFilter
PageFilter
KeyOnlyFilter
FirstKeyOnlyFilter
TimestampsFilter
RandomRowFilter

3. Decorating Filters

SkipFilter
WhileMatchFilter
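The Filter Demo below exercises RowFilter (a comparison filter) and PageFilter (a dedicated filter) but not the decorating filters, which wrap another filter and change how it is applied. Here is a minimal sketch of the two decorating filters, assuming the same client API; the comparison values are placeholders:

// SkipFilter: if the wrapped filter rejects any cell in a row, the whole row is skipped
Filter skipRowsWithZero = new SkipFilter(
        new ValueFilter(CompareFilter.CompareOp.NOT_EQUAL,
                new BinaryComparator(Bytes.toBytes("0"))));

// WhileMatchFilter: the scan stops completely as soon as the wrapped filter rejects a row
// (here: return rows while the row key is lexicographically less than "row-005")
Filter stopAtRow = new WhileMatchFilter(
        new RowFilter(CompareFilter.CompareOp.LESS,
                new BinaryComparator(Bytes.toBytes("row-005"))));

Scan scan = new Scan();
scan.setFilter(skipRowsWithZero);   // or stopAtRow; several filters can be combined with FilterList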
Filter Demo
package filter;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.PageFilter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FilterDemo {

    private String rootDir;
    private String zkServer;
    private String port;
    private Configuration conf;
    private HConnection hConn = null;

    private FilterDemo(String rootDir, String zkServer, String port) throws IOException {
        this.rootDir = rootDir;
        this.zkServer = zkServer;
        this.port = port;

        conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", rootDir);
        conf.set("hbase.zookeeper.quorum", zkServer);
        conf.set("hbase.zookeeper.property.clientPort", port);

        hConn = HConnectionManager.createConnection(conf);
    }

    // Comparison filter: match rows whose row key equals "Tom"
    public void filterTable(String tableName) {
        Scan scan = new Scan();
        scan.setCaching(100);
        RowFilter filter = new RowFilter(CompareFilter.CompareOp.EQUAL,
                new BinaryComparator(Bytes.toBytes("Tom")));
        scan.setFilter(filter);
        try {
            HTableInterface table = hConn.getTable(tableName);
            ResultScanner scanner = table.getScanner(scan);
            for (Result result : scanner) {
                format(result);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Comparison filter with a regular expression: row keys matching "t\w+"
    public void filterTableRegex(String tableName) {
        Scan scan = new Scan();
        scan.setCaching(100);
        RowFilter filter = new RowFilter(CompareFilter.CompareOp.EQUAL,
                new RegexStringComparator("t\\w+"));
        scan.setFilter(filter);
        try {
            HTableInterface table = hConn.getTable(tableName);
            ResultScanner scanner = table.getScanner(scan);
            for (Result result : scanner) {
                format(result);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Dedicated filter: page through the table 3 rows at a time,
    // fetching one extra row per page to use as the start of the next page
    public void filterTablePage(String tableName) {
        PageFilter pageFilter = new PageFilter(4);
        byte[] lastRow = null;   // last row key read, used as the start row of the next query
        int pageCount = 0;       // current page number
        try {
            HTableInterface table = hConn.getTable(tableName);
            while (++pageCount > 0) {
                System.out.println("pageCount = " + pageCount);
                Scan scan = new Scan();
                scan.setFilter(pageFilter);
                if (lastRow != null) {
                    scan.setStartRow(lastRow);
                }
                ResultScanner resultScanner = table.getScanner(scan);
                int count = 0;
                for (Result result : resultScanner) {
                    lastRow = result.getRow();
                    if (++count > 3) {
                        break;   // the 4th row becomes the first row of the next page
                    }
                    format(result);
                }
                if (count < 4) {
                    // Fewer than 4 rows came back: this was the last page, stop looping
                    break;
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void format(Result result) {
        // Row key
        String rowKey = Bytes.toString(result.getRow());
        // Return the cells of a Result as an array of KeyValues
        KeyValue[] kvs = result.raw();
        for (KeyValue kv : kvs) {
            // Column family name
            String family = Bytes.toString(kv.getFamily());
            // Column name
            String qualifier = Bytes.toString(kv.getQualifier());
            String value = Bytes.toString(result.getValue(Bytes.toBytes(family), Bytes.toBytes(qualifier)));
            System.out.println("rowkey->" + rowKey + ", family->" + family + ", qualifier->" + qualifier);
            System.out.println("value->" + value);
        }
    }

    public static void main(String[] args) throws IOException {
        String rootDir = "hdfs://hadoop1:8020/hbase";
        String zkServer = "hadoop1";
        String port = "2181";
        // Initialize
        FilterDemo filterDemo = new FilterDemo(rootDir, zkServer, port);
        filterDemo.filterTable("students");
        filterDemo.filterTableRegex("students");
        filterDemo.filterTablePage("students");
    }
}
