HBase queries: scan in detail

Source: Internet
Author: User
Tags: types of filters

I. Shell queries

HBase queries are fairly simple. The shell offers only two methods, get and scan, and multi-table joins do not exist. Complex queries require creating the corresponding external tables in Hive, whose SQL statements are turned into MapReduce jobs automatically.
This simplicity sometimes makes it awkward to get the result you want. At the very least, it is quite different from querying with SQL.

HBase provides a number of filters that can filter on row keys, column names, and values. The comparison can be by substring, binary value, prefix, regular expression, and so on, and conditions can be combined with AND and OR. With filters, then, it is still possible to meet most needs and find the right results.
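Because the shell takes the whole filter condition as one string, combining conditions is a matter of joining sub-expressions with a keyword. The following is a minimal sketch in plain Ruby, assuming the filter language's uppercase AND and OR keywords. The helper names `combine`, `f_and`, and `f_or` are hypothetical and not part of HBase; the code only builds the string and does not contact a cluster.

```ruby
# Hypothetical helpers (not part of HBase): they only assemble a filter
# string in the form the shell's FILTER option accepts.

# Join sub-expressions with a boolean keyword and wrap them in parentheses.
def combine(keyword, *exprs)
  "(" + exprs.join(" #{keyword} ") + ")"
end

def f_and(*exprs)
  combine("AND", *exprs)
end

def f_or(*exprs)
  combine("OR", *exprs)
end

qualifier = "QualifierFilter(=, 'substring:nick')"
prefix    = "PrefixFilter('000000002')"

filter = f_and(prefix, qualifier)
puts filter
# In the HBase shell the string would then be passed along the lines of:
#   scan 'toplist_ware_ios_1009_201231', {COLUMNS => 'info', FILTER => filter}
```

Wrapping every combination in its own parentheses keeps nested AND/OR expressions unambiguous without relying on operator precedence.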

1.1 Filter types

The filters are described in the English version of the latest official HBase documentation (http://abloz.com/hbase/book.html). There are five types of filters:

    1. Structural filters: filters that contain a set of other filters. Includes:
      FilterList
    2. Column value filters: filter on the value of each column, the equivalent of = and LIKE in an SQL query. Includes:
      SingleColumnValueFilter
      Comparators:
        RegexStringComparator: compares values against a regular expression.
        SubstringComparator: checks whether a substring occurs in the value; the comparison is case-insensitive.
        BinaryPrefixComparator: binary prefix comparison.
        BinaryComparator: binary comparison.
    3. KeyValue metadata filters: filter on columns. Includes:
      FamilyFilter: filters on column family. In general, selecting column families in the scan itself is better than doing it with a filter.
      QualifierFilter: filters on the column name (that is, the qualifier).
      ColumnPrefixFilter: filters on a prefix of the column name (that is, the qualifier).
      MultipleColumnPrefixFilter: behaves like ColumnPrefixFilter, but accepts multiple prefixes.
      ColumnRangeFilter: allows efficient intra-row scanning.
    4. Row key filters: filter on the row key. For selecting rows it is generally considered better to use startRow/stopRow, but RowFilter can also be used.
    5. Utility filters: such as FirstKeyOnlyFilter, which is used for counting rows.
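To make the comparator rules in the list above concrete, here is a small model in plain Ruby. It is an illustration under stated assumptions, not the HBase implementation: the `COMPARATORS` table and `matches?` helper are hypothetical, only the = operator is modeled, and Ruby's regular expressions stand in for the Java ones HBase uses, so the two can differ on advanced patterns.

```ruby
# Plain-Ruby model of the four comparators, for illustration only.
# These lambdas are NOT the HBase classes; they mimic the matching rules
# described above on ordinary Ruby strings.
COMPARATORS = {
  # SubstringComparator: case-insensitive "contains"
  "substring"    => lambda { |value, arg| value.downcase.include?(arg.downcase) },
  # RegexStringComparator: the pattern may match anywhere in the value
  "regexstring"  => lambda { |value, arg| !Regexp.new(arg).match(value).nil? },
  # BinaryPrefixComparator: the value's leading bytes equal the argument
  "binaryprefix" => lambda { |value, arg| value.bytes.first(arg.bytesize) == arg.bytes },
  # BinaryComparator: byte-for-byte equality
  "binary"       => lambda { |value, arg| value.bytes == arg.bytes }
}

# Evaluate an "=" comparison written as 'type:argument', e.g. 'substring:id'.
def matches?(value, spec)
  type, arg = spec.split(":", 2)
  COMPARATORS.fetch(type).call(value, arg)
end

# Which of these column names would a QualifierFilter with 'substring:ID' keep?
%w[loginid userid nick score].each do |qualifier|
  puts "#{qualifier}: #{matches?(qualifier, 'substring:ID')}"
end
```

With the column names used in this article, 'substring:ID' keeps loginid and userid (case is ignored), while nick and score are dropped.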
II. Examples

1. FirstKeyOnlyFilter: a convenient filter for counting rows

    hbase(main):002:0> scan 'toplist_ware_ios_1009_201231', {COLUMNS => 'info', FILTER => "(FirstKeyOnlyFilter())"}
    0000000001    column=info:loginid, timestamp=1343625459713, value=jjm168131013
    0000000002    column=info:loginid, timestamp=1343625459713, value=loveswh
    ... row(s) in 0.5480 seconds

2. Filtering by column-name substring

    hbase(main):006:0> scan 'toplist_ware_ios_1009_201231', {COLUMNS => ['info:'], FILTER => "(QualifierFilter(=, 'substring:id'))"}
    ROW           COLUMN+CELL
    0000000001    column=info:loginid, timestamp=1343625459713, value=jjm168131013
    0000000001    column=info:userid, timestamp=1343625459713, value=168131013
    0000000002    column=info:loginid, timestamp=1343625459713, value=loveswh
    0000000002    column=info:userid, timestamp=1343625459713, value=100898152

    hbase(main):005:0> scan 'toplist_ware_ios_1009_201231', {COLUMNS => ['info:loginid'], FILTER => "(QualifierFilter(=, 'substring:id'))"}
    ROW           COLUMN+CELL
    0000000001    column=info:loginid, timestamp=1343625459713, value=jjm168131013
    0000000002    column=info:loginid, timestamp=1343625459713, value=loveswh

    hbase(main):007:0> scan 'toplist_ware_ios_1009_201231', {COLUMNS => ['info:'], FILTER => "(QualifierFilter(=, 'substring:nid'))"}
    ROW           COLUMN+CELL
    0000000001    column=info:loginid, timestamp=1343625459713, value=jjm168131013
    0000000002    column=info:loginid, timestamp=1343625459713, value=loveswh

    hbase(main):008:0> scan 'toplist_ware_ios_1009_201231', {COLUMNS => ['info:'], FILTER => "(QualifierFilter(=, 'substring:nick'))"}
    ROW           COLUMN+CELL
    0000000001    column=info:nick, timestamp=1343625459713, value=\xE5\xAE\xB6\xE6\x9C\x89\xE8\x99\x8E\xE5\xAE\x9D
    0000000002    column=info:nick, timestamp=1343625459713, value=Loveswh08

3. Filtering by value

3.1 Regular expression filter

    hbase(main):004:0> scan 'toplist_ware_ios_1009_201231', {COLUMNS => 'info', FILTER => "(SingleColumnValueFilter('info', 'nick', =, 'regexstring:.*99', true, true))"}
    ROW           COLUMN+CELL
    0000000009    column=info:loginid, timestamp=1343625459713, value=zgh1968
    0000000009    column=info:nick, timestamp=1343625459713, value=zwy99
    0000000009    column=info:score, timestamp=1343625459713, value=5
    0000000009    column=info:userid, timestamp=1343625459713, value=100366262
    1 row(s) in 0.2520 seconds

3.2 Substring filter

    import org.apache.hadoop.hbase.filter.CompareFilter
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
    import org.apache.hadoop.hbase.filter.SubstringComparator
    import org.apache.hadoop.hbase.util.Bytes
    hbase(main):028:0> scan 'toplist_ware_ios_1001_201231', {COLUMNS => 'info:nick', FILTER => SingleColumnValueFilter.new(Bytes.toBytes('info'), Bytes.toBytes('nick'), CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('8888'))}
    ROW           COLUMN+CELL
    0000000002    column=info:nick, timestamp=1343625446556, value=\xe7\x81\x8f???? \xe3\x81\x8a?? 8888
    1 row(s) in 0.0330 seconds

3.3 Binary comparison

The substring comparator and similar ones do not support multibyte text, so use a binary comparison instead:

    hbase(main):010:0> scan 'toplist_ware_ios_1009_201231', {COLUMNS => ['info:'], FILTER => "(QualifierFilter(=, 'substring:nick') AND ValueFilter(=, 'binary:7789\xe6\xb4\x81'))"}
    ROW           COLUMN+CELL
    0000000016    column=info:nick, timestamp=1343625459713, value=7789\xE6\xB4\x81
    1 row(s) in 0.1710 seconds

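The binary comparison above spells out the multibyte part of the value as \xNN byte escapes inside a double-quoted string. As a rough sketch of how such an escaped form can be produced, the hypothetical helper `hex_escape` below (not an HBase function) converts a UTF-8 string byte by byte. It assumes the ASCII part contains no quotes or backslashes, which it passes through unescaped.

```ruby
# Hypothetical helper: render a string the way it would be typed inside a
# double-quoted Ruby string in the shell, with every non-ASCII byte written
# as a \xNN escape. ASCII bytes are passed through unchanged.
def hex_escape(text)
  text.bytes.map { |b| b < 0x80 ? b.chr : format("\\x%02X", b) }.join
end

nick = "7789\u6D01"    # "7789" followed by one CJK character (3 bytes in UTF-8)
puts hex_escape(nick)  # prints 7789\xE6\xB4\x81
puts "(ValueFilter(=, 'binary:#{hex_escape(nick)}'))"
```

The escaped text matches the byte sequence shown in the scan output, apart from letter case in the hex digits, which does not matter to Ruby.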
4. Combining a column-name substring filter with other conditions

Column-name substring combined with a binary value comparison:

    hbase(main):012:0> scan 'toplist_ware_ios_1009_201231', {COLUMNS => ['info:'], FILTER => "(QualifierFilter(=, 'substring:nick') AND ValueFilter(=, 'binary:7789\xe6\xb4\x81'))"}
    ROW           COLUMN+CELL
    0000000016    column=info:nick, timestamp=1343625459713, value=7789\xE6\xB4\x81
    1 row(s) in 0.0120 seconds

Row-key prefix combined with a column-name substring:

    hbase(main):014:0> scan 'toplist_ware_ios_1009_201231', {COLUMNS => "info:", FILTER => "(PrefixFilter('000000002') AND (QualifierFilter(=, 'substring:nick')))"}
    ROW           COLUMN+CELL
    0000000020    column=info:nick, timestamp=1343625459713, value=Denny_feng
    0000000021    column=info:nick, timestamp=1343625459713, value=\xE5\xB0\x8F\xE7\xBD\x97\xE6\x95\x99\xE7\xBB\x831
    2 row(s) in 0.0440 seconds

5. Row queries

Get a whole row, or a single column of a row:

    hbase(main):005:0> get 'toplist_ware_ios_1009_201231', '0000000009'
    COLUMN          CELL
    info:loginid    timestamp=1343625459713, value=zgh1968
    info:nick       timestamp=1343625459713, value=zwy99
    info:score      timestamp=1343625459713, value=5
    info:userid     timestamp=1343625459713, value=100366262
    4 row(s) in 0.1000 seconds

    hbase(main):006:0> get 'toplist_ware_ios_1009_201231', '0000000009', 'info:nick'
    COLUMN          CELL
    info:nick       timestamp=1343625459713, value=zwy99
    1 row(s) in 0.0100 seconds

Scan by row-key prefix, optionally limiting the number of rows and the columns returned:

    hbase(main):009:0> scan 'toplist_ware_ios_1009_201231', FILTER => "PrefixFilter('000000002')"
    ROW           COLUMN+CELL
    0000000020    column=info:loginid, timestamp=1343625459713, value=jjm169212318
    0000000020    column=info:nick, timestamp=1343625459713, value=Denny_feng
    0000000020    column=info:score, timestamp=1343625459713, value=1
    0000000020    column=info:userid, timestamp=1343625459713, value=169212318
    0000000021    column=info:loginid, timestamp=1343625459713, value=jjm169371841
    0000000021    column=info:nick, timestamp=1343625459713, value=\xE5\xB0\x8F\xE7\xBD\x97\xE6\x95\x99\xE7\xBB\x8
    0000000021    column=info:score, timestamp=1343625459713, value=1
    0000000021    column=info:userid, timestamp=1343625459713, value=169371841
    2 row(s) in 0.0180 seconds

    hbase(main):010:0> scan 'toplist_ware_ios_1009_201231', FILTER => "PrefixFilter('000000002')", LIMIT => 1
    ROW           COLUMN+CELL
    0000000020    column=info:loginid, timestamp=1343625459713, value=jjm169212318
    0000000020    column=info:nick, timestamp=1343625459713, value=Denny_feng
    0000000020    column=info:score, timestamp=1343625459713, value=1
    0000000020    column=info:userid, timestamp=1343625459713, value=169212318
    1 row(s) in 0.0170 seconds

    hbase(main):011:0> scan 'toplist_ware_ios_1009_201231', {COLUMNS => "info:nick", FILTER => "PrefixFilter('000000002')", LIMIT => 1}
    ROW           COLUMN+CELL
    0000000020    column=info:nick, timestamp=1343625459713, value=Denny_feng
    1 row(s) in 0.0160 seconds
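
The filter-types section notes that a start/stop row range is generally considered better than a filter for selecting rows. A prefix scan like the ones above can be expressed as such a range by computing the first key that sorts after every key with the prefix. The sketch below is one common way to do that in plain Ruby; `stop_row_for_prefix` is a hypothetical helper, not an HBase shell command, and it assumes row keys are compared as raw bytes.

```ruby
# Hypothetical helper: smallest row key that sorts after every key starting
# with the given prefix, found by dropping trailing 0xFF bytes and then
# incrementing the last remaining byte.
# Returns nil when no such key exists (empty prefix or all bytes 0xFF),
# in which case the scan simply has no stop row.
def stop_row_for_prefix(prefix)
  bytes = prefix.bytes
  bytes.pop while bytes.last == 0xFF
  return nil if bytes.empty?
  bytes[-1] += 1
  bytes.pack("C*")
end

start_row = "000000002"
stop_row  = stop_row_for_prefix(start_row)
puts stop_row  # prints 000000003
# In the HBase shell the range scan could then look like:
#   scan 'toplist_ware_ios_1009_201231', {STARTROW => start_row, STOPROW => stop_row}
```

Since the stop row is exclusive, the range from 000000002 up to 000000003 covers exactly the keys that begin with 000000002.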
