HBase Shell Basics and Common Commands

Source: Internet
Author: User


1. Brief introduction


HBase is a distributed, column-oriented, open-source database modeled on the Google paper "Bigtable: A Distributed Storage System for Structured Data." It is an open-source implementation of Google Bigtable that uses Hadoop HDFS as its file storage system, Hadoop MapReduce to process the massive data held in HBase, and ZooKeeper as its coordination service.


2. HBase Table Structure


HBase stores data in tables. A table consists of rows and columns, and the columns are grouped into column families.


Row Key   column-family1       column-family2                column-family3
          column1   column2    column1   column2   column3   column1
key1      t1:abc
          t2:gdxdf
key2
key3


As shown in the table above, key1, key2, and key3 are the unique row keys of three records, and column-family1, column-family2, and column-family3 are three column families, each containing several columns. For example, column-family1 consists of two columns named column1 and column2. The entry t1:abc, t2:gdxdf is the cell uniquely identified by row key1 and column column-family1:column1. This cell holds two values, abc and gdxdf, written at different timestamps t1 and t2, and HBase returns the most recent value to the requester.
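The layout described above can be pictured as a nested map. The following Python sketch is only a conceptual model, not HBase's storage format; the numeric timestamps 1 and 2 stand in for t1 and t2 and are assumptions made for illustration.

```python
# Conceptual view: row key -> column family -> column -> {timestamp: value}
table = {
    "key1": {
        "column-family1": {
            # Two versions of the same cell; 1 and 2 stand in for t1 and t2.
            "column1": {1: b"abc", 2: b"gdxdf"},
        },
    },
    "key2": {},
    "key3": {},
}


def latest(table, row, family, column):
    """Return the value with the highest timestamp, as a default read would."""
    versions = table[row][family][column]
    return versions[max(versions)]


print(latest(table, "key1", "column-family1", "column1"))  # b'gdxdf'
```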


The specific meanings of these terms are as follows:


(1) Row Key


As in other NoSQL databases, the row key is the primary key used to retrieve records. There are only three ways to access rows in an HBase table:

(1.1) Access through a single row key

(1.2) Access through a range of row keys

(1.3) Full table scan

A row key can be any string (the maximum length is 64 KB; in practice it is usually 10 to 100 bytes). Internally, HBase stores row keys as byte arrays.

Data is stored sorted by the lexicographic (byte) order of the row key. When designing row keys, take full advantage of this sorted storage and place rows that are often read together next to each other (locality).
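The three access paths can be illustrated with a small in-memory model. The sketch below is not HBase code: it assumes a hypothetical `SortedRows` class that keeps row keys as bytes in sorted order, and the row keys themselves are made up. It is only meant to show why rows that share a key prefix can be read with one range scan.

```python
import bisect


class SortedRows:
    """Toy stand-in for an HBase table: rows kept sorted by byte-wise row key."""

    def __init__(self):
        self._keys = []   # sorted list of row keys (bytes)
        self._rows = {}   # row key -> row data

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        """(1.1) Access through a single row key."""
        return self._rows.get(key)

    def scan_range(self, start, stop):
        """(1.2) Access through a key range: start inclusive, stop exclusive."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

    def scan_all(self):
        """(1.3) Full table scan, in row-key order."""
        return [(k, self._rows[k]) for k in self._keys]


rows = SortedRows()
for key in (b"user#bob", b"order#0002", b"user#alice", b"order#0001"):
    rows.put(key, {"info:kind": key.split(b"#")[0]})

# '$' is the byte right after '#', so this range covers every key starting with "user#".
users = [k for k, _ in rows.scan_range(b"user#", b"user$")]
print(users)  # [b'user#alice', b'user#bob']
```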


Note:

Sorting integers lexicographically yields 1, 10, 100, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, ..., 9, 91, 92, 93, 94, 95, 96, 97, 98, 99. To preserve the natural order of integers, row keys must be left-padded with zeros.
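The effect of zero-padding is easy to check outside HBase. This Python sketch sorts a few made-up IDs as byte strings, first unpadded and then padded to an assumed fixed width of three digits.

```python
ids = [1, 2, 9, 10, 11, 100]

# Plain decimal strings sort byte-wise, not numerically.
plain_keys = sorted(str(i).encode() for i in ids)

# Left-padding with zeros to a fixed width restores numeric order.
WIDTH = 3  # assumed maximum number of digits
padded_keys = sorted(str(i).zfill(WIDTH).encode() for i in ids)

print(plain_keys)   # [b'1', b'10', b'100', b'11', b'2', b'9']
print(padded_keys)  # [b'001', b'002', b'009', b'010', b'011', b'100']
```

The width has to be chosen up front to fit the largest expected value, since changing it later changes the sort position of every existing key.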


Every read or write of a single row is an atomic operation (regardless of how many columns are read or written). This design decision makes it easy for users to reason about the program's behavior when concurrent updates are made to the same row.
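What row-level atomicity buys can be sketched with a per-row lock. The `AtomicRows` class below is a hypothetical toy, not how HBase implements it; it only shows that when every read and write of a row is one indivisible step, a reader never sees one column updated without the other.

```python
import threading


class AtomicRows:
    """Toy table where every read or write of one row runs under that row's lock."""

    def __init__(self):
        self._rows = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects the two dicts above

    def _lock_for(self, row):
        with self._guard:
            self._rows.setdefault(row, {})
            return self._locks.setdefault(row, threading.Lock())

    def mutate(self, row, fn):
        """Apply fn to the row's columns as one indivisible step."""
        with self._lock_for(row):
            fn(self._rows[row])

    def read(self, row):
        """Return a consistent snapshot of all columns in the row."""
        with self._lock_for(row):
            return dict(self._rows[row])


def bump_both(columns):
    # Updates two columns together.
    columns["c1"] = columns.get("c1", 0) + 1
    columns["c2"] = columns.get("c2", 0) + 1


def worker(table):
    for _ in range(1000):
        table.mutate("r1", bump_both)


atomic_rows = AtomicRows()
threads = [threading.Thread(target=worker, args=(atomic_rows,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

snapshot = atomic_rows.read("r1")
print(snapshot)  # {'c1': 8000, 'c2': 8000}
```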


(2) Column Family


Each column in an HBase table belongs to a column family. Column families are part of the table's schema (individual columns are not) and must be defined before the table is used. Column names are prefixed with the name of their column family. For example, courses:history and courses:math both belong to the courses column family.
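The difference between schema-level column families and free-form columns can be sketched as follows. `ToyTable` is a hypothetical in-memory model, and the table name and values are made up; only the courses:history and courses:math column names come from the text.

```python
class ToyTable:
    """Column families are fixed at creation; columns inside them are not."""

    def __init__(self, name, *families):
        self.name = name
        self.families = set(families)  # part of the schema
        self.cells = {}                # (row, family, qualifier) -> value

    def put(self, row, column, value):
        # The column name is "<family>:<qualifier>"; split on the first colon.
        family, _, qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family!r}")
        self.cells[(row, family, qualifier)] = value


t = ToyTable("students", "courses")
t.put("Tom", "courses:history", "90")
t.put("Tom", "courses:math", "97")  # a new column needs no schema change

try:
    t.put("Tom", "sports:running", "A")  # this family was never declared
    rejected = False
except KeyError:
    rejected = True
```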


Access control and disk and memory usage accounting are performed at the column family level. In practice, controlling permissions per column family helps manage different types of applications: some applications may add new base data, some may read base data and create derived column families, and some may only view existing data (and possibly not even all of it, for privacy reasons).


(3) Cell


In HBase, the storage unit determined by a row and a column is called a cell. A cell is uniquely identified by {row key, column (= <family> + <label>), version}. The data in a cell has no type; it is stored entirely as bytes.
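Because cells are untyped bytes, the writer and the reader have to agree on an encoding. The Python sketch below shows two common choices for the number 97: as text, and as an 8-byte big-endian integer. Which one an application uses is an assumption it must make for itself; HBase does not record it.

```python
import struct

score = 97

as_text = str(score).encode("utf-8")  # b'97', two bytes
as_long = struct.pack(">q", score)    # 8-byte big-endian signed integer

# The same number produces different cell contents depending on the encoding.
print(as_text, as_long)

# The reader must decode with the matching scheme.
decoded_from_text = int(as_text.decode("utf-8"))
decoded_from_long = struct.unpack(">q", as_long)[0]
```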


(4) Timestamp


Each cell can hold multiple versions of the same data, indexed by timestamp. A timestamp is a 64-bit integer. HBase can assign it automatically when data is written, in which case it is the current system time in milliseconds. The timestamp can also be assigned explicitly by the client. An application that needs to avoid version conflicts must generate unique timestamps itself. Within each cell, versions are sorted in reverse chronological order, so the most recent data comes first.


To avoid the management burden (both storage and indexing) of keeping too many versions, HBase provides two ways to garbage-collect old versions. The first keeps only the last n versions of the data; the second keeps only versions that are recent enough (for example, those written in the last seven days). Either can be configured per column family.
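The two retention policies can be sketched with a small versioned cell. `VersionedCell`, `max_versions`, and `ttl_ms` are hypothetical names chosen for this illustration, and the model filters versions at read time rather than reproducing how HBase actually reclaims storage.

```python
import time


class VersionedCell:
    """Toy cell that keeps several timestamped versions of one value."""

    def __init__(self, max_versions=None, ttl_ms=None):
        self.max_versions = max_versions  # keep only the last n versions
        self.ttl_ms = ttl_ms              # keep only versions newer than this age
        self._versions = {}               # timestamp (ms) -> value (bytes)

    def put(self, value, ts=None):
        if ts is None:
            ts = int(time.time() * 1000)  # assigned automatically, in milliseconds
        self._versions[ts] = value        # writing the same timestamp overwrites

    def versions(self, now_ms=None):
        """Return surviving (timestamp, value) pairs, newest first."""
        if now_ms is None:
            now_ms = int(time.time() * 1000)
        items = sorted(self._versions.items(), reverse=True)
        if self.ttl_ms is not None:
            items = [(ts, v) for ts, v in items if now_ms - ts <= self.ttl_ms]
        if self.max_versions is not None:
            items = items[: self.max_versions]
        return items

    def get(self, now_ms=None):
        """Return the most recent surviving value, or None."""
        items = self.versions(now_ms)
        return items[0][1] if items else None


last_two = VersionedCell(max_versions=2)
for ts, value in [(1, b"abc"), (2, b"gdxdf"), (3, b"xyz")]:
    last_two.put(value, ts=ts)

recent_only = VersionedCell(ttl_ms=1000)
recent_only.put(b"old", ts=0)
recent_only.put(b"new", ts=5000)
```

With these inputs, `last_two.versions(now_ms=3)` keeps timestamps 3 and 2, and `recent_only.versions(now_ms=5500)` keeps only the value written at 5000.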


3. Basic Usage of the HBase Shell


HBase provides a shell for user interaction. Run the command hbase shell to enter the command interface. Once inside, run help to see help information for the commands.


The following demonstrates HBase usage with a student score table found online.

name   grade   course
               math   art
Tom    5       97     87
Jim    4       89     80

Here grade is a column family that holds a single value per row, while course is a column family made up of two columns, math and art. More columns, such as computer or physics, can be added to the course column family as needed.


(1) Create a table named scores with two column families, grade and course

hbase(main):001:0> create 'scores', 'grade', 'course'

Use the list command to see which tables currently exist in HBase, and the describe command to view a table's structure. (Remember that all table names and column names must be quoted.)


(2) Insert values according to the table design:

put 'scores', 'Tom', 'grade:', '5'
put 'scores', 'Tom', 'course:math', '97'
put 'scores', 'Tom', 'course:art', '87'
put 'scores', 'Jim', 'grade', '4'
put 'scores', 'Jim', 'course:math', '89'
put 'scores', 'Jim', 'course:art', '80'

The table is now populated. The structure is quite flexible: new columns can be added freely inside a column family, which is very convenient. If a column family has no columns under it, the colon can be omitted.


The put command is simple and has only one form:

hbase> put 't1', 'r1', 'c1', 'value', ts1

Here t1 is the table name, r1 is the row key, c1 is the column name, and value is the cell value. ts1 is the timestamp and is usually omitted.


(3) Query data by row key

get 'scores', 'Jim'
get 'scores', 'Jim', 'grade'

You may notice a pattern: an HBase shell command generally consists of the operation keyword followed by the table name, row key, and column name, in that order. Any further conditions go in curly braces.


get can be used in the following ways:

hbase> get 't1', 'r1'
hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
hbase> get 't1', 'r1', {COLUMN => 'c1'}
hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
hbase> get 't1', 'r1', 'c1'
hbase> get 't1', 'r1', 'c1', 'c2'
hbase> get 't1', 'r1', ['c1', 'c2']
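How the COLUMN, TIMESTAMP, TIMERANGE, and VERSIONS options narrow a result can be modeled in a few lines of Python. `get_row` is a hypothetical helper operating on an in-memory row, not an HBase API. It assumes columns match by exact name and that the time range is half-open (start inclusive, end exclusive).

```python
def get_row(row, columns=None, timerange=None, timestamp=None, versions=1):
    """Select cells from one row; row maps column name -> list of (ts, value)."""
    result = {}
    for column, cells in row.items():
        if columns is not None and column not in columns:
            continue
        picked = sorted(cells, reverse=True)  # newest first
        if timestamp is not None:
            picked = [(ts, v) for ts, v in picked if ts == timestamp]
        if timerange is not None:
            start, end = timerange
            picked = [(ts, v) for ts, v in picked if start <= ts < end]
        picked = picked[:versions]            # at most this many versions
        if picked:
            result[column] = picked
    return result


r1 = {
    "c1": [(1, "a"), (2, "b"), (3, "c")],
    "c2": [(2, "x")],
}

newest = get_row(r1)
ranged = get_row(r1, columns=["c1"], timerange=(1, 3), versions=4)
exact = get_row(r1, columns=["c1", "c2"], timestamp=2)
```

With no options, only the newest version of each column comes back; `ranged` returns the two versions of c1 at timestamps 2 and 1, and `exact` returns the timestamp-2 version from both columns.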


(4) Scan all data

scan 'scores'

You can also specify modifiers: TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH, or COLUMNS. With no modifiers, as in the command above, all rows are displayed.

Examples:

hbase> scan '.META.'
hbase> scan '.META.', {COLUMNS => 'info:regioninfo'}
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter (123, 456))"}
hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}


A filter can be specified in two ways:

a. Using a filter string. More information is available in the Filter Language document attached to the HBASE-4176 JIRA.

b. Using the fully qualified class name of the filter.
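The filter string in the scan example combines three conditions with AND. The Python sketch below imitates that combination on made-up cells so the effect of each condition is visible. The helper functions are hypothetical stand-ins for PrefixFilter, QualifierFilter with a binary comparator, and TimestampsFilter; they are not the HBase classes.

```python
cells = [
    # (row key, qualifier, timestamp, value)
    (b"row1", b"xyz", 123, b"v1"),
    (b"row2", b"abc", 123, b"v2"),
    (b"row2", b"xyz", 123, b"v3"),
    (b"row2", b"xyz", 200, b"v4"),
    (b"row20", b"zzz", 456, b"v5"),
]


def prefix_filter(prefix):
    """Keep cells whose row key starts with prefix."""
    return lambda cell: cell[0].startswith(prefix)


def qualifier_at_least(qualifier):
    """Keep cells whose qualifier compares >= qualifier, byte by byte."""
    return lambda cell: cell[1] >= qualifier


def timestamps_filter(*timestamps):
    """Keep cells whose timestamp is exactly one of the listed values."""
    allowed = set(timestamps)
    return lambda cell: cell[2] in allowed


def all_of(*predicates):
    """AND: a cell passes only if every predicate accepts it."""
    return lambda cell: all(p(cell) for p in predicates)


keep = all_of(
    prefix_filter(b"row2"),
    qualifier_at_least(b"xyz"),
    timestamps_filter(123, 456),
)
matched = [cell[3] for cell in cells if keep(cell)]
print(matched)  # [b'v3', b'v5']
```

Note that row20 also passes the prefix test, because a prefix match is purely byte-wise.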


There is also a CACHE_BLOCKS modifier that switches block caching for the scan on or off. It is on by default (CACHE_BLOCKS => true) and can be turned off (CACHE_BLOCKS => false).


(5) Delete specified data

delete 'scores', 'Jim', 'grade'

The delete command likewise has only one form:

hbase> delete 't1', 'r1', 'c1', ts1

delete always requires a column. To delete an entire row, use the deleteall command instead, and use it with caution!

deleteall 'scores', 'Jim'

To delete all data in a table, use the truncate command. There is no direct whole-table delete; truncate is simply a combination of the disable, drop, and create commands.


(6) Modify the table structure

disable 'scores'
alter 'scores', NAME => 'info'
enable 'scores'

The alter command is used as follows (in versions where it fails on an enabled table, disable the table first):


a. Change or add a column family:

hbase> alter 't1', NAME => 'f1', VERSIONS => 5

b. Delete a column family:

hbase> alter 't1', NAME => 'f1', METHOD => 'delete'
hbase> alter 't1', 'delete' => 'f1'

c. Modify table attributes such as MAX_FILESIZE, MEMSTORE_FLUSHSIZE, READONLY, and DEFERRED_LOG_FLUSH:

hbase> alter 't1', METHOD => 'table_att', MAX_FILESIZE => '134217728'

d. Add a coprocessor to the table:

hbase> alter 't1', METHOD => 'table_att', 'coprocessor' => '1001|arg1=1,arg2=2'

A table can have multiple coprocessors configured; a sequence number is appended automatically to identify each one. For a coprocessor (which can be thought of as a filter-like program) to be loaded, the attribute value must follow this format:

[coprocessor jar file location] | class name | [priority] | [arguments]

e. Remove a table attribute or a coprocessor:

hbase> alter 't1', METHOD => 'table_att_unset', NAME => 'MAX_FILESIZE'
hbase> alter 't1', METHOD => 'table_att_unset', NAME => 'coprocessor$1'

f. Execute multiple alter operations in one command:

hbase> alter 't1', {NAME => 'f1'}, {NAME => 'f2', METHOD => 'delete'}
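The four-field coprocessor format given under item d can be made concrete with a small parser. This is a Python sketch, not HBase's own parsing code: `parse_coprocessor_spec` is a hypothetical helper, the jar path and class name in the example are invented, and the sketch assumes all three `|` separators are present even when optional fields are empty.

```python
def parse_coprocessor_spec(spec):
    """Split 'jar|class|priority|args' into fields; only the class is required."""
    parts = [p.strip() for p in spec.split("|")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(parts)}")
    jar, class_name, priority, args = parts
    if not class_name:
        raise ValueError("class name is required")
    return {
        "jar": jar or None,
        "class": class_name,
        "priority": int(priority) if priority else None,
        # Arguments are comma-separated key=value pairs.
        "args": dict(item.split("=", 1) for item in args.split(",") if item),
    }


# Hypothetical jar location and class name, for illustration only.
full = parse_coprocessor_spec(
    "hdfs:///example/coprocessor.jar|com.example.ExampleObserver|1001|arg1=1,arg2=2"
)
minimal = parse_coprocessor_spec("|com.example.ExampleObserver||")
```

Here `full["priority"]` is 1001 and `full["args"]` is `{"arg1": "1", "arg2": "2"}`, while `minimal` has no jar, no priority, and no arguments.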


(7) Count rows:

hbase> count 't1'
hbase> count 't1', INTERVAL => 100000
hbase> count 't1', CACHE => 1000
hbase> count 't1', INTERVAL => 10, CACHE => 1000


count can take a long time on a large table; for those, a MapReduce row-counting job is an alternative. The running count is displayed every 1000 rows by default (INTERVAL). Scan caching is enabled for count, with a default cache size of 10 rows (CACHE).
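The role of INTERVAL can be sketched as a counting loop that reports progress every so many rows. `count_rows` is a hypothetical Python helper over any iterable of row keys, and the progress message format is an assumption, not the shell's exact output.

```python
def count_rows(row_keys, interval=1000, report=print):
    """Count rows, reporting the running total every `interval` rows."""
    total = 0
    for key in row_keys:
        total += 1
        if total % interval == 0:
            report(f"Current count: {total}, row: {key}")
    return total


progress = []
row_keys = (f"row{i:05d}" for i in range(2500))  # made-up row keys
total = count_rows(row_keys, interval=1000, report=progress.append)
```

With 2500 rows and an interval of 1000, two progress messages are produced before the final total of 2500 is returned.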


(8) Disable and enable operations

Many operations require the table to be taken offline first, such as the alter operation above and dropping a table. disable_all and enable_all operate on multiple tables at once.


(9) Drop a table

First disable the table, then run the drop command.

disable 't1'
drop 't1'


The commands above are the most commonly used ones. The full set of HBase shell commands is listed below, divided into command groups. The names give a rough idea of what each command does; run help "cmd" for detailed usage.


COMMAND GROUPS:

Group name: general
Commands: status, version

Group name: ddl
Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, is_disabled, is_enabled, list, show_filters

Group name: dml
Commands: count, delete, deleteall, get, get_counter, incr, put, scan, truncate

Group name: tools
Commands: assign, balance_switch, balancer, close_region, compact, flush, hlog_roll, major_compact, move, split, unassign, zk_dump

Group name: replication
Commands: add_peer, disable_peer, enable_peer, list_peers, remove_peer, start_replication, stop_replication

Group name: security
Commands: grant, revoke, user_permission


4. HBase Shell Scripts

Since these are shell commands, you can also write a series of HBase shell commands into a file and have them executed in order, just like a Linux shell script. Put the commands in a file and run the following command:


  $ hbase shell test.hbaseshell
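Such a script file can also be generated instead of typed by hand. The Python sketch below builds the text of a script for the scores example from a dictionary. `build_script` is a hypothetical helper, and ending the script with exit so the shell does not stay open afterwards is an assumption about typical use.

```python
def quote(text):
    """Quote a value as a single-quoted shell string literal."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_script(table, families, rows):
    """Return HBase shell commands that create a table and fill it."""
    lines = ["create " + ", ".join(quote(x) for x in [table, *families])]
    for row, columns in rows.items():
        for column, value in columns.items():
            lines.append(
                f"put {quote(table)}, {quote(row)}, {quote(column)}, {quote(value)}"
            )
    lines.append(f"scan {quote(table)}")
    lines.append("exit")
    return "\n".join(lines) + "\n"


scores = {
    "Tom": {"grade:": "5", "course:math": "97", "course:art": "87"},
    "Jim": {"grade:": "4", "course:math": "89", "course:art": "80"},
}

script = build_script("scores", ["grade", "course"], scores)
print(script)
```

Saving the printed text as test.hbaseshell gives a file that can be passed to the command above.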