Reposted from: http://my.oschina.net/u/189445/blog/595232
HBase shell command | Description
alter | Modify the column family schema
count | Count the number of rows in a table
create | Create a table
describe | Show table-related details
delete | Delete the value of the specified cell (specify the table, row, and column, and optionally a timestamp)
deleteall | Delete all cell values of the specified row
disable | Disable a table
drop | Delete a table
enable | Enable a table
exists | Test whether a table exists
exit | Exit the HBase shell
get | Get the value of a row or cell
incr | Increment the value of the specified table, row, and column
list | List all tables that exist in HBase
put | Add a value to the specified table cell
tools | List the tools supported by HBase
scan | Scan the table to retrieve the matching values
status | Return status information for the HBase cluster
shutdown | Shut down the HBase cluster (different from exit)
truncate | Re-create the specified table
version | Return HBase version information
Note the difference between shutdown and exit: after shutdown, the HBase service must be restarted before HBase can be used again; exit only leaves the HBase shell, which can be reopened at any time.
HBase uses coordinates to locate data in a table: the row key is the first coordinate and the column family is the next.
HBase is an online system; its tight integration with Hadoop MapReduce gives it offline access as well.
When HBase executes a write command, it either persists the change or throws a write-failure exception. By default, a write goes to two places, the write-ahead log (WAL, also known as the HLog) and the MemStore, which together ensure data durability. The MemStore is a write buffer in memory. The client does not interact directly with the underlying HFiles during a write; when the MemStore fills up, it is flushed to disk, producing a new HFile. HFile is the underlying storage format used by HBase. The MemStore size is defined by the system-wide property hbase.hregion.memstore.flush.size in the hbase-site.xml file.
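The write path described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the class name ToyRegionStore is made up, and the flush threshold counts buffered entries, whereas hbase.hregion.memstore.flush.size is a size in bytes.

```python
class ToyRegionStore:
    """Toy model of the write path: WAL append, MemStore buffer, flush to HFile."""

    def __init__(self, flush_size):
        # Stand-in for hbase.hregion.memstore.flush.size (entries here, bytes in HBase).
        self.flush_size = flush_size
        self.wal = []        # append-only write-ahead log (HLog)
        self.memstore = {}   # in-memory write buffer
        self.hfiles = []     # immutable, sorted files "on disk"

    def put(self, row, value):
        self.wal.append((row, value))   # 1. record the change in the log first
        self.memstore[row] = value      # 2. then update the write buffer
        if len(self.memstore) >= self.flush_size:
            self.flush()

    def flush(self):
        if self.memstore:
            # Each flush writes one new sorted file and empties the buffer.
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}


store = ToyRegionStore(flush_size=2)
for row, value in [("r2", "v2"), ("r1", "v1"), ("r3", "v3")]:
    store.put(row, value)
```

After the three puts, the log holds all three writes, one sorted file has been flushed, and the last write is still only in the buffer (and in the log).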
HBase uses an LRU cache, the BlockCache, for read operations. The BlockCache is designed to hold frequently accessed data read from HFiles so that disk reads can be avoided. The BlockCache is shared within a RegionServer, and caching can be switched on or off per column family. A block in the BlockCache is the unit of data that HBase reads from disk: it is the smallest unit of data that is indexed and the smallest unit that is read from disk. If the table is used mainly for random lookups, smaller blocks work better, but they make the index larger and consume more memory. If the workload is mainly sequential scans, larger blocks work better: larger blocks mean fewer index entries, which saves memory.
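The block-size trade-off can be made concrete with a little arithmetic. The sketch below assumes one index entry per block and a hypothetical 1 GiB HFile; both are simplifications chosen for illustration.

```python
def index_entries(hfile_bytes, block_bytes):
    """Blocks (and so index entries) needed to cover a file, using ceiling division."""
    return -(-hfile_bytes // block_bytes)


hfile = 1024 * 1024 * 1024                    # hypothetical 1 GiB HFile
small = index_entries(hfile, 16 * 1024)       # 16 KiB blocks
large = index_entries(hfile, 128 * 1024)      # 128 KiB blocks
print(small, large)                           # 65536 8192
```

With these numbers, the smaller block size needs eight times as many index entries for the same file.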
LRU stands for Least Recently Used. It is a page replacement algorithm used in memory management: blocks that are in memory but have not been used recently are the LRU candidates, and the operating system evicts them from memory to make room for other data.
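A minimal LRU cache can be sketched with Python's OrderedDict. This is a generic illustration of the eviction policy, not HBase's BlockCache implementation; the class name and block keys are made up.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # oldest entry first, most recent last

    def get(self, key):
        if key not in self.blocks:
            return None
        self.blocks.move_to_end(key)  # a read makes the entry "recently used"
        return self.blocks[key]

    def put(self, key, block):
        self.blocks[key] = block
        self.blocks.move_to_end(key)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(capacity=2)
cache.put("block-a", b"A")
cache.put("block-b", b"B")
cache.get("block-a")          # block-a is now more recent than block-b
cache.put("block-c", b"C")    # evicts block-b
```

Because block-a was read before block-c arrived, block-b is the entry that gets evicted.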
Data Model Overview:
Table (table): HBase organizes data into tables. A table name is a string made up of characters that are valid in a file system path.
Row (row): Within a table, data is stored in rows. Each row is uniquely identified by its row key (rowkey). A row key has no data type and is always treated as a byte array, byte[].
Column family (column family): The data in a row is grouped by column family, and column families also affect the physical storage of HBase data. They must therefore be defined in advance and are not easy to modify. Every row in a table has the same column families, although a row does not need to store data in each of them. A column family name is a string made up of characters that are valid in a file system path. (A column family can be added or changed with alter 't1', {NAME => 'f1', VERSIONS => 5}: disable the table first, run alter, then enable the table again.)
Column qualifier (column qualifier): Data within a column family is addressed by a column qualifier, or simply column. Column qualifiers do not need to be defined in advance and do not need to be the same from row to row. Like row keys, column qualifiers have no data type and are always treated as a byte array, byte[].
Cell (cell): The row key, column family, and column qualifier together identify a cell. The data stored in a cell is called its value (value); it has no data type and is always treated as a byte array, byte[].
Version (version): Cell values are versioned. Versions are identified by a timestamp, which is a long. When no version is specified, the current timestamp is used as the basis of the operation. The number of versions HBase retains per cell is configured per column family. The default is 3 (HBase 0.96 and later default to 1).
HBase locates data in a table with a four-dimensional coordinate system, in order: row key, column family, column qualifier, and version. Versions are sorted by timestamp in descending order; the other coordinates are sorted in ascending order.
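The four coordinates can be pictured as a nested map. The sketch below is a toy model under stated assumptions: ToyTable is a made-up name, keys are Python strings rather than byte arrays, and version pruning happens eagerly on every put.

```python
import time


class ToyTable:
    """Nested map: row key -> column family -> qualifier -> {timestamp: value}."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions   # versions kept per cell
        self.rows = {}

    def put(self, row, family, qualifier, value, ts=None):
        # Without an explicit version, the current time in milliseconds is used.
        ts = int(time.time() * 1000) if ts is None else ts
        cell = self.rows.setdefault(row, {}).setdefault(family, {}).setdefault(qualifier, {})
        cell[ts] = value
        # Keep only the newest max_versions timestamps.
        for old in sorted(cell, reverse=True)[self.max_versions:]:
            del cell[old]

    def get(self, row, family, qualifier, versions=1):
        cell = self.rows.get(row, {}).get(family, {}).get(qualifier, {})
        # Versions come back newest first (descending timestamp).
        return [(ts, cell[ts]) for ts in sorted(cell, reverse=True)[:versions]]


table = ToyTable(max_versions=3)
for ts in (1, 2, 3, 4):
    table.put("r1", "f1", "c1", "v%d" % ts, ts=ts)

latest = table.get("r1", "f1", "c1")                # [(4, 'v4')]
history = table.get("r1", "f1", "c1", versions=5)   # [(4, 'v4'), (3, 'v3'), (2, 'v2')]
```

Four versions were written, but only the three newest survive, and a plain get returns the most recent one.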
HBase stores its data on a distributed file system that provides a single namespace. A table is made up of several smaller regions, and a server that hosts regions is called a RegionServer. The maximum size of a single region is set by the configuration parameter hbase.hregion.max.filesize; when a region grows larger than this value, it is split into 2 regions.
HBase is a database built on Hadoop, and it relies on Hadoop for data access and data reliability. HBase is an online system that targets low latency, while Hadoop is an offline system optimized for throughput. The two complement each other, making it possible to build horizontally scalable data applications.
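The split rule can be illustrated with a toy function. It is a sketch only: the threshold here counts rows, whereas hbase.hregion.max.filesize is a size in bytes, and splitting at the middle key is an assumption made for illustration.

```python
def split_if_needed(region_rows, max_rows):
    """region_rows is a sorted list of row keys; return one or two regions."""
    if len(region_rows) <= max_rows:
        return [region_rows]              # still small enough, keep one region
    mid = len(region_rows) // 2           # assumed split point: the middle key
    return [region_rows[:mid], region_rows[mid:]]


regions = split_if_needed(["a", "b", "c", "d", "e"], max_rows=4)
```

Each resulting region covers a contiguous, sorted range of row keys, so a lookup only needs to find the region whose range contains the key.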
Tables in HBase are stored by column family.
Create a table with 3 column families:
create 't1', {NAME => 'f1', VERSIONS => 1}, {NAME => 'f2', VERSIONS => 1}, {NAME => 'f3', VERSIONS => 1}
When you define a table, you only need to specify the column family names; column names are specified dynamically at put time.
Inserting data
Insert without specifying a column name:
put 't1', 'r1', 'f1', 'v1'
put 't1', 'r2', 'f2', 'v2'
put 't1', 'r3', 'f3', 'v3'
Insert with a column name specified:
put 't1', 'r4', 'f1:c1', 'v1'
put 't1', 'r5', 'f2:c2', 'v2'
put 't1', 'r6', 'f3:c3', 'v3'
hbase(main):245:0> scan 't1'
ROW COLUMN+CELL
r1 column=f1:, timestamp=1335407967324, value=v1
r2 column=f2:, timestamp=1335408004559, value=v2
r4 column=f1:c1, timestamp=1335408640777, value=v1
r5 column=f2:c1, timestamp=1335408640822, value=v2
r6 column=f1:c6, timestamp=1335412392258, value=v3
r6 column=f2:c1, timestamp=1335412384739, value=v3
r6 column=f2:c2, timestamp=1335412374797, value=v3
Inserting data into multiple columns
put 't1', 'r7', 'f1:c4', 'v9'
put 't1', 'r7', 'f2:c3', 'v9'
put 't1', 'r7', 'f3:c2', 'v9'
Manually flush the MemStore to an HFile:
flush 't1'
Delete all f3 data:
deleteall 't1', 'r7'
flush 't1'
Every flush creates a new HFile.
$ ../bin/hadoop dfs -lsr /hbase/t1
Data is stored directly under each column family directory; each column family directory holds 3 to 4 HFiles:
f1
f1/098a7a13fa53415b8ff7c73d4d69c869
f1/321c6211383f48dd91e058179486587e
f1/9722a9be0d604116882115153e2e86b3
f2
f2/43561825dbde4900af4fb388040c24dd
f2/93a20c69fdec43e8beeed31da8f87b8d
f2/b2b126443bbe4b6892fef3406d6f9597
f3
f3/98352b1b34e242ecac72f5efa8f66963
f3/e76ed1b564784799affa59fea349e00d
f3/f9448a9a381942e7b785e0983a66f006
f3/fca4c36e48934f2f9aaf1a585c237d44
The f3 data has been deleted, but its files are still present because the HFiles have not been merged yet.
Manually merge the HFiles:
hbase(main):244:0> compact 't1'
0 row(s) in 0.0550 seconds
$ ../bin/hadoop dfs -lsr /hbase/t1
f1
f1/00c05ba881a14ca0bdea55ab509c2327
f2
f2/95fbe85769d64fc4b291cabe73b1ddb2
f3
f1 and f2 now each contain a single HFile; f3 contains no HFile because its data was deleted.
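Why deleted data lingers until the files are merged can be sketched with a toy model. The assumptions are loud ones: HFiles are plain dictionaries, a delete is stored as a marker (tombstone) in a newer file, and the merge rewrites every file and drops tombstoned cells, as a major compaction does.

```python
TOMBSTONE = object()   # delete marker written by a delete, flushed like any other entry


def compact(hfiles):
    """Merge HFiles (oldest first): newer entries win, tombstoned cells are dropped."""
    merged = {}
    for hfile in hfiles:
        merged.update(hfile)
    live = {key: value for key, value in merged.items() if value is not TOMBSTONE}
    return [live] if live else []      # no live cells means no file at all


# Files per column family before the merge (oldest first).
f1_files = [
    {("r4", "f1:c1"): "v1"},
    {("r7", "f1:c4"): "v9"},
    {("r7", "f1:c4"): TOMBSTONE},
]
f3_files = [
    {("r7", "f3:c2"): "v9"},
    {("r7", "f3:c2"): TOMBSTONE},
]

f1_after = compact(f1_files)   # one file holding the surviving cell
f3_after = compact(f3_files)   # empty: every cell was deleted
```

This mirrors the listing above: the family with surviving cells ends up with a single file, and the family whose cells were all deleted ends up with none.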
A put writes only one column at a time.
A delete removes only one column at a time.
To delete an entire row, use deleteall:
deleteall 't1', 'r1'
HBase table Design:
HBase tables are flexible and can store anything as a byte array. Store everything with a similar access pattern in the same column family.
The index is built on the key part of the KeyValue object; the key consists of the row key, the column qualifier, and the timestamp, in that order. Tall tables may let you reduce the cost of an operation toward O(1), but you pay for it in atomicity.
HBase does not support cross-row transactions. Column qualifiers can themselves be used to store data. The length of the column family name affects the size of the data sent back to the client over the network (it is part of every KeyValue object), so keep it as short as possible.
Hashing gives fixed-length keys and better data distribution, but loses the benefit of sorting. When designing an HBase schema, denormalization is a viable approach. From a performance standpoint, normalization optimizes for writes, while denormalization optimizes for reads.
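The hashing trade-off can be sketched as follows. MD5 and the sample keys are assumptions chosen for illustration; the original text does not name a hash function.

```python
import hashlib


def hashed_key(natural_key):
    """Fixed-length row key derived from a variable-length natural key."""
    return hashlib.md5(natural_key.encode("utf-8")).hexdigest()


keys = ["user1", "user2", "a-much-longer-user-name"]   # hypothetical natural keys
hashed = [hashed_key(key) for key in keys]

for key, row_key in zip(keys, hashed):
    print(len(row_key), row_key, key)   # every hashed key is 32 hex characters long
```

Every hashed key has the same length and the values spread evenly over the key space, but the sort order of the hashes is unrelated to the order of the natural keys, so a range scan over adjacent natural keys is no longer possible.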
Enter the HBase shell console
$HBASE_HOME/bin/hbase shell
If Kerberos authentication is enabled, first authenticate with the appropriate keytab (using the kinit command), and start the HBase shell once authentication succeeds. You can use the whoami command to view the current user:
hbase(main)> whoami
Table management
1) List all created tables (the -ROOT- and .META. tables are filtered out)
hbase(main)> list
2) Create a table, where t1 is the table name and f1 and f2 are the column families of t1. A table in HBase has at least one column family. The column family directly affects the physical characteristics of HBase data storage.
# Syntax: create <table>, {NAME => <family>, VERSIONS => <VERSIONS>}
# For example: create table t1 with two column families, f1 and f2, each keeping 2 versions
hbase(main)> create 't1', {NAME => 'f1', VERSIONS => 2}, {NAME => 'f2', VERSIONS => 2}
3) Delete a table
Two steps: first disable, then drop
# For example: delete table t1
hbase(main)> disable 't1'
hbase(main)> drop 't1'
4) View the structure of a table
# Syntax: describe (or desc) <table> (shows all the default parameters of the table)
# For example: view the structure of table t1
hbase(main)> describe 't1'   (or: desc 't1')
5) Modify the table structure
The table must be disabled before its structure is modified.
# Syntax: alter 't1', {NAME => 'f1'}, {NAME => 'f2', METHOD => 'delete'}
# For example: set the TTL of the column families of table test1 to 180 days
hbase(main)> disable 'test1'
hbase(main)> alter 'test1', {NAME => 'body', TTL => '15552000'}, {NAME => 'meta', TTL => '15552000'}
hbase(main)> enable 'test1'
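The TTL in the alter example is expressed in seconds. A quick check of the conversion from 180 days (the helper name is made up for illustration):

```python
def days_to_ttl_seconds(days):
    """Convert a retention period in days to a TTL value in seconds."""
    return days * 24 * 60 * 60


ttl = days_to_ttl_seconds(180)
print(ttl)   # 15552000
```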
Permission management
1) Grant permissions
# Syntax: grant <user>, <permissions>, <table>, <column family>, <column qualifier> (parameters are separated by commas)
# Permissions are expressed with five letters: "RWXCA"
# READ('R'), WRITE('W'), EXEC('X'), CREATE('C'), ADMIN('A')
# For example: grant user 'test' read and write permission on table t1
hbase(main)> grant 'test', 'RW', 't1'
2) View permissions
# Syntax: user_permission <table>
# For example: view the list of permissions on table t1
hbase(main)> user_permission 't1'
3) Revoke permissions
# Similar to granting permissions; syntax: revoke <user>, <table>, <column family>, <column qualifier>
# For example: revoke user test's permissions on table t1
hbase(main)> revoke 'test', 't1'
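The permission letters used by grant map to actions as listed above. A small sketch of that mapping, handy for validating a permission string before passing it to grant (the function name is made up for illustration):

```python
ACTIONS = {"R": "READ", "W": "WRITE", "X": "EXEC", "C": "CREATE", "A": "ADMIN"}


def expand_permissions(letters):
    """Translate a permission string such as 'RW' into the actions it grants."""
    unknown = [ch for ch in letters if ch not in ACTIONS]
    if unknown:
        raise ValueError("unknown permission letters: %s" % "".join(unknown))
    return [ACTIONS[ch] for ch in letters]


granted = expand_permissions("RW")   # ['READ', 'WRITE']
```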
Adding, querying, and deleting table data
1) Add data
# Syntax: put <table>, <rowkey>, <family:column>, <value>, <timestamp>
# For example: add a row to table t1 with rowkey rowkey001, family name f1, column name col1, value value01, and the system default timestamp
hbase(main)> put 't1', 'rowkey001', 'f1:col1', 'value01'
The usage is fairly simple.
2) Query data
a) Query a single row
# Syntax: get <table>, <rowkey>, [<family:column>, ...]
# For example: query the value of col1 under f1 in row rowkey001 of table t1
hbase(main)> get 't1', 'rowkey001', 'f1:col1'
Or:
hbase(main)> get 't1', 'rowkey001', {COLUMN => 'f1:col1'}
# Query all column values of row rowkey001 in table t1
hbase(main)> get 't1', 'rowkey001'
b) Scan a table
# Syntax: scan <table>, {COLUMNS => [<family:column>, ...], LIMIT => num}
# In addition, you can add advanced options such as STARTROW, TIMERANGE, and FILTER
# For example: scan the first 5 rows of table t1
hbase(main)> scan 't1', {LIMIT => 5}
c) Count the rows in a table
# Syntax: count <table>, {INTERVAL => intervalNum, CACHE => cacheNum}
# INTERVAL sets how many rows pass between progress reports, each showing the count and the current rowkey (default 1000); CACHE is the number of rows fetched per request (default 10), and raising it can speed up the count
# For example: count the rows in table t1, reporting every 100 rows, with a cache of 500
hbase(main)> count 't1', {INTERVAL => 100, CACHE => 500}
3) Delete data
a) Delete a column value in a row
# Syntax: delete <table>, <rowkey>, <family:column>, <timestamp> (the column name is required)
# For example: delete the f1:col1 data of row rowkey001 in table t1
hbase(main)> delete 't1', 'rowkey001', 'f1:col1'
Note: all versions of the f1:col1 column in that row will be deleted.
b) Delete a row
# Syntax: deleteall <table>, <rowkey>, <family:column>, <timestamp> (omit the column name to delete the entire row)
# For example: delete the data of row rowkey001 in table t1
hbase(main)> deleteall 't1', 'rowkey001'
c) Delete all data in the table
# Syntax: truncate <table>
# The actual process is: disable table -> drop table -> create table
# For example: delete all data from table t1
hbase(main)> truncate 't1'
Region management
1) Move a region
# Syntax: move 'encodeRegionName', 'ServerName'
# encodeRegionName is the encoded suffix at the end of the region name; ServerName is taken from the Region Servers list on the master status page
# For example:
hbase(main)> move '4343995a58be8e5bbc739af1e91cd72d', 'db-41.xxx.xxx.org,60020,1390274516739'
2) Enable/disable the region balancer
# Syntax: balance_switch true|false
hbase(main)> balance_switch true
3) Manual split
# Syntax: split 'regionName', 'splitKey'
4) Manually trigger a major compaction
# Syntax:
# Compact all regions in a table:
# hbase> major_compact 't1'
# Compact an entire region:
# hbase> major_compact 'r1'
# Compact a single column family within a region:
# hbase> major_compact 'r1', 'c1'
# Compact a single column family within a table:
# hbase> major_compact 't1', 'c1'
Configuration management and node restart
1) Modify the HDFS configuration
HDFS configuration location: /etc/hadoop/conf
# Sync the HDFS configuration
cat /home/hadoop/slaves | xargs -i -t scp /etc/hadoop/conf/hdfs-site.xml hadoop@{}:/etc/hadoop/conf/hdfs-site.xml
# Stop:
cat /home/hadoop/slaves | xargs -i -t ssh hadoop@{} "sudo /home/hadoop/cdh4/hadoop-2.0.0-cdh4.2.1/sbin/hadoop-daemon.sh --config /etc/hadoop/conf stop datanode"
# Start:
cat /home/hadoop/slaves | xargs -i -t ssh hadoop@{} "sudo /home/hadoop/cdh4/hadoop-2.0.0-cdh4.2.1/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode"
2) Modify HBase configuration
HBase Configuration Location:
# Sync the HBase configuration
cat /home/hadoop/hbase/conf/regionservers | xargs -i -t scp /home/hadoop/hbase/conf/hbase-site.xml hadoop@{}:/home/hadoop/hbase/conf/hbase-site.xml
# Graceful restart
cd ~/hbase
bin/graceful_stop.sh --restart --reload --debug inspurXXX.xxx.xxx.org
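The sync commands above rely on xargs -i running one command per host, with {} replaced by the host name read from the file. A small sketch of that expansion; the host names are hypothetical and nothing is executed, the commands are only built as strings:

```python
def expand_per_host(hosts, template):
    """Substitute each host for {} in the template, one command per input line."""
    return [template.replace("{}", host) for host in hosts]


hosts = ["node1.example.org", "node2.example.org"]   # hypothetical slaves file contents
commands = expand_per_host(
    hosts,
    "scp /etc/hadoop/conf/hdfs-site.xml hadoop@{}:/etc/hadoop/conf/hdfs-site.xml",
)
for command in commands:
    print(command)
```

Printing the expanded commands first is a cheap way to review what would run on each node before pushing a configuration change.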