1. Introduction
HBase is a distributed, column-oriented, open-source database. It originated from Google's paper "Bigtable: A Distributed Storage System for Structured Data" and is an open-source implementation of Google Bigtable. It uses Hadoop HDFS as its file storage system, Hadoop MapReduce to process massive data stored in HBase, and ZooKeeper as its coordination service.
2. HBase table structure
HBase stores data in tables. A table consists of rows and columns, and the columns are grouped into column families.
| Row key | Column-family1:Column1 | Column-family1:Column2 | Column-family2:Column1 | Column-family2:Column2 | Column-family2:Column3 | Column-family3:Column1 |
|---|---|---|---|---|---|---|
| Key1 | T1: ABC, T2: gdxdf | | T4: dfads, T3: Hello, T2: World | | | |
| Key2 | T3: ABC, T1: gdxdf | | T4: dfads, T3: Hello | | T2: dfdsfa, T3: dfdf | |
| Key3 | | T2: dfadfasd, T1: dfdasddsf | | | | T2: dfxxdfasd, T1: taobao.com |
As the table shows, key1, key2, and key3 are the unique row keys of three records, and column-family1, column-family2, and column-family3 are three column families, each containing several columns. For example, column-family1 contains two columns named column1 and column2. "T1: ABC, T2: gdxdf" is the cell uniquely identified by row key key1 and the column column-family1:column1. This cell holds two versions of its data, ABC and gdxdf, with different timestamps T1 and T2; by default HBase returns the value with the latest timestamp to the requester.
The specific meanings of these terms are as follows:
(1) Row key
As in other NoSQL databases, the row key is the primary key used to retrieve records. There are only three ways to access rows in an HBase table:
(1.1) Access through a single row key
(1.2) Access through a range of row keys
(1.3) Full table scan
The row key can be any string. The maximum length is 64 KB, but in practice keys are usually much shorter, on the order of tens of bytes. Inside HBase, the row key is stored as a byte array.
Data is stored in the lexicographic (byte) order of the row key. When designing keys, take advantage of this ordering so that rows that are frequently read together are stored close to each other (locality).
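The three access paths can be illustrated with a small in-memory model. This is only a sketch, not the HBase client API: the table contents and the function names get, range_scan, and full_scan are hypothetical, and the range is assumed to include the start key and exclude the stop key.

```python
import bisect

# Hypothetical stand-in for a table: row keys are byte strings.
rows = {
    b"user003": "carol",
    b"user001": "alice",
    b"user010": "judy",
    b"user002": "bob",
}
# Keys are kept in byte (lexicographic) order, as described above.
sorted_keys = sorted(rows)


def get(key):
    """(1.1) Access through a single row key."""
    return rows.get(key)


def range_scan(start, stop):
    """(1.2) Access through a key range: start inclusive, stop exclusive."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return [(k, rows[k]) for k in sorted_keys[lo:hi]]


def full_scan():
    """(1.3) Full table scan, visiting rows in key order."""
    return [(k, rows[k]) for k in sorted_keys]


print(get(b"user002"))
print(range_scan(b"user002", b"user010"))
print(full_scan())
```

Because the keys are sorted, the range scan only has to locate two positions and read the rows between them, which is why keys that are read together should sort together.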
Note: In lexicographic order, the integers 1 to 100 sort as 1, 10, 100, 11, 12, ..., 19, 2, 20, 21, ..., 9, 90, 91, ..., 99. To preserve the natural numeric order, left-pad the row keys with zeros.
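The effect of zero-padding can be checked with a few lines of Python. This is a minimal sketch: the sample integers and the fixed width of 3 digits are assumptions chosen for illustration, and the width must be large enough for the biggest key you expect.

```python
# Row keys are compared byte by byte, so encode the integers as bytes first.
ids = [1, 2, 9, 10, 11, 21, 100]
WIDTH = 3  # assumed maximum number of digits

plain_keys = sorted(str(n).encode("ascii") for n in ids)
padded_keys = sorted(str(n).zfill(WIDTH).encode("ascii") for n in ids)

print(plain_keys)   # [b'1', b'10', b'100', b'11', b'2', b'21', b'9']
print(padded_keys)  # [b'001', b'002', b'009', b'010', b'011', b'021', b'100']
```

With padding, byte order and numeric order agree, so a range scan over the keys returns rows in numeric order.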
Each read or write of a row is atomic, no matter how many columns are read or written at once. This design decision makes it easy to reason about a program's behavior when the same row is updated concurrently.
(2) Column family
Each column in an HBase table belongs to a column family. Column families are part of the table's schema (individual columns are not) and must be defined before the table is used. All column names are prefixed with their column family. For example, courses:history and courses:math both belong to the courses column family.
Access control and disk and memory usage accounting are all performed at the column family level. In practice, permissions on column families help manage different kinds of applications: some applications are allowed to add new base data, some are allowed to read base data and create derived column families, and some are only allowed to browse data (and, for privacy reasons, possibly not even all of it).
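The family prefix convention can be illustrated by splitting column names at the first colon. This is a small sketch; the helper names split_column and group_by_family are hypothetical and are not part of HBase.

```python
from collections import defaultdict


def split_column(name):
    """Split 'family:qualifier' into its two parts.

    The qualifier may be empty, as in 'grade:'.
    """
    family, _, qualifier = name.partition(":")
    return family, qualifier


def group_by_family(column_names):
    """Group column qualifiers under the column family they belong to."""
    families = defaultdict(list)
    for name in column_names:
        family, qualifier = split_column(name)
        families[family].append(qualifier)
    return dict(families)


print(split_column("courses:history"))
print(group_by_family(["courses:history", "courses:math", "grade:"]))
```

Grouping by family mirrors how settings such as permissions apply to a whole family rather than to a single column.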
(3) Cell
In HBase, the storage unit identified by a row and a column is called a cell. A cell is uniquely identified by {row key, column (= <family> + <label>), version}. The data in a cell has no type; it is always stored as raw bytes.
(4) Timestamp
Each cell can store multiple versions of the same data, indexed by timestamp. The timestamp is a 64-bit integer. It can be assigned by HBase automatically when the data is written, in which case it is the current system time in milliseconds, or it can be assigned explicitly by the client. An application that assigns its own timestamps must generate unique ones to avoid version conflicts. Within each cell, versions are sorted in reverse chronological order, so the latest data comes first.
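The versioning behavior can be modeled in a few lines. This is a toy in-memory sketch, not how HBase stores data: the class VersionedStore and its methods are hypothetical, and the timestamps 1 and 2 stand in for T1 and T2.

```python
import time
from collections import defaultdict


class VersionedStore:
    """Toy model: (row key, column) -> {timestamp: value}."""

    def __init__(self):
        self._cells = defaultdict(dict)

    def put(self, row, column, value, timestamp=None):
        # Without an explicit timestamp, use the current time in milliseconds.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        # Writing the same timestamp twice overwrites the earlier value,
        # which is why client-assigned timestamps must be unique.
        self._cells[(row, column)][timestamp] = value
        return timestamp

    def versions(self, row, column):
        """All versions of a cell, newest first."""
        cell = self._cells.get((row, column), {})
        return sorted(cell.items(), reverse=True)

    def get(self, row, column):
        """The value with the latest timestamp, or None if the cell is empty."""
        all_versions = self.versions(row, column)
        return all_versions[0][1] if all_versions else None


store = VersionedStore()
store.put(b"key1", "column-family1:column1", b"ABC", timestamp=1)
store.put(b"key1", "column-family1:column1", b"gdxdf", timestamp=2)

print(store.get(b"key1", "column-family1:column1"))
print(store.versions(b"key1", "column-family1:column1"))
```

Here the read returns gdxdf because it carries the later timestamp, while the older value ABC is still available as an earlier version.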
To limit the storage and indexing burden of keeping too many versions, HBase provides two ways to reclaim old versions: keep only the last n versions of the data, or keep only the versions written within a recent period (for example, the last seven days). Both can be set per column family.
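The two retention policies can be sketched as simple filters over a cell's versions. This is an illustration under assumptions, not HBase's actual cleanup logic: the function names keep_last_n and keep_recent are hypothetical, and the exact boundary handling for expiry is a choice made for this example.

```python
def keep_last_n(versions, n):
    """Keep only the n newest versions of a cell.

    versions maps timestamp (ms) -> value.
    """
    newest = sorted(versions, reverse=True)[:n]
    return {ts: versions[ts] for ts in newest}


def keep_recent(versions, now_ms, max_age_ms):
    """Keep only the versions written within the last max_age_ms milliseconds."""
    return {ts: v for ts, v in versions.items() if now_ms - ts <= max_age_ms}


SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000
now = 1_000_000_000_000  # arbitrary "current time" in milliseconds

cell = {
    now - 10 * 24 * 60 * 60 * 1000: b"ten days old",
    now - 3 * 24 * 60 * 60 * 1000: b"three days old",
    now - 1000: b"one second old",
}

print(keep_last_n(cell, 2))
print(keep_recent(cell, now, SEVEN_DAYS_MS))
```

In this sample both policies happen to drop the ten-day-old version, but in general they differ: the first bounds the count of versions, the second bounds their age.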
3. Basic usage of the HBase shell
HBase provides a shell for interactive use. Run help to list the available commands, or, for example, help 'get' to view the help for the get command.
The usage of HBase is demonstrated below with a student score table.
| name | grade | course:math | course:art |
|---|---|---|---|
| zkb | 5 | 97 | 87 |
| baoniu | 4 | 89 | 80 |
Here, grade is a column family of the table whose only column has an empty name, and course is a column family that consists of two columns, math and art. We can create more columns in the course column family as needed, such as computer and physics. As the grade values show, a column in a column family can also have an empty name.
(1) Create the table scores with two column families, grade and course
hbase(main):001:0> create 'scores', 'grade', 'course'
0 row(s) in 0.4780 seconds
(2) View the tables in the current HBase instance (the list command)
1 row(s) in 0.0270 seconds
(3) View the table structure
hbase(main):004:0> describe 'scores'
{NAME => 'scores', FAMILIES => [{NAME => 'course', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',    true
COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}, {NAME => 'grade', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE',
VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
1 row(s) in 0.0390 seconds
(4) Add a row with row key zkb. In the grade column family, the column name is empty ("") and the value is 5
hbase(main):006:0> put 'scores', 'zkb', 'grade:', '5'
0 row(s) in 0.0420 seconds
(5) Add the column math with value 97 to the course column family of row zkb
hbase(main):007:0> put 'scores', 'zkb', 'course:math', '97'
0 row(s) in 0.0270 seconds
(6) Add the column art with value 87 to the course column family of row zkb
hbase(main):008:0> put 'scores', 'zkb', 'course:art', '87'
0 row(s) in 0.0260 seconds
(7) Add a row with row key baoniu. In the grade column family, the column name is empty ("") and the value is 4
hbase(main):009:0> put 'scores', 'baoniu', 'grade:', '4'
0 row(s) in 0.0260 seconds
(8) Add the column math with value 89 to the course column family of row baoniu
hbase(main):010:0> put 'scores', 'baoniu', 'course:math', '89'
0 row(s) in 0.0270 seconds
(9) Add the column art with value 80 to the course column family of row baoniu
hbase(main):011:0> put 'scores', 'baoniu', 'course:art', '80'
0 row(s) in 0.0270 seconds
(10) View the data of row zkb in the scores table
hbase(main):012:0> get 'scores', 'zkb'
course:art timestamp=1316100110921, value=87
course:math timestamp=1316100025944, value=97
grade: timestamp=1316099975625, value=5
3 row(s) in 0.0480 seconds
(11) View all data in the scores table
Note: The scan command can take STARTROW and STOPROW to scan a range of rows, for example: scan 'user_test', {COLUMNS => 'info:username', LIMIT => 10, STARTROW => 'test', STOPROW => 'test2'}
hbase(main):013:0> scan 'scores'
baoniu column=course:art, timestamp=1316100293784, value=80
baoniu column=course:math, timestamp=1316100234410, value=89
baoniu column=grade:, timestamp=1316100178609, value=4
zkb column=course:art, timestamp=1316100110921, value=87
zkb column=course:math, timestamp=1316100025944, value=97
zkb column=grade:, timestamp=1316099975625, value=5
2 row(s) in 0.0470 seconds
(12) View all data in the course column family of the scores table
hbase(main):017:0> scan 'scores', {COLUMNS => 'course'}
baoniu column=course:art, timestamp=1316100293784, value=80
baoniu column=course:math, timestamp=1316100234410, value=89
zkb column=course:art, timestamp=1316100110921, value=87
zkb column=course:math, timestamp=1316100025944, value=97
2 row(s) in 0.0350 seconds
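To make the scan options concrete, the following sketch models them over an in-memory copy of the scores data. It is not the HBase client API: the scan function and its parameters are hypothetical and only mimic the shell's COLUMNS, STARTROW, STOPROW, and LIMIT options, under the assumptions that a column spec without a colon selects a whole family, that the stop row is exclusive, and that the limit counts rows.

```python
# Hypothetical in-memory copy of the scores table: row key -> {column: value}.
scores = {
    "zkb": {"grade:": "5", "course:math": "97", "course:art": "87"},
    "baoniu": {"grade:": "4", "course:math": "89", "course:art": "80"},
}


def _matches(column, spec):
    """A spec with ':' names one column; without ':' it names a whole family."""
    if ":" in spec:
        return column == spec
    return column.split(":", 1)[0] == spec


def scan(table, columns=None, startrow=None, stoprow=None, limit=None):
    """Return (row, column, value) cells in row key order."""
    cells = []
    rows_returned = 0
    for row in sorted(table):
        if startrow is not None and row < startrow:
            continue  # before the start row (inclusive bound)
        if stoprow is not None and row >= stoprow:
            break  # reached the stop row (exclusive bound)
        if limit is not None and rows_returned >= limit:
            break  # enough rows returned
        selected = [
            (row, column, value)
            for column, value in sorted(table[row].items())
            if columns is None or any(_matches(column, spec) for spec in columns)
        ]
        if selected:
            cells.extend(selected)
            rows_returned += 1
    return cells


for cell in scan(scores, columns=["course"]):
    print(cell)
```

Calling scan(scores, columns=["course"]) yields the same four cells, in the same order, as the shell output of step (12).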
(13) Delete the scores table
hbase(main):024:0> disable 'scores'
0 row(s) in 0.0330 seconds
hbase(main):025:0> drop 'scores'
0 row(s) in 1.0840 seconds
To sum up, commonly used HBase shell commands include create, describe, disable, drop, list, scan, put, get, delete, deleteall, count, and status; you can see their detailed usage through help.
This article is reprinted from: http://archive.cnblogs.com/a/2178064/