We used to use MongoDB, but as the data volume grew it proved unreliable, so we switched to HBase to support an order of magnitude more data.
HBase is a database in the Apache Hadoop ecosystem that provides random, real-time read/write access to large datasets. Its goal is to store and process big data. HBase is an open-source, distributed, multi-version, column-oriented store, and it holds sparse (loosely structured) data.
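To make the "column-oriented, sparse" description concrete, here is a minimal in-memory sketch of the data model. This is illustrative only, not real HBase code; the table layout and column names are made-up sample data:

```python
from collections import defaultdict

# A toy sketch of HBase's sparse table model: a row stores only the
# columns that were actually written, so different rows of the same
# table can hold completely different column sets.
class SparseTable:
    def __init__(self):
        # row key -> {"family:qualifier": value}
        self.rows = defaultdict(dict)

    def put(self, row, column, value):
        self.rows[row][column] = value

    def get(self, row):
        # a missing row comes back empty rather than raising
        return dict(self.rows.get(row, {}))

t = SparseTable()
t.put("1", "info:editor", "liudehua")
t.put("2", "content:header", "this is header")  # row 2 has no info: columns
```

Because absent columns simply are not stored, a table with thousands of possible qualifiers costs nothing for rows that use only a few of them.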
HBase provides a rich set of access interfaces:
HBase Shell
Java client API
Jython, Groovy DSL, Scala
REST
Thrift (Ruby, Python, Perl, C++, ...)
MapReduce
Hive/Pig
hbase(main):001:0>
# create a table
hbase(main):002:0* create 'blog', 'info', 'content'
0 row(s) in 2.0290 seconds
# list tables
hbase(main):003:0> list
TABLE
blog
test_standalone
2 row(s) in 0.0270 seconds
# insert data
hbase(main):004:0> put 'blog', '1', 'info:editor', 'liudehua'
0 row(s) in 0.1340 seconds
hbase(main):005:0> put 'blog', '1', 'info:address', 'bj'
0 row(s) in 0.0070 seconds
hbase(main):006:0> put 'blog', '1', 'content:header', 'this is header'
0 row(s) in 0.0070 seconds
hbase(main):009:0> get 'blog', '1'
COLUMN                CELL
 content:header      timestamp=1407464302384, value=this is header
 info:address        timestamp=1407464281942, value=bj
 info:editor         timestamp=1407464270098, value=liudehua
3 row(s) in 0.0360 seconds
hbase(main):010:0> get 'blog', '1', 'info'
COLUMN                CELL
 info:address        timestamp=1407464281942, value=bj
 info:editor         timestamp=1407464270098, value=liudehua
2 row(s) in 0.0120 seconds
# scan can also take conditions (e.g. STARTROW, STOPROW, FILTER)
hbase(main):012:0> scan 'blog'
ROW                   COLUMN+CELL
 1                   column=content:header, timestamp=1407464302384, value=this is header
 1                   column=info:address, timestamp=1407464281942, value=bj
 1                   column=info:editor, timestamp=1407464270098, value=liudehua
1 row(s) in 0.0490 seconds
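As a rough illustration of why scan can take range conditions: HBase keeps rows sorted by row key, so a scan is an ordered walk that can be bounded. The sketch below is hypothetical Python, not HBase internals, and the row keys and values are invented:

```python
# Rows held sorted by row key; a bounded scan mirrors the shell's
# scan 'blog', {STARTROW => ..., STOPROW => ...} (stop row exclusive).
table = {
    "1": {"info:editor": "liudehua"},
    "2": {"info:editor": "meiyanfang"},
    "3": {"info:editor": "guofucheng"},
}

def scan(table, startrow=None, stoprow=None):
    """Yield (row, columns) in sorted row-key order within [startrow, stoprow)."""
    for row in sorted(table):
        if startrow is not None and row < startrow:
            continue
        if stoprow is not None and row >= stoprow:
            break
        yield row, table[row]
```

Because the data is already ordered, bounding a scan costs a seek plus a sequential read rather than a full-table filter.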
hbase(main):014:0* put 'blog', '1', 'content:header', 'this is header2'
0 row(s) in 0.0080 seconds
hbase(main):017:0* put 'blog', '1', 'content:header', 'this is header3'
0 row(s) in 0.0050 seconds
hbase(main):018:0> scan 'blog'
ROW                   COLUMN+CELL
 1                   column=content:header, timestamp=1407464457128, value=this is header3
 1                   column=info:address, timestamp=1407464281942, value=bj
 1                   column=info:editor, timestamp=1407464270098, value=liudehua
1 row(s) in 0.0180 seconds
hbase(main):020:0> get 'blog', '1', 'content:header'
COLUMN                CELL
 content:header      timestamp=1407464457128, value=this is header3
1 row(s) in 0.0090 seconds
# view historical versions
hbase(main):022:0* get 'blog', '1', {COLUMN => 'content:header', VERSIONS => 2}
COLUMN                CELL
 content:header      timestamp=1407464457128, value=this is header3
 content:header      timestamp=1407464454648, value=this is header2
2 row(s) in 0.0100 seconds
# view historical versions
hbase(main):023:0> get 'blog', '1', {COLUMN => 'content:header', VERSIONS => 3}
COLUMN                CELL
 content:header      timestamp=1407464457128, value=this is header3
 content:header      timestamp=1407464454648, value=this is header2
 content:header      timestamp=1407464302384, value=this is header
3 row(s) in 0.0490 seconds
hbase(main):024:0>
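The multi-version behaviour above can be sketched in a few lines of Python: each cell accumulates timestamped versions, and a read with VERSIONS => n returns the n newest. This is a toy model only; real HBase also caps versions per column family and prunes old ones at compaction time:

```python
from bisect import insort

# A toy versioned cell: puts accumulate (timestamp, value) pairs and a
# read returns the newest n values, like get ... {VERSIONS => n}.
class VersionedCell:
    def __init__(self):
        self._versions = []  # (timestamp, value), kept sorted ascending

    def put(self, timestamp, value):
        insort(self._versions, (timestamp, value))

    def get(self, versions=1):
        # newest first, as the shell prints them
        return [value for _, value in reversed(self._versions)][:versions]

cell = VersionedCell()
cell.put(1407464302384, "this is header")
cell.put(1407464454648, "this is header2")
cell.put(1407464457128, "this is header3")
```

A plain get returns only the latest value, which is why the earlier scans showed just "this is header3" until VERSIONS was requested explicitly.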
Operating HBase from Java is the most direct and efficient way, but Java is not lightweight and is inconvenient to debug in some environments, and developers differ in which languages they know and how productive they are in them. Through Thrift, HBase can also be operated from Python, Ruby, C++, Perl, and other languages.
Thrift is a remote-call component similar to Google's Protobuf, developed and open-sourced by Facebook. Protobuf, however, only handles data serialization, supports only a binary protocol, and has no remote-call part; it natively supports C++, Python, and Java, with third-party implementations for Objective-C, Ruby, and other languages. Thrift implements serialization, transport, protocol definition, and remote invocation, giving it stronger cross-language capabilities. In some respects the two can substitute for each other, but each has its own scope of application.
Install Thrift and the Thrift Python module:
wget http://www.apache.org/dist//thrift/0.9.1/thrift-0.9.1.tar.gz
tar zxvf thrift-0.9.1.tar.gz
cd thrift-0.9.1
./configure --with-cpp=no
make
sudo make install
sudo pip install thrift
The following command generates the Python Thrift bindings for HBase from the Hbase.thrift definition shipped with the HBase source:
thrift --gen py /home/ubuntu/hbase-0.98.1/hbase-thrift/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift
from thrift.transport import TSocket
from thrift.protocol import TBinaryProtocol
from hbase import Hbase  # generated into gen-py/hbase by `thrift --gen py`

# connect to the HBase Thrift server (default port 9090)
transport = TSocket.TSocket('localhost', 9090)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)
transport.open()
print(client.getTableNames())
transport.close()
The HBase 0.98 release does not seem to include the Thrift-related generated code, so I used the 0.94 version here instead.