1 Overview
HBase is a distributed, column-oriented, scalable open-source database built on Hadoop. Use HBase when you need random, real-time read and write access to large data sets; it belongs to the NoSQL family. HBase uses Hadoop HDFS as its file storage system, uses Hadoop MapReduce to process the massive data stored in HBase, and uses ZooKeeper for distributed coordination, distributed synchronization, and configuration management.
HBase architecture:
LSM trees - solve the random-write problem on disk (sequential writes win);
HFile - solves data indexing (reads are efficient only with an index);
WAL - solves data durability (persistence in the face of failures);
ZooKeeper - solves consistency of core metadata and cluster recovery;
Replication - introduces a MySQL-like data replication scheme to address availability.
In addition: automatic region splitting, automatic compaction (a technique associated with LSM trees), automatic load balancing, and automatic region migration.
An HBase cluster depends on a ZooKeeper ensemble: all nodes in the HBase cluster, as well as clients accessing HBase, must be able to reach the ensemble. HBase ships with ZooKeeper, but so that other applications can conveniently use ZooKeeper as well, it is better to run a separately installed ensemble. A ZooKeeper ensemble is typically configured with an odd number of nodes. The Hadoop cluster, the ZooKeeper ensemble, and the HBase cluster are three separate clusters; they do not need to be deployed on the same physical nodes, and they communicate with each other over the network.
2 Installation and Configuration
2.1 Download and install HBase
Download hbase-0.96.1.1-hadoop1-bin.tar.gz, unpack it under /usr, and rename the directory to hbase. The HBase version must match the Hadoop version; to check, see whether the version number of hbase/lib/hadoop-core-*.jar corresponds to the installed Hadoop. If it does not, you can copy Hadoop's hadoop-core jar into hbase/lib, although this does not guarantee a problem-free setup.
2.2 Setting environment variables
vim /etc/profile:
# set HBase path
export HBASE_HOME=/usr/hbase
export PATH=$PATH:$HBASE_HOME/bin
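The two export lines can be appended to /etc/profile and then loaded into the current shell. A minimal sketch (writing to a temporary file instead of the real /etc/profile, so it is safe to try anywhere; /usr/hbase is the install path assumed in this article):

```shell
# Sketch: the HBase environment settings from this section,
# written to a scratch file rather than /etc/profile.
profile=/tmp/hbase-profile.sh
cat > "$profile" <<'EOF'
# set HBase path
export HBASE_HOME=/usr/hbase
export PATH=$PATH:$HBASE_HOME/bin
EOF
. "$profile"        # load it, as you would 'source /etc/profile'
echo "$HBASE_HOME"  # should print /usr/hbase
```

After editing the real /etc/profile, run `source /etc/profile` (or log in again) so the variables take effect.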
2.3 Configure HBase
Edit the configuration file hbase-site.xml: vim /usr/hbase/conf/hbase-site.xml
Standalone:
<property>
  <name>hbase.rootdir</name>
  <value>file:///tmp/hbase-${user.name}/hbase</value>
</property>
Pseudo-distributed:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:9000/hbase</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
Fully distributed:
1) Configure hbase-site.xml:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://192.168.56.1:9000/hbase</value>
  <description>HBase data storage directory</description>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
  <description>The mode HBase runs in: false = standalone/pseudo-distributed, true = fully distributed</description>
</property>
<property>
  <name>hbase.master</name>
  <value>hdfs://192.168.56.1:60000</value>
  <description>Specifies the master location</description>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/var/lib/zookeeper</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>192.168.56.1,192.168.56.101,192.168.56.102,192.168.56.103,192.168.56.104</value>
  <description>Specifies the ZooKeeper ensemble</description>
</property>
<property>
  <name>hbase.master.info.bindAddress</name>
  <value>192.168.56.1</value>
  <description>The bind address for the HBase Master web UI</description>
</property>
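Putting the properties above together, the fully distributed hbase-site.xml can be generated with a heredoc. A minimal sketch with three of the properties (IP addresses are the examples from this article; the file is written to /tmp here rather than /usr/hbase/conf so the sketch is safe to run; add the remaining properties such as hbase.master the same way):

```shell
# Sketch: assemble a minimal fully distributed hbase-site.xml
# from the properties listed above. Adjust IPs to your cluster.
conf=/tmp/hbase-site.xml
cat > "$conf" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://192.168.56.1:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>192.168.56.1,192.168.56.101,192.168.56.102,192.168.56.103,192.168.56.104</value>
  </property>
</configuration>
EOF
grep -c '<property>' "$conf"  # prints 3: one per property element
```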
2) Edit the configuration file regionservers, listing one region server per line:
192.168.56.101
192.168.56.102
192.168.56.103
192.168.56.104
3) Set environment variables in hbase-env.sh:
export JAVA_HOME=/usr/java/jdk1.7.0_45/
export HBASE_CLASSPATH=/usr/hadoop/conf
export HBASE_HEAPSIZE=2048
export HBASE_MANAGES_ZK=false
Note:
JAVA_HOME is the Java installation directory. HBASE_CLASSPATH points to the directory holding the Hadoop configuration files, so that HBase can find the HDFS configuration; because in this article Hadoop and HBase are deployed on the same physical nodes, it points to the conf directory under the Hadoop installation path. HBASE_HEAPSIZE is in MB, defaults to 1000, and can be set according to need and the memory actually available. HBASE_MANAGES_ZK=false tells HBase to use an existing ZooKeeper ensemble rather than the one it ships with.
2.4 Copy to each node, then configure each node's environment variables
scp -r /usr/hbase <node-ip>:/usr
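The copy can be scripted over the node list in conf/regionservers. A sketch using the example IPs from this article; the scp commands are only echoed for preview (remove the `echo` to actually execute them):

```shell
# Sketch: distribute /usr/hbase to every node in the regionservers
# list. Uses a scratch copy of the list; point 'nodes' at
# /usr/hbase/conf/regionservers in a real deployment.
nodes=/tmp/regionservers
printf '%s\n' 192.168.56.101 192.168.56.102 192.168.56.103 192.168.56.104 > "$nodes"
while read -r node; do
  echo scp -r /usr/hbase "${node}:/usr"  # echoed dry run
done < "$nodes"
```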
3 Start and stop HBase
Start HBase: HDFS and ZooKeeper must be started in advance; the startup order is HDFS -> ZooKeeper -> HBase.
Start all nodes from server1: start-hbase.sh
Stop HBase: stop-hbase.sh
Connect to HBase and create tables using the shell: hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.96.1.1-hadoop1, rUnknown, Tue Dec 11:52:14 PST 2013
hbase(main):001:0>
View status:
hbase(main):001:0> status
4 servers, 0 dead, 2.2500 average load
4 Testing and Web viewing
4.1 Creating a Table test
Create a table named Sgt with a single column family named cf. You can list all tables to verify the creation, then insert some values.
hbase(main):003:0> create 'Sgt', 'cf'
0 row(s) in 1.2200 seconds
hbase(main):003:0> list
Sgt
1 row(s) in 0.0550 seconds
hbase(main):004:0> put 'Sgt', 'Row1', 'cf:a', 'value1'
0 row(s) in 0.0560 seconds
hbase(main):005:0> put 'Sgt', 'Row2', 'cf:b', 'value2'
0 row(s) in 0.0370 seconds
hbase(main):006:0> put 'Sgt', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0450 seconds
Verify the inserts by scanning the table:
hbase(main):005:0> scan 'Sgt'
Get a single row as follows:
hbase(main):008:0> get 'Sgt', 'Row1'
Disable and then drop the table to clean up what you just did:
hbase(main):012:0> disable 'Sgt'
0 row(s) in 1.0930 seconds
hbase(main):013:0> drop 'Sgt'
0 row(s) in 0.0770 seconds
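The interactive session above can also be run non-interactively: `hbase shell <file>` executes the commands in the file against the cluster. A sketch that only builds and prints such a command file (running it requires a live cluster):

```shell
# Sketch: batch the table test as a script for 'hbase shell'.
script=/tmp/sgt-test.hbase
cat > "$script" <<'EOF'
create 'Sgt', 'cf'
put 'Sgt', 'Row1', 'cf:a', 'value1'
scan 'Sgt'
disable 'Sgt'
drop 'Sgt'
exit
EOF
cat "$script"
# On a running cluster: hbase shell /tmp/sgt-test.hbase
```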
Exporting and importing
hbase org.apache.hadoop.hbase.mapreduce.Driver export Sgt Sgt
The exported table is written to a folder (here Sgt) under the current user's directory in the Hadoop file system. For example, a directory structure after an export might look like this:
hadoop dfs -ls
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2013-10-22 10:44 /user/hadoop/small
hadoop dfs -ls ./small
Found 3 items
-rw-r--r-- 2 hadoop supergroup 0 2013-10-22 10:44 /user/hadoop/small/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2013-10-22 10:44 /user/hadoop/small/_logs
-rw-r--r-- 2 hadoop supergroup 285 2013-10-22 10:44 /user/hadoop/small/part-m-00000
To import this table into HBase on another cluster, put part-m-00000 into that cluster's Hadoop file system, assuming the destination path is also:
/user/hadoop/small/
The target HBase must already contain a table with the same schema. Then import the data from Hadoop into HBase:
hbase org.apache.hadoop.hbase.mapreduce.Driver import Sgt part-m-00000
In this way the table's data can be imported into another HBase database.
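The whole transfer can be collected into one dry-run plan. A sketch that only writes and prints the command sequence (the commands themselves need live HDFS and HBase on both clusters; table and path names follow the example above):

```shell
# Sketch: the export/import workflow from the text, as a dry-run plan.
plan=/tmp/sgt-transfer.sh
cat > "$plan" <<'EOF'
hbase org.apache.hadoop.hbase.mapreduce.Driver export Sgt Sgt
# copy part-m-00000 into the target cluster's HDFS, e.g. /user/hadoop/small/
hbase org.apache.hadoop.hbase.mapreduce.Driver import Sgt part-m-00000
EOF
cat "$plan"
```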
4.2 Web View
The following web interfaces are used to access and monitor the running state of the Hadoop and HBase systems:
Daemon                        Default port   Configuration parameter
HDFS   NameNode               50070          dfs.http.address
       DataNode               50075          dfs.datanode.http.address
       SecondaryNameNode      50090          dfs.secondary.http.address
       Backup/Checkpoint node* 50105         dfs.backup.http.address
MR     JobTracker             50030          mapred.job.tracker.http.address
       TaskTracker            50060          mapred.task.tracker.http.address
HBase  HMaster                60010          hbase.master.info.port
       HRegionServer          60030          hbase.regionserver.info.port
For example: http://192.168.56.1:60010/master-status
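The monitoring URLs can be derived from the port table; a small sketch (192.168.56.1 is the master host assumed throughout this article):

```shell
# Sketch: build monitoring URLs from the default ports above.
host=192.168.56.1
echo "http://${host}:50070/"               # HDFS NameNode web UI
echo "http://${host}:60010/master-status"  # HBase HMaster status page
```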
5 Summary
This article introduced HBase installation and configuration in three modes: standalone, pseudo-distributed, and fully distributed, focusing on the installation and configuration of a fully distributed HBase cluster. Upcoming articles will introduce Chukwa clusters, Pig, and more.