1.2. Quick Start - Standalone HBase
This guide describes setup of a standalone HBase instance running against the local filesystem. This is not an appropriate configuration for a production instance of HBase, but will allow you to experiment with HBase. This section shows you how to create a table in HBase using the hbase shell CLI, insert rows into the table, perform put and scan operations against the table, enable or disable the table, and start and stop HBase. Apart from downloading HBase, this procedure should take less than 10 minutes.
Local Filesystem and Durability
The below advice is for HBase 0.98.2 and earlier releases only. This is fixed in HBase 0.98.3 and beyond. See HBASE-11272 and HBASE-11218.
Using HBase with a local filesystem does not guarantee durability. The HDFS local filesystem implementation will lose edits if files are not properly closed. This is very likely to happen when you are experimenting with new software, starting and stopping the daemons often and not always cleanly. You need to run HBase on HDFS to ensure all writes are preserved. Running against the local filesystem is intended as a shortcut to get you familiar with how the general system works, as the very first phase of evaluation. See https://issues.apache.org/jira/browse/HBASE-3696 and its associated issues for more details about the problems of running on the local filesystem.
Loopback IP - HBase 0.94.x and earlier
The below advice is for hbase-0.94.x and older versions only. This is fixed in hbase-0.96.0 and beyond.
In HBase 0.94.x and earlier, HBase expected the loopback IP address to be 127.0.0.1. Ubuntu and some other distributions default to 127.0.1.1, and this will cause problems for you. See Why does HBase care about /etc/hosts? for details.
Example 1.1. Example /etc/hosts File for Ubuntu
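Whether you should specify the local address as localhost or as 127.0.0.1 depends on how it is declared in /etc/hosts (for example, as shown by vi /etc/hosts). If the file declares 127.0.0.1 localhost, use 127.0.0.1. If you use localhost instead, HBase will report errors.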
The following /etc/hosts file works correctly for HBase 0.94.x and earlier, on Ubuntu. Use this as a template if you run into trouble.
127.0.0.1 localhost
127.0.0.1 ubuntu.ubuntu-domain ubuntu
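If you are not sure whether your machine is affected, you can inspect the hosts file with a script instead of by eye. The following is a minimal sketch. The helpers host_address and check_loopback are hypothetical names, and the sketch assumes a hosts file in the usual "address name [aliases...]" layout. It reports which address the hostname maps to and warns when that address is 127.0.1.1.

```bash
# Hypothetical helper: print the address a hosts file assigns to a hostname.
# Comment lines are skipped and the first matching entry wins.
host_address() {
  local hosts_file="$1" name="$2"
  awk -v want="$name" '
    $1 ~ /^#/ { next }
    { for (i = 2; i <= NF; i++) if ($i == want) { print $1; exit } }
  ' "$hosts_file"
}

# Hypothetical helper: fail with a warning if the hostname maps to 127.0.1.1,
# the Ubuntu default that HBase 0.94.x and earlier do not handle.
check_loopback() {
  local hosts_file="${1:-/etc/hosts}" name="${2:-$(hostname)}" addr
  addr=$(host_address "$hosts_file" "$name")
  if [ "$addr" = "127.0.1.1" ]; then
    echo "warning: $name maps to 127.0.1.1 in $hosts_file; use 127.0.0.1" >&2
    return 1
  fi
}
```

Running check_loopback with no arguments checks /etc/hosts against the current hostname.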
1.2.1. JDK Version Requirements
HBase requires that a JDK be installed. See Table 2.1, “Java” for information about supported JDK versions.
1.2.2. Get Started with HBase
Procedure 1.1. Download, Configure, and Start HBase
Choose a download site from this list of Apache Download Mirrors. Click on the suggested top link. This will take you to a mirror of HBase Releases. Click on the folder named stable and then download the binary file that ends in .tar.gz to your local filesystem. Be sure to choose the version that corresponds with the version of Hadoop you are likely to use later. In most cases, you should choose the file for Hadoop 2, which will be called something like hbase-0.98.3-hadoop2-bin.tar.gz. Do not download the file ending in src.tar.gz for now.
Extract the downloaded file, and change to the newly-created directory. In the commands below, replace <version> with the HBase version you downloaded.
$ tar xzvf hbase-<version>-hadoop2-bin.tar.gz
$ cd hbase-<version>-hadoop2/
For HBase 0.98.5 and later, you are required to set the JAVA_HOME environment variable before starting HBase. Prior to 0.98.5, HBase attempted to detect the location of Java if the variable was not set. You can set the variable via your operating system’s usual mechanism, but HBase provides a central mechanism, conf/hbase-env.sh. Edit this file, uncomment the line starting with JAVA_HOME, and set it to the appropriate location for your operating system. The JAVA_HOME variable should be set to a directory which contains the executable file bin/java. Most modern Linux operating systems provide a mechanism, such as /usr/bin/alternatives on RHEL or CentOS, for transparently switching between versions of executables such as Java. In this case, you can set JAVA_HOME to the directory containing the symbolic link to bin/java, which is usually /usr.
JAVA_HOME=/usr
Note
These instructions assume that each node of your cluster uses the same configuration. If this is not the case, you may need to set JAVA_HOME separately for each node.
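Because JAVA_HOME must be the directory that contains bin/java, you can derive a candidate from the java executable on your PATH. The following is a minimal sketch. The helper names java_home_from_bin and is_java_home are hypothetical. The sketch deliberately does not resolve symbolic links, so on an alternatives-managed system /usr/bin/java yields /usr, which matches the advice above.

```bash
# Hypothetical helper: return the JAVA_HOME candidate for a java executable
# path, i.e. the directory two levels up (/usr/bin/java -> /usr).
# Symbolic links are intentionally not resolved.
java_home_from_bin() {
  dirname "$(dirname "$1")"
}

# Hypothetical helper: succeed if a directory contains an executable bin/java.
is_java_home() {
  [ -x "$1/bin/java" ]
}
```

For example, candidate=$(java_home_from_bin "$(command -v java)") followed by is_java_home "$candidate" gives you a value you can check before you put it on the JAVA_HOME line of conf/hbase-env.sh.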
Edit conf/hbase-site.xml, which is the main HBase configuration file. At this time, you only need to specify the directory on the local filesystem where HBase and ZooKeeper write data. By default, a new directory is created under /tmp. Many servers are configured to delete the contents of /tmp upon reboot, so you should store the data elsewhere. The following configuration will store HBase’s data in the hbase directory, in the home directory of the user called testuser. Paste the <property> tags beneath the <configuration> tags, which should be empty in a new HBase install.
Example 1.2. Example hbase-site.xml for Standalone HBase
<configuration> <property> <name>hbase.rootdir</name> <value>file:///home/testuser/hbase</value> </property> <property> <name>hbase.zookeeper.property.dataDir</name> <value>/home/testuser/zookeeper</value> </property></configuration>
You do not need to create the HBase data directory. HBase will do this for you. If you create the directory, HBase will attempt to do a migration, which is not what you want.
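If you prefer to script this step, you can generate the file with a heredoc. The following is a minimal sketch. The function write_standalone_site is a hypothetical name. It writes the same two properties as Example 1.2 for a base directory you pass in, and it overwrites any existing hbase-site.xml. Consistent with the note above, it does not create the data directories.

```bash
# Hypothetical helper: write a standalone hbase-site.xml (same properties as
# Example 1.2). $1: HBase conf directory, $2: base data directory such as
# /home/testuser. Overwrites any existing file; creates no data directories.
write_standalone_site() {
  local conf_dir="$1" base="$2"
  cat > "$conf_dir/hbase-site.xml" <<EOF
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file://$base/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>$base/zookeeper</value>
  </property>
</configuration>
EOF
}
```

From the HBase install directory, write_standalone_site conf /home/testuser reproduces Example 1.2.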
The bin/start-hbase.sh script is provided as a convenient way to start HBase. Issue the command, and if all goes well, a message is logged to standard output showing that HBase started successfully. You can use the jps command to verify that you have one running process called HMaster. In standalone mode HBase runs all daemons within this single JVM, i.e. the HMaster, a single HRegionServer, and the ZooKeeper daemon.
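The jps check can be scripted, because jps prints one "PID MainClassName" line per JVM. The following is a minimal sketch, assuming that output format. The helper name has_jvm is hypothetical. It reads jps-style output on standard input and succeeds only if a JVM with the given name is listed.

```bash
# Hypothetical helper: succeed if jps-style output on stdin ("PID Name" per
# line) lists a JVM whose name exactly matches $1.
has_jvm() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}
```

After starting HBase, jps | has_jvm HMaster && echo "HBase is running" confirms the single HMaster process. After bin/stop-hbase.sh, the same check should eventually fail.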
Note
Java needs to be installed and available. If you get an error indicating that Java is not installed, but it is on your system, perhaps in a non-standard location, edit the conf/hbase-env.sh file and modify the JAVA_HOME setting to point to the directory that contains bin/java on your system.
Procedure 1.2. Use HBase For the First Time
Connect to HBase.
Connect to your running instance of HBase using the hbase shell command, located in the bin/ directory of your HBase install. In this example, some usage and version information that is printed when you start HBase Shell has been omitted. The HBase Shell prompt ends with a > character.
Enter the HBase Shell, and use HBase Shell commands to operate on tables.
$ ./bin/hbase shell
hbase(main):001:0>
Display HBase Shell Help Text.
Type help and press Enter to display some basic usage information for HBase Shell, as well as several example commands. Notice that table names, rows, and columns must all be enclosed in quote characters.
Create a table.
Use the create command to create a new table. You must specify the table name and the ColumnFamily name.
Create a table named test with a column family named cf.
hbase> create 'test', 'cf'
0 row(s) in 1.2200 seconds
List Information About your Table
Use the list command to confirm that your table exists. The following command lists the table named test.
hbase> list 'test'
TABLE
test
1 row(s) in 0.0350 seconds

=> ["test"]
Put data into your table.
To put data into your table, use the put command.
hbase> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1770 seconds

hbase> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0160 seconds

hbase> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0260 seconds
Here, we insert three values, one at a time. The first insert is at row1, column cf:a, with a value of value1. Columns in HBase are composed of a column family prefix, cf in this example, followed by a colon and then a column qualifier suffix, a in this case.
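The family/qualifier rule can be made concrete with bash parameter expansion. The following is a minimal sketch, and the helper name split_column is hypothetical. It splits a column name at the first colon only, so any further colons stay in the qualifier. This is an assumption that the family name itself contains no colon.

```bash
# Hypothetical helper: split an HBase column name "family:qualifier" at the
# first colon and print "family qualifier".
split_column() {
  local col="$1"
  # ${col%%:*} drops everything from the first colon on (the family);
  # ${col#*:} drops everything up to the first colon (the qualifier).
  printf '%s %s\n' "${col%%:*}" "${col#*:}"
}
```

For example, split_column cf:a prints cf a, matching the column used in the first put above.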
Scan the table for all data at once.
One of the ways to get data from HBase is to scan. Use the scan command to scan the table for data. You can limit your scan, but for now, all data is fetched.
hbase> scan 'test'
ROW                   COLUMN+CELL
 row1                 column=cf:a, timestamp=1403759475114, value=value1
 row2                 column=cf:b, timestamp=1403759492807, value=value2
 row3                 column=cf:c, timestamp=1403759503155, value=value3
3 row(s) in 0.0440 seconds
Get a single row of data.
To get a single row of data at a time, use the get command.
hbase> get 'test', 'row1'
COLUMN                CELL
 cf:a                 timestamp=1403759475114, value=value1
1 row(s) in 0.0230 seconds
Disable a table.
If you want to delete a table or change its settings, as well as in some other situations, you need to disable the table first, using the disable command. You can re-enable it using the enable command.
hbase> disable 'test'
0 row(s) in 1.6270 seconds

hbase> enable 'test'
0 row(s) in 0.4500 seconds
Disable the table again if you tested the enable command above:
hbase> disable 'test'
0 row(s) in 1.6270 seconds
Drop the table.
To drop (delete) a table, use the drop command.
hbase> drop 'test'
0 row(s) in 0.2900 seconds
Exit the HBase Shell.
To exit the HBase Shell and disconnect from your cluster, use the quit command. HBase is still running in the background.
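HBase Shell can also read commands from a file passed as an argument (./bin/hbase shell <file>). This lets you replay the whole procedure non-interactively. The following is a minimal sketch. The helper name write_quickstart_script is hypothetical. We assume that a trailing exit is needed so that the shell quits instead of waiting for input.

```bash
# Hypothetical helper: write the commands of this procedure to a file.
# $1: output file, $2: table name (defaults to "test").
# The trailing "exit" is an assumption so the shell does not stay open.
write_quickstart_script() {
  local out="$1" table="${2:-test}"
  cat > "$out" <<EOF
create '$table', 'cf'
put '$table', 'row1', 'cf:a', 'value1'
put '$table', 'row2', 'cf:b', 'value2'
put '$table', 'row3', 'cf:c', 'value3'
scan '$table'
get '$table', 'row1'
disable '$table'
drop '$table'
exit
EOF
}
```

For example, run write_quickstart_script /tmp/quickstart.txt && ./bin/hbase shell /tmp/quickstart.txt from the HBase install directory. Because the script ends by dropping the table, you can run it repeatedly.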
Procedure 1.3. Stop HBase
In the same way that the bin/start-hbase.sh script is provided to conveniently start all HBase daemons, the bin/stop-hbase.sh script stops them.
$ ./bin/stop-hbase.sh
stopping hbase....................
After issuing the command, it can take several minutes for the processes to shut down. Use the jps command to be sure that the HMaster and HRegionServer processes are shut down.
1.2.3. Intermediate - Pseudo-Distributed Local Install
After working your way through Section 1.2, “Quick Start - Standalone HBase”, you can re-configure HBase to run in pseudo-distributed mode. Pseudo-distributed mode means that HBase still runs completely on a single host, but each HBase daemon (HMaster, HRegionServer, and ZooKeeper) runs as a separate process. By default, unless you configure the hbase.rootdir property as described in Section 1.2, “Quick Start - Standalone HBase”, your data is still stored in /tmp/. In this walk-through, we store your data in HDFS instead, assuming you have HDFS available. You can skip the HDFS configuration to continue storing your data in the local filesystem.
Hadoop Configuration
This procedure assumes that you have configured Hadoop and HDFS on your local system and/or a remote system, and that they are running and available. It also assumes you are using Hadoop 2. Currently, the documentation on the Hadoop website does not include a quick start for Hadoop 2, but the guide at http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide is a good starting point.
Stop HBase if it is running.
If you have just finished Section 1.2, “Quick Start - Standalone HBase” and HBase is still running, stop it. This procedure will create a totally new directory where HBase will store its data, so any databases you created before will be lost.
Configure HBase.
Edit the hbase-site.xml configuration. First, add the following property, which directs HBase to run in distributed mode, with one JVM instance per daemon.
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
Next, change the hbase.rootdir from the local filesystem to the address of your HDFS instance, using the hdfs://// URI syntax. In this example, HDFS is running on the localhost at port 8020.
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:8020/hbase</value>
</property>
You do not need to create the directory in HDFS. HBase will do this for you. If you create the directory, HBase will attempt to do a migration, which is not what you want.
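Before starting HBase, you may want to confirm which values hbase-site.xml actually contains. The following is a rough sketch, not a real XML parser. The helper name site_property is hypothetical. It assumes simple, well-formed XML with no comments, no CDATA, and no "<" inside values. It prints the value of the named property, or nothing if the property is absent.

```bash
# Hypothetical helper: print the <value> of property $2 in hbase-site.xml $1.
# Flattens the file to one line, then starts a new line at every "<", so each
# tag ("name>...", "value>...") begins its own line for awk to inspect.
site_property() {
  tr '\n' ' ' < "$1" | tr '<' '\n' | awk -v want="$2" '
    /^name>/  { n = substr($0, 6); gsub(/[ \t]/, "", n) }
    /^value>/ { v = substr($0, 7); gsub(/^[ \t]+|[ \t]+$/, "", v)
                if (n == want) { print v; exit } }
  '
}
```

For example, site_property conf/hbase-site.xml hbase.rootdir should print hdfs://localhost:8020/hbase after the edits above.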
Start HBase.
Use the bin/start-hbase.sh command to start HBase. If your system is configured correctly, the jps command should show the HMaster and HRegionServer processes running.
Check the HBase directory in HDFS.
If everything worked correctly, HBase created its directory in HDFS. In the configuration above, it is stored in /hbase/ on HDFS. You can use the hadoop fs command in Hadoop’s bin/ directory to list this directory.
$ ./bin/hadoop fs -ls /hbase
Found 7 items
drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/.tmp
drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/WALs
drwxr-xr-x   - hbase users          0 2014-06-25 18:48 /hbase/corrupt
drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/data
-rw-r--r--   3 hbase users         42 2014-06-25 18:41 /hbase/hbase.id
-rw-r--r--   3 hbase users          7 2014-06-25 18:41 /hbase/hbase.version
drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/oldWALs
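To turn this visual check into a pass/fail test, you can scan the listing for a few of the entries shown above. The following is a minimal sketch. The helper name check_hbase_dir is hypothetical. The choice of WALs, data, hbase.id, and hbase.version as the entries to require is our assumption, taken from the sample listing rather than from a documented contract.

```bash
# Hypothetical helper: read `hadoop fs -ls` output on stdin and check that it
# lists WALs, data, hbase.id and hbase.version under the root ($1, default
# /hbase). Prints each missing entry to stderr; fails if any are missing.
check_hbase_dir() {
  local root="${1:-/hbase}" listing entry missing=0
  listing=$(cat)
  for entry in WALs data hbase.id hbase.version; do
    if ! printf '%s\n' "$listing" | grep -q " $root/$entry\$"; then
      echo "missing: $root/$entry" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

From Hadoop's directory, ./bin/hadoop fs -ls /hbase | check_hbase_dir runs the check against the live directory.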
Create a table and populate it with data.
You can use the HBase Shell to create a table, populate it with data, scan and get values from it, using the same procedure as in Procedure 1.2, “Use HBase For the First Time”.
Start and stop a backup HBase Master (HMaster) server.
Note
Running multiple HMaster instances on the same hardware does not make sense in a production environment, in the same way that running a pseudo-distributed cluster does not make sense for production. This step is offered for testing and learning purposes only.
The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster servers, which makes 10 total HMasters, counting the primary. To start a backup HMaster, use the local-master-backup.sh command. For each backup master you want to start, add a parameter representing the port offset for that master. Each HMaster uses three ports (16010, 16020, and 16030 by default). The port offset is added to these ports, so using an offset of 2, the backup HMaster would use ports 16012, 16022, and 16032. The following command starts 3 backup servers using ports 16012/16022/16032, 16013/16023/16033, and 16015/16025/16035.
$ ./bin/local-master-backup.sh 2 3 5
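The port arithmetic is simple enough to compute ahead of time, for example to check for conflicts before starting backup masters. The following is a minimal sketch. The helper name backup_master_ports is hypothetical. It only applies the rule stated above (offset added to 16010, 16020, and 16030) and does not validate the offset.

```bash
# Hypothetical helper: print the three ports a backup HMaster started with
# port offset $1 would use, by adding the offset to 16010, 16020 and 16030.
backup_master_ports() {
  local offset="$1"
  echo "$((16010 + offset)) $((16020 + offset)) $((16030 + offset))"
}
```

For example, for o in 2 3 5; do backup_master_ports "$o"; done prints the three port sets used by the command above.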
To kill a backup master without killing the entire cluster, you need to find its process ID (PID). The PID is stored in a file with a name like /tmp/hbase-USER-X-master.pid. The only contents of the file are the PID. You can use the kill -9 command to kill that PID. The following command will kill the master with port offset 1, but leave the cluster running:
$ cat /tmp/hbase-testuser-1-master.pid |xargs kill -9
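The same step can be wrapped so that a missing PID file is reported instead of silently passing nothing to kill. The following is a minimal sketch. The helper names backup_master_pidfile and stop_backup_master are hypothetical. The sketch assumes the /tmp/hbase-USER-X-master.pid naming described above, and falls back to id -un when USER is unset.

```bash
# Hypothetical helper: PID file path for the backup master started with port
# offset $1, following the /tmp/hbase-USER-X-master.pid pattern.
backup_master_pidfile() {
  echo "/tmp/hbase-${USER:-$(id -un)}-$1-master.pid"
}

# Hypothetical helper: kill that backup master with kill -9, as above, leaving
# the rest of the cluster running. Fails if the PID file is missing.
stop_backup_master() {
  local pidfile
  pidfile=$(backup_master_pidfile "$1")
  if [ ! -r "$pidfile" ]; then
    echo "no PID file: $pidfile" >&2
    return 1
  fi
  kill -9 "$(cat "$pidfile")"
}
```

For example, stop_backup_master 1 is equivalent to the command above when run as testuser.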
Start and stop additional RegionServers
The HRegionServer manages the data in its StoreFiles as directed by the HMaster. Generall