This article covers installing HBase in a pseudo-distributed environment, then combines MapReduce programming with HBase to complete the wordcount example.
- Installing HBase in a pseudo-distributed environment
First, Prerequisites
JDK 1.6 and Hadoop 1.2.1 must already be installed.
For how to set up JDK 1.6 + Hadoop 1.2.1 in a pseudo-distributed environment, see: Hadoop 1.2.1 Installation -- single-node mode and single-machine pseudo-distribution mode.
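A quick way to confirm that both are installed and on the PATH (assuming a standard setup) is to check their versions:
java -version
hadoop version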
Second, Environment
- VMware® Workstation 10.04
- Ubuntu 14.04 32-bit
- Java JDK 1.6.0
- Hadoop 1.2.1
- HBase 0.94.26
Third, HBase 0.94 installation steps in pseudo-distributed mode
(1) Download the hbase-0.94.26 tar package and extract it:
tar -zxvf hbase-0.94.26.tar.gz
(2) Go to the {hbase}/conf directory and modify hbase-site.xml:
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
    <!-- the host and port must be consistent with Hadoop's fs.default.name -->
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <!-- set to 1 for pseudo-distributed mode -->
  </property>
</configuration>
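For reference, hbase.rootdir must agree with the fs.default.name entry in Hadoop's core-site.xml; in the pseudo-distributed Hadoop setup assumed here it would typically contain:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>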
(3) Go to the {hbase}/conf directory and modify the hbase-env.sh file:
export JAVA_HOME=/usr/lib/jvm/{JDK}    # JDK installation path
export HBASE_CLASSPATH=/etc/hadoop
export HBASE_MANAGES_ZK=true
(4) Make hbase-0.94.26 work with hadoop-1.2.1
HBase 0.94.26 is built against Hadoop 1.0.4 by default; replacing the hadoop-core jar lets it run on Hadoop 1.2.1.
A. Copy hadoop-core-1.2.1.jar from the Hadoop home directory to the HBase lib directory, and delete hadoop-core-1.0.4.jar from the HBase lib directory.
B. Copy commons-collections-3.2.1.jar and commons-configuration-1.6.jar from the Hadoop lib directory to the HBase lib directory:
rm /home/u14/hbase-0.94.26/lib/hadoop-core-1.0.4.jar
cp /home/u14/hadoop/hadoop-core-1.2.1.jar /home/u14/hbase-0.94.26/lib
cp /home/u14/hadoop/lib/commons-collections-3.2.1.jar /home/u14/hbase-0.94.26/lib
cp /home/u14/hadoop/lib/commons-configuration-1.6.jar /home/u14/hbase-0.94.26/lib
(5) Start HBase
A. Start Hadoop first
B. Start HBase
From the HBase installation directory, run the start-hbase.sh script in the bin folder:
bin/start-hbase.sh
View the related processes with the jps command:
SecondaryNameNode  DataNode  HQuorumPeer  TaskTracker  JobTracker  Jps  HRegionServer  HMaster  NameNode
C. Enter the HBase shell to interact with HBase:
bin/hbase shell
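Once the shell prompt appears, a couple of basic commands (both built into the HBase shell) can confirm that the cluster is up, for example:
status
list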
D. Stop HBase: stop HBase first, then stop Hadoop:
stop-hbase.sh
stop-all.sh
- Developing HBase applications with Eclipse
A. Create a new Java project named HBase in Eclipse, open the project Properties, go to Libraries -> Add External JARs..., and select the relevant jar packages under {hbase}/lib. If it is only for testing, the simplest approach is to select all the jars.
B. Add a folder named conf under the HBase project, copy the HBase cluster configuration file hbase-site.xml into it, then in the project properties go to Libraries -> Add Class Folder and select the conf folder you just added; the sketch after this list shows a quick way to verify that the configuration is picked up.
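A minimal sketch of such a check, assuming the conf folder from step B is on the project classpath; the class name HBaseConnectionTest is only an illustration. It loads hbase-site.xml through HBaseConfiguration.create() and lists the tables in the cluster:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class HBaseConnectionTest {
    public static void main(String[] args) throws Exception {
        // loads hbase-site.xml from the classpath (the conf folder added in step B)
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // list the existing tables; an empty result simply means no tables have been created yet
        for (HTableDescriptor table : admin.listTables()) {
            System.out.println(table.getNameAsString());
        }
        admin.close();
    }
}

If this prints without throwing a connection error, the Eclipse project is correctly wired to the pseudo-distributed cluster.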
- Combine MapReduce with HBase to complete the wordcount example.
In this example, the input files are:
/user/u14/hbasetest/file01: hello World Bye World
/user/u14/hbasetest/file02: hello Hadoop bye Hadoop
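If these files do not exist in HDFS yet, they can be created locally and uploaded first (a sketch; the local file names are illustrative, and the /user/u14/hbasetest directory is taken from the listing above):
echo "hello World Bye World" > file01
echo "hello Hadoop bye Hadoop" > file02
hadoop fs -mkdir /user/u14/hbasetest
hadoop fs -put file01 file02 /user/u14/hbasetest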
Program idea: the mapper first reads and splits the words from the input files, the reducer sums the count for each word after the shuffle, and the results are finally stored in HBase.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class WordCountHBase {

    // Mapper: emits (word, 1) for every word in the input
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private IntWritable i = new IntWritable(1);

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String s[] = value.toString().trim().split(" ");
            for (String m : s) {
                context.write(new Text(m), i);
            }
        }
    }

    // Reducer: sums the counts and writes a Put into the HBase table
    public static class Reduce extends TableReducer<Text, IntWritable, NullWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable i : values) {
                sum += i.get();
            }
            // one Put per word: the word itself is the row key
            Put put = new Put(Bytes.toBytes(key.toString()));
            // column family "content", column qualifier "count", value is the count as a string
            put.add(Bytes.toBytes("content"), Bytes.toBytes("count"),
                    Bytes.toBytes(String.valueOf(sum)));
            context.write(NullWritable.get(), put);
        }
    }

    // (re)create the output table with a single column family "content"
    public static void createHBaseTable(String tableName) throws IOException {
        HTableDescriptor htd = new HTableDescriptor(tableName);
        HColumnDescriptor col = new HColumnDescriptor("content");
        htd.addFamily(col);
        HBaseConfiguration config = new HBaseConfiguration();
        HBaseAdmin admin = new HBaseAdmin(config);
        if (admin.tableExists(tableName)) {
            System.out.println("table exists, trying to recreate table!");
            admin.disableTable(tableName);
            admin.deleteTable(tableName);
        }
        System.out.println("create new table: " + tableName);
        admin.createTable(htd);
    }

    public static void main(String args[]) throws Exception {
        String tableName = "wordcount";
        Configuration conf = new Configuration();
        conf.set(TableOutputFormat.OUTPUT_TABLE, tableName);
        createHBaseTable(tableName);
        Job job = new Job(conf, "WordCountHBase");
        job.setJarByClass(WordCountHBase.class);
        job.setNumReduceTasks(3);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TableOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
After the program runs successfully, check the output with the HBase shell:
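Assuming the table name wordcount created by the program above, a full table scan shows one row per word, with the count stored in the content:count column:
scan 'wordcount'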