This post first covers installing HBase in a pseudo-distributed environment, then combines MapReduce programming with HBase to implement the WordCount example.
1. Prerequisites
JDK 1.6 and Hadoop 1.2.1 have already been installed successfully.
For the detailed installation steps for JDK 1.6 + Hadoop 1.2.1 in a pseudo-distributed environment, see: Hadoop 1.2.1 Installation — Single-Node and Single-Machine Pseudo-Distributed Modes.
2. Environment
- VMware® Workstation 10.04
- Ubuntu 14.04 (32-bit)
- Java JDK 1.6.0
- Hadoop 1.2.1
- HBase 0.94.26
3. Installing HBase 0.94 in Pseudo-Distributed Mode
(1) Download the HBase 0.94.26 tarball and extract it:
tar -zxvf hbase-0.94.26.tar.gz
(2) Edit hbase-site.xml in the {hbase}/conf directory:
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
    <!-- The host and port must match the Hadoop setting fs.default.name -->
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <!-- Set to 1 for pseudo-distributed mode -->
  </property>
</configuration>
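The address in hbase.rootdir has to agree with what Hadoop itself uses. A quick way to double-check, assuming the Hadoop install from this tutorial lives at /home/u14/hadoop:

grep -A 1 'fs.default.name' /home/u14/hadoop/conf/core-site.xml

The <value> printed should be hdfs://localhost:9000; if it differs, adjust hbase.rootdir to match.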
(3) Edit the hbase-env.sh file in the {hbase}/conf directory:
export JAVA_HOME=/usr/lib/jvm/{jdk}    # JDK installation path
export HBASE_CLASSPATH=/etc/hadoop
export HBASE_MANAGES_ZK=true
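If you are not sure what to use for the JDK path, the following usually reveals it (strip the trailing /jre/bin/java or /bin/java from the output; the exact layout depends on how the JDK was installed):

readlink -f $(which java)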
(4) Make HBase 0.94.26 work with Hadoop 1.2.1
HBase 0.94.26 is built against Hadoop 1.0.4 by default; replacing the bundled hadoop-core jar lets it work with Hadoop 1.2.1.
a. Copy hadoop-core-1.2.1.jar from the Hadoop home directory into hbase/lib, and delete the bundled hadoop-core-1.0.4.jar from hbase/lib.
b. Then copy commons-collections-3.2.1.jar and commons-configuration-1.6.jar from hadoop/lib into hbase/lib:
rm /home/u14/hbase-0.94.26/lib/hadoop-core-1.0.4.jar
cp /home/u14/hadoop/hadoop-core-1.2.1.jar /home/u14/hbase-0.94.26/lib
cp /home/u14/hadoop/lib/commons-collections-3.2.1.jar /home/u14/hbase-0.94.26/lib
cp /home/u14/hadoop/lib/commons-configuration-1.6.jar /home/u14/hbase-0.94.26/lib
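After the swap it is worth confirming that only the 1.2.1 core jar remains and that the two commons jars made it over:

ls /home/u14/hbase-0.94.26/lib | grep -E 'hadoop-core|commons-collections|commons-configuration'

Exactly one hadoop-core jar (version 1.2.1) should be listed.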
(5) Start HBase
a. Start Hadoop first (e.g., with bin/start-all.sh).
b. Start HBase.
Go to the bin folder under the extracted HBase directory and run the start-hbase.sh script:
bin/start-hbase.sh
Use the jps command to verify that the relevant processes are running:
SecondaryNameNode
DataNode
HQuorumPeer
TaskTracker
JobTracker
Jps
HRegionServer
HMaster
NameNode
c. Enter shell mode to work with HBase:
bin/hbase shell
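Once inside, status and list make a quick smoke test (both are standard HBase shell commands):

status    # should report one live server in pseudo-distributed mode
list      # shows the tables currently defined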
d. Shutting down: stop HBase first, then stop Hadoop:
stop-hbase.sh
stop-all.sh
4. Running the WordCount Example (JDK 1.6 + Hadoop 1.2.1 + HBase 0.94 + Eclipse) in Pseudo-Distributed Mode
In this example, the input files are:
user/u14/hbasetest/file01: hello world bye world
user/u14/hbasetest/file02: hello hadoop bye hadoop
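The job reads its input from HDFS, so these files must be uploaded first. Assuming file01 and file02 sit in the current local directory (relative HDFS paths resolve under /user/u14):

hadoop fs -mkdir hbasetest
hadoop fs -put file01 file02 hbasetest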
Program outline: the mapper first collects words from the input files; once the shuffle completes, the reducer tallies the count for each word; finally, the results are stored in HBase.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class WordCountHBase {

    // Mapper: emit (word, 1) for every word in the input
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private IntWritable i = new IntWritable(1);

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String s[] = value.toString().trim().split(" ");
            for (String m : s) {
                context.write(new Text(m), i);
            }
        }
    }

    // Reducer: sum the counts and write one HBase row per word
    public static class Reduce extends TableReducer<Text, IntWritable, NullWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable i : values) {
                sum += i.get();
            }
            // One row per word: the word itself is the row key
            Put put = new Put(Bytes.toBytes(key.toString()));
            // Column family "content", qualifier "count", value = the total
            put.add(Bytes.toBytes("content"), Bytes.toBytes("count"),
                    Bytes.toBytes(String.valueOf(sum)));
            context.write(NullWritable.get(), put);
        }
    }

    // (Re)create the output table with a single column family "content"
    public static void createHBaseTable(String tableName) throws IOException {
        HTableDescriptor htd = new HTableDescriptor(tableName);
        HColumnDescriptor col = new HColumnDescriptor("content");
        htd.addFamily(col);
        Configuration config = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(config);
        if (admin.tableExists(tableName)) {
            System.out.println("table exists, trying to recreate table!");
            admin.disableTable(tableName);
            admin.deleteTable(tableName);
        }
        System.out.println("create new table: " + tableName);
        admin.createTable(htd);
    }

    public static void main(String args[]) throws Exception {
        String tableName = "wordcountH";
        Configuration conf = new Configuration();
        conf.set(TableOutputFormat.OUTPUT_TABLE, tableName);
        createHBaseTable(tableName);
        Job job = new Job(conf, "WordCountHbase");
        job.setJarByClass(WordCountHBase.class);
        job.setNumReduceTasks(3);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TableOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
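One way to compile and launch the job is sketched below. The jar locations are assumptions that must match your actual install, and note that the task JVMs also need the HBase jars on their classpath (e.g., by copying hbase-0.94.26.jar and the bundled zookeeper jar into hadoop/lib, or via TableMapReduceUtil.addDependencyJars):

javac -cp /home/u14/hadoop/hadoop-core-1.2.1.jar:/home/u14/hbase-0.94.26/hbase-0.94.26.jar WordCountHBase.java
jar cf wordcounthbase.jar WordCountHBase*.class
HADOOP_CLASSPATH=$(/home/u14/hbase-0.94.26/bin/hbase classpath) hadoop jar wordcounthbase.jar WordCountHBase /user/u14/hbasetest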
After the job completes successfully, check the output through the HBase shell:
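For example, a full-table scan of the wordcountH table created by the code above:

scan 'wordcountH'

Each word should appear as a row key, with its count stored in the content:count column.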