HBase MapReduce Example Analysis


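  HBase integrates seamlessly with Hadoop, which makes it convenient to run distributed computations over HBase data with MapReduce. This article covers the key points of MapReduce development on HBase. It assumes you already have some familiarity with Hadoop MapReduce; if you are new to Hadoop MapReduce programming, see the article "第一個MapReduce應用" ("The First MapReduce Application") to pick up the basic concepts first.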

1. Java code

package hbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class WordCountHBase {

    public static class Map extends
            Mapper<LongWritable, Text, Text, IntWritable> {
        private IntWritable i = new IntWritable(1);

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split each input line on spaces
            String s[] = value.toString().trim().split(" ");
            for (String m : s) {
                // Skip empty tokens produced by blank lines or repeated spaces
                if (m.isEmpty()) {
                    continue;
                }
                context.write(new Text(m), i);
            }
        }
    }

    public static class Reduce extends
            TableReducer<Text, IntWritable, NullWritable> {
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable i : values) {
                sum += i.get();
            }
            // Instantiate a Put; each word is stored in its own row
            Put put = new Put(Bytes.toBytes(key.toString()));
            // Column family is "content", column is "count", value is the word count
            put.add(Bytes.toBytes("content"), Bytes.toBytes("count"),
                    Bytes.toBytes(String.valueOf(sum)));
            context.write(NullWritable.get(), put);
        }
    }

    public static void createHBaseTable(String tableName) throws IOException {
        HTableDescriptor htd = new HTableDescriptor(tableName);
        HColumnDescriptor col = new HColumnDescriptor("content");
        htd.addFamily(col);
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "libin2");
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Drop and recreate the table if it already exists
        if (admin.tableExists(tableName)) {
            System.out.println("table exists, trying to recreate table......");
            admin.disableTable(tableName);
            admin.deleteTable(tableName);
        }
        System.out.println("create new table:" + tableName);
        admin.createTable(htd);
    }

    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException {
        String tableName = "WordCount";
        Configuration conf = new Configuration();
        // Tell TableOutputFormat which table the reducers write to
        conf.set(TableOutputFormat.OUTPUT_TABLE, tableName);
        createHBaseTable(tableName);
        String input = args[0];
        Job job = new Job(conf, "WordCount table with " + input);
        job.setJarByClass(WordCountHBase.class);
        job.setNumReduceTasks(3);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TableOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(input));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
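To make the data flow of the job above easier to follow, here is a minimal plain-Java sketch that uses no Hadoop or HBase classes. It only imitates what the map and reduce steps compute: tokens split on spaces, a sum per word, and one row per word whose content:count value is the sum stored as a string. The class name WordCountFlowSketch and its methods are hypothetical, and dropping empty tokens is a choice made in this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the job's logic; runs locally without a cluster.
class WordCountFlowSketch {

    // Imitates the map step: trim the line, split on single spaces,
    // and emit each token. Empty tokens are dropped in this sketch.
    static List<String> mapLine(String line) {
        List<String> tokens = new ArrayList<>();
        for (String token : line.trim().split(" ")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Imitates the reduce step: sum one count per occurrence of a word, then
    // keep the total as a string, the way the job stores it in content:count.
    // The returned map is keyed by row key (the word itself).
    static Map<String, String> countWords(List<String> lines) {
        Map<String, Integer> sums = new TreeMap<>();
        for (String line : lines) {
            for (String word : mapLine(line)) {
                sums.merge(word, 1, Integer::sum);
            }
        }
        Map<String, String> rows = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : sums.entrySet()) {
            rows.put(entry.getKey(), String.valueOf(entry.getValue()));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello hbase", "hello hadoop mapreduce");
        countWords(lines).forEach((row, count) ->
                System.out.println(row + "  content:count=" + count));
    }
}
```

Because the count is written with String.valueOf, the cell holds the text of the number rather than a binary integer, so anything reading the table back would need to parse it as a string.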

 

2. Package the Java code into a jar

If two jar files are needed at the same time, separate them with a ":" character.
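As a small illustration of the ":" separator, the sketch below joins jar paths into a single string. It assumes the two jars are being listed on a Java classpath on Linux or macOS, which the text above does not state explicitly. The class name ClasspathSketch and the jar file names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that builds a ':'-separated list of jar files.
class ClasspathSketch {

    // Joins the given jar paths with ':' between neighbours.
    static String joinJars(List<String> jars) {
        return String.join(":", jars);
    }

    public static void main(String[] args) {
        // Placeholder jar names; substitute the jars from your own installation.
        List<String> jars = Arrays.asList("hadoop-core.jar", "hbase.jar");
        System.out.println(joinJars(jars));
    }
}
```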
