An HBase journey for business development and testing, part 4: HBase MapReduce example analysis

Source: Internet
Author: User
Tags hadoop mapreduce

Business needs and real-time statistics requirements may lead us to use HBase, so we are reposting some articles about it.

Reposted from the Taobao QA Team. Original address: http://qa.taobao.com/?p=13914

Because HBase integrates seamlessly with Hadoop, it is very convenient to use MapReduce for distributed computation over HBase data. This article builds on the example from the previous blog post to introduce the key points of MapReduce development on HBase. To follow it, you need some understanding of Hadoop MapReduce. If this is your first contact with Hadoop MapReduce programming, you can read http://qa.taobao.com/?p=10523 first to get the basic concepts.
Introduction to the HBase MapReduce core classes
First, let's review the basic MapReduce programming model.

The most basic operation is processing key-value (KV) pairs with a Mapper and a Reducer. The Mapper output becomes the Reducer input after the shuffle and sort phase. Besides Mapper and Reducer, the two other important concepts are InputFormat and OutputFormat, which define the input and output of a MapReduce job. HBase extends (inherits from) these classes so that MapReduce jobs can easily read data from and write data to an HTable.
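As a rough illustration of the model described above, here is a toy, single-process sketch of the map, shuffle/sort, and reduce stages in plain Java. It does not use the Hadoop API; the class name `MiniMapReduce`, its methods, and the word-count task are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Toy, single-process model of map -> shuffle/sort -> reduce.
// All names are hypothetical; this is not the Hadoop API.
class MiniMapReduce {

    // Map phase: emit a (word, 1) pair for every word in the input line.
    static void map(String line, BiConsumer<String, Integer> emit) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                emit.accept(word, 1);
            }
        }
    }

    // Reduce phase: sum all values that were grouped under one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> inputs) {
        // Shuffle and sort: group the mapper output by key.
        // A TreeMap keeps the keys in sorted order.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : inputs) {
            map(line, (k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        // Call reduce once per key with all of that key's values.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {hadoop=3, hbase=1, mapreduce=1}
        System.out.println(run(Arrays.asList("hbase hadoop", "hadoop mapreduce hadoop")));
    }
}
```

In a real cluster the map and reduce calls run on different machines and the framework performs the grouping step. The sketch only shows how the KV pairs flow from one stage to the next.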

Example analysis
We continue with the blog example from the previous post. The business requirement is to find people with the same interests. We define this simply: if articles by two authors share a tag, the two authors are considered to have the same interest, and the analysis result is saved to HBase. In addition to the blog table described earlier, we add a table named tag_friend. Its rowkey is the tag and its value is the list of authors. It looks roughly like this:


We omitted column data that is unrelated to the analysis. Analyzing the data above with MapReduce according to the business requirement should produce the following result:

The actual calculation process is analyzed as follows:
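As a rough illustration of this calculation, the following plain-Java sketch walks through the same steps without HBase: emit a (tag, nickname) pair for each tag of each row, group the pairs by tag, and join the nicknames for each tag. The class name `TagFriendSketch`, the `BlogRow` type, and the sample rows are hypothetical and are not taken from the original tables.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java walk-through of the tag_friend calculation (no HBase required).
// The class, the BlogRow type, and the sample rows are hypothetical.
class TagFriendSketch {

    // One blog row reduced to the two columns the analysis needs.
    static final class BlogRow {
        final String nickname; // stands in for author:nickname
        final String tags;     // stands in for article:tags, comma separated

        BlogRow(String nickname, String tags) {
            this.nickname = nickname;
            this.tags = tags;
        }
    }

    static Map<String, String> findFriends(List<BlogRow> rows) {
        // Map + shuffle: emit (lower-cased tag, nickname) and group by tag.
        Map<String, List<String>> byTag = new TreeMap<>();
        for (BlogRow row : rows) {
            for (String tag : row.tags.split(",")) {
                String key = tag.toLowerCase(Locale.ROOT);
                byTag.computeIfAbsent(key, k -> new ArrayList<>()).add(row.nickname);
            }
        }
        // Reduce: join the nicknames of each tag into one comma-separated value,
        // which corresponds to one row of the tag_friend table.
        Map<String, String> tagFriend = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : byTag.entrySet()) {
            tagFriend.put(e.getKey(), String.join(",", e.getValue()));
        }
        return tagFriend;
    }

    public static void main(String[] args) {
        List<BlogRow> rows = Arrays.asList(
                new BlogRow("alice", "Hadoop,HBase"),
                new BlogRow("bob", "hbase,NoSQL"),
                new BlogRow("carol", "hadoop"));
        // Prints {hadoop=alice,carol, hbase=alice,bob, nosql=bob}
        System.out.println(findFriends(rows));
    }
}
```

In this single-process sketch the nicknames appear in input order. In a real MapReduce job the order of values that reach a reducer is not guaranteed.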
Code Implementation
With the analysis above, the code is relatively simple to implement. Only a few steps are required.

  • Define a Mapper class that extends TableMapper. The input and output KV types of map match the preceding analysis.

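    // Imports needed by the enclosing FindFriend class for the snippets in this article.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.mapreduce.TableReducer;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.mapreduce.Job;

    public static class Mapper extends TableMapper<ImmutableBytesWritable, ImmutableBytesWritable> {
      @Override
      public void map(ImmutableBytesWritable row, Result values, Context context)
          throws IOException, InterruptedException {
        ImmutableBytesWritable value = null;
        String[] tags = null;
        // Pick out author:nickname and article:tags from the row.
        for (KeyValue kv : values.list()) {
          String family = Bytes.toString(kv.getFamily());
          String qualifier = Bytes.toString(kv.getQualifier());
          if ("author".equals(family) && "nickname".equals(qualifier)) {
            value = new ImmutableBytesWritable(kv.getValue());
          } else if ("article".equals(family) && "tags".equals(qualifier)) {
            tags = Bytes.toString(kv.getValue()).split(",");
          }
        }
        // Skip rows that lack either column to avoid a NullPointerException.
        if (value == null || tags == null) {
          return;
        }
        // Emit one (lower-cased tag, nickname) pair per tag.
        for (String tag : tags) {
          context.write(new ImmutableBytesWritable(Bytes.toBytes(tag.toLowerCase())), value);
        }
      }
    }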

  • Define a Reducer class that extends TableReducer. The input and output KV types of reduce match the preceding analysis.


    public static class Reducer extends TableReducer<ImmutableBytesWritable, ImmutableBytesWritable, ImmutableBytesWritable> {
      @Override
      public void reduce(ImmutableBytesWritable key, Iterable<ImmutableBytesWritable> values,
          Context context) throws IOException, InterruptedException {
        // Join all nicknames that share this tag into one comma-separated string.
        StringBuilder friends = new StringBuilder();
        for (ImmutableBytesWritable val : values) {
          if (friends.length() > 0) {
            friends.append(",");
          }
          friends.append(Bytes.toString(val.get()));
        }
        // The rowkey is the tag; the list is stored in person:nicknames.
        Put put = new Put(key.get());
        put.add(Bytes.toBytes("person"), Bytes.toBytes("nicknames"),
            Bytes.toBytes(friends.toString()));
        context.write(key, put);
      }
    }

  • When submitting the job, set the InputFormat to TableInputFormat and the OutputFormat to TableOutputFormat. The TableMapReduceUtil class can simplify this setup.

    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      Job job = new Job(conf, "HBase_FindFriend");
      job.setJarByClass(FindFriend.class);
      // Read only the two columns the mapper needs.
      Scan scan = new Scan();
      scan.addColumn(Bytes.toBytes("author"), Bytes.toBytes("nickname"));
      scan.addColumn(Bytes.toBytes("article"), Bytes.toBytes("tags"));
      // Source table "blog"; this call also sets TableInputFormat and the map output types.
      TableMapReduceUtil.initTableMapperJob("blog", scan, FindFriend.Mapper.class,
          ImmutableBytesWritable.class, ImmutableBytesWritable.class, job);
      // Target table "tag_friend"; this call also sets TableOutputFormat.
      TableMapReduceUtil.initTableReducerJob("tag_friend", FindFriend.Reducer.class, job);
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

Summary
This article used an example to show how to analyze HBase data with MapReduce. Note that this is only the conventional approach (analyzing the data in one table and storing the result in another). Other approaches are possible, and they work in a similar way. Having written the job, you will naturally want to run it right away and see the results. The next article will explain how to run MapReduce jobs on a local machine in a simulated cluster environment for testing.
