HBase journey for business development and testing, part 4: HBase MapReduce example analysis

Last Update:2018-12-04 Source: Internet

Author: User

Tags hadoop mapreduce


HBase may be needed to meet business and real-time statistics requirements, so we are reposting some articles about it.

Reproduced from: Taobao QA Team. Original address: http://qa.taobao.com/?p=13914

Seamless integration with Hadoop makes it very convenient to run distributed MapReduce computations over HBase data. This article reuses the blog example from the previous post to introduce the key points of MapReduce development on HBase. It assumes a basic understanding of Hadoop MapReduce; if this is your first contact with Hadoop MapReduce programming, the article at http://qa.taobao.com/?p=10523 covers the basic concepts.
Introduction to the HBase MapReduce core classes
First, let's review the basic MapReduce programming model.

In this model, the most basic operation is processing key-value (KV) pairs with a Mapper and a Reducer. The Mapper output becomes the Reducer input after the shuffle and sort phases. Besides Mapper and Reducer, two other important concepts are InputFormat and OutputFormat, which define the input and output of a MapReduce job. HBase extends (inherits from) these classes so that MapReduce jobs can conveniently read and write data in an HTable.
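To make the map, shuffle/sort, and reduce phases concrete, here is a minimal single-process sketch in plain Java. It does not use any Hadoop or HBase classes; the class name MiniMapReduce and its run method are hypothetical and exist only for illustration, and the word-count usage in main is just a familiar stand-in job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;

// Hypothetical in-memory model of map -> shuffle/sort -> reduce (not a Hadoop API).
class MiniMapReduce {

    /** Runs the three phases over in-memory records and returns key -> reduced value, sorted by key. */
    static <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> input,
            BiConsumer<I, BiConsumer<K, V>> mapper,
            BiFunction<K, List<V>, R> reducer) {
        // Map phase: each record may emit any number of (key, value) pairs.
        // Shuffle and sort: the TreeMap groups values by key and keeps keys ordered.
        Map<K, List<V>> grouped = new TreeMap<>();
        for (I record : input) {
            mapper.accept(record,
                    (k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        // Reduce phase: one call per key with all values collected for that key.
        Map<K, R> output = new TreeMap<>();
        for (Map.Entry<K, List<V>> e : grouped.entrySet()) {
            output.put(e.getKey(), reducer.apply(e.getKey(), e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("b a", "a c a");
        Map<String, Integer> counts = MiniMapReduce.<String, String, Integer, Integer>run(
                lines,
                (line, emit) -> { for (String w : line.split(" ")) emit.accept(w, 1); },
                (word, ones) -> ones.size());
        System.out.println(counts); // prints {a=3, b=1, c=1}
    }
}
```

In a real job the grouping step is done by the framework across machines; the sketch only mirrors the data flow between the three phases.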

Example analysis
We continue with the blog example from the previous post. The business requirement is to find people who share an interest. Put simply, if articles by two authors carry the same tag, the two authors are considered to share an interest, and the analysis result is saved to HBase. In addition to the blog table described earlier, we add a table named tag_friend, whose rowkey is the tag and whose value is the list of authors. The tables look roughly as follows.

Columns unrelated to the analysis are omitted. Running MapReduce over this data according to the requirement above should produce the following result:

The actual calculation process is analyzed as follows:
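As a rough stand-in for that flow, the calculation can be sketched in plain Java without any HBase classes. The class TagFriendSketch, the BlogRow type, and the sample nicknames and tags are all invented for illustration; only the column names (author:nickname, article:tags) and the target layout (tag as rowkey, authors as value) come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory version of the blog -> tag_friend analysis.
class TagFriendSketch {

    /** One blog row reduced to the two columns the analysis reads. */
    static final class BlogRow {
        final String nickname; // author:nickname
        final String tags;     // article:tags, comma-separated
        BlogRow(String nickname, String tags) {
            this.nickname = nickname;
            this.tags = tags;
        }
    }

    /** Returns tag -> comma-joined nicknames, i.e. one tag_friend row per tag. */
    static Map<String, String> findFriends(List<BlogRow> rows) {
        // Map + shuffle: emit (lower-cased tag, nickname) and group by tag.
        Map<String, List<String>> byTag = new TreeMap<>();
        for (BlogRow row : rows) {
            for (String tag : row.tags.split(",")) {
                byTag.computeIfAbsent(tag.toLowerCase(), t -> new ArrayList<>())
                        .add(row.nickname);
            }
        }
        // Reduce: join the nicknames collected for each tag.
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : byTag.entrySet()) {
            result.put(e.getKey(), String.join(",", e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample rows, not the data from the original post.
        List<BlogRow> rows = Arrays.asList(
                new BlogRow("alice", "Hadoop,HBase"),
                new BlogRow("bob", "hbase,NoSQL"),
                new BlogRow("carol", "hadoop"));
        System.out.println(findFriends(rows));
        // prints {hadoop=alice,carol, hbase=alice,bob, nosql=bob}
    }
}
```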
Code Implementation
With the analysis above, the code implementation is relatively simple and takes only a few steps.

Define a Mapper class that extends TableMapper. The map input and output KV types match the preceding analysis.

public static class Mapper extends TableMapper<ImmutableBytesWritable, ImmutableBytesWritable> {

    public Mapper() {}

    @Override
    public void map(ImmutableBytesWritable row, Result values, Context context)
            throws IOException {
        ImmutableBytesWritable value = null;
        String[] tags = null;
        // Pick the author's nickname and the article's tags out of the row.
        for (KeyValue kv : values.list()) {
            if ("author".equals(Bytes.toString(kv.getFamily()))
                    && "nickname".equals(Bytes.toString(kv.getQualifier()))) {
                value = new ImmutableBytesWritable(kv.getValue());
            }
            if ("article".equals(Bytes.toString(kv.getFamily()))
                    && "tags".equals(Bytes.toString(kv.getQualifier()))) {
                tags = Bytes.toString(kv.getValue()).split(",");
            }
        }
        // Skip rows missing either column; tags.length would otherwise throw a NullPointerException.
        if (value == null || tags == null) {
            return;
        }
        // Emit one (lower-cased tag, nickname) pair per tag.
        for (int i = 0; i < tags.length; i++) {
            ImmutableBytesWritable key = new ImmutableBytesWritable(
                    Bytes.toBytes(tags[i].toLowerCase()));
            try {
                context.write(key, value);
            } catch (InterruptedException e) {
                throw new IOException(e);
            }
        }
    }
}
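The mapper splits the tags cell on commas and lower-cases each piece, but it does not trim whitespace or skip empty entries. If the tags column might contain values such as "Hadoop, HBase", a small helper along the following lines could normalize them before they become keys. The class TagParser and the method parseTags are hypothetical and not part of the original code, and whether such input occurs in the blog table is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for normalizing the article:tags cell.
class TagParser {

    /** Splits a comma-separated tag cell into trimmed, lower-cased, non-empty tags. */
    static List<String> parseTags(String cell) {
        List<String> tags = new ArrayList<>();
        if (cell == null) {
            return tags; // a missing cell yields no tags
        }
        for (String raw : cell.split(",")) {
            // Locale.ROOT keeps lower-casing independent of the machine's default locale.
            String tag = raw.trim().toLowerCase(Locale.ROOT);
            if (!tag.isEmpty()) {
                tags.add(tag);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(parseTags(" Hadoop, HBase,,nosql ")); // prints [hadoop, hbase, nosql]
    }
}
```

Inside map, the loop over tags would then iterate over parseTags(Bytes.toString(kv.getValue())) instead of the raw split result.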

Define a Reducer class that extends TableReducer. The reduce input and output KV types match the preceding analysis.

public static class Reducer
        extends TableReducer<ImmutableBytesWritable, ImmutableBytesWritable, ImmutableBytesWritable> {

    @Override
    public void reduce(ImmutableBytesWritable key, Iterable<ImmutableBytesWritable> values,
            Context context) throws IOException, InterruptedException {
        // Join all nicknames for this tag into one comma-separated string.
        StringBuilder friends = new StringBuilder();
        for (ImmutableBytesWritable val : values) {
            if (friends.length() > 0) {
                friends.append(",");
            }
            friends.append(Bytes.toString(val.get()));
        }
        // Write the result to column person:nicknames of the row keyed by the tag.
        Put put = new Put(key.get());
        put.add(Bytes.toBytes("person"), Bytes.toBytes("nicknames"),
                Bytes.toBytes(friends.toString()));
        context.write(key, put);
    }
}
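Because the mapper emits one pair per article, an author with several articles under the same tag appears several times in the joined string. If that is not wanted, the nicknames can be de-duplicated before joining. The following is a minimal sketch in plain Java; NicknameJoiner and joinDistinct are hypothetical names, and de-duplication is an optional variation rather than something the original reducer does.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: join nicknames without repeating an author.
class NicknameJoiner {

    /** Joins nicknames with commas, keeping only the first occurrence of each. */
    static String joinDistinct(Iterable<String> nicknames) {
        // LinkedHashSet drops duplicates while preserving arrival order.
        Set<String> seen = new LinkedHashSet<>();
        for (String nickname : nicknames) {
            seen.add(nickname);
        }
        return String.join(",", seen);
    }

    public static void main(String[] args) {
        System.out.println(joinDistinct(Arrays.asList("alice", "bob", "alice"))); // prints alice,bob
    }
}
```

In the reducer, each value would first be converted with Bytes.toString(val.get()) and collected into the set before the Put is built.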

When submitting the job, set the InputFormat to TableInputFormat and the OutputFormat to TableOutputFormat. The TableMapReduceUtil class simplifies this setup.

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf = HBaseConfiguration.create(conf);
    Job job = new Job(conf, "HBase_FindFriend");
    job.setJarByClass(FindFriend.class);

    // Read only the two columns the mapper needs.
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("author"), Bytes.toBytes("nickname"));
    scan.addColumn(Bytes.toBytes("article"), Bytes.toBytes("tags"));

    // Input comes from the blog table; output goes to the tag_friend table.
    TableMapReduceUtil.initTableMapperJob("blog", scan, FindFriend.Mapper.class,
            ImmutableBytesWritable.class, ImmutableBytesWritable.class, job);
    TableMapReduceUtil.initTableReducerJob("tag_friend", FindFriend.Reducer.class, job);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
}

Summary
This article used a worked example to show how to analyze HBase data with MapReduce. Note that this is only the conventional pattern (analyze the data in one table and store the result in another); other patterns are possible, but they work similarly. Having written the code, you will certainly want to run it right away and see the result. The next article will introduce how to run MapReduce jobs on a local machine in a simulated cluster environment for testing.

This article is an English version of an article originally published in Chinese on aliyun.com.
