An HBase Journey for Business Development and Testing, Part 4: HBase MapReduce by Example

Last updated: 2018-12-04. Source: Internet

We now have a business requirement for real-time statistics that may call for HBase, so I am reposting a few articles about HBase.

Reposted from: Taobao QA Team. Original: http://qa.taobao.com/?p=13914

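HBase integrates seamlessly with Hadoop, which makes it convenient to run distributed computations over HBase data with MapReduce. This article continues the blog example from the earlier posts and covers the key points of MapReduce development on HBase. To follow along you need some familiarity with Hadoop MapReduce; if this is your first contact with Hadoop MapReduce programming, see http://qa.taobao.com/?p=10523 to build the basic concepts first.
Core HBase MapReduce classes
First, let's review the basic MapReduce programming model.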

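At its most basic, a Mapper and a Reducer process key-value (KV) pairs: the Mapper's output goes through shuffle and sort and becomes the Reducer's input. Besides Mapper and Reducer, two other important concepts are InputFormat and OutputFormat, which define where a MapReduce job reads its input and where it writes its output. HBase extends (subclasses) these classes so that MapReduce jobs can easily read and write data in an HTable.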
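To make the map, shuffle/sort, and reduce flow concrete, here is a minimal plain-Java sketch of the three phases applied to a word count. It uses no Hadoop or HBase classes; the class name MiniMapReduce and its methods are hypothetical and exist only for illustration, and a real job would distribute these phases across a cluster.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of the MapReduce model (not Hadoop code).
class MiniMapReduce {

    // Map phase: emit one (word, 1) pair per word in the input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle and sort: group values by key; the TreeMap keeps keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse all values of one key into a single result.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run the three phases in order over a list of input records.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=3, hbase=1, mapreduce=1}
        System.out.println(run(Arrays.asList("hbase hadoop", "hadoop mapreduce hadoop")));
    }
}
```

In an HBase job, the input records would come from a table scan through an InputFormat instead of an in-memory list, and the reduce results would be written through an OutputFormat.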

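Example analysis
We again use the original blog example. The business requirement is to find people with the same interests. We define this simply: if articles by different authors carry the same tag, those authors are considered to share an interest. The analysis result is stored in HBase. Besides the blog table introduced earlier, we add a new table, tag_friend, whose RowKey is the tag and whose value is the list of authors, roughly as shown below.

We have omitted the column data that is irrelevant to the analysis. Running the data above through MapReduce according to the requirement described earlier should produce the result below.

The actual computation proceeds as follows.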
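The computation can also be sketched in plain Java, without HBase, to show what each phase contributes. The sample rows (alice, bob, carol and their tags) are made-up illustration data, not the data from the article, and the names TagFriendSketch, BlogRow, and findFriends are hypothetical. The sketch assumes each row carries one author:nickname value and one comma-separated article:tags value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory version of the tag -> authors analysis (no HBase involved).
class TagFriendSketch {

    // One blog row, reduced to the two columns the analysis needs.
    static final class BlogRow {
        final String nickname; // stands in for author:nickname
        final String tags;     // stands in for article:tags, comma-separated

        BlogRow(String nickname, String tags) {
            this.nickname = nickname;
            this.tags = tags;
        }
    }

    static Map<String, String> findFriends(List<BlogRow> rows) {
        // Map + shuffle: emit (lower-cased tag, nickname) and group the pairs by tag.
        Map<String, List<String>> nicknamesByTag = new TreeMap<>();
        for (BlogRow row : rows) {
            for (String tag : row.tags.split(",")) {
                nicknamesByTag
                        .computeIfAbsent(tag.toLowerCase(), k -> new ArrayList<>())
                        .add(row.nickname);
            }
        }
        // Reduce: join the nicknames of each tag into one comma-separated value,
        // which corresponds to one row of the tag_friend table.
        Map<String, String> tagFriend = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : nicknamesByTag.entrySet()) {
            tagFriend.put(e.getKey(), String.join(",", e.getValue()));
        }
        return tagFriend;
    }

    public static void main(String[] args) {
        List<BlogRow> rows = Arrays.asList(
                new BlogRow("alice", "Hadoop,HBase"),
                new BlogRow("bob", "hbase,Java"),
                new BlogRow("carol", "java"));
        // prints {hadoop=alice, hbase=alice,bob, java=bob,carol}
        System.out.println(findFriends(rows));
    }
}
```

Lower-casing the tag before grouping is what lets "HBase" and "hbase" land under the same key.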
Code implementation
With the analysis above, the implementation is fairly simple. It takes only the following steps.

Define a Mapper class that extends TableMapper. The input and output KV types of map match the analysis above.

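// Imports used by this class (place them at the top of FindFriend.java).
import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;

public static class Mapper extends TableMapper<ImmutableBytesWritable, ImmutableBytesWritable> {

    public Mapper() {}

    @Override
    public void map(ImmutableBytesWritable row, Result values, Context context)
            throws IOException {
        ImmutableBytesWritable value = null; // author:nickname of this row
        String[] tags = null;                // article:tags of this row, split on ","
        for (KeyValue kv : values.list()) {
            String family = Bytes.toString(kv.getFamily());
            String qualifier = Bytes.toString(kv.getQualifier());
            if ("author".equals(family) && "nickname".equals(qualifier)) {
                value = new ImmutableBytesWritable(kv.getValue());
            }
            if ("article".equals(family) && "tags".equals(qualifier)) {
                tags = Bytes.toString(kv.getValue()).split(",");
            }
        }
        // Skip rows that lack either column, so tags and value are never null below.
        if (value == null || tags == null) {
            return;
        }
        // Emit one (lower-cased tag, nickname) pair per tag.
        for (String tag : tags) {
            ImmutableBytesWritable key =
                    new ImmutableBytesWritable(Bytes.toBytes(tag.toLowerCase()));
            try {
                context.write(key, value);
            } catch (InterruptedException e) {
                throw new IOException(e);
            }
        }
    }
}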

Define a Reducer class that extends TableReducer. The input and output KV types of reduce match the analysis above.

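// Additional imports used by this class.
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableReducer;

public static class Reducer
        extends TableReducer<ImmutableBytesWritable, ImmutableBytesWritable, ImmutableBytesWritable> {

    @Override
    public void reduce(ImmutableBytesWritable key, Iterable<ImmutableBytesWritable> values,
            Context context) throws IOException, InterruptedException {
        // Join the nicknames of all authors that share this tag.
        StringBuilder friends = new StringBuilder();
        for (ImmutableBytesWritable val : values) {
            if (friends.length() > 0) {
                friends.append(",");
            }
            friends.append(Bytes.toString(val.get()));
        }
        // Write one row per tag: person:nicknames = "nick1,nick2,...".
        Put put = new Put(key.get());
        put.add(Bytes.toBytes("person"), Bytes.toBytes("nicknames"),
                Bytes.toBytes(friends.toString()));
        context.write(key, put);
    }
}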

When submitting the job, set the InputFormat to TableInputFormat and the OutputFormat to TableOutputFormat. The TableMapReduceUtil class simplifies this setup.

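// Additional imports used by main.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf = HBaseConfiguration.create(conf);
    Job job = new Job(conf, "HBase_FindFriend");
    job.setJarByClass(FindFriend.class);

    // Read only the two columns the analysis needs.
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("author"), Bytes.toBytes("nickname"));
    scan.addColumn(Bytes.toBytes("article"), Bytes.toBytes("tags"));

    // Input: the blog table; output: the tag_friend table.
    TableMapReduceUtil.initTableMapperJob("blog", scan, FindFriend.Mapper.class,
            ImmutableBytesWritable.class, ImmutableBytesWritable.class, job);
    TableMapReduceUtil.initTableReducerJob("tag_friend", FindFriend.Reducer.class, job);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
}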

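Summary
This article used an example to show how to analyze HBase data with MapReduce. Note that this is only one conventional pattern (analyzing data in one table and storing the result in another); other patterns are possible and work similarly. If you have followed along this far, you will want to run the job and see the results. The next article explains how to run and test a MapReduce job locally in a simulated cluster environment.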
