HBase + MapReduce + Eclipse Example

Source: Internet
Author: User
Tags: static class

The previous post covered operating a standalone HBase from Eclipse; friends who are not yet familiar with that setup can read it first:

Eclipse connects to and operates a standalone HBase


This article describes a MapReduce job that reads its input from HBase and counts values in a column. It is similar to WordCount, except that the input is read from an HBase table instead of a file.


First, an input source needs to be created.

Start HBase and open the hbase shell. Note that my configuration file no longer uses standalone mode; HDFS is used as the underlying file system:


<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
    <description>The location where the data is stored.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Set the number of replicas to 1 because this is a pseudo-distributed setup.</description>
  </property>
</configuration>
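One thing to keep in mind: the hdfs://localhost:9000 address in hbase.rootdir must match the NameNode address configured for Hadoop (fs.default.name / fs.defaultFS in core-site.xml), otherwise HBase will not be able to reach HDFS.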

After entering the HBase shell, create the tables:

hbase(main):007:0> create 'data_input', 'message'
0 row(s) in 1.1110 seconds
hbase(main):008:0> create 'data_output', {NAME => 'message', VERSIONS => 1}
0 row(s) in 1.0900 seconds

The data_input table stores the input data for the MapReduce job, and the data_output table stores its output.
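As a quick check before moving on (a small sketch of my own, not from the original post), the same client API can confirm that both tables exist; the class name CheckTables is hypothetical, the table names follow the setup above:

package hbase_mapred1;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Hypothetical helper, not part of the original post: verify that the two tables exist.
public class CheckTables {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        System.out.println("data_input exists:  " + admin.tableExists("data_input"));
        System.out.println("data_output exists: " + admin.tableExists("data_output"));
    }
}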


Next, generate random data and write it into the data_input table. Here Eclipse is again used to operate HBase; the code that writes the data into data_input is as follows:


package hbase_mapred1;

import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class Importer1 {

    public static void main(String[] args) throws Exception {
        String[] pages = {"/", "/a.html", "/b.html", "/c.html"};

        // HBaseConfiguration hbaseConfig = new HBaseConfiguration();
        Configuration hbaseConfig = HBaseConfiguration.create();
        HTable htable = new HTable(hbaseConfig, "data_input");
        htable.setAutoFlush(false);
        htable.setWriteBufferSize(1024 * 1024 * 12);

        int totalRecords = 100000;
        int maxID = totalRecords / 1000;
        Random rand = new Random();

        System.out.println("Importing " + totalRecords + " records ....");
        for (int i = 0; i < totalRecords; i++) {
            // row key = user ID (4 bytes) + record counter (4 bytes)
            int userID = rand.nextInt(maxID) + 1;
            byte[] rowkey = Bytes.add(Bytes.toBytes(userID), Bytes.toBytes(i));
            String randomPage = pages[rand.nextInt(pages.length)];
            Put put = new Put(rowkey);
            put.add(Bytes.toBytes("message"), Bytes.toBytes("page"), Bytes.toBytes(randomPage));
            htable.put(put);
        }
        htable.flushCommits();
        htable.close();
        System.out.println("Done");
    }
}
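Note that the row key written above is a composite of the 4-byte user ID followed by the 4-byte record counter; the MapReduce job below relies on this layout when it extracts the first four bytes as the user key. As a quick sanity check (my own sketch, not from the original post; the class name PeekInput is hypothetical), the import can be verified by scanning a few rows and decoding them the same way:

package hbase_mapred1;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical helper, not part of the original post: print the first few rows of data_input.
public class PeekInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable htable = new HTable(conf, "data_input");
        ResultScanner scanner = htable.getScanner(new Scan());
        int shown = 0;
        for (Result r : scanner) {
            byte[] rowkey = r.getRow();
            int userId = Bytes.toInt(rowkey);        // first 4 bytes: user ID
            int counter = Bytes.toInt(rowkey, 4);    // next 4 bytes: record counter
            String page = Bytes.toString(r.getValue(Bytes.toBytes("message"), Bytes.toBytes("page")));
            System.out.println("userId=" + userId + " counter=" + counter + " page=" + page);
            if (++shown >= 10) break;                // only look at the first 10 rows
        }
        scanner.close();
        htable.close();
    }
}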

At this point the data has been written to the data_input table. Next, this table is used as the input data for the MapReduce job.

The code is as follows:


package hbase_mapred1;

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;

public class FreqCounter1 {

    static class Mapper1 extends TableMapper<ImmutableBytesWritable, IntWritable> {

        private int numRecords = 0;
        private static final IntWritable one = new IntWritable(1);

        @Override
        public void map(ImmutableBytesWritable row, Result values, Context context) throws IOException {
            // extract the userKey from the composite key (userId + counter)
            ImmutableBytesWritable userKey = new ImmutableBytesWritable(row.get(), 0, Bytes.SIZEOF_INT);
            try {
                context.write(userKey, one);
            } catch (InterruptedException e) {
                throw new IOException(e);
            }
            numRecords++;
            if ((numRecords % 10000) == 0) {
                context.setStatus("mapper processed " + numRecords + " records so far");
            }
        }
    }

    public static class Reducer1 extends TableReducer<ImmutableBytesWritable, IntWritable, ImmutableBytesWritable> {

        public void reduce(ImmutableBytesWritable key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            Put put = new Put(key.get());
            put.add(Bytes.toBytes("message"), Bytes.toBytes("total"), Bytes.toBytes(sum));
            System.out.println(String.format("stats : key: %d, count: %d", Bytes.toInt(key.get()), sum));
            context.write(key, put);
        }
    }

    public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();
        Job job = new Job(conf, "Hbase_FreqCounter1");
        job.setJarByClass(FreqCounter1.class);
        Scan scan = new Scan();
        String columns = "details"; // comma separated
        // scan.addColumns(columns);
        scan.setFilter(new FirstKeyOnlyFilter());
        TableMapReduceUtil.initTableMapperJob("data_input", scan, Mapper1.class,
                ImmutableBytesWritable.class, IntWritable.class, job);
        TableMapReduceUtil.initTableReducerJob("data_output", Reducer1.class, job);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
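A note on the job setup: the Scan is given a FirstKeyOnlyFilter because the mapper only needs the row key to build its output key. The filter makes HBase return just the first KeyValue of each row, so the column data is not shipped to the mappers unnecessarily.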

The approximate logic is:

In the map stage, each row is read from data_input and its user key (the first four bytes of the row key) is emitted with a count of 1; friends familiar with the MapReduce flow should find this easy to understand. For example, two rows whose keys begin with user ID 3 each produce the pair (3, 1).

In the shuffle phase, pairs with the same key are grouped together; in the reduce phase the counts for each key are summed, and the total for each user ID is written to the data_output table.


Finally, verify the results.

Because the data stored in HBase is in raw bytes and cannot be read directly, a small program converts the data in the data_output table into a readable format. The code is as follows:


package hbase_mapred1;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class PrintUserCount {

    public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();
        HTable htable = new HTable(conf, "data_output");

        Scan scan = new Scan();
        ResultScanner scanner = htable.getScanner(scan);
        Result r;
        while ((r = scanner.next()) != null) {
            // the row key is the 4-byte user ID written by the reducer
            byte[] key = r.getRow();
            int userId = Bytes.toInt(key);
            byte[] totalValue = r.getValue(Bytes.toBytes("message"), Bytes.toBytes("total"));
            int count = Bytes.toInt(totalValue);
            System.out.println("key: " + userId + ",  count: " + count);
        }
        scanner.close();
        htable.close();
    }
}


key: 1,  count: 1007
key: 2,  count: 1034
key: 3,  count: 962
key: 4,  count: 1001
key: 5,  count: 1024
key: 6,  count: 1033
key: 7,  count: 984
key: 8,  count: 987
key: 9,  count: 988
key: 10,  count: 990
key: 11,  count: 1069
key: 12,  count: 965
key: 13,  count: 1000
key: 14,  count: 998
key: 15,  count: 1002
key: 16,  count: 983
...
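These numbers are consistent with the import program above: totalRecords is 100000 and maxID is 100, so each of the roughly 100 user IDs should appear about 1000 times.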



Note that the corresponding packages need to be imported; the program's directory structure in Eclipse is shown in the figure.


