HBase + MapReduce + Eclipse Example

Source: Internet
Author: User
Tags: static class

The previous post covered operating a standalone HBase from Eclipse; friends who are not yet familiar with that setup can read it first:

Eclipse connects to and operates a standalone HBase


This article describes a MapReduce job that reads its input from HBase and counts values in a column. It is similar to WordCount, except that the input is read from an HBase table instead of a file.


First, an input source needs to be created.

Start HBase and open the hbase shell. Note that my configuration file no longer uses standalone mode; HDFS is used as the underlying file system:


<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
    <description>The location where the data is stored.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Set the number of replicas to 1 because this is a pseudo-distributed setup.</description>
  </property>
</configuration>
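One thing to keep in mind: the hdfs://localhost:9000 address in hbase.rootdir must match the NameNode address configured for Hadoop (fs.default.name / fs.defaultFS in core-site.xml), otherwise HBase will not be able to reach HDFS.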

After entering the HBase shell, create the tables:

hbase(main):007:0> create 'data_input', 'message'
0 row(s) in 1.1110 seconds
hbase(main):008:0> create 'data_output', {NAME => 'message', VERSIONS => 1}
0 row(s) in 1.0900 seconds

The data_input table stores the input data for the MapReduce job, and the data_output table stores its output.
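As a quick check before moving on (a small sketch of my own, not from the original post), the same client API can confirm that both tables exist; the class name CheckTables is hypothetical, the table names follow the setup above:

package hbase_mapred1;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Hypothetical helper, not part of the original post: verify that the two tables exist.
public class CheckTables {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        System.out.println("data_input exists:  " + admin.tableExists("data_input"));
        System.out.println("data_output exists: " + admin.tableExists("data_output"));
    }
}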


Next, generate random data and write it into the data_input table. Here Eclipse is again used to operate HBase; the code that writes the data into data_input is as follows:


package hbase_mapred1;

import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class Importer1 {

    public static void main(String[] args) throws Exception {
        String[] pages = {"/", "/a.html", "/b.html", "/c.html"};

        // HBaseConfiguration hbaseConfig = new HBaseConfiguration();
        Configuration hbaseConfig = HBaseConfiguration.create();
        HTable htable = new HTable(hbaseConfig, "data_input");
        htable.setAutoFlush(false);
        htable.setWriteBufferSize(1024 * 1024 * 12);

        int totalRecords = 100000;
        int maxID = totalRecords / 1000;
        Random rand = new Random();

        System.out.println("Importing " + totalRecords + " records ....");
        for (int i = 0; i < totalRecords; i++) {
            // row key = user ID (4 bytes) + record counter (4 bytes)
            int userID = rand.nextInt(maxID) + 1;
            byte[] rowkey = Bytes.add(Bytes.toBytes(userID), Bytes.toBytes(i));
            String randomPage = pages[rand.nextInt(pages.length)];
            Put put = new Put(rowkey);
            put.add(Bytes.toBytes("message"), Bytes.toBytes("page"), Bytes.toBytes(randomPage));
            htable.put(put);
        }
        htable.flushCommits();
        htable.close();
        System.out.println("Done");
    }
}
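Note that the row key written above is a composite of the 4-byte user ID followed by the 4-byte record counter; the MapReduce job below relies on this layout when it extracts the first four bytes as the user key. As a quick sanity check (my own sketch, not from the original post; the class name PeekInput is hypothetical), the import can be verified by scanning a few rows and decoding them the same way:

package hbase_mapred1;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical helper, not part of the original post: print the first few rows of data_input.
public class PeekInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable htable = new HTable(conf, "data_input");
        ResultScanner scanner = htable.getScanner(new Scan());
        int shown = 0;
        for (Result r : scanner) {
            byte[] rowkey = r.getRow();
            int userId = Bytes.toInt(rowkey);        // first 4 bytes: user ID
            int counter = Bytes.toInt(rowkey, 4);    // next 4 bytes: record counter
            String page = Bytes.toString(r.getValue(Bytes.toBytes("message"), Bytes.toBytes("page")));
            System.out.println("userId=" + userId + " counter=" + counter + " page=" + page);
            if (++shown >= 10) break;                // only look at the first 10 rows
        }
        scanner.close();
        htable.close();
    }
}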

At this point the data has been written to the data_input table. Next, this table is used as the input data for the MapReduce job.

The code is as follows:


package hbase_mapred1;

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;

public class FreqCounter1 {

    static class Mapper1 extends TableMapper<ImmutableBytesWritable, IntWritable> {

        private int numRecords = 0;
        private static final IntWritable one = new IntWritable(1);

        @Override
        public void map(ImmutableBytesWritable row, Result values, Context context) throws IOException {
            // extract the userKey from the composite key (userId + counter)
            ImmutableBytesWritable userKey = new ImmutableBytesWritable(row.get(), 0, Bytes.SIZEOF_INT);
            try {
                context.write(userKey, one);
            } catch (InterruptedException e) {
                throw new IOException(e);
            }
            numRecords++;
            if ((numRecords % 10000) == 0) {
                context.setStatus("mapper processed " + numRecords + " records so far");
            }
        }
    }

    public static class Reducer1 extends TableReducer<ImmutableBytesWritable, IntWritable, ImmutableBytesWritable> {

        public void reduce(ImmutableBytesWritable key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            Put put = new Put(key.get());
            put.add(Bytes.toBytes("message"), Bytes.toBytes("total"), Bytes.toBytes(sum));
            System.out.println(String.format("stats : key: %d, count: %d", Bytes.toInt(key.get()), sum));
            context.write(key, put);
        }
    }

    public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();
        Job job = new Job(conf, "Hbase_FreqCounter1");
        job.setJarByClass(FreqCounter1.class);
        Scan scan = new Scan();
        String columns = "details"; // comma separated
        // scan.addColumns(columns);
        scan.setFilter(new FirstKeyOnlyFilter());
        TableMapReduceUtil.initTableMapperJob("data_input", scan, Mapper1.class,
                ImmutableBytesWritable.class, IntWritable.class, job);
        TableMapReduceUtil.initTableReducerJob("data_output", Reducer1.class, job);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
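A note on the job setup: the Scan is given a FirstKeyOnlyFilter because the mapper only needs the row key to build its output key. The filter makes HBase return just the first KeyValue of each row, so the column data is not shipped to the mappers unnecessarily.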

The approximate logic is:

In the map stage, each row is read from data_input and its user key (the first four bytes of the row key) is emitted with a count of 1; friends familiar with the MapReduce flow should find this easy to understand. For example, two rows whose keys begin with user ID 3 each produce the pair (3, 1).

In the shuffle phase, pairs with the same key are grouped together; in the reduce phase the counts for each key are summed, and the total for each user ID is written to the data_output table.


Finally, verify the results.

Because the data stored in HBase is in raw bytes and cannot be read directly, a small program converts the data in the data_output table into a readable format. The code is as follows:


package hbase_mapred1;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class PrintUserCount {

    public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();
        HTable htable = new HTable(conf, "data_output");

        Scan scan = new Scan();
        ResultScanner scanner = htable.getScanner(scan);
        Result r;
        while ((r = scanner.next()) != null) {
            // the row key is the 4-byte user ID written by the reducer
            byte[] key = r.getRow();
            int userId = Bytes.toInt(key);
            byte[] totalValue = r.getValue(Bytes.toBytes("message"), Bytes.toBytes("total"));
            int count = Bytes.toInt(totalValue);
            System.out.println("key: " + userId + ",  count: " + count);
        }
        scanner.close();
        htable.close();
    }
}


key: 1,  count: 1007
key: 2,  count: 1034
key: 3,  count: 962
key: 4,  count: 1001
key: 5,  count: 1024
key: 6,  count: 1033
key: 7,  count: 984
key: 8,  count: 987
key: 9,  count: 988
key: 10,  count: 990
key: 11,  count: 1069
key: 12,  count: 965
key: 13,  count: 1000
key: 14,  count: 998
key: 15,  count: 1002
key: 16,  count: 983
...
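These numbers are consistent with the import program above: totalRecords is 100000 and maxID is 100, so each of the roughly 100 user IDs should appear about 1000 times.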



Note that the corresponding packages need to be imported; the program's directory structure in Eclipse is shown in the figure.


