Reprint please indicate the source: http://blog.csdn.net/xiaojimanman/article/details/40184581
I've been reading Hadoop-related books over the last few days, and while the material is fresh I wrote a small statistics program of my own, modeled on the WordCount example.
Requirements Description:
From a supermarket's sales lists, calculate the degree of association between products, that is, the number of times products A and B are bought together.
Data format:
The supermarket sales data is simplified to the following format: each line represents one sales list, with the items separated by ",".
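As a hypothetical illustration (the sample data here is made up for this write-up), an input file might contain these three sales lists:

milk,bread,eggs
milk,bread
bread,eggs,beer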
Requirements Analysis:
This requirement is implemented with MapReduce in Hadoop.
The map function splits each sales list into associated product pairs; the output key is product A and the value is product B. Because we want to find the associated products for both A and B, each pair is output twice, once as (A, B) and once as (B, A), as in the records shown below.
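Using that made-up sample, the first list "milk,bread,eggs" is split into three pairs, and each pair is written in both directions, so the map emits six records for it:

milk	bread
bread	milk
milk	eggs
eggs	milk
bread	eggs
eggs	bread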
The reduce function derives the products associated with each product A, that is, it counts how many times each product appears among A's values; the output key is "product A|product B" and the value is the number of times that combination appears. The framework first groups the map output values by key; reduce then tallies the occurrences of each product B in the grouped values and writes "product A|product B" together with the combination count, as in the worked example below.
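Continuing the made-up sample: the grouped map output for the key "milk" is the value list (bread, eggs, bread), since milk and bread appear together in two lists while milk and eggs appear together in one. With the minimum count set to 1, the reducer therefore emits:

milk|bread	2
milk|eggs	1

The keys bread, eggs, and beer are tallied the same way.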
That completes the analysis of the requirement; the concrete code implementation follows.
Code implementation:
The code is not explained in detail here; refer to the comments in it.
package com;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map.Entry;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Test extends Configured implements Tool {

    /**
     * Map class, implements the data preprocessing.
     * The output key is product A, the value is an associated product B.
     * @author lulei
     */
    public static class MapT extends Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            if (!(line == null || "".equals(line))) {
                // split the sales list into individual products
                String[] vs = line.split(",");
                // pairwise combinations, each forming one record
                for (int i = 0; i < (vs.length - 1); i++) {
                    if ("".equals(vs[i])) { // exclude empty records
                        continue;
                    }
                    for (int j = i + 1; j < vs.length; j++) {
                        if ("".equals(vs[j])) {
                            continue;
                        }
                        // output the pair in both directions
                        context.write(new Text(vs[i]), new Text(vs[j]));
                        context.write(new Text(vs[j]), new Text(vs[i]));
                    }
                }
            }
        }
    }

    /**
     * Reduce class, implements the counting.
     * The output key is "product A|product B", the value is the association count.
     * @author lulei
     */
    public static class ReduceT extends Reducer<Text, Text, Text, IntWritable> {
        private int count;

        /**
         * Initialization: read the minimum record count from the job parameters.
         */
        public void setup(Context context) {
            String countStr = context.getConfiguration().get("count");
            try {
                this.count = Integer.parseInt(countStr);
            } catch (Exception e) {
                this.count = 0;
            }
        }

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String keyStr = key.toString();
            HashMap<String, Integer> hashMap = new HashMap<String, Integer>();
            // count the occurrences of each product B with a hash map
            for (Text value : values) {
                String valueStr = value.toString();
                if (hashMap.containsKey(valueStr)) {
                    hashMap.put(valueStr, hashMap.get(valueStr) + 1);
                } else {
                    hashMap.put(valueStr, 1);
                }
            }
            // output the result
            for (Entry<String, Integer> entry : hashMap.entrySet()) {
                if (entry.getValue() >= this.count) { // only output counts not below the minimum
                    context.write(new Text(keyStr + "|" + entry.getKey()),
                            new IntWritable(entry.getValue()));
                }
            }
        }
    }

    @Override
    public int run(String[] arg0) throws Exception {
        Configuration conf = getConf();
        conf.set("count", arg0[2]);

        Job job = new Job(conf);
        job.setJobName("jobtest");

        job.setOutputFormatClass(TextOutputFormat.class);
        // the map output types differ from the final reduce output types,
        // so both sets are declared explicitly
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(MapT.class);
        job.setReducerClass(ReduceT.class);

        FileInputFormat.addInputPath(job, new Path(arg0[0]));
        FileOutputFormat.setOutputPath(job, new Path(arg0[1]));

        job.waitForCompletion(true);

        return job.isSuccessful() ? 0 : 1;
    }

    /**
     * @param args input path, output path, minimum count
     */
    public static void main(String[] args) {
        if (args.length != 3) {
            System.exit(-1);
        }
        try {
            int res = ToolRunner.run(new Configuration(), new Test(), args);
            System.exit(res);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Upload and run:
Package the program into a jar file and upload it to the cluster; the test data is likewise uploaded to the HDFS distributed file system.
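A minimal sketch of the upload step, assuming the data file is named shopping.txt and the target directory is /input (both names are assumptions):

hadoop fs -mkdir /input
hadoop fs -put shopping.txt /input/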
The job is then launched from the command line.
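The launch command might look like the following, assuming the jar is named test.jar; the three arguments are the input path, the output path, and the minimum co-occurrence count passed to the reducer via the "count" configuration key:

hadoop jar test.jar com.Test /input /output 1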
After the run completes, view the corresponding output in the HDFS file system.
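For example, with the output path assumed above (part-r-00000 is the default name of the first reducer's output file):

hadoop fs -ls /output
hadoop fs -cat /output/part-r-00000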
With this, a complete MapReduce program is finished. I'll keep on learning Hadoop~