Hadoop MapReduce Programming API Starter Series: Secondary Sort

Source: Internet
Author: User
Tags: shuffle, stub, hadoop, mapreduce

Not much to say; straight to the code. First, the log from a successful local run:

2016-12-12 17:04:32,012 INFO  [org.apache.hadoop.metrics.jvm.JvmMetrics] - Initializing JVM Metrics with processName=JobTracker, sessionId=
2016-12-12 17:04:33,056 WARN  [org.apache.hadoop.mapreduce.JobSubmitter] - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2016-12-12 17:04:33,059 WARN  [org.apache.hadoop.mapreduce.JobSubmitter] - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2016-12-12 17:04:33,083 INFO  [org.apache.hadoop.mapreduce.lib.input.FileInputFormat] - Total input paths to process : 1
2016-12-12 17:04:33,161 INFO  [org.apache.hadoop.mapreduce.JobSubmitter] - number of splits:1
2016-12-12 17:04:33,562 INFO  [org.apache.hadoop.mapreduce.JobSubmitter] - Submitting tokens for job: job_local1173601391_0001
2016-12-12 17:04:34,242 INFO  [org.apache.hadoop.mapreduce.Job] - The url to track the job: http://localhost:8080/
2016-12-12 17:04:34,244 INFO  [org.apache.hadoop.mapreduce.Job] - Running job: job_local1173601391_0001
2016-12-12 17:04:34,247 INFO  [org.apache.hadoop.mapred.LocalJobRunner] - OutputCommitter set in config null
2016-12-12 17:04:34,264 INFO  [org.apache.hadoop.mapred.LocalJobRunner] - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2016-12-12 17:04:34,371 INFO  [org.apache.hadoop.mapred.LocalJobRunner] - Waiting for map tasks
2016-12-12 17:04:34,373 INFO  [org.apache.hadoop.mapred.LocalJobRunner] - Starting task: attempt_local1173601391_0001_m_000000_0
2016-12-12 17:04:34,439 INFO  [org.apache.hadoop.yarn.util.ProcfsBasedProcessTree] - ProcfsBasedProcessTree currently is supported only on Linux.
2016-12-12 17:04:34,667 INFO  [org.apache.hadoop.mapred.Task] - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.…
2016-12-12 17:04:34,676 INFO  [org.apache.hadoop.mapred.MapTask] - Processing split: file:/d:/code/myeclipsejavacode/mymapreduce/data/secondarysort/secondarysort.txt:0+120
2016-12-12 17:04:34,762 INFO  [org.apache.hadoop.mapred.MapTask] - (EQUATOR) 0 kvi 26214396(104857584)
2016-12-12 17:04:34,763 INFO  [org.apache.hadoop.mapred.MapTask] - mapreduce.task.io.sort.mb: 100
2016-12-12 17:04:34,763 INFO  [org.apache.hadoop.mapred.MapTask] - soft limit at 83886080
2016-12-12 17:04:34,763 INFO  [org.apache.hadoop.mapred.MapTask] - bufstart = 0; bufvoid = 104857600
2016-12-12 17:04:34,763 INFO  [org.apache.hadoop.mapred.MapTask] - kvstart = 26214396; length = 6553600
2016-12-12 17:04:34,771 INFO  [org.apache.hadoop.mapred.MapTask] - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2016-12-12 17:04:34,789 INFO  [org.apache.hadoop.mapred.LocalJobRunner] -
2016-12-12 17:04:34,789 INFO  [org.apache.hadoop.mapred.MapTask] - Starting flush of map output
2016-12-12 17:04:34,789 INFO  [org.apache.hadoop.mapred.MapTask] - Spilling map output
2016-12-12 17:04:34,789 INFO  [org.apache.hadoop.mapred.MapTask] - bufstart = 0; bufend = 216; bufvoid = 104857600
2016-12-12 17:04:34,790 INFO  [org.apache.hadoop.mapred.MapTask] - kvstart = 26214396(104857584); kvend = 26214328(104857312); length = 69/6553600
2016-12-12 17:04:34,809 INFO  [org.apache.hadoop.mapred.MapTask] - Finished spill 0
2016-12-12 17:04:34,818 INFO  [org.apache.hadoop.mapred.Task] - Task:attempt_local1173601391_0001_m_000000_0 is done. And is in the process of committing
2016-12-12 17:04:34,838 INFO  [org.apache.hadoop.mapred.LocalJobRunner] - map
2016-12-12 17:04:34,838 INFO  [org.apache.hadoop.mapred.Task] - Task 'attempt_local1173601391_0001_m_000000_0' done.
2016-12-12 17:04:34,838 INFO  [org.apache.hadoop.mapred.LocalJobRunner] - Finishing task: attempt_local1173601391_0001_m_000000_0
2016-12-12 17:04:34,839 INFO  [org.apache.hadoop.mapred.LocalJobRunner] - map task executor complete.
2016-12-12 17:04:34,846 INFO  [org.apache.hadoop.mapred.LocalJobRunner] - Waiting for reduce tasks
2016-12-12 17:04:34,846 INFO  [org.apache.hadoop.mapred.LocalJobRunner] - Starting task: attempt_local1173601391_0001_r_000000_0
2016-12-12 17:04:34,864 INFO  [org.apache.hadoop.yarn.util.ProcfsBasedProcessTree] - ProcfsBasedProcessTree currently is supported only on Linux.
2016-12-12 17:04:34,950 INFO  [org.apache.hadoop.mapred.Task] - Using ResourceCalculatorProcessTree : …
2016-12-12 17:04:34,954 INFO  [org.apache.hadoop.mapred.ReduceTask] - Using ShuffleConsumerPlugin: …
2016-12-12 17:04:34,974 INFO  [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - MergerManager: memoryLimit=1327077760, maxSingleShuffleLimit=331769440, mergeThreshold=875871360, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2016-12-12 17:04:35,011 INFO  [org.apache.hadoop.mapreduce.task.reduce.EventFetcher] - attempt_local1173601391_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2016-12-12 17:04:35,048 INFO  [org.apache.hadoop.mapreduce.task.reduce.LocalFetcher] - localfetcher#1 about to shuffle output of map attempt_local1173601391_0001_m_000000_0 decomp: 254 len: 258 to MEMORY
2016-12-12 17:04:35,060 INFO  [org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput] - Read 254 bytes from map-output for attempt_local1173601391_0001_m_000000_0
2016-12-12 17:04:35,123 INFO  [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - closeInMemoryFile -> map-output of size: 254, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory -> 254
2016-12-12 17:04:35,125 INFO  [org.apache.hadoop.mapreduce.task.reduce.EventFetcher] - EventFetcher is interrupted.. Returning
2016-12-12 17:04:35,126 INFO  [org.apache.hadoop.mapred.LocalJobRunner] - 1 / 1 copied.
2016-12-12 17:04:35,126 INFO  [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2016-12-12 17:04:35,136 INFO  [org.apache.hadoop.mapred.Merger] - Merging 1 sorted segments
2016-12-12 17:04:35,137 INFO  [org.apache.hadoop.mapred.Merger] - Down to the last merge-pass, with 1 segments left of total size: 244 bytes
2016-12-12 17:04:35,139 INFO  [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - Merged 1 segments, 254 bytes to disk to satisfy reduce memory limit
2016-12-12 17:04:35,139 INFO  [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - Merging 1 files, 258 bytes from disk
2016-12-12 17:04:35,140 INFO  [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - Merging 0 segments, 0 bytes from memory into reduce
2016-12-12 17:04:35,141 INFO  [org.apache.hadoop.mapred.Merger] - Merging 1 sorted segments
2016-12-12 17:04:35,142 INFO  [org.apache.hadoop.mapred.Merger] - Down to the last merge-pass, with 1 segments left of total size: 244 bytes
2016-12-12 17:04:35,143 INFO  [org.apache.hadoop.mapred.LocalJobRunner] - 1 / 1 copied.
2016-12-12 17:04:35,150 INFO  [org.apache.hadoop.conf.Configuration.deprecation] - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2016-12-12 17:04:35,158 INFO  [org.apache.hadoop.mapred.Task] - Task:attempt_local1173601391_0001_r_000000_0 is done. And is in the process of committing
2016-12-12 17:04:35,160 INFO  [org.apache.hadoop.mapred.LocalJobRunner] - 1 / 1 copied.
2016-12-12 17:04:35,160 INFO  [org.apache.hadoop.mapred.Task] - Task attempt_local1173601391_0001_r_000000_0 is allowed to commit now
2016-12-12 17:04:35,166 INFO  [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - Saved output of task 'attempt_local1173601391_0001_r_000000_0' to file:/d:/code/myeclipsejavacode/mymapreduce/out/secondarysort/_temporary/0/task_local1173601391_0001_r_000000
2016-12-12 17:04:35,167 INFO  [org.apache.hadoop.mapred.LocalJobRunner] - reduce > reduce
2016-12-12 17:04:35,167 INFO  [org.apache.hadoop.mapred.Task] - Task 'attempt_local1173601391_0001_r_000000_0' done.
2016-12-12 17:04:35,167 INFO  [org.apache.hadoop.mapred.LocalJobRunner] - Finishing task: attempt_local1173601391_0001_r_000000_0
2016-12-12 17:04:35,168 INFO  [org.apache.hadoop.mapred.LocalJobRunner] - reduce task executor complete.
2016-12-12 17:04:35,248 INFO  [org.apache.hadoop.mapreduce.Job] - Job job_local1173601391_0001 running in uber mode : false
2016-12-12 17:04:35,249 INFO  [org.apache.hadoop.mapreduce.Job] -  map 100% reduce 100%
2016-12-12 17:04:35,251 INFO  [org.apache.hadoop.mapreduce.Job] - Job job_local1173601391_0001 completed successfully
2016-12-12 17:04:35,271 INFO  [org.apache.hadoop.mapreduce.Job] - Counters: 33
    File System Counters
        FILE: Number of bytes read=1186
        FILE: Number of bytes written=394623
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Map input records=18
        Map output records=18
        Map output bytes=216
        Map output materialized bytes=258
        Input split bytes=145
        Combine input records=0
        Combine output records=0
        Reduce input groups=4
        Reduce shuffle bytes=258
        Reduce input records=18
        Reduce output records=18
        Spilled Records=36
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=0
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=534773760
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=120
    File Output Format Counters
        Bytes Written=115
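
Two counters above are worth a second look. Map input records=18 and Reduce input records=18, yet Reduce input groups=4: the grouping comparator collapses every key that shares a first field into a single reduce() call, so the 18 records arrive in only 4 groups. And Spilled Records=36 is most likely the same 18 records counted once when spilled on the map side and once more when merged to disk on the reduce side.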

Code

package zhouls.bigdata.myMapReduce.SecondarySort;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;


Step 1: define the IntPair class, which wraps the key/value pair from the sample data into a single composite key; it implements the WritableComparable interface and overrides its methods.

/**
 * A custom key class should implement the WritableComparable interface
 */
public class IntPair implements WritableComparable<IntPair>
{
    int first;  // first member variable
    int second; // second member variable

    public void set(int left, int right)
    {
        first = left;
        second = right;
    }

    public int getFirst()
    {
        return first;
    }

    public int getSecond()
    {
        return second;
    }

    // Deserialization: reconstruct an IntPair from the binary stream
    public void readFields(DataInput in) throws IOException
    {
        first = in.readInt();
        second = in.readInt();
    }

    // Serialization: write the IntPair to the binary stream
    public void write(DataOutput out) throws IOException
    {
        out.writeInt(first);
        out.writeInt(second);
    }

    // Key comparison: order by first, then by second
    public int compareTo(IntPair o)
    {
        if (first != o.first)
        {
            return first < o.first ? -1 : 1;
        }
        else if (second != o.second)
        {
            return second < o.second ? -1 : 1;
        }
        else
        {
            return 0;
        }
    }

    @Override
    public int hashCode()
    {
        return first * 157 + second;
    }

    @Override
    public boolean equals(Object right)
    {
        if (right == null)
            return false;
        if (this == right)
            return true;
        if (right instanceof IntPair)
        {
            IntPair r = (IntPair) right;
            return r.first == first && r.second == second;
        }
        else
        {
            return false;
        }
    }
}
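
As a quick sanity check on the composite ordering, here is a minimal sketch (not from the original post) that could live in a scratch main() alongside the class:

    IntPair a = new IntPair();
    a.set(3, 1);
    IntPair b = new IntPair();
    b.set(3, 9);
    System.out.println(a.compareTo(b)); // -1: firsts tie, so second decides
    b.set(2, 9);
    System.out.println(a.compareTo(b)); // 1: first dominates regardless of second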

package zhouls.bigdata.myMapReduce.SecondarySort;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;


public class SecondarySort extends Configured implements Tool
{
    // Custom Mapper: wraps each input pair of integers into an IntPair composite key
    public static class Map extends Mapper<LongWritable, Text, IntPair, IntWritable>
    {
        private final IntPair intKey = new IntPair();
        private final IntWritable intValue = new IntWritable();

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
        {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            int left = 0;
            int right = 0;
            if (tokenizer.hasMoreTokens())
            {
                left = Integer.parseInt(tokenizer.nextToken());
                if (tokenizer.hasMoreTokens())
                    right = Integer.parseInt(tokenizer.nextToken());
                intKey.set(left, right);
                intValue.set(right);
                context.write(intKey, intValue);
            }
        }
    }

    // Step 2: define the partition function class FirstPartitioner, which partitions on IntPair's first field.
    /**
     * Partition function class. Determines the partition according to first.
     */
    public static class FirstPartitioner extends Partitioner<IntPair, IntWritable>
    {
        @Override
        public int getPartition(IntPair key, IntWritable value, int numPartitions)
        {
            return Math.abs(key.getFirst() * 127) % numPartitions;
        }
    }
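
    // Side note (not in the original listing): Math.abs(Integer.MIN_VALUE) is still
    // negative, so for extreme key values Math.abs(key.getFirst() * 127) can produce
    // a negative partition index and fail the job. A common defensive variant masks
    // the sign bit instead of taking the absolute value:
    //
    //     return (key.getFirst() * 127 & Integer.MAX_VALUE) % numPartitions;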


    // Step 3: a custom SortComparator class could implement the sort on first and second.
    // It is not used here; the sort is done by IntPair's own compareTo() method.
    // Step 4: define the GroupingComparator class to group the data within a partition.
    /**
     * Extends WritableComparator
     */
    public static class GroupingComparator extends WritableComparator
    {
        protected GroupingComparator()
        {
            super(IntPair.class, true);
        }

        // Compare two WritableComparables: only the first field matters for grouping
        @Override
        public int compare(WritableComparable w1, WritableComparable w2)
        {
            IntPair ip1 = (IntPair) w1;
            IntPair ip2 = (IntPair) w2;
            int l = ip1.getFirst();
            int r = ip2.getFirst();
            return l == r ? 0 : (l < r ? -1 : 1);
        }
    }
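
    // How the pieces fit together (explanatory note, not in the original): the sort
    // phase orders map output by the full IntPair (first, then second), while this
    // comparator makes the reduce phase treat any two keys with equal first fields
    // as the same group. Each reduce() call therefore receives all records sharing
    // one first value, with their second values already in ascending order.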


    // Custom Reducer
    public static class Reduce extends Reducer<IntPair, IntWritable, Text, IntWritable>
    {
        private final Text left = new Text();

        public void reduce(IntPair key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
        {
            left.set(Integer.toString(key.getFirst()));
            for (IntWritable val : values)
            {
                context.write(left, val);
            }
        }
    }
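
    // Note (not in the original): reusing the single Text instance across calls is
    // the usual Hadoop object-reuse pattern; context.write() serializes the contents
    // immediately, so overwriting left on the next call is safe.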


    public int run(String[] args) throws Exception
    {
        Configuration conf = new Configuration();

        // Delete the output directory if it already exists
        Path myPath = new Path(args[1]);
        FileSystem hdfs = myPath.getFileSystem(conf);
        if (hdfs.isDirectory(myPath))
        {
            hdfs.delete(myPath, true);
        }

        Job job = new Job(conf, "SecondarySort");
        job.setJarByClass(SecondarySort.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));  // input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path

        job.setMapperClass(Map.class);     // Mapper
        job.setReducerClass(Reduce.class); // Reducer
        job.setNumReduceTasks(3);

        job.setPartitionerClass(FirstPartitioner.class); // partition function
        // job.setSortComparatorClass(KeyComparator.class); // no custom SortComparator here; IntPair's own compareTo() does the sorting
        job.setGroupingComparatorClass(GroupingComparator.class); // grouping function

        job.setMapOutputKeyClass(IntPair.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }
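
    // Side note (not in the original): the Job(Configuration, String) constructor is
    // deprecated in Hadoop 2.x; Job.getInstance(conf, "SecondarySort") is the
    // preferred replacement and behaves the same way here.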

    /**
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception
    {
        // HDFS paths (commented out; the local run logged above used the paths below):
        // String[] args0 = { "hdfs://hadoopmaster:9000/secondarysort/secondarysort.txt",
        //         "hdfs://hadoopmaster:9000/out/secondarysort" };

        String[] args0 = { "./data/secondarysort/secondarysort.txt",
                "./out/secondarysort" };

        int ec = ToolRunner.run(new Configuration(), new SecondarySort(), args0);
        System.exit(ec);
    }
}
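
To make the end-to-end behavior concrete, here is a small hypothetical walk-through (the actual secondarysort.txt data is not reproduced in the post; any file of whitespace-separated integer pairs works). Given the input lines

    3 9
    3 1
    5 2

the mapper emits the composite keys (3,9), (3,1) and (5,2), each with the second field as the value. The sort phase orders them (3,1), (3,9), (5,2), and grouping by the first field yields two reduce() calls, so TextOutputFormat writes the tab-separated lines:

    3	1
    3	9
    5	2

Within each group the second values come out in ascending order, which is exactly the secondary-sort effect.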
