Due to project requirements, YARN MapReduce compute tasks need to be submitted from a Java program. Unlike the usual approach of submitting a MapReduce job as a jar package from the command line, this requires a few small changes, as shown in the code below.
Here is the MapReduce main program; a few points are worth mentioning:
1. The input format is set to WholeFileInputFormat, so each input file is read whole and is never split.
2. To control the reduce phase, the map output key is a composite key. Instead of the usual &lt;key, value&gt; pairs, the map emits &lt;TextPair, value&gt;, where a TextPair has the form &lt;key1, key2&gt;.
3. To work with the composite key, the grouping function (GroupComparator) is redefined. The grouping rule is that records are sent to the same reduce call as long as key1 of the TextPair is equal (key2 may differ). Thus, when records sharing the same key1 enter one reduce call, key2 serves to identify the individual records.
package web.hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

/**
 * Main driver that submits a MapReduce job to YARN directly from Java code
 * (instead of via "hadoop jar"). It wires up a whole-file input format, a
 * composite map output key (TextPair), a custom partitioner and a grouping
 * comparator so that all records sharing the first key land in one reduce call.
 */
public class GemiMain {

    /** The job handle; public so callers can inspect it after submission. */
    public Job job;

    public GemiMain() {
        job = null;
    }

    /**
     * Partitions records by the first half of the composite key so that all
     * records sharing key1 go to the same reducer.
     */
    public static class NamePartitioner extends Partitioner<TextPair, BytesWritable> {
        @Override
        public int getPartition(TextPair key, BytesWritable value, int numPartitions) {
            // Mask the sign bit instead of Math.abs(): when hashCode()*127
            // overflows to Integer.MIN_VALUE, Math.abs() is still negative and
            // would produce an illegal (negative) partition number.
            return (key.getFirst().hashCode() * 127 & Integer.MAX_VALUE) % numPartitions;
        }
    }

    /**
     * Grouping comparator: two TextPair keys belong to the same reduce group
     * as long as their first fields are equal; the second field is ignored
     * here and only serves to identify records inside the reducer.
     */
    public static class GroupComparator extends WritableComparator {
        public GroupComparator() {
            super(TextPair.class, true);
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            TextPair t1 = (TextPair) a;
            TextPair t2 = (TextPair) b;
            // Equal first fields => 0 => same reduce group.
            return t1.getFirst().compareTo(t2.getFirst());
        }
    }

    /**
     * Configures and submits the job, blocking until it finishes.
     *
     * @param args args[0] is the job name, args[1..length-3] are input paths,
     *             args[length-2] is the quality folder, args[length-1] is the
     *             output path.
     * @return true if the job completed successfully
     * @throws IOException if job submission or HDFS access fails
     */
    public boolean runJob(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // Stash the output path in the configuration so the reduce function
        // can read it back as a parameter.
        conf.set("OutputPath", args[args.length - 1]);
        // HDFS folder where each run writes its quality file; the
        // second-to-last element of the args array. When run inside the
        // server this would be the web project's root path instead.
        conf.set("Qualityfolder", args[args.length - 2]);

        // When debugging from a plain Java application, read the cluster
        // configuration files from the local Hadoop installation directory.
        String rootPath = "/opt/hadoop-2.5.0/etc/hadoop/";
        conf.addResource(new Path(rootPath + "yarn-site.xml"));
        conf.addResource(new Path(rootPath + "core-site.xml"));
        conf.addResource(new Path(rootPath + "hdfs-site.xml"));
        conf.addResource(new Path(rootPath + "mapred-site.xml"));

        this.job = new Job(conf);
        job.setJobName("Job name:" + args[0]);
        job.setJarByClass(GemiMain.class);

        job.setMapperClass(GemiMapper.class);
        job.setMapOutputKeyClass(TextPair.class);
        job.setMapOutputValueClass(BytesWritable.class);
        // Partition by key1, then group records with the same key1 into a
        // single reduce call.
        job.setPartitionerClass(NamePartitioner.class);
        job.setGroupingComparatorClass(GroupComparator.class);

        job.setReducerClass(GemiReducer.class);
        job.setInputFormatClass(WholeFileInputFormat.class);
        job.setOutputFormatClass(NullOutputFormat.class);
        job.setNumReduceTasks(8);

        // args[1] .. args[length-3] are the input data paths.
        for (int i = 1; i < args.length - 2; i++) {
            FileInputFormat.addInputPath(job, new Path(args[i]));
        }
        // The last element of the args array is the output path.
        FileOutputFormat.setOutputPath(job, new Path(args[args.length - 1]));

        return job.waitForCompletion(true);
    }

    public static void main(String[] args)
            throws ClassNotFoundException, IOException, InterruptedException {
        String[] inputPaths = new String[] {
                "normalizeJob",
                "hdfs://192.168.168.101:9000/user/hduser/red1/",
                "hdfs://192.168.168.101:9000/user/hduser/nir1/",
                "quality11111",
                "hdfs://192.168.168.101:9000/user/hduser/test" };
        GemiMain runner = new GemiMain();
        boolean result = runner.runJob(inputPaths);
        System.out.println("Job finished, success = " + result);
    }
}
The following is the TextPair class.
public class Textpair implements writablecomparable<textpair> {private text first;private text second;public Textpair () {Set (new text (), new text ());} Public Textpair (string first, string second) {Set (new text (first), new text (second));} Public Textpair (text first, text second) {Set (first, second);} public void Set (text first, text second) {This.first = First;this.second = Second;} Public Text GetFirst () {return first;} Public Text Getsecond () {return second;} @Overridepublic void Write (DataOutput out), throws IOException {First.write (out), Second.write (out);} @Overridepublic void ReadFields (Datainput in) throws IOException {First.readfields (in); Second.readfields (in);} @Overridepublic int hashcode () {return First.hashcode () * 163 + Second.hashcode ();} @Overridepublic boolean equals (Object o) {if (o instanceof textpair) {textpair TP = (textpair) O;return first.equals (tp.fi RST) && second.equals (Tp.second);} return false;} @Overridepublic String toString () {return first + "\ T" + second;}@Override/**a.compareto (B) * If the comparison is the same, the comparison result is 0 * If A is greater than B, then the comparison result is 1 * If A is less than B, then the comparison result is-1 * */public int compareTo (textpair tp) {int CMP = First.compareto (Tp.first); if (cmp! = 0) {return CMP;} At this point the implementation is in ascending order return Second.compareto (Tp.second);}}
The following is WholeFileInputFormat, which ensures the input data is not split during the MapReduce job.
package web.hadoop;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

/**
 * InputFormat that hands each input file to a mapper as a single record:
 * key = file name, value = the file's raw bytes. Files are never split.
 */
public class WholeFileInputFormat extends FileInputFormat<Text, BytesWritable> {

    @Override
    public RecordReader<Text, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        return new WholeFileRecordReader();
    }

    @Override
    protected boolean isSplitable(JobContext context, Path filename) {
        // Each file must stay whole, so splitting is disabled.
        return false;
    }
}
The following is the WholeFileRecordReader class.
package web.hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

/**
 * Emits exactly one record per file split: key = file name, value = the
 * whole file content as bytes. Used together with WholeFileInputFormat,
 * whose isSplitable() returns false, so a split always covers a full file.
 */
public class WholeFileRecordReader extends RecordReader<Text, BytesWritable> {

    private FileSplit fileSplit;
    private FSDataInputStream fis;
    private Text key = null;
    private BytesWritable value = null;
    // True once the single record has been produced.
    private boolean processed = false;

    @Override
    public void initialize(InputSplit inputSplit, TaskAttemptContext taContext)
            throws IOException, InterruptedException {
        fileSplit = (FileSplit) inputSplit;
        Configuration job = taContext.getConfiguration();
        Path file = fileSplit.getPath();
        FileSystem fs = file.getFileSystem(job);
        fis = fs.open(file);
    }

    @Override
    public boolean nextKeyValue() {
        if (key == null) {
            key = new Text();
        }
        if (value == null) {
            value = new BytesWritable();
        }
        if (!processed) {
            // The split covers the whole file, so its length is the file size.
            byte[] content = new byte[(int) fileSplit.getLength()];
            Path file = fileSplit.getPath();
            key.set(file.getName());
            try {
                IOUtils.readFully(fis, content, 0, content.length);
                value.set(content, 0, content.length);
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                IOUtils.closeStream(fis);
            }
            processed = true;
            return true;
        }
        return false;
    }

    @Override
    public Text getCurrentKey() throws IOException, InterruptedException {
        return this.key;
    }

    @Override
    public BytesWritable getCurrentValue() throws IOException, InterruptedException {
        return this.value;
    }

    @Override
    public float getProgress() throws IOException, InterruptedException {
        // Progress must be a fraction in [0, 1]; the original returned the
        // split length here, which is wrong for files longer than one byte.
        return processed ? 1.0f : 0.0f;
    }

    @Override
    public void close() throws IOException {
        // The stream is already closed in nextKeyValue(); guard against a
        // null stream in case close() is called before initialize().
        if (fis != null) {
            IOUtils.closeStream(fis);
        }
    }
}
How to submit a MapReduce compute task to YARN from a Java program