A note up front: our institute built a Hadoop cluster on CDH 5.9. I had previously only operated it from the command line; these past few days I tried using Oozie workflows in Hue to run MR programs and stepped into quite a few pits (I had never used it before and could not find a suitable tutorial, so if you know of a good one, please leave a comment -- I would be grateful).
Pit 1: A standard MR program produces correct results when run from the Linux command line, but when run through a workflow the output is simply the rows of the original input file, unchanged.
Pit 2: How the MR program itself has to be written.
Pit 3: Whether the MR program is run from the command line or from a workflow, it produces a large number of output files, most of them empty.
1. Create a new workflow
Create a new workflow and give it a name, e.g. My test; you can also add a description. In the upper-right corner you can see the workflow's working directory (workspace). When first opened it is empty; the corresponding workflow.xml, the lib directory (which holds the dependent jar packages) and job.properties are only generated in the working directory after you submit the workflow.
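For reference, the generated job.properties is just a small set of Oozie submission parameters. A minimal sketch of what it might contain is shown below; the nameNode and jobTracker addresses and the workspace path are hypothetical and depend entirely on your cluster:

    # hypothetical values -- taken from a typical Oozie setup, not from this cluster
    nameNode=hdfs://nameservice1
    jobTracker=yarnRM
    oozie.use.system.libpath=true
    oozie.wf.application.path=${nameNode}/user/hue/oozie/workspaces/hue-oozie-1480000000.00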
2. Edit this workflow
Drag a MapReduce action from the action bar onto the "Drop your action here" area; this opens the MapReduce editing interface.
Here, Jar name asks you to select the jar package containing the WordCount program you wrote. The jar must already be uploaded to an HDFS directory; I keep mine under the /user/xudong directory.
Then click PROPERTIES+ to add the corresponding properties (generally the series of job parameters that you would otherwise set in the main method when writing an MR program).
Here are the points to note:
A. If you execute the MR program from the Linux command line, you need to write a main method in your program and set the job's properties there (specify the job's map and reduce classes, the input and output paths, and so on). When using an Oozie workflow in Hue, however, you must not write the main method; you only need to write the map class and the reduce class (and a partitioner class if required). This is pit 1.
B. Write the input and output parameters as ${inputdir} and ${outputdir}. Written this way, the submit dialog will prompt you for the input and output paths; you can equally well hard-code the corresponding paths here instead.
3. Submit Workflow
As shown above, once you have filled in the input and output paths, click Submit to run the job.
Property parameter descriptions:
mapreduce.input.fileinputformat.inputdir "${inputdir}": input directory parameter
mapreduce.output.fileoutputformat.outputdir "${outputdir}": output directory parameter
mapreduce.job.map.class "com.mr.simple.WordCount$TokenizerMapper": specifies the map class (WordCount is the name of the outer class, $TokenizerMapper refers to the nested map class)
mapreduce.job.reduce.class "com.mr.simple.WordCount$IntSumReducer": specifies the reduce class
mapreduce.job.output.key.class "org.apache.hadoop.io.Text": key output class for map and reduce
mapreduce.job.output.value.class "org.apache.hadoop.io.IntWritable": value output class for map and reduce
(if map and reduce do not have the same output types, you need to add further parameters to set them separately)
mapred.mapper.new-api "true" and mapred.reducer.new-api "true": use the new MapReduce API
mapreduce.job.reduces "1" (or another number): sets the number of reduce tasks (once the number of reducers is specified, the output no longer contains lots of empty files; this is pit 3)
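These properties are what ends up in the configuration block of the generated workflow.xml. As a rough sketch (a minimal Oozie map-reduce action; the action name and the ${jobTracker}/${nameNode} variables are the usual Oozie placeholders filled in by Hue, not something you type into the dialog), it looks something like this:

    <action name="wordcount">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property><name>mapred.mapper.new-api</name><value>true</value></property>
                <property><name>mapred.reducer.new-api</name><value>true</value></property>
                <property><name>mapreduce.job.map.class</name><value>com.mr.simple.WordCount$TokenizerMapper</value></property>
                <property><name>mapreduce.job.reduce.class</name><value>com.mr.simple.WordCount$IntSumReducer</value></property>
                <property><name>mapreduce.job.output.key.class</name><value>org.apache.hadoop.io.Text</value></property>
                <property><name>mapreduce.job.output.value.class</name><value>org.apache.hadoop.io.IntWritable</value></property>
                <property><name>mapreduce.input.fileinputformat.inputdir</name><value>${inputdir}</value></property>
                <property><name>mapreduce.output.fileoutputformat.outputdir</name><value>${outputdir}</value></property>
                <property><name>mapreduce.job.reduces</name><value>1</value></property>
            </configuration>
        </map-reduce>
        <ok to="End"/>
        <error to="Kill"/>
    </action>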
4. Writing the WordCount MR program
Why mention this here? Because the variously rewritten versions of the MR program found in online blogs throw all kinds of errors when executed this way. I recommend using the standard form from the official website (and studying the examples shipped with the Hadoop source). The program is as follows:
package com.mr.simple;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // NOTE: if you execute the job through an Oozie workflow, do not write the main method ...
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
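For comparison, when the same jar is run from the Linux command line (the case that worked fine in pit 1), it is the main method above that drives the job. A typical invocation, with hypothetical input and output paths, would be something like:

    hadoop jar wordcount.jar com.mr.simple.WordCount /user/xudong/input /user/xudong/output

Under the Oozie workflow, that driver code is never executed; the property list of the MapReduce action takes over the role of the main method, which is why the map and reduce classes alone are enough there.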