MapReduce program deployment

Source: Internet
Author: User

Although we can quickly run the bundled example programs through shell commands on the virtual machine client, in real applications we still need to write our own code, package it, and deploy it to the server. Below, I will walk through the deployment process of one such program.

After Hadoop is started, package the program into a jar file, including any third-party jars it depends on, and run it with: hadoop jar XXX.jar followed by the fully qualified driver class name.
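For the WordCount program below, whose driver class is com.mapred.WordCount, the submit command would look like this (the jar name wordcount.jar is a placeholder chosen here for illustration):

hadoop jar wordcount.jar com.mapred.WordCount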

package com.mapred;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The input/output paths could also be taken from the command line via
    // GenericOptionsParser; here they are hardcoded HDFS directories instead.
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    FileInputFormat.addInputPath(job, new Path("hdfs://ubuntu:9000/Input"));
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    // The output directory must not exist yet, or the job will fail.
    FileOutputFormat.setOutputPath(job, new Path("hdfs://ubuntu:9000/output09"));
    job.waitForCompletion(true);
  }

  // Reducer (also used as the combiner): sums the counts for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Mapper: splits each line into tokens and emits (word, 1) per token.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }
}
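Once the job completes, the word counts can be read back from HDFS. For the paths used above, something like the following should work (part-r-00000 is the standard name of the first reducer's output file):

hadoop fs -cat hdfs://ubuntu:9000/output09/part-r-00000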
Pay attention to the following points while running the job:

The first thing to check is whether the file locations on HDFS are correct. Remember that you only need to specify the input directory name; however many files it contains, Hadoop will process all of them for you. Also watch for exceptions while the job runs.
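For example, to create the input directory used by the driver above and upload a couple of local text files into it (the local file names here are placeholders):

hadoop fs -mkdir hdfs://ubuntu:9000/Input
hadoop fs -put file1.txt file2.txt hdfs://ubuntu:9000/Input
hadoop fs -ls hdfs://ubuntu:9000/Input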

Many exceptions come up while running and debugging, covering a wide range of situations. I hope interested colleagues will get in touch so we can analyze and study them together.

1: Watch the errors reported in the virtual machine terminal and fix the code accordingly. Because there are many associated jars, pay particular attention to adding the right dependencies when it complains that a package or class is missing; one way to handle this is sketched below.
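A sketch of two common fixes, not the only options: Hadoop adds any jars placed in a lib/ directory inside the job jar to the task classpath, and for the client side you can extend the HADOOP_CLASSPATH environment variable before submitting (both paths and the jar name below are placeholders):

export HADOOP_CLASSPATH=/path/to/dependency.jar:$HADOOP_CLASSPATH
hadoop jar wordcount.jar com.mapred.WordCount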

2: I deploy and run everything inside a virtual machine. I have read a lot online saying that data can be processed directly from Eclipse, but I have not managed to debug it that way; let me know if you have succeeded. My feeling is that the version I am using is not properly matched to the Hadoop on the virtual machine.
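Those who want to try submitting straight from Eclipse could point the client configuration at the remote cluster. Below is a minimal sketch assuming the Hadoop 0.20/1.x-era configuration keys (fs.default.name, mapred.job.tracker) that match the old-style API used above; the RemoteSubmit class name, the output10 path, and the JobTracker address ubuntu:9001 are assumptions for illustration and must match your cluster's mapred-site.xml:

package com.mapred;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteSubmit {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Point the client at the remote cluster instead of the local defaults.
    // These are the 0.20/1.x key names; the JobTracker address is assumed.
    conf.set("fs.default.name", "hdfs://ubuntu:9000");
    conf.set("mapred.job.tracker", "ubuntu:9001");

    Job job = new Job(conf, "word count (remote submit)");
    // The classes must still be packaged as a jar so the cluster can ship
    // them to the task nodes; build WordCount into a jar on the classpath.
    job.setJarByClass(WordCount.class);
    job.setMapperClass(WordCount.TokenizerMapper.class);
    job.setCombinerClass(WordCount.IntSumReducer.class);
    job.setReducerClass(WordCount.IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path("hdfs://ubuntu:9000/Input"));
    // Output directory must not exist yet.
    FileOutputFormat.setOutputPath(job, new Path("hdfs://ubuntu:9000/output10"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}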

3: The jar can also be executed with a plain Java command (java -jar XXX.jar); in that case you do not need to install or deploy the Hadoop environment. However, my JVM always complained about insufficient memory during execution, so in the end I only succeeded under the Hadoop environment. Give it a try and share your results; processing data this way is quite interesting.
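If you try the plain-Java route and the JVM runs out of memory, the first thing to try is raising the heap limit with the standard -Xmx flag (the 1024m value is just an example):

java -Xmx1024m -jar XXX.jar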
