MapReduce program deployment

Source: Internet
Author: User

Although we can quickly run the bundled example programs through shell commands on the virtual machine client, in real applications we still need to write our own code, package it, and deploy it to the server. Below, I will walk through the deployment process of one such program.

After Hadoop is started, package the program into a jar file, including any third-party jars it depends on, and run it with: hadoop jar XXX.jar followed by the fully qualified driver class name.
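For the WordCount program below, whose driver class is com.mapred.WordCount, the submit command would look like this (the jar name wordcount.jar is a placeholder chosen here for illustration):

hadoop jar wordcount.jar com.mapred.WordCount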

package com.mapred;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The input/output paths could also be taken from the command line via
    // GenericOptionsParser; here they are hardcoded HDFS directories instead.
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    FileInputFormat.addInputPath(job, new Path("hdfs://ubuntu:9000/Input"));
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    // The output directory must not exist yet, or the job will fail.
    FileOutputFormat.setOutputPath(job, new Path("hdfs://ubuntu:9000/output09"));
    job.waitForCompletion(true);
  }

  // Reducer (also used as the combiner): sums the counts for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Mapper: splits each line into tokens and emits (word, 1) per token.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }
}
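Once the job completes, the word counts can be read back from HDFS. For the paths used above, something like the following should work (part-r-00000 is the standard name of the first reducer's output file):

hadoop fs -cat hdfs://ubuntu:9000/output09/part-r-00000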
Pay attention to the following points while running the job:

The first thing to check is whether the file locations on HDFS are correct. Remember that you only need to specify the input directory name; however many files it contains, Hadoop will process all of them for you. Also watch for exceptions while the job runs.
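For example, to create the input directory used by the driver above and upload a couple of local text files into it (the local file names here are placeholders):

hadoop fs -mkdir hdfs://ubuntu:9000/Input
hadoop fs -put file1.txt file2.txt hdfs://ubuntu:9000/Input
hadoop fs -ls hdfs://ubuntu:9000/Input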

Many exceptions come up while running and debugging, covering a wide range of situations. I hope interested colleagues will get in touch so we can analyze and study them together.

1: Watch the errors reported in the virtual machine terminal and fix the code accordingly. Because there are many associated jars, pay particular attention to adding the right dependencies when it complains that a package or class is missing; one way to handle this is sketched below.
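A sketch of two common fixes, not the only options: Hadoop adds any jars placed in a lib/ directory inside the job jar to the task classpath, and for the client side you can extend the HADOOP_CLASSPATH environment variable before submitting (both paths and the jar name below are placeholders):

export HADOOP_CLASSPATH=/path/to/dependency.jar:$HADOOP_CLASSPATH
hadoop jar wordcount.jar com.mapred.WordCount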

2: I deploy and run everything inside a virtual machine. I have read a lot online saying that data can be processed directly from Eclipse, but I have not managed to debug it that way; let me know if you have succeeded. My feeling is that the version I am using is not properly matched to the Hadoop on the virtual machine.
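Those who want to try submitting straight from Eclipse could point the client configuration at the remote cluster. Below is a minimal sketch assuming the Hadoop 0.20/1.x-era configuration keys (fs.default.name, mapred.job.tracker) that match the old-style API used above; the RemoteSubmit class name, the output10 path, and the JobTracker address ubuntu:9001 are assumptions for illustration and must match your cluster's mapred-site.xml:

package com.mapred;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteSubmit {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Point the client at the remote cluster instead of the local defaults.
    // These are the 0.20/1.x key names; the JobTracker address is assumed.
    conf.set("fs.default.name", "hdfs://ubuntu:9000");
    conf.set("mapred.job.tracker", "ubuntu:9001");

    Job job = new Job(conf, "word count (remote submit)");
    // The classes must still be packaged as a jar so the cluster can ship
    // them to the task nodes; build WordCount into a jar on the classpath.
    job.setJarByClass(WordCount.class);
    job.setMapperClass(WordCount.TokenizerMapper.class);
    job.setCombinerClass(WordCount.IntSumReducer.class);
    job.setReducerClass(WordCount.IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path("hdfs://ubuntu:9000/Input"));
    // Output directory must not exist yet.
    FileOutputFormat.setOutputPath(job, new Path("hdfs://ubuntu:9000/output10"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}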

3: The jar can also be executed with a plain Java command (java -jar XXX.jar); in that case you do not need to install or deploy the Hadoop environment. However, my JVM always complained about insufficient memory during execution, so in the end I only succeeded under the Hadoop environment. Give it a try and share your results; processing data this way is quite interesting.
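If you try the plain-Java route and the JVM runs out of memory, the first thing to try is raising the heap limit with the standard -Xmx flag (the 1024m value is just an example):

java -Xmx1024m -jar XXX.jar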
