How to run your own MapReduce program on Hadoop


Suppose we have written the following MapReduce program:

package com.besttone.mapred;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class SingleWordCount {

    // Emits (word, 1) for every token that equals the configured keyword.
    public static class SingleWordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            // The word to count is passed to every task through the job configuration.
            String keyword = context.getConfiguration().get("word");
            while (itr.hasMoreTokens()) {
                String nextkey = itr.nextToken();
                if (nextkey.trim().equals(keyword)) {
                    word.set(nextkey);
                    context.write(word, one);
                }
            }
        }
    }

    // Sums the counts for each key; the same class is also used as the combiner.
    public static class SingleWordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Usage: singlewordcount <in> <out> <word>
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options (such as -D) and keep the job's own arguments.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 3) {
            System.err.println("Usage: singlewordcount <in> <out> <word>");
            System.exit(2);
        }
        conf.set("word", otherArgs[2]);

        Job job = new Job(conf, "single word count");
        job.setJarByClass(SingleWordCount.class);
        job.setMapperClass(SingleWordCountMapper.class);
        job.setCombinerClass(SingleWordCountReducer.class);
        job.setReducerClass(SingleWordCountReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

This program counts how many times a specified word (the third command-line argument) appears in the input files.
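To see what the mapper, combiner, and reducer compute without a cluster, the logic can be mimicked in plain Java. The sketch below is a hypothetical, Hadoop-free simulation (SingleWordCountSimulation and its methods are illustrative names, not part of Hadoop): each input line stands in for one map() call, a TreeMap stands in for the shuffle that groups values by key, and reduce() sums each group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of SingleWordCount's map and reduce logic.
final class SingleWordCountSimulation {

    // Mirrors SingleWordCountMapper.map(): emit (token, 1) for each token equal to keyword.
    static List<Map.Entry<String, Integer>> map(String line, String keyword) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            String token = itr.nextToken();
            if (token.trim().equals(keyword)) {
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // Mirrors SingleWordCountReducer.reduce(): sum all values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map on every line, groups the emitted values by key ("shuffle"), then reduces each group.
    static Map<String, Integer> run(List<String> lines, String keyword) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line, keyword)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((k, vs) -> result.put(k, reduce(vs)));
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("hello world", "hello hadoop hello", "bye");
        System.out.println(run(input, "hello")); // prints {hello=3}
    }
}
```

Reusing the reducer as the combiner, as the job does with setCombinerClass, is safe here because summing is associative and commutative, so partial sums computed on the map side lead to the same final total.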
Next, package SingleWordCount into a jar, for example myexample.jar, and copy the jar to the $HADOOP_HOME directory on the remote cluster. To count the occurrences of the word "hello" in the input directory, run the following (the output directory must not already exist):

bin/hadoop jar myexample.jar com.besttone.mapred.SingleWordCount hdfs://master:9000/user/hadoop/input/* hdfs://master:9000/user/hadoop/output hello

Another approach is to write a driver program that registers the job under a short alias:

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.besttone.mapred;

import org.apache.hadoop.util.ProgramDriver;

/**
 * Registers the MapReduce programs in this jar under short aliases,
 * so that they can be launched by name, e.g. "singlewordcount".
 */
public class MapRedDriver {

    public static void main(String[] argv) {
        int exitCode = -1;
        ProgramDriver pgd = new ProgramDriver();
        try {
            pgd.addClass("singlewordcount", SingleWordCount.class,
                    "A map/reduce program that counts occurrences of a given word in the input files.");
            // Looks up the alias in argv[0] and runs that program with the remaining arguments.
            pgd.driver(argv);
            // Success
            exitCode = 0;
        } catch (Throwable e) {
            e.printStackTrace();
        }
        System.exit(exitCode);
    }
}
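ProgramDriver dispatches on the first command-line argument: roughly, it looks up the alias and calls the registered class's static main() with the remaining arguments. The hypothetical AliasDispatcher below sketches that idea with plain reflection; it is a simplified illustration, not Hadoop's actual ProgramDriver code, and GreetProgram is a made-up stand-in for SingleWordCount.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of alias-based dispatch; not Hadoop's ProgramDriver implementation.
final class AliasDispatcher {
    // Alias -> class whose public static main(String[]) should be run.
    private final Map<String, Class<?>> programs = new LinkedHashMap<>();

    void addClass(String alias, Class<?> mainClass) {
        programs.put(alias, mainClass);
    }

    // Returns 0 if the alias was found and its main() ran, -1 if the alias is missing or unknown.
    int run(String[] argv) throws ReflectiveOperationException {
        if (argv.length == 0 || !programs.containsKey(argv[0])) {
            System.err.println("Valid program names are: " + programs.keySet());
            return -1;
        }
        Method main = programs.get(argv[0]).getMethod("main", String[].class);
        // Everything after the alias is handed to the selected program.
        String[] rest = Arrays.copyOfRange(argv, 1, argv.length);
        main.invoke(null, (Object) rest);
        return 0;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        AliasDispatcher dispatcher = new AliasDispatcher();
        dispatcher.addClass("greet", GreetProgram.class);
        int exitCode = dispatcher.run(new String[] {"greet", "in", "out", "hello"});
        System.out.println("exit code " + exitCode);
    }
}

// Hypothetical stand-in for a real job class such as SingleWordCount.
final class GreetProgram {
    static String[] lastArgs;

    public static void main(String[] args) {
        lastArgs = args;
        System.out.println("GreetProgram received " + Arrays.toString(args));
    }
}
```

Because the alias is consumed by the dispatcher, the job class sees exactly the same arguments it would receive if it were launched directly by its fully qualified name.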

Then repackage the jar so that it also contains MapRedDriver, open the jar (for example, by double-clicking it in an archive tool), and edit the META-INF/MANIFEST.MF file as follows:

Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.1
Created-By: 20.6-b01 (Sun Microsystems Inc.)
Main-Class: com.besttone.mapred.MapRedDriver

Set Main-Class to the fully qualified name of the driver class, and copy the jar to the $HADOOP_HOME directory. You no longer need to give the fully qualified name of the MapReduce class; use the alias defined in the driver instead:

bin/hadoop jar myexample.jar singlewordcount hdfs://master:9000/user/hadoop/input/* hdfs://master:9000/user/hadoop/output hello
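The alias works because of the Main-Class entry: roughly speaking, when the jar's manifest names a main class, bin/hadoop jar passes every argument after the jar name to that class (here MapRedDriver) instead of reading a class name from the command line. The hypothetical ManifestMainClass helper below shows how a launcher can read that entry with the standard java.util.jar.Manifest API; converting slashes to dots is an assumption added so that slash-separated names are also accepted.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper: extract the Main-Class entry from manifest text.
final class ManifestMainClass {

    // Returns the dotted main class name, or null if the manifest has no Main-Class entry.
    static String mainClassOf(String manifestText) throws IOException {
        Manifest mf = new Manifest(
                new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
        String name = mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        // Assumption: tolerate slash-separated names by converting them to dotted form.
        return name == null ? null : name.replace('/', '.');
    }

    public static void main(String[] args) throws IOException {
        // Every manifest line, including the last one, should end with a newline.
        String text = "Manifest-Version: 1.0\n"
                + "Main-Class: com.besttone.mapred.MapRedDriver\n";
        System.out.println(mainClassOf(text)); // prints com.besttone.mapred.MapRedDriver
    }
}
```

java.util.jar compares attribute names case-insensitively, so "Main-class" and "Main-Class" are equivalent, but the class name itself is case-sensitive: it must match com.besttone.mapred.MapRedDriver exactly.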

 

During execution, the job may fail because of the permissions on the MapReduce staging directory.
The error message is as follows:

Job submission failed with exception 'java.io.IOException(The ownership/permissions on the staging directory /tmp/hadoop-hadoop-user1/mapred/staging/hadoop-user1/.staging is not as expected. It is owned by hadoop-user1 and permissions are rwxrwxrwx. The directory must be owned by the submitter hadoop-user1 or by hadoop-user1 and permissions must be rwx------)'

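Here the owner already matches the submitter; only the permissions (rwxrwxrwx instead of rwx------) are wrong. Fix them as follows, adjusting the path so that it covers the staging directory named in your own error message:

bin/hadoop fs -chmod -R 700 /home/hadoop/tmp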
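The permission part of the check boils down to comparing the directory's mode with rwx------, which is what chmod 700 produces. The hypothetical StagingPermissions helper below converts an octal mode to its symbolic form using the standard java.nio.file.attribute.PosixFilePermissions class; it only illustrates the permission comparison, not the ownership check or Hadoop's own code.

```java
import static java.nio.file.attribute.PosixFilePermission.*;

import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper illustrating the "permissions must be rwx------" requirement.
final class StagingPermissions {
    // Permission bits from most significant (owner read, octal 0400) to least (others execute, 01).
    private static final PosixFilePermission[] ORDER = {
        OWNER_READ, OWNER_WRITE, OWNER_EXECUTE,
        GROUP_READ, GROUP_WRITE, GROUP_EXECUTE,
        OTHERS_READ, OTHERS_WRITE, OTHERS_EXECUTE
    };

    // Converts an octal mode such as 0700 into a set of POSIX permissions.
    static Set<PosixFilePermission> fromOctal(int mode) {
        Set<PosixFilePermission> perms = EnumSet.noneOf(PosixFilePermission.class);
        for (int i = 0; i < ORDER.length; i++) {
            if ((mode & (1 << (8 - i))) != 0) {
                perms.add(ORDER[i]);
            }
        }
        return perms;
    }

    // True if only the owner can read, write, and enter the directory.
    static boolean isPrivateToOwner(int mode) {
        return PosixFilePermissions.toString(fromOctal(mode)).equals("rwx------");
    }

    public static void main(String[] args) {
        System.out.println(PosixFilePermissions.toString(fromOctal(0777))); // prints rwxrwxrwx
        System.out.println(PosixFilePermissions.toString(fromOctal(0700))); // prints rwx------
        System.out.println(isPrivateToOwner(0777) + " " + isPrivateToOwner(0700)); // prints false true
    }
}
```

The -R flag applies the same mode to everything under the given path, so the staging directory ends up with rwx------ as long as it lies inside that path.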
