The first Map/Reduce Program

Source: Internet
Author: User
Tags: hadoop fs

After setting up a development environment on a Mac, the first thing to do is find a hello-world program to practice on. The hello world of the Hadoop world is the WordCount program below.

1. Create a project

Step: File -> New -> Other -> Map/Reduce Project

The project name can be anything you like, such as MapReduceSample. Create a new WordCount.java class with the following code:

package com.lifeware.test;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    /**
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
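To see what the Map class emits without running Hadoop at all, here is a small standalone sketch (plain Java, no Hadoop dependencies; the class and method names are my own, not part of the tutorial). It applies the same StringTokenizer logic to one input line and collects the (word, 1) pairs the mapper would produce:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Standalone sketch: mimics what WordCount.Map emits for a single record.
public class MapEmitSketch {
    // Returns the (word, 1) pairs the mapper would collect for this line,
    // written in the key<TAB>value form TextOutputFormat uses.
    static List<String> emit(String line) {
        List<String> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(tokenizer.nextToken() + "\t1");
        }
        return pairs;
    }

    public static void main(String[] args) {
        // The first sample line used in the data-preparation step
        System.out.println(emit("Hello World Bye World"));
    }
}
```

The shuffle phase then groups these pairs by word, so the Reduce class receives each word together with an iterator over its 1s and simply sums them.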

2. Data Preparation

To run the program, we need an input folder and an output folder. The output folder is generated automatically after the program runs successfully; we only need to provide the input folder to the program.

2.1. Prepare local files

Create a folder named input in the current project directory, and create two files, file1 and file2, under that folder with the following contents:

file1: Hello World Bye World
file2: Hello Hadoop Goodbye Hadoop
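The two files can be created by hand, or with a short snippet like the following (a sketch using java.nio; the class and method names are my own, but the folder and file names match the tutorial):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: create the local input folder and the two sample files.
public class PrepareInput {
    static void prepare(Path dir) throws IOException {
        Files.createDirectories(dir);
        Files.write(dir.resolve("file1"), "Hello World Bye World\n".getBytes());
        Files.write(dir.resolve("file2"), "Hello Hadoop Goodbye Hadoop\n".getBytes());
    }

    public static void main(String[] args) throws IOException {
        prepare(Path.of("input"));
    }
}
```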

2.2. Upload the folder input to the distributed file system

In a terminal where the Hadoop daemons have been started, cd to the Hadoop installation directory and run the following command:

bin/hadoop fs -put ../test/input input

After the upload, an input folder appears in the Hadoop file system. You can view it with the following command:

bin/hadoop fs -ls

You can also check it in the DFS Locations view provided by the Eclipse plug-in:

3. Run the project

3.1. In the newly created project MapReduceSample, right-click WordCount.java and choose Run As > Run Configurations.

3.2. In the Run Configurations dialog box that pops up, select Java Application, right-click -> New, and a new configuration named WordCount is created.

3.3. Configure the run parameters: click Arguments and, under Program arguments, enter the input folder you want to pass to the program and the folder where the program should save the results, for example:

hdfs://localhost:9000/user/metaboy/input hdfs://localhost:9000/user/metaboy/output

The input here is the folder you just uploaded; enter the folder addresses as appropriate for your setup.

4. Run the program

Click Run to run the program. After a while the job completes. Once it has finished, run the following command in the terminal:

bin/hadoop fs -ls

You can also use the Hadoop Eclipse plug-in to check whether the output folder has been generated.

5. View results

Run the following command to view the generated file content:

bin/hadoop fs -cat output/*
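For the two sample files above, the expected output can be worked out by hand. The following standalone sketch (plain Java, no Hadoop needed; the names are my own) reproduces the count and prints it in the sorted key<TAB>value form the job produces:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Sketch: compute the word counts the WordCount job should produce.
public class ExpectedOutput {
    static Map<String, Integer> count(String... lines) {
        // TreeMap keeps keys sorted, matching the sorted reducer output
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        count("Hello World Bye World", "Hello Hadoop Goodbye Hadoop")
            .forEach((w, c) -> System.out.println(w + "\t" + c));
        // Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2
    }
}
```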

Once you have run this program, you have taken your first step into the Hadoop family!

Original article: The First Map/Reduce Program. Thanks to the original author for sharing.
