After a development environment is set up on a Mac, the first thing to do is to find a "hello world" program to practice on. The hello world program of the Hadoop world is the WordCount program below.
1. Create a project
Step: File -> New -> Other -> Map/Reduce Project
The project name can be anything you like, for example MapReduceSample. Then create a new WordCount.java class with the following code:
package com.lifeware.test;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    /**
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
2. Data Preparation
To run the program, we need an input folder and an output folder. The output folder is generated automatically after the program runs successfully; we only need to prepare an input folder to pass to the program.
2.1. Prepare local files
Create a folder named input in the current project directory, and create two files, file1 and file2, under it with the following content:
File1: Hello World Bye World
File2: Hello Hadoop Goodbye Hadoop
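The two files can also be created from the command line; a minimal sketch, assuming you run it from the project directory:

```shell
# Create the local input folder and the two sample files.
mkdir -p input
printf 'Hello World Bye World\n' > input/file1
printf 'Hello Hadoop Goodbye Hadoop\n' > input/file2
```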
2.2. Upload the folder input to the distributed file system
In a terminal where the Hadoop daemons have been started, cd to the Hadoop installation directory and run the following command:
bin/hadoop fs -put ../test/input
After the input folder is uploaded to the Hadoop file system, an input folder appears in the system. You can run the following command to view it:
bin/hadoop fs -ls
You can also check it in the DFS Locations view of the Eclipse plug-in:
3. Run the project
3.1. In the newly created project MapReduceSample, right-click WordCount.java and choose Run As -> Run Configurations.
3.2. In the pop-up Run Configurations dialog, select Java Application, right-click -> New, and a new configuration named WordCount will be created.
3.3. Configure the running parameters: click Arguments and, in Program arguments, enter the input folder you want to pass to the program and the folder where you want it to save the result, for example:
hdfs://localhost:9000/user/metaboy/input hdfs://localhost:9000/user/metaboy/output
The input here is the folder you just uploaded; enter the folder addresses appropriate for your setup.
4. Run the program
Click Run to run the program. After a while the job completes. Once it has finished, run the following command on the terminal:
bin/hadoop fs -ls
You can also use the hadoop eclipse plug-in to check whether the folder output is generated.
5. View results
Run the following command to view the generated file content:
bin/hadoop fs -cat output/*
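For the two sample files above, the expected result can be worked out before the job even runs. The sketch below (a plain-Java simulation, not the Hadoop job itself; the class name WordCountCheck is made up for illustration) applies the same tokenize-and-count logic to the two lines in memory:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountCheck {

    // Same tokenize-and-count logic as the job, applied to in-memory lines.
    public static Map<String, Integer> count(String[] lines) {
        // TreeMap keeps keys sorted, mirroring the sorted job output.
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                String word = tokenizer.nextToken();
                Integer c = counts.get(word);
                counts.put(word, c == null ? 1 : c + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Hello World Bye World",      // content of file1
            "Hello Hadoop Goodbye Hadoop" // content of file2
        };
        // Print in the same key<TAB>value form as TextOutputFormat.
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

So the output folder should contain Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2, one tab-separated pair per line.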
After running this program, you have basically taken your first step into the Hadoop family!
Original article: "The first Map/Reduce program". Thanks to the original author for sharing.