Before moving on to writing Python programs with the Hadoop streaming environment, this post summarizes how to configure the Eclipse Java environment for Hadoop and run a WordCount example.
One: Download the Eclipse installation package and the Hadoop plugin
1. Go to the official website and download the Linux version of the Eclipse installation package (for convenience I have also uploaded it to the CSDN downloads; URL:
2. Download the plugin: hadoop-eclipse-plugin-2.6.0.jar
Two: Install Eclipse and the Hadoop plugin
1. Extract Eclipse to the path /usr/local/eclipse
2. Copy the plugin hadoop-eclipse-plugin-2.6.0.jar into Eclipse's plugins directory: /usr/local/eclipse/plugins/hadoop-eclipse-plugin-2.6.0.jar
3. Start Eclipse:
/usr/local/eclipse/eclipse -clean
Three: Configure Eclipse's Hadoop environment
1. Select Preferences under the Window menu and configure the Hadoop installation path: /usr/local/hadoop
2. Switch to the Map/Reduce development view: under the Window menu, select Open Perspective -> Other -> Map/Reduce.
3. Establish a connection to the Hadoop cluster: click the Map/Reduce Locations panel in the lower-right corner of Eclipse, right-click inside the panel, and select New Hadoop Location.
4. Check the result. One benefit here is a visual view of the file system; otherwise you can only inspect it by typing commands. I still find the command line better, so use the two together. The visual file system looks as follows:
Four: Run the WordCount example
1. Create a project: click the File menu, select New -> Project, choose Map/Reduce Project, click Next, enter WordCount as the project name, and click Finish.
2. Create a class: right-click the WordCount project you just created, choose New -> Class, and fill in two fields: Package as org.apache.hadoop.examples and Name as WordCount.
3. Fill in the code:
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
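Before running this on a cluster, the map/reduce logic above can be sketched as a minimal local simulation in Python (a plain dictionary stands in for Hadoop's shuffle-and-sort; the function names here are illustrative, not Hadoop APIs):

```python
from collections import defaultdict

def tokenizer_map(line):
    # Mirrors TokenizerMapper: emit a (word, 1) pair for each whitespace token
    for word in line.split():
        yield word, 1

def int_sum_reduce(pairs):
    # Mirrors IntSumReducer: sum the 1s emitted for each distinct word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["hello hadoop", "hello world"]
pairs = [kv for line in lines for kv in tokenizer_map(line)]
result = int_sum_reduce(pairs)
print(result)  # {'hello': 2, 'hadoop': 1, 'world': 1}
```

The same two-function shape is what Hadoop streaming expects later, which is why the mapper emits pairs instead of counting directly.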
4. Before running, enter the following commands in the terminal. Copying the configuration files makes the program use the Hadoop (HDFS) file system instead of the default local file system, and copying log4j.properties suppresses the warning output:
cp /usr/local/hadoop/etc/hadoop/core-site.xml ~/workspace/WordCount/src
cp /usr/local/hadoop/etc/hadoop/hdfs-site.xml ~/workspace/WordCount/src
cp /usr/local/hadoop/etc/hadoop/log4j.properties ~/workspace/WordCount/src
5. Set the run arguments: the input and output paths. Note that these are paths in the actual (HDFS) file system, here /user/hadoop/input and /user/hadoop/output.
6. View the results in the output directory of the file system.
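The reducer writes its results as tab-separated word-count lines in part files (typically named part-r-00000) under the output directory. For those who prefer the command line over the visual browser, such a file is easy to parse once fetched locally; a small sketch (the sample content below is illustrative, not actual job output):

```python
# Content in the format WordCount writes: one "word\tcount" line per key
sample = "hadoop\t1\nhello\t2\nworld\t1\n"

counts = {}
for line in sample.splitlines():
    word, count = line.split("\t")
    counts[word] = int(count)

print(counts)  # {'hadoop': 1, 'hello': 2, 'world': 1}
```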
Reference: http://www.powerxing.com/hadoop-build-project-using-eclipse/ (the screenshots are from this blog; reproducing them here would be too much trouble)
Hadoop: MapReduce programming - WordCount word count - Eclipse Java environment