First, Introduction
After writing a MapReduce job, the usual workflow was to package it, upload it to the Hadoop cluster, start it with a shell command, and then read the log files on each node. To improve development efficiency, it is much more convenient to submit a MapReduce job to the Hadoop cluster directly from Eclipse. This section walks through the whole process, from setting up Eclipse out of its archive to submitting a MapReduce job to the cluster.
Second, Detailed steps
1. Install Eclipse and the Hadoop plugin
(1) First download the Eclipse archive, then download the Eclipse plugin for Hadoop 2.7.1 and the other files required for the environment setup. Unzip Eclipse and place it on the D drive.
(2) Copy the hadoop-eclipse-plugin.jar from the downloaded resources into the Eclipse plugin directory D:\eclipse\plugins\, then start Eclipse.
(3) Unzip Hadoop 2.7.1 onto the D drive, configure the corresponding HADOOP_HOME environment variable, and add %HADOOP_HOME%\bin to the PATH environment variable.
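As a quick sanity check, a few lines of Java can confirm that these variables are visible to programs started from Eclipse. This is only a minimal sketch; D:\hadoop-2.7.1 is an assumed install path, adjust it to wherever you actually unzipped Hadoop:

public class EnvCheck {
    public static void main(String[] args) {
        // Expected to print something like D:\hadoop-2.7.1 if HADOOP_HOME was set (assumed path).
        System.out.println("HADOOP_HOME = " + System.getenv("HADOOP_HOME"));
        // PATH should contain %HADOOP_HOME%\bin so the Windows native Hadoop binaries can be found.
        System.out.println("PATH = " + System.getenv("PATH"));
    }
}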
(4) Then configure the Hadoop plugin in Eclipse:
A. Window ----> Show View ----> Other, and select the Map/Reduce tools
B. Window ----> Perspective ----> Open Perspective ----> Other
C. Window ----> Preferences ----> Hadoop Map/Reduce, then point it to the Hadoop directory you just unzipped
D. Configure the HDFS connection: create a new Map/Reduce location in the Map/Reduce view
When this is done, DFS Locations appears in the Package Explorer and can be refreshed to browse the files on HDFS:
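If nothing shows up under DFS Locations, a quick way to test connectivity outside the plugin is a small listing program. This is only a sketch; hdfs://192.98.12.234:9000 is the NameNode address used by the WordCount job later in this section, so substitute your own fs.defaultFS:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCheck {
    public static void main(String[] args) throws Exception {
        // Connect to the NameNode and list the root directory of HDFS.
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.98.12.234:9000"), new Configuration());
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}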
2. Developing the MapReduce program
(1) Unzip hadoopbin.zip from the hadoop-eclipse folder; you will get the following files. Put them into the %HADOOP_HOME%\bin directory, and then copy hadoop.dll into the C:\Windows\System32 folder.
(2) Download these five files from the cluster: log4j.properties, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml. Then write a WordCount example and put the five files into the src folder:
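Placing the five files under src puts them on the classpath, so the Configuration and Job objects pick them up automatically and the job is sent to the real cluster instead of the local runner. A small check like the following (a sketch; the property names are standard Hadoop keys) shows where a value was loaded from:

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;

public class ConfCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Should print the cluster address from core-site.xml, not the local default file:/// value.
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
        // Shows which resource the value came from, e.g. core-site.xml on the classpath.
        System.out.println("source = " + Arrays.toString(conf.getPropertySources("fs.defaultFS")));
    }
}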
(3) Modify the mapred-site.xml and yarn-site.xml files
A. Add a few key/value properties to mapred-site.xml:
<property>
  <name>mapred.remote.os</name>
  <value>linux</value>
</property>
<property>
  <name>mapreduce.app-submission.cross-platform</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.application.classpath</name>
  <value>/home/hadoop/hadoop/hadoop-2.7.1/etc/hadoop,/home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/common/*,/home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/common/lib/*,/home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/hdfs/*,/home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/hdfs/lib/*,/home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/*,/home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/lib/*,/home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/yarn/*,/home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/yarn/lib/*</value>
</property>
B. Add a parameter to the yarn-site.xml file:
<property>
  <name>yarn.application.classpath</name>
  <value>/home/hadoop/hadoop/hadoop-2.7.1/etc/hadoop,/home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/common/*,/home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/common/lib/*,/home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/hdfs/*,/home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/hdfs/lib/*,/home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/*,/home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/lib/*,/home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/yarn/*,/home/hadoop/hadoop/hadoop-2.7.1/share/hadoop/yarn/lib/*</value>
</property>
It is worth explaining why this is needed: in versions before Hadoop 2.6, the classpath in the source code is written with the Linux-style $ variable syntax, and because the environment-variable syntax of Linux and Windows differs, submitting from Windows causes an error.
Before Hadoop 2.6 the fix was to modify the source code, rebuild the jar, and replace the old jar file; for the detailed process, see the following blog:
http://www.aboutyun.com/thread-8498-1-1.html
Here, instead, we set the two parameters mapreduce.application.classpath and yarn.application.classpath to absolute paths, so that the error above does not occur.
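The same settings can also be applied in the driver code instead of editing the XML files. The following is only a minimal sketch, assuming the cluster's Hadoop installation lives in /home/hadoop/hadoop/hadoop-2.7.1 as above:

import org.apache.hadoop.conf.Configuration;

public class CrossPlatformConf {
    // Builds a Configuration equivalent to the XML snippets above; intended to be
    // passed to Job.getInstance(conf) in the driver. The paths are the assumed
    // cluster-side installation directory used throughout this article.
    public static Configuration create() {
        Configuration conf = new Configuration();
        conf.set("mapreduce.app-submission.cross-platform", "true");
        String home = "/home/hadoop/hadoop/hadoop-2.7.1";
        String classpath = home + "/etc/hadoop,"
                + home + "/share/hadoop/common/*," + home + "/share/hadoop/common/lib/*,"
                + home + "/share/hadoop/hdfs/*," + home + "/share/hadoop/hdfs/lib/*,"
                + home + "/share/hadoop/mapreduce/*," + home + "/share/hadoop/mapreduce/lib/*,"
                + home + "/share/hadoop/yarn/*," + home + "/share/hadoop/yarn/lib/*";
        conf.set("mapreduce.application.classpath", classpath);
        conf.set("yarn.application.classpath", classpath);
        return conf;
    }
}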
(4) The WordCount program:
package wc;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WcMapReduce {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJobName("word count");
        job.setJarByClass(WcMapReduce.class);
        // The exported jar that is shipped to the cluster (see the note below)
        job.setJar("E:\\ecplise\\wc.jar");
        // Configure the map and reduce classes of the task
        job.setMapperClass(WcMap.class);
        job.setReducerClass(WcReduce.class);
        // Output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // File formats
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // Set the input and output paths
        FileInputFormat.addInputPath(job, new Path("hdfs://192.98.12.234:9000/test/"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.98.12.234:9000/result"));
        // Start the task
        job.waitForCompletion(true);
    }

    public static class WcMap extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static Text outKey = new Text();
        private static IntWritable outValue = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the line into words on whitespace and emit (word, 1)
            String words = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(words);
            while (tokenizer.hasMoreTokens()) {
                String word = tokenizer.nextToken();
                outKey.set(word);
                context.write(outKey, outValue);
            }
        }
    }

    public static class WcReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        private static IntWritable outValue = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the counts for each word
            int sum = 0;
            for (IntWritable i : values) {
                sum += i.get();
            }
            outValue.set(sum);
            context.write(key, outValue);
        }
    }
}
It is important to note that, because the job is submitted remotely, its jar package has to be shipped to the cluster at submission time. Eclipse does not do this automatically, so you first need to export the jar to a suitable location and then tell the program where it is with the following line of code:
Job.setjar ("E:\\ecplise\\wc.jar");
(5) Configure the user environment variable for submitting the task:
If the user name on Windows is different from the name of the Linux user who started the cluster, you need to add an environment variable so that the job can be submitted:
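A minimal sketch, assuming the variable in question is HADOOP_USER_NAME (the name Hadoop checks to decide which user a remote client acts as) and that the cluster was started by a Linux user called hadoop. It can be added in the Run Configuration's environment tab, or set at the very top of the driver:

// Must run before any FileSystem or Job object is created, otherwise the Windows
// user name is picked up instead. "hadoop" is the assumed Linux cluster user.
System.setProperty("HADOOP_USER_NAME", "hadoop");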
(6) Running results
16/03/30 21:09:14 INFO client.RMProxy: Connecting to ResourceManager at hadoop1/192.98.12.234:8032
16/03/30 21:09:14 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/03/30 21:09:14 INFO input.FileInputFormat: Total input paths to process : 1
16/03/30 21:09:14 INFO mapreduce.JobSubmitter: number of splits:1
16/03/30 21:09:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1459331173846_0031
16/03/30 21:09:15 INFO impl.YarnClientImpl: Submitted application application_1459331173846_0031
16/03/30 21:09:15 INFO mapreduce.Job: The url to track the job: http://hadoop1:8088/proxy/application_1459331173846_0031/
16/03/30 21:09:15 INFO mapreduce.Job: Running job: job_1459331173846_0031
16/03/30 21:09:19 INFO mapreduce.Job: Job job_1459331173846_0031 running in uber mode : false
16/03/30 21:09:19 INFO mapreduce.Job:  map 0% reduce 0%
16/03/30 21:09:24 INFO mapreduce.Job:  map 100% reduce 0%
16/03/30 21:09:28 INFO mapreduce.Job:  map 100% reduce 100%
16/03/30 21:09:29 INFO mapreduce.Job: Job job_1459331173846_0031 completed successfully
16/03/30 21:09:29 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=19942
        FILE: Number of bytes written=274843
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=15533
        HDFS: Number of bytes written=15671
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=9860
        Total time spent by all reduces in occupied slots (ms)=2053
        Total time spent by all map tasks (ms)=2465
        Total time spent by all reduce tasks (ms)=2053
        Total vcore-seconds taken by all map tasks=2465
        Total vcore-seconds taken by all reduce tasks=2053
        Total megabyte-seconds taken by all map tasks=10096640
        Total megabyte-seconds taken by all reduce tasks=2102272
    Map-Reduce Framework
        Map input records=289
        Map output records=766
        Map output bytes=18404
        Map output materialized bytes=19942
        Input split bytes=104
        Combine input records=0
        Combine output records=0
        Reduce input groups=645
        Reduce shuffle bytes=19942
        Reduce input records=766
        Reduce output records=645
        Spilled Records=1532
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=33
        CPU time spent (ms)=1070
        Physical memory (bytes) snapshot=457682944
        Virtual memory (bytes) snapshot=8013651968
        Total committed heap usage (bytes)=368050176
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=15429
    File Output Format Counters
        Bytes Written=15671
Note that if the five configuration files from the cluster were not placed under src, the job would be started locally, and a locally executed job has "local" in its job name. The job name in the output above does not contain "local", so the task was successfully submitted to the Linux cluster.
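Once the job has finished, the word counts can be read back from HDFS directly in Eclipse. Below is a minimal sketch for doing so; the /result path comes from the WordCount job above, and part-r-00000 is assumed to be the name of the single reducer's output file:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ResultReader {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.98.12.234:9000"), new Configuration());
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/result/part-r-00000"))))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // each line is "word<TAB>count"
            }
        }
        fs.close();
    }
}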