When reprinting, please credit the source: http://blog.csdn.net/lastsweetop/article/details/8964520
1. Preface Android Studio caused quite a stir at the Google I/O 2013 developer conference; I had not expected IntelliJ IDEA to become so powerful. I have always been a loyal Eclipse user, but I am now a fan of IntelliJ IDEA: I downloaded and installed it without hesitation, and debugging in it is excellent. Unfortunately there is no Hadoop plug-in, which is a little disappointing. Since I have recently been studying Hadoop, I decided to implement remote debugging myself. All the code for this article is hosted on GitHub.
The project directory layout is as follows (screenshot from the original post omitted):
2. Step 1: There are already plenty of SSH configuration guides online, so I will only describe it briefly. On the development machine, generate an RSA key pair:
ssh-keygen -t rsa
The public key will be written to ~/.ssh/id_rsa.pub.
Copy this file to the namenode with scp:
scp ~/.ssh/id_rsa.pub hadoop@namenode:~/.ssh/
Log on to namenode
ssh hadoop@namenode
Append the development machine's id_rsa.pub to authorized_keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Passwordless SSH login is now set up.
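As an aside, OpenSSH ships ssh-copy-id, which appends the key and fixes permissions in one command (`ssh-copy-id hadoop@namenode`). If you append by hand as above, note that sshd also expects strict permissions on authorized_keys. A local sketch of the append step (the /tmp paths and key text are illustrative only, not part of the original setup):

```shell
# Illustrative only: simulate appending a public key to authorized_keys.
# On a real cluster the key comes from the dev machine's ~/.ssh/id_rsa.pub.
mkdir -p /tmp/demo_ssh
echo "ssh-rsa AAAAB3Nza... dev@machine" > /tmp/demo_ssh/id_rsa.pub
touch /tmp/demo_ssh/authorized_keys
cat /tmp/demo_ssh/id_rsa.pub >> /tmp/demo_ssh/authorized_keys
# sshd may refuse keys in a group/world-writable authorized_keys file.
chmod 600 /tmp/demo_ssh/authorized_keys
```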
3. Step 2: Write the deploy.sh script.
#!/bin/sh
echo "deploy jar"
scp ../target/styhadoop-ch2-1.0.0-SNAPSHOT.jar hadoop@namenode:~/test/
echo "deploy run.sh"
scp run.sh hadoop@namenode:~/test/
echo "change authority"
ssh hadoop@namenode "chmod 755 ~/test/run.sh"
echo "start run.sh"
ssh hadoop@namenode "~/test/run.sh"
The run.sh script:
#!/bin/sh
echo "add jar to classpath"
export HADOOP_CLASSPATH=~/test/styhadoop-ch2-1.0.0-SNAPSHOT.jar
echo "run hadoop task"
~/hadoop/bin/hadoop com.sweetop.styhadoop.MaxTemperature input/ output/
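A slightly more defensive variant of run.sh is sketched below. It is an assumption on my part, not part of the original article: it checks that the jar actually arrived, and removes the old HDFS output directory first, since MapReduce refuses to run if the output directory already exists.

```shell
#!/bin/sh
# Sketch of a more defensive run.sh; paths assumed as in the article.
JAR=~/test/styhadoop-ch2-1.0.0-SNAPSHOT.jar
if [ ! -f "$JAR" ]; then
    echo "jar not found: $JAR" >&2
    exit 1
fi
export HADOOP_CLASSPATH="$JAR"
# MapReduce fails if the output directory already exists, so remove it
# first (ignore the error when it is not there yet).
~/hadoop/bin/hadoop fs -rmr output/ 2>/dev/null
~/hadoop/bin/hadoop com.sweetop.styhadoop.MaxTemperature input/ output/
```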
4. Step 3: Configure pom.xml to run the script with the maven-antrun-plugin, binding the execution to the verify phase so it runs as part of the normal Maven lifecycle.
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-antrun-plugin</artifactId>
            <version>1.7</version>
            <executions>
                <execution>
                    <id>hadoop remote run</id>
                    <phase>verify</phase>
                    <goals>
                        <goal>run</goal>
                    </goals>
                    <configuration>
                        <target name="test">
                            <exec dir="${basedir}/shell" executable="bash">
                                <arg value="deploy.sh"/>
                            </exec>
                        </target>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
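With the antrun execution bound to verify, a plain Maven build on the development machine packages the jar and then kicks off the whole deploy-and-run chain (this invocation is implied by the setup above rather than shown in the original):

```shell
# From the project root on the development machine:
# package builds the jar, then the verify phase runs deploy.sh via antrun,
# which in turn copies everything to the namenode and runs the job.
mvn clean verify
```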
5. HDFS file preparation
[hadoop@namenode test]$ hadoop fs -mkdir /user
[hadoop@namenode test]$ hadoop fs -mkdir /user/hadoop/
[hadoop@namenode test]$ hadoop fs -put input /user/hadoop/
[hadoop@namenode test]$ hadoop fs -lsr /user/hadoop
6. Execution result
test:
 [exec] deploy jar
 [exec] deploy run.sh
 [exec] change authority
 [exec] start run.sh
 [exec] add jar to classpath
 [exec] run hadoop task
 [exec] 13/05/23 11:36:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
 [exec] 13/05/23 11:36:28 INFO input.FileInputFormat: Total input paths to process : 2
 [exec] 13/05/23 11:36:28 INFO util.NativeCodeLoader: Loaded the native-hadoop library
 [exec] 13/05/23 11:36:28 WARN snappy.LoadSnappy: Snappy native library not loaded
 [exec] 13/05/23 11:36:29 INFO mapred.JobClient: Running job: job_201305032210_0003
 [exec] 13/05/23 11:36:30 INFO mapred.JobClient: map 0% reduce 0%
 [exec] 13/05/23 11:36:46 INFO mapred.JobClient: map 100% reduce 0%
 [exec] 13/05/23 11:37:04 INFO mapred.JobClient: map 100% reduce 100%
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Job complete: job_201305032210_0003
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Counters: 29
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Job Counters
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Launched reduce tasks=1
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=19771
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Launched map tasks=2
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Data-local map tasks=2
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=13494
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: File Output Format Counters
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Bytes Written=8
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: FileSystemCounters
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: FILE_BYTES_READ=131296
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: HDFS_BYTES_READ=1777394
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: FILE_BYTES_WRITTEN=327106
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=8
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: File Input Format Counters
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Bytes Read=1777168
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Map-Reduce Framework
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Map output materialized bytes=131302
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Map input records=13130
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Reduce shuffle bytes=65656
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Spilled Records=26258
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Map output bytes=105032
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: CPU time spent (ms)=6030
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Total committed heap usage (bytes)=379518976
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Combine input records=0
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: SPLIT_RAW_BYTES=226
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Reduce input records=13129
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Reduce input groups=1
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Combine output records=0
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Physical memory (bytes) snapshot=469196800
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Reduce output records=1
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1723944960
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Map output records=13129
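Once the job completes, the result can be read back from HDFS. This step is not shown in the original, and the exact part-file name depends on the Hadoop API in use, so the wildcard is the safe form:

```shell
# Inspect the job output on HDFS; the job was run with relative paths,
# so output lands under the hadoop user's home directory.
# part-r-00000 for the new MapReduce API, part-00000 for the old one.
hadoop fs -cat /user/hadoop/output/part-*
```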