When reprinting, please credit the source: http://blog.csdn.net/lastsweetop/article/details/8964520
1. Preface Android Studio caused quite a stir at the Google I/O 2013 developer conference; I had not expected IntelliJ IDEA to become so powerful. I have always been a loyal Eclipse user, but I am now a fan of IntelliJ IDEA: I downloaded and installed it without hesitation, and debugging in it is excellent. Unfortunately there is no Hadoop plug-in, which is a little disappointing. Since I have recently been studying Hadoop, I decided to implement remote debugging myself. All the code for this article is hosted on GitHub.
The project directory layout is as follows (screenshot from the original post omitted):
2. Step 1: There are already plenty of SSH configuration guides online, so I will only describe it briefly. On the development machine, generate an RSA key pair:
ssh-keygen -t rsa
The public key will be written to ~/.ssh/id_rsa.pub.
Copy this file to the namenode with scp:
scp ~/.ssh/id_rsa.pub hadoop@namenode:~/.ssh/
Log on to namenode
ssh hadoop@namenode
Append the development machine's id_rsa.pub to authorized_keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Passwordless SSH login is now set up.
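As an aside, OpenSSH ships ssh-copy-id, which appends the key and fixes permissions in one command (`ssh-copy-id hadoop@namenode`). If you append by hand as above, note that sshd also expects strict permissions on authorized_keys. A local sketch of the append step (the /tmp paths and key text are illustrative only, not part of the original setup):

```shell
# Illustrative only: simulate appending a public key to authorized_keys.
# On a real cluster the key comes from the dev machine's ~/.ssh/id_rsa.pub.
mkdir -p /tmp/demo_ssh
echo "ssh-rsa AAAAB3Nza... dev@machine" > /tmp/demo_ssh/id_rsa.pub
touch /tmp/demo_ssh/authorized_keys
cat /tmp/demo_ssh/id_rsa.pub >> /tmp/demo_ssh/authorized_keys
# sshd may refuse keys in a group/world-writable authorized_keys file.
chmod 600 /tmp/demo_ssh/authorized_keys
```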
3. Step 2: Write the deploy.sh script.
#!/bin/sh
echo "deploy jar"
scp ../target/styhadoop-ch2-1.0.0-SNAPSHOT.jar hadoop@namenode:~/test/
echo "deploy run.sh"
scp run.sh hadoop@namenode:~/test/
echo "change authority"
ssh hadoop@namenode "chmod 755 ~/test/run.sh"
echo "start run.sh"
ssh hadoop@namenode "~/test/run.sh"
The run.sh script:
#!/bin/sh
echo "add jar to classpath"
export HADOOP_CLASSPATH=~/test/styhadoop-ch2-1.0.0-SNAPSHOT.jar
echo "run hadoop task"
~/hadoop/bin/hadoop com.sweetop.styhadoop.MaxTemperature input/ output/
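A slightly more defensive variant of run.sh is sketched below. It is an assumption on my part, not part of the original article: it checks that the jar actually arrived, and removes the old HDFS output directory first, since MapReduce refuses to run if the output directory already exists.

```shell
#!/bin/sh
# Sketch of a more defensive run.sh; paths assumed as in the article.
JAR=~/test/styhadoop-ch2-1.0.0-SNAPSHOT.jar
if [ ! -f "$JAR" ]; then
    echo "jar not found: $JAR" >&2
    exit 1
fi
export HADOOP_CLASSPATH="$JAR"
# MapReduce fails if the output directory already exists, so remove it
# first (ignore the error when it is not there yet).
~/hadoop/bin/hadoop fs -rmr output/ 2>/dev/null
~/hadoop/bin/hadoop com.sweetop.styhadoop.MaxTemperature input/ output/
```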
4. Step 3: Configure pom.xml to run the script with the maven-antrun-plugin, binding the execution to the verify phase so it runs as part of the normal Maven lifecycle.
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-antrun-plugin</artifactId>
            <version>1.7</version>
            <executions>
                <execution>
                    <id>hadoop remote run</id>
                    <phase>verify</phase>
                    <goals>
                        <goal>run</goal>
                    </goals>
                    <configuration>
                        <target name="test">
                            <exec dir="${basedir}/shell" executable="bash">
                                <arg value="deploy.sh"/>
                            </exec>
                        </target>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
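With the antrun execution bound to verify, a plain Maven build on the development machine packages the jar and then kicks off the whole deploy-and-run chain (this invocation is implied by the setup above rather than shown in the original):

```shell
# From the project root on the development machine:
# package builds the jar, then the verify phase runs deploy.sh via antrun,
# which in turn copies everything to the namenode and runs the job.
mvn clean verify
```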
5. HDFS file preparation
[hadoop@namenode test]$ hadoop fs -mkdir /user
[hadoop@namenode test]$ hadoop fs -mkdir /user/hadoop/
[hadoop@namenode test]$ hadoop fs -put input /user/hadoop/
[hadoop@namenode test]$ hadoop fs -lsr /user/hadoop
6. Execution result
test:
 [exec] deploy jar
 [exec] deploy run.sh
 [exec] change authority
 [exec] start run.sh
 [exec] add jar to classpath
 [exec] run hadoop task
 [exec] 13/05/23 11:36:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
 [exec] 13/05/23 11:36:28 INFO input.FileInputFormat: Total input paths to process : 2
 [exec] 13/05/23 11:36:28 INFO util.NativeCodeLoader: Loaded the native-hadoop library
 [exec] 13/05/23 11:36:28 WARN snappy.LoadSnappy: Snappy native library not loaded
 [exec] 13/05/23 11:36:29 INFO mapred.JobClient: Running job: job_201305032210_0003
 [exec] 13/05/23 11:36:30 INFO mapred.JobClient: map 0% reduce 0%
 [exec] 13/05/23 11:36:46 INFO mapred.JobClient: map 100% reduce 0%
 [exec] 13/05/23 11:37:04 INFO mapred.JobClient: map 100% reduce 100%
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Job complete: job_201305032210_0003
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Counters: 29
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Job Counters
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Launched reduce tasks=1
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=19771
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Launched map tasks=2
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Data-local map tasks=2
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=13494
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: File Output Format Counters
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Bytes Written=8
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: FileSystemCounters
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: FILE_BYTES_READ=131296
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: HDFS_BYTES_READ=1777394
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: FILE_BYTES_WRITTEN=327106
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=8
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: File Input Format Counters
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Bytes Read=1777168
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Map-Reduce Framework
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Map output materialized bytes=131302
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Map input records=13130
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Reduce shuffle bytes=65656
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Spilled Records=26258
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Map output bytes=105032
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: CPU time spent (ms)=6030
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Total committed heap usage (bytes)=379518976
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Combine input records=0
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: SPLIT_RAW_BYTES=226
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Reduce input records=13129
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Reduce input groups=1
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Combine output records=0
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Physical memory (bytes) snapshot=469196800
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Reduce output records=1
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1723944960
 [exec] 13/05/23 11:37:09 INFO mapred.JobClient: Map output records=13129
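Once the job completes, the result can be read back from HDFS. This step is not shown in the original, and the exact part-file name depends on the Hadoop API in use, so the wildcard is the safe form:

```shell
# Inspect the job output on HDFS; the job was run with relative paths,
# so output lands under the hadoop user's home directory.
# part-r-00000 for the new MapReduce API, part-00000 for the old one.
hadoop fs -cat /user/hadoop/output/part-*
```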