In the previous blog we set up the runtime environment. In this article we build the development and debugging environment. This is the real essence, distilled from countless tears!
4. Install Eclipse
Anyone who is not new to Java already knows Eclipse, so I won't say much about it; for everything else, ask http://www.eclipse.org.
Note: the Eclipse environment here lives on the development machine, not inside the Virtual Machine where Hadoop runs (the rest of this article connects to the VM by its IP). Do not install it in the wrong place!
5. Install the Maven environment
Download Maven 3 from maven.apache.org and decompress it to /home (since /home is usually a data disk, this avoids taking up space on the system disk). Then edit ~/.bash_profile and add Maven to PATH:
PATH=$PATH:/home/maven3/bin/
Then run source ~/.bash_profile to make the environment variable take effect, and run mvn -version to check whether the configuration succeeded.
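For reference, the whole sequence looks roughly like this (a sketch assuming the archive was unpacked to /home/maven3):

    # Append Maven to PATH (assumes the archive was unpacked to /home/maven3)
    echo 'export PATH=$PATH:/home/maven3/bin/' >> ~/.bash_profile
    # Reload the profile so the change takes effect in the current shell
    source ~/.bash_profile
    # Verify: this should print the Maven version and the JDK it found
    mvn -version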
II. Debug the MapReduce code
1. Create a simple Hadoop MapReduce example
First, create a Maven project in Eclipse from a simple archetype, then modify pom.xml to add the dependency:
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.5.1</version>
    </dependency>
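For orientation, here is a minimal pom.xml sketch with that dependency in place; the project coordinates (com.example, dedup-demo) are placeholders for your own values:

    <project xmlns="http://maven.apache.org/POM/4.0.0">
        <modelVersion>4.0.0</modelVersion>
        <!-- Placeholder coordinates: substitute your own project's values -->
        <groupId>com.example</groupId>
        <artifactId>dedup-demo</artifactId>
        <version>1.0-SNAPSHOT</version>
        <dependencies>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-client</artifactId>
                <version>2.5.1</version>
            </dependency>
        </dependencies>
    </project>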
Then refer to an existing example. The original author used the hadoop-eclipse-plugin, but that plug-in is complicated to configure, so I don't recommend it; working directly on the command line not only builds proficiency with the hadoop commands but also saves a lot of trouble. For convenience I used his first example, Dedup (data deduplication): two sample input files and a Dedup class, sketched below.
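The article assumes you already have the Dedup class from the referenced example. If you don't, here is a minimal sketch of what such a deduplication job typically looks like (this is my own reconstruction, not the referenced author's exact code):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class Dedup {

        // Emit each input line as a key; duplicate lines collapse onto the same key
        public static class Map extends Mapper<Object, Text, Text, Text> {
            private static final Text EMPTY = new Text("");

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                context.write(value, EMPTY);
            }
        }

        // Each distinct key reaches reduce exactly once, so writing it out removes duplicates
        public static class Reduce extends Reducer<Text, Text, Text, Text> {
            @Override
            protected void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                context.write(key, new Text(""));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "Data Deduplication");
            job.setJarByClass(Dedup.class);
            job.setMapperClass(Map.class);
            job.setCombinerClass(Reduce.class);
            job.setReducerClass(Reduce.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            // Relative paths resolve under /user/root when running as root
            FileInputFormat.addInputPath(job, new Path("dedup_in"));
            FileOutputFormat.setOutputPath(job, new Path("dedup_out"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }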
Use bin/hadoop fs -mkdir /user/root/dedup_in to create the input directory on HDFS.
The two sample data sets can be stored in two text files, such as in.txt and in1.txt (the names don't matter), placed in the /home/temp/ directory, and then uploaded to HDFS with bin/hadoop dfs -copyFromLocal /home/temp /user/root/dedup_in. Data preparation is then complete.
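For example (the file contents below are made up; any text with duplicate lines across the two files will do):

    # Two sample files with overlapping lines (contents are illustrative)
    mkdir -p /home/temp
    printf '2012-3-1 a\n2012-3-2 b\n2012-3-3 c\n' > /home/temp/in.txt
    printf '2012-3-2 b\n2012-3-3 c\n2012-3-4 d\n' > /home/temp/in1.txt
    # Create the HDFS input directory and upload both files
    bin/hadoop fs -mkdir -p /user/root/dedup_in
    bin/hadoop fs -copyFromLocal /home/temp/in.txt /home/temp/in1.txt /user/root/dedup_in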
Execute mvn install on the command line to compile and package the program into a jar, then run:
bin/hadoop jar <project directory>/target/xxx.jar <package>.Dedup
This runs the MapReduce job; of course, Hadoop must already have been started with sbin/start-all.sh.
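Putting the whole cycle together (jar name, project path, and package are placeholders for your project's values):

    # Build and package the jar (it ends up under target/)
    mvn install
    # Start Hadoop if it is not already running
    sbin/start-all.sh
    # Submit the job; replace the jar path and package with your own
    bin/hadoop jar <project directory>/target/xxx.jar <package>.Dedup
    # Inspect the deduplicated output
    bin/hadoop fs -cat /user/root/dedup_out/part-r-00000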
2. Modify the Dedup class
To debug MapReduce locally, we also need to modify the Dedup class, mainly its main function: add the following lines right after Configuration conf = new Configuration():
    Configuration conf = new Configuration();
    conf.addResource("classpath:mapred-site.xml");
    conf.set("fs.defaultFS", "hdfs://<Virtual Machine IP>:8020");
    conf.set("mapreduce.framework.name", "yarn");
    conf.set("yarn.resourcemanager.address", "<Virtual Machine IP>:8032");
    conf.set("mapred.remote.os", "Linux");
    conf.set("hadoop.job.ugi", "hadoop,hadoop");
3. Create a local runtime configuration
Add a mapred-site.xml file under src/main/resources to configure the runtime Hadoop parameters:
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapred.child.java.opts</name>
            <value>-Xmx800m -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000</value>
        </property>
        <property>
            <name>mapreduce.jobtracker.staging.root.dir</name>
            <value>/tmp</value>
        </property>
        <property>
            <name>yarn.app.mapreduce.am.staging-dir</name>
            <value>/tmp</value>
        </property>
        <property>
            <name>mapreduce.framework.name</name>
            <value>local</value>
        </property>
        <property>
            <name>mapreduce.jobtracker.address</name>
            <value>local</value>
        </property>
        <property>
            <name>mapred.job.tracker</name>
            <value>local</value>
        </property>
    </configuration>
Note: in -Xmx800m -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000, the address=8000 part can be changed to any unused port number in case 8000 is already occupied.
4. Add a LocalJob class
This class helps hadoop-mapreduce-client find and submit the jar in local mode. The code is as follows:
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapreduce.Job;

    /**
     * @author root
     */
    public class LocalJob extends Job {

        public LocalJob() throws IOException {
            super();
        }

        public LocalJob(Configuration conf, String jobName) throws IOException {
            super(conf, jobName);
        }

        public LocalJob(Configuration conf) throws IOException {
            super(conf);
        }

        public static LocalJob getInstance(Configuration conf, String jobName) throws IOException {
            JobConf jobConf = new JobConf(conf);
            LocalJob job = new LocalJob(jobConf, jobName);
            return job;
        }

        @Override
        public void setJarByClass(Class<?> clazz) {
            super.setJarByClass(clazz);
            // Point at the locally built jar (replace with your project's path)
            conf.setJar("file://<project directory>/target/xxx.jar");
        }
    }
Then change Job job = new Job(conf, "Data Deduplication") in the main function of the Dedup class to:
Job job = LocalJob.getInstance(conf, "Data Deduplication");
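Putting steps 2 through 4 together, the modified main of the Dedup class ends up looking roughly like this (a sketch; the IP is a placeholder and the omitted job setup lines come from your existing code):

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.addResource("classpath:mapred-site.xml");
        // Placeholders: replace with the Virtual Machine's actual IP
        conf.set("fs.defaultFS", "hdfs://<Virtual Machine IP>:8020");
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.address", "<Virtual Machine IP>:8032");
        conf.set("mapred.remote.os", "Linux");
        conf.set("hadoop.job.ugi", "hadoop,hadoop");

        // LocalJob (step 4) ensures the locally built jar is the one submitted
        Job job = LocalJob.getInstance(conf, "Data Deduplication");
        // ... the remaining job setup (mapper, reducer, input/output paths)
        // stays exactly as in the original Dedup main
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }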
5. In Eclipse's Debug Configurations, add a Remote Java Application entry named debug8000 and set its port to 8000, the port we configured in mapred-site.xml. Then execute mvn install once on the command line to build the jar.
6. Set a breakpoint in the map or reduce code of the Dedup class, and run the Dedup class as a Java application. Note: Run, not Debug. If the dedup_out directory already exists, delete it first with bin/hadoop fs -rm -r /user/root/dedup_out.
7. If you see the Dedup class start normally with no errors reported, yet it never finishes, don't be surprised: with suspend=y the child JVM is waiting for a debugger to attach. Now launch the Remote Java Application debug8000. In the Eclipse Debug perspective you will see the breakpoint we just set, and execution jumps straight to that line!
If you have worked through all of this step by step, congratulations! You have opened the door to Hadoop development!