Hands-on instructions for building a Hadoop 2.5.1 + Eclipse development and debugging environment (2)

Source: Internet
Author: User
Tags: hadoop, mapreduce, hadoop fs

In the previous post we set up the runtime environment. In this one we build the development and debugging environment. This is the real essence, distilled from countless tears!


4. Install Eclipse.

As long as you are not new to Java, Eclipse needs little introduction, so I won't say much about it; for everything else, ask http://www.eclipse.org.

Note: The eclipse environment here is installed in the Virtual Machine. Do not install it in the wrong place!


5. Install the maven Environment

Download Maven 3 from maven.apache.org and unpack it to /home (since /home is usually a data disk, this avoids taking up space on the system disk). Then configure ~/.bash_profile, modifying PATH:

PATH=$PATH:/home/maven3/bin/

Then run source ~/.bash_profile to make the environment variable take effect.

Run mvn -version to check whether the configuration succeeded.
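Collected as a sketch, the two lines you would append to ~/.bash_profile, plus a quick self-check (assuming Maven was unpacked to /home/maven3 as described above):

```shell
# Lines to append to ~/.bash_profile: put Maven's bin directory on PATH.
export PATH="$PATH:/home/maven3/bin"

# Self-check that the directory really is on PATH now; once Maven is
# actually installed, `mvn -version` is the definitive test.
case ":$PATH:" in
  *":/home/maven3/bin:"*) echo "maven dir on PATH" ;;   # → maven dir on PATH
  *)                      echo "maven dir missing" ;;
esac
```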



II. Debugging MapReduce code

1. Create a simple Hadoop MapReduce example

First, create a maven project in Eclipse from a simple template, then modify pom.xml to add the dependency:

     
  
   <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-client</artifactId>
       <version>2.5.1</version>
   </dependency>
 
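For reference, a minimal pom.xml that the dependency might sit in; the project's own coordinates below (com.example, hadoop-dedup-demo) are placeholders, not from the original article:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <!-- placeholder coordinates; substitute your own project's values -->
  <groupId>com.example</groupId>
  <artifactId>hadoop-dedup-demo</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>

  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.5.1</version>
    </dependency>
  </dependencies>
</project>
```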


Then refer to the example. The original author used the hadoop-eclipse-plugin, but that plug-in is very complicated to configure, so I recommend working directly from the command line instead: it both exercises your proficiency with the hadoop commands and saves a lot of trouble. For convenience I reused his first example, Dedup: two sample input files and a Dedup class.

Use bin/hadoop fs -mkdir /user/root/dedup_in to create the input directory on HDFS.

The two data samples can be stored in two txt files, such as in.txt and in1.txt (the names do not matter), in the /home/temp/ directory, and then uploaded to HDFS with bin/hadoop dfs -copyFromLocal /home/temp /user/root/dedup_in. Data preparation is then complete.
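The article does not show the sample data itself. The classic Dedup example uses small one-line records, so something like the following would do (the record contents here are illustrative, not from the original); the `sort -u` at the end shows the deduplicated result the job is expected to produce:

```shell
# The article keeps inputs under /home/temp; a temp dir is used here so
# the sketch is self-contained.
DIR=/tmp/dedup_demo
mkdir -p "$DIR"

# Two input files sharing one duplicated record (illustrative data).
printf '2012-3-1 a\n2012-3-2 b\n' > "$DIR/in.txt"
printf '2012-3-1 a\n2012-3-3 c\n' > "$DIR/in1.txt"

# What the Dedup MapReduce job should output: each distinct record once.
cat "$DIR"/in*.txt | sort -u
# → 2012-3-1 a
#   2012-3-2 b
#   2012-3-3 c
```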

Run mvn install on the command line to compile and package the program into a jar, then enter:

bin/hadoop jar [project directory]/target/xxx.jar [package].Dedup

to run the mapreduce job. Of course, hadoop must already be running; start it with sbin/start-all.sh.

2. Modify the Dedup class

To debug mapreduce locally we also need to modify the Dedup class, mainly its main function, adding a few lines after Configuration conf = new Configuration():

Configuration conf = new Configuration();
conf.addResource("classpath:mapred-site.xml");
conf.set("fs.defaultFS", "hdfs://[Virtual Machine IP]:8020");
conf.set("mapreduce.framework.name", "yarn");
conf.set("yarn.resourcemanager.address", "[Virtual Machine IP]:8032");
conf.set("mapred.remote.os", "Linux");
conf.set("hadoop.job.ugi", "hadoop,hadoop");


3. Create a local runtime configuration

Add a mapred-site.xml file under src/main/resources to configure the hadoop runtime parameters:

     
          
   
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx800m -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.staging.root.dir</name>
        <value>/tmp</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/tmp</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>local</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.address</name>
        <value>local</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>local</value>
    </property>
</configuration>
       
  
 


Note: in -Xmx800m -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000, the address=8000 part can be set to any unused port number, in case port 8000 is already occupied.
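One quick way to check whether a candidate debug port is already taken before putting it in address= (this uses bash's /dev/tcp feature; the port number is just the example from above):

```shell
PORT=8000   # the debug port from the configuration above
# bash can "connect" through the virtual /dev/tcp path: if the connect
# succeeds, something is already listening on that port.
if (exec 3<>"/dev/tcp/127.0.0.1/$PORT") 2>/dev/null; then
  echo "port $PORT is busy, pick another"
else
  echo "port $PORT looks free"
fi
```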

4. Add a LocalJob class

This class is used so that hadoop-mapreduce-client can find the job jar when running in this local mode. The code is as follows:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;

/**
 * @author root
 */
public class LocalJob extends Job {

    public LocalJob() throws IOException {
        super();
    }

    public LocalJob(Configuration conf, String jobName) throws IOException {
        super(conf, jobName);
    }

    public LocalJob(Configuration conf) throws IOException {
        super(conf);
    }

    public static LocalJob getInstance(Configuration conf, String jobName) throws IOException {
        JobConf jobConf = new JobConf(conf);
        LocalJob job = new LocalJob(jobConf);
        return job;
    }

    public void setJarByClass(Class<?> clazz) {
        super.setJarByClass(clazz);
        conf.setJar("file://[project directory]/target/xxx.jar");
    }
}

Then, in the main function of the Dedup class, change Job job = new Job(conf, "Data Deduplication") to:

Job job = LocalJob.getInstance(conf, "Data Deduplication");

5. In Eclipse's Debug configurations, add a Remote Java Application instance, debug8000, and set its port to 8000, the port we configured in mapred-site.xml. Then run mvn install once on the command line to build the jar.

6. Set a breakpoint in the map or reduce code of the Dedup class and Run the Dedup class as a Java application. Note: Run, not Debug. If the dedup_out directory already exists, delete it first with bin/hadoop fs -rm -r /user/root/dedup_out.

7. If you see the Dedup class start normally, report no errors, and then apparently hang, do not be surprised: suspend=y in the debug options makes the job's JVM wait for a debugger to attach. Now launch the Remote Java Application debug8000; in Eclipse's Debug perspective you will see execution stop exactly at the breakpoint we just set!


If you have worked through all of this step by step, congratulations! You have opened the door to Hadoop development!
