In the previous blog post we set up a good operating environment; in this article we begin to build the development and debugging environment. This is the real heart of the matter, distilled from countless tears!
4. Install Eclipse
Anyone who works with Java will already be familiar with Eclipse, so I won't say much about it; everything you need is at http://www.eclipse.org.
Note that Eclipse is installed inside the virtual machine here, so don't put it in the wrong place!
5. Install the Maven Environment
Go to maven.apache.org, download Maven 3, and extract it to /home (since /home is usually on the data disk, installing it there does not take up space on the system disk). Then edit ~/.bash_profile to add Maven to the PATH:

    PATH=$PATH:/home/maven3/bin
    export PATH

Then run source ~/.bash_profile to make the environment variable take effect.
Run mvn -version to check that the configuration succeeded.
Because repo.maven.org/maven2 is hosted abroad and is too slow to access, I finally set up a local mirror. I use the oschina mirror; see http://maven.oschina.net/help.html for details.
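As a sketch of what that mirror setup looks like, a `<mirror>` entry goes in ~/.m2/settings.xml. The structure below is standard Maven configuration; the exact `<url>` value is an assumption on my part and should be taken from the oschina help page linked above:

```xml
<!-- ~/.m2/settings.xml : redirect requests for central to the oschina mirror -->
<settings>
  <mirrors>
    <mirror>
      <id>nexus-osc</id>
      <mirrorOf>central</mirrorOf>
      <name>OSChina Central Mirror</name>
      <!-- URL per the oschina help page at the time; verify before use -->
      <url>http://maven.oschina.net/content/groups/public/</url>
    </mirror>
  </mirrors>
</settings>
```

With this in place, mvn install fetches the Hadoop dependencies from the mirror instead of the slow central repository.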
Part 2: Debugging the MapReduce Code
1. A simple Hadoop MapReduce example
First, create a Maven project in Eclipse using a simple template, then edit pom.xml to add the dependency:
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.5.1</version>
    </dependency>
Then follow http://www.cnblogs.com/xia520pi/archive/2012/06/04/2534533.html to write a simple MapReduce program. That article uses the hadoop-eclipse-plugin, which is complicated to configure; I don't recommend it. Use the command line directly instead: it not only builds proficiency with the Hadoop commands but also saves a lot of trouble. For convenience I used the first example from that post, the dedup example, which consists of two sample input files and a Dedup class.
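The Dedup class itself is not reproduced here (see the linked post), but what the job computes can be sketched in plain Java: the map phase emits each input line as a key with an empty value, and the shuffle/reduce phase collapses identical keys, so exactly one copy of each distinct line survives, in sorted key order. DedupSketch below is a hypothetical illustration, not part of the original project:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DedupSketch {
    // Simulates map -> shuffle -> reduce for the dedup job:
    // "map" emits (line, ""), the shuffle merges identical keys
    // in sorted order, and "reduce" writes each surviving key once.
    static List<String> dedup(List<String> inputLines) {
        Set<String> shuffled = new TreeSet<>(inputLines);
        return new ArrayList<>(shuffled);
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("2012-3-1 a", "2012-3-2 b", "2012-3-1 a");
        System.out.println(dedup(in)); // prints [2012-3-1 a, 2012-3-2 b]
    }
}
```

The real job spreads this work across mappers and reducers, but the input/output contract is the same, which is handy for checking your results later.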
First create a directory on HDFS with bin/hadoop fs -mkdir /user/root/dedup_in.
Put the two sets of sample data in two txt files, e.g. in.txt and in1.txt (the names are arbitrary), under the /home/temp/ directory, then upload them to HDFS with bin/hadoop dfs -copyFromLocal /home/temp /user/root/dedup_in. The data is now ready.
At the command line, run mvn install to package the program into a jar, then enter:

    bin/hadoop jar <project dir>/target/xxx.jar <package>.Dedup

to run the MapReduce job. Of course, you have to start Hadoop with sbin/start-all.sh first.
2. Modify the Dedup class
To trace and debug MapReduce locally, we also need to modify the Dedup class, mainly its main function, adding a few lines of code after Configuration conf = new Configuration():
    Configuration conf = new Configuration();
    conf.addResource("classpath:mapred-site.xml");
    conf.set("fs.defaultFS", "hdfs://<virtual machine IP>:8020");
    conf.set("mapreduce.framework.name", "yarn");
    conf.set("yarn.resourcemanager.address", "<virtual machine IP>:8032");
    conf.set("mapred.remote.os", "Linux");
    conf.set("hadoop.job.ugi", "hadoop,hadoop");
3. Create a local runtime configuration
Add a mapred-site.xml file under src/main/resources to configure the Hadoop runtime parameters:
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx800m -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000</value>
      </property>
      <property>
        <name>mapreduce.jobtracker.staging.root.dir</name>
        <value>/tmp</value>
      </property>
      <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/tmp</value>
      </property>
      <property>
        <name>mapreduce.framework.name</name>
        <value>local</value>
      </property>
      <property>
        <name>mapreduce.jobtracker.address</name>
        <value>local</value>
      </property>
      <property>
        <name>mapred.job.tracker</name>
        <value>local</value>
      </property>
    </configuration>
Note: the address=8000 in -Xmx800m -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000 can be set to any unoccupied port number, in case port 8000 is already taken.
4. Add a LocalJob class
This class exists to let hadoop-mapreduce-client publish the jar in local mode. The code is as follows:
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapreduce.Job;

    /**
     * @author root
     */
    public class LocalJob extends Job {

        public LocalJob() throws IOException {
            super();
        }

        public LocalJob(Configuration conf, String jobName) throws IOException {
            super(conf, jobName);
        }

        public LocalJob(Configuration conf) throws IOException {
            super(conf);
        }

        public static LocalJob getInstance(Configuration conf, String jobName) throws IOException {
            JobConf jobConf = new JobConf(conf);
            LocalJob job = new LocalJob(jobConf);
            return job;
        }

        @Override
        public void setJarByClass(Class<?> clazz) {
            super.setJarByClass(clazz);
            // Point at the jar built by mvn install
            conf.setJar("file:///<project dir>/target/xxx.jar");
        }
    }
Then modify the line in the Dedup class's main function that reads Job job = new Job(conf, "Data deduplication") to:

    Job job = LocalJob.getInstance(conf, "Data deduplication");
5. In Eclipse's debug configuration, add a Remote Java Application entry named debug8000 and set its port number to 8000, the one we configured earlier in mapred-site.xml. Then run mvn install once on the command line to build the jar file.
6. Set breakpoints in the map or reduce code of the Dedup class, then run the Dedup class as a Java Application. Note: use Run mode, not Debug mode. If this reports that the dedup_out directory already exists, delete the output directory with bin/hadoop fs -rm -r /user/root/dedup_out.
7. If you see the Dedup class start up normally with no errors but then stop and hang, don't be surprised: the breakpoint you just set has taken effect. Now launch the debug8000 Remote Java Application; in Eclipse's Debug perspective you will see the breakpoint we just set, and execution jumps straight to that line!
If you have followed along step by step, congratulations! You have opened the door to Hadoop development!
Hands-on: Building a Hadoop 2.5.1 + Eclipse Development and Debugging Environment (2)