In the previous blog we set up the runtime environment. In this article we build the development and debugging environment. This is the real essence, distilled from countless tears!
4. Install Eclipse
Anyone who is not new to Java already knows Eclipse, so I won't say much about it; for everything else, ask http://www.eclipse.org.
Note: the Eclipse environment here lives on the development machine, not inside the Virtual Machine where Hadoop runs (the rest of this article connects to the VM by its IP). Do not install it in the wrong place!
5. Install the Maven environment
Download Maven 3 from maven.apache.org and decompress it to /home (since /home is usually a data disk, this avoids taking up space on the system disk). Then edit ~/.bash_profile and add Maven to PATH:
PATH=$PATH:/home/maven3/bin/
Then run source ~/.bash_profile to make the environment variable take effect, and run mvn -version to check whether the configuration succeeded.
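For reference, the whole sequence looks roughly like this (a sketch assuming the archive was unpacked to /home/maven3):

    # Append Maven to PATH (assumes the archive was unpacked to /home/maven3)
    echo 'export PATH=$PATH:/home/maven3/bin/' >> ~/.bash_profile
    # Reload the profile so the change takes effect in the current shell
    source ~/.bash_profile
    # Verify: this should print the Maven version and the JDK it found
    mvn -version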
II. Debug the MapReduce code
1. Create a simple Hadoop MapReduce example
First, create a Maven project in Eclipse from a simple archetype, then modify pom.xml to add the dependency:
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.5.1</version>
    </dependency>
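For orientation, here is a minimal pom.xml sketch with that dependency in place; the project coordinates (com.example, dedup-demo) are placeholders for your own values:

    <project xmlns="http://maven.apache.org/POM/4.0.0">
        <modelVersion>4.0.0</modelVersion>
        <!-- Placeholder coordinates: substitute your own project's values -->
        <groupId>com.example</groupId>
        <artifactId>dedup-demo</artifactId>
        <version>1.0-SNAPSHOT</version>
        <dependencies>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-client</artifactId>
                <version>2.5.1</version>
            </dependency>
        </dependencies>
    </project>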
Then refer to an existing example. The original author used the hadoop-eclipse-plugin, but that plug-in is complicated to configure, so I don't recommend it; working directly on the command line not only builds proficiency with the hadoop commands but also saves a lot of trouble. For convenience I used his first example, Dedup (data deduplication): two sample input files and a Dedup class, sketched below.
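The article assumes you already have the Dedup class from the referenced example. If you don't, here is a minimal sketch of what such a deduplication job typically looks like (this is my own reconstruction, not the referenced author's exact code):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class Dedup {

        // Emit each input line as a key; duplicate lines collapse onto the same key
        public static class Map extends Mapper<Object, Text, Text, Text> {
            private static final Text EMPTY = new Text("");

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                context.write(value, EMPTY);
            }
        }

        // Each distinct key reaches reduce exactly once, so writing it out removes duplicates
        public static class Reduce extends Reducer<Text, Text, Text, Text> {
            @Override
            protected void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                context.write(key, new Text(""));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "Data Deduplication");
            job.setJarByClass(Dedup.class);
            job.setMapperClass(Map.class);
            job.setCombinerClass(Reduce.class);
            job.setReducerClass(Reduce.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            // Relative paths resolve under /user/root when running as root
            FileInputFormat.addInputPath(job, new Path("dedup_in"));
            FileOutputFormat.setOutputPath(job, new Path("dedup_out"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }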
Use bin/hadoop fs -mkdir /user/root/dedup_in to create the input directory on HDFS.
The two sample data sets can be stored in two text files, such as in.txt and in1.txt (the names don't matter), placed in the /home/temp/ directory, and then uploaded to HDFS with bin/hadoop dfs -copyFromLocal /home/temp /user/root/dedup_in. Data preparation is then complete.
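For example (the file contents below are made up; any text with duplicate lines across the two files will do):

    # Two sample files with overlapping lines (contents are illustrative)
    mkdir -p /home/temp
    printf '2012-3-1 a\n2012-3-2 b\n2012-3-3 c\n' > /home/temp/in.txt
    printf '2012-3-2 b\n2012-3-3 c\n2012-3-4 d\n' > /home/temp/in1.txt
    # Create the HDFS input directory and upload both files
    bin/hadoop fs -mkdir -p /user/root/dedup_in
    bin/hadoop fs -copyFromLocal /home/temp/in.txt /home/temp/in1.txt /user/root/dedup_in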
Execute mvn install on the command line to compile and package the program into a jar, then run:
bin/hadoop jar <project directory>/target/xxx.jar <package>.Dedup
This runs the MapReduce job; of course, Hadoop must already have been started with sbin/start-all.sh.
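Putting the whole cycle together (jar name, project path, and package are placeholders for your project's values):

    # Build and package the jar (it ends up under target/)
    mvn install
    # Start Hadoop if it is not already running
    sbin/start-all.sh
    # Submit the job; replace the jar path and package with your own
    bin/hadoop jar <project directory>/target/xxx.jar <package>.Dedup
    # Inspect the deduplicated output
    bin/hadoop fs -cat /user/root/dedup_out/part-r-00000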
2. Modify the Dedup class
To debug MapReduce locally, we also need to modify the Dedup class, mainly its main function: add the following lines right after Configuration conf = new Configuration():
    Configuration conf = new Configuration();
    conf.addResource("classpath:mapred-site.xml");
    conf.set("fs.defaultFS", "hdfs://<Virtual Machine IP>:8020");
    conf.set("mapreduce.framework.name", "yarn");
    conf.set("yarn.resourcemanager.address", "<Virtual Machine IP>:8032");
    conf.set("mapred.remote.os", "Linux");
    conf.set("hadoop.job.ugi", "hadoop,hadoop");
3. Create a local runtime configuration
Add a mapred-site.xml file under src/main/resources to configure the runtime Hadoop parameters:
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapred.child.java.opts</name>
            <value>-Xmx800m -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000</value>
        </property>
        <property>
            <name>mapreduce.jobtracker.staging.root.dir</name>
            <value>/tmp</value>
        </property>
        <property>
            <name>yarn.app.mapreduce.am.staging-dir</name>
            <value>/tmp</value>
        </property>
        <property>
            <name>mapreduce.framework.name</name>
            <value>local</value>
        </property>
        <property>
            <name>mapreduce.jobtracker.address</name>
            <value>local</value>
        </property>
        <property>
            <name>mapred.job.tracker</name>
            <value>local</value>
        </property>
    </configuration>
Note: in -Xmx800m -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000, the address=8000 part can be changed to any unused port number in case 8000 is already occupied.
4. Add a LocalJob class
This class helps hadoop-mapreduce-client find and submit the jar in local mode. The code is as follows:
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapreduce.Job;

    /**
     * @author root
     */
    public class LocalJob extends Job {

        public LocalJob() throws IOException {
            super();
        }

        public LocalJob(Configuration conf, String jobName) throws IOException {
            super(conf, jobName);
        }

        public LocalJob(Configuration conf) throws IOException {
            super(conf);
        }

        public static LocalJob getInstance(Configuration conf, String jobName) throws IOException {
            JobConf jobConf = new JobConf(conf);
            LocalJob job = new LocalJob(jobConf, jobName);
            return job;
        }

        @Override
        public void setJarByClass(Class<?> clazz) {
            super.setJarByClass(clazz);
            // Point at the locally built jar (replace with your project's path)
            conf.setJar("file://<project directory>/target/xxx.jar");
        }
    }
Then change Job job = new Job(conf, "Data Deduplication") in the main function of the Dedup class to:
Job job = LocalJob.getInstance(conf, "Data Deduplication");
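Putting steps 2 through 4 together, the modified main of the Dedup class ends up looking roughly like this (a sketch; the IP is a placeholder and the omitted job setup lines come from your existing code):

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.addResource("classpath:mapred-site.xml");
        // Placeholders: replace with the Virtual Machine's actual IP
        conf.set("fs.defaultFS", "hdfs://<Virtual Machine IP>:8020");
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.address", "<Virtual Machine IP>:8032");
        conf.set("mapred.remote.os", "Linux");
        conf.set("hadoop.job.ugi", "hadoop,hadoop");

        // LocalJob (step 4) ensures the locally built jar is the one submitted
        Job job = LocalJob.getInstance(conf, "Data Deduplication");
        // ... the remaining job setup (mapper, reducer, input/output paths)
        // stays exactly as in the original Dedup main
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }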
5. In Eclipse's Debug Configurations, add a Remote Java Application entry named debug8000 and set its port to 8000, the port we configured in mapred-site.xml. Then execute mvn install once on the command line to build the jar.
6. Set a breakpoint in the map or reduce code of the Dedup class, and run the Dedup class as a Java application. Note: Run, not Debug. If the dedup_out directory already exists, delete it first with bin/hadoop fs -rm -r /user/root/dedup_out.
7. If you see the Dedup class start normally with no errors reported, yet it never finishes, don't be surprised: with suspend=y the child JVM is waiting for a debugger to attach. Now launch the Remote Java Application debug8000. In the Eclipse Debug perspective you will see the breakpoint we just set, and execution jumps straight to that line!
If you have worked through all of this step by step, congratulations! You have opened the door to Hadoop development!