Hadoop Cluster (Phase 1): Eclipse Development Environment Setup

Source: Internet
Author: User
Tags: hadoop, fs
Document directory
  • 1.1 Hadoop Cluster Introduction
  • 1.2 Windows Development Environment Introduction
  • 2.1 Eclipse Plug-in Introduction
  • 2.2 Introduction to the Hadoop Working Directory
  • 2.3 Modify the System Administrator Name
  • 2.4 Eclipse Plug-in Development Configuration
  • 3.1 Configure the JDK for Eclipse
  • 3.2 Set the Eclipse Encoding to UTF-8
  • 3.3 Create a MapReduce Project
  • 3.4 Create a WordCount Class
  • 3.5 Run the WordCount Program
  • 3.6 View the WordCount Running Results
  • 4.1 The "error: failure to login" Problem
  • 4.2 The "Permission denied" Problem
  • 4.3 The "Failed to set permissions of path" Problem
  • 4.4 "hadoop mapred execution directory file permission" Restriction
1. Introduction to the Hadoop Development Environment

1.1 Hadoop Cluster Introduction

Java version: jdk-6u31-linux-i586.bin

Linux: centos6.0

Hadoop version: hadoop-1.0.0.tar.gz

1.2 Windows Development Environment Introduction

Java version: jdk-6u31-windows-i586.exe

Operating system: Windows 7 Ultimate Edition

Eclipse software: eclipse-jee-indigo-SR1-win32.zip or eclipse-jee-helios-SR2-win32.zip

Hadoop software: hadoop-1.0.0.tar.gz

Hadoop Eclipse plug-in: hadoop-eclipse-plugin-1.0.0.jar

Download: http://download.csdn.net/detail/xia520pi/4113746

Note: The download above collects several versions of "hadoop-eclipse-plugin-1.0.0.jar" found online. "V2.0" was produced by modifying "V1.0" according to "FAQ 1"; the remaining "V3.0", "V4.0", and "V5.0" were built by others along the same lines as "V2.0". I have tested them all and they work, so you can use them with confidence. "V5.0" is used here; remember to rename it to "hadoop-eclipse-plugin-1.0.0.jar".

 

2. Hadoop Eclipse Plug-in: Introduction and Use

2.1 Eclipse Plug-in Introduction

Hadoop is a powerful parallel framework that lets tasks be processed in parallel on a distributed cluster. However, writing and debugging hadoop programs is difficult. For this reason, hadoop developers created the hadoop Eclipse plug-in, which embeds the hadoop development environment into eclipse and provides a graphical development environment that reduces the programming difficulty. After installing the plug-in and configuring the hadoop-related information, creating a hadoop project makes the plug-in automatically import the jar files of the hadoop programming interface. You can then write, debug, and run hadoop programs (both standalone and distributed) from the eclipse GUI, watch the real-time status, error messages, and results of your program, and view and manage HDFS and its files. In short, the hadoop Eclipse plug-in is easy to install, easy to use, and powerful, especially for hadoop programming; it is an essential tool for hadoop beginners and hadoop developers.

2.2 Introduction to the Hadoop Working Directory

For convenience in later development, we install the software used for development under one directory, except for the JDK, which I install in its default location on drive C. The following is my working directory:

 

System disk (E:)
    |--- hadoopworkplat
            |--- eclipse
            |--- hadoop-1.0.0
            |--- workplace
            |--- ......

 

Decompress eclipse and hadoop into the "E:\hadoopworkplat" directory, and create "workplace" as the eclipse workspace.

 

 

Note: You can arrange this according to your own situation; it does not have to follow my structure.

2.3 Modify the System Administrator Name

After more than two days of exploration, I found that in order for eclipse to modify and delete files on the HDFS of the hadoop cluster, the Windows 7 system administrator name has to be changed. The default name is "Administrator"; change it to "hadoop", the same name as the ordinary user of the hadoop cluster. Recall that all machines in the hadoop cluster share a common user, "hadoop", and hadoop itself also runs under that account. To save ourselves permission trouble, we rename the Windows 7 administrator, so that the user is not short of permissions on the hadoop cluster, which would otherwise prevent creating and deleting HDFS files from eclipse.

You can verify this with an experiment: look at the logs under "/usr/hadoop/logs" on the Master.Hadoop machine. If the permission is insufficient, the "write" operation fails (a small sketch that reproduces this check from code appears at the end of this section). There are several workarounds on the Internet, but they do not work for hadoop 1.0; for details see "FAQ 2". Next, modify the administrator name.

First, right-click the "My Computer" icon on the desktop and select "Manage". The following window pops up:

 

 

Then select "Local Users and Groups", expand "Users", find the system administrator "Administrator", and rename it to "hadoop". The result looks like this:

 

 

Finally, log off or restart the computer so that the administrator account uses the new name.
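Once the rename has taken effect, you can reproduce the write check mentioned above from code rather than by reading the logs. The following is a minimal sketch of my own (not part of the original article); it assumes the namenode address configured later in this article (hdfs://192.168.1.2:9000), and the class name and test path are hypothetical. It tries to create a file under /user/hadoop, which throws an AccessControlException when the local user name is not allowed to write there:

// HdfsWriteCheck is a hypothetical helper, not from the original article.
// It attempts an HDFS write as the current Windows user (Hadoop 1.x takes the
// client user name from the operating system account, hence the rename above).
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumed namenode address; matches the DFS Master settings used later.
    FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.1.2:9000"), conf);
    try {
      Path test = new Path("/user/hadoop/permission-test.txt"); // hypothetical test path
      fs.create(test).close();   // fails with AccessControlException if the user may not write here
      fs.delete(test, false);    // clean up the test file
      System.out.println("write OK as user " + System.getProperty("user.name"));
    } finally {
      fs.close();
    }
  }
}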

2.4 Eclipse plug-in Development Configuration

Step 1: Copy our "hadoop-eclipse-plugin-1.0.0.jar" into the "plugins" directory of eclipse, then restart eclipse so that the plug-in takes effect.

 

System disk (E:)
    |--- hadoopworkplat
            |--- eclipse
                    |--- plugins
                            |--- hadoop-eclipse-plugin-1.0.0.jar

 

The above shows where my "hadoop-eclipse-plugin" jar is placed. Restart eclipse, as shown below:

 

 

Look carefully at "Project Explorer" on the left: a "DFS Locations" entry appears, which indicates that eclipse has recognized the hadoop Eclipse plug-in.

 

Step 2: From the "Window" menu, select "Preferences". A dialog appears; in the option list on its left there is now a "Hadoop Map/Reduce" entry. Click it and set the hadoop installation directory (for example, my hadoop directory is E:\hadoopworkplat\hadoop-1.0.0). The result is as follows:

 

 

Step 3: Switch to the "Map/Reduce" perspective. There are two ways to do this:

1) From the "Window" menu, select "Open Perspective"; in the dialog that appears, choose "Map/Reduce" to switch.

 

 

2) In the upper right corner, click the perspective icon, click the "Other" option, select "Map/Reduce" in the pop-up window, and click "OK" to confirm.

After switching to the "Map/Reduce" perspective, the window looks as shown below.

 

 

Step 4: Establish a connection to the hadoop cluster. In the "Map/Reduce Locations" view of eclipse, right-click and select "New Hadoop location". A dialog pops up.

 

 

 

Pay particular attention to the fields marked in red.

  • Location name: any name identifying this "Map/Reduce location"
  • Map/Reduce Master
    Host: 192.168.1.2 (the IP address of Master.Hadoop)
    Port: 9001
  • DFS Master
    Use M/R Master host: checked (because our namenode and jobtracker are on the same machine)
    Port: 9000
  • User name: hadoop (this defaults to the Windows system administrator name, which we changed to hadoop)

 

 

Note: The host and port here are the addresses and ports you configured in mapred-site.xml and core-site.xml respectively. For more information, see "Hadoop Cluster (Phase 2): Hadoop Installation and Configuration V1.0".

Click "Advanced parameters", find "hadoop.tmp.dir", and change it to the value set for our hadoop cluster, "/usr/hadoop/tmp" (this parameter is configured in core-site.xml).
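For reference, the values entered in this dialog correspond to ordinary Hadoop 1.x client configuration properties. The sketch below is my own illustration (the class name ClusterConf is hypothetical); it shows the equivalent programmatic settings, matching the core-site.xml and mapred-site.xml values of this cluster:

import org.apache.hadoop.conf.Configuration;

public class ClusterConf {
  // Hadoop 1.x property names equivalent to the plug-in's location settings.
  public static Configuration create() {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://192.168.1.2:9000"); // DFS Master, from core-site.xml
    conf.set("mapred.job.tracker", "192.168.1.2:9001");     // Map/Reduce Master, from mapred-site.xml
    conf.set("hadoop.tmp.dir", "/usr/hadoop/tmp");          // same value as hadoop.tmp.dir on the cluster
    return conf;
  }
}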

 

 

After clicking "Finish", an entry appears in the "Map/Reduce Locations" view of eclipse; this is the "Map/Reduce location" we just created.

 

Step 5: Browse the HDFS file system, create folders, and upload files. Click "win7tohadoop" under "DFS Locations" on the left side of eclipse to display the file structure on HDFS.

 

Right-click "win7tohadoop > user > hadoop", try to create a folder named "xiapi", then right-click and refresh to see the folder we just created.

 

 

After the creation, refresh the view. The result is as follows:

 

 

Use SecureCRT to log on remotely to the "Master.Hadoop" server and run the following command to check whether the "xiapi" folder has been created.

 

hadoop fs -ls

 

At this point, our hadoop eclipse development environment is configured. If you want further reassurance, you can also upload a local file to the HDFS distributed file system and check whether the upload succeeds.
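The same check can also be performed from code instead of the plug-in GUI. Below is a minimal sketch of my own (class name and local file path are hypothetical), assuming the namenode address used above; it creates the "xiapi" folder, uploads a file, and lists "/user/hadoop" much like "hadoop fs -ls" does:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUploadCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumed namenode address, matching the DFS Master settings above.
    FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.1.2:9000"), conf);

    fs.mkdirs(new Path("/user/hadoop/xiapi"));                   // same folder created in the GUI
    fs.copyFromLocalFile(new Path("E:/hadoopworkplat/test.txt"), // hypothetical local file
                         new Path("/user/hadoop/xiapi/test.txt"));

    for (FileStatus status : fs.listStatus(new Path("/user/hadoop"))) {
      System.out.println(status.getPath());                      // equivalent to "hadoop fs -ls"
    }
    fs.close();
  }
}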

3. Running the WordCount Program in Eclipse

3.1 Configure the JDK for Eclipse

Check whether JDK 6.0 is the default JDK of your eclipse platform. From the "Window" menu, select "Preferences"; a dialog appears. On its left, find "Java", select "Installed JREs", and add the JDK if necessary. Below is my default JRE.

 

 

The settings before adding it look like this:

 

 

The result after adding JDK 6.0 is as follows:

 

 

Then set the compiler:

 

 

3.2 Set the Eclipse Encoding to UTF-8

 

 

3.3 Create a MapReduce Project

From the "File" menu, select "New" and then "Other", find "Map/Reduce Project", and select it.

 

Enter "wordcountproject" as the MapReduce project name and click "Finish".

 

 

At this point we have successfully created a MapReduce project, and the newly created project appears on the left side of eclipse.

 

 

3.4 Create a WordCount Class

Select "wordcountproject", right-click, choose "New" and then "Class" from the menu, and enter the following information:

 

 

Because we directly use the wordcount program that ships with hadoop 1.0.0, the package must match the one in the code, "org.apache.hadoop.examples", and the class name must likewise be "WordCount". The source code is located as follows.

 

hadoop-1.0.0
    |--- src
            |--- examples
                    |--- org
                            |--- apache
                                    |--- hadoop
                                            |--- examples

 

Find the "WordCount.java" file there, open it in Notepad, and copy the code into the Java file we just created. A few changes have been made to the source code (they were marked in red in the original article): the added conf.set("mapred.job.tracker", ...) line and the hard-coded "input"/"newout" arguments.

 

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Modified: submit the job to the cluster's jobtracker (see the note below).
    conf.set("mapred.job.tracker", "192.168.1.2:9001");
    // Modified: hard-coded input and output paths instead of command-line arguments.
    String[] ars = new String[] { "input", "newout" };
    String[] otherArgs = new GenericOptionsParser(conf, ars).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

 

Note: If the line conf.set("mapred.job.tracker", "192.168.1.2:9001") is left out, you will be told that your permissions are insufficient. The reason is that the configuration in "Map/Reduce Locations" does not fully take effect for job submission; the job is instead created on the local disk and run there, which obviously does not work. We want eclipse to submit the job to the hadoop cluster, so we add the jobtracker address manually here. For details, see "FAQ 3".

3.5 Run the WordCount Program

Select the "WordCount.java" program, right-click it, and choose "Run As > Run on Hadoop". A dialog then appears, as shown in the following figure.

 

 

The running result is as follows:

 

 

This shows that our program ran successfully.

3.6 View the WordCount Running Results

On the left side of eclipse, right-click "DFS Locations > win7tohadoop > user > hadoop", click "Refresh", and the folder "newoutput" appears. Remember that "newoutput" is created automatically when the program runs. If a folder with the same name already exists, either change the program to use a new output folder or delete the existing folder on HDFS first; otherwise an error occurs (a small helper for this is sketched below).
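If you prefer to let the program handle this instead of deleting the folder by hand, a helper like the following can be called in main() before FileOutputFormat.setOutputPath. This is my own addition (the class and method names are hypothetical), not part of the original wordcount example:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OutputCleaner {
  // Delete the job's output directory on HDFS if it already exists, so that
  // re-running wordcount does not fail because the output directory is present.
  // Call this in main() before FileOutputFormat.setOutputPath(job, ...).
  public static void deleteIfExists(Configuration conf, String dir) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path output = new Path(dir);
    if (fs.exists(output)) {
      fs.delete(output, true); // true = recursive delete
    }
  }
}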

 

 

Open the "newoutput" folder and then the "part-r-00000" file to see the execution result.

 

 

At this point, the eclipse development environment has been set up and the wordcount program has run successfully. Next, let us really begin our hadoop journey.

4. Frequently Asked Questions

4.1 The "error: failure to login" Problem

The case found online concerned "hadoop-0.20.203.0", but the same situation occurred with the "V1.0" plug-in I used, because "hadoop-eclipse-plugin-1.0.0_V1.0.jar" was compiled directly from the source code and is therefore missing the required jar packages. The details are as follows:

Reference: http://blog.csdn.net/chengfei112233/article/details/7252404

In my own practice I found that if the hadoop-0.20.203.0 plug-in jar is copied directly to the eclipse plug-in directory, an error occurs when connecting to DFS, with the message "error: failure to login".

The pop-up error box reads "An internal error occurred during: "Connecting to DFS hadoop". org/apache/commons/configuration/Configuration". Checking the eclipse log confirms that a jar package is missing. Further investigation showed that when hadoop-eclipse-plugin-0.20.203.0.jar is copied over directly, its lib directory is missing several jar packages.

After gathering information online, here is the correct installation method:

First, modify hadoop-eclipse-plugin-0.20.203.0.jar. Open the jar with an archive manager and you will find that its lib directory contains only two packages, commons-cli-1.2.jar and hadoop-core.jar. From the hadoop/lib directory, copy the following packages:

  • commons-configuration-1.6.jar
  • commons-httpclient-3.0.1.jar
  • commons-lang-2.4.jar
  • jackson-core-asl-1.0.1.jar
  • jackson-mapper-asl-1.0.1.jar

five packages in total, into the lib directory of hadoop-eclipse-plugin-0.20.203.0.jar, as shown:

 

 

Then, modify MANIFEST.MF inside the jar, changing its Bundle-ClassPath entry to the following:

 

Bundle-ClassPath: classes/,lib/hadoop-core.jar,lib/commons-cli-1.2.jar,lib/commons-httpclient-3.0.1.jar,lib/jackson-core-asl-1.0.1.jar,lib/jackson-mapper-asl-1.0.1.jar,lib/commons-configuration-1.6.jar,lib/commons-lang-2.4.jar

 

 

This completes the modification of hadoop-eclipse-plugin-0.20.203.0.jar.

Finally, copy the hadoop-eclipse-plugin-0.20.203.0.jar to the eclipse plugins directory.

Note: The same procedure also applies to "hadoop-1.0.0".

4.2 The "Permission denied" Problem

I tried many suggestions found online: running "hadoop fs -chmod 777 /user/hadoop", setting the "dfs.permissions" configuration item to false, and setting "hadoop.job.ugi", but none of them worked in my case.

References:

Address 1: http://www.cnblogs.com/acmy/archive/2011/10/28/2227901.html

Address 2: http://sunjun041640.blog.163.com/blog/static/25626832201061751825292/

Error type: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=**********, access=WRITE, inode="hadoop":hadoop:supergroup:rwxr-xr-x

Solution:

My solution is simply to rename the Windows system administrator account to the user that runs hadoop on the cluster.

4.3 The "Failed to set permissions of path" Problem

References: https://issues.apache.org/jira/browse/HADOOP-8089

The error message is as follows:

ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop cause:java.io.IOException: Failed to set permissions of path: \usr\hadoop\tmp\mapred\staging\hadoop753422487\.staging to 0700
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \usr\hadoop\tmp\mapred\staging\hadoop753422487\.staging to 0700

Solution:

 

Configuration conf = new Configuration();
conf.set("mapred.job.tracker", "[server]:9001");

where "[server]" in "[server]:9001" is the IP address of the hadoop cluster master.

4.4 "hadoop mapred execution directory file permission" Restriction

References: http://blog.csdn.net/azhao_dn/article/details/6921398

The error message is as follows:

Job submission failed with exception 'java.io.IOException(The ownership/permissions on the staging directory /tmp/hadoop-hadoop-user1/mapred/staging/hadoop-user1/.staging is not as expected. It is owned by hadoop-user1 and permissions are rwxrwxrwx. The directory must be owned by the submitter hadoop-user1 or by hadoop-user1 and permissions must be rwx------)'

Modify permissions:
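The screenshot with the exact command has been lost here. The intent is to make the staging directory owned by the submitting user with rwx------ (0700) permissions. As one possible illustration (my own sketch, not necessarily the command used in the original article, which may well have been "hadoop fs -chmod"), the same change can be made through the HDFS API, assuming the staging path from the error message above and a namenode address like the one used elsewhere in this article:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class FixStagingPermissions {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumed namenode address; replace with your cluster's address.
    FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.1.2:9000"), conf);
    // Tighten the staging directory to rwx------ (0700), as the error message asks.
    Path staging = new Path("/tmp/hadoop-hadoop-user1/mapred/staging/hadoop-user1/.staging");
    fs.setPermission(staging, new FsPermission((short) 0700));
    fs.close();
  }
}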

 

 

This will solve the problem.

 

Article download: http://files.cnblogs.com/xia520pi/HadoopCluster_Vol.7.rar

 
