Configuring an Eclipse Development Environment for Hadoop Applications on Ubuntu


Hello everyone. In this article I will walk through configuring an Eclipse development environment for Hadoop applications on Ubuntu. The goal is simple: for research and learning, deploy a Hadoop runtime and build a Hadoop development and testing environment.

Environment: VMware 8.0 and Ubuntu 11.04

Step One: Download eclipse-SDK-4.2.1-linux-gtk.tar.gz

http://mirrors.ustc.edu.cn/eclipse/eclipse/downloads/drops4/R-4.2.1-201209141800/eclipse-SDK-4.2.1-linux-gtk.tar.gz

Note: Download the 32-bit Eclipse for Linux, not the 64-bit build, or Eclipse will not start.
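
For example, the archive can be downloaded and unpacked from a terminal. This is just a minimal sketch; the target directory /home/tanglg1987 is an assumption that matches the paths used later in this article and can be any directory you like:

wget http://mirrors.ustc.edu.cn/eclipse/eclipse/downloads/drops4/R-4.2.1-201209141800/eclipse-SDK-4.2.1-linux-gtk.tar.gz
tar -xzf eclipse-SDK-4.2.1-linux-gtk.tar.gz -C /home/tanglg1987

This produces an eclipse directory such as /home/tanglg1987/eclipse.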

Step Two: Download the latest version of the Hadoop plugin

https://issues.apache.org/jira/secure/attachment/12460491/hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar

Rename the downloaded plugin to "hadoop-0.20.2-eclipse-plugin.jar".

Copy hadoop-0.20.2-eclipse-plugin.jar to the eclipse/plugins directory and restart Eclipse.
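
For example, from the directory where the plugin was downloaded (the Eclipse path below assumes the extraction location used in Step One):

cp hadoop-0.20.2-eclipse-plugin.jar /home/tanglg1987/eclipse/plugins/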

1. A DFS Locations node appears on the left in Project Explorer.
2. Window -> Preferences now contains a Hadoop Map/Reduce option; select it, and on the right choose the root directory of the Hadoop distribution you downloaded.

If you can see both of the above, the plugin was installed successfully.

Step Three: Configure the Hadoop path

Window -> Preferences, select "Hadoop Map/Reduce", and click "Browse..." to choose the path to the Hadoop folder.
This step has no effect on the runtime environment, but it lets a new project automatically import all the jar packages from the Hadoop root directory and its lib directory.

Step Four: Add a Map/Reduce environment

With the plugin installed, start Hadoop; you can then create a Hadoop connection in Eclipse, much like configuring a WebLogic connection.
As shown in the figure, open the Map/Reduce Locations view and click the elephant icon in its upper right corner.

At the bottom of Eclipse, next to the Console, there is a tab called "Map/Reduce Locations". Right-click in the blank area below it and select "New Hadoop location...", as shown in the figure:

Location name: anything you like; I used "Hadoop".
In the Map/Reduce Master box:
Host: the cluster machine running the JobTracker; write localhost here.
Port: the JobTracker's port; write 9101.
These two values are the IP and port from mapred.job.tracker in mapred-site.xml (you can verify them as shown after this step).
In the DFS Master box:
Host: the cluster machine running the NameNode; write localhost here.
Port: the NameNode's port; write 9100.
These two values are the IP and port from fs.default.name in core-site.xml.
"Use M/R master host": if this checkbox is selected, the DFS Master host defaults to the Map/Reduce Master host; if it is not selected, you can enter the host yourself. Here the JobTracker and NameNode run on the same machine, so the values are the same and the box stays checked.
User name: the user name used to connect to Hadoop. I installed Hadoop under the user sony and have no other users, so I use sony.
The remaining fields are not required.
Then click the Finish button, and a new record appears in this view.
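
The Host/Port values above should match what was configured when Hadoop was installed. A quick way to double-check them from a terminal (the Hadoop installation path below is an assumption; use your own):

grep -A 1 "fs.default.name" /home/tanglg1987/hadoop-0.20.2/conf/core-site.xml
grep -A 1 "mapred.job.tracker" /home/tanglg1987/hadoop-0.20.2/conf/mapred-site.xml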

Restart Eclipse and edit the connection record you just created; as shown in the figure, we now edit the Advanced parameters tab.

(Reason for restarting before editing the Advanced parameters tab: when a connection is first created, some of the properties on this tab do not show up and cannot be set, so you have to restart Eclipse, come back in, and edit the connection to see them.)
Most of the attributes here are filled in automatically; they are the configuration properties from core-default.xml, hdfs-default.xml, and mapred-default.xml. Because some of the *-site.xml files were changed when Hadoop was installed, the same changes need to be made here. The main attributes to pay attention to are:
fs.default.name: already set on the General tab.
mapred.job.tracker: also already set on the General tab.
hadoop.job.ugi: this is the property that was invisible before the restart. Fill in sony,Tardis: before the comma is the user that connects to Hadoop, and after the comma always write the fixed string Tardis.
Then click Finish, and the connection is ready.

Step Five: Use Eclipse to modify the contents of HDFS

After the previous step, the configured HDFS should appear under DFS Locations in the left-hand Project Explorer. Right-click it to create folders, delete folders, upload files, download files, delete files, and so on. Note: changes are not shown in Eclipse immediately after each operation; you must refresh.

Create two new files, file01.txt and file02.txt, in the /home/tanglg1987 directory (for example, with the commands shown after the file contents below).

The contents of file01.txt are as follows:

Hello Hadoop

The contents of file02.txt are as follows:

Hello World
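
A minimal way to create the two files from a terminal (paths match the upload commands below):

echo "Hello Hadoop" > /home/tanglg1987/file01.txt
echo "Hello World" > /home/tanglg1987/file02.txt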

Upload the local files to HDFS:

hadoop fs -put /home/tanglg1987/file01.txt input
hadoop fs -put /home/tanglg1987/file02.txt input
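
You can check that the upload worked with:

hadoop fs -ls input

which should list file01.txt and file02.txt under /user/tanglg1987/input.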

Step Six: Create the project

File -> New -> Project, select Map/Reduce Project, then enter a project name to create the project. The plugin automatically imports all the jar packages from the Hadoop root directory and the lib directory.

Step Seven: Create a new WordCount.java, using the TokenCountMapper and LongSumReducer classes that ship with Hadoop. The code is as follows:

package com.baison.action;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.TokenCountMapper;
import org.apache.hadoop.mapred.lib.LongSumReducer;

public class WordCount {
	public static void main(String[] args) {
		JobClient client = new JobClient();
		JobConf conf = new JobConf(WordCount.class);
		// Input and output directories on HDFS (NameNode at localhost:9100).
		String[] arg = { "hdfs://localhost:9100/user/tanglg1987/input",
				"hdfs://localhost:9100/user/tanglg1987/output" };
		FileInputFormat.addInputPath(conf, new Path(arg[0]));
		FileOutputFormat.setOutputPath(conf, new Path(arg[1]));
		conf.setOutputKeyClass(Text.class);
		conf.setOutputValueClass(LongWritable.class);
		// Library classes shipped with Hadoop: TokenCountMapper emits
		// (token, 1) pairs and LongSumReducer sums the counts per token.
		conf.setMapperClass(TokenCountMapper.class);
		conf.setCombinerClass(LongSumReducer.class);
		conf.setReducerClass(LongSumReducer.class);
		client.setConf(conf);
		try {
			JobClient.runJob(conf);
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

Step Eight: Run WordCount

Run As -> Run on Hadoop, select the MapReduce runtime environment configured earlier, and click "Finish".
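
If you prefer to run the job outside Eclipse, the sketch below is one way to do it; it assumes $HADOOP_HOME points to the Hadoop 0.20.2 installation, and the jar and classes directory names are made up for this example:

mkdir -p classes
javac -classpath $HADOOP_HOME/hadoop-0.20.2-core.jar -d classes WordCount.java
jar cf wordcount.jar -C classes .
$HADOOP_HOME/bin/hadoop jar wordcount.jar com.baison.action.WordCount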

When run from Eclipse, the log output is as follows:

12/10/18 22:53:38 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/10/18 22:53:38 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/10/18 22:53:38 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
12/10/18 22:53:38 INFO mapred.FileInputFormat: Total input paths to process : 2
12/10/18 22:53:39 INFO mapred.JobClient: Running job: job_local_0001
12/10/18 22:53:39 INFO mapred.FileInputFormat: Total input paths to process : 2
12/10/18 22:53:39 INFO mapred.MapTask: numReduceTasks: 1
12/10/18 22:53:39 INFO mapred.MapTask: io.sort.mb = 100
12/10/18 22:53:39 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/18 22:53:39 INFO mapred.MapTask: record buffer = 262144/327680
12/10/18 22:53:39 INFO mapred.MapTask: Starting flush of map output
12/10/18 22:53:39 INFO mapred.MapTask: Finished spill 0
12/10/18 22:53:39 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/10/18 22:53:39 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/file01.txt:0+12
12/10/18 22:53:39 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
12/10/18 22:53:39 INFO mapred.MapTask: numReduceTasks: 1
12/10/18 22:53:39 INFO mapred.MapTask: io.sort.mb = 100
12/10/18 22:53:39 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/18 22:53:39 INFO mapred.MapTask: record buffer = 262144/327680
12/10/18 22:53:39 INFO mapred.MapTask: Starting flush of map output
12/10/18 22:53:39 INFO mapred.MapTask: Finished spill 0
12/10/18 22:53:39 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
12/10/18 22:53:39 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/file02.txt:0+13
12/10/18 22:53:39 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
12/10/18 22:53:39 INFO mapred.LocalJobRunner:
12/10/18 22:53:39 INFO mapred.Merger: Merging 2 sorted segments
12/10/18 22:53:39 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 69 bytes
12/10/18 22:53:39 INFO mapred.LocalJobRunner:
12/10/18 22:53:39 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/10/18 22:53:39 INFO mapred.LocalJobRunner:
12/10/18 22:53:39 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/10/18 22:53:39 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://localhost:9100/user/tanglg1987/output
12/10/18 22:53:39 INFO mapred.LocalJobRunner: reduce > reduce
12/10/18 22:53:39 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
12/10/18 22:53:40 INFO mapred.JobClient:  map 100% reduce 100%
12/10/18 22:53:40 INFO mapred.JobClient: Job complete: job_local_0001
12/10/18 22:53:40 INFO mapred.JobClient: Counters: 15
12/10/18 22:53:40 INFO mapred.JobClient:   FileSystemCounters
12/10/18 22:53:40 INFO mapred.JobClient:     FILE_BYTES_READ=49601
12/10/18 22:53:40 INFO mapred.JobClient:     HDFS_BYTES_READ=62
12/10/18 22:53:40 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=100852
12/10/18 22:53:40 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=25
12/10/18 22:53:40 INFO mapred.JobClient:   Map-Reduce Framework
12/10/18 22:53:40 INFO mapred.JobClient:     Reduce input groups=3
12/10/18 22:53:40 INFO mapred.JobClient:     Combine output records=4
12/10/18 22:53:40 INFO mapred.JobClient:     Map input records=2
12/10/18 22:53:40 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/10/18 22:53:40 INFO mapred.JobClient:     Reduce output records=3
12/10/18 22:53:40 INFO mapred.JobClient:     Spilled Records=8
12/10/18 22:53:40 INFO mapred.JobClient:     Map output bytes=57
12/10/18 22:53:40 INFO mapred.JobClient:     Map input bytes=25
12/10/18 22:53:40 INFO mapred.JobClient:     Combine input records=4
12/10/18 22:53:40 INFO mapred.JobClient:     Map output records=4
12/10/18 22:53:40 INFO mapred.JobClient:     Reduce input records=4

Viewing the results of the run:

In the output directory you can see the output file produced by the WordCount program. There is also a logs folder containing the run log.
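
You can also inspect the result from a terminal with the HDFS shell; given the two input files, the counts should look something like this (part-00000 is the standard output file name for a single reducer):

hadoop fs -ls output
hadoop fs -cat output/part-00000
# Hadoop  1
# Hello   2
# World   1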

Under DFS Locations there is an elephant icon, and beneath it a folder: the root directory of HDFS, showing the directory structure of the distributed file system. At this point the Eclipse Hadoop development environment is fully set up, and you can develop Hadoop programs in Eclipse just like ordinary Java programs. Done!
