Build a Hadoop Source Code Learning Environment


The Hadoop directory layout

The previous article served as a simple primer: we downloaded Hadoop, wrote a HelloWorld job, briefly analyzed the programming essentials, and then built a more complex example. Next we turn to the source code itself and study how it is implemented.

Before researching the source, let's first look at the overall layout of the hadoop-0.20.2 directory. This is the directory listing right after the source archive is unpacked:

Directory/File   Description
bin              Executable shell scripts; all Hadoop operations are launched from here
conf             Directory where the configuration files live
ivy              Apache Ivy manages the project's jar dependencies; this is Ivy's main directory
lib              Library directory holding the referenced jar packages
src              The main source code
build.xml        The build configuration file; we compile with Ant (see the sketch after this table)
CHANGES.txt      Text file recording the change history of this version
ivy.xml          Ivy's configuration file
LICENSE.txt      The license for this release
NOTICE.txt       Text file recording legal notices that must be observed
README.txt       Description file
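Since build.xml is an Ant build file, compiling the tree is just a matter of running ant from the top-level directory. A minimal sketch follows; the ~/hadoop-0.20.2 location is an assumption:

# Compile the source tree; the default target builds the core classes
# and lets Ivy fetch the dependency jars into build/ivy/lib.
cd ~/hadoop-0.20.2
ant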

Going into the src directory, you can see its contents, including the core, hdfs, and mapred directories that we will work with:

Build a learning environment for the Hadoop source

First create an ordinary Java project in Eclipse:

Click Next, enter the project name Hadoopsrcstudy, and click Next again.

Accept all the remaining defaults, then click Finish:

Next, add the source code. Open the src folder under the Hadoop directory and copy the core, hdfs, and mapred directories into the project (select the three folders and press Ctrl+C, then go back to Eclipse, select the Hadoopsrcstudy project, and press Ctrl+V).

After the folders are added, they cannot yet be compiled as source code, so right-click the project and open its properties:

Select Java Build Path, switch to the Source tab on the right, and click Add Folder:

In the pop-up dialog, check the core, hdfs, and mapred directories, then click OK twice to complete the setup.

Create a jar folder in the project, then copy in the jar files from the following locations (a shell sketch of this step follows the list):

hadoop-0.20.2/build/ivy/lib/hadoop/common/*.jar

hadoop-0.20.2/lib/jsp-2.1/*.jar

hadoop-0.20.2/lib/kfs-0.2.2.jar

hadoop-0.20.2/lib/hsqldb-1.8.0.10.jar
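If you prefer to do this copy from a terminal, here is a minimal shell sketch. The ~/hadoop-0.20.2 and ~/workspace/Hadoopsrcstudy locations are assumptions; adjust them to your own layout.

# Assumed paths -- adjust to your own setup.
HADOOP_SRC=~/hadoop-0.20.2
PROJECT=~/workspace/Hadoopsrcstudy

# Gather the dependency jars into the project's jar folder.
mkdir -p "$PROJECT/jar"
cp "$HADOOP_SRC"/build/ivy/lib/hadoop/common/*.jar "$PROJECT/jar/"
cp "$HADOOP_SRC"/lib/jsp-2.1/*.jar "$PROJECT/jar/"
cp "$HADOOP_SRC"/lib/kfs-0.2.2.jar "$PROJECT/jar/"
cp "$HADOOP_SRC"/lib/hsqldb-1.8.0.10.jar "$PROJECT/jar/"

Note that the jars under build/ivy/lib only exist after the source has been compiled once with Ant, since Ivy downloads them during the build.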

Then right-click the project, open the Properties page, and on the Java Build Path page select the Libraries tab:

Click Add JARs:

Select all the jar files under the jar folder and click OK twice.

At this point the RccTask.java file still has compile errors (it depends on Ant classes that are not on our build path):

Right-click the file and choose Build Path -> Exclude to keep it out of the build.

Then take the core-site.xml, hdfs-site.xml, mapred-site.xml, and log4j.properties files from the conf folder under the hadoop-0.20.2 directory and place them in the src directory.

Also copy the webapps folder from under hadoop-0.20.2/src into the src directory.

In Eclipse, create a package named org.apache.hadoop under the src directory, then copy the package-info.java file from hadoop-0.20.2/build/src/org/apache/hadoop/ into that package (a consolidated shell sketch of these copy steps follows). The resulting directory layout is as follows:
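If you would rather do these three copy steps from a terminal as well, here is a minimal sketch under the same assumed paths as before:

# Assumed paths -- adjust to your own setup.
HADOOP_SRC=~/hadoop-0.20.2
PROJECT=~/workspace/Hadoopsrcstudy

# The site config files and log4j settings go at the root of src.
cp "$HADOOP_SRC"/conf/core-site.xml "$HADOOP_SRC"/conf/hdfs-site.xml \
   "$HADOOP_SRC"/conf/mapred-site.xml "$HADOOP_SRC"/conf/log4j.properties "$PROJECT/src/"

# The web UI resources ship under src/webapps in the source tree.
cp -r "$HADOOP_SRC"/src/webapps "$PROJECT/src/"

# package-info.java is generated by the Ant build and carries the version annotation.
mkdir -p "$PROJECT/src/org/apache/hadoop"
cp "$HADOOP_SRC"/build/src/org/apache/hadoop/package-info.java "$PROJECT/src/org/apache/hadoop/"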

With that, the source debugging environment is complete.

Let Hadoop run in Eclipse

The source code has been added and compiles cleanly; next we run it from Eclipse and verify that it works normally.

Here we start the NameNode from the command line, run the DataNode from Eclipse, and then open another command line and use the fs commands to check whether the content created earlier can still be found.

1. Open a terminal, cd into the hadoop-0.20.2 directory, and execute the bin/hadoop namenode command.

The following error occurred; as the log shows, the DataNode keeps retrying because it cannot reach a NameNode at localhost:9000:

14/12/15 17:31:47 INFO datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = ubuntu/127.0.1.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = -r ; compiled by 'wu' on Sun Nov 07:50:30 PST
************************************************************/
14/12/15 17:31:49 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
14/12/15 17:31:50 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
14/12/15 17:31:51 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
14/12/15 17:31:52 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
14/12/15 17:31:53 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
14/12/15 17:31:54 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
14/12/15 17:31:55 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).

To fix this, enter the following command in the terminal:

bin/start-all.sh

The console output is then normal:

14/12/15 17:34:25 INFO datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = ubuntu/127.0.1.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = -r ; compiled by 'wu' on Sun Nov 07:50:30 PST 2014
************************************************************/
14/12/15 17:34:25 INFO common.Storage: Storage directory /tmp/hadoop-wu/dfs/data is not formatted.
14/12/15 17:34:25 INFO common.Storage: Formatting ...
14/12/15 17:34:26 INFO datanode.DataNode: Registered FSDatasetStatusMBean
14/12/15 17:34:26 INFO datanode.DataNode: Opened info server at 50010
14/12/15 17:34:26 INFO datanode.DataNode: Balancing bandwith is 1048576 bytes/s
14/12/15 17:34:26 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
14/12/15 17:34:26 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50075
14/12/15 17:34:26 INFO http.HttpServer: listener.getLocalPort() returned 50075 webServer.getConnectors()[0].getLocalPort() returned 50075
14/12/15 17:34:26 INFO http.HttpServer: Jetty bound to port 50075
14/12/15 17:34:26 INFO mortbay.log: jetty-6.1.14
14/12/15 17:34:33 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:50075
14/12/15 17:34:34 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null
14/12/15 17:34:34 INFO metrics.RpcMetrics: Initializing RPC Metrics with hostName=DataNode, port=50020
14/12/15 17:34:34 INFO ipc.Server: IPC Server Responder: starting
14/12/15 17:34:34 INFO ipc.Server: IPC Server listener on 50020: starting
14/12/15 17:34:34 INFO ipc.Server: IPC Server handler 0 on 50020: starting

2. In Eclipse, go into the hdfs directory, then into the org.apache.hadoop.hdfs.server.datanode package, open the DataNode.java file, and click Run. You can then see the normal output in Eclipse, with no errors. The same information can be found in the DataNode log under the logs folder. Also, in the earlier command-line window, you can see that the NameNode received an access request from this DataNode.

3. Open another command-line window, cd into the hadoop-0.20.2 directory, and run bin/hadoop fs -ls; you can see the file listing.

4. Then run bin/hadoop fs -cat out/* to see the data that the earlier program run generated in the out directory.

If both commands succeed, the NameNode and the DataNode running in Eclipse are working together. Notice that while the cat command executes, new response output appears in the Eclipse console, indicating that the DataNode is serving the request.

Conversely, we can run the NameNode in Eclipse and the DataNode on the command line, with the same effect; a sketch follows.
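A minimal sketch of the reverse arrangement: in Eclipse, run NameNode.java from the org.apache.hadoop.hdfs.server.namenode package, then start the DataNode in a terminal:

# Start only the DataNode from the shell; the NameNode is already running in Eclipse.
cd ~/hadoop-0.20.2    # assumed location
bin/hadoop datanode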

To see more debug output, open the log4j.properties file under src and change INFO to DEBUG on the line near the top that sets the root logging level; the log output then becomes much more detailed.
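As a sketch, assuming the stock 0.20.2 log4j.properties where the root level is set by the hadoop.root.logger property near the top of the file, the change can be scripted like this:

# Bump the root logger from INFO to DEBUG in the copy of log4j.properties under src.
sed -i 's/hadoop.root.logger=INFO,console/hadoop.root.logger=DEBUG,console/' ~/workspace/Hadoopsrcstudy/src/log4j.properties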

Resources

http://www.cnblogs.com/zjfstudio/p/3919331.html
