Hadoop files and directories
The previous article gave a simple introduction to Hadoop: we downloaded the source code, wrote a HelloWorld program, briefly analyzed the programming essentials, and then built a more complex example. Next we look at the source code itself and study how it is implemented.
To study the source, we first look at the overall layout of the hadoop-0.20.2 directory:
This is what the directory listing looks like right after obtaining the code.
Catalog/File  | Description
bin           | Executable shell scripts; all command-line operations are run from here
conf          | Directory containing the configuration files
ivy           | Apache Ivy manages the project's jar dependencies; this is Ivy's main directory
lib           | Directory of referenced libraries, holding the jar packages that are used
src           | The main source code
build.xml     | Configuration file used for compiling; compilation is done with Ant
CHANGES.txt   | Text file recording the change history of this version
ivy.xml       | Ivy's configuration file
LICENSE.txt   | The license text
NOTICE.txt    | Text file recording points that need attention
README.txt    | Description file
Going into the src directory, you can see its contents as shown:
Build a learning environment for Hadoop source
First create an ordinary Java project:
Click Next, enter the project name Hadoopsrcstudy, and click Next again.
Accept the defaults for everything else and click Finish:
Next, add the source code: open the src folder under the Hadoop directory and copy the three directories core, hdfs, and mapred into the project (first select the three folders and press Ctrl+C, then go back to Eclipse, select the Hadoopsrcstudy project, and press Ctrl+V).
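The same copy can also be done from a shell instead of Eclipse copy and paste. A minimal sketch, assuming the Hadoop source was unpacked at ~/hadoop-0.20.2 and the Eclipse project lives at ~/workspace/Hadoopsrcstudy (both paths are assumptions; adjust them to your own layout). After copying, refresh the project in Eclipse so the new folders show up.
# Copy the three source trees into the Eclipse project directory (assumed paths).
cp -r ~/hadoop-0.20.2/src/core ~/hadoop-0.20.2/src/hdfs ~/hadoop-0.20.2/src/mapred ~/workspace/Hadoopsrcstudy/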
After the folders are added, they are not yet treated as source code for compilation, so right-click the project and open its Properties:
Then select Java Build Path, switch to the Source tab on the right, and click Add Folder:
In the dialog that pops up, select the core, hdfs, and mapred directories, then click OK twice to complete the setup.
Create a jar folder in the project, then copy in the jar files from the following locations (the copy commands are sketched after this list):
hadoop-0.20.2/build/ivy/lib/hadoop/common/*.jar
hadoop-0.20.2/lib/jsp-2.1/*.jar
hadoop-0.20.2/lib/kfs-0.2.2.jar
hadoop-0.20.2/lib/hsqldb-1.8.0.10.jar
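These copies can be scripted. A minimal sketch, run from the project directory (assumed to be ~/workspace/Hadoopsrcstudy) with the Hadoop source at ~/hadoop-0.20.2; note that build/ivy/lib only exists after the source has been compiled with Ant:
# Collect the dependency jars into a jar folder inside the project.
mkdir -p jar
cp ~/hadoop-0.20.2/build/ivy/lib/hadoop/common/*.jar jar/
cp ~/hadoop-0.20.2/lib/jsp-2.1/*.jar jar/
cp ~/hadoop-0.20.2/lib/kfs-0.2.2.jar jar/
cp ~/hadoop-0.20.2/lib/hsqldb-1.8.0.10.jar jar/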
Then right-click the project, select the Properties page, and on the Java Build Path page select the Libraries tab:
Click Add JARs:
Select all the jar files under the jar folder and click OK twice.
There are still compile errors in the RccTask file:
In that file's right-click menu, choose Build Path -> Exclude.
Then take core-site.xml, hdfs-site.xml, mapred-site.xml and log4j.properties from the conf folder under the hadoop-0.20.2 directory and place them in the project's src directory,
and copy the webapps folder from the src directory under hadoop-0.20.2 into the project's src directory as well.
In Eclipse, create a package named org.apache.hadoop under the src directory, then copy the file hadoop-0.20.2/build/src/org/apache/hadoop/package-info.java into that package. The resulting directory layout is as follows:
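These copy steps, sketched as shell commands using the assumed paths from earlier (note that package-info.java under build/src is generated by the Ant build, so it exists only after the source has been compiled):
cd ~/workspace/Hadoopsrcstudy
# Configuration files and webapps go into the project's src folder.
cp ~/hadoop-0.20.2/conf/core-site.xml ~/hadoop-0.20.2/conf/hdfs-site.xml \
   ~/hadoop-0.20.2/conf/mapred-site.xml ~/hadoop-0.20.2/conf/log4j.properties src/
cp -r ~/hadoop-0.20.2/src/webapps src/
# Create the org.apache.hadoop package and copy in the generated package-info.java.
mkdir -p src/org/apache/hadoop
cp ~/hadoop-0.20.2/build/src/org/apache/hadoop/package-info.java src/org/apache/hadoop/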
This completes the setup of the source debugging environment.
Let Hadoop run in Eclipse
The source code has been added and now compiles cleanly; next we run it from Eclipse and check that it works normally.
Here we start the NameNode from the command line, run the DataNode from Eclipse, and then open another command line and use the fs command to check whether the content written earlier can still be found.
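Step 1 below, as terminal commands, is a minimal sketch. The -format step is needed only if HDFS on this machine has never been formatted; it erases the existing file system metadata, so skip it if the data from the earlier example should be kept.
cd hadoop-0.20.2
# bin/hadoop namenode -format   # first-time setup only; wipes existing HDFS metadata
bin/hadoop namenode             # runs the NameNode in the foreground in this terminal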
1. Open a terminal, cd into the hadoop-0.20.2 directory, and execute the bin/hadoop namenode command.
The following error occurred (the retry messages mean that nothing is answering at localhost:9000 yet):
14/12/15 17:31:47 INFO datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = ubuntu/127.0.1.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = -r ; compiled by 'wu' on Sun Nov 07:50:30 PST
************************************************************/
14/12/15 17:31:49 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
14/12/15 17:31:50 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
14/12/15 17:31:51 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
14/12/15 17:31:52 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
14/12/15 17:31:53 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
14/12/15 17:31:54 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
14/12/15 17:31:55 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
After entering the following command in the terminal:
bin/start-all.sh
the console output becomes normal:
14/12/15 17:34:25 INFO datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = ubuntu/127.0.1.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = -r ; compiled by 'wu' on Sun Nov 07:50:30 PST 2014
************************************************************/
14/12/15 17:34:25 INFO common.Storage: Storage directory /tmp/hadoop-wu/dfs/data is not formatted.
14/12/15 17:34:25 INFO common.Storage: Formatting ...
14/12/15 17:34:26 INFO datanode.DataNode: Registered FSDatasetStatusMBean
14/12/15 17:34:26 INFO datanode.DataNode: Opened info server at 50010
14/12/15 17:34:26 INFO datanode.DataNode: Balancing bandwith is 1048576 bytes/s
14/12/15 17:34:26 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
14/12/15 17:34:26 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50075
14/12/15 17:34:26 INFO http.HttpServer: listener.getLocalPort() returned 50075 webServer.getConnectors()[0].getLocalPort() returned 50075
14/12/15 17:34:26 INFO http.HttpServer: Jetty bound to port 50075
14/12/15 17:34:26 INFO mortbay.log: jetty-6.1.14
14/12/15 17:34:33 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:50075
14/12/15 17:34:34 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null
14/12/15 17:34:34 INFO metrics.RpcMetrics: Initializing RPC Metrics with hostName=DataNode, port=50020
14/12/15 17:34:34 INFO ipc.Server: IPC Server Responder: starting
14/12/15 17:34:34 INFO ipc.Server: IPC Server listener on 50020: starting
14/12/15 17:34:34 INFO ipc.Server: IPC Server handler 0 on 50020: starting
2. In Eclipse, go into the hdfs source directory, then into the org.apache.hadoop.hdfs.server.datanode package, open DataNode.java, and click Run. You can then see the normal output information in Eclipse, with no errors; it is the same information that appears in the DataNode log under the logs folder. In the command-line window opened earlier, you can also see that the NameNode program has received an access request from the DataNode.
3. Open another command-line window, go into the hadoop-0.20.2 directory, and run bin/hadoop fs -ls; you can see the list of output files.
4. Then enter the command bin/hadoop fs -cat out/* to see the data that was generated in the out directory by the earlier program run (both commands are sketched below).
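The checks from steps 3 and 4 as commands, run from the hadoop-0.20.2 directory; out is the output directory produced by the example run in the previous article:
bin/hadoop fs -ls          # list the files stored in HDFS
bin/hadoop fs -cat out/*   # print the contents of the out directory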
If both commands succeed, the NameNode and the DataNode running in Eclipse are working. Notice that when the cat command is executed, new response output appears in the Eclipse console, which shows that the DataNode is handling the request.
Conversely, we can also run the NameNode in Eclipse and the DataNode on the command line, with the same effect.
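For that reversed arrangement, the command-line side is a single command (NameNode.java, which would then be launched from Eclipse, lives in the org.apache.hadoop.hdfs.server.namenode package):
cd hadoop-0.20.2
bin/hadoop datanode   # run the DataNode in the foreground from the terminal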
To see more debug log output, we can also open the log4j.properties file under src and change INFO to DEBUG in the second line, so that the output becomes more detailed.
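A sketch of that change with sed, run from the project directory and assuming the copied file still contains the stock line hadoop.root.logger=INFO,console near the top:
# Raise the root log level from INFO to DEBUG in the copied log4j.properties (edits the file in place).
sed -i 's/^hadoop.root.logger=INFO,console/hadoop.root.logger=DEBUG,console/' src/log4j.properties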
Resources
http://www.cnblogs.com/zjfstudio/p/3919331.html (Build a Hadoop source code learning environment)