Hadoop learning notes (10)-Build a source code learning environment


In the previous chapter we took a first look at the Hadoop directory layout and its source tree. Next we want to study the code in more depth. But how should we read it? Without single-step debugging it is hard to inspect variables or trace the execution path.

So we need to set up a debugging environment. Hadoop's main code is written in Java, so we will use Eclipse.

The Hadoop directory can be turned into an Eclipse project directly, but I would rather create a project myself and add the code by hand.

Create a plain Java project. Click "Next", enter the project name "hadoopsrcstudy", and finish the remaining wizard pages with their defaults.

Next, add the source code. Open the src folder under the Hadoop directory. What should we copy? For now we will study the core pieces: the core, hdfs, and mapred directories. Copy them into the project. (How? Select the three folders in the file manager, switch back to Eclipse, select the hadoopsrcstudy project, and press Ctrl+V.)

The three folders are now in the project, but they are not yet compiled as source folders, so right-click the project and open its Properties:

Select Java Build Path, switch to the Source tab on the right, and click Add Folder:

In the dialog that appears, select the core, hdfs, and mapred directories, then click OK twice to finish the setup.

Now look at the project: the three directories carry the same icon as the src folder, so the Java files inside them are treated as source and compiled. But more than 2,000 errors show up. What is going on? The source references classes from third-party libraries that are not yet on the build path, and the fix is to add the missing jar packages.

So first create a jar folder under the project, then copy in the jar files from the following locations:

hadoop-0.20.2/build/ivy/lib/hadoop/common/*.jar
hadoop-0.20.2/lib/jsp-2.1/*.jar
hadoop-0.20.2/lib/kfs-0.2.2.jar
hadoop-0.20.2/lib/hsqldb-1.8.0.10.jar
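As a convenience, the copying can be scripted. This is only a sketch: the HADOOP_HOME and PROJECT locations below are assumptions, so adjust them to wherever your hadoop-0.20.2 tree and Eclipse workspace actually live.

```shell
#!/bin/sh
# Sketch: gather the jars listed above into a jar/ folder inside the project.
# HADOOP_HOME and PROJECT are assumed paths -- change them to match your setup.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop-0.20.2}"
PROJECT="${PROJECT:-$HOME/workspace/hadoopsrcstudy}"

collect_jars() {
  mkdir -p "$PROJECT/jar"
  cp "$HADOOP_HOME"/build/ivy/lib/hadoop/common/*.jar "$PROJECT/jar/" 2>/dev/null
  cp "$HADOOP_HOME"/lib/jsp-2.1/*.jar "$PROJECT/jar/" 2>/dev/null
  cp "$HADOOP_HOME"/lib/kfs-0.2.2.jar "$PROJECT/jar/" 2>/dev/null
  cp "$HADOOP_HOME"/lib/hsqldb-1.8.0.10.jar "$PROJECT/jar/" 2>/dev/null
  ls "$PROJECT/jar" | wc -l   # report how many jars we collected
}

# Only run when the distribution is actually present.
[ -d "$HADOOP_HOME" ] && collect_jars || echo "hadoop-0.20.2 not found; skipping"
```

After running it, refresh the project in Eclipse so the new jar folder shows up.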

Right-click the project again and open Properties. On the Java Build Path page, switch to the Libraries tab:

Click Add JARs:

Select all the jar files in the jar folder and click OK twice.

The number of errors drops immediately:

A few errors remain, all in the RccTask file, so exclude it for now: right-click the file and choose Build Path -> Exclude.

Okay, now there are no errors left.

Then copy the core-site.xml, hdfs-site.xml, mapred-site.xml, and log4j.properties files from the conf folder of the hadoop-0.20.2 directory into the src directory.
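For reference, a minimal core-site.xml for a local single-node setup typically looks like the fragment below. The fs.default.name value shown is only a common default for this kind of setup, not necessarily what your conf files contain; copy your existing files rather than retyping them.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Typical single-node setting; keep whatever your conf folder already has. -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```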

Also copy the webapps folder from the src folder of the hadoop-0.20.2 directory into the project's src directory.

In Eclipse, create a package named org.apache.hadoop, then copy the hadoop-0.20.2\build\src\org\apache\hadoop\package-info.java file into that package. The directory now looks like this:

With that, the source code debugging environment is ready.

 

Running Hadoop in Eclipse

The source code is now added and compiles. Next, run it inside Eclipse to check that everything works.

The plan: start the namenode on the command line, run the datanode inside Eclipse, then open another command-line window and use the fs commands to check that the data written earlier is still reachable.

1. Open a command line, change into the hadoop-0.20.2 directory, and run bin/hadoop namenode.

2. In Eclipse, open the hdfs source directory, navigate to the org.apache.hadoop.hdfs.server.datanode package, open DataNode.java, and click Run. You should see normal output in the Eclipse console with no errors, and a matching datanode log appears in the logs folder. In the command-line window from step 1 you can see the namenode receiving the datanode's registration request.

3. Open another command-line window, change into the hadoop-0.20.2 directory, and run bin/hadoop fs -ls; the file listing should appear.

4. Then run bin/hadoop fs -cat out/* to view the data that earlier program runs left in the out directory.
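The verification commands from steps 3 and 4 can be wrapped in a small helper. This is a sketch, not part of Hadoop itself: the HADOOP_HOME default below is an assumption, so point it at your actual hadoop-0.20.2 directory.

```shell
#!/bin/sh
# Sketch of the verification steps above, assuming the hadoop-0.20.2
# directory sits at $HADOOP_HOME (adjust to your installation).
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop-0.20.2}"

verify_hdfs() {
  cd "$HADOOP_HOME" || return 1
  bin/hadoop fs -ls          # list the files stored earlier
  bin/hadoop fs -cat out/*   # dump the job output in the out directory
}

# Only attempt the check when the installation actually exists.
[ -d "$HADOOP_HOME" ] && verify_hdfs || echo "hadoop-0.20.2 not found; skipping"
```

Run it while the namenode and the Eclipse datanode are both up; watch the Eclipse console while it runs.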

If both commands succeed, then the namenode on the command line and the datanode inside Eclipse are working together. Note that while the cat command runs, new output appears in the Eclipse console, showing that the datanode is serving the read request.

Likewise, we can run the namenode in Eclipse and the datanode on the command line, with the same effect.

To see more debugging output, open the log4j.properties file under src and change INFO to DEBUG on its second line; the log output becomes much more detailed.
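Concretely, the edit is a one-word change to the default logger level near the top of the file (shown here as it appears in the 0.20-era log4j.properties; verify against your own copy):

```properties
# src/log4j.properties -- change INFO to DEBUG on this line
# to get much more detailed trace output:
hadoop.root.logger=DEBUG,console
```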

 

With this, our source code learning environment is complete, making it convenient to debug and even modify Hadoop code in Eclipse.

 

That wraps up the first season of these notes. Let the ideas settle for a while; once I have read more of the Hadoop source, I will share further notes, keeping the same approach: start simple, and analyze each program from its main function.

 
