Hadoop job submission Analysis (2)

Source: Internet
Author: User

Http://www.cnblogs.com/spork/archive/2010/04/11/1709380.html

In the previous article, we analyzed the bin/hadoop script and learned the Basic settings required for submitting a hadoop job and the class for truly executing the task. In this article, we will analyze the class org. Apache. hadoop. util. runjar for submitting the task to see what it has done internally.

Runjar is a tool class in hadoop. Its structure is very simple and there are only two methods: Main and unjar. We start from main step by step.

Main first checks whether the passing parameter meets the requirements, then obtains the name of the jar package from the first passing parameter, and tries to obtain the manifest information from the jar package to find mainclass name. If mainclass name cannot be found, set the second parameter to mainclass name.

Next, create a temporary folder under "hadoop. tmp. dir" and close the deletion thread on the Mount. This temporary folder is used to place the decompressed jar package content. Unjar is used to decompress the jar package. You can use jarentry to obtain the contents of the jar package one by one, including folders and files, and then release the package to a temporary folder.

After decompression, add the classpath and add the decompressed Temporary Folder, passed jar package, the classes folder in the temporary folder, and all jar packages in the lib to the classpath. Next, create a urlclassloader using the classpath as the search URL (note that the parent of this Class Loader includes the classpath that was just submitted by the bin/hadoop script ), and set it to the context class loader of the current thread.

Finally, use class. the forname method uses the previous urlclassloader as the class loader to dynamically generate a Class Object of mainclass and obtain its main method, then the main method is called by passing the remaining parameters in the parameter as the call parameter.

Well, from the above analysis, this runjar class is a very simple class. It is to decompress the passed jar package, add some classpath, and then dynamically call the main method of mainclass in the jar package. Here, I think you should know how to use JavaCodeTo compile an alternative bin/hadoopProgramThe main steps are as follows:

    • Add hadoop dependent libraries and configuration files;
    • Decompress the jar package, add some classpath, and call the corresponding method dynamically. The lazy method is to use the runjar class directly.

The next section describes how to implement such a Java program.

To be continued...

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.