Hadoop job submission analysis (5)

Source: http://www.cnblogs.com/spork/archive/2010/04/21/1717592.html

 

From the analysis in the previous article, we know that whether a Hadoop job is submitted to the cluster or run locally is determined by the configuration parameters in the conf folder. Many other classes also depend on Configuration, so remember to put conf on your classpath when you submit a job.

Because Configuration uses the current thread's context class loader to load resources and files, we use dynamic loading: first add the required dependency libraries and resources, then construct a URLClassLoader and set it as the current thread's context class loader.

    public static ClassLoader getClassLoader() {
        ClassLoader parent = Thread.currentThread().getContextClassLoader();
        if (parent == null) {
            parent = EJob.class.getClassLoader();
        }
        if (parent == null) {
            parent = ClassLoader.getSystemClassLoader();
        }
        return new URLClassLoader(classpath.toArray(new URL[0]), parent);
    }

The code is very simple, so I won't dwell on it. A call example looks like this:

    EJob.addClasspath("/usr/lib/hadoop-0.20/conf");
    ClassLoader classLoader = EJob.getClassLoader();
    Thread.currentThread().setContextClassLoader(classLoader);
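The getClassLoader method above reads from a classpath list that EJob maintains but whose code is not shown here. As a minimal sketch (the field name, its type, and the file-to-URL conversion are my assumptions, not the original author's code), addClasspath might look like this:

    // Assumed: EJob keeps a static list of classpath URLs that
    // getClassLoader hands to the URLClassLoader.
    // Requires: java.io.File, java.net.URL, java.net.MalformedURLException,
    // java.util.ArrayList, java.util.List
    private static List<URL> classpath = new ArrayList<URL>();

    public static void addClasspath(String component) {
        if (component == null || component.length() == 0) {
            return;
        }
        try {
            File f = new File(component);
            if (f.exists()) {
                // toURI().toURL() produces a URL that URLClassLoader accepts.
                classpath.add(f.toURI().toURL());
            }
        } catch (MalformedURLException e) {
            // Skip components that cannot be expressed as URLs.
        }
    }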

With the class loader set up, the next step is to package the jar file, that is, have the project package its own classes into a jar. Here I take the standard Eclipse project layout as an example and package the classes in the bin folder.

    public static File createTempJar(String root) throws IOException {
        if (!new File(root).exists()) {
            return null;
        }
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().putValue("Manifest-Version", "1.0");
        final File jarFile = File.createTempFile("EJob-", ".jar",
                new File(System.getProperty("java.io.tmpdir")));

        Runtime.getRuntime().addShutdownHook(new Thread() {
            public void run() {
                jarFile.delete();
            }
        });

        JarOutputStream out = new JarOutputStream(
                new FileOutputStream(jarFile), manifest);
        createTempJarInner(out, new File(root), "");
        out.flush();
        out.close();
        return jarFile;
    }

    private static void createTempJarInner(JarOutputStream out, File f,
            String base) throws IOException {
        if (f.isDirectory()) {
            File[] fl = f.listFiles();
            if (base.length() > 0) {
                base = base + "/";
            }
            for (int i = 0; i < fl.length; i++) {
                createTempJarInner(out, fl[i], base + fl[i].getName());
            }
        } else {
            out.putNextEntry(new JarEntry(base));
            FileInputStream in = new FileInputStream(f);
            byte[] buffer = new byte[1024];
            int n = in.read(buffer);
            while (n != -1) {
                out.write(buffer, 0, n);
                n = in.read(buffer);
            }
            in.close();
        }
    }

The external interface is createTempJar. It takes the root path of the folder to be packaged and supports subfolders, recursively walking the directory structure and writing each file into the jar in turn. It is all basic file stream operations; the only unfamiliar parts may be Manifest and JarOutputStream, which you can look up in the API docs.
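If you want to sanity-check the result, you can list the entries of the generated jar with java.util.jar.JarFile (this check is my addition, not part of the original article):

    // Verify the packaging by listing the entries of the temp jar.
    // Requires: java.io.File, java.util.Enumeration,
    // java.util.jar.JarEntry, java.util.jar.JarFile
    File jar = EJob.createTempJar("bin");
    JarFile jf = new JarFile(jar);
    for (Enumeration<JarEntry> e = jf.entries(); e.hasMoreElements();) {
        System.out.println(e.nextElement().getName());
    }
    jf.close();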

Well, everything is ready. Let's try it out, taking WordCount as an example:

    // Add these statements.
    File jarFile = EJob.createTempJar("bin");
    EJob.addClasspath("/usr/lib/hadoop-0.20/conf");
    ClassLoader classLoader = EJob.getClassLoader();
    Thread.currentThread().setContextClassLoader(classLoader);

    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args)
            .getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: wordcount <in> <out>");
        System.exit(2);
    }

    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCountTest.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
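For completeness, the driver above uses the new org.apache.hadoop.mapreduce API; assuming the mapper and reducer come from the stock WordCount example, the imports it needs are roughly these:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;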

Run as Java Application... and we hit a "No job jar file set" exception. It seems the statement job.setJarByClass(WordCountTest.class) failed to set the job's jar package. Why?

This method uses WordCountTest.class's class loader to locate the jar that contains the class, and then sets that jar as the job's jar. However, our job's jar is generated on the fly by the program itself, while WordCountTest.class's class loader is the AppClassLoader, whose search path cannot be changed once the JVM is running. Therefore setJarByClass cannot set the job's jar; we must use JobConf's setJar to set it directly, like this:

 
    ((JobConf) job.getConfiguration()).setJar(jarFile.toString());
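Incidentally, the cast needs org.apache.hadoop.mapred.JobConf, and JobConf.setJar simply stores the path under the mapred.jar configuration property, so an equivalent way to write this without the cast (an alternative I'm pointing out, not what the original article uses) is:

    // Equivalent: setJar(String) stores the jar path under "mapred.jar".
    job.getConfiguration().set("mapred.jar", jarFile.toString());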

OK, let's modify the preceding example and add this statement:

    Job job = new Job(conf, "word count");
    // And add this statement.
    ((JobConf) job.getConfiguration()).setJar(jarFile.toString());

Run as Java Application again, and this time everything is OK ~~

This way of doing Run on Hadoop is easy to use and has good compatibility; I recommend you give it a try. :)

Due to time constraints, this example was only tested in pseudo-distributed mode on Ubuntu, but in principle it should work on a real cluster as well.
