http://www.cnblogs.com/spork/archive/2010/04/21/1717592.html
After the analysis in the previous article, we know that whether a Hadoop job is submitted to the cluster or run locally is determined by the configuration files in the conf folder; many other classes depend on Configuration as well, so remember to put conf on your classpath when submitting a job.
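As a quick sanity check (my own sketch, not from the original post; fs.default.name and mapred.job.tracker are the 0.20-era property names discussed in the previous article), you can print what Configuration actually resolved to see whether the conf folder was picked up:

// Requires org.apache.hadoop.conf.Configuration on the classpath.
Configuration conf = new Configuration();
// With conf/ on the classpath these print the cluster addresses;
// without it you get the local defaults (file:/// and "local").
System.out.println(conf.get("fs.default.name"));
System.out.println(conf.get("mapred.job.tracker"));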
Because Configuration loads resources and files through the class loader of the current thread context, we first add the required libraries and resources dynamically, then construct a URLClassLoader and install it as the context class loader of the current thread.
public static ClassLoader getClassLoader() {
    ClassLoader parent = Thread.currentThread().getContextClassLoader();
    if (parent == null) {
        parent = EJob.class.getClassLoader();
    }
    if (parent == null) {
        parent = ClassLoader.getSystemClassLoader();
    }
    return new URLClassLoader(classPath.toArray(new URL[0]), parent);
}
The code is very simple, so I will not say much about it. A call example follows:
EJob.addClasspath("/usr/lib/hadoop-0.20/conf");
ClassLoader classLoader = EJob.getClassLoader();
Thread.currentThread().setContextClassLoader(classLoader);
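One note: getClassLoader references a static classPath list and the addClasspath helper that the post never shows. Here is a minimal sketch of what they might look like inside the EJob class (my assumption, inferred from how they are used above):

import java.io.File;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class EJob {
    // URLs collected here feed the URLClassLoader built in getClassLoader().
    private static List<URL> classPath = new ArrayList<URL>();

    public static void addClasspath(String component) {
        if (component != null && component.length() > 0) {
            try {
                File f = new File(component);
                if (f.exists()) {
                    // toURI().toURL() handles spaces and special characters.
                    classPath.add(f.toURI().toURL());
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    // getClassLoader(), createTempJar(...) and createTempJarInner(...)
    // from this article live in the same class.
}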
With the class loader in place, the next step is packaging the jar file, that is, letting the project package its own classes into a jar. Here I take the standard Eclipse project folder layout as an example, so what gets packaged are the classes in the bin folder.
public static File createTempJar(String root) throws IOException {
    if (!new File(root).exists()) {
        return null;
    }
    Manifest manifest = new Manifest();
    manifest.getMainAttributes().putValue("Manifest-Version", "1.0");
    final File jarFile = File.createTempFile("EJob-", ".jar", new File(System
            .getProperty("java.io.tmpdir")));

    // Clean up the temporary jar when the JVM exits.
    Runtime.getRuntime().addShutdownHook(new Thread() {
        public void run() {
            jarFile.delete();
        }
    });

    JarOutputStream out = new JarOutputStream(new FileOutputStream(jarFile),
            manifest);
    createTempJarInner(out, new File(root), "");
    out.flush();
    out.close();
    return jarFile;
}
private static void createTempJarInner(JarOutputStream out, File f,
        String base) throws IOException {
    if (f.isDirectory()) {
        File[] fl = f.listFiles();
        if (base.length() > 0) {
            base = base + "/";
        }
        for (int i = 0; i < fl.length; i++) {
            createTempJarInner(out, fl[i], base + fl[i].getName());
        }
    } else {
        out.putNextEntry(new JarEntry(base));
        FileInputStream in = new FileInputStream(f);
        byte[] buffer = new byte[1024];
        int n = in.read(buffer);
        while (n != -1) {
            out.write(buffer, 0, n);
            n = in.read(buffer);
        }
        in.close();
    }
}
The external interface is createTempJar; it takes the root path of the folder to be packaged and supports subfolders, recursively walking the folder structure and writing each file into the jar in turn. It is all basic file-stream work; the only possibly unfamiliar parts are Manifest and JarOutputStream, for which you can check the JDK API docs.
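If you want to confirm what actually went into the temporary jar, a small verification snippet (my addition, using the standard java.util.jar API) lists its entries:

import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Print every entry packaged into the temp jar returned by createTempJar.
JarFile jar = new JarFile(jarFile);
for (Enumeration<JarEntry> entries = jar.entries(); entries.hasMoreElements();) {
    System.out.println(entries.nextElement().getName());
}
jar.close();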
Well, everything is ready. Let's try it out, taking WordCount as an example:
// Add these statements. XXX
File jarFile = EJob.createTempJar("bin");
EJob.addClasspath("/usr/lib/hadoop-0.20/conf");
ClassLoader classLoader = EJob.getClassLoader();
Thread.currentThread().setContextClassLoader(classLoader);

Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args)
        .getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: wordcount <in> <out>");
    System.exit(2);
}
Job job = new Job(conf, "word count");
job.setJarByClass(WordCountTest.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
Run as Java Application... and we hit a "No job jar file set" exception. It seems the statement job.setJarByClass(WordCountTest.class) failed to set the job's jar package. Why?
setJarByClass uses the class loader of WordCountTest.class to locate the jar containing that class, then sets that jar as the job's jar. But our job jar is packaged at runtime by the program itself, and the class loader of WordCountTest.class is the AppClassLoader, whose search path cannot be changed once the program is running. Therefore setJarByClass cannot set our job jar; we must set it directly with JobConf's setJar, as shown below:
((JobConf) job.getConfiguration()).setJar(jarFile.toString());
Okay, let's modify the preceding example by adding that statement:
Job job = new Job(conf, "word count");
// And add this statement. XXX
((JobConf) job.getConfiguration()).setJar(jarFile.toString());
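Incidentally, if you would rather avoid the cast: JobConf.setJar simply stores the path under the mapred.jar property, so (assuming the 0.20-era property name) the following one-liner should be equivalent:

// Same effect as ((JobConf) job.getConfiguration()).setJar(...)
job.getConfiguration().set("mapred.jar", jarFile.toString());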
Run as Java Application again, and finally everything is OK ~~
This way of running on Hadoop is easy to use and has good compatibility; I recommend you try it. :)
Due to time constraints, I only tested this example in pseudo-distributed mode on Ubuntu, but in theory it should work on a real cluster as well.