http://www.cnblogs.com/spork/archive/2010/04/21/1717592.html
After the analysis in the previous article, we know that whether a Hadoop job is submitted to the cluster or run locally is determined by the configuration files in the conf folder; many other classes depend on Configuration as well, so remember to put conf on your classpath when submitting a job.
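As a quick sanity check (my own sketch, not from the original post; fs.default.name and mapred.job.tracker are the 0.20-era property names discussed in the previous article), you can print what Configuration actually resolved to see whether the conf folder was picked up:

// Requires org.apache.hadoop.conf.Configuration on the classpath.
Configuration conf = new Configuration();
// With conf/ on the classpath these print the cluster addresses;
// without it you get the local defaults (file:/// and "local").
System.out.println(conf.get("fs.default.name"));
System.out.println(conf.get("mapred.job.tracker"));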
Because Configuration loads resources and files through the class loader of the current thread context, we first add the required libraries and resources dynamically, then construct a URLClassLoader and install it as the context class loader of the current thread.
public static ClassLoader getClassLoader() {
    ClassLoader parent = Thread.currentThread().getContextClassLoader();
    if (parent == null) {
        parent = EJob.class.getClassLoader();
    }
    if (parent == null) {
        parent = ClassLoader.getSystemClassLoader();
    }
    return new URLClassLoader(classPath.toArray(new URL[0]), parent);
}
The code is very simple, so I will not say much about it. A call example follows:
EJob.addClasspath("/usr/lib/hadoop-0.20/conf");
ClassLoader classLoader = EJob.getClassLoader();
Thread.currentThread().setContextClassLoader(classLoader);
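One note: getClassLoader references a static classPath list and the addClasspath helper that the post never shows. Here is a minimal sketch of what they might look like inside the EJob class (my assumption, inferred from how they are used above):

import java.io.File;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class EJob {
    // URLs collected here feed the URLClassLoader built in getClassLoader().
    private static List<URL> classPath = new ArrayList<URL>();

    public static void addClasspath(String component) {
        if (component != null && component.length() > 0) {
            try {
                File f = new File(component);
                if (f.exists()) {
                    // toURI().toURL() handles spaces and special characters.
                    classPath.add(f.toURI().toURL());
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    // getClassLoader(), createTempJar(...) and createTempJarInner(...)
    // from this article live in the same class.
}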
With the class loader in place, the next step is packaging the jar file, that is, letting the project package its own classes into a jar. Here I take the standard Eclipse project folder layout as an example, so what gets packaged are the classes in the bin folder.
public static File createTempJar(String root) throws IOException {
    if (!new File(root).exists()) {
        return null;
    }
    Manifest manifest = new Manifest();
    manifest.getMainAttributes().putValue("Manifest-Version", "1.0");
    final File jarFile = File.createTempFile("EJob-", ".jar", new File(System
            .getProperty("java.io.tmpdir")));

    // Clean up the temporary jar when the JVM exits.
    Runtime.getRuntime().addShutdownHook(new Thread() {
        public void run() {
            jarFile.delete();
        }
    });

    JarOutputStream out = new JarOutputStream(new FileOutputStream(jarFile),
            manifest);
    createTempJarInner(out, new File(root), "");
    out.flush();
    out.close();
    return jarFile;
}
private static void createTempJarInner(JarOutputStream out, File f,
        String base) throws IOException {
    if (f.isDirectory()) {
        File[] fl = f.listFiles();
        if (base.length() > 0) {
            base = base + "/";
        }
        for (int i = 0; i < fl.length; i++) {
            createTempJarInner(out, fl[i], base + fl[i].getName());
        }
    } else {
        out.putNextEntry(new JarEntry(base));
        FileInputStream in = new FileInputStream(f);
        byte[] buffer = new byte[1024];
        int n = in.read(buffer);
        while (n != -1) {
            out.write(buffer, 0, n);
            n = in.read(buffer);
        }
        in.close();
    }
}
The external interface is createTempJar; it takes the root path of the folder to be packaged and supports subfolders, recursively walking the folder structure and writing each file into the jar in turn. It is all basic file-stream work; the only possibly unfamiliar parts are Manifest and JarOutputStream, for which you can check the JDK API docs.
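If you want to confirm what actually went into the temporary jar, a small verification snippet (my addition, using the standard java.util.jar API) lists its entries:

import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Print every entry packaged into the temp jar returned by createTempJar.
JarFile jar = new JarFile(jarFile);
for (Enumeration<JarEntry> entries = jar.entries(); entries.hasMoreElements();) {
    System.out.println(entries.nextElement().getName());
}
jar.close();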
Well, everything is ready. Let's try it out, taking WordCount as an example:
// Add these statements. XXX
File jarFile = EJob.createTempJar("bin");
EJob.addClasspath("/usr/lib/hadoop-0.20/conf");
ClassLoader classLoader = EJob.getClassLoader();
Thread.currentThread().setContextClassLoader(classLoader);

Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args)
        .getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: wordcount <in> <out>");
    System.exit(2);
}
Job job = new Job(conf, "word count");
job.setJarByClass(WordCountTest.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
Run as Java Application... and we hit a "No job jar file set" exception. It seems the statement job.setJarByClass(WordCountTest.class) failed to set the job's jar package. Why?
setJarByClass uses the class loader of WordCountTest.class to locate the jar containing that class, then sets that jar as the job's jar. But our job jar is packaged at runtime by the program itself, and the class loader of WordCountTest.class is the AppClassLoader, whose search path cannot be changed once the program is running. Therefore setJarByClass cannot set our job jar; we must set it directly with JobConf's setJar, as shown below:
((JobConf) job.getConfiguration()).setJar(jarFile.toString());
Okay, let's modify the preceding example by adding that statement:
Job job = new Job(conf, "word count");
// And add this statement. XXX
((JobConf) job.getConfiguration()).setJar(jarFile.toString());
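Incidentally, if you would rather avoid the cast: JobConf.setJar simply stores the path under the mapred.jar property, so (assuming the 0.20-era property name) the following one-liner should be equivalent:

// Same effect as ((JobConf) job.getConfiguration()).setJar(...)
job.getConfiguration().set("mapred.jar", jarFile.toString());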
Run as Java Application again, and finally everything is OK ~~
This way of running on Hadoop is easy to use and has good compatibility; I recommend you try it. :)
Due to time constraints, I only tested this example in pseudo-distributed mode on Ubuntu, but in theory it should work on a real cluster as well.