How to Make Your Jobs Run in a Distributed Manner on a Hadoop Cluster

Getting a program to actually run in a distributed manner on a Hadoop cluster can be a real headache.

Someone may say: just right-click the class file in Eclipse and choose "Run on Hadoop". Note, however, that by default "Run on Hadoop" in Eclipse runs the job on a single machine only. Running a program in a distributed manner on a cluster also involves uploading the class files and distributing them to the nodes, whereas a plain "Run on Hadoop" merely uses the local Hadoop class libraries to run your program. You will not see any job information on the Hadoop JobTracker web management page (http://localhost:50030), because your job is not running on the cluster.

Hadoop: The Definitive Guide (3rd edition) describes how to build a jar package and then run the distributed program with the jar option of the hadoop script, as follows:
hadoop jar hadoop-examples.jar v3.MaxTemperatureDriver -conf conf/hadoop-cluster.xml input/ncdc/all max-temp

But the problem is that the book uses Maven to compile the class files. Through its XML configuration Maven easily solves jar dependency problems, but I usually rely on the Eclipse IDE to write programs and have not yet figured out how to configure and use Maven. So I decided not to learn it for now and to build the jar package myself.

The hard part was resolving class references. Take the HBase jar used by my program (hbase-0.94.3.jar) as an example: I tried a variety of ways to resolve the class references and build a jar, but whatever I did, the run always reported that the class org.apache.hadoop.hbase.util.Bytes could not be found (it is in hbase-0.94.3.jar). These are the methods I tried:

1. Set CLASSPATH to hbase-0.94.3.jar
2. Set HADOOP_CLASSPATH to hbase-0.94.3.jar
3. Set HADOOP_CLASSPATH to HBASE_HOME
4. Package hbase-0.94.3.jar into the jar to be run (three variants: a. put it directly into the jar; b. put it in a lib folder inside the jar; c. point the Class-Path option in the manifest file at hbase-0.94.3.jar)
5. Copy hbase-0.94.3.jar to HADOOP_HOME/lib
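
When a class-not-found error persists through attempts like these, it can help to first confirm whether the class is really inside the jar being handed to Hadoop. The following is a minimal sketch using only the JDK's java.util.jar API; the class name JarClassFinder and its methods are hypothetical helpers made up for illustration, not part of Hadoop or HBase. It only looks at the jar's own entries and does not search jars nested inside it (for example under a lib folder).

```java
import java.io.IOException;
import java.util.jar.JarFile;

/** Hypothetical helper (not part of Hadoop or HBase): looks for a class inside a jar. */
final class JarClassFinder {

    /** Converts a fully qualified class name into the entry path used inside a jar. */
    static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the jar at jarPath has a direct entry for the given class. */
    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryNameFor(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarClassFinder <jar> <fully.qualified.ClassName>");
            return;
        }
        System.out.println(containsClass(args[0], args[1]) ? "found" : "missing");
    }
}
```

For example, running it against the job jar with org.apache.hadoop.hbase.util.Bytes as the second argument shows whether that class was actually packaged at the top level of the jar.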

By this point I was close to a breakdown. I had spent nearly three days on this problem with nothing to show for it. I simply set it aside and worked on other things, and more than 10 days went by.

After more than 10 busy days, the urge to solve this problem came back. Since I could not build the package myself on the command line, why not use Eclipse? (In the end I still gave in to the IDE; I despise myself!) At first I exported the entire project directly as a jar, but that jar contained only the classes compiled from my own source files and the dependency jars in the lib folder; referenced third-party jars such as hadoop-core were not included. Following this line of thought, I searched Google for how to fully include the referenced third-party jars, and found it: Fat Jar!

Decompress the package and you will get net.sf.fjep.fatjar_0.0.31.jar (the name may differ between versions). Copy this jar to ECLIPSE_HOME/plugins and restart Eclipse. If "Fat Jar Preference" appears under Window => Preferences, the installation succeeded.

Next, right-click the Java project to be exported, choose "Export", and select "Fat Jar Exporter" under "Other" to do the packaging. Select the "Main-Class" and the files to be packaged. After clicking "Finish" you get a complete jar that you can run anywhere with the hadoop jar command. Taking my program as an example:
hadoop jar hdg3.jar cf/RateDataImporter // hdg3.jar is the name of my jar, 64 MB in total; cf/RateDataImporter is the name of the class to run
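
The idea behind a fat jar can be sketched in a few lines: copy the entries of every dependency jar into one output jar and write a manifest that names the main class. The code below is a rough illustration using only the JDK's java.util.jar API; it is not how the Fat Jar plugin is implemented, and the class name FatJarSketch and its merge method are hypothetical. As a simplifying assumption it skips everything under META-INF/ in the input jars (so signature files and service descriptors are dropped) and keeps the first copy of any duplicate entry.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

/** Hypothetical sketch of fat-jar packaging; not the Fat Jar plugin's own code. */
final class FatJarSketch {

    /** Merges the entries of all input jars into one jar whose manifest names mainClass. */
    static void merge(String mainClass, File output, List<File> inputs) throws IOException {
        Manifest manifest = new Manifest();
        // Without Manifest-Version the manifest is written out empty.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);

        Set<String> seen = new HashSet<>();
        byte[] buffer = new byte[8192];
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(output), manifest)) {
            for (File input : inputs) {
                try (JarFile jar = new JarFile(input)) {
                    Enumeration<JarEntry> entries = jar.entries();
                    while (entries.hasMoreElements()) {
                        JarEntry entry = entries.nextElement();
                        String name = entry.getName();
                        // Simplification: drop input META-INF content and duplicate entries.
                        if (name.startsWith("META-INF/") || !seen.add(name)) {
                            continue;
                        }
                        out.putNextEntry(new JarEntry(name));
                        if (!entry.isDirectory()) {
                            try (InputStream in = jar.getInputStream(entry)) {
                                int read;
                                while ((read = in.read(buffer)) != -1) {
                                    out.write(buffer, 0, read);
                                }
                            }
                        }
                        out.closeEntry();
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: java FatJarSketch <main.Class> <out.jar> <in1.jar> [<in2.jar> ...]");
            return;
        }
        List<File> inputs = new java.util.ArrayList<>();
        for (String path : Arrays.asList(args).subList(2, args.length)) {
            inputs.add(new File(path));
        }
        merge(args[0], new File(args[1]), inputs);
    }
}
```

Bundling everything this way is also why the resulting jar gets large: every class of every dependency ends up inside the one file.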

On the JobTracker web management interface you can now clearly see the job information along with its progress and status. It feels pretty cool (even though my cluster has only two nodes)!!

So, after a long struggle with the Hadoop cluster, the distributed job problem is finally solved!

I am grateful for all the above!

PS: In the comments, deefox kindly pointed out a simpler method: export the project directly as a Runnable JAR. Thumbs up!
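
What makes a jar "runnable" is mainly that its manifest declares a Main-Class. A small sketch for checking which entry point an exported jar declares follows, again using only the JDK's java.util.jar API; MainClassInspector is a hypothetical name chosen for illustration. As far as I understand, hadoop jar uses the manifest's Main-Class when one is present and otherwise takes the class name from the command line, but treat that as an assumption to verify against your Hadoop version.

```java
import java.io.IOException;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical helper: reports the entry point declared in a jar's manifest. */
final class MainClassInspector {

    /** Returns the Main-Class from the jar's manifest, or null if none is declared. */
    static String mainClassOf(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                return null; // the jar has no manifest at all
            }
            return manifest.getMainAttributes().getValue("Main-Class");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MainClassInspector <jar>");
            return;
        }
        String mainClass = mainClassOf(args[0]);
        System.out.println(mainClass == null ? "no Main-Class declared" : mainClass);
    }
}
```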

This article is original; when reprinting, please cite the source: http://www.cnblogs.com/beanmoon/archive/2013/05/09/3068729.html

 

 
