[Ganzhou] How to Run Mahout ItemCF on Hadoop on CDH 5.2

Source: Internet
Author: User
Environment:
Hadoop-2.5.0-cdh5.2.0
Mahout-0.9-cdh5.2.0

Steps:
The basic idea is to put all jar packages under Mahout on Hadoop's classpath. To do that, we edited $HADOOP_HOME/etc/hadoop/hadoop-env.sh and added the following code, which appends every Mahout jar to HADOOP_CLASSPATH:

    for b in $MAHOUT_HOME/lib/*.jar; do
      if [ "$HADOOP_CLASSPATH" ]; then
        export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$b
      else
        export HADOOP_CLASSPATH=$b
      fi
    done
    for c in $MAHOUT_HOME/*.jar; do
      if [ "$HADOOP_CLASSPATH" ]; then
        export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$c
      else
        export HADOOP_CLASSPATH=$c
      fi
    done

After adding the code, prepare the base data, upload the jar package, and run the command:

    hadoop jar gul-itemcf-hadoop.jar itemcfhadoop
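One way to check that the hadoop-env.sh change took effect is to look for Mahout entries in the classpath that Hadoop reports. The following is a minimal sketch, assuming a POSIX shell and that the hadoop classpath command is available on the machine; list_mahout_entries is a hypothetical helper name, not part of Hadoop or Mahout.

```shell
# Print the entries of a colon-separated classpath that mention "mahout".
# list_mahout_entries is a hypothetical helper, not part of Hadoop or Mahout.
list_mahout_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -i 'mahout'
}

# Usage on a machine with Hadoop installed:
#   list_mahout_entries "$(hadoop classpath)"
```

If nothing is printed, the Mahout jars are probably not reaching the classpath, and the hadoop-env.sh edit is worth rechecking.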

Note:
This jar package contains only the MapReduce classes, not the dependent packages. Having Maven bundle all the dependencies would produce a very bloated jar. That approach is inelegant and increases network and memory load, so I gave it up.

Problem Encountered:
The first job completed successfully. Starting from the second job, the following error occurred:

    java.lang.ClassNotFoundException: org.apache.mahout.math.Vector

It was followed by this exception:

    Exception in thread "main" java.io.FileNotFoundException: File does not exist: /recommendersystem/jilinsmepsp/recommenderengine/service/guessulike/tmp/1414042683946/preparePreferenceMatrix/numUsers.bin

The first error is easy to understand: Hadoop cannot find a third-party (Mahout) dependency jar.

Solution:
First of all, adding HADOOP_CLASSPATH in $HADOOP_HOME/etc/hadoop/hadoop-env.sh is the right approach, because after removing the statements added earlier, even Mahout's CF-related classes could not be found. But why were only some of the classes recognized? Could it be a conflict? I then took some detours, trying various tricks from the Internet, such as copying the jar packages to $HADOOP_HOME/lib, but none of them worked. In the end I came back to the "package conflict" idea.

Ultimate Solution:
Comparing the jar packages under $MAHOUT_HOME shows that mahout-core-0.9-cdh5.2.0-job.jar contains all the classes required to run the job, and that both it and mahout-math-0.9-cdh5.2.0.jar contain org.apache.mahout.math.Vector. It seems the class was not recognized because of this duplication. The ultimate solution is therefore very simple: in $HADOOP_HOME/etc/hadoop/hadoop-env.sh, add only this one jar package:

    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$MAHOUT_HOME/mahout-core-0.9-cdh5.2.0-job.jar

After that, the program ran successfully, and the world is bright!
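The jar comparison described above can be done by hand, but it can also be sketched as a small script that reports which jars contain a given class. This is only an illustration, assuming a POSIX shell and that the unzip tool is installed (jar tf would work similarly); class_to_entry and jars_containing are hypothetical helper names.

```shell
# Convert a fully qualified class name to its path inside a jar,
# e.g. org.apache.mahout.math.Vector -> org/apache/mahout/math/Vector.class
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print every jar among the remaining arguments that contains the class.
# Assumes `unzip` is installed; missing files are skipped silently.
jars_containing() {
  entry=$(class_to_entry "$1")
  shift
  for jar in "$@"; do
    [ -f "$jar" ] || continue
    if unzip -l "$jar" 2>/dev/null | grep -qF -- "$entry"; then
      printf '%s\n' "$jar"
    fi
  done
}

# Usage:
#   jars_containing org.apache.mahout.math.Vector \
#     "$MAHOUT_HOME"/*.jar "$MAHOUT_HOME"/lib/*.jar
```

If more than one jar is printed for the same class, those jars overlap, which is the kind of duplication suspected here.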
