Making Mahout (or Hadoop) load jars from the user-specified classpath first





Problem: When running Mahout 0.8, a java.lang.NoSuchMethodError on org.apache.lucene.util.PriorityQueue appears.
A similar case is described at: http://www.warski.org/blog/2013/10/using-amazons-elastic-map-reduce-to-compute-recommendations-with-apache-mahout-0-8/
Reason:
$HADOOP_HOME/lib contains an old version of Lucene, lucene-core-3.6.0.jar, while Mahout's lib directory ships its own lucene-core jar.
Hadoop gives priority to the jars in $HADOOP_HOME/lib,
so the lucene-core-4.3.0.jar in Mahout's lib was not loaded; the old version in $HADOOP_HOME/lib was loaded instead.
The old lucene-core-3.6.0.jar is not compatible with Mahout 0.8, hence the error.
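To confirm which copy of a class the JVM actually picked up, a small diagnostic can help. The sketch below is an illustration, not part of the original post: the class name WhichJar is hypothetical, and it relies only on the standard Class.getProtectionDomain().getCodeSource() API. It has to be run with the same classpath as the failing task; in the situation described above, passing org.apache.lucene.util.PriorityQueue would be expected to report the lucene-core-3.6.0.jar path.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical diagnostic: reports which jar (or directory) a class was loaded from. */
public class WhichJar {

    /** Returns the location a class was loaded from, or "bootstrap" when there is none. */
    public static String locate(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // JDK classes loaded by the bootstrap loader have no code source.
            return "bootstrap";
        }
        URL location = source.getLocation();
        return location.toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Default to the class named in the error message of this article.
        String name = args.length > 0 ? args[0] : "org.apache.lucene.util.PriorityQueue";
        // Load without initializing, using this program's own class loader.
        Class<?> cls = Class.forName(name, false, WhichJar.class.getClassLoader());
        System.out.println(name + " loaded from " + locate(cls));
    }
}
```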


Searching Google for how to override the Hadoop classpath turns up:
http://stackoverflow.com/questions/11685949/overriding-default-hadoop-jars-in-class-path
For other versions of Hadoop, you're best to check with the TaskRunner.java class to
confirm the name of the config property; after all, this is a "semi-hidden config":
static final String MAPREDUCE_USER_CLASSPATH_FIRST =
    "mapreduce.user.classpath.first"; //a semi-hidden config
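Conceptually, the flag only changes the order in which user jars and system jars are put on the task classpath, and a class is then taken from the first jar that contains it. The sketch below is a simplified model under that assumption, not Hadoop code: ClasspathOrderModel, buildClasspath and resolve are hypothetical names, and the real TaskRunner assembles a longer list of entries. It reproduces the Lucene situation from this article with both jars containing the same class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified model (not Hadoop code): ordered classpath, first matching jar wins. */
public class ClasspathOrderModel {

    /** Builds the classpath; userFirst mimics mapreduce.user.classpath.first=true. */
    public static List<String> buildClasspath(List<String> systemJars, List<String> userJars, boolean userFirst) {
        List<String> classpath = new ArrayList<>();
        if (userFirst) {
            classpath.addAll(userJars);
            classpath.addAll(systemJars);
        } else {
            classpath.addAll(systemJars);
            classpath.addAll(userJars);
        }
        return classpath;
    }

    /** Returns the first jar on the classpath that contains the class, or null if none does. */
    public static String resolve(String className, List<String> classpath, Map<String, Set<String>> jarContents) {
        for (String jar : classpath) {
            if (jarContents.getOrDefault(jar, Set.of()).contains(className)) {
                return jar;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String pq = "org.apache.lucene.util.PriorityQueue";
        // Both Lucene versions contain a class with the same fully qualified name.
        Map<String, Set<String>> contents = new LinkedHashMap<>();
        contents.put("$HADOOP_HOME/lib/lucene-core-3.6.0.jar", Set.of(pq));
        contents.put("mahout/lib/lucene-core-4.3.0.jar", Set.of(pq));

        List<String> system = List.of("$HADOOP_HOME/lib/lucene-core-3.6.0.jar");
        List<String> user = List.of("mahout/lib/lucene-core-4.3.0.jar");

        for (boolean userFirst : new boolean[] {false, true}) {
            List<String> cp = buildClasspath(system, user, userFirst);
            System.out.println("userFirst=" + userFirst + " -> " + resolve(pq, cp, contents));
        }
    }
}
```

Running main prints the 3.6.0 jar for userFirst=false and the 4.3.0 jar for userFirst=true, which is the behavior change the option is meant to produce.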

The source of TaskRunner.java in Hadoop 1.1.2:
https://github.com/hobinyoon/hadoop-1.1.2/blob/8d91a5453c23ea9f643952a3961f78d31ca5f22c/src/mapred/org/apache/hadoop/mapred/TaskRunner.java

So when calling Mahout, add the -Dmapreduce.user.classpath.first=true option:
$MAHOUT recommendfactorized -Dmapreduce.user.classpath.first=true --numThreads --input $hdfs_user_ratings_path --userFeatures $hdfs_user_features_path --itemFeatures $hdfs_item_features_path --numRecommendations --output $hdfs_result_path --maxRating 1
The -D option works the same way with ordinary Hadoop commands.

Initially, I followed the approach described in this article:
Resolving jar version conflicts in Hadoop MapReduce programs
http://f.dataguru.cn/thread-58160-1-1.html
In general, when a MapReduce job is run with hadoop jar, the jars under $HADOOP_HOME/lib are loaded first.
Because Hadoop itself uses commons-net-1.4.1.jar, version 1.4.1 is loaded first and the user's own version 3.2 is ignored, which causes the exception.
The parameter -Dmapreduce.task.classpath.user.precedence changes this priority order so that the user classpath is loaded first.
Verification:
hadoop jar collect_log.jar com.collect.logcollectjob -Dmapreduce.task.classpath.user.precedence=true -libjars commons-net-3.2.jar /new_log_collect/input /new_log_collect/output
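When a job driver is run through Hadoop's ToolRunner, generic options such as -D and -libjars are consumed by GenericOptionsParser before the job sees its own arguments, and Hadoop's documented usage places them ahead of the command-specific arguments. The sketch below is a deliberately simplified model of that behavior, not GenericOptionsParser itself (which supports more options and syntaxes); the class GenericOptionsModel is hypothetical. It shows why a -D placed after the job's own arguments would no longer be treated as a configuration setting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model (not Hadoop's GenericOptionsParser) of leading generic options. */
public class GenericOptionsModel {

    public final Map<String, String> conf = new LinkedHashMap<>();
    public final List<String> libJars = new ArrayList<>();
    public final List<String> remaining = new ArrayList<>();

    public GenericOptionsModel(String[] args) {
        int i = 0;
        while (i < args.length) {
            String arg = args[i];
            if (arg.startsWith("-D") && arg.contains("=")) {
                // e.g. -Dmapreduce.task.classpath.user.precedence=true becomes a key/value pair
                int eq = arg.indexOf('=');
                conf.put(arg.substring(2, eq), arg.substring(eq + 1));
                i++;
            } else if (arg.equals("-libjars") && i + 1 < args.length) {
                // Comma-separated list of extra jars to ship with the job
                libJars.addAll(Arrays.asList(args[i + 1].split(",")));
                i += 2;
            } else {
                break; // first non-generic token ends generic-option parsing
            }
        }
        // Everything from here on is passed to the job unchanged.
        remaining.addAll(Arrays.asList(args).subList(i, args.length));
    }

    public static void main(String[] args) {
        GenericOptionsModel parsed = new GenericOptionsModel(new String[] {
            "-Dmapreduce.task.classpath.user.precedence=true",
            "-libjars", "commons-net-3.2.jar",
            "/new_log_collect/input", "/new_log_collect/output"});
        System.out.println("conf     = " + parsed.conf);
        System.out.println("libjars  = " + parsed.libJars);
        System.out.println("job args = " + parsed.remaining);
    }
}
```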

I found that the -libjars option did not work with Mahout and produced an error. Although Mahout's usage hints list a -libjars option, it does not seem to recognize -libjars, so I gave up on this approach.
As for the parameter -Dmapreduce.task.classpath.user.precedence, it may be the relevant option for other versions of Hadoop.

In my scenario, however, with Hadoop 1.1.2, the option that makes user-defined jars load first is -Dmapreduce.user.classpath.first=true.
So why don't I need to specify the lucene-core-4.3.0.jar under Mahout's lib?
Because Mahout itself already loads the jars in its own lib directory;
once the -Dmapreduce.user.classpath.first=true option is set,
lucene-core-4.3.0.jar is no longer overridden by the system default (the old jar in $HADOOP_HOME/lib).


Takeaway:
Hadoop 1.1.2 uses the -Dmapreduce.user.classpath.first=true option to load user-defined jars first.
The name of this option differs between Hadoop versions; it can be looked up in the Hadoop source file TaskRunner.java.


Original article: http://blog.csdn.net/lingerlanlan/article/details/42504479

Author: linger
