Making Mahout (or Hadoop) load jars from the user-specified classpath first
Problem: with Mahout 0.8, java.lang.NoSuchMethodError: org.apache.lucene.util.PriorityQueue is thrown.
A similar case is described here: http://www.warski.org/blog/2013/10/using-amazons-elastic-map-reduce-to-compute-recommendations-with-apache-mahout-0-8/
Reason:
There is an old lucene-core-3.6.0.jar under $HADOOP_HOME/lib, and Mahout's lib directory ships its own lucene-core jar.
Hadoop gives priority to the jars in $HADOOP_HOME/lib,
so the lucene-core-4.3.0.jar in Mahout's lib was not loaded; the old version from $HADOOP_HOME/lib was loaded instead.
The old lucene-core-3.6.0.jar is not compatible with Mahout 0.8, hence the error.
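To spot this kind of conflict before running a job, one can compare the jar names in the two lib directories. This is a minimal sketch, not part of the original fix: the function names jar_artifacts and find_jar_conflicts are hypothetical, $MAHOUT_HOME is assumed to point at the Mahout install, and jar files are assumed to be named <artifact>-<version>.jar.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: list artifacts present in two lib directories.
# Assumes jar names follow the <artifact>-<version>.jar pattern.

# Print artifact names in a directory, with the "-<version>...jar" suffix stripped
# (everything from the first "-<digit>" onward), one per line, de-duplicated.
jar_artifacts() {
  local dir=$1 f
  for f in "$dir"/*.jar; do
    [ -e "$f" ] || continue          # skip the literal pattern if no jar matched
    basename "$f" | sed -E 's/-[0-9].*\.jar$//'
  done | sort -u
}

# Print artifacts that appear in both directories (possible version conflicts).
find_jar_conflicts() {
  { jar_artifacts "$1"; jar_artifacts "$2"; } | sort | uniq -d
}
```

Usage: find_jar_conflicts "$HADOOP_HOME/lib" "$MAHOUT_HOME/lib". In the setup described above, lucene-core would be expected among the results, since both directories contain a lucene-core jar.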
Searching Google for how to override the Hadoop classpath led to:
http://stackoverflow.com/questions/11685949/overriding-default-hadoop-jars-in-class-path
For other versions of Hadoop, you're best to check the TaskRunner.java class
to confirm the name of the config property; after all, this is a "semi-hidden config":
static final String MAPREDUCE_USER_CLASSPATH_FIRST =
    "mapreduce.user.classpath.first"; //a semi-hidden config
The TaskRunner.java source in Hadoop 1.1.2:
https://github.com/hobinyoon/hadoop-1.1.2/blob/8d91a5453c23ea9f643952a3961f78d31ca5f22c/src/mapred/org/apache/hadoop/mapred/TaskRunner.java
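Since the property name has to be confirmed in TaskRunner.java for each Hadoop version, a small helper can pull it out of a source checkout. This is a sketch under assumptions: classpath_first_key is a hypothetical name, and it only works while the constant is still called MAPREDUCE_USER_CLASSPATH_FIRST, as in the 1.1.2 declaration quoted above; other versions may name or place it differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the string value assigned to
# MAPREDUCE_USER_CLASSPATH_FIRST in a TaskRunner.java file.
# The literal may sit on the same line as the declaration or on a later one.
classpath_first_key() {
  awk '
    /MAPREDUCE_USER_CLASSPATH_FIRST *=/ { found = 1 }   # declaration seen
    found && match($0, /"[^"]*"/) {                      # first quoted string after it
      print substr($0, RSTART + 1, RLENGTH - 2)          # strip the quotes
      exit
    }' "$1"
}
```

Usage, from the root of a Hadoop 1.1.2 source tree: classpath_first_key src/mapred/org/apache/hadoop/mapred/TaskRunner.java, which should print the property name shown in the declaration above.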
So when calling Mahout, add the -Dmapreduce.user.classpath.first=true option:
$MAHOUT recommendfactorized -Dmapreduce.user.classpath.first=true --numThreads --input $hdfs_user_ratings_path --userFeatures $hdfs_user_features_path --itemFeatures $hdfs_item_features_path --numRecommendations --output $hdfs_result_path --maxRating 1
The -D option works the same way for plain Hadoop jobs.
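For an ordinary job launched with hadoop jar, the same flag can be passed as a generic option. A minimal sketch, assuming the job's main class runs through ToolRunner (otherwise -D is not parsed) and that the Hadoop 1.x property name applies. The function hadoop_jar_user_first and the HADOOP_CMD override are hypothetical; HADOOP_CMD exists only so the command can be previewed with echo.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run "hadoop jar" with user jars ahead of $HADOOP_HOME/lib.
# Assumes Hadoop 1.x (property mapreduce.user.classpath.first) and a main class
# that uses ToolRunner, so the generic -D option is honored.
hadoop_jar_user_first() {
  local jar=$1 main=$2
  shift 2
  # Generic options such as -D go before the job's own arguments.
  ${HADOOP_CMD:-hadoop} jar "$jar" "$main" \
    -Dmapreduce.user.classpath.first=true "$@"
}
```

Usage with hypothetical names: hadoop_jar_user_first my-job.jar com.example.MyJob /input /output. Setting HADOOP_CMD=echo first prints the command instead of running it.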
In fact, I first tried the approach from this post:
Hadoop MapReduce Program jar Package Version Conflict Resolution
http://f.dataguru.cn/thread-58160-1-1.html
Normally, when an MR job is run with hadoop jar, the jars under $HADOOP_HOME/lib are loaded first.
Because Hadoop itself uses commons-net-1.4.1.jar, version 1.4.1 is loaded first and the user's own version 3.2 is ignored, so an exception is thrown.
The parameter -Dmapreduce.task.classpath.user.precedence changes the classpath loading order so that user jars come first.
Verification:
hadoop jar collect_log.jar com.collect.logcollectjob -Dmapreduce.task.classpath.user.precedence=true -libjars commons-net-3.2.jar /new_log_collect/input /new_log_collect/output
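-libjars takes a comma-separated list of jars, which gets tedious when several have to be shipped. A minimal sketch, assuming the extra jars sit in one local directory; libjars_from_dir is a hypothetical helper, not part of Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join every .jar in a directory into the
# comma-separated list expected by Hadoop's generic -libjars option.
libjars_from_dir() {
  local f list=
  for f in "$1"/*.jar; do
    [ -e "$f" ] || continue          # skip the literal pattern if no jar matched
    list=${list:+$list,}$f           # add a comma only after the first entry
  done
  printf '%s\n' "$list"
}
```

Usage with a hypothetical ./lib directory: hadoop jar collect_log.jar com.collect.logcollectjob -Dmapreduce.task.classpath.user.precedence=true -libjars "$(libjars_from_dir ./lib)" /new_log_collect/input /new_log_collect/output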
I found that the -libjars option was useless with Mahout and even caused an error. Although Mahout's usage message lists a -libjars option, Mahout does not seem to recognize it, so I gave up on this approach.
The -Dmapreduce.task.classpath.user.precedence parameter may be the right option for other Hadoop versions.
But in my setup, Hadoop 1.1.2, the option that makes user-defined jars load first is -Dmapreduce.user.classpath.first=true.
So why don't I need to point at the lucene-core-4.3.0.jar in Mahout's lib?
Because Mahout itself loads the jars in its own lib directory,
and once -Dmapreduce.user.classpath.first=true is set,
its lucene-core-4.3.0.jar is no longer overridden by the system default.
Takeaways:
Hadoop 1.1.2 uses the -Dmapreduce.user.classpath.first=true option to load user-defined jars first.
The option name differs between Hadoop versions; it can be found in TaskRunner.java in the Hadoop source.
Original article: http://blog.csdn.net/lingerlanlan/article/details/42504479
Author: linger