Environment: Hadoop-2.5.0-cdh5.2.0
Mahout-0.9-cdh5.2.0
Steps: The basic idea is to put all of the jar packages under Mahout on Hadoop's classpath. To do that, edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and add the following code:

    # Add every jar under $MAHOUT_HOME/lib to HADOOP_CLASSPATH
    for b in $MAHOUT_HOME/lib/*.jar; do
      if [ "$HADOOP_CLASSPATH" ]; then
        export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$b
      else
        export HADOOP_CLASSPATH=$b
      fi
    done

    # Add every jar directly under $MAHOUT_HOME as well
    for c in $MAHOUT_HOME/*.jar; do
      if [ "$HADOOP_CLASSPATH" ]; then
        export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$c
      else
        export HADOOP_CLASSPATH=$c
      fi
    done
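Before submitting a job, it can help to confirm that the Mahout jars really ended up on the classpath. The following is only a minimal sketch: `classpath_entries` is a hypothetical helper name, not part of Hadoop or Mahout, and it simply splits a colon-separated classpath and keeps the entries that contain a given substring.

```shell
# Hypothetical helper (not part of Hadoop or Mahout):
# print each entry of a colon-separated classpath that contains a substring.
#   $1: colon-separated classpath
#   $2: substring to look for, e.g. "mahout"
classpath_entries() {
  # Put one entry per line, then keep only the matching lines.
  printf '%s\n' "$1" | tr ':' '\n' | grep -F -- "$2"
}
```

Assuming the `hadoop classpath` subcommand is available and reflects HADOOP_CLASSPATH on your installation, a call such as `classpath_entries "$(hadoop classpath)" mahout` should list the Mahout jars that were picked up.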
After adding the code, prepare the input data, upload the jar package, and run the command:

    hadoop jar gul-itemcf-hadoop.jar itemcfhadoop
Note: This jar package contains only the MapReduce classes, not the packages they depend on. Using Maven to bundle every dependency would produce a very bloated jar. That approach is not elegant and increases network and memory load, so I gave it up.
Problem Encountered: The first job completed successfully. Starting from the second job, the following error occurred:

    java.lang.ClassNotFoundException: org.apache.mahout.math.Vector

It was followed by this exception:

    Exception in thread "main" java.io.FileNotFoundException: File does not exist: /recommendersystem/jilinsmepsp/recommenderengine/service/guessulike/tmp/1414042683946/preparePreferenceMatrix/numUsers.bin

I believe everyone understands the first error: Hadoop cannot find a class from a third-party (Mahout) dependency jar. The second error is most likely just a consequence of the first, since the failed job never wrote that file.
Solution: First of all, adding HADOOP_CLASSPATH to $HADOOP_HOME/etc/hadoop/hadoop-env.sh is the right direction, because after I removed the statements added earlier, even Mahout's CF-related classes could no longer be found. But why were only some of the classes recognized? Was it a conflict? I then took several detours, trying various tricks found on the Internet, such as copying the jar packages to $HADOOP_HOME/lib, but none of them worked. In the end I came back to the "package conflict" idea.
Ultimate Solution: Comparing the jar packages under $MAHOUT_HOME shows that mahout-core-0.9-cdh5.2.0-job.jar contains all the classes required to run the job, and that it duplicates classes found in mahout-math-0.9-cdh5.2.0.jar, including org.apache.mahout.math.Vector. It seems that this class was not recognized because of the conflict. The ultimate solution is therefore very simple: in $HADOOP_HOME/etc/hadoop/hadoop-env.sh, introduce only this one jar package:

    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$MAHOUT_HOME/mahout-core-0.9-cdh5.2.0-job.jar

After that, the program ran successfully, and the world was bright again!
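The jar comparison described above can be approximated with a small script. This is only a sketch under stated assumptions: `class_to_entry` and `jars_containing` are hypothetical helper names, and the `unzip` tool is assumed to be installed (`jar tf` from a JDK would be an alternative way to list the entries).

```shell
# Hypothetical helpers, not part of Hadoop or Mahout.

# Convert a fully qualified class name to its entry path inside a jar,
# e.g. org.apache.mahout.math.Vector -> org/apache/mahout/math/Vector.class
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print every jar among the remaining arguments that contains the class in $1.
# Assumes `unzip` is installed; jars that cannot be read are silently skipped.
jars_containing() {
  entry=$(class_to_entry "$1")
  shift
  for jar in "$@"; do
    # `unzip -l` lists the archive entries without extracting them.
    if unzip -l "$jar" 2>/dev/null | grep -qF -- "$entry"; then
      printf '%s\n' "$jar"
    fi
  done
}
```

A call such as `jars_containing org.apache.mahout.math.Vector "$MAHOUT_HOME"/*.jar "$MAHOUT_HOME"/lib/*.jar` prints one line per jar that ships the class. Based on the comparison described in this post, both the job jar and the math jar would be expected to show up, which is the duplication suspected of causing the conflict.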
[Ganzhou] How to Run mahout itemcf on hadoop on cdh5.2