A summary of problems encountered running Mahout on Hadoop


After finishing the Mahout build on Hadoop, I started running a few small tests. Since this was my first time, I ran into some minor problems.

First, follow the steps in the reference material to verify that the installation succeeded.

Upload the downloaded data file synthetic_control.data to HDFS with the following commands:

(1) hadoop fs -mkdir testdata (note that the folder path in this command must be exactly as shown, not another form such as /testdata)

(2) hadoop fs -put synthetic_control.data testdata/ (note that if the data file is not in the current directory, you should give its relative or absolute path)

(3) hadoop fs -ls testdata (check whether the upload succeeded)


After the data is uploaded successfully, run the test clustering program with the following command:

(4) hadoop jar mahout-distribution-0.9/mahout-examples-0.9-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job (note the jar path)


After the run completes, view the results:

hadoop fs -ls output (note the path)


Complete.


Second, I hit a problem when running the data serialization step, which converts the data in HDFS into the vector input format that Mahout recognizes.
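(For reference, the input format Mahout's clustering jobs read is a Hadoop SequenceFile whose values are VectorWritable vectors; the syntheticcontrol Job normally performs this conversion itself. Below is a minimal, illustrative sketch of writing such a file by hand; the output path, key names, and point values are placeholders and are not part of the original steps.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.VectorWritable;

// Illustrative sketch only: writes a couple of points in the
// SequenceFile<Text, VectorWritable> format that Mahout's clustering
// jobs read. The path and the sample values are placeholders.
public class WriteVectorsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("testdata/points/part-00000");

        SequenceFile.Writer writer =
                SequenceFile.createWriter(fs, conf, out, Text.class, VectorWritable.class);
        try {
            double[][] points = { {1.0, 2.0, 3.0}, {4.0, 5.0, 6.0} };
            for (int i = 0; i < points.length; i++) {
                // Each record: an arbitrary key plus the point wrapped in a VectorWritable.
                writer.append(new Text("point-" + i),
                              new VectorWritable(new DenseVector(points[i])));
            }
        } finally {
            writer.close();
        }
    }
}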

It reported an error: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/common/AbstractJob ....

This happens because the Mahout jar files are not on the classpath. I found some solutions on the Internet, as follows:

For a MapReduce program to reference a third-party jar file, you can use the following methods (a sketch of options 2 and 3 appears after the list):
1. Pass the jar file through command-line arguments, e.g. -libjars;
2. Set it directly on the Configuration, e.g. conf.set("tmpjars", ...), with multiple jar files separated by commas;
3. Use the distributed cache, e.g. DistributedCache.addArchiveToClassPath(path, conf); the path here must be on HDFS, which means the jar is uploaded to HDFS first and that path is then added to the distributed cache. Alternatively, package the third-party jar and your own program into a single jar; the program obtains the whole file via Job.getJar() and uploads it to HDFS. (Very bulky)
4. Add the jar file to the $HADOOP_HOME/lib directory on each machine. (Not recommended)
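To make options 2 and 3 concrete, here is a minimal driver sketch using the Hadoop 2.x mapreduce API. The jar names, the paths, and the MahoutClasspathExample class are placeholders for illustration, not taken from the original post.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

// Sketch of options 2 and 3 above; jar names and paths are placeholders.
public class MahoutClasspathExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Option 2: set the "tmpjars" property directly (the same property
        // the -libjars flag fills in), separating multiple jars with commas.
        conf.set("tmpjars",
                "file:///opt/mahout-distribution-0.9/mahout-core-0.9.jar,"
              + "file:///opt/mahout-distribution-0.9/mahout-math-0.9.jar");

        Job job = Job.getInstance(conf, "mahout classpath example");

        // Option 3: add a jar that has already been uploaded to HDFS to the
        // task classpath through the distributed cache.
        job.addFileToClassPath(new Path("/libs/mahout-core-0.9.jar"));

        // ... the usual mapper/reducer/input/output setup would go here ...
    }
}

Option 1 achieves the same effect from the command line, provided the driver is run through ToolRunner so that -libjars is parsed.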

I did not study or validate the first three and used the fourth. Note, however, that while older Hadoop versions can simply put the jar in the lib folder, I am using hadoop-2.5.2, where the jar file needs to go into $HADOOP_HOME/share/hadoop/common or another folder on the classpath.

Once that is done, the job is ready to run.


