Having finished the setup on Hadoop, I started running a few small tests. Since this was my first time, I ran into some minor problems.
First, follow the steps from the reference material to verify that the installation succeeded.
Upload the downloaded data file synthetic_control.data to HDFS with the following commands:
(1) hadoop fs -mkdir testdata (note that the folder must be named exactly testdata, not another form such as /testdata, because the example job reads its input from that path by default)
(2) hadoop fs -put synthetic_control.data testdata/ (if the data file is not in the current folder, give its relative or absolute path)
(3) hadoop fs -ls testdata (check that the upload succeeded)
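Put together, the upload sequence looks roughly like this (assuming synthetic_control.data sits in the current local directory):

    # create the input directory on HDFS (relative to the user's HDFS home directory)
    hadoop fs -mkdir testdata
    # upload the data file
    hadoop fs -put synthetic_control.data testdata/
    # confirm the file is there
    hadoop fs -ls testdata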
After the data upload succeeds, run the test clustering program with the following command:
(4) hadoop jar mahout-distribution-0.9/mahout-examples-0.9-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job (mind the path to the jar)
After the run is complete, view the results:
hadoop fs -ls output (mind the path)
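As a single listing, the run-and-check steps look roughly like this (the jar path depends on where the Mahout 0.9 distribution was unpacked):

    # run the k-means example on the synthetic control data
    hadoop jar mahout-distribution-0.9/mahout-examples-0.9-job.jar \
        org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
    # the example job writes its clusters under the output directory on HDFS
    hadoop fs -ls output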
Complete.
Second, I hit a problem when running the data serialization step, which converts the data on HDFS into an input format that Mahout can recognize.
It prompted: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/common/AbstractJob ...
This happens because the Mahout jar files are not on the classpath. I found some solutions online, as follows.
For a MapReduce program to reference third-party jar files, you can use the following methods:
1. Pass the jar files on the command line, e.g. with -libjars (see the sketch after this list);
2. Set them directly in the Configuration, e.g. conf.set("tmpjars", "..."), with multiple jar files separated by commas;
3. Use the distributed cache, e.g. DistributedCache.addArchiveToClassPath(path, conf); the path here must be on HDFS, meaning the jar is uploaded to HDFS first and its path is then added to the distributed cache. Alternatively, package the third-party jars together with your own program into a single jar; the framework obtains that jar via Job.getJar() and ships it to HDFS. (Very bulky.)
4. Add the jar files to the $HADOOP_HOME/lib directory on every machine. (Not recommended.)
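A rough sketch of the first option is below; the driver class and jar names are placeholders, and -libjars only takes effect if the driver parses its arguments through ToolRunner/GenericOptionsParser:

    # ship the Mahout jars to the cluster at submit time (hypothetical driver and jar names)
    hadoop jar my-serialize-job.jar com.example.MyDriver \
        -libjars mahout-core-0.9.jar,mahout-math-0.9.jar \
        input output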
I did not study or verify the first three approaches and used the fourth one. Note, though, that while older Hadoop versions only needed the jars in the lib folder, I am using hadoop-2.5.2, where the jar files have to go into $HADOOP_HOME/share/hadoop/common (or another directory that is already on Hadoop's classpath).
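Concretely, the copy step looks roughly like this (the exact jar name depends on the Mahout distribution, and as method 4 above says, it has to be repeated on every machine):

    # copy the Mahout job jar into a directory that Hadoop already puts on its classpath
    cp mahout-distribution-0.9/mahout-core-0.9-job.jar $HADOOP_HOME/share/hadoop/common/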
Once that is done, you are ready to run it.
Solution: