A simple k-means example in Mahout

Source: Internet
Author: User

The book Mahout in Action includes a simple k-means example, but its source code does not say which packages must be imported for it to run correctly.

The book states at the outset that all of its code is based on Mahout 0.4, but I found that the k-means example is actually based on Mahout 0.3: it uses several functions that are not available in version 0.4.

This may be because I used the precompiled package directly and did not check the Mahout 0.4 source code. In the code below, comments mark the functions I could not find in 0.4.

// Imports for Hadoop 0.20.2 and Mahout 0.3
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.clustering.kmeans.Cluster;
import org.apache.mahout.clustering.kmeans.KMeansDriver;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class SimpleKMeansClustering {

    public static final double[][] points = {
        {1, 1}, {2, 1}, {1, 2}, {2, 2}, {3, 3}, {8, 8}, {9, 8}, {8, 9}, {9, 9}
    };

    // Writes the input vectors to a SequenceFile keyed by record number.
    public static void writePointsToFile(List<Vector> points, String fileName,
                                         FileSystem fs, Configuration conf) throws IOException {
        Path path = new Path(fileName);
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path,
                LongWritable.class, VectorWritable.class);
        long recNum = 0;
        VectorWritable vec = new VectorWritable();
        for (Vector point : points) {
            vec.set(point);
            writer.append(new LongWritable(recNum++), vec);
        }
        writer.close();
    }

    // Converts the raw coordinate arrays into named Mahout vectors.
    public static List<Vector> getPoints(double[][] raw) {
        List<Vector> points = new ArrayList<Vector>();
        for (int i = 0; i < raw.length; i++) {
            double[] fr = raw[i];
            // In Mahout 0.4 the constructor does not take the name parameter.
            Vector vec = new RandomAccessSparseVector("vector:" + String.valueOf(i), fr.length);
            vec.assign(fr);
            points.add(vec);
        }
        return points;
    }

    public static void main(String[] args) throws Exception {
        int k = 2;
        List<Vector> vectors = getPoints(points);

        File testData = new File("testdata");
        if (!testData.exists()) {
            testData.mkdir();
        }
        testData = new File("testdata/points");
        if (!testData.exists()) {
            testData.mkdir();
        }

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        writePointsToFile(vectors, "testdata/points/file1", fs, conf);

        // Seed the initial clusters with the first k points.
        Path path = new Path("testdata/clusters/part-00000");
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path,
                Text.class, Cluster.class);
        for (int i = 0; i < k; i++) {
            Vector vec = vectors.get(i);
            Cluster cluster = new Cluster(vec, i);   // this constructor does not exist in Mahout 0.4
            cluster.addPoint(cluster.getCenter());   // this function is not available in Mahout 0.4
            writer.append(new Text(cluster.getIdentifier()), cluster);
        }
        writer.close();

        // Mahout 0.4 has no runJob function; use KMeansDriver.run(parameters) instead.
        KMeansDriver.runJob("testdata/points", "testdata/clusters", "output",
                EuclideanDistanceMeasure.class.getName(), 0.001, 10, 1);

        // Read back the point-to-cluster assignments.
        SequenceFile.Reader reader = new SequenceFile.Reader(fs,
                new Path("output/points/part-00000"), conf);
        Text key = new Text();
        Text value = new Text();
        while (reader.next(key, value)) {
            System.out.println(key.toString() + " belongs to cluster " + value.toString());
        }
        reader.close();
    }
}

Of course, you can change the code so that it works with Mahout 0.4. If you run it with Run As -> Java Application, the result files are generated in the local directory; with Run As -> Run on Hadoop, they are generated in HDFS.

Following the error messages, first import hadoop-core-0.20.2.jar, mahout-core-0.3.jar, and mahout-math-0.3.jar; at this point the code should no longer report errors. Then, as prompted while the program runs, add the following packages to the project one by one: mahout-collections-0.3.jar, slf4j-api-1.5.8.jar, slf4j-jcl-1.5.8.jar, commons-logging-1.1.1.jar, commons-cli-2.0.jar, and commons-httpclient-3.1.jar. All of them can be found in the Mahout installation directory or in its lib directory. The final output is as follows:

 
vector:0 belongs to cluster 0
vector:1 belongs to cluster 0
vector:2 belongs to cluster 0
vector:3 belongs to cluster 0
vector:4 belongs to cluster 0
vector:5 belongs to cluster 1
vector:6 belongs to cluster 1
vector:7 belongs to cluster 1
vector:8 belongs to cluster 1
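
This assignment can be cross-checked without Hadoop or Mahout. The following is only a minimal plain-Java sketch of k-means, not Mahout's implementation: it uses the same nine points, seeds the centers with the first k points as the main method does, and reuses the convergence delta of 0.001 and the limit of 10 iterations passed to the driver. The names PlainKMeans, cluster, nearest, and squaredDistance are hypothetical and exist only for this illustration.

```java
// Plain-Java k-means sketch for cross-checking; not Mahout's implementation.
class PlainKMeans {

    // The same nine points as in the Mahout example.
    static final double[][] POINTS = {
        {1, 1}, {2, 1}, {1, 2}, {2, 2}, {3, 3}, {8, 8}, {9, 8}, {8, 9}, {9, 9}
    };

    // Squared Euclidean distance; sufficient for choosing the nearest center.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the center closest to p (ties go to the lower index).
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (squaredDistance(p, centers[c]) < squaredDistance(p, centers[best])) {
                best = c;
            }
        }
        return best;
    }

    // Runs k-means seeded with the first k points and returns each point's cluster index.
    static int[] cluster(double[][] points, int k, double convergenceDelta, int maxIterations) {
        int dim = points[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = points[c].clone();
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: accumulate each point into its nearest center.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] p : points) {
                int c = nearest(p, centers);
                counts[c]++;
                for (int d = 0; d < dim; d++) {
                    sums[c][d] += p[d];
                }
            }
            // Update step: move each non-empty center to the mean of its points.
            boolean converged = true;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                double[] updated = new double[dim];
                for (int d = 0; d < dim; d++) {
                    updated[d] = sums[c][d] / counts[c];
                }
                if (Math.sqrt(squaredDistance(updated, centers[c])) > convergenceDelta) {
                    converged = false;
                }
                centers[c] = updated;
            }
            if (converged) {
                break;
            }
        }
        // Final assignment against the final centers.
        int[] assignment = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            assignment[i] = nearest(points[i], centers);
        }
        return assignment;
    }

    public static void main(String[] args) {
        int[] assignment = cluster(POINTS, 2, 0.001, 10);
        for (int i = 0; i < assignment.length; i++) {
            System.out.println("vector:" + i + " belongs to cluster " + assignment[i]);
        }
    }
}
```

With these inputs the sketch settles on centers (1.8, 1.8) and (8.5, 8.5), placing the first five points in cluster 0 and the last four in cluster 1, which agrees with the output of the Mahout run.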

 
