The book Mahout in Action contains a simple k-means example. The source code does not indicate which packages must be imported for it to run correctly.
The book states at the beginning that all of its code is based on Mahout 0.4, but I found that the k-means example is based on Mahout 0.3: several of the functions it uses are not available in version 0.4.
I don't know whether this is because I used the precompiled package directly; I did not check the Mahout 0.4 source code. In the code below I mark which functions cannot be found in 0.4.
    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.mahout.clustering.kmeans.Cluster;
    import org.apache.mahout.clustering.kmeans.KMeansDriver;
    import org.apache.mahout.common.distance.EuclideanDistanceMeasure;
    import org.apache.mahout.math.RandomAccessSparseVector;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.math.VectorWritable;

    // The class name is a placeholder; the original listing here omits the class declaration.
    public class SimpleKMeansClustering {

        public static final double[][] points = {
            {1, 1}, {2, 1}, {1, 2}, {2, 2}, {3, 3},
            {8, 8}, {9, 8}, {8, 9}, {9, 9}};

        // Write the vectors to a SequenceFile keyed by record number.
        public static void writePointsToFile(List<Vector> points, String fileName,
                FileSystem fs, Configuration conf) throws IOException {
            Path path = new Path(fileName);
            SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path,
                    LongWritable.class, VectorWritable.class);
            long recNum = 0;
            VectorWritable vec = new VectorWritable();
            for (Vector point : points) {
                vec.set(point);
                writer.append(new LongWritable(recNum++), vec);
            }
            writer.close();
        }

        // Convert the raw coordinate arrays into named Mahout vectors.
        public static List<Vector> getPoints(double[][] raw) {
            List<Vector> points = new ArrayList<Vector>();
            for (int i = 0; i < raw.length; i++) {
                double[] fr = raw[i];
                // In Mahout 0.4 the constructor does not take this name parameter.
                Vector vec = new RandomAccessSparseVector("vector:" + String.valueOf(i), fr.length);
                vec.assign(fr);
                points.add(vec);
            }
            return points;
        }

        public static void main(String[] args) throws Exception {
            int k = 2;
            List<Vector> vectors = getPoints(points);

            File testData = new File("testdata");
            if (!testData.exists()) {
                testData.mkdir();
            }
            testData = new File("testdata/points");
            if (!testData.exists()) {
                testData.mkdir();
            }

            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            writePointsToFile(vectors, "testdata/points/file1", fs, conf);

            // Seed the k initial clusters with the first k points.
            Path path = new Path("testdata/clusters/part-00000");
            SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path,
                    Text.class, Cluster.class);
            for (int i = 0; i < k; i++) {
                Vector vec = vectors.get(i);
                Cluster cluster = new Cluster(vec, i); // this constructor does not exist in Mahout 0.4
                cluster.addPoint(cluster.getCenter()); // this function is not available in Mahout 0.4
                writer.append(new Text(cluster.getIdentifier()), cluster);
            }
            writer.close();

            // In Mahout 0.4, change this to KMeansDriver.run(...); there is no runJob function.
            KMeansDriver.runJob("testdata/points", "testdata/clusters", "output",
                    EuclideanDistanceMeasure.class.getName(), 0.001, 10, 1);

            // Read back which cluster each point was assigned to.
            SequenceFile.Reader reader = new SequenceFile.Reader(fs,
                    new Path("output/points/part-00000"), conf);
            Text key = new Text();
            Text value = new Text();
            while (reader.next(key, value)) {
                System.out.println(key.toString() + " belongs to cluster " + value.toString());
            }
            reader.close();
        }
    }
Of course, you can change it into code that supports Mahout 0.4. If you choose Run As -> Java Application when running it, the result files are generated in the local directory. If you choose Run As -> Run on Hadoop, the files are generated in HDFS.
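Most of the run-time failures with this example are missing-class errors, so a small probe can save a few trial runs. The following is only a sketch in plain Java: the class name `ClasspathCheck` and its helper are made up for illustration, and the probed class names are representative classes that I assume live in the listed libraries, not an authoritative jar-to-class mapping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports which of a few representative classes can be loaded.
class ClasspathCheck {

    // Returns true if the named class can be found by this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // 'false' means: locate the class, but do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed representative class per library (illustrative only).
        Map<String, String> probes = new LinkedHashMap<>();
        probes.put("org.apache.hadoop.conf.Configuration", "hadoop-core");
        probes.put("org.apache.mahout.clustering.kmeans.KMeansDriver", "mahout-core");
        probes.put("org.apache.mahout.math.Vector", "mahout-math");
        probes.put("org.slf4j.Logger", "slf4j-api");
        probes.put("org.apache.commons.logging.Log", "commons-logging");
        probes.put("org.apache.commons.cli2.Option", "commons-cli");
        probes.put("org.apache.commons.httpclient.HttpClient", "commons-httpclient");

        for (Map.Entry<String, String> probe : probes.entrySet()) {
            String status = isOnClasspath(probe.getKey()) ? "found" : "MISSING";
            System.out.println(probe.getValue() + ": " + status);
        }
    }
}
```

Running it with the same classpath as the project lists each library as found or MISSING, which points at the jar to add next.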
Following the error messages, first import hadoop-core-0.20.2.jar, mahout-core-0.3.jar, and mahout-math-0.3.jar. At this point the code should no longer report compile errors. Then, while running it, add the remaining packages to the project one by one as the error prompts ask for them. I imported mahout-collections-0.3.jar, slf4j-api-1.5.8.jar, slf4j-jcl-1.5.8.jar, commons-logging-1.1.1.jar, commons-cli-2.0.jar, and commons-httpclient-3.1.jar, all of which can be found in the Mahout installation directory or in the lib directory under it. The final result is as follows:
vector:0 belongs to cluster 0
vector:1 belongs to cluster 0
vector:2 belongs to cluster 0
vector:3 belongs to cluster 0
vector:4 belongs to cluster 0
vector:5 belongs to cluster 1
vector:6 belongs to cluster 1
vector:7 belongs to cluster 1
vector:8 belongs to cluster 1
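To see why the points split this way, the same computation can be reproduced without Mahout or Hadoop. This is a minimal sketch of standard (Lloyd's) k-means in plain Java, not Mahout's implementation: the class `KMeansSketch` and its methods are made up for illustration. It uses the same nine points, seeds the centers with the first two points as the example does, and reuses the convergence delta 0.001 and the limit of 10 iterations passed to the driver.

```java
// Plain-Java sketch of Lloyd's k-means; illustrative only, not Mahout code.
class KMeansSketch {

    // The nine sample points from the example.
    static final double[][] POINTS = {
        {1, 1}, {2, 1}, {1, 2}, {2, 2}, {3, 3},
        {8, 8}, {9, 8}, {8, 9}, {9, 9}
    };

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Returns, for each point, the index of the cluster it ends up in.
    static int[] cluster(double[][] points, int k, double delta, int maxIterations) {
        int dims = points[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = points[c].clone(); // seed with the first k points
        }
        int[] assignment = new int[points.length];

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest center.
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (distance(points[p], centers[c]) < distance(points[p], centers[best])) {
                        best = c;
                    }
                }
                assignment[p] = best;
            }

            // Update step: move each center to the mean of its points.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[p]][d] += points[p][d];
                }
            }
            double maxShift = 0;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its old center
                }
                for (int d = 0; d < dims; d++) {
                    sums[c][d] /= counts[c];
                }
                maxShift = Math.max(maxShift, distance(centers[c], sums[c]));
                centers[c] = sums[c];
            }

            // Stop once no center moved by delta or more.
            if (maxShift < delta) {
                break;
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        int[] result = cluster(POINTS, 2, 0.001, 10);
        for (int i = 0; i < result.length; i++) {
            System.out.println("vector:" + i + " belongs to cluster " + result[i]);
        }
    }
}
```

With these inputs the sketch converges after three passes, with centers at (1.8, 1.8) and (8.5, 8.5), and it assigns the first five points to cluster 0 and the last four to cluster 1, the same split as the Mahout run above.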