Mahout: a custom ClusterDumper that outputs only the cluster centers

Source: Internet
Author: User

Environment: Hadoop 1.0.4, Mahout 0.5.

Mahout ships a class for reading the results of its clustering algorithms, called ClusterDumper. Its output generally looks like this:

VL-2{n=6 c=[1.833, 2.417] r=[0.687, 0.344]}
    Weight:  Point:
    1.0: [1.000, 3.000]
    ...
    1.0: [3.000, 2.500]
VL-11{n=7 c=[2.857, 4.714] r=[0.990, 0.364]}
    Weight:  Point:
    1.0: [1.000, 5.000]
    ...
    1.0: [4.000, 4.500]
VL-14{n=8 c=[4.750, 3.438] r=[0.433, 0.682]}
    Weight:  Point:
    1.0: [4.000, 3.000]
    ...
    1.0: [5.000, 4.000]
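
If a dump in this format already exists, the center lines can also be picked out after the fact. The following is only a minimal sketch of that alternative, not something ClusterDumper provides. It assumes every cluster header line contains `{n=` and that point lines never do; the names `CenterLineFilter` and `centerLines` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CenterLineFilter {

    // Returns only the cluster header lines (identifier, n, c, r) from dump text.
    // Assumption: header lines contain "{n=" and point lines do not.
    static List<String> centerLines(String dumpText) {
        List<String> centers = new ArrayList<String>();
        for (String line : dumpText.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.contains("{n=")) {
                centers.add(trimmed);
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        String dump = "VL-2{n=6 c=[1.833, 2.417] r=[0.687, 0.344]}\n"
                + "    Weight:  Point:\n"
                + "    1.0: [1.000, 3.000]\n"
                + "VL-11{n=7 c=[2.857, 4.714] r=[0.990, 0.364]}\n"
                + "    Weight:  Point:\n"
                + "    1.0: [1.000, 5.000]\n";
        // Prints the two header lines and skips the weight/point lines.
        for (String center : centerLines(dump)) {
            System.out.println(center);
        }
    }
}
```

This works on text that has already been written, so it still pays the cost of dumping every point first; it is shown only to make the structure of the format concrete.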

However, if I only want the cluster centers written to a file, ClusterDumper cannot do that. I originally wanted to extend ClusterDumper, but it turns out to be a final class, so I gave up on that and wrote my own.


The relevant source code in ClusterDumper is as follows:

for (Cluster value :
     new SequenceFileDirValueIterable<Cluster>(new Path(seqFileDir, "part-*"), PathType.GLOB, conf)) {
  String fmtStr = value.asFormatString(dictionary);
  if (subString > 0 && fmtStr.length() > subString) {
    writer.write(':');
    writer.write(fmtStr, 0, Math.min(subString, fmtStr.length()));
  } else {
    writer.write(fmtStr);
  }
  // ... (rest of the loop body omitted)
}

Alternatively, see my earlier post, "Mahout source code: KMeansDriver analysis, part 2: center point file analysis", which also covers reading the cluster centers.
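
The `if`/`else` branch in the ClusterDumper excerpt above is the only part that shapes the text of each center line, so it is worth seeing in isolation. The sketch below reproduces just that branch with a `StringWriter`; the class `TruncateSketch`, the method `format` and the parameter name `maxLength` (standing in for `subString`) are hypothetical names, and nothing here calls Mahout.

```java
import java.io.StringWriter;

public class TruncateSketch {

    // Same branch as in the excerpt: when maxLength is positive and the text
    // is longer than it, write ':' followed by the first maxLength characters;
    // otherwise write the text unchanged.
    static String format(String text, int maxLength) {
        StringWriter writer = new StringWriter();
        if (maxLength > 0 && text.length() > maxLength) {
            writer.write(':');
            writer.write(text, 0, Math.min(maxLength, text.length()));
        } else {
            writer.write(text);
        }
        return writer.toString();
    }

    public static void main(String[] args) {
        String center = "VL-2{n=6 c=[1.833, 2.417] r=[0.687, 0.344]}";
        System.out.println(format(center, 4));   // truncated, with a leading ':'
        System.out.println(format(center, 0));   // 0 disables truncation
    }
}
```

Since a non-positive limit falls through to the `else` branch, the full center string is written whenever no limit is set, which is the behaviour the custom class further down relies on by simply writing `fmtStr` as-is.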

Based on that, you can write a ClusterCenterDump class as follows:

package com.caic.cloud.util;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.Writer;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.clustering.Cluster;
import org.apache.mahout.common.iterator.sequencefile.PathType;
import org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable;

import com.google.common.base.Charsets;
import com.google.common.io.Files;

/**
 * Just output the center vector to a given file
 * @author fansy
 */
public class ClusterCenterDump {
    private Log log = LogFactory.getLog(ClusterCenterDump.class);
    private Configuration conf;
    private Path centerPathDir;
    private String outputPath;

    /*public ClusterCenterDump() {}
    public ClusterCenterDump(Configuration conf) {
        this.conf = conf;
    }*/
    public ClusterCenterDump(Configuration conf, String centerPathDir, String outputPath) {
        this.conf = conf;
        this.centerPathDir = new Path(centerPathDir);
        this.setOutputPath(outputPath);
    }

    /**
     * Write the given cluster center to the given file
     * @return
     * @throws FileNotFoundException
     */
    public boolean writeCenterToLocal() throws FileNotFoundException {
        if (this.conf == null || this.outputPath == null || this.centerPathDir == null) {
            log.info("error:\nshould initial the configuration, outputPath and centerPath");
            return false;
        }
        Writer writer = null;
        try {
            File outputFile = new File(outputPath);
            writer = Files.newWriter(outputFile, Charsets.UTF_8);
            this.writeTxtCenter(writer,
                new SequenceFileDirValueIterable<Cluster>(new Path(centerPathDir, "part-*"), PathType.GLOB, conf));
            // new SequenceFileDirValueIterable<Writable>(new Path(centerPathDir, "part-r-00000"), PathType.LIST,
            //         PathFilters.partFilter(), conf));
            writer.flush();
        } catch (IOException e) {
            log.info("write error:\n" + e.getMessage());
            return false;
        } finally {
            try {
                if (writer != null) {
                    writer.close();
                }
            } catch (IOException e) {
                log.info("close writer error:\n" + e.getMessage());
            }
        }
        return true;
    }

    /**
     * Write the cluster to writer
     * @param writer
     * @param cluster
     * @return
     * @throws IOException
     */
    private boolean writeTxtCenter(Writer writer, Iterable<Cluster> clusters) throws IOException {
        for (Cluster cluster : clusters) {
            String fmtStr = cluster.asFormatString(null);
            System.out.println("fmtStr:" + fmtStr);
            writer.write(fmtStr);
            writer.write("\n");
        }
        return true;
    }

    public Configuration getConf() { return conf; }
    public void setConf(Configuration conf) { this.conf = conf; }
    public Path getCenterPathDir() { return centerPathDir; }
    public void setCenterPathDir(Path centerPathDir) { this.centerPathDir = centerPathDir; }
    /** @return the outputPath */
    public String getOutputPath() { return outputPath; }
    /** @param outputPath the outputPath to set */
    public void setOutputPath(String outputPath) { this.outputPath = outputPath; }
}
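
The explicit `finally` block in `writeCenterToLocal` exists to close the writer on every path. On Java 7 or later, try-with-resources gives the same guarantee with less code. The sketch below shows only that pattern, with plain strings standing in for the values returned by `asFormatString`; `LineDump` and `writeLines` are hypothetical names, it uses `java.nio.file` rather than Guava, and no Mahout or Hadoop classes are involved.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class LineDump {

    // Writes one line per entry and reports success as a boolean,
    // following the return convention of writeCenterToLocal.
    static boolean writeLines(Path outputFile, Iterable<String> lines) {
        // The writer is closed automatically, whether or not an exception is thrown.
        try (Writer writer = Files.newBufferedWriter(outputFile, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.write("\n");
            }
            return true;
        } catch (IOException e) {
            System.err.println("write error: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("centers", ".txt");
        List<String> centers = Arrays.asList(
                "VL-2{n=6 c=[1.833, 2.417] r=[0.687, 0.344]}",
                "VL-11{n=7 c=[2.857, 4.714] r=[0.990, 0.364]}");
        System.out.println("written: " + writeLines(out, centers));
        System.out.println("lines in file: " + Files.readAllLines(out, StandardCharsets.UTF_8).size());
    }
}
```

Whether this applies depends on the Java version the cluster runs; with an older JDK, the `finally` form used in the class is the way to go.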

Here is a test class:
