Mahout Source Code Analysis: DistributedLanczosSolver (VII), Summary


Mahout version: 0.7, Hadoop version: 1.0.4, JDK: 1.7.0_25 64-bit.

The SVD examples I found online run the computation on the Amazon cloud platform, but they only show how to invoke the SVD algorithm. Once you have the eigenvectors, what should you do with them? For example, if the original data is 600*60 (600 rows, 60 columns) and the computed eigenvectors form a 24*60 matrix (where 24 is a value no greater than the requested rank), then the final result should be original_data multiplied by the transpose of the eigenvectors. This gives a 600*24 matrix, which achieves the goal of dimensionality reduction.
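
The arithmetic behind this reduction can be sketched in plain Java. The following is a minimal illustration, not the code shipped in svd.jar: the class name ProjectionSketch and the method project are hypothetical, and small dense arrays stand in for the distributed matrices. It multiplies an n*d data matrix by the transpose of a k*d eigenvector matrix, which yields an n*k result (600*24 for the shapes mentioned above).

```java
/** Hypothetical helper: computes data x eigenvectors^T with plain arrays. */
final class ProjectionSketch {

    /**
     * @param data         n x d matrix, one sample per row
     * @param eigenvectors k x d matrix, one eigenvector per row
     * @return n x k matrix; entry [i][j] is the dot product of row i and eigenvector j
     */
    static double[][] project(double[][] data, double[][] eigenvectors) {
        int n = data.length;
        int k = eigenvectors.length;
        double[][] reduced = new double[n][k];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < k; j++) {
                if (data[i].length != eigenvectors[j].length) {
                    throw new IllegalArgumentException("column counts differ");
                }
                double dot = 0.0;
                for (int c = 0; c < data[i].length; c++) {
                    dot += data[i][c] * eigenvectors[j][c];
                }
                reduced[i][j] = dot;
            }
        }
        return reduced;
    }

    public static void main(String[] args) {
        // Zero-filled placeholders with the shapes used in the example.
        double[][] data = new double[600][60];
        double[][] eigenvectors = new double[24][60];
        double[][] reduced = project(data, eigenvectors);
        System.out.println(reduced.length + "*" + reduced[0].length); // prints 600*24
    }
}
```

Each output row depends only on one input row and the eigenvector matrix, which is why this step fits a map-only job once the eigenvectors are available.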

This article describes an SVD tool class that can be used directly. It can be downloaded from http://download.csdn.net/detail/fansy1990/6479451.

The download contains three files: the synthetic_control.data data file, svd.jar, and crunch-0.5.0-incubating.jar (to be placed under the lib directory of the cluster).

How to run:

1) Put crunch-0.5.0-incubating.jar under Hadoop's lib directory, then restart the cluster;

2) Upload the synthetic_control.data file to HDFS;

3) Run svd.jar, using the following command as a reference:

~/hadoop-1.0.4/bin$ ./hadoop jar ./../mahout_jar/svd.jar mahout.fansy.svd.SvdRunner -i /svd/input/ -o /svd/output1 -nr 600 -nc 60 -r 3 -sym square --cleansvd true --tempDir /svd/temp

The design and the source code are as follows:

1. The input data is text, so it must first be converted to VectorWritable format and stored as a temporary input, prepareVector;

2. Run the SVD algorithm on the converted data to obtain the eigenvectors;

3. Compute the final transformed result, transVector, from prepareVector and the eigenvectors.

The final result of the algorithm is in output/transformedVector.
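
Step 1 depends on turning each text line into a numeric vector. The source of the vector-preparation job is not shown in this article, so the following is only a sketch of the parsing step under stated assumptions: LineParserSketch and parseLine are hypothetical names, the splitter is treated as a regular expression (a comma by default, as in the main class), and the expected column count plays the role of the -nc argument. In the real job the resulting values would presumably be wrapped in a Mahout vector and written out as VectorWritable.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

/** Hypothetical helper: parses one text line into a fixed-length double array. */
final class LineParserSketch {

    /**
     * @param line          one input record, e.g. "1.1,2.4,1,4.3"
     * @param splitterRegex regular expression that separates the fields
     * @param numCols       expected number of columns
     */
    static double[] parseLine(String line, String splitterRegex, int numCols) {
        // trim() avoids an empty leading field when the line starts with the separator
        String[] fields = Pattern.compile(splitterRegex).split(line.trim());
        if (fields.length != numCols) {
            throw new IllegalArgumentException(
                "expected " + numCols + " columns but found " + fields.length);
        }
        double[] values = new double[numCols];
        for (int i = 0; i < numCols; i++) {
            values[i] = Double.parseDouble(fields[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        double[] row = parseLine("1.1,2.4,1,4.3", ",", 4);
        System.out.println(Arrays.toString(row)); // prints [1.1, 2.4, 1.0, 4.3]
    }
}
```

Rejecting lines with the wrong column count early is a design choice of this sketch; it keeps a malformed record from surfacing later as a dimension mismatch inside the SVD job.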

Main class:

package mahout.fansy.svd;

import java.io.IOException;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.common.AbstractJob;
import org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver;
import org.apache.mahout.math.hadoop.decomposer.EigenVerificationJob;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * SVD runner used to run the SVD algorithm on input like:<br>
 * 1.1,2.4,1,4.3,...<br>
 * ...<br>
 * and reduce its dimension.<br>
 * There are three jobs in this section:<br>
 * (1) prepare the text input as vectors, which the second job needs;<br>
 * (2) run the DistributedLanczosSolver job, which is the same as in Mahout;<br>
 * (3) dimension reduction: transform the input to the reduced output.<br>
 *
 * @author fansy
 */
public class SvdRunner extends AbstractJob {

    private static final String prepareVectorPath = "/prepareVector";
    private static final String transformedVector = "/transformedVector";

    private Map<String, List<String>> parsedArgs;

    private static final Logger log = LoggerFactory.getLogger(SvdRunner.class);

    @Override
    public int run(String[] args) throws Exception {
        if (prepareArgs(args) != 0) {
            return -1;
        }

        // Job 1: convert the text input to vectors
        log.info("prepare Vector job begins...");
        String inputPath = AbstractJob.getOption(parsedArgs, "--input");
        String outputPath = AbstractJob.getOption(parsedArgs, "--tempDir") + SvdRunner.prepareVectorPath;
        String regex = ",";
        if (AbstractJob.getOption(parsedArgs, "--splitterPattern") != null) {
            regex = AbstractJob.getOption(parsedArgs, "--splitterPattern");
        }
        String column = AbstractJob.getOption(parsedArgs, "--numCols");
        String[] job1Args = new String[]{"-i", inputPath, "-o", outputPath, "-regex", regex, "-nc", column};
        int job1Result = ToolRunner.run(getConf(), new PrepareSvdVector(), job1Args);
        if (job1Result != 0) {
            return -1;
        }

        // Job 2: run the Lanczos solver on the prepared vectors
        log.info("svd algorithm job begins...");
        // replace the input path with the output of job 1
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-i") || args[i].equals("--input")) {
                args[i + 1] = AbstractJob.getOption(parsedArgs, "--tempDir") + SvdRunner.prepareVectorPath;
                break;
            }
        }
        int job2Result = ToolRunner.run(new DistributedLanczosSolver().job(), args);
        if (job2Result != 0) {
            return -1;
        }

        // Job 3: project the prepared vectors onto the clean eigenvectors
        log.info("transform job begins...");
        inputPath = outputPath;
        outputPath = AbstractJob.getOption(parsedArgs, "--output") + SvdRunner.transformedVector;
        String eigenPath = AbstractJob.getOption(parsedArgs, "--output") + "/" + EigenVerificationJob.CLEAN_EIGENVECTORS;
        String[] job3Args = new String[]{"-i", inputPath, "-o", outputPath, "-nc", column, "-e", eigenPath};
        int job3Result = ToolRunner.run(getConf(), new SvdReductionTranform(), job3Args);
        if (job3Result != 0) {
            return -1;
        }
        return 0;
    }

    /**
     * Prepare arguments.
     * @param args input arguments
     * @return 0 if nothing is wrong
     * @throws IOException
     */
    private int prepareArgs(String[] args) throws IOException {
        addInputOption();
        addOutputOption();
        addOption("numRows", "nr", "Number of rows of the input matrix");
        addOption("numCols", "nc", "Number of columns of the input matrix");
        addOption("rank", "r", "Desired decomposition rank (note: only roughly 1/4 to 1/3 "
            + "of these will have the top portion of the spectrum)");
        addOption("symmetric", "sym", "Is the input matrix square and symmetric?");
        addOption("workingDir", "wd", "Working directory path to store Lanczos basis vectors "
            + "(to be used on restarts, and to avoid too much RAM usage)");
        // options required to run the cleansvd job
        addOption("cleansvd", "cl", "Run the EigenVerificationJob to clean the eigenvectors after SVD", false);
        addOption("maxError", "err", "Maximum acceptable error", "0.05");
        addOption("minEigenvalue", "mev", "Minimum eigenvalue to keep the vector for", "0.0");
        addOption("inMemory", "mem", "Buffer eigen matrix into memory (if you have enough!)", "false");

        addOption("splitterPattern", "regex", "The char used to split the input text. Default value: \",\"", false);
        this.parsedArgs = parseArguments(args);
        if (this.parsedArgs == null) {
            return -1;
        } else {
            return 0;
        }
    }

    /**
     * SvdRunner main
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        ToolRunner.run(new Configuration(), new SvdRunner(), args);
    }
}
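
The second job reuses the original command-line arguments and only swaps the input path for the prepared vectors. That step can be isolated as shown here. This is a sketch rather than part of svd.jar: ArgsRewriteSketch and withInput are hypothetical names, the paths are example values, and unlike the loop in the main class it works on a copy so the caller's array stays untouched and it never reads past the end of the array when the flag is the last argument.

```java
import java.util.Arrays;

/** Hypothetical helper: returns a copy of args with the input path replaced. */
final class ArgsRewriteSketch {

    static String[] withInput(String[] args, String newInput) {
        String[] copy = Arrays.copyOf(args, args.length);
        // i + 1 < length guarantees that a value follows the flag
        for (int i = 0; i + 1 < copy.length; i++) {
            if (copy[i].equals("-i") || copy[i].equals("--input")) {
                copy[i + 1] = newInput;
                break; // only the first occurrence is replaced
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        String[] original = {"-i", "/svd/input/", "-o", "/svd/output1", "--tempDir", "/svd/temp"};
        String[] rewritten = withInput(original, "/svd/temp/prepareVector");
        System.out.println(String.join(" ", rewritten));
        // prints -i /svd/temp/prepareVector -o /svd/output1 --tempDir /svd/temp
    }
}
```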
