Mahout Source Code Analysis of DistributedLanczosSolver (III): Job 2

Source: Internet
Author: User
Tags: square root

Mahout version: 0.7; Hadoop version: 1.0.4; JDK: 1.7.0_25 (64-bit).

1. Prelude:

This article continues the analysis of LanczosSolver from the following line:

Vector nextVector = isSymmetric ? corpus.times(currentVector) : corpus.timesSquared(currentVector);

As the previous article explained, this line launches a job that computes nextVector according to a fixed algorithm. What happens next?

if (state.getScaleFactor() <= 0) {
  state.setScaleFactor(calculateScaleFactor(nextVector));
}

The code first checks whether getScaleFactor() returns a value less than or equal to 0. Because scaleFactor is initialized to 0, the first pass calls calculateScaleFactor(nextVector):

protected double calculateScaleFactor(Vector nextVector) {
  return nextVector.norm(2);
}

How is this computed? Inside the norm function, a parameter of 2 takes this branch: else if (power == 2.0) { return Math.sqrt(dotSelf()); } So the scale factor is the dot product of the first nextVector with itself, followed by a square root. (In my test run this value was 2029123.4011255247; the Excel calculation gave 2034667.82368468.) Next:

nextVector.assign(new Scale(1.0 / state.getScaleFactor()));

Multiplying nextVector by 1/scaleFactor is the same as dividing it by scaleFactor, presumably because the raw values are too large and need to be scaled down. After this step, nextVector becomes:

{0:0.011875906226907599,1:0.0017759586067652153,2:0.0021729514771005837,3:0.014292365192727802,4:0.09660595016979406,5:0.002638859113021243,6:0.0026868791091140517,7:2.476888783392492e-4,8:0.001831833994868574,9:0.005012618192500366,10:8.604490527160895e-4,11:0.0029456317791350514,12:0.9951190694939772}

The Excel calculation gives:

0.0118771 0.001776226 0.002173228 0.01429431 0.096617439 0.00263899 0.00268705 0 0 0.0050126 0 0 0.99511791

As you can see, because of precision loss the relatively small entries show up directly as 0 in Excel.
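The norm and scaling steps above can be sketched in plain Java as follows. This is only an illustration on arrays, not Mahout's implementation: the class and method names are made up for this example, and the two-element input is toy data rather than the vector from the test run.

```java
import java.util.Arrays;

final class NormScaleSketch {
    // Euclidean norm: square root of the vector's dot product with itself.
    static double norm2(double[] v) {
        double dotSelf = 0.0;
        for (double x : v) {
            dotSelf += x * x;
        }
        return Math.sqrt(dotSelf);
    }

    // Multiply every entry by 1/scaleFactor, i.e. divide it by scaleFactor.
    static double[] scale(double[] v, double scaleFactor) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] * (1.0 / scaleFactor);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] v = {3.0, 4.0};       // toy data, not the article's vector
        double factor = norm2(v);      // sqrt(9 + 16)
        System.out.println("scale factor = " + factor);
        System.out.println(Arrays.toString(scale(v, factor)));
    }
}
```

After scaling by its own 2-norm, a vector has length 1, which is consistent with the scaled nextVector above being dominated by a single entry close to 1.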

The next step is to update nextVector:

double alpha = currentVector.dot(nextVector);
nextVector.assign(currentVector, new PlusMult(-alpha));

First, alpha is computed as the dot product of currentVector and nextVector. Then each entry of nextVector is replaced by that entry minus alpha times the corresponding entry of currentVector. In the test run above, alpha is 0.315642761491587; Excel gives 0.31564687543564, which is very close. nextVector then becomes:

{0:-0.07566764464132066,1:-0.08576759226146304,2:-0.08537059939112766,3:-0.07325118567550044,4:0.009062399301565813,5:-0.08490469175520701,6:-0.0848566717591142,7:-0.087295861989889,8:-0.08571171687335968,9:-0.08253093267572789,10:-0.08668310181551216,11:-0.0845979190890932,12:0.9075755186257489}

The values in Excel are:

-0.075668 -0.08576847 -0.08537146 -0.073250382 0.009072747 -0.0849057 -0.0848576 -0.1 -0.1 -0.082532 -0.1 -0.1 0.90757322
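The alpha step can be reproduced outside Mahout with a small sketch. It uses plain arrays instead of Mahout's Vector, and the helper names are made up for this example. It also assumes that currentVector in this first iteration is the 13-entry vector whose entries all equal 1/sqrt(13); the scaled nextVector values are copied from the output printed earlier.

```java
import java.util.Arrays;

final class AlphaUpdateSketch {
    // nextVector after scaling, copied from the values printed in the article.
    static final double[] SCALED_NEXT = {
        0.011875906226907599, 0.0017759586067652153, 0.0021729514771005837,
        0.014292365192727802, 0.09660595016979406, 0.002638859113021243,
        0.0026868791091140517, 2.476888783392492e-4, 0.001831833994868574,
        0.005012618192500366, 8.604490527160895e-4, 0.0029456317791350514,
        0.9951190694939772
    };

    // Assumption: currentVector has n equal entries of 1/sqrt(n).
    static double[] uniformUnitVector(int n) {
        double[] v = new double[n];
        Arrays.fill(v, 1.0 / Math.sqrt(n));
        return v;
    }

    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // In place: target[i] = target[i] + multiplier * other[i].
    static void plusMult(double[] target, double[] other, double multiplier) {
        for (int i = 0; i < target.length; i++) {
            target[i] += multiplier * other[i];
        }
    }

    public static void main(String[] args) {
        double[] next = SCALED_NEXT.clone();
        double[] current = uniformUnitVector(next.length);
        double alpha = dot(current, next);
        plusMult(next, current, -alpha);   // subtract alpha * currentVector
        System.out.println("alpha = " + alpha);
        System.out.println(Arrays.toString(next));
    }
}
```

Under that assumption the sketch should land close to the alpha and nextVector values reported above, which is a useful cross-check on the Excel figures.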

Followed by:

endTime(TimingSection.ITERATE);
startTime(TimingSection.ORTHOGANLIZE);
orthoganalizeAgainstAllButLast(nextVector, state);
endTime(TimingSection.ORTHOGANLIZE);

endTime and startTime appear to be only timing-related bookkeeping, so skip them and look directly at the orthoganalizeAgainstAllButLast function:

protected void orthoganalizeAgainstAllButLast(Vector nextVector, LanczosState state) {
  for (int i = 0; i < state.getIterationNumber(); i++) {
    Vector basisVector = state.getBasisVector(i);
    double alpha;
    // Skip missing basis vectors and those already orthogonal to nextVector.
    if (basisVector == null || (alpha = nextVector.dot(basisVector)) == 0.0) {
      continue;
    }
    nextVector.assign(basisVector, new PlusMult(-alpha));
  }
}

This function updates nextVector using the basis vectors: from each entry it subtracts the corresponding entry of basisVector multiplied by the dot product of nextVector and basisVector. On the first pass there is only one basis vector, the initial vector of 13 entries that all equal 1/sqrt(13), so the updated nextVector is:

{0:-0.07566764464132064,1:-0.08576759226146302,2:-0.08537059939112765,3:-0.07325118567550043,4:0.009062399301565828,5:-0.084904691755207,6:-0.08485667175911418,7:-0.08729586198988899,8:-0.08571171687335967,9:-0.08253093267572788,10:-0.08668310181551214,11:-0.08459791908909318,12:0.9075755186257489}
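The orthogonalization loop can be illustrated with a minimal sketch on plain arrays. This is a Gram-Schmidt style projection removal written for this example only; the class name, the method names and the two-dimensional toy data are assumptions, not part of Mahout.

```java
import java.util.Arrays;

final class OrthogonalizeSketch {
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Remove from next its component along each (unit-length) basis vector.
    static void orthogonalizeAgainst(double[] next, double[][] basis) {
        for (double[] basisVector : basis) {
            if (basisVector == null) {
                continue;                       // nothing stored for this slot
            }
            double alpha = dot(next, basisVector);
            if (alpha == 0.0) {
                continue;                       // already orthogonal
            }
            for (int i = 0; i < next.length; i++) {
                next[i] -= alpha * basisVector[i];
            }
        }
    }

    public static void main(String[] args) {
        double r = 1.0 / Math.sqrt(2.0);
        double[][] basis = {{r, r}};            // toy unit-length basis vector
        double[] next = {1.0, 0.0};             // toy vector to clean up
        orthogonalizeAgainst(next, basis);
        System.out.println(Arrays.toString(next));
        System.out.println("remaining overlap = " + dot(next, basis[0]));
    }
}
```

When the vector is already nearly orthogonal to the basis vector, alpha is tiny and the update changes only the last few digits, which matches what the printed vector shows.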

This is essentially the same as before, because the dot product of nextVector and basisVector is very small. Next comes:

beta = nextVector.norm(2);

This function was analyzed earlier: it takes the dot product of nextVector with itself and then the square root, which gives a beta of 0.9488780991876485. The code then checks whether alpha or beta is out of range, as follows:

if (outOfRange(beta) || outOfRange(alpha)) {
  log.warn("Lanczos parameters out of range: alpha = {}, beta = {}.  Bailing out early!",
      alpha, beta);
  break;
}
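The beta value and the range check can be tried out with the sketch below. The article does not show the body of outOfRange, so the version here is an assumption: it treats NaN and very large magnitudes as out of range, and the SAFE_MAX threshold is a hypothetical value chosen for illustration. The vector is the orthogonalized nextVector printed earlier.

```java
final class RangeCheckSketch {
    // Hypothetical threshold for illustration; not taken from the article.
    static final double SAFE_MAX = 1.0e150;

    // nextVector after orthogonalization, copied from the article's output.
    static final double[] ORTHOGONALIZED_NEXT = {
        -0.07566764464132064, -0.08576759226146302, -0.08537059939112765,
        -0.07325118567550043, 0.009062399301565828, -0.084904691755207,
        -0.08485667175911418, -0.08729586198988899, -0.08571171687335967,
        -0.08253093267572788, -0.08668310181551214, -0.08459791908909318,
        0.9075755186257489
    };

    // Euclidean norm: square root of the dot product of v with itself.
    static double norm2(double[] v) {
        double dotSelf = 0.0;
        for (double x : v) {
            dotSelf += x * x;
        }
        return Math.sqrt(dotSelf);
    }

    // Assumed behaviour: reject NaN and values whose magnitude is too large.
    static boolean outOfRange(double d) {
        return Double.isNaN(d) || d > SAFE_MAX || -d > SAFE_MAX;
    }

    public static void main(String[] args) {
        double beta = norm2(ORTHOGONALIZED_NEXT);
        System.out.println("beta = " + beta);
        System.out.println("beta out of range? " + outOfRange(beta));
    }
}
```

With values like the ones in this test run, neither alpha nor beta comes anywhere near such a threshold, so the loop would continue rather than bail out.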
