Mahout version: 0.7, Hadoop version: 1.0.4, JDK: 1.7.0_25 64-bit.
1. Prelude:
This chapter continues the analysis of LanczosSolver, picking up at this line:
Vector nextVector = isSymmetric ? corpus.times(currentVector) : corpus.timesSquared(currentVector);
The previous article explained that this launches a job that computes nextVector according to a certain algorithm. What happens next?
if (state.getScaleFactor() <= 0) {
  state.setScaleFactor(calculateScaleFactor(nextVector));
}
The code first checks whether getScaleFactor() returns a value less than or equal to 0. Because scaleFactor is initialized to 0, the check succeeds on the first pass and calculateScaleFactor(nextVector) is called:
protected double calculateScaleFactor(Vector nextVector) {
  return nextVector.norm(2);
}
How is this calculated? The norm function runs the following branch when its parameter is 2:
else if (power == 2.0) { return Math.sqrt(dotSelf()); }
So the first nextVector is dotted with itself, and the square root of the result is taken. (In my test this value was 2029123.4011255247; the value calculated in Excel was 2034667.82368468.)
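As a rough illustration of that branch, here is a minimal plain-Java sketch. It is not Mahout code: it assumes the vector is stored as a double[], and NormSketch, dotSelf and norm2 are hypothetical names chosen only for this example.

```java
// Plain-Java sketch of "dot the vector with itself, then take the square root".
// NormSketch, dotSelf and norm2 are hypothetical names, not Mahout classes or methods.
class NormSketch {
    // Sum of the squares of all entries (the vector dotted with itself).
    static double dotSelf(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return sum;
    }

    // Euclidean (L2) norm: square root of the self dot product.
    static double norm2(double[] v) {
        return Math.sqrt(dotSelf(v));
    }

    public static void main(String[] args) {
        double[] v = {3.0, 4.0};
        System.out.println(norm2(v)); // 3*3 + 4*4 = 25, so this prints 5.0
    }
}
```

The same computation is what the Excel cross-check reproduces: square every entry, add them up, and take the square root.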
Next, nextVector is scaled:
nextVector.assign(new Scale(1.0 / state.getScaleFactor()));
Multiplying nextVector by 1 / scaleFactor is the same as dividing it by scaleFactor, presumably because the raw values are too large and need to be brought down. After this step, nextVector becomes:
{0:0.011875906226907599,1:0.0017759586067652153,2:0.0021729514771005837,3:0.014292365192727802,4:0.09660595016979406,5:0.002638859113021243,6:0.0026868791091140517,7:2.476888783392492e-4,8:0.001831833994868574,9:0.005012618192500366,10:8.604490527160895e-4,11:0.0029456317791350514,12:0.9951190694939772}
In Excel the values are:
0.0118771 0.001776226 0.002173228 0.01429431 0.096617439 0.00263899 0.00268705 0 0 0.0050126 0 0 0.99511791
As you can see, because of rounding error the relatively small values show up as 0 in Excel.
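To make the scaling step concrete, here is a minimal plain-Java sketch. It is not Mahout code: it assumes a vector stored as a double[], and ScaleSketch and scale are hypothetical names. It mirrors the effect of new Scale(1.0 / scaleFactor) by multiplying every entry by the reciprocal of the scale factor.

```java
import java.util.Arrays;

// Plain-Java sketch of scaling a vector by 1 / scaleFactor.
// ScaleSketch and scale are hypothetical names, not part of Mahout.
class ScaleSketch {
    // Multiply every entry by the same factor and return the result as a new array.
    static double[] scale(double[] v, double factor) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] * factor;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] v = {3.0, 4.0};
        // Here the scale factor is the L2 norm of v itself, as on the first pass.
        double scaleFactor = Math.sqrt(3.0 * 3.0 + 4.0 * 4.0);
        double[] scaled = scale(v, 1.0 / scaleFactor);
        System.out.println(Arrays.toString(scaled));
    }
}
```

When the scale factor is the vector's own L2 norm, the scaled vector has a norm close to 1, which matches the test output above, where the entries are all below 1 in magnitude.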
The next step is to update nextVector:
double alpha = currentVector.dot(nextVector);
nextVector.assign(currentVector, new PlusMult(-alpha));
First, alpha is computed as the dot product of currentVector and nextVector. Then each entry of nextVector is replaced by that entry minus alpha times the corresponding entry of currentVector. In the test above, alpha is 0.315642761491587 and Excel gives 0.31564687543564, which is very close. nextVector then becomes:
{0:-0.07566764464132066,1:-0.08576759226146304,2:-0.08537059939112766,3:-0.07325118567550044,4:0.009062399301565813,5:-0.08490469175520701,6:-0.0848566717591142,7:-0.087295861989889,8:-0.08571171687335968,9:-0.08253093267572789,10:-0.08668310181551216,11:-0.0845979190890932,12:0.9075755186257489}
The values in Excel are:
-0.075668 -0.08576847 -0.08537146 -0.073250382 0.009072747 -0.0849057 -0.0848576 -0.1 -0.1 -0.082532 -0.1 -0.1 0.90757322
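The alpha update above can be sketched in plain Java as follows. This is an illustration only, not Mahout code: vectors are assumed to be double[] arrays, and ProjectionSketch, dot and plusMult are hypothetical names. plusMult imitates the way PlusMult is used here, adding the other vector's entry times a multiplier to each entry.

```java
import java.util.Arrays;

// Plain-Java sketch of: alpha = current . next; next = next - alpha * current.
// ProjectionSketch, dot and plusMult are hypothetical names, not Mahout APIs.
class ProjectionSketch {
    // Dot product of two vectors of equal length.
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // In-place update: target[i] = target[i] + other[i] * multiplier.
    static void plusMult(double[] target, double[] other, double multiplier) {
        for (int i = 0; i < target.length; i++) {
            target[i] += other[i] * multiplier;
        }
    }

    public static void main(String[] args) {
        double[] currentVector = {1.0, 0.0};
        double[] nextVector = {2.0, 3.0};
        double alpha = dot(currentVector, nextVector);    // 1*2 + 0*3 = 2.0
        plusMult(nextVector, currentVector, -alpha);      // subtract alpha * currentVector
        System.out.println(alpha);                        // prints 2.0
        System.out.println(Arrays.toString(nextVector));  // prints [0.0, 3.0]
    }
}
```

When currentVector has unit length, as in this toy case, the updated nextVector has no component left along currentVector, so their dot product is 0.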
This is followed by:
endTime(TimingSection.ITERATE);
startTime(TimingSection.ORTHOGANLIZE);
orthoganalizeAgainstAllButLast(nextVector, state);
endTime(TimingSection.ORTHOGANLIZE);
endTime and startTime appear to be only timing bookkeeping, so we can ignore them and look directly at the orthoganalizeAgainstAllButLast function:
protected void orthoganalizeAgainstAllButLast(Vector nextVector, LanczosState state) {
  for (int i = 0; i < state.getIterationNumber(); i++) {
    Vector basisVector = state.getBasisVector(i);
    double alpha;
    // Skip missing basis vectors and those already orthogonal to nextVector.
    if (basisVector == null || (alpha = nextVector.dot(basisVector)) == 0.0) {
      continue;
    }
    nextVector.assign(basisVector, new PlusMult(-alpha));
  }
}
This function updates nextVector using the basis vectors: for each basisVector, it subtracts from every entry of nextVector the corresponding entry of basisVector multiplied by the dot product of nextVector and basisVector. On the first pass there is only one basis vector, the initial vector, whose 13 entries are all 1 divided by the square root of 13. The updated nextVector is:
{0:-0.07566764464132064,1:-0.08576759226146302,2:-0.08537059939112765,3:-0.07325118567550043,4:0.009062399301565828,5:-0.084904691755207,6:-0.08485667175911418,7:-0.08729586198988899,8:-0.08571171687335967,9:-0.08253093267572788,10:-0.08668310181551214,11:-0.08459791908909318,12:0.9075755186257489}
This is almost the same as before, because the dot product of nextVector and basisVector is very small.
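The loop over the basis vectors can be pictured with the following plain-Java sketch. It is an illustration under assumptions, not the Mahout implementation: the basis is assumed to be a double[][] whose rows may be null, and OrthogonalizeSketch and orthogonalizeAgainstAll are hypothetical names.

```java
import java.util.Arrays;

// Plain-Java sketch of removing from nextVector its component along each basis vector.
// OrthogonalizeSketch and orthogonalizeAgainstAll are hypothetical names, not Mahout APIs.
class OrthogonalizeSketch {
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // For every non-null basis vector b: next = next - (next . b) * b, done in place.
    static void orthogonalizeAgainstAll(double[] next, double[][] basis) {
        for (double[] b : basis) {
            if (b == null) {
                continue; // no basis vector stored at this index
            }
            double alpha = dot(next, b);
            if (alpha == 0.0) {
                continue; // already orthogonal, nothing to subtract
            }
            for (int i = 0; i < next.length; i++) {
                next[i] += b[i] * (-alpha);
            }
        }
    }

    public static void main(String[] args) {
        double[][] basis = {{1.0, 0.0, 0.0}, null, {0.0, 1.0, 0.0}};
        double[] next = {1.0, 2.0, 3.0};
        orthogonalizeAgainstAll(next, basis);
        System.out.println(Arrays.toString(next)); // prints [0.0, 0.0, 3.0]
    }
}
```

In the test run above the correction is tiny because nextVector had already been made nearly orthogonal to the only basis vector by the preceding alpha update, so only floating-point residue remains to be removed.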
Next comes:
beta = nextVector.norm(2);
This function was analyzed above: nextVector is dotted with itself and the square root is taken, which gives a beta value of 0.9488780991876485. The code then checks whether alpha and beta are out of range, as follows:
if (outOfRange(beta) || outOfRange(alpha)) {
  log.warn("Lanczos parameters out of range: alpha = {}, beta = {}.  Bailing out early!",
      alpha, beta);
  break;