The 1NN (one-nearest-neighbor) algorithm, implemented in Weka as IB1 (short for Instance-Based learner 1), is a lazy learning algorithm that classifies a sample by the class of its single nearest neighbor.
The following is a walkthrough of the IB1 source code in Weka.
First, add weka-src.jar to the build path; otherwise you cannot step into the source code.
1) Read the data, call the IB1 classifier, and print the predictions. Trace the following code:
try {
    File file = new File("F:\\tools/lib/data/contact-lenses.arff");
    ArffLoader loader = new ArffLoader();
    loader.setFile(file);
    ins = loader.getDataSet();
    // Be sure to set the Instances' classIndex before using the data set,
    // otherwise the Instances object will throw an exception
    ins.setClassIndex(ins.numAttributes() - 1);
    cfs = new IB1();
    cfs.buildClassifier(ins);
    Instance testInst;
    Evaluation testingEvaluation = new Evaluation(ins);
    int length = ins.numInstances();
    for (int i = 0; i < length; i++) {
        testInst = ins.instance(i);
        // Test the classifier's output on each sample
        double predictValue = cfs.classifyInstance(testInst);
        System.out.println(testInst.classValue() + "--" + predictValue);
    }
    // System.out.println("correct rate of the classifier: " + (1 - testingEvaluation.errorRate()));
} catch (Exception e) {
    e.printStackTrace();
}
2) Ctrl-click buildClassifier to trace further into its source. The IB1 class overrides this abstract method; the source is:
public void buildClassifier(Instances instances) throws Exception {
    // can classifier handle the data?
    getCapabilities().testWithFail(instances);

    // remove instances with missing class
    instances = new Instances(instances);
    instances.deleteWithMissingClass();

    m_Train = new Instances(instances, 0, instances.numInstances());

    m_MinArray = new double[m_Train.numAttributes()];
    m_MaxArray = new double[m_Train.numAttributes()];
    for (int i = 0; i < m_Train.numAttributes(); i++) {
        m_MinArray[i] = m_MaxArray[i] = Double.NaN;
    }
    Enumeration enu = m_Train.enumerateInstances();
    while (enu.hasMoreElements()) {
        updateMinMax((Instance) enu.nextElement());
    }
}
(1) The capability check (getCapabilities().testWithFail): IB1 rejects data it cannot handle, such as string attributes or a numeric class;
(2) deleteWithMissingClass() removes samples whose class label is missing;
(3) m_MinArray and m_MaxArray store each attribute's minimum and maximum values; both double arrays are initialized with length equal to the number of attributes;
(4) Traverse all training instances to find each attribute's minimum and maximum; follow the updateMinMax method next.
3) The source code of the updateMinMax method of the IB1 class is as follows:
private void updateMinMax(Instance instance) {
    for (int j = 0; j < m_Train.numAttributes(); j++) {
        if ((m_Train.attribute(j).isNumeric()) && (!instance.isMissing(j))) {
            if (Double.isNaN(m_MinArray[j])) {
                m_MinArray[j] = instance.value(j);
                m_MaxArray[j] = instance.value(j);
            } else {
                if (instance.value(j) < m_MinArray[j]) {
                    m_MinArray[j] = instance.value(j);
                } else {
                    if (instance.value(j) > m_MaxArray[j]) {
                        m_MaxArray[j] = instance.value(j);
                    }
                }
            }
        }
    }
}
(1) Attributes that are not numeric, and attribute values that are missing, are skipped;
(2) Double.isNaN checks whether the min/max for an attribute are still uninitialized (NaN, "not a number", is the sentinel); for each numeric attribute of the sample, the running minimum and maximum are updated.
So far the IB1 "model" is trained (one might ask: don't lazy algorithms skip training?). I think buildClassifier simply initializes m_Train and computes the minimum and maximum of every attribute over all instances, in preparation for the distance computation that follows.
Below is a walkthrough of the prediction source code:
4) Trace the classifyInstance method; the source code is as follows:
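To make the "training" step concrete, here is a minimal self-contained sketch (without Weka; class and method names are made up for illustration) of what buildClassifier effectively does: one scan over the training rows, recording each numeric attribute's minimum and maximum, with NaN doubling as both the "missing value" and the "not yet seen" sentinel, just as in the source above.

```java
import java.util.Arrays;

public class MinMaxSketch {
    // Returns {mins, maxs}; NaN marks attributes for which no value was seen.
    static double[][] computeMinMax(double[][] data, int numAttributes) {
        double[] min = new double[numAttributes];
        double[] max = new double[numAttributes];
        Arrays.fill(min, Double.NaN);
        Arrays.fill(max, Double.NaN);
        for (double[] row : data) {
            for (int j = 0; j < numAttributes; j++) {
                double v = row[j];
                if (Double.isNaN(v)) continue;   // skip missing values
                if (Double.isNaN(min[j])) {      // first observed value for attribute j
                    min[j] = max[j] = v;
                } else if (v < min[j]) {
                    min[j] = v;
                } else if (v > max[j]) {
                    max[j] = v;
                }
            }
        }
        return new double[][] { min, max };
    }

    public static void main(String[] args) {
        double[][] data = { {1.0, 10.0}, {3.0, Double.NaN}, {2.0, 7.0} };
        double[][] mm = computeMinMax(data, 2);
        System.out.println(Arrays.toString(mm[0]));  // per-attribute minimums
        System.out.println(Arrays.toString(mm[1]));  // per-attribute maximums
    }
}
```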
public double classifyInstance(Instance instance) throws Exception {
    if (m_Train.numInstances() == 0) {
        throw new Exception("No training instances!");
    }
    double distance, minDistance = Double.MAX_VALUE, classValue = 0;
    updateMinMax(instance);
    Enumeration enu = m_Train.enumerateInstances();
    while (enu.hasMoreElements()) {
        Instance trainInstance = (Instance) enu.nextElement();
        if (!trainInstance.classIsMissing()) {
            distance = distance(instance, trainInstance);
            if (distance < minDistance) {
                minDistance = distance;
                classValue = trainInstance.classValue();
            }
        }
    }
    return classValue;
}
(1) updateMinMax is called to refresh the per-attribute minimum/maximum now that the test instance has been added;
(2) The distance from the test instance to every training instance is computed via the distance method, and the class of the instance with the minimum distance minDistance is kept.
5) Trace the distance method; the source code is as follows:
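The search loop itself is just a linear scan keeping the label of the closest training instance. A stripped-down sketch outside Weka (illustrative names, and plain squared Euclidean distance in place of IB1's mixed-type distance, which is covered in the next step):

```java
public class NearestNeighborSketch {
    static double squaredDistance(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            d += diff * diff;
        }
        return d;
    }

    // Returns the label of the training row closest to the query.
    static int classify(double[][] train, int[] labels, double[] query) {
        double minDistance = Double.MAX_VALUE;
        int classValue = -1;
        for (int i = 0; i < train.length; i++) {
            double distance = squaredDistance(query, train[i]);
            if (distance < minDistance) {  // strict <: on ties the earliest instance wins
                minDistance = distance;
                classValue = labels[i];
            }
        }
        return classValue;
    }

    public static void main(String[] args) {
        double[][] train = { {0, 0}, {5, 5}, {1, 1} };
        int[] labels = { 0, 1, 0 };
        // (4,4) is closest to (5,5), so the predicted label is 1
        System.out.println(classify(train, labels, new double[] {4, 4}));
    }
}
```

Like IB1, this returns the class of the single nearest instance; note that the strict `<` comparison means ties are broken in favor of the instance seen first.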
private double distance(Instance first, Instance second) {
    double diff, distance = 0;

    for (int i = 0; i < m_Train.numAttributes(); i++) {
        if (i == m_Train.classIndex()) {
            continue;
        }
        if (m_Train.attribute(i).isNominal()) {
            // If attribute is nominal
            if (first.isMissing(i) || second.isMissing(i)
                    || ((int) first.value(i) != (int) second.value(i))) {
                distance += 1;
            }
        } else {
            // If attribute is numeric
            if (first.isMissing(i) || second.isMissing(i)) {
                if (first.isMissing(i) && second.isMissing(i)) {
                    diff = 1;
                } else {
                    if (second.isMissing(i)) {
                        diff = norm(first.value(i), i);
                    } else {
                        diff = norm(second.value(i), i);
                    }
                    if (diff < 0.5) {
                        diff = 1.0 - diff;
                    }
                }
            } else {
                diff = norm(first.value(i), i) - norm(second.value(i), i);
            }
            distance += diff * diff;
        }
    }
    return distance;
}
For each attribute, the squared normalized difference is accumulated for numeric attributes (nominal attributes contribute a 0/1 mismatch penalty instead); the norm method is the normalization formula, which maps values into the real interval [0, 1].
6) Trace the norm normalization method; the source code is as follows:
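The distance rules above can be sketched without Weka's Instance objects, using NaN to encode "missing" and a boolean flag per attribute for nominal vs. numeric (all names here are illustrative, not Weka's API):

```java
public class Ib1DistanceSketch {
    static double norm(double x, double min, double max) {
        if (Double.isNaN(min) || max == min) return 0;
        return (x - min) / (max - min);
    }

    // isNominal[i]: whether attribute i is nominal; min/max apply to numeric ones.
    static double distance(double[] first, double[] second,
                           boolean[] isNominal, double[] min, double[] max) {
        double distance = 0;
        for (int i = 0; i < first.length; i++) {
            if (isNominal[i]) {
                // nominal: mismatch or any missing value costs a full 1
                if (Double.isNaN(first[i]) || Double.isNaN(second[i])
                        || (int) first[i] != (int) second[i]) {
                    distance += 1;
                }
            } else {
                double diff;
                if (Double.isNaN(first[i]) || Double.isNaN(second[i])) {
                    if (Double.isNaN(first[i]) && Double.isNaN(second[i])) {
                        diff = 1;  // both missing: assume maximal difference
                    } else {
                        double present = Double.isNaN(second[i]) ? first[i] : second[i];
                        diff = norm(present, min[i], max[i]);
                        if (diff < 0.5) diff = 1.0 - diff;  // pessimistic estimate
                    }
                } else {
                    diff = norm(first[i], min[i], max[i]) - norm(second[i], min[i], max[i]);
                }
                distance += diff * diff;
            }
        }
        return distance;
    }
}
```

Note the asymmetry this mirrors from the source: nominal mismatches add 1 directly, while numeric differences are normalized to [0, 1] first and then squared.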
private double norm(double x, int i) {
    if (Double.isNaN(m_MinArray[i])
            || Utils.eq(m_MaxArray[i], m_MinArray[i])) {
        return 0;
    } else {
        return (x - m_MinArray[i]) / (m_MaxArray[i] - m_MinArray[i]);
    }
}
Normalized value: (x - m_MinArray[i]) / (m_MaxArray[i] - m_MinArray[i])
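A tiny worked example of this formula (illustrative standalone method, not Weka's; the guard returns 0 when the attribute was never seen or its range is degenerate):

```java
public class NormExample {
    static double norm(double x, double min, double max) {
        if (Double.isNaN(min) || max == min) return 0;  // unseen or constant attribute
        return (x - min) / (max - min);
    }

    public static void main(String[] args) {
        System.out.println(norm(5, 0, 10));  // (5-0)/(10-0) = 0.5
        System.out.println(norm(7, 7, 7));   // degenerate range: 0.0
    }
}
```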
For the algorithm's full pseudocode, see the original nearest-neighbor classifier paper; I will not post it here.
Machine learning: detailed analysis of Weka's IB1 (1NN) source code