Machine learning: a detailed analysis of the Weka source code for the IB1 algorithm (1NN)

Source: Internet
Author: User

The 1NN (one-nearest-neighbor) algorithm is called IB1 in Weka, short for "instance-based 1". It is a lazy learning algorithm that classifies an instance using only its single nearest neighbor among the training instances.

The following summarizes what I learned from reading the IB1 source code in Weka.

First, you need to add weka-src.jar to the build path; otherwise you cannot step into the source code.

1) Read the data, call the IB1 classifier, and print the predictions. This code is the starting point for the trace below.

    import java.io.File;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.lazy.IB1;
    import weka.core.Instance;
    import weka.core.Instances;
    import weka.core.converters.ArffLoader;

    try {
        File file = new File("F:\\tools/lib/data/contact-lenses.arff");
        ArffLoader loader = new ArffLoader();
        loader.setFile(file);
        Instances ins = loader.getDataSet();
        // Be sure to set the class index of the Instances before using the samples,
        // otherwise the Instances object will throw an exception
        ins.setClassIndex(ins.numAttributes() - 1);
        Classifier cfs = new IB1();
        cfs.buildClassifier(ins);
        Instance testInst;
        Evaluation testingEvaluation = new Evaluation(ins);
        int length = ins.numInstances();
        for (int i = 0; i < length; i++) {
            testInst = ins.instance(i);
            // Classify each test sample to check the effect of the classifier
            double predictValue = cfs.classifyInstance(testInst);
            System.out.println(testInst.classValue() + "--" + predictValue);
        }
        // System.out.println("accuracy of the classifier: " + (1 - testingEvaluation.errorRate()));
    } catch (Exception e) {
        e.printStackTrace();
    }

2) Ctrl-click buildClassifier to trace further into its source. The IB1 class overrides this abstract method; the source code is:

    public void buildClassifier(Instances instances) throws Exception {
        // can classifier handle the data?
        getCapabilities().testWithFail(instances);

        // remove instances with missing class
        instances = new Instances(instances);
        instances.deleteWithMissingClass();

        m_Train = new Instances(instances, 0, instances.numInstances());

        m_MinArray = new double[m_Train.numAttributes()];
        m_MaxArray = new double[m_Train.numAttributes()];
        for (int i = 0; i < m_Train.numAttributes(); i++) {
            m_MinArray[i] = m_MaxArray[i] = Double.NaN;
        }
        Enumeration enu = m_Train.enumerateInstances();
        while (enu.hasMoreElements()) {
            updateMinMax((Instance) enu.nextElement());
        }
    }

(1) getCapabilities().testWithFail(instances) checks whether the classifier can handle the data; the IB1 classifier cannot handle string attributes or a numeric class;

(2) deleteWithMissingClass() is called on a copy of the data to delete the samples that have no class label;

(3) m_MinArray and m_MaxArray save the minimum and maximum value of each attribute respectively; both are double arrays whose length is the number of attributes (not the number of samples), and every entry is initialized to NaN;

(4) All training instances are traversed to find the minimum and maximum values; next, follow the updateMinMax method.

3) The source code of the updateMinMax method of the IB1 class is as follows:

    private void updateMinMax(Instance instance) {
        for (int j = 0; j < m_Train.numAttributes(); j++) {
            if ((m_Train.attribute(j).isNumeric()) && (!instance.isMissing(j))) {
                if (Double.isNaN(m_MinArray[j])) {
                    m_MinArray[j] = instance.value(j);
                    m_MaxArray[j] = instance.value(j);
                } else {
                    if (instance.value(j) < m_MinArray[j]) {
                        m_MinArray[j] = instance.value(j);
                    } else {
                        if (instance.value(j) > m_MaxArray[j]) {
                            m_MaxArray[j] = instance.value(j);
                        }
                    }
                }
            }
        }
    }

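(1) Attributes that are not numeric, and values that are missing in the instance, are skipped;

(2) Double.isNaN (NaN means "not a number") tests whether a minimum has been recorded yet for the attribute. If not, the current value becomes both the minimum and the maximum; otherwise the loop over the instance's attributes updates the minimum or the maximum as needed;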
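The range tracking described above can be sketched without Weka. This is only an illustration under stated assumptions: the class MinMaxSketch and its fields are hypothetical names, rows are plain double arrays holding numeric attributes only, and a NaN inside a row stands for a missing value.

```java
import java.util.Arrays;

// Hypothetical standalone sketch; not part of Weka.
class MinMaxSketch {
    final double[] min;
    final double[] max;

    MinMaxSketch(int numAttributes) {
        min = new double[numAttributes];
        max = new double[numAttributes];
        // NaN marks "no value seen yet" for an attribute.
        Arrays.fill(min, Double.NaN);
        Arrays.fill(max, Double.NaN);
    }

    // Folds one row into the running ranges.
    // Assumption of this sketch: NaN in a row means the value is missing.
    void update(double[] row) {
        for (int j = 0; j < row.length; j++) {
            if (Double.isNaN(row[j])) {
                continue; // missing value: leave the range untouched
            }
            if (Double.isNaN(min[j])) {
                // First value seen for this attribute.
                min[j] = row[j];
                max[j] = row[j];
            } else if (row[j] < min[j]) {
                min[j] = row[j];
            } else if (row[j] > max[j]) {
                max[j] = row[j];
            }
        }
    }

    public static void main(String[] args) {
        MinMaxSketch ranges = new MinMaxSketch(2);
        ranges.update(new double[] {3.0, Double.NaN});
        ranges.update(new double[] {1.0, 7.0});
        ranges.update(new double[] {5.0, 2.0});
        // prints [1.0, 2.0] [5.0, 7.0]
        System.out.println(Arrays.toString(ranges.min) + " " + Arrays.toString(ranges.max));
    }
}
```

Using NaN as the "not yet initialized" marker avoids a separate boolean flag per attribute, which appears to be why the source fills both arrays with NaN before the traversal.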

At this point the IB1 model has been trained. (One might ask: doesn't a lazy algorithm need no training?) As I understand it, buildClassifier only initializes m_Train and finds the minimum and maximum value of each attribute over all instances, in preparation for the distance calculation that follows.

The prediction source code is described below:


4) Trace the classifyInstance method; the source code is as follows:

    public double classifyInstance(Instance instance) throws Exception {
        if (m_Train.numInstances() == 0) {
            throw new Exception("No training instances!");
        }

        double distance, minDistance = Double.MAX_VALUE, classValue = 0;
        updateMinMax(instance);
        Enumeration enu = m_Train.enumerateInstances();
        while (enu.hasMoreElements()) {
            Instance trainInstance = (Instance) enu.nextElement();
            if (!trainInstance.classIsMissing()) {
                distance = distance(instance, trainInstance);
                if (distance < minDistance) {
                    minDistance = distance;
                    classValue = trainInstance.classValue();
                }
            }
        }
        return classValue;
    }

(1) Call updateMinMax so that the minimum and maximum values also cover the test instance;

(2) Use the distance method to calculate the distance from the test instance to each training instance, keeping the smallest distance in minDistance and the class value of that nearest instance in classValue;
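The search loop above can be sketched on plain arrays. This is a simplified illustration, not the Weka code: the class NearestNeighborSketch and its methods are hypothetical, all features are assumed numeric and already normalized, and a plain squared Euclidean distance stands in for the distance method.

```java
// Hypothetical standalone sketch; not part of Weka.
class NearestNeighborSketch {
    // Squared Euclidean distance over numeric features
    // (assumed to be normalized already).
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the label of the training row closest to the query.
    static double classify(double[][] train, double[] labels, double[] query) {
        if (train.length == 0) {
            throw new IllegalStateException("No training instances!");
        }
        double minDistance = Double.MAX_VALUE;
        double classValue = 0;
        for (int k = 0; k < train.length; k++) {
            double d = squaredDistance(query, train[k]);
            if (d < minDistance) { // strict '<' keeps the first of tied neighbors
                minDistance = d;
                classValue = labels[k];
            }
        }
        return classValue;
    }

    public static void main(String[] args) {
        double[][] train = {{0.0, 0.0}, {1.0, 1.0}, {0.9, 0.1}};
        double[] labels = {0.0, 1.0, 2.0};
        // The third row is closest to the query, so this prints 2.0
        System.out.println(classify(train, labels, new double[] {0.8, 0.2}));
    }
}
```

Because the comparison is a strict "less than", a later training row at exactly the same distance never replaces an earlier one; the same holds for the loop in classifyInstance.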

5) Trace the distance method; the source code is as follows:

    private double distance(Instance first, Instance second) {
        double diff, distance = 0;

        for (int i = 0; i < m_Train.numAttributes(); i++) {
            if (i == m_Train.classIndex()) {
                continue;
            }
            if (m_Train.attribute(i).isNominal()) {
                // If attribute is nominal
                if (first.isMissing(i) || second.isMissing(i)
                        || ((int) first.value(i) != (int) second.value(i))) {
                    distance += 1;
                }
            } else {
                // If attribute is numeric
                if (first.isMissing(i) || second.isMissing(i)) {
                    if (first.isMissing(i) && second.isMissing(i)) {
                        diff = 1;
                    } else {
                        if (second.isMissing(i)) {
                            diff = norm(first.value(i), i);
                        } else {
                            diff = norm(second.value(i), i);
                        }
                        if (diff < 0.5) {
                            diff = 1.0 - diff;
                        }
                    }
                } else {
                    diff = norm(first.value(i), i) - norm(second.value(i), i);
                }
                distance += diff * diff;
            }
        }
        return distance;
    }

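The method loops over every attribute except the class. A nominal attribute adds 1 to the distance when either value is missing or the two values differ; a numeric attribute adds the square of the difference between the normalized values. The norm method is the normalization formula, which maps a value to a real number in [0, 1]. The result is a sum of squares with no square root taken, which does not change which neighbor is nearest.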
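To make the missing-value rules in the distance method concrete, here is a small sketch on plain arrays. The assumptions are mine and not from Weka: MixedDistanceSketch is a hypothetical class, numeric values are already normalized to [0, 1], NaN stands for a missing value, nominal values are stored as category indices, and the class attribute is not part of the arrays.

```java
// Hypothetical standalone sketch; not part of Weka.
class MixedDistanceSketch {
    // Assumptions: numeric values are pre-normalized to [0, 1],
    // NaN marks a missing value, isNominal[i] flags nominal attributes.
    static double distance(double[] first, double[] second, boolean[] isNominal) {
        double distance = 0;
        for (int i = 0; i < first.length; i++) {
            boolean firstMissing = Double.isNaN(first[i]);
            boolean secondMissing = Double.isNaN(second[i]);
            if (isNominal[i]) {
                // Nominal: a missing value or a mismatch costs 1.
                if (firstMissing || secondMissing || (int) first[i] != (int) second[i]) {
                    distance += 1;
                }
            } else {
                double diff;
                if (firstMissing && secondMissing) {
                    diff = 1; // both missing: maximal difference
                } else if (firstMissing || secondMissing) {
                    // One missing: take the known value and assume the
                    // missing one lies as far away as possible within [0, 1].
                    diff = secondMissing ? first[i] : second[i];
                    if (diff < 0.5) {
                        diff = 1.0 - diff;
                    }
                } else {
                    diff = first[i] - second[i];
                }
                distance += diff * diff;
            }
        }
        return distance;
    }

    public static void main(String[] args) {
        double[] a = {0.0, 0.2, 0.3};
        double[] b = {1.0, Double.NaN, 0.7};
        boolean[] nominal = {true, false, false};
        // 1 (nominal mismatch) + 0.8^2 (one value missing) + 0.4^2, about 1.8
        System.out.println(distance(a, b, nominal));
    }
}
```

The "one value missing" branch is pessimistic by design: a known normalized value of 0.2 yields a difference of 0.8, the largest gap that any value in [0, 1] could have from it.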

6) Trace the norm normalization method; the source code is as follows:

    private double norm(double x, int i) {
        if (Double.isNaN(m_MinArray[i])
                || Utils.eq(m_MaxArray[i], m_MinArray[i])) {
            return 0;
        } else {
            return (x - m_MinArray[i]) / (m_MaxArray[i] - m_MinArray[i]);
        }
    }

Normalization formula: (x - m_MinArray[i]) / (m_MaxArray[i] - m_MinArray[i])
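A short worked example of the min-max formula may help. The sketch below is standalone and hypothetical (NormSketch is not a Weka class): the minimum and maximum are passed in directly instead of being read from arrays, and the tolerance constant SMALL is an assumed stand-in for the equality check done by Utils.eq.

```java
// Hypothetical standalone sketch; not part of Weka.
class NormSketch {
    // Assumed tolerance for treating min and max as equal.
    static final double SMALL = 1e-6;

    // Min-max normalization of x into [0, 1] for a known range.
    static double norm(double x, double min, double max) {
        // No range recorded yet, or a zero-width range: return 0
        // instead of dividing by zero.
        if (Double.isNaN(min) || Math.abs(max - min) < SMALL) {
            return 0;
        }
        return (x - min) / (max - min);
    }

    public static void main(String[] args) {
        // (15 - 10) / (20 - 10) = 0.5, so this prints 0.5
        System.out.println(norm(15.0, 10.0, 20.0));
    }
}
```

The guard explains why classifyInstance first updates the ranges with the test instance: with the test value included in the range, its normalized value cannot fall outside [0, 1].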


For pseudo-code of the algorithm itself, please refer to a paper on the nearest-neighbor classifier; I will not post it here.
