The 1NN (one-nearest-neighbor) algorithm, implemented in Weka as IB1 (short for Instance-Based learner 1), is a lazy learning algorithm that classifies a sample by the class of its single nearest neighbor.
The following is a walkthrough of the IB1 source code in Weka.
First, add weka-src.jar to the build path; otherwise you cannot step into the source code.
1) Read the data, call the IB1 classifier, and print the predictions. Trace the following code:
try {
    File file = new File("F:\\tools/lib/data/contact-lenses.arff");
    ArffLoader loader = new ArffLoader();
    loader.setFile(file);
    ins = loader.getDataSet();
    // Be sure to set the Instances' classIndex before using the data set,
    // otherwise the Instances object will throw an exception
    ins.setClassIndex(ins.numAttributes() - 1);
    cfs = new IB1();
    cfs.buildClassifier(ins);
    Instance testInst;
    Evaluation testingEvaluation = new Evaluation(ins);
    int length = ins.numInstances();
    for (int i = 0; i < length; i++) {
        testInst = ins.instance(i);
        // Test the classifier's output on each sample
        double predictValue = cfs.classifyInstance(testInst);
        System.out.println(testInst.classValue() + "--" + predictValue);
    }
    // System.out.println("correct rate of the classifier: " + (1 - testingEvaluation.errorRate()));
} catch (Exception e) {
    e.printStackTrace();
}
2) Ctrl-click buildClassifier to trace further into its source. The IB1 class overrides this abstract method; the source is:
public void buildClassifier(Instances instances) throws Exception {
    // can classifier handle the data?
    getCapabilities().testWithFail(instances);

    // remove instances with missing class
    instances = new Instances(instances);
    instances.deleteWithMissingClass();

    m_Train = new Instances(instances, 0, instances.numInstances());

    m_MinArray = new double[m_Train.numAttributes()];
    m_MaxArray = new double[m_Train.numAttributes()];
    for (int i = 0; i < m_Train.numAttributes(); i++) {
        m_MinArray[i] = m_MaxArray[i] = Double.NaN;
    }
    Enumeration enu = m_Train.enumerateInstances();
    while (enu.hasMoreElements()) {
        updateMinMax((Instance) enu.nextElement());
    }
}
(1) The capability check (getCapabilities().testWithFail): IB1 rejects data it cannot handle, such as string attributes or a numeric class;
(2) deleteWithMissingClass() removes samples whose class label is missing;
(3) m_MinArray and m_MaxArray store each attribute's minimum and maximum values; both double arrays are initialized with length equal to the number of attributes;
(4) Traverse all training instances to find each attribute's minimum and maximum; follow the updateMinMax method next.
3) The source code of the updateMinMax method of the IB1 class is as follows:
private void updateMinMax(Instance instance) {
    for (int j = 0; j < m_Train.numAttributes(); j++) {
        if ((m_Train.attribute(j).isNumeric()) && (!instance.isMissing(j))) {
            if (Double.isNaN(m_MinArray[j])) {
                m_MinArray[j] = instance.value(j);
                m_MaxArray[j] = instance.value(j);
            } else {
                if (instance.value(j) < m_MinArray[j]) {
                    m_MinArray[j] = instance.value(j);
                } else {
                    if (instance.value(j) > m_MaxArray[j]) {
                        m_MaxArray[j] = instance.value(j);
                    }
                }
            }
        }
    }
}
(1) Attributes that are not numeric, and attribute values that are missing, are skipped;
(2) Double.isNaN checks whether the min/max for an attribute are still uninitialized (NaN, "not a number", is the sentinel); for each numeric attribute of the sample, the running minimum and maximum are updated.
So far the IB1 "model" is trained (one might ask: don't lazy algorithms skip training?). I think buildClassifier simply initializes m_Train and computes the minimum and maximum of every attribute over all instances, in preparation for the distance computation that follows.
Below is a walkthrough of the prediction source code:
4) Trace the classifyInstance method; the source code is as follows:
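To make the "training" step concrete, here is a minimal self-contained sketch (without Weka; class and method names are made up for illustration) of what buildClassifier effectively does: one scan over the training rows, recording each numeric attribute's minimum and maximum, with NaN doubling as both the "missing value" and the "not yet seen" sentinel, just as in the source above.

```java
import java.util.Arrays;

public class MinMaxSketch {
    // Returns {mins, maxs}; NaN marks attributes for which no value was seen.
    static double[][] computeMinMax(double[][] data, int numAttributes) {
        double[] min = new double[numAttributes];
        double[] max = new double[numAttributes];
        Arrays.fill(min, Double.NaN);
        Arrays.fill(max, Double.NaN);
        for (double[] row : data) {
            for (int j = 0; j < numAttributes; j++) {
                double v = row[j];
                if (Double.isNaN(v)) continue;   // skip missing values
                if (Double.isNaN(min[j])) {      // first observed value for attribute j
                    min[j] = max[j] = v;
                } else if (v < min[j]) {
                    min[j] = v;
                } else if (v > max[j]) {
                    max[j] = v;
                }
            }
        }
        return new double[][] { min, max };
    }

    public static void main(String[] args) {
        double[][] data = { {1.0, 10.0}, {3.0, Double.NaN}, {2.0, 7.0} };
        double[][] mm = computeMinMax(data, 2);
        System.out.println(Arrays.toString(mm[0]));  // per-attribute minimums
        System.out.println(Arrays.toString(mm[1]));  // per-attribute maximums
    }
}
```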
public double classifyInstance(Instance instance) throws Exception {
    if (m_Train.numInstances() == 0) {
        throw new Exception("No training instances!");
    }
    double distance, minDistance = Double.MAX_VALUE, classValue = 0;
    updateMinMax(instance);
    Enumeration enu = m_Train.enumerateInstances();
    while (enu.hasMoreElements()) {
        Instance trainInstance = (Instance) enu.nextElement();
        if (!trainInstance.classIsMissing()) {
            distance = distance(instance, trainInstance);
            if (distance < minDistance) {
                minDistance = distance;
                classValue = trainInstance.classValue();
            }
        }
    }
    return classValue;
}
(1) updateMinMax is called to refresh the per-attribute minimum/maximum now that the test instance has been added;
(2) The distance from the test instance to every training instance is computed via the distance method, and the class of the instance with the minimum distance minDistance is kept.
5) Trace the distance method; the source code is as follows:
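The search loop itself is just a linear scan keeping the label of the closest training instance. A stripped-down sketch outside Weka (illustrative names, and plain squared Euclidean distance in place of IB1's mixed-type distance, which is covered in the next step):

```java
public class NearestNeighborSketch {
    static double squaredDistance(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            d += diff * diff;
        }
        return d;
    }

    // Returns the label of the training row closest to the query.
    static int classify(double[][] train, int[] labels, double[] query) {
        double minDistance = Double.MAX_VALUE;
        int classValue = -1;
        for (int i = 0; i < train.length; i++) {
            double distance = squaredDistance(query, train[i]);
            if (distance < minDistance) {  // strict <: on ties the earliest instance wins
                minDistance = distance;
                classValue = labels[i];
            }
        }
        return classValue;
    }

    public static void main(String[] args) {
        double[][] train = { {0, 0}, {5, 5}, {1, 1} };
        int[] labels = { 0, 1, 0 };
        // (4,4) is closest to (5,5), so the predicted label is 1
        System.out.println(classify(train, labels, new double[] {4, 4}));
    }
}
```

Like IB1, this returns the class of the single nearest instance; note that the strict `<` comparison means ties are broken in favor of the instance seen first.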
private double distance(Instance first, Instance second) {
    double diff, distance = 0;

    for (int i = 0; i < m_Train.numAttributes(); i++) {
        if (i == m_Train.classIndex()) {
            continue;
        }
        if (m_Train.attribute(i).isNominal()) {
            // If attribute is nominal
            if (first.isMissing(i) || second.isMissing(i)
                    || ((int) first.value(i) != (int) second.value(i))) {
                distance += 1;
            }
        } else {
            // If attribute is numeric
            if (first.isMissing(i) || second.isMissing(i)) {
                if (first.isMissing(i) && second.isMissing(i)) {
                    diff = 1;
                } else {
                    if (second.isMissing(i)) {
                        diff = norm(first.value(i), i);
                    } else {
                        diff = norm(second.value(i), i);
                    }
                    if (diff < 0.5) {
                        diff = 1.0 - diff;
                    }
                }
            } else {
                diff = norm(first.value(i), i) - norm(second.value(i), i);
            }
            distance += diff * diff;
        }
    }
    return distance;
}
For each attribute, the squared normalized difference is accumulated for numeric attributes (nominal attributes contribute a 0/1 mismatch penalty instead); the norm method is the normalization formula, which maps values into the real interval [0, 1].
6) Trace the norm normalization method; the source code is as follows:
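The distance rules above can be sketched without Weka's Instance objects, using NaN to encode "missing" and a boolean flag per attribute for nominal vs. numeric (all names here are illustrative, not Weka's API):

```java
public class Ib1DistanceSketch {
    static double norm(double x, double min, double max) {
        if (Double.isNaN(min) || max == min) return 0;
        return (x - min) / (max - min);
    }

    // isNominal[i]: whether attribute i is nominal; min/max apply to numeric ones.
    static double distance(double[] first, double[] second,
                           boolean[] isNominal, double[] min, double[] max) {
        double distance = 0;
        for (int i = 0; i < first.length; i++) {
            if (isNominal[i]) {
                // nominal: mismatch or any missing value costs a full 1
                if (Double.isNaN(first[i]) || Double.isNaN(second[i])
                        || (int) first[i] != (int) second[i]) {
                    distance += 1;
                }
            } else {
                double diff;
                if (Double.isNaN(first[i]) || Double.isNaN(second[i])) {
                    if (Double.isNaN(first[i]) && Double.isNaN(second[i])) {
                        diff = 1;  // both missing: assume maximal difference
                    } else {
                        double present = Double.isNaN(second[i]) ? first[i] : second[i];
                        diff = norm(present, min[i], max[i]);
                        if (diff < 0.5) diff = 1.0 - diff;  // pessimistic estimate
                    }
                } else {
                    diff = norm(first[i], min[i], max[i]) - norm(second[i], min[i], max[i]);
                }
                distance += diff * diff;
            }
        }
        return distance;
    }
}
```

Note the asymmetry this mirrors from the source: nominal mismatches add 1 directly, while numeric differences are normalized to [0, 1] first and then squared.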
private double norm(double x, int i) {
    if (Double.isNaN(m_MinArray[i])
            || Utils.eq(m_MaxArray[i], m_MinArray[i])) {
        return 0;
    } else {
        return (x - m_MinArray[i]) / (m_MaxArray[i] - m_MinArray[i]);
    }
}
Normalized value: (x - m_MinArray[i]) / (m_MaxArray[i] - m_MinArray[i])
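A tiny worked example of this formula (illustrative standalone method, not Weka's; the guard returns 0 when the attribute was never seen or its range is degenerate):

```java
public class NormExample {
    static double norm(double x, double min, double max) {
        if (Double.isNaN(min) || max == min) return 0;  // unseen or constant attribute
        return (x - min) / (max - min);
    }

    public static void main(String[] args) {
        System.out.println(norm(5, 0, 10));  // (5-0)/(10-0) = 0.5
        System.out.println(norm(7, 7, 7));   // degenerate range: 0.0
    }
}
```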
For the algorithm's full pseudocode, see the original nearest-neighbor classifier paper; I will not post it here.
Machine learning: detailed analysis of Weka's IB1 (1NN) source code