Active Learning:
Active learning requires the classifier to interact with a labeling expert. A typical process:
(1) Build a model from a small number of labeled samples.
(2) Select the most informative samples from the unlabeled pool and have the expert label them.
(3) Merge these samples with the previously labeled samples and rebuild the model.
(4) Repeat steps (2) and (3) until a stopping criterion is met (for example, no unlabeled samples remain).
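The four steps above can be sketched without any library. This is a minimal illustration, not the Weka implementation: the 1-D threshold "model", the class name ActiveLoopSketch, and the oracle predicate are all hypothetical, and the sketch assumes the oracle labels 0.0 as negative and 1.0 as positive so the initial model is defined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoublePredicate;

class ActiveLoopSketch {
    // Steps (1) and (3): "train" a toy 1-D model whose threshold is the midpoint
    // between the largest labeled negative and the smallest labeled positive.
    static double fit(List<Double> xs, List<Boolean> ys) {
        double maxNeg = Double.NEGATIVE_INFINITY;
        double minPos = Double.POSITIVE_INFINITY;
        for (int i = 0; i < xs.size(); i++) {
            if (ys.get(i)) minPos = Math.min(minPos, xs.get(i));
            else maxNeg = Math.max(maxNeg, xs.get(i));
        }
        return (maxNeg + minPos) / 2.0;
    }

    // Step (2): the most informative sample is taken to be the one closest
    // to the current decision threshold.
    static int mostUncertain(List<Double> pool, double threshold) {
        int best = 0;
        for (int i = 1; i < pool.size(); i++) {
            if (Math.abs(pool.get(i) - threshold) < Math.abs(pool.get(best) - threshold)) {
                best = i;
            }
        }
        return best;
    }

    // Step (4): repeat until the query budget is spent or the pool is empty.
    static double run(List<Double> pool, DoublePredicate oracle, int budget) {
        List<Double> xs = new ArrayList<>(List.of(0.0, 1.0));
        List<Boolean> ys = new ArrayList<>(List.of(oracle.test(0.0), oracle.test(1.0)));
        List<Double> remaining = new ArrayList<>(pool);
        double threshold = fit(xs, ys);
        for (int q = 0; q < budget && !remaining.isEmpty(); q++) {
            double x = remaining.remove(mostUncertain(remaining, threshold));
            xs.add(x);
            ys.add(oracle.test(x)); // the expert supplies the label
            threshold = fit(xs, ys);
        }
        return threshold;
    }

    public static void main(String[] args) {
        List<Double> pool = List.of(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9);
        System.out.println(run(pool, x -> x > 0.55, 4));
    }
}
```

Each query goes to the sample the current model is least sure about, so a few labels are enough to pin down the boundary in this toy setting.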
Simulation approach:
1. Split the data into a labeled set and an unlabeled set.
2. Divide the unlabeled set into groups of 100 samples. For each group, compute the entropy of the prediction for every sample, sort by entropy in descending order, and add the first 5 samples (those with the highest entropy) to the labeled set.
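The per-group selection can be illustrated independently of Weka. The sketch below assumes a binary problem and takes the predicted positive-class probabilities as a plain array. The names EntropyBatchSketch, binaryEntropy and selectPerGroup are hypothetical helpers for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class EntropyBatchSketch {
    // Binary entropy in bits; defined as 0 when the prediction is certain.
    static double binaryEntropy(double p) {
        if (p <= 0.0 || p >= 1.0) return 0.0;
        return -(p * Math.log(p) + (1 - p) * Math.log(1 - p)) / Math.log(2.0);
    }

    // For each consecutive group of groupSize samples, return the indices of
    // the k samples with the highest entropy.
    static List<Integer> selectPerGroup(double[] probs, int groupSize, int k) {
        List<Integer> chosen = new ArrayList<>();
        for (int start = 0; start < probs.length; start += groupSize) {
            int end = Math.min(start + groupSize, probs.length);
            List<Integer> idx = new ArrayList<>();
            for (int i = start; i < end; i++) idx.add(i);
            // Highest entropy first.
            idx.sort(Comparator.comparingDouble((Integer i) -> binaryEntropy(probs[i])).reversed());
            chosen.addAll(idx.subList(0, Math.min(k, idx.size())));
        }
        return chosen;
    }
}
```

With groupSize = 100 and k = 5 this matches the scheme described above. The Math.min guards keep the last, possibly shorter, group from running out of bounds.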
package demo;

import java.io.FileReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instance;
import weka.core.Instances;

// Pairs an instance with its entropy; sorts in descending order of entropy.
class InstanceSort implements Comparable<InstanceSort> {
    public Instance instance;
    public double entropy;

    public InstanceSort(Instance instance, double entropy) {
        this.instance = instance;
        this.entropy = entropy;
    }

    @Override
    public int compareTo(InstanceSort o) {
        // Reversed comparison: higher entropy comes first.
        return Double.compare(o.entropy, this.entropy);
    }
}

public class ActiveLearning {

    // Load an ARFF file and use the last attribute as the class.
    public static Instances getInstances(String fileName) throws Exception {
        Instances data = new Instances(new FileReader(fileName));
        data.setClassIndex(data.numAttributes() - 1);
        return data;
    }

    // Binary entropy (in bits) of a predicted class probability.
    public static double computeEntropy(double predictValue) {
        if (1 - predictValue < 0.000000001d || predictValue < 0.000000001d) {
            return 0;
        }
        return -predictValue * (Math.log(predictValue) / Math.log(2.0d))
                - (1 - predictValue) * (Math.log(1 - predictValue) / Math.log(2.0d));
    }

    // Train on the labeled set and print per-class results on the test set.
    public static void classify(Instances train, Instances test) throws Exception {
        NaiveBayes classifier = new NaiveBayes();
        classifier.buildClassifier(train);
        Evaluation eval = new Evaluation(test);
        eval.evaluateModel(classifier, test);
        System.out.println(eval.toClassDetailsString());
    }

    // Uncertainty sampling over unlabeled[start, end).
    public static Instances uncertaintySample(Instances labeled, Instances unlabeled,
                                              int start, int end) throws Exception {
        // Train a model on the currently labeled samples.
        NaiveBayes classifier = new NaiveBayes();
        classifier.buildClassifier(labeled);

        ArrayList<InstanceSort> l = new ArrayList<InstanceSort>();
        for (int i = start; i < end; i++) {
            // distributionForInstance returns the class probabilities.
            // classifyInstance would return only the predicted class index
            // (0.0 or 1.0), for which the entropy is always 0.
            // Assumes a two-class problem: index 0 is the first class.
            double result = classifier.distributionForInstance(unlabeled.instance(i))[0];
            double entropy = computeEntropy(result);
            l.add(new InstanceSort(unlabeled.instance(i), entropy));
        }

        // Sort by entropy, highest first.
        Collections.sort(l);

        // Empty dataset with the same header as the unlabeled set.
        Instances chosenInstances = new Instances(unlabeled, 0);

        // Take the 5 highest-entropy instances of this group
        // (fewer if the group is smaller than 5).
        for (int i = 0; i < Math.min(5, l.size()); i++) {
            chosenInstances.add(l.get(i).instance);
        }
        return chosenInstances;
    }

    // Simulated active learning: 1 fold is labeled, the other 9 are the unlabeled pool.
    public static void sample(Instances instances, Instances test) throws Exception {
        Random rand = new Random(1023);
        instances.randomize(rand);
        instances.stratify(10);
        Instances unlabeled = instances.trainCV(10, 0);
        Instances labeled = instances.testCV(10, 0);

        // Process the pool in groups of 100; the last group may be smaller.
        int n = unlabeled.numInstances();
        for (int start = 0; start < n; start += 100) {
            int end = Math.min(start + 100, n);
            Instances resultInstances = uncertaintySample(labeled, unlabeled, start, end);
            for (int j = 0; j < resultInstances.numInstances(); j++) {
                labeled.add(resultInstances.instance(j));
            }
            classify(labeled, test);
        }
    }

    public static void main(String[] args) throws Exception {
        Instances instances = getInstances("Nasa//pc1.arff");

        // Hold out one fold of a 10-fold split as the test set.
        Random rand = new Random(1023);
        instances.randomize(rand);
        instances.stratify(10);
        Instances train = instances.trainCV(10, 0);
        Instances test = instances.testCV(10, 0);

        sample(train, test);
    }
}
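The computeEntropy method above takes a single probability and so only covers two classes. If the dataset had more classes, the entropy could be computed over the whole predicted distribution instead. A minimal sketch follows, assuming the input is a probability array such as the one a classifier returns for one instance. The class name DistributionEntropySketch is hypothetical.

```java
class DistributionEntropySketch {
    // Shannon entropy in bits of a discrete probability distribution.
    // Zero-probability entries contribute nothing (0 * log 0 is treated as 0).
    static double entropy(double[] dist) {
        double h = 0.0;
        for (double p : dist) {
            if (p > 0.0) {
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }
}
```

For two classes this reduces to the binary formula: a {0.5, 0.5} distribution gives 1 bit and a {1, 0} distribution gives 0.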
Simulating the "active learning" algorithm with Weka