1. K-nearest neighbor algorithm (KNN)
Principle: a sample space contains samples whose classifications are known. When a sample of unknown classification appears, its classification is determined by the K known samples nearest to it.
Example: consider romance movies and action movies, both of which contain kissing scenes and fight scenes. In a coordinate system whose axes are the number of kisses and the number of fight scenes, a movie of unknown category is classified according to the K points nearest to it.
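To make the coordinate system concrete, here is a minimal sketch of how "nearness" between two movies could be measured. The class name MovieDistanceSketch, the method distance, and all of the counts are hypothetical and chosen only for illustration.

```java
// Illustrative sketch; names and numbers are assumptions, not from the article.
class MovieDistanceSketch {

    /** Euclidean distance between two movies described by (kisses, fights). */
    static double distance(double kisses1, double fights1, double kisses2, double fights2) {
        return Math.hypot(kisses1 - kisses2, fights1 - fights2);
    }

    public static void main(String[] args) {
        // Hypothetical unknown movie: 90 kisses, 18 fight scenes.
        double toRomance = distance(90, 18, 104, 3);  // versus a hypothetical romance movie
        double toAction = distance(90, 18, 10, 101);  // versus a hypothetical action movie
        System.out.println(toRomance < toAction ? "closer to romance" : "closer to action");
    }
}
```

With these made-up counts the unknown movie lies much nearer to the romance movie than to the action movie, which is the kind of comparison KNN repeats against every known sample.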
2. Algorithm implementation steps
(1) Calculate the Euclidean distance from every known point to the unknown point
(2) Sort all known points by that distance
(3) Select the K points nearest to the unknown point
(4) Calculate the frequency with which each classification appears among those K points
(5) Assign the most frequent classification to the unknown point
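The five steps above can be sketched compactly as follows. This is only an illustration under stated assumptions, not the implementation given in this article: the class KnnSketch, its nested Sample class, and the classify method are hypothetical names, and the sketch compares raw counts instead of frequencies (the most frequent class is the same either way).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch; all names here are assumptions.
class KnnSketch {

    /** A labeled sample in a two-dimensional feature space. */
    static final class Sample {
        final double x;
        final double y;
        final String label;

        Sample(double x, double y, String label) {
            this.x = x;
            this.y = y;
            this.label = label;
        }
    }

    /** Classifies the point (qx, qy) by majority vote among its k nearest samples. */
    static String classify(List<Sample> known, double qx, double qy, int k) {
        // Steps (1) and (2): sort a copy of the samples by Euclidean distance to the query.
        List<Sample> sorted = new ArrayList<>(known);
        sorted.sort(Comparator.comparingDouble((Sample s) -> Math.hypot(s.x - qx, s.y - qy)));

        // Step (3): keep the k nearest samples.
        List<Sample> nearest = sorted.subList(0, Math.min(k, sorted.size()));

        // Step (4): count how often each label occurs among them.
        Map<String, Integer> counts = new HashMap<>();
        for (Sample s : nearest) {
            counts.merge(s.label, 1, Integer::sum);
        }

        // Step (5): pick the most frequent label (null if there are no samples).
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Sample> known = Arrays.asList(
                new Sample(1.0, 1.1, "A"), new Sample(1.0, 1.0, "A"), new Sample(1.0, 1.2, "A"),
                new Sample(0.0, 0.0, "B"), new Sample(0.0, 0.1, "B"), new Sample(0.0, 0.2, "B"));
        System.out.println(classify(known, 1.2, 1.2, 5)); // prints A
    }
}
```

With K = 5 the five nearest samples are the three "A" points and two of the "B" points, so "A" wins the vote. An odd K is a common choice for two classes because it avoids tied votes.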
3. Java implementation
Point class
public class Point {
    private long id;
    private double x;
    private double y;
    private String type;

    public Point(long id, double x, double y) {
        this.x = x;
        this.y = y;
        this.id = id;
    }

    public Point(long id, double x, double y, String type) {
        this.x = x;
        this.y = y;
        this.type = type;
        this.id = id;
    }

    // get, set methods omitted
}
Distance class
public class Distance {
    // known point id
    private long id;
    // unknown point id
    private long nid;
    // the distance between the two
    private double disatance;

    public Distance(long id, long nid, double disatance) {
        this.id = id;
        this.nid = nid;
        this.disatance = disatance;
    }

    // get, set methods omitted
}
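Comparator class CompareClass
import java.util.Comparator;

// Comparator class: orders by distance and breaks ties by the known point's id,
// so that equidistant points are all kept when stored in a TreeSet
public class CompareClass implements Comparator<Distance> {
    public int compare(Distance d1, Distance d2) {
        int byDistance = Double.compare(d1.getDisatance(), d2.getDisatance());
        return byDistance != 0 ? byDistance : Long.compare(d1.getId(), d2.getId());
    }
}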
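The comparator deserves care because the main class stores the distances in a TreeSet, and a TreeSet treats two elements as duplicates whenever the comparator returns 0 for them. The sketch below shows the effect of a tie-break on id. The names TieBreakSketch, Dist, BY_VALUE, BY_VALUE_THEN_ID, and sizeInTreeSet are hypothetical; Dist is a minimal stand-in for the Distance class, not the class itself.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch; all names here are assumptions.
class TieBreakSketch {

    /** Minimal stand-in for Distance: a known point's id and its distance value. */
    static final class Dist {
        final long id;
        final double value;

        Dist(long id, double value) {
            this.id = id;
            this.value = value;
        }
    }

    // Compares by distance only: entries with equal distance compare as 0.
    static final Comparator<Dist> BY_VALUE =
            Comparator.comparingDouble((Dist d) -> d.value);

    // Compares by distance, then by id: distinct points never compare as 0.
    static final Comparator<Dist> BY_VALUE_THEN_ID =
            BY_VALUE.thenComparingLong((Dist d) -> d.id);

    /** Returns how many of the given entries survive insertion into a TreeSet. */
    static int sizeInTreeSet(Comparator<Dist> order, Dist... entries) {
        Set<Dist> set = new TreeSet<>(order);
        for (Dist d : entries) {
            set.add(d);
        }
        return set.size();
    }

    public static void main(String[] args) {
        Dist a = new Dist(1, 0.5);
        Dist b = new Dist(2, 0.5); // same distance, different point
        System.out.println(sizeInTreeSet(BY_VALUE, a, b));         // prints 1
        System.out.println(sizeInTreeSet(BY_VALUE_THEN_ID, a, b)); // prints 2
    }
}
```

Without the tie-break, one of two equidistant neighbors would silently disappear before the vote. Sorting a List instead of using a TreeSet would avoid the issue altogether.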
KNN Main class
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * 1. Input all known points
 * 2. Input the unknown point
 * 3. Calculate the Euclidean distance from every known point to the unknown point
 * 4. Sort all known points by distance
 * 5. Select the K points nearest to the unknown point
 * 6. Calculate the frequency of each category among the K points
 * 7. The most frequent category is the category of the unknown point
 *
 * @author FZJ
 */
public class KNN {

    public static void main(String[] args) {
        // 1. Input all known points
        Point point1 = new Point(1, 1.0, 1.1, "A");
        Point point2 = new Point(2, 1.0, 1.0, "A");
        Point point3 = new Point(3, 1.0, 1.2, "A");
        Point point4 = new Point(4, 0, 0, "B");
        Point point5 = new Point(5, 0, 0.1, "B");
        Point point6 = new Point(6, 0, 0.2, "B");

        // 2. Input the unknown point
        Point x = new Point(5, 1.2, 1.2);

        // 3. Calculate the Euclidean distance from every known point to the
        //    unknown point; the TreeSet keeps the results sorted by distance
        ArrayList<Point> list1 = new ArrayList<Point>();
        list1.add(point1);
        list1.add(point2);
        list1.add(point3);
        list1.add(point4);
        list1.add(point5);
        list1.add(point6);
        CompareClass compare = new CompareClass();
        Set<Distance> list3 = new TreeSet<Distance>(compare);
        for (Point point : list1) {
            list3.add(new Distance(point.getId(), x.getId(), ouDistance(point, x)));
        }

        // 4. Select the nearest K points
        double k = 5;

        // 5. Calculate the frequency of each category among the K points
        // 5.1 Count the points that each category contains
        List<Distance> list4 = new ArrayList<Distance>(list3);
        Map<String, Integer> map = getNumberOfType(list4, list1, k);

        // 5.2 Calculate the frequencies
        Map<String, Double> p = computeP(map, k);

        x.setType(maxP(p));
        System.out.println("The type of unknown point is: " + x.getType());
    }

    // Euclidean distance calculation
    public static double ouDistance(Point point1, Point point2) {
        double temp = Math.pow(point1.getX() - point2.getX(), 2)
                + Math.pow(point1.getY() - point2.getY(), 2);
        return Math.sqrt(temp);
    }

    // Find the category with the maximum frequency
    public static String maxP(Map<String, Double> map) {
        String key = null;
        double value = 0.0;
        for (Map.Entry<String, Double> entry : map.entrySet()) {
            if (entry.getValue() > value) {
                key = entry.getKey();
                value = entry.getValue();
            }
        }
        return key;
    }

    // Calculate the frequency of each category
    public static Map<String, Double> computeP(Map<String, Integer> map, double k) {
        Map<String, Double> p = new HashMap<String, Double>();
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            p.put(entry.getKey(), entry.getValue() / k);
        }
        return p;
    }

    // Count the points that each category contains among the K nearest
    public static Map<String, Integer> getNumberOfType(
            List<Distance> listDistance, ArrayList<Point> listPoint, double k) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        int i = 0;
        System.out.println("The selected K points, from nearest to farthest:");
        for (Distance distance : listDistance) {
            System.out.println("id " + distance.getId() + ", distance: " + distance.getDisatance());
            long id = distance.getId();
            // Look up the point's type by id and count it in the HashMap
            for (Point point : listPoint) {
                if (point.getId() == id) {
                    if (map.get(point.getType()) != null) {
                        map.put(point.getType(), map.get(point.getType()) + 1);
                    } else {
                        map.put(point.getType(), 1);
                    }
                }
            }
            i++;
            if (i >= k) {
                break;
            }
        }
        return map;
    }
}
4. Operation result
Reference
[1] "Machine Learning in Action"