Data Mining (II): Java Implementation of the KNN Algorithm

Source: Internet
Author: User

1. K-nearest neighbor algorithm (KNN)

The principle is as follows: a sample space contains samples whose classifications are known. When a sample of unknown classification appears, its classification is determined by the K known samples nearest to it.

Example: consider romance movies and action movies, each described by the number of kissing scenes and the number of action scenes it contains. In the coordinate system formed by these two counts, a movie of unknown category is classified according to the K points nearest to it.
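To make the coordinate-system idea concrete, the following minimal sketch measures how far an unknown movie lies from one romance movie and one action movie. The scene counts are made-up values for illustration only, and the names `MovieDistanceSketch` and `euclidean` are hypothetical and do not appear in the implementation later in this article.

```java
// Hypothetical illustration: each movie is a point (kisses, actionScenes).
class MovieDistanceSketch {

    // Euclidean distance between two points in the (kisses, actionScenes) plane
    static double euclidean(double kisses1, double actions1,
                            double kisses2, double actions2) {
        return Math.hypot(kisses1 - kisses2, actions1 - actions2);
    }

    public static void main(String[] args) {
        // Made-up counts, not taken from any real data set
        double[] romance = {100, 5};  // many kissing scenes, few action scenes
        double[] action  = {3, 100};  // few kissing scenes, many action scenes
        double[] unknown = {90, 20};  // the movie to classify

        double toRomance = euclidean(unknown[0], unknown[1], romance[0], romance[1]);
        double toAction  = euclidean(unknown[0], unknown[1], action[0], action[1]);

        System.out.println("Distance to the romance movie: " + toRomance);
        System.out.println("Distance to the action movie:  " + toAction);
        System.out.println(toRomance < toAction ? "Closer to romance" : "Closer to action");
    }
}
```

With only one neighbor per class this is effectively K = 1; the full algorithm repeats the same distance computation against every known sample.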

2. Algorithm implementation steps

(1) Calculate the Euclidean distance from every known point to the unknown point

(2) Sort all known points by that distance

(3) Find the K points nearest to the unknown point

(4) Calculate the frequency with which each classification appears among the K points

(5) Select the most frequent classification as the classification of the unknown point
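The five steps above can be sketched compactly as a single method. This is an illustrative sketch rather than the implementation presented in the next section: `KnnStepsSketch`, `Sample`, and `classify` are hypothetical names, and the sample points are the six known points and the unknown point (1.2, 1.2) used in this article's main class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KnnStepsSketch {

    // A labeled 2D sample; hypothetical helper type for this sketch
    static final class Sample {
        final double x;
        final double y;
        final String label;

        Sample(double x, double y, String label) {
            this.x = x;
            this.y = y;
            this.label = label;
        }
    }

    static String classify(List<Sample> known, double qx, double qy, int k) {
        // Steps (1) and (2): sort a copy of the known samples by
        // Euclidean distance to the query point (qx, qy)
        List<Sample> sorted = new ArrayList<>(known);
        sorted.sort(Comparator.comparingDouble(
                (Sample s) -> Math.hypot(s.x - qx, s.y - qy)));

        // Step (3): keep the k nearest samples
        List<Sample> nearest = sorted.subList(0, Math.min(k, sorted.size()));

        // Step (4): count how often each label occurs among them
        // (dividing every count by k would give the frequencies)
        Map<String, Integer> counts = new HashMap<>();
        for (Sample s : nearest) {
            counts.merge(s.label, 1, Integer::sum);
        }

        // Step (5): return the label with the highest count;
        // on a tie, the map's iteration order decides
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Sample> known = Arrays.asList(
                new Sample(1.0, 1.1, "A"), new Sample(1.0, 1.0, "A"),
                new Sample(1.0, 1.2, "A"), new Sample(0, 0, "B"),
                new Sample(0, 0.1, "B"), new Sample(0, 0.2, "B"));
        System.out.println(classify(known, 1.2, 1.2, 5)); // prints A
    }
}
```

With K = 5, the five nearest points are the three "A" points and two of the "B" points, so "A" wins the vote three to two.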

3. Java implementation

Point class

    public class Point {
        private long id;
        private double x;
        private double y;
        private String type;

        // Constructor for a point whose type is not yet known
        public Point(long id, double x, double y) {
            this.x = x;
            this.y = y;
            this.id = id;
        }

        // Constructor for a point with a known type
        public Point(long id, double x, double y, String type) {
            this.x = x;
            this.y = y;
            this.type = type;
            this.id = id;
        }

        // get, set methods omitted
    }

Distance class

    public class Distance {
        // id of the known point
        private long id;
        // id of the unknown point
        private long nid;
        // distance between the two points
        private double disatance;

        public Distance(long id, long nid, double disatance) {
            this.id = id;
            this.nid = nid;
            this.disatance = disatance;
        }

        // get, set methods omitted
    }

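Comparator class CompareClass

    import java.util.Comparator;

    // Comparator class: orders Distance objects by ascending distance
    public class CompareClass implements Comparator<Distance> {
        public int compare(Distance d1, Distance d2) {
            int byDistance = Double.compare(d1.getDisatance(), d2.getDisatance());
            // Break ties by id so that a TreeSet keeps equidistant points
            // instead of treating them as duplicates
            return byDistance != 0 ? byDistance : Long.compare(d1.getId(), d2.getId());
        }
    }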
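A comparator used with a `TreeSet` needs care with ties, because the set treats two elements as duplicates whenever the comparator returns 0. The sketch below compares a distance-only comparator with one that breaks ties by id. `TieBreakSketch`, `Entry`, and `sizeAfterAdding` are hypothetical names, and `Entry` is only a stand-in for the Distance class.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

class TieBreakSketch {

    // Hypothetical stand-in for the article's Distance class
    static final class Entry {
        final long id;
        final double distance;

        Entry(long id, double distance) {
            this.id = id;
            this.distance = distance;
        }
    }

    // Compares by distance only: equidistant entries compare as equal
    static final Comparator<Entry> BY_DISTANCE =
            Comparator.comparingDouble((Entry e) -> e.distance);

    // Compares by distance, then by id: equal only when both match
    static final Comparator<Entry> BY_DISTANCE_THEN_ID =
            BY_DISTANCE.thenComparingLong(e -> e.id);

    // Adds three entries, two of them equidistant, and reports how many survive
    static int sizeAfterAdding(Comparator<Entry> comparator) {
        Set<Entry> set = new TreeSet<>(comparator);
        set.add(new Entry(1, 0.5));
        set.add(new Entry(2, 0.5)); // same distance, different point
        set.add(new Entry(3, 0.9));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAdding(BY_DISTANCE));         // prints 2
        System.out.println(sizeAfterAdding(BY_DISTANCE_THEN_ID)); // prints 3
    }
}
```

Sorting a `List` instead of using a `TreeSet` avoids the issue entirely, since a list sort never discards elements.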

KNN Main class

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.TreeSet;

    /**
     * 1. Input all known points
     * 2. Input the unknown point
     * 3. Calculate the Euclidean distance from every known point to the unknown point
     * 4. Sort all known points by distance
     * 5. Select the k points nearest to the unknown point
     * 6. Calculate the frequency of each category among the k points
     * 7. Assign the most frequent category to the unknown point
     *
     * @author FZJ
     */
    public class KNN {

        public static void main(String[] args) {
            // 1. Input all known points
            Point point1 = new Point(1, 1.0, 1.1, "A");
            Point point2 = new Point(2, 1.0, 1.0, "A");
            Point point3 = new Point(3, 1.0, 1.2, "A");
            Point point4 = new Point(4, 0, 0, "B");
            Point point5 = new Point(5, 0, 0.1, "B");
            Point point6 = new Point(6, 0, 0.2, "B");

            // 2. Input the unknown point (its id must differ from the known points' ids)
            Point x = new Point(7, 1.2, 1.2);

            // 3. Calculate the Euclidean distance from every known point to the
            //    unknown point and sort the known points by distance
            ArrayList<Point> list1 = new ArrayList<Point>();
            list1.add(point1);
            list1.add(point2);
            list1.add(point3);
            list1.add(point4);
            list1.add(point5);
            list1.add(point6);
            CompareClass compare = new CompareClass();
            Set<Distance> list3 = new TreeSet<Distance>(compare);
            for (Point point : list1) {
                list3.add(new Distance(point.getId(), x.getId(), ouDistance(point, x)));
            }

            // 4. Select the nearest k points
            //    (k is a double so that count / k below is not integer division)
            double k = 5;

            // 5. Calculate the frequency of each category among the k points
            // 5.1 Count the points in each category
            List<Distance> list4 = new ArrayList<Distance>(list3);
            Map<String, Integer> map = getNumberOfType(list4, list1, k);
            // 5.2 Calculate the frequencies
            Map<String, Double> p = computeP(map, k);

            x.setType(maxP(p));
            System.out.println("The type of the unknown point is: " + x.getType());
        }

        // Euclidean distance calculation
        public static double ouDistance(Point point1, Point point2) {
            double temp = Math.pow(point1.getX() - point2.getX(), 2)
                    + Math.pow(point1.getY() - point2.getY(), 2);
            return Math.sqrt(temp);
        }

        // Find the category with the maximum frequency
        public static String maxP(Map<String, Double> map) {
            String key = null;
            double value = 0.0;
            for (Map.Entry<String, Double> entry : map.entrySet()) {
                if (entry.getValue() > value) {
                    key = entry.getKey();
                    value = entry.getValue();
                }
            }
            return key;
        }

        // Calculate the frequency of each category
        public static Map<String, Double> computeP(Map<String, Integer> map, double k) {
            Map<String, Double> p = new HashMap<String, Double>();
            for (Map.Entry<String, Integer> entry : map.entrySet()) {
                p.put(entry.getKey(), entry.getValue() / k);
            }
            return p;
        }

        // Count the points that each category contains among the k nearest
        public static Map<String, Integer> getNumberOfType(
                List<Distance> listDistance, ArrayList<Point> listPoint, double k) {
            Map<String, Integer> map = new HashMap<String, Integer>();
            int i = 0;
            System.out.println("The selected k points, from nearest to farthest:");
            for (Distance distance : listDistance) {
                System.out.println("id: " + distance.getId()
                        + ", distance: " + distance.getDisatance());
                long id = distance.getId();
                // Find the point's type by id and count it in the HashMap
                for (Point point : listPoint) {
                    if (point.getId() == id) {
                        if (map.get(point.getType()) != null) {
                            map.put(point.getType(), map.get(point.getType()) + 1);
                        } else {
                            map.put(point.getType(), 1);
                        }
                    }
                }
                i++;
                if (i >= k) {
                    break;
                }
            }
            return map;
        }
    }

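The main class declares `k` as a `double`, which makes the division in the frequency calculation a floating-point division. The minimal sketch below shows what would happen with two integer operands instead; `FrequencySketch` and `frequency` are hypothetical names used only for this illustration.

```java
class FrequencySketch {

    // Frequency of a category among the k nearest points,
    // forcing floating-point division with a cast
    static double frequency(int count, int k) {
        return (double) count / k;
    }

    public static void main(String[] args) {
        int count = 3; // points of one category among the k nearest
        int k = 5;

        System.out.println(count / k);           // integer division: prints 0
        System.out.println(frequency(count, k)); // prints 0.6
    }
}
```

Since every count is divided by the same `k`, comparing the raw counts would select the same category; the frequencies are mainly useful when the proportions themselves are reported.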
4. Run result

Reference

[1] "Machine Learning in Action"
