K-means Clustering: A Java Example

Source: Internet
Author: User

This example is based on the sixth chapter of Mahout in Action.

The data set in datafile/cluster/simple_k-means.txt contains one point per line, as follows:

1 1
2 1
1 2
2 2
3 3
8 8
8 9
9 8
9 9

1. Principle of the K-means clustering algorithm

1. Randomly select k elements from the data set D as the initial centers of the k clusters.

2. Calculate the dissimilarity between each remaining element and each of the k cluster centers, and assign every element to the cluster with the lowest dissimilarity.

3. Based on the clustering result, recompute the centers of the k clusters by taking the arithmetic mean of each dimension over all elements in the cluster.

4. Re-cluster all elements in D according to the new centers.

5. Repeat steps 3 and 4 until the clustering result no longer changes.

6. Output the result.
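The six steps above can be sketched compactly in Java. This is a minimal illustration of the textbook procedure, not the implementation listed in section 3: the class name KMeansSketch and its methods are hypothetical, points are plain double[] pairs, and the first k points are used as initial centers (instead of a random choice) so that the run is repeatable.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of the original listing: plain k-means on 2-D points. */
final class KMeansSketch {

    /** Euclidean distance between two 2-D points. */
    static double distance(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    /** Steps 2 and 4: index of the center closest to p (ties go to the lower index). */
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (distance(p, centers[c]) < distance(p, centers[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Repeats assignment and center update until no point changes cluster. */
    static int[] cluster(double[][] points, double[][] centers) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        boolean changed = true;
        while (changed) {
            changed = false;
            // Assign every point to its nearest center.
            for (int i = 0; i < points.length; i++) {
                int c = nearest(points[i], centers);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
            }
            // Step 3: move each center to the per-dimension mean of its members.
            for (int c = 0; c < centers.length; c++) {
                double sumX = 0;
                double sumY = 0;
                int n = 0;
                for (int i = 0; i < points.length; i++) {
                    if (assignment[i] == c) {
                        sumX += points[i][0];
                        sumY += points[i][1];
                        n++;
                    }
                }
                if (n > 0) { // an empty cluster keeps its previous center
                    centers[c][0] = sumX / n;
                    centers[c][1] = sumY / n;
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {2, 1}, {1, 2}, {2, 2}, {3, 3},
                             {8, 8}, {8, 9}, {9, 8}, {9, 9}};
        double[][] centers = {{1, 1}, {2, 1}}; // first k = 2 points as initial centers
        int[] assignment = cluster(points, centers);
        System.out.println(Arrays.toString(assignment));
        System.out.println(Arrays.deepToString(centers));
    }
}
```

On the nine sample points this sketch puts the first five points in cluster 0 and the last four in cluster 1, with final centers (1.8, 1.8) and (8.5, 8.5), the plain means of the two groups.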

2. Worked example
2.1 Take k elements from D as the initial centers of the K clusters. The constant private final static Integer K = 2; sets K=2, that is, an estimate of two clusters.
The implementation takes the first two points of the data set, (1, 1) and (2, 1), as the initial centers:

C0: 1 1
C1: 2 1
2.2 Calculate the dissimilarity between each remaining element and each of the k cluster centers, and assign every element to the cluster with the lowest dissimilarity. The result is:

C0: 1 1
C0: The point is: 1.0,2.0
C1: 2 1
C1: The point is: 2.0,2.0
C1: The point is: 3.0,3.0
C1: The point is: 8.0,8.0
C1: The point is: 8.0,9.0
C1: The point is: 9.0,8.0
C1: The point is: 9.0,9.0
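To see where these assignments come from, the distances can be recomputed by hand. The sketch below is only an illustration (the class NearestCenterDemo is hypothetical and not part of the article's listing); it measures the Euclidean distance from each remaining point to the two initial centers and picks the closer one, giving C0 the tie as the listing's strict comparison does.

```java
/** Hypothetical demo: first assignment step against the centers (1,1) and (2,1). */
final class NearestCenterDemo {

    static final double[] C0 = {1, 1};
    static final double[] C1 = {2, 1};

    /** Euclidean distance between two 2-D points. */
    static double distance(double[] a, double[] b) {
        double dx = b[0] - a[0];
        double dy = b[1] - a[1];
        return Math.sqrt(dx * dx + dy * dy);
    }

    /** Returns 1 if p is strictly closer to C1, otherwise 0. */
    static int assign(double[] p) {
        return distance(p, C1) < distance(p, C0) ? 1 : 0;
    }

    public static void main(String[] args) {
        double[][] remaining = {{1, 2}, {2, 2}, {3, 3}, {8, 8}, {8, 9}, {9, 8}, {9, 9}};
        for (double[] p : remaining) {
            System.out.printf("(%.0f, %.0f): d(C0)=%.3f d(C1)=%.3f -> C%d%n",
                    p[0], p[1], distance(p, C0), distance(p, C1), assign(p));
        }
    }
}
```

For the point (3, 3), for example, the distances are about 2.83 to C0 and 2.24 to C1, which is why it lands in C1 in this first pass.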



2.3 Based on the clustering result of 2.2, recompute the centers of the K clusters by taking the arithmetic mean of each dimension over all elements in the cluster. Distances are measured with the Euclidean distance formula.

The new center of C0 is: 1.0,1.5
The new center of C1 is: 5.857142857142857,5.714285714285714
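These two centers are the per-dimension means of the clusters from 2.2: for C0, x = (1 + 1) / 2 = 1.0 and y = (1 + 2) / 2 = 1.5; for C1, x = 41 / 7 and y = 40 / 7. A small sketch of that calculation, using a hypothetical CentroidDemo class that is not part of the article's listing:

```java
/** Hypothetical demo: per-dimension arithmetic mean of a cluster's members. */
final class CentroidDemo {

    /** Returns {mean of x, mean of y} over the given 2-D points. */
    static double[] mean(double[][] members) {
        double sumX = 0;
        double sumY = 0;
        for (double[] p : members) {
            sumX += p[0];
            sumY += p[1];
        }
        return new double[] {sumX / members.length, sumY / members.length};
    }

    public static void main(String[] args) {
        // Members after the first pass, including the initial centers themselves.
        double[][] c0 = {{1, 1}, {1, 2}};
        double[][] c1 = {{2, 1}, {2, 2}, {3, 3}, {8, 8}, {8, 9}, {9, 8}, {9, 9}};
        double[] m0 = mean(c0);
        double[] m1 = mean(c1);
        System.out.println("C0: " + m0[0] + "," + m0[1]);
        System.out.println("C1: " + m1[0] + "," + m1[1]);
    }
}
```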

2.4 Re-cluster all elements in D according to the new centers.

Iteration 2
C0: The point is: 1.0,1.0
C0: The point is: 2.0,1.0
C0: The point is: 1.0,2.0
C0: The point is: 2.0,2.0
C0: The point is: 3.0,3.0
C1: The point is: 8.0,8.0
C1: The point is: 8.0,9.0
C1: The point is: 9.0,8.0
C1: The point is: 9.0,9.0


2.5 Repeat steps 3 and 4 until the clustering result no longer changes. In this implementation the loop stops once the distance moved by the cluster centers falls below a threshold; at that point the clustering is considered complete and no further iteration is needed. The threshold here is 0.001:

private final static Double converge = 0.001;

From the 2nd iteration onward the assignment of points stays the same in every pass: C0 holds (1,1), (2,1), (1,2), (2,2) and (3,3), and C1 holds (8,8), (8,9), (9,8) and (9,9). Only the centers keep shifting, by a smaller amount each time:

After iteration 2
C0 cluster center is: 1.6666666666666667,1.75
C1 cluster center is: 7.971428571428572,7.942857142857143
Minimum distance moved by any cluster center: move=0.7120003121097943

After iteration 3
C0 cluster center is: 1.777777777777778,1.7916666666666667
C1 cluster center is: 8.394285714285715,8.388571428571428
Minimum distance moved by any cluster center: move=0.11866671868496578

After iteration 4
C0 cluster center is: 1.7962962962962965,1.7986111111111114
C1 cluster center is: 8.478857142857143,8.477714285714285
Minimum distance moved by any cluster center: move=0.019777786447494432

After iteration 5
C0 cluster center is: 1.799382716049383,1.7997685185185184
C1 cluster center is: 8.495771428571429,8.495542857142857
Minimum distance moved by any cluster center: move=0.003296297741248916

After iteration 6
C0 cluster center is: 1.7998971193415638,1.7999614197530864
C1 cluster center is: 8.499154285714287,8.499108571428572
Minimum distance moved by any cluster center: move=5.49382956874724E-4

After the 6th iteration move is below 0.001, so the loop ends.
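The centers keep moving even though the assignments are stable because, reading the listing in section 3, the previous center stays in the cluster as element 0 and is averaged together with the members. With n members the update is therefore new = (old + sum of members) / (n + 1), so each pass closes only n / (n + 1) of the remaining gap to the true mean. The sketch below replays that update for C0 (n = 5, member sums 9 and 9); the class DampedCenterDemo is hypothetical, and the reported values may differ from the article's output in the last digits because the sums are formed in a different order.

```java
/** Hypothetical demo: the center update when the old center is averaged in as well. */
final class DampedCenterDemo {

    /** Averages the old center together with n members whose coordinate sums are given. */
    static double[] update(double[] old, double sumX, double sumY, int n) {
        return new double[] {(old[0] + sumX) / (n + 1), (old[1] + sumY) / (n + 1)};
    }

    static double distance(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    /** Counts updates of C0, starting at (1.0, 1.5), until it moves no more than threshold. */
    static int updatesUntilConverged(double threshold) {
        double[] c0 = {1.0, 1.5};
        double move = Double.MAX_VALUE;
        int updates = 0;
        while (move > threshold) {
            double[] next = update(c0, 9.0, 9.0, 5); // members: (1,1) (2,1) (1,2) (2,2) (3,3)
            move = distance(c0, next);
            c0 = next;
            updates++;
            System.out.println("update " + updates + ": center=" + c0[0] + "," + c0[1]
                    + " move=" + move);
        }
        return updates;
    }

    public static void main(String[] args) {
        System.out.println("updates needed: " + updatesUntilConverged(0.001));
    }
}
```

Five updates are needed for a threshold of 0.001, matching iterations 2 through 6 above, and each move is one sixth of the previous one. Averaging only the members would place C0 at (1.8, 1.8) in a single update.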

3. Java implementation
    package mysequence.machineleaning.clustering.kmeans;

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Vector;

    import mysequence.machineleaning.clustering.canopy.Point;

    public class MyKmeans {

        // All points read from the data file.
        static Vector<Point> li = new Vector<Point>();
        // static List<Point> li = new ArrayList<Point>();

        // Result of each iteration; one Vector represents one cluster.
        static List<Vector<Point>> list = new ArrayList<Vector<Point>>();

        // Choose K=2, that is, an estimate of two clusters.
        private final static Integer K = 2;

        // When the distance moved is below this value, the clustering is
        // considered complete and iteration stops.
        private final static Double converge = 0.001;

        // Read the data: one point per line, coordinates separated by a space.
        public static final void readF1() throws IOException {
            String filePath = "datafile/cluster/simple_k-means.txt";
            BufferedReader br = new BufferedReader(
                    new InputStreamReader(new FileInputStream(filePath)));
            for (String line = br.readLine(); line != null; line = br.readLine()) {
                if (line.length() == 0 || "".equals(line)) continue;
                String[] str = line.split(" ");
                Point p0 = new Point();
                p0.setX(Double.valueOf(str[0]));
                p0.setY(Double.valueOf(str[1]));
                li.add(p0);
                System.out.println(line);
            }
            br.close();
        }

        // Math.sqrt(double n) gives the square root.
        // Extension: for the n-th root of m, use java.lang.StrictMath.pow(m, 1.0 / n).
        // Euclidean distance between two points.
        public static double distanceMeasure(Point p1, Point p2) {
            double tmp = StrictMath.pow(p2.getX() - p1.getX(), 2)
                    + StrictMath.pow(p2.getY() - p1.getY(), 2);
            return Math.sqrt(tmp);
        }

        // Compute the new cluster centers and return the smallest distance
        // moved by any center.
        public static Double calCentroid() {
            System.out.println("------------------------------------------------");
            Double moveDist = Double.MAX_VALUE;
            for (int i = 0; i < list.size(); i++) {
                Vector<Point> subLi = list.get(i);
                Point po = new Point();
                double sumX = 0.0;
                double sumY = 0.0;
                double clusterLen = Double.valueOf(subLi.size());
                // Note: element 0 is the previous center, so it is averaged
                // together with the members. In the first pass the centers are
                // data points, so the mean is exact; afterwards the old center
                // damps the update.
                for (int j = 0; j < clusterLen; j++) {
                    Point nextP = subLi.get(j);
                    sumX = sumX + nextP.getX();
                    sumY = sumY + nextP.getY();
                }
                po.setX(sumX / clusterLen);
                po.setY(sumY / clusterLen);
                // Distance between the new center and the old one.
                Double dist = distanceMeasure(subLi.get(0), po);
                // Of all the centers that moved, keep the minimum distance.
                if (dist < moveDist) moveDist = dist;
                // Keep only the new center as element 0 for the next pass.
                list.get(i).clear();
                list.get(i).add(po);
                System.out.println("C" + i + " cluster center is: " + po.getX() + "," + po.getY());
            }
            return moveDist;
        }

        // Distance moved in the latest pass.
        private static Double move = Double.MAX_VALUE;

        // Iterate until the movement drops to the convergence threshold.
        public static void recursionKluster() {
            for (int times = 2; move > converge; times++) {
                System.out.println("Iteration " + times);
                // Element 0 of each Vector in list is the centroid.
                for (int i = 0; i < li.size(); i++) {
                    Point p = li.get(i);
                    int index = -1;
                    Double nearDist = Double.MAX_VALUE;
                    for (int k = 0; k < K; k++) {
                        Point centre = list.get(k).get(0);
                        Double currentDist = distanceMeasure(p, centre);
                        if (currentDist < nearDist) {
                            nearDist = currentDist;
                            index = k;
                        }
                    }
                    System.out.println("C" + index + ": The point is: " + p.getX() + "," + p.getY());
                    list.get(index).add(p);
                }
                // Recompute the centers and get the smallest distance moved.
                move = calCentroid();
                System.out.println("Minimum distance moved by any cluster center: move=" + move);
            }
        }

        // First pass: the first K points become the initial centers and the
        // remaining points are assigned to the nearest one.
        public static void kluster() {
            for (int k = 0; k < K; k++) {
                Vector<Point> vect = new Vector<Point>();
                Point p = li.get(k);
                vect.add(p);
                list.add(vect);
            }
            System.out.println("Iteration 1");
            // Element 0 of each Vector in list is the centroid.
            for (int i = K; i < li.size(); i++) {
                Point p = li.get(i);
                int index = -1;
                Double nearDist = Double.MAX_VALUE;
                for (int k = 0; k < K; k++) {
                    Point centre = list.get(k).get(0);
                    Double currentDist = distanceMeasure(p, centre);
                    if (currentDist < nearDist) {
                        nearDist = currentDist;
                        index = k;
                    }
                }
                System.out.println("C" + index + ": The point is: " + p.getX() + "," + p.getY());
                list.get(index).add(p);
            }
        }

        public static void main(String[] args) throws IOException {
            // Read the data.
            readF1();
            // First iteration.
            kluster();
            // Compute the cluster centers after the first iteration.
            calCentroid();
            // Keep iterating until convergence.
            recursionKluster();
        }
    }
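The listing imports a Point class from the canopy package whose source is not shown here. Judging only from how it is used (a no-argument constructor plus getters and setters for x and y), a stand-in could look like the sketch below. This is an assumption for illustration, not the author's class; it would also need the matching package declaration to be used with the listing.

```java
/** Hypothetical stand-in for the Point class used by the listing: a mutable 2-D point. */
final class Point {

    private double x;
    private double y;

    public double getX() {
        return x;
    }

    public void setX(double x) {
        this.x = x;
    }

    public double getY() {
        return y;
    }

    public void setY(double y) {
        this.y = y;
    }

    @Override
    public String toString() {
        return x + "," + y;
    }
}
```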

4. Execution result

C0: 1 1
C1: 2 1
Iteration 1
C0: The point is: 1.0,2.0
C1: The point is: 2.0,2.0
C1: The point is: 3.0,3.0
C1: The point is: 8.0,8.0
C1: The point is: 8.0,9.0
C1: The point is: 9.0,8.0
C1: The point is: 9.0,9.0
------------------------------------------------
C0 cluster center is: 1.0,1.5
C1 cluster center is: 5.857142857142857,5.714285714285714
Iteration 2
C0: The point is: 1.0,1.0
C0: The point is: 2.0,1.0
C0: The point is: 1.0,2.0
C0: The point is: 2.0,2.0
C0: The point is: 3.0,3.0
C1: The point is: 8.0,8.0
C1: The point is: 8.0,9.0
C1: The point is: 9.0,8.0
C1: The point is: 9.0,9.0
------------------------------------------------
C0 cluster center is: 1.6666666666666667,1.75
C1 cluster center is: 7.971428571428572,7.942857142857143
Minimum distance moved by any cluster center: move=0.7120003121097943
Iteration 3
C0: The point is: 1.0,1.0
C0: The point is: 2.0,1.0
C0: The point is: 1.0,2.0
C0: The point is: 2.0,2.0
C0: The point is: 3.0,3.0
C1: The point is: 8.0,8.0
C1: The point is: 8.0,9.0
C1: The point is: 9.0,8.0
C1: The point is: 9.0,9.0
------------------------------------------------
C0 cluster center is: 1.777777777777778,1.7916666666666667
C1 cluster center is: 8.394285714285715,8.388571428571428
Minimum distance moved by any cluster center: move=0.11866671868496578
Iteration 4
C0: The point is: 1.0,1.0
C0: The point is: 2.0,1.0
C0: The point is: 1.0,2.0
C0: The point is: 2.0,2.0
C0: The point is: 3.0,3.0
C1: The point is: 8.0,8.0
C1: The point is: 8.0,9.0
C1: The point is: 9.0,8.0
C1: The point is: 9.0,9.0
------------------------------------------------
C0 cluster center is: 1.7962962962962965,1.7986111111111114
C1 cluster center is: 8.478857142857143,8.477714285714285
Minimum distance moved by any cluster center: move=0.019777786447494432
Iteration 5
C0: The point is: 1.0,1.0
C0: The point is: 2.0,1.0
C0: The point is: 1.0,2.0
C0: The point is: 2.0,2.0
C0: The point is: 3.0,3.0
C1: The point is: 8.0,8.0
C1: The point is: 8.0,9.0
C1: The point is: 9.0,8.0
C1: The point is: 9.0,9.0
------------------------------------------------
C0 cluster center is: 1.799382716049383,1.7997685185185184
C1 cluster center is: 8.495771428571429,8.495542857142857
Minimum distance moved by any cluster center: move=0.003296297741248916
Iteration 6
C0: The point is: 1.0,1.0
C0: The point is: 2.0,1.0
C0: The point is: 1.0,2.0
C0: The point is: 2.0,2.0
C0: The point is: 3.0,3.0
C1: The point is: 8.0,8.0
C1: The point is: 8.0,9.0
C1: The point is: 9.0,8.0
C1: The point is: 9.0,9.0
------------------------------------------------
C0 cluster center is: 1.7998971193415638,1.7999614197530864
C1 cluster center is: 8.499154285714287,8.499108571428572
Minimum distance moved by any cluster center: move=5.49382956874724E-4
