1. K-means algorithm overview and code prototype
K-means is one of the most important algorithms in data mining, so I will not introduce it in detail here. If you are interested, take a look at Chenhao's blog post, which explains it well:
http://www.csdn.net/article/2012-07-03/2807073-k-means
In general, K-means clustering requires the following steps:
①. Initialize the data.
②. Choose the initial center points, which can be selected at random.
③. Calculate the distance from each point to each cluster center and assign the point to the cluster whose center is nearest.
④. Calculate the mean of each cluster, which becomes the new cluster center, then repeat step ③.
⑤. Stop the loop when the maximum number of iterations is reached, when the cluster centers no longer change, or when they change by less than a given threshold.
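The stopping rule in step ⑤ can be sketched roughly as follows. This is only an illustration under my own assumptions: the class ConvergenceSketch, the methods maxShift and shouldStop, and the epsilon parameter are hypothetical names and are not part of the prototype class in this post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Kmeans prototype.
final class ConvergenceSketch {

    // Largest Euclidean distance that any center moved between two iterations.
    static double maxShift(List<double[]> oldCenters, List<double[]> newCenters) {
        double max = 0.0;
        for (int i = 0; i < oldCenters.size(); i++) {
            double[] a = oldCenters.get(i);
            double[] b = newCenters.get(i);
            double sum = 0.0;
            for (int d = 0; d < a.length; d++) {
                double diff = a[d] - b[d];
                sum += diff * diff;
            }
            max = Math.max(max, Math.sqrt(sum));
        }
        return max;
    }

    // Step 5: stop at the iteration cap, or when no center moved more than epsilon.
    // epsilon = 0 corresponds to "the centers no longer change".
    static boolean shouldStop(int iteration, int maxIterations,
                              List<double[]> oldCenters, List<double[]> newCenters,
                              double epsilon) {
        return iteration >= maxIterations
                || maxShift(oldCenters, newCenters) <= epsilon;
    }

    public static void main(String[] args) {
        List<double[]> before = new ArrayList<>();
        before.add(new double[]{0, 0});
        before.add(new double[]{10, 10});

        List<double[]> after = new ArrayList<>();
        after.add(new double[]{3, 4});    // this center moved by 5
        after.add(new double[]{10, 10});  // this center did not move

        System.out.println(maxShift(before, after));                 // 5.0
        System.out.println(shouldStop(1, 50, before, after, 1e-6));  // false
        System.out.println(shouldStop(50, 50, before, after, 1e-6)); // true
    }
}
```

To use a check like this, the loop would have to keep a copy of the previous centers before computing the new ones.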
Well, that's how it works; super simple. The Java implementation, however, takes a fair amount of code. I did not write this code entirely by myself; parts of it are adapted from references. I have encapsulated the K-means implementation in a class so that it can be called at any time.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Random;

public class Kmeans {
    private int k; // number of clusters
    private int m; // number of iterations
    private int dataSetLength; // data set length
    private ArrayList<double[]> dataSet; // data set
    private ArrayList<double[]> center; // list of cluster centers
    private ArrayList<ArrayList<double[]>> cluster; // clusters
    private ArrayList<Double> jc; // sum of squared errors per iteration, used to judge how much the centers moved
    private Random random;

    // Sets the raw data set
    public void setDataSet(ArrayList<double[]> dataSet) {
        this.dataSet = dataSet;
    }

    // Gets the resulting clusters
    public ArrayList<ArrayList<double[]>> getCluster() {
        return this.cluster;
    }

    // Constructor; takes the number of clusters to divide the data into
    public Kmeans(int k) {
        if (k <= 0) k = 1;
        this.k = k;
    }

    // Initialization
    private void init() {
        m = 0;
        random = new Random();
        if (dataSet == null || dataSet.size() == 0) initDataSet();
        dataSetLength = dataSet.size();
        if (k > dataSetLength) k = dataSetLength;
        center = initCenters();
        cluster = initCluster();
        jc = new ArrayList<Double>();
    }

    // Initializes a default (two-dimensional) data set when none was supplied
    private void initDataSet() {
        dataSet = new ArrayList<double[]>();
        double[][] dataSetArray = new double[][]{
            {8, 2}, {3, 4}, {2, 5}, {4, 2}, {7, 3}, {6, 2}, {4, 7}, {6, 3},
            {5, 3}, {6, 3}, {6, 9}, {1, 6}, {3, 9}, {4, 1}, {8, 6}};
        for (int i = 0; i < dataSetArray.length; i++)
            dataSet.add(dataSetArray[i]);
    }

    // Initializes the center list: picks k distinct data points at random
    private ArrayList<double[]> initCenters() {
        ArrayList<double[]> center = new ArrayList<double[]>();
        // Generate a sequence of distinct random indices
        int[] randoms = new int[k];
        boolean flag;
        int temp = random.nextInt(dataSetLength);
        randoms[0] = temp;
        for (int i = 1; i < k; i++) {
            flag = true;
            while (flag) {
                temp = random.nextInt(dataSetLength);
                int j = 0;
                while (j < i) {
                    if (temp == randoms[j]) break;
                    j++;
                }
                if (j == i) flag = false;
            }
            randoms[i] = temp;
        }
        for (int i = 0; i < k; i++)
            center.add(dataSet.get(randoms[i]));
        return center;
    }

    // Initializes the cluster collection with k empty clusters
    private ArrayList<ArrayList<double[]>> initCluster() {
        ArrayList<ArrayList<double[]>> cluster = new ArrayList<ArrayList<double[]>>();
        for (int i = 0; i < k; i++)
            cluster.add(new ArrayList<double[]>());
        return cluster;
    }

    // Calculates the Euclidean distance between two points of any dimension
    private double distance(double[] element, double[] center) {
        return Math.sqrt(errorSquare(element, center));
    }

    // Finds the index of the shortest distance; ties are broken at random
    private int minDistance(double[] distance) {
        double minDistance = distance[0];
        int minLocation = 0;
        for (int i = 1; i < distance.length; i++) {
            if (distance[i] < minDistance) {
                minDistance = distance[i];
                minLocation = i;
            } else if (distance[i] == minDistance) {
                if (random.nextInt(10) < 5) {
                    minLocation = i;
                }
            }
        }
        return minLocation;
    }

    // Assigns every point to a cluster
    private void clusterSet() {
        double[] distance = new double[k];
        for (int i = 0; i < dataSetLength; i++) {
            // Calculate and store the distance to each center
            for (int j = 0; j < k; j++)
                distance[j] = distance(dataSet.get(i), center.get(j));
            // Find the shortest distance
            int minLocation = minDistance(distance);
            // Add the point to that cluster
            cluster.get(minLocation).add(dataSet.get(i));
        }
    }

    // Calculates the new centers as the mean of each non-empty cluster
    private void setNewCenter() {
        for (int i = 0; i < k; i++) {
            int n = cluster.get(i).size();
            if (n != 0) {
                double[] newCenter = new double[cluster.get(i).get(0).length];
                for (int j = 0; j < n; j++)
                    for (int d = 0; d < newCenter.length; d++)
                        newCenter[d] += cluster.get(i).get(j)[d];
                for (int d = 0; d < newCenter.length; d++)
                    newCenter[d] = newCenter[d] / n;
                center.set(i, newCenter);
            }
        }
    }

    // Squared error between two points
    private double errorSquare(double[] element, double[] center) {
        double sum = 0.0;
        for (int d = 0; d < element.length; d++) {
            double diff = element[d] - center[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Calculates the sum-of-squared-errors criterion function
    private void countRule() {
        double jcf = 0;
        for (int i = 0; i < cluster.size(); i++) {
            for (int j = 0; j < cluster.get(i).size(); j++)
                jcf += errorSquare(cluster.get(i).get(j), center.get(i));
        }
        jc.add(jcf);
    }

    // The core algorithm
    private void kmeans() {
        // Initialize the variables, randomly select the centers, initialize the clusters
        init();
        // Start the loop
        while (true) {
            // Assign each point to a cluster
            clusterSet();
            // Calculate the objective function
            countRule();
            // Check the change in error. Because I fixed the number of iterations at 50,
            // this check is not needed; you can enable it if you like, it is just a bit slower.
            /*
            if (m != 0) {
                if (jc.get(m) - jc.get(m - 1) == 0) break;
            }
            */
            if (m >= 50) break;
            // Otherwise continue and generate the new centers
            setNewCenter();
            m++;
            cluster.clear();
            cluster = initCluster();
        }
    }

    // The only interface exposed to external classes
    public void execute() {
        System.out.print("Start kmeans\n");
        kmeans();
        System.out.print("Kmeans end\n");
    }

    // Used to print out a cluster
    public void printDataArray(ArrayList<double[]> data, String dataArrayName) {
        for (int i = 0; i < data.size(); i++) {
            System.out.print("print:" + dataArrayName + "[" + i + "]=" + Arrays.toString(data.get(i)) + "\n");
        }
        System.out.print("==========================\n");
    }
}
Well, that's the code. The comments are fairly detailed, so it should be easy to follow. Here is a test example.
import java.util.ArrayList;

public class Test {
    public static void main(String[] args) {
        Kmeans k = new Kmeans(2);
        ArrayList<double[]> dataSet = new ArrayList<double[]>();
        dataSet.add(new double[]{2, 2, 2});
        dataSet.add(new double[]{1, 2, 2});
        dataSet.add(new double[]{2, 1, 2});
        dataSet.add(new double[]{1, 3, 2});
        dataSet.add(new double[]{3, 1, 2});
        dataSet.add(new double[]{-2, -2, -2});
        dataSet.add(new double[]{-1, -2, -2});
        dataSet.add(new double[]{-2, -1, -2});
        dataSet.add(new double[]{-3, -1, -2});
        dataSet.add(new double[]{-1, -3, -2});
        k.setDataSet(dataSet);
        k.execute();
        ArrayList<ArrayList<double[]>> cluster = k.getCluster();
        for (int i = 0; i < cluster.size(); i++) {
            k.printDataArray(cluster.get(i), "cluster[" + i + "]");
        }
    }
}
Nothing difficult here: feed in the initial data, run K-means to classify it, and print the result. This prototype code is quite rough. It offers no settings for variables such as the number of clusters and the number of iterations, so you will have to add those yourself.
2. K-means application: image segmentation
We can apply K-means clustering to image segmentation: pixels of similar color are grouped into one class and then painted with a single color. Like this: the left side is before clustering, the right side is after. It still looks pretty cool. Extending the clustering algorithm to this case is also very easy. Here are four tips (because this is homework, I decided not to post the full code; otherwise someone's homework would turn out identical to mine and my credits would be gone):
①. The prototype's built-in sample data is two-dimensional. A color is made up of the three RGB primaries, so we only need to add one more dimension on top of the two-dimensional case. Simple, isn't it? Change the structure of the arrays and use three-dimensional Euclidean distance in the distance calculation.
②. Java has its own image-processing classes, so reading the data is very easy. Here is a small code hint.
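// Needs: java.awt.image.BufferedImage, java.io.File, java.io.IOException, javax.imageio.ImageIO
// Reads the image at the given path into an array; this data is processed further later
private int[][] getImageData(String path) {
    BufferedImage bi = null;
    try {
        bi = ImageIO.read(new File(path));
    } catch (IOException e) {
        e.printStackTrace();
    }
    int width = bi.getWidth();
    int height = bi.getHeight();
    int[][] data = new int[width][height];
    for (int i = 0; i < width; i++)
        for (int j = 0; j < height; j++)
            // The original listing is cut off after "j"; this loop bound, body and
            // the return statement are an assumed completion.
            data[i][j] = bi.getRGB(i, j);
    return data;
}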
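Assuming the pixels are read with BufferedImage.getRGB, each int packs alpha, red, green and blue into one value, so it has to be split into three channels before clustering. Here is a minimal sketch of that conversion; the class RgbSketch and its methods toPoint and toPixel are hypothetical names of mine, not the code from my homework.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only.
final class RgbSketch {

    // Splits a packed 0xAARRGGBB pixel (the layout returned by
    // BufferedImage.getRGB) into a {red, green, blue} point.
    static double[] toPoint(int pixel) {
        int r = (pixel >> 16) & 0xFF;
        int g = (pixel >> 8) & 0xFF;
        int b = pixel & 0xFF;
        return new double[]{r, g, b};
    }

    // Packs a {red, green, blue} point back into an opaque pixel,
    // e.g. to pass a cluster center to BufferedImage.setRGB.
    static int toPixel(double[] p) {
        int r = clamp(p[0]);
        int g = clamp(p[1]);
        int b = clamp(p[2]);
        return 0xFF000000 | (r << 16) | (g << 8) | b;
    }

    // Rounds to the nearest integer and limits the result to 0..255.
    private static int clamp(double v) {
        return (int) Math.max(0, Math.min(255, Math.round(v)));
    }

    public static void main(String[] args) {
        int pixel = 0xFF336699;
        double[] point = toPoint(pixel);
        System.out.println(Arrays.toString(point));   // [51.0, 102.0, 153.0]
        System.out.println(toPixel(point) == pixel);  // true
    }
}
```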
// Needs: java.awt.Color, java.awt.image.BufferedImage, java.io.File, java.io.IOException, javax.imageio.ImageIO
// Used to output the image. source (the per-pixel DataItem array) and center are fields not shown in this post.
private void imageDataOut(String path) {
    Color c0 = new Color(255, 0, 0);
    Color c1 = new Color(0, 255, 0);
    Color c2 = new Color(0, 0, 255);
    Color c3 = new Color(128, 128, 128);
    BufferedImage nbi = new BufferedImage(source.length, source[0].length, BufferedImage.TYPE_INT_RGB);
    for (int i = 0; i < source.length; i++) {
        for (int j = 0; j < source[0].length; j++) {
            if (source[i][j].group == 0)
                nbi.setRGB(i, j, c0.getRGB());
            else if (source[i][j].group == 1)
                nbi.setRGB(i, j, c1.getRGB());
            else if (source[i][j].group == 2)
                nbi.setRGB(i, j, c2.getRGB());
            else if (source[i][j].group == 3)
                nbi.setRGB(i, j, c3.getRGB());
            //Color c = new Color((int) center[source[i][j].group].r,
            //        (int) center[source[i][j].group].g, (int) center[source[i][j].group].b);
            //nbi.setRGB(i, j, c.getRGB());
        }
    }
    try {
        ImageIO.write(nbi, "jpg", new File(path));
    } catch (IOException e) {
        e.printStackTrace();
    }
}
You ask what DataItem is? I'll tell you once I have handed in my homework.
③. A small but important point: pay attention to the data type. Fat Peng (that's me) started with the int type, and the result overflowed when computing the new cluster centers... Fortunately I switched to double, but then I wrote the distance calculation wrong. In the end, quick-witted Fat Peng fixed all the bugs.
④. Preserve the order of the data when reading the image, that is, store it in a two-dimensional array. That way you do not need to record each pixel's position for writing, and output is very convenient too.
That's all... I'll post a complete code walkthrough once my homework is handed in!
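Tips ① and ③ can be sketched together: a distance that runs over all three color channels, and a cluster mean accumulated in double so it cannot overflow the way an int sum can. This is a rough sketch under my own assumptions; the class ColorMathSketch and its methods are hypothetical names, not my homework code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
final class ColorMathSketch {

    // Euclidean distance over every component, so {r, g, b} points work
    // the same way as two-dimensional points.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Mean of a non-empty list of points, accumulated in double.
    static double[] mean(List<double[]> points) {
        double[] m = new double[points.get(0).length];
        for (double[] p : points) {
            for (int d = 0; d < m.length; d++) {
                m[d] += p[d];
            }
        }
        for (int d = 0; d < m.length; d++) {
            m[d] /= points.size();
        }
        return m;
    }

    public static void main(String[] args) {
        double[] black = {0, 0, 0};
        double[] color = {2, 3, 6};
        System.out.println(distance(black, color)); // 7.0

        List<double[]> reds = Arrays.asList(
                new double[]{255, 0, 0},
                new double[]{0, 255, 0});
        System.out.println(Arrays.toString(mean(reds))); // [127.5, 127.5, 0.0]
    }
}
```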