Using the Canopy algorithm to compute the number of clusters

Last Update:2018-08-20 Source: Internet

Author: User

K-means is a classic clustering algorithm. It proceeds as follows:
Select K points as the initial centroids
Repeat
    Assign each point to its nearest centroid, forming K clusters
    Recompute the centroid of each cluster
Until the clusters no longer change or the maximum number of iterations is reached
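
The loop above can be sketched in Java as follows. This is a minimal illustration rather than code from the original article: the class name KMeansSketch, the double[] representation of 2-D points, and the choice of the first k points as initial centroids are all assumptions made for brevity.

```java
import java.util.Arrays;

class KMeansSketch {
    /** Squared Euclidean distance between two 2-D points. */
    static double sqDist(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        return dx * dx + dy * dy;
    }

    /**
     * Runs K-means and returns the cluster index assigned to each point.
     * Assumption: the first k points are used as the initial centroids,
     * so k must not exceed points.length.
     */
    static int[] cluster(double[][] points, int k, int maxIterations) {
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assign each point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (sqDist(points[i], centroids[c]) < sqDist(points[i], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // clusters did not change, so stop
            }
            // Recompute each centroid as the mean of its assigned points.
            double[][] sums = new double[k][2];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                sums[assignment[i]][0] += points[i][0];
                sums[assignment[i]][1] += points[i][1];
                counts[assignment[i]]++;
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) {
                    centroids[c] = new double[] {sums[c][0] / counts[c], sums[c][1] / counts[c]};
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {5, 5}, {10, 2}, {0, 1}, {5, 6}, {10, 3}};
        // prints [0, 1, 2, 0, 1, 2]
        System.out.println(Arrays.toString(cluster(points, 3, 100)));
    }
}
```

Note that k is an input to this method: nothing in the loop itself tells you what value to pass.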

The K in the algorithm must be specified manually. There are several ways to determine K, such as running multiple trials, computing the error for each, and picking the best K, but this takes a long time. Instead, we can roughly determine K with the Canopy algorithm, treating the number of canopies as approximately equal to K. The Canopy algorithm proceeds as follows:

(1) Let the sample set be S, and choose two thresholds T1 and T2 with T1 > T2.
(2) Take a sample point P from S as the center of a new canopy C, and remove P from S.
(3) Compute the distance dist from P to every point remaining in S.
(4) If dist < T1, add the corresponding point to C as a weak association.
(5) If dist < T2, remove the corresponding point from S as a strong association.
(6) Repeat (2) to (5) until S is empty.
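
Steps (1) to (6) can be sketched with both thresholds in play. This is an illustrative sketch, not the article's implementation: the class name CanopySketch, the double[] point representation, and the hand-picked thresholds t1 = 6.0 and t2 = 3.0 are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class CanopySketch {
    /** Euclidean distance between two 2-D points. */
    static double dist(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    /**
     * Builds canopies following steps (1) to (6); requires t1 > t2.
     * Each returned list is one canopy whose first element is its center P.
     */
    static List<List<double[]>> canopies(List<double[]> samples, double t1, double t2) {
        if (t1 <= t2) {
            throw new IllegalArgumentException("t1 must be greater than t2");
        }
        List<double[]> s = new ArrayList<>(samples); // working copy of S
        List<List<double[]>> result = new ArrayList<>();
        while (!s.isEmpty()) {
            double[] p = s.remove(0);                // step (2): take P and remove it from S
            List<double[]> canopy = new ArrayList<>();
            canopy.add(p);
            List<double[]> remaining = new ArrayList<>();
            for (double[] q : s) {
                double d = dist(p, q);               // step (3)
                if (d < t1) {
                    canopy.add(q);                   // step (4): weak association
                }
                if (d >= t2) {
                    remaining.add(q);                // step (5): only points with d < t2 leave S
                }
            }
            s = remaining;
            result.add(canopy);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] raw = {
            {0, 0}, {0, 1}, {1, 0},
            {5, 5}, {5, 6}, {6, 5},
            {10, 2}, {10, 3}, {11, 3}
        };
        List<double[]> samples = new ArrayList<>();
        for (double[] p : raw) {
            samples.add(p);
        }
        // prints 3
        System.out.println(canopies(samples, 6.0, 3.0).size());
    }
}
```

Because T1 only controls which points are weakly attached to a canopy, a point can appear in more than one canopy. The number of canopies depends on T2 alone.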

The number of canopies can be used as the K value, which reduces the blindness of choosing K to some extent. The following code applies the Canopy algorithm to a few points to count the canopies. If only the K value is needed, T1 has no effect and specifying T2 alone is enough. Here T2 is derived from the average distance between all pairs of points.
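
Written out, and assuming that "average distance" means the mean Euclidean distance over all unordered pairs of the n points (the symbols \bar{d}, p_i, and n are introduced here for illustration only):

```latex
\bar{d} = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} \lVert p_i - p_j \rVert_2
```

There are n(n-1)/2 unordered pairs, so for the 9 sample points used in the listing the sum runs over 9 * 8 / 2 = 36 distances. An inline comment in the listing describes T2 as half of this average.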

package cn.edu.ustc.dm.cluster;

import java.util.ArrayList;
import java.util.List;

// Point is not shown in this article; it is assumed to expose public numeric
// fields x and y and a Point(x, y) constructor.
import cn.edu.ustc.dm.bean.Point;

/**
 * Uses the Canopy algorithm to estimate the K value for K-means.
 * When only K is needed, T1 has no effect, so only T2 (T1 > T2) is set.
 * Here T2 is derived from the average pairwise distance between the points.
 *
 * @author YD
 */
public class Canopy {
    private List<Point> points = new ArrayList<Point>(); // points still to be clustered
    private List<List<Point>> clusters = new ArrayList<List<Point>>(); // resulting clusters
    private double T2 = -1; // threshold

    public Canopy(List<Point> points) {
        // Copy the list so the caller's list is not modified.
        // The Point objects themselves are shared, not cloned.
        for (Point point : points) {
            this.points.add(point);
        }
    }

    /**
     * Clusters all points according to the Canopy algorithm.
     */
    public void cluster() {
        T2 = getAverageDistance(points);
        while (points.size() != 0) {
            List<Point> cluster = new ArrayList<Point>();
            Point basePoint = points.get(0); // base point of this canopy
            cluster.add(basePoint);
            points.remove(0);
            int index = 0;
            while (index < points.size()) {
                Point anotherPoint = points.get(index);
                double distance = Math.sqrt((basePoint.x - anotherPoint.x)
                        * (basePoint.x - anotherPoint.x)
                        + (basePoint.y - anotherPoint.y)
                        * (basePoint.y - anotherPoint.y));
                if (distance <= T2) {
                    // Strong association: the point joins this canopy and leaves the pool.
                    cluster.add(anotherPoint);
                    points.remove(index);
                } else {
                    index++;
                }
            }
            clusters.add(cluster);
        }
    }

    /**
     * Returns the number of clusters found.
     *
     * @return the number of clusters
     */
    public int getClusterNumber() {
        return clusters.size();
    }

    /**
     * Returns the center point of each cluster (the mean of its points).
     *
     * @return the list of cluster center points
     */
    public List<Point> getClusterCenterPoints() {
        List<Point> centerPoints = new ArrayList<Point>();
        for (List<Point> cluster : clusters) {
            centerPoints.add(getCenterPoint(cluster));
        }
        return centerPoints;
    }

    /**
     * Computes the threshold T2 as half of the average pairwise distance.
     *
     * @return the threshold T2
     */
    private double getAverageDistance(List<Point> points) {
        int pointSize = points.size();
        if (pointSize < 2) {
            return 0; // no pairs to measure
        }
        double sum = 0;
        for (int i = 0; i < pointSize; i++) {
            for (int j = i + 1; j < pointSize; j++) { // visit each unordered pair once
                Point pointA = points.get(i);
                Point pointB = points.get(j);
                sum += Math.sqrt((pointA.x - pointB.x) * (pointA.x - pointB.x)
                        + (pointA.y - pointB.y) * (pointA.y - pointB.y));
            }
        }
        int distanceNumber = pointSize * (pointSize - 1) / 2; // number of unordered pairs
        double threshold = sum / distanceNumber / 2; // half of the average distance
        return threshold;
    }

    /**
     * Computes the center point of a cluster (the mean of its points).
     *
     * @return the center point
     */
    private Point getCenterPoint(List<Point> points) {
        double sumX = 0;
        double sumY = 0;
        for (Point point : points) {
            sumX += point.x;
            sumY += point.y;
        }
        int clusterSize = points.size();
        Point centerPoint = new Point(sumX / clusterSize, sumY / clusterSize);
        return centerPoint;
    }

    /**
     * Returns the threshold T2.
     *
     * @return the threshold T2
     */
    public double getThreshold() {
        return T2;
    }

    /**
     * Runs the algorithm on 9 test points.
     *
     * @param args unused
     */
    public static void main(String[] args) {
        List<Point> points = new ArrayList<Point>();
        points.add(new Point(0, 0));
        points.add(new Point(0, 1));
        points.add(new Point(1, 0));

        points.add(new Point(5, 5));
        points.add(new Point(5, 6));
        points.add(new Point(6, 5));

        points.add(new Point(10, 2));
        points.add(new Point(10, 3));
        points.add(new Point(11, 3));

        Canopy canopy = new Canopy(points);
        canopy.cluster();

        // Get the number of canopies
        int clusterNumber = canopy.getClusterNumber();
        System.out.println(clusterNumber);

        // Get the value of T2 used by the canopy
        System.out.println(canopy.getThreshold());
    }
}

The code above runs the Canopy algorithm on 9 points and obtains the number of canopies, which serves as K.

For more articles, see Xiao Fat Xuan.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
