Reprinting is welcome; please credit the source: Bin's column, Blog.csdn.net/xbinworld.
This is a relatively new clustering method (the article did not give the authors' names, so here I will call the method Local Density Clustering, LDC). Clustering is an old topic that has seen few breakthroughs in recent years; this article is very good and the method very enlightening, and since it was published in Science it has naturally attracted a great deal of attention.
The core highlights of this article are: (1) a fairly novel way of determining cluster centers, and (2) using local density together with distance to assign points to clusters. By comparison, the common K-means algorithm uses the mean of each cluster as its center and assigns each point to the nearest center. The LDC method is described below.
First, cluster centers: what kind of point makes a good cluster center? The author argues that it should satisfy the following two conditions:
1. The point's own local density is large, that is, the densities of the points surrounding it are smaller than its own;
2. Its distance to any point of higher density should be as large as possible.
Define (1) the local density for each sample point, using a cut-off kernel:
ρ_i = ∑_{j ∈ I_S \ {i}} χ(d_ij − d_c)

where

χ(a) = 1 if a < 0, and χ(a) = 0 if a ≥ 0.
d_c is the cutoff distance, a value fixed in advance; d_ij is the distance between points i and j, which can be computed with any distance formula; I_S is the index set of all sample points in the dataset. Clearly, this local density is a discrete function: ρ_i counts how many sample points (excluding point i itself) lie within distance d_c of point i. A continuous function can also be used to define the local density, for example a Gaussian kernel:
ρ_i = ∑_{j ∈ I_S \ {i}} exp(−(d_ij / d_c)²)
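To make the two quantities concrete, here is a minimal sketch (not the authors' reference implementation) that computes the local density ρ_i with either the cut-off kernel or the Gaussian kernel, and the distance δ_i from each point to its nearest higher-density point; the function names and the convention of assigning the densest point its maximum distance are my own assumptions for illustration:

```python
import numpy as np

def local_densities(X, d_c, kernel="cutoff"):
    """Compute the local density rho_i for every sample point.

    kernel="cutoff":   rho_i counts the neighbours closer than d_c
                       (the discrete chi function above).
    kernel="gaussian": rho_i = sum_j exp(-(d_ij / d_c)^2), the smooth variant.
    Returns (rho, d) where d is the full pairwise distance matrix.
    """
    # Pairwise Euclidean distances d_ij (any metric would do here).
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    if kernel == "cutoff":
        # Subtract 1 so point i does not count itself (d_ii = 0 < d_c).
        rho = (d < d_c).sum(axis=1) - 1
    else:
        # Subtract exp(0) = 1, again excluding the j = i term.
        rho = np.exp(-(d / d_c) ** 2).sum(axis=1) - 1.0
    return rho, d

def min_dist_to_denser(rho, d):
    """delta_i: distance from point i to the nearest point of higher density.

    For the globally densest point there is no denser point, so by
    convention its delta is its maximum distance to any other point.
    """
    n = len(rho)
    delta = np.empty(n)
    for i in range(n):
        denser = np.where(rho > rho[i])[0]
        delta[i] = d[i, denser].min() if len(denser) else d[i].max()
    return delta
```

Candidate cluster centers are then the points for which both ρ_i and δ_i are large, matching the two conditions above.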