Many clustering algorithms have been proposed to suit the characteristics of different industries. They fall into several categories: hierarchical, partitioning, density-based, graph-theoretic, grid-based, and model-based.
Among the density-based clustering algorithms, DBSCAN is the most representative.
Suppose we have a data set generated by the following R code:
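library(ggplot2)  # provides qplot() and ggplot()
n <- 100  # the sample size was lost in the original text; 100 is an assumed placeholder
x1 <- seq(0, pi, length.out = n)
y1 <- sin(x1) + 0.1 * rnorm(n)   # noisy sine arch
x2 <- 1.5 + seq(0, pi, length.out = n)
y2 <- cos(x2) + 0.1 * rnorm(n)   # noisy cosine valley, shifted to the right
data <- data.frame(c(x1, x2), c(y1, y2))
names(data) <- c('x', 'y')
qplot(data$x, data$y)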
Applying the density-based DBSCAN method gives the following clustering result:
library('fpc')  # provides dbscan()
p <- ggplot(data, aes(x, y))
model2 <- dbscan(data, eps = 0.6, MinPts = 4)
# colour each point by the cluster DBSCAN assigned to it
p + geom_point(size = 2.5, aes(colour = factor(model2$cluster))) + theme(legend.position = 'top')
For comparison, here is the clustering result of K-means on the same data:
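model1 <- kmeans(data, centers = 2, nstart = 10)  # the nstart value was lost in the original text; 10 is an assumed placeholder
p <- ggplot(data, aes(x, y))
# colour each point by the cluster K-means assigned to it
p + geom_point(size = 2.5, aes(colour = factor(model1$cluster))) + theme(legend.position = 'top')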
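K-means assigns each point to its nearest centroid and so favors compact, roughly convex clusters, whereas DBSCAN groups points that are connected through dense regions. Different data sets and scenarios therefore call for different clustering algorithms.
The following describes how the DBSCAN algorithm works.
Note that DBSCAN is sensitive to its two parameters, eps and MinPts.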
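One common heuristic for choosing eps is to sort the distance from each point to its k-th nearest neighbor and pick eps near the point where the sorted curve bends sharply upward. The following is a minimal base-R sketch of that idea. The function name kth_nn_distance, the demo data, and the choice k = 4 are illustrative assumptions and do not come from the original article.

```r
# k-distance heuristic (sketch): distance from each point to its k-th nearest neighbor.
kth_nn_distance <- function(points, k) {
  dist_mat <- as.matrix(dist(points))  # pairwise Euclidean distances
  # After sorting a row, position 1 is the zero distance to the point itself,
  # so the k-th nearest neighbor sits at position k + 1.
  apply(dist_mat, 1, function(row) sort(row)[k + 1])
}

set.seed(1)  # arbitrary seed, only for reproducibility of the demo
# Hypothetical demo data: two Gaussian blobs of 50 points each.
demo <- rbind(matrix(rnorm(100, sd = 0.2), ncol = 2),
              matrix(rnorm(100, mean = 3, sd = 0.2), ncol = 2))
kd <- sort(kth_nn_distance(demo, k = 4))
print(quantile(kd, c(0.5, 0.9, 1)))  # summary of the sorted 4-NN distances
```

Plotting the sorted values, for example with plot(kd, type = "l"), makes the bend easier to see. Values of eps well below the bend tend to label many points as noise, and values well above it tend to merge separate clusters.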
In this algorithm framework, N_eps(x, D) denotes all objects of data set D that lie within the eps-neighborhood of object x, and card(N) denotes the cardinality of set N, i.e. the number of elements it contains. During cluster expansion a stack is used: all neighbors of the current object x are pushed onto the stack, and each stack member is then checked in turn against the core-object condition to decide whether to expand further.
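The stack-based expansion described above can be sketched in base R as follows. This is an illustrative sketch under stated assumptions, not the fpc implementation and not the author's Java code: the names region_query and dbscan_sketch are hypothetical, region_query stands in for N_eps(x, D), length() stands in for card(), and a point is assumed to count as a member of its own neighborhood.

```r
# region_query() plays the role of N_eps(x, D): indices of all points within eps of point i.
region_query <- function(dist_mat, i, eps) {
  which(dist_mat[i, ] <= eps)  # includes i itself (assumed convention)
}

dbscan_sketch <- function(points, eps, min_pts) {
  dist_mat <- as.matrix(dist(points))  # pairwise Euclidean distances
  n <- nrow(dist_mat)
  cluster <- rep(NA_integer_, n)       # NA = unvisited, 0 = noise, 1.. = cluster id
  current <- 0L
  for (i in seq_len(n)) {
    if (!is.na(cluster[i])) next       # already labeled
    neighbors <- region_query(dist_mat, i, eps)
    if (length(neighbors) < min_pts) { # card(N_eps) too small: not a core object
      cluster[i] <- 0L
      next
    }
    current <- current + 1L            # i is a core object: start a new cluster
    cluster[i] <- current
    stack <- setdiff(neighbors, i)     # push all neighbors of i
    while (length(stack) > 0) {
      j <- stack[length(stack)]        # pop the top of the stack
      stack <- stack[-length(stack)]
      if (!is.na(cluster[j]) && cluster[j] > 0L) next  # already in a cluster
      was_noise <- !is.na(cluster[j])  # noise reached from a core object becomes a border point
      cluster[j] <- current
      if (was_noise) next              # known not to be a core object, so do not expand
      j_neighbors <- region_query(dist_mat, j, eps)
      if (length(j_neighbors) >= min_pts) {
        # j is a core object: push its unlabeled or noise neighbors
        unclaimed <- is.na(cluster[j_neighbors]) | cluster[j_neighbors] == 0L
        stack <- c(stack, j_neighbors[unclaimed])
      }
    }
  }
  cluster
}

# Hypothetical usage: two tight groups of four points and one isolated point.
pts <- rbind(cbind(c(0, 0.1, 0.2, 0.1), c(0, 0.1, 0, 0.2)),
             cbind(c(5, 5.1, 5.2, 5.1), c(5, 5.1, 5, 5.2)),
             c(10, -10))
print(dbscan_sketch(pts, eps = 0.5, min_pts = 3))
```

The explicit stack replaces recursion, so the expansion depth does not depend on cluster size. Precomputing the full distance matrix keeps the sketch short but needs memory quadratic in the number of points, so a spatial index would be preferable for large data sets.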
Postscript:
1. For a general introduction to the algorithm, see the Baidu Encyclopedia entry: http://baike.baidu.com/link?url=cnLtGJsF_a4CzmVbAev3nFH75nZUMgwClKv_kk2ZsXuXrP1gvY8eMvY75UDL29AMJFJ2n60xB680PMkjitrG4a
2. Following the algorithm flow above, the author wrote a Java implementation and uploaded it to Baidu cloud disk, together with the test data used above. Interested readers can download it here: http://pan.baidu.com/s/1i3J7Adf
3. Reference: "Research on DBSCAN Clustering Algorithm for Heterogeneous Datasets", Chen Yootian, Master's thesis, Chongqing University, April 2013. http://pan.baidu.com/s/1mgvKR7U
The DBSCAN algorithm for density-based clustering