DBSCAN
- Density-Based Spatial Clustering of Applications with Noise
It can discover clusters of arbitrary shape.
A cluster is defined as a maximal set of density-connected points.
Parameters
- Eps: maximum radius of the neighbourhood
- MinPts: minimum number of points in the Eps-neighbourhood of a point
Suppose we have a point Q and the pre-determined parameters Eps and MinPts. If the number of points within distance Eps of Q (Q itself is usually counted) is at least MinPts, we say Q is a core point.
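The core-point test can be sketched in a few lines. This is only an illustration, assuming 2-D points stored as tuples, Euclidean distance, and a neighbourhood that includes the point itself; the names `region_query` and `is_core` are made up for this example.

```python
from math import dist

def region_query(points, q, eps):
    """Indices of all points within distance eps of points[q] (q itself included)."""
    return [i for i, p in enumerate(points) if dist(p, points[q]) <= eps]

def is_core(points, q, eps, min_pts):
    """points[q] is a core point if its Eps-neighbourhood holds at least min_pts points."""
    return len(region_query(points, q, eps)) >= min_pts

# Four points on a unit square plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
print(is_core(points, 0, eps=1.5, min_pts=4))  # True: all four square corners are within 1.5
print(is_core(points, 4, eps=1.5, min_pts=4))  # False: only the point itself is within 1.5
```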
Three types of points
- Core point: has a dense neighbourhood (at least MinPts points within Eps)
- Border point: belongs to a cluster, but its own neighbourhood is not dense; it lies within Eps of a core point of that cluster
- Noise/outlier: not in any cluster and cannot be reached from any core point
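As a rough sketch of how the three types could be told apart, the function below labels every point as core, border, or noise. It assumes Euclidean distance and counts a point as part of its own neighbourhood; `classify_points` is a hypothetical helper name, not part of any library.

```python
from math import dist

def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise'."""
    # Eps-neighbourhood of every point (each point is its own neighbour).
    neighbours = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbours) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbours):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")  # not dense itself, but next to a core point
        else:
            labels.append("noise")
    return labels

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (8.0, 8.0)]
print(classify_points(points, eps=1.1, min_pts=3))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```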
Directly density-reachable: a point P is directly density-reachable from Q if:
- P belongs to the Eps-neighbourhood of Q
- Q itself is a core point
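The two conditions translate almost directly into a predicate. The sketch below assumes Euclidean distance and a self-inclusive neighbourhood; the function name is chosen for this example only. It also shows that the relation is not symmetric: a border point is directly density-reachable from a core point, but not the other way round.

```python
from math import dist

def directly_density_reachable(points, p, q, eps, min_pts):
    """True if points[p] is directly density-reachable from points[q]."""
    q_neighbourhood = [i for i, x in enumerate(points) if dist(x, points[q]) <= eps]
    # Condition 1: p lies in the Eps-neighbourhood of q.
    # Condition 2: q is a core point.
    return p in q_neighbourhood and len(q_neighbourhood) >= min_pts

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
# Index 3 is a core point, index 4 is a border point next to it.
print(directly_density_reachable(points, p=4, q=3, eps=1.1, min_pts=3))  # True
print(directly_density_reachable(points, p=3, q=4, eps=1.1, min_pts=3))  # False
```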
Density-reachable
A point P is density-reachable from a point Q if there is a chain of points P1, ..., Pn such that P1 = Q, Pn = P, and Pi+1 is directly density-reachable from Pi.
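One way to picture the chain is as a breadth-first search that is only allowed to continue through core points. The following is a minimal sketch under the same assumptions as before (Euclidean distance, self-inclusive neighbourhoods); `density_reachable_from` is an illustrative name.

```python
from collections import deque
from math import dist

def density_reachable_from(points, q, eps, min_pts):
    """Set of indices that are density-reachable from points[q]."""
    def neighbours(i):
        return [j for j, x in enumerate(points) if dist(x, points[i]) <= eps]

    reached = set()
    frontier = deque([q])
    while frontier:
        i = frontier.popleft()
        nb = neighbours(i)
        if len(nb) < min_pts:
            continue  # not a core point: the chain cannot continue through i
        for j in nb:
            if j not in reached:
                reached.add(j)
                frontier.append(j)
    return reached

# A short line of points: indices 1 and 2 are core, 0 and 3 are border, 4 is noise.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(density_reachable_from(points, 1, eps=1.1, min_pts=3))  # {0, 1, 2, 3}
print(density_reachable_from(points, 0, eps=1.1, min_pts=3))  # set(): index 0 is not core
```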
Density-connected
A point P is density-connected to a point Q if there is a point O such that both P and Q are density-reachable from O. Even if both P and Q are border points, they can be in the same cluster as long as there is a point O from which both P and Q are density-reachable.
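The border-point case can be checked with a small brute-force sketch: try every point as the candidate O and test whether both P and Q are density-reachable from it. This is written for clarity rather than speed, and the helper names are assumptions made for this example.

```python
from math import dist

def reachable_from(points, o, eps, min_pts):
    """Set of indices density-reachable from points[o]."""
    reached, stack = set(), [o]
    while stack:
        i = stack.pop()
        nb = [j for j, x in enumerate(points) if dist(x, points[i]) <= eps]
        if len(nb) >= min_pts:  # only core points extend the chain
            new = set(nb) - reached
            reached |= new
            stack.extend(new)
    return reached

def density_connected(points, p, q, eps, min_pts):
    """True if some point O has both points[p] and points[q] density-reachable from it."""
    return any(
        {p, q} <= reachable_from(points, o, eps, min_pts)
        for o in range(len(points))
    )

# Indices 0 and 3 are both border points; neither is reachable from the other,
# but both are reachable from the core point at index 1.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(density_connected(points, 0, 3, eps=1.1, min_pts=3))  # True
print(density_connected(points, 0, 4, eps=1.1, min_pts=3))  # False
```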
Algorithm
- Arbitrarily select a point P.
- Retrieve all points density-reachable from P with respect to Eps and MinPts.
- If P is a core point, a cluster is formed, and its border points are found along the way.
- If P is not a core point, no points are density-reachable from P. P is provisionally treated as noise (it may later turn out to be a border point of some cluster), and DBSCAN moves on to the next point.
- Continue the process until all points have been processed.
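The steps above can be put together as a compact, unoptimised sketch. It assumes Euclidean distance, a linear scan for each neighbourhood query (real implementations typically use a spatial index), and the convention that the label -1 means noise; none of these choices come from the notes themselves.

```python
from math import dist

NOISE = -1        # label for noise points (a convention chosen for this sketch)
UNVISITED = None  # label for points not processed yet

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    def region_query(i):
        return [j for j, x in enumerate(points) if dist(x, points[i]) <= eps]

    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # provisional: may become a border point later
            continue
        # i is a core point: start a new cluster and grow it.
        cluster_id += 1
        labels[i] = cluster_id
        seeds = list(neighbours)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # former noise is claimed as a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(j)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is core, so keep expanding through it
    return labels

points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0),  # first group
    (8.0, 8.0),                                                   # isolated point
    (20.0, 20.0), (20.0, 21.0), (21.0, 20.0),                     # second group
]
print(dbscan(points, eps=1.1, min_pts=3))
# [0, 0, 0, 0, 0, -1, 1, 1, 1]
```

A border point that lies within Eps of core points from two different clusters is assigned to whichever cluster reaches it first, so its label can depend on the processing order.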
However, DBSCAN is sensitive to the settings of Eps and MinPts: a modest change in either can merge clusters, split them, or turn their points into noise.
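A small experiment makes the sensitivity concrete. The sketch below counts clusters as connected components of core points, where two core points are linked if they lie within Eps of each other, which gives the same cluster count as DBSCAN. The data set and the helper name `count_clusters` are invented for illustration.

```python
from math import dist

def count_clusters(points, eps, min_pts):
    """Count clusters as connected components of core points."""
    n = len(points)
    nb = [[j for j in range(n) if dist(points[i], points[j]) <= eps] for i in range(n)]
    core = {i for i in range(n) if len(nb[i]) >= min_pts}
    seen, clusters = set(), 0
    for start in core:
        if start in seen:
            continue
        clusters += 1
        stack = [start]
        while stack:  # flood-fill one component of linked core points
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            stack.extend(j for j in nb[i] if j in core)
    return clusters

# Two unit squares whose nearest corners are 2.0 apart.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (3.0, 0.0), (3.0, 1.0), (4.0, 0.0), (4.0, 1.0),
]
for eps in (0.5, 1.1, 2.1):
    print(eps, count_clusters(points, eps, min_pts=3))
# 0.5 0   (Eps too small: everything is noise)
# 1.1 2   (the two squares are separate clusters)
# 2.1 1   (Eps large enough to merge them)
```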