[Repost] Common clustering algorithms (1): the DBSCAN algorithm

Source: Internet
Author: User

Original link: http://www.cnblogs.com/chaosimple/p/3164775.html#undefined

1. Introduction to DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based spatial clustering algorithm. It groups regions of sufficiently high density into clusters, can discover clusters of arbitrary shape in a spatial database that contains noise, and defines a cluster as a maximal set of density-connected points.

The algorithm relies on a density-based notion of a cluster: the number of objects (points or other spatial objects) contained in a given region of the clustering space must be no less than a given threshold. Its notable advantages are that clustering is fast, noise points are handled effectively, and spatial clusters of arbitrary shape can be found. However, because it operates directly on the whole database and clusters with a single global pair of density parameters, it has two clear weaknesses:

(1) As the amount of data grows, it needs a large amount of memory, and the I/O cost is also high;

(2) When the density of the spatial clusters is uneven and the distances between clusters vary widely, clustering quality is poor.

2. Comparison of DBSCAN and traditional clustering algorithms

The goal of DBSCAN is to filter out low-density regions and find regions of dense sample points. Unlike traditional hierarchical clustering and partition-based clustering, which tend to produce convex clusters, it can find clusters of arbitrary shape. Compared with traditional algorithms it has the following advantages:

(1) Unlike K-means, it does not require the number of clusters as input;

(2) It has no bias toward a particular cluster shape;

(3) Parameters for filtering noise can be supplied when needed.

3. Basic definitions used by the algorithm

(1) ε-neighborhood: the region within radius ε of a given object is called the ε-neighborhood of that object.

(2) Core object: if the ε-neighborhood of a given object contains at least MinPts sample points, the object is called a core object.

(3) Directly density-reachable: given an object set D, if p lies in the ε-neighborhood of q and q is a core object, then p is directly density-reachable from q.

(4) Density-reachable: for a sample set D, if there is a chain of objects p1, p2, ..., pn with p1 = q and pn = p such that each p(i+1) is directly density-reachable from p(i) with respect to ε and MinPts, then p is density-reachable from q with respect to ε and MinPts.

(5) Density-connected: if there is an object o in D such that both p and q are density-reachable from o with respect to ε and MinPts, then p and q are density-connected with respect to ε and MinPts.

Density-reachability is the transitive closure of direct density-reachability, and the relation is asymmetric: only core objects are mutually density-reachable. Density-connectedness, by contrast, is a symmetric relation. The goal of DBSCAN is to find maximal sets of density-connected objects.
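The definitions above can be illustrated with a small Python sketch. The helper names (region_query, is_core, directly_reachable), the sample points, and the choice of Euclidean distance are assumptions made for this illustration and do not come from the original post. The sketch also assumes that an object counts as a member of its own neighborhood.

```python
from math import dist  # Euclidean distance, Python 3.8+

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def is_core(points, i, eps, min_pts):
    """A point is a core object if its eps-neighborhood holds at least min_pts points."""
    return len(region_query(points, i, eps)) >= min_pts

def directly_reachable(points, p, q, eps, min_pts):
    """True if point p is directly density-reachable from point q."""
    return is_core(points, q, eps, min_pts) and p in region_query(points, q, eps)

# Hypothetical sample data: four points on a line.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.9, 0.0)]
eps, min_pts = 1.0, 3

# Point 2 is a core object, point 3 is not, so the relation holds one way only.
print(directly_reachable(points, 3, 2, eps, min_pts))  # True
print(directly_reachable(points, 2, 3, eps, min_pts))  # False
```

In this example point 3 sits in the neighborhood of the core object 2, but its own neighborhood contains only two points, which is why direct density-reachability is not symmetric.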

4. Clustering process of the DBSCAN algorithm

DBSCAN rests on the fact that a cluster is uniquely determined by any one of its core objects. Equivalently: for any data object p that satisfies the core-object condition, the set of all data objects in database D that are density-reachable from p forms a complete cluster C, and p belongs to C.

The specific clustering process is as follows:

Scan the data set to find any core point, then expand it. Expansion means finding all data points that are density-connected to that core point (note: density-connected, not merely directly reachable). To do this, visit every core point in the ε-neighborhood of the core point (border points cannot be expanded) and look for the points density-connected to them, until no expandable data points remain. The points on the boundary of the resulting cluster are all non-core points. Then scan the data set again (excluding points already assigned to a cluster), find a core point that has not been clustered, and repeat the expansion until the data set contains no unclustered core points. Data points that end up in no cluster are noise (outliers).
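The expansion process described above might be sketched in Python as follows. This is a minimal illustration, not the original author's implementation: the function name dbscan, the NOISE label, the brute-force neighbor search, the Euclidean distance, and the sample data are all assumptions.

```python
from collections import deque
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label assumed here for points that belong to no cluster

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    def neighbors(i):
        # Brute-force eps-neighborhood; the point itself is included.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point, not expanded further
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # only core points are expanded
        cluster_id += 1
    return labels

# Hypothetical data: two square groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The brute-force neighbor search keeps the sketch short but costs O(n) per query. A spatial index would normally replace it on larger data sets.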

5. Algorithm pseudo-code

Algorithm description:

Algorithm: DBSCAN

Input: ε -- the radius

MinPts -- the minimum number of neighbors within the ε-neighborhood for a given point to be a core object

D -- the data set

Output: the set of target clusters

Method: Repeat

1) Determine whether the input point is a core object

2) Find all directly density-reachable points in the ε-neighborhood of the core object

Until all input points have been examined

Repeat

For all directly density-reachable points in the ε-neighborhoods of all core objects, find the maximal set of density-connected objects, merging density-reachable objects along the way

Until the ε-neighborhoods of all core objects have been traversed
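One possible reading of the two Repeat loops in the pseudo-code is a two-phase procedure: first compute every neighborhood and mark the core objects, then merge core objects through their neighborhoods. The Python sketch below follows that reading. The function name dbscan_two_phase, the label -1 for noise, the Euclidean distance, and the sample data are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+

def dbscan_two_phase(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    n = len(points)

    # Phase 1: the eps-neighborhood of every point and the core-object test.
    hood = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
            for i in range(n)]
    core = [len(h) >= min_pts for h in hood]

    # Phase 2: grow one cluster from each core object not yet assigned.
    labels = [-1] * n
    cluster_id = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = cluster_id
        stack = [i]  # holds core objects whose neighborhoods are still to be visited
        while stack:
            c = stack.pop()
            for j in hood[c]:
                if labels[j] == -1:
                    labels[j] = cluster_id
                    if core[j]:
                        stack.append(j)  # border points are labelled but not expanded
        cluster_id += 1
    return labels

# Hypothetical data: three core points, one border point, one outlier.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.9, 0.0), (5.0, 0.0)]
print(dbscan_two_phase(points, eps=1.0, min_pts=3))  # [0, 0, 0, 0, -1]
```

Storing every neighborhood up front makes the second phase simple, but it can take memory proportional to the square of the number of points in the worst case, which ties in with the memory weakness noted in section 1.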