[Repost] Common Clustering Algorithms (1): The DBSCAN Algorithm

Last Update:2016-10-08 Source: Internet

Author: User


Original link: http://www.cnblogs.com/chaosimple/p/3164775.html#undefined

1. Introduction to DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based spatial clustering algorithm. It groups regions of sufficiently high density into clusters and can discover clusters of arbitrary shape in spatial databases that contain noise. It defines a cluster as a maximal set of density-connected points.

The algorithm relies on a density-based notion of a cluster: the number of objects (points or other spatial objects) contained in a given region of the clustering space must be no less than a given threshold. The notable advantages of DBSCAN are that it clusters quickly, handles noise points effectively, and discovers spatial clusters of arbitrary shape. However, because it operates directly on the entire database and characterizes density with global parameters, it has two fairly obvious weaknesses:

(1) As the amount of data grows, it needs a large amount of memory, and the I/O cost is also high;

(2) When the density of the spatial clusters is uneven and the distances between clusters vary widely, the clustering quality is poor.

2. Comparison of DBSCAN with traditional clustering algorithms

The purpose of DBSCAN is to filter out low-density regions and find regions of dense sample points. Unlike traditional hierarchical clustering and partitioning-based clustering, which tend to find convex clusters, it can find clusters of arbitrary shape. Compared with traditional algorithms it has the following advantages:

(1) Compared with K-means, there is no need to specify the number of clusters in advance;

(2) It has no bias toward any particular cluster shape;

(3) Parameters for filtering noise can be supplied when needed.

3. Basic definitions used by the algorithm

(1) ε-neighborhood: the region within radius ε of a given object is called the ε-neighborhood of that object.

(2) Core object: if the number of sample points in the ε-neighborhood of a given object is greater than or equal to MinPts, the object is called a core object.

(3) Directly density-reachable: given an object set D, if p is in the ε-neighborhood of q and q is a core object, then p is directly density-reachable from q.

(4) Density-reachable: for a sample set D, if there is a chain of objects p1, p2, ..., pn with p1 = q and pn = p such that each p(i+1) is directly density-reachable from p(i) with respect to ε and MinPts, then p is density-reachable from q with respect to ε and MinPts.

(5) Density-connected: if there is an object o in D such that both p and q are density-reachable from o with respect to ε and MinPts, then p and q are density-connected with respect to ε and MinPts.
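The first three definitions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: points are coordinate tuples, the distance is Euclidean, a point counts as a member of its own neighborhood, and the names region_query, is_core and directly_density_reachable are hypothetical helpers, not part of the original article.

```python
from math import dist  # Euclidean distance, Python 3.8+

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i].
    Assumption: the point itself is part of its own neighborhood."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def is_core(points, i, eps, min_pts):
    """A core object has at least min_pts points in its eps-neighborhood."""
    return len(region_query(points, i, eps)) >= min_pts

def directly_density_reachable(points, p, q, eps, min_pts):
    """True if points[p] lies in the eps-neighborhood of the core object points[q]."""
    return is_core(points, q, eps, min_pts) and p in region_query(points, q, eps)

# Small hand-made data set (an assumption for illustration only).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
eps, min_pts = 1.1, 3

print(region_query(points, 0, eps))                            # neighborhood of point 0
print(is_core(points, 0, eps, min_pts))                        # point 0 is a core object
print(directly_density_reachable(points, 1, 0, eps, min_pts))  # 1 from 0
print(directly_density_reachable(points, 0, 1, eps, min_pts))  # 0 from 1
```

In this data set point 1 is directly density-reachable from point 0, but not the other way round, because point 1 has only two points in its neighborhood and is therefore not a core object.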

It follows that density-reachability is the transitive closure of direct density-reachability, and that this relation is asymmetric. Only core objects are mutually density-reachable. Density-connectedness, however, is a symmetric relation. The goal of DBSCAN is to find maximal sets of density-connected objects.
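The asymmetry of density-reachability and the symmetry of density-connectedness can be checked on a toy data set. The sketch below is an illustration only: it assumes Euclidean distance and coordinate tuples, and the helpers neighbors, reachable_from and density_connected are hypothetical names. Reachability is computed by a breadth-first walk that only continues through core objects.

```python
from collections import deque
from math import dist

def neighbors(points, i, eps):
    """Indices of all points within eps of points[i], the point itself included."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def reachable_from(points, q, eps, min_pts):
    """Set of indices density-reachable from points[q]; empty if q is not a core object."""
    if len(neighbors(points, q, eps)) < min_pts:
        return set()
    seen, queue = {q}, deque([q])
    while queue:
        cur = queue.popleft()
        nbrs = neighbors(points, cur, eps)
        if len(nbrs) < min_pts:
            continue  # border point: it is reached, but cannot extend the chain
        for j in nbrs:
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return seen

def density_connected(points, p, q, eps, min_pts):
    """True if some object o exists from which both p and q are density-reachable."""
    for o in range(len(points)):
        r = reachable_from(points, o, eps, min_pts)
        if p in r and q in r:
            return True
    return False

# Points 0 and 2 are core objects, 1 and 3 are border points, 4 is isolated.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
eps, min_pts = 1.1, 3

print(1 in reachable_from(points, 0, eps, min_pts))   # border point 1 from core 0
print(0 in reachable_from(points, 1, eps, min_pts))   # core 0 from border point 1
print(density_connected(points, 1, 3, eps, min_pts),
      density_connected(points, 3, 1, eps, min_pts))  # same answer in both directions
```

Border point 1 is density-reachable from core object 0 but not vice versa, while the two border points 1 and 3 are density-connected to each other through the core objects between them.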

4. Clustering process of the DBSCAN algorithm

DBSCAN rests on the fact that a cluster is uniquely determined by any one of its core objects. Equivalently: for any data object p that satisfies the core-object condition, the set of all data objects in database D that are density-reachable from p forms a complete cluster C, and p belongs to C.

The clustering process of the algorithm is as follows:

Scan the data set to find any core point and expand it. Expansion means finding all data points that are density-connected to that core point (note that the relation used is density-connectedness, not just direct density-reachability). To do so, traverse all core points in the ε-neighborhood of the core point (border points cannot be expanded) and look for the points density-connected to them, until no expandable data points remain. The border points of the resulting cluster are all non-core points. Then scan the data set again (skipping every point that already belongs to a cluster found earlier), look for a core point that has not yet been clustered, and repeat the steps above to expand it, until the data set contains no new core point. Data points that belong to no cluster are treated as noise (outliers).
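The expansion procedure described above can be sketched as follows. This is a minimal illustration, not a tuned implementation: it assumes Euclidean distance, uses a brute-force neighborhood search that is quadratic in the number of points, and the names dbscan, NOISE and region_query are hypothetical.

```python
from math import dist

NOISE = -1  # label for points that end up in no cluster

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    def region_query(i):
        # Brute-force eps-neighborhood; the point itself is included.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)      # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue                   # already clustered or marked as noise
        seeds = region_query(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # provisional: may turn out to be a border point
            continue
        cluster_id += 1                # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not None:
                continue                # already assigned, nothing to expand
            labels[j] = cluster_id
            nbrs = region_query(j)
            if len(nbrs) >= min_pts:    # only core points are expanded further
                queue.extend(nbrs)
    return labels

# Two small square clusters and one isolated point (data chosen for illustration).
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (5, 5), (5, 6), (6, 5), (6, 6),
        (10, 0)]
print(dbscan(data, eps=1.1, min_pts=3))
```

A border point that lies within ε of core points from two different clusters is assigned to whichever cluster reaches it first, so its label can depend on the scan order.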

5. Algorithm pseudo-code

Algorithm description:

Algorithm: DBSCAN

Input: ε -- the neighborhood radius

MinPts -- the minimum number of neighbors within the ε-neighborhood for a point to be a core object

D -- the data set

Output: the set of clusters

Method: Repeat

1) Determine whether the input point is a core object

2) Find all directly density-reachable points in the ε-neighborhood of the core object

Until all input points have been judged

Repeat

For the directly density-reachable points in the ε-neighborhoods of all core objects, find the maximal set of density-connected objects, merging density-reachable objects along the way

Until the ε-neighborhoods of all core objects have been traversed
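One possible reading of this two-loop pseudo-code is sketched below. The first loop judges every point and records the neighborhoods of the core objects. The second loop merges core objects that lie in each other's neighborhoods. The merge step uses a union-find structure, which is a choice made for this sketch rather than something the pseudo-code prescribes. The function name dbscan_two_phase, the Euclidean distance and the first-come assignment of border points are likewise assumptions.

```python
from math import dist

def dbscan_two_phase(points, eps, min_pts):
    """Return (clusters, noise): a list of index sets and the set of unclustered indices."""
    n = len(points)

    # Loop 1: judge every point; keep the eps-neighborhood of each core object.
    neighborhoods = {}
    for i in range(n):
        nbrs = [j for j in range(n) if dist(points[i], points[j]) <= eps]
        if len(nbrs) >= min_pts:
            neighborhoods[i] = nbrs

    # Loop 2: merge core objects found in each other's neighborhoods (union-find).
    parent = {c: c for c in neighborhoods}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for c, nbrs in neighborhoods.items():
        for j in nbrs:
            if j in neighborhoods:
                parent[find(j)] = find(c)

    # Collect members: core objects by their root, border points by first claim.
    clusters, assigned = {}, set()
    for c, nbrs in neighborhoods.items():
        members = clusters.setdefault(find(c), set())
        for j in nbrs:
            if j in neighborhoods or j not in assigned:
                members.add(j)
                assigned.add(j)

    noise = set(range(n)) - assigned
    return sorted(clusters.values(), key=min), noise

data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (5, 5), (5, 6), (6, 5), (6, 6),
        (10, 0)]
clusters, noise = dbscan_two_phase(data, eps=1.1, min_pts=3)
print(clusters, noise)
```

Storing every core neighborhood makes the second loop simple, but it also illustrates the memory cost that grows with the size of the data set.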

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
