DBSCAN example analysis: a Python clustering algorithm

Last Update:2017-05-14 Source: Internet

Author: User


This article introduces a Python implementation of the DBSCAN clustering algorithm and uses an example to analyze the principle of the algorithm and its implementation in detail. It is shared here for your reference. The details are as follows:

DBSCAN is a simple density-based clustering algorithm. This implementation uses the center-based approach, in which the density of each data point is measured by the number of other data points inside its neighborhood: a square centered on the point with a side length of 2 * Eps. Data points are classified into three types based on their density:

Core point: the number of points in its neighborhood is at least the given threshold MinPts.
Boundary point: not a core point, but its neighborhood contains at least one core point.
Noise point: neither a core point nor a boundary point.
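As a minimal sketch of these three definitions, the following finds each point's neighbors in the square of side 2 * Eps and then classifies the points. The helper names find_neighbors and classify_points and the sample coordinates are illustrative assumptions, not part of the original article.

```python
def find_neighbors(points, eps):
    """Map each point index to the indices of the other points that fall
    inside the square of side 2 * eps centered on it (hypothetical helper)."""
    neighbors = {i: [] for i in range(len(points))}
    for i, (x1, y1) in enumerate(points):
        for j, (x2, y2) in enumerate(points):
            if i != j and abs(x1 - x2) <= eps and abs(y1 - y2) <= eps:
                neighbors[i].append(j)
    return neighbors


def classify_points(neighbors, min_pts):
    """Split point indices into core, boundary and noise sets."""
    # Core: at least min_pts other points in the neighborhood.
    core = {i for i, nbrs in neighbors.items() if len(nbrs) >= min_pts}
    # Boundary: not core, but at least one neighbor is core.
    border = {i for i, nbrs in neighbors.items()
              if i not in core and any(n in core for n in nbrs)}
    # Noise: everything else.
    noise = set(neighbors) - core - border
    return core, border, noise


# Made-up sample data: a tight group, one point on its edge, one far away.
sample = [(0, 0), (1, 1), (2, 0), (0, 2), (12, 0), (40, 40)]
neighbors = find_neighbors(sample, eps=10)
core, border, noise = classify_points(neighbors, min_pts=3)
print(sorted(core), sorted(border), sorted(noise))
# prints: [0, 1, 2, 3] [4] [5]
```

Point 4 has only one neighbor (point 2), so it is not a core point, but because point 2 is a core point it counts as a boundary point. Point 5 has no neighbors at all and is noise.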

With the data points divided in this way, clustering proceeds as follows: each core point is placed in the same cluster as all the core points in its neighborhood, and each boundary point is placed in the same cluster as one of the core points in its neighborhood.
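The two rules above can be sketched as follows. This version uses a small union-find structure to merge neighboring core points, which is a substitute technique chosen for brevity and not necessarily how the article's own code does it. The function name build_clusters and the sample neighbor lists are assumptions made for illustration.

```python
def build_clusters(neighbors, core, border):
    """Return a dict mapping each clustered point index to a cluster label.
    Noise points are simply absent from the result (hypothetical helper)."""
    parent = {i: i for i in core}

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Rule 1: a core point shares a cluster with every core point
    # in its neighborhood.
    for i in core:
        for n in neighbors[i]:
            if n in core:
                parent[find(n)] = find(i)

    labels = {i: find(i) for i in core}

    # Rule 2: a boundary point joins the cluster of one core point
    # in its neighborhood (the first one found).
    for b in border:
        for n in neighbors[b]:
            if n in core:
                labels[b] = labels[n]
                break
    return labels


# Made-up example: core points 0-1 and 2-3 form two groups,
# point 4 is a boundary point next to 0, point 5 is noise.
neighbors = {0: [1, 4], 1: [0], 2: [3], 3: [2], 4: [0], 5: []}
labels = build_clusters(neighbors, core={0, 1, 2, 3}, border={4})
print(labels[0] == labels[1] == labels[4], labels[2] == labels[3])
```

The label values themselves are arbitrary point indices. Only which points share a label matters.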

# coding=utf-8
import pylab as pl
from collections import defaultdict, Counter

# Each line of the input file "points" holds one point in the form "x#y".
points = [[int(eachpoint.split("#")[0]), int(eachpoint.split("#")[1])]
          for eachpoint in open("points", "r")]

# Calculate the neighbors of each data point. The neighborhood is defined
# as a square with a side length of 2 * Eps centered on the point.
Eps = 10
surroundPoints = defaultdict(list)
for idx1, point1 in enumerate(points):
    for idx2, point2 in enumerate(points):
        if idx1 < idx2:
            if abs(point1[0] - point2[0]) <= Eps and abs(point1[1] - point2[1]) <= Eps:
                surroundPoints[idx1].append(idx2)
                surroundPoints[idx2].append(idx1)

# A point whose neighborhood contains at least MinPts other points is a core point.
MinPts = 5
corePointIdx = [pointIdx for pointIdx, surPointIdxs in surroundPoints.items()
                if len(surPointIdxs) >= MinPts]

# A non-core point with a core point in its neighborhood is a boundary point.
borderPointIdx = []
for pointIdx, surPointIdxs in surroundPoints.items():
    if pointIdx not in corePointIdx:
        for onesurPointIdx in surPointIdxs:
            if onesurPointIdx in corePointIdx:
                borderPointIdx.append(pointIdx)
                break

# A noise point is neither a boundary point nor a core point.
noisePointIdx = [pointIdx for pointIdx in range(len(points))
                 if pointIdx not in corePointIdx and pointIdx not in borderPointIdx]

corePoint = [points[pointIdx] for pointIdx in corePointIdx]
borderPoint = [points[pointIdx] for pointIdx in borderPointIdx]
noisePoint = [points[pointIdx] for pointIdx in noisePointIdx]

# pl.plot([eachpoint[0] for eachpoint in corePoint], [eachpoint[1] for eachpoint in corePoint], 'or')
# pl.plot([eachpoint[0] for eachpoint in borderPoint], [eachpoint[1] for eachpoint in borderPoint], 'oy')
# pl.plot([eachpoint[0] for eachpoint in noisePoint], [eachpoint[1] for eachpoint in noisePoint], 'ok')

# Initially every point is in its own group.
groups = [idx for idx in range(len(points))]

# Each core point is placed in the same cluster as all the core points in its neighborhood.
for pointidx, surroundIdxs in surroundPoints.items():
    for oneSurroundIdx in surroundIdxs:
        if pointidx in corePointIdx and oneSurroundIdx in corePointIdx and pointidx < oneSurroundIdx:
            # Remember the old label first: groups[oneSurroundIdx] itself is
            # overwritten during the loop below.
            oldGroup = groups[oneSurroundIdx]
            for idx in range(len(groups)):
                if groups[idx] == oldGroup:
                    groups[idx] = groups[pointidx]

# Each boundary point is placed in the same cluster as a core point in its neighborhood.
for pointidx, surroundIdxs in surroundPoints.items():
    for oneSurroundIdx in surroundIdxs:
        if pointidx in borderPointIdx and oneSurroundIdx in corePointIdx:
            groups[pointidx] = groups[oneSurroundIdx]
            break

# Obtain the three largest clusters.
wantGroupNum = 3
finalGroup = Counter(groups).most_common(wantGroupNum)
finalGroup = [onecount[0] for onecount in finalGroup]

group1 = [points[idx] for idx in range(len(points)) if groups[idx] == finalGroup[0]]
group2 = [points[idx] for idx in range(len(points)) if groups[idx] == finalGroup[1]]
group3 = [points[idx] for idx in range(len(points)) if groups[idx] == finalGroup[2]]

pl.plot([eachpoint[0] for eachpoint in group1], [eachpoint[1] for eachpoint in group1], 'or')
pl.plot([eachpoint[0] for eachpoint in group2], [eachpoint[1] for eachpoint in group2], 'oy')
pl.plot([eachpoint[0] for eachpoint in group3], [eachpoint[1] for eachpoint in group3], 'og')

# Plot the noise points in black.
pl.plot([eachpoint[0] for eachpoint in noisePoint], [eachpoint[1] for eachpoint in noisePoint], 'ok')
pl.show()
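The script reads its input from a file named points, with one point per line in the form x#y (as implied by its split("#") calls). The article does not include the data set, so to try the script a test file can be generated along the following lines. The function name write_sample_points, the cluster centers, the spread and the point counts are all arbitrary assumptions.

```python
import random


def write_sample_points(path, centers, per_cluster=50, spread=8,
                        n_noise=20, seed=0):
    """Write integer points as 'x#y' lines: a blob around each center
    plus some uniformly scattered points. Returns the number of lines."""
    rng = random.Random(seed)
    lines = []
    for cx, cy in centers:
        for _ in range(per_cluster):
            x = cx + rng.randint(-spread, spread)
            y = cy + rng.randint(-spread, spread)
            lines.append("%d#%d" % (x, y))
    # Scattered points that will mostly end up as noise.
    for _ in range(n_noise):
        lines.append("%d#%d" % (rng.randint(0, 300), rng.randint(0, 300)))
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return len(lines)


if __name__ == "__main__":
    # Three blobs, matching the three clusters the script plots.
    write_sample_points("points", centers=[(50, 50), (150, 200), (250, 80)])
```

With Eps = 10 and MinPts = 5, blobs of this size should be dense enough to produce core points, but the exact clusters depend on the random seed.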

The result of running the program is as follows:

I hope this article will help you with Python programming.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
