An example analysis of the DBSCAN clustering algorithm in Python

Source: Internet
Author: User
This article introduces the DBSCAN clustering algorithm in Python and walks through its principle and implementation using a worked example. The details are as follows:

DBSCAN: a simple density-based clustering algorithm. This implementation uses the center-based approach, in which the density of a data point is the number of other data points inside its neighborhood. Here the neighborhood is a square grid cell centered on the point, with side length 2 * Eps. Data points are classified into three types based on their density:

Core point: the number of points in its neighborhood is at least the given threshold MinPts.
Border point: not a core point, but its neighborhood contains at least one core point.
Noise point: neither a core point nor a border point.
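
The three definitions above can be sketched on a handful of points. This is a minimal illustration, not the article's script: the function name `classify_points` and the toy coordinates are assumptions made for the example, and the threshold test uses `>=`, matching the comparison in the script further down.

```python
from collections import defaultdict


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' using a square neighborhood."""
    # neighbors[i] holds the indices of the other points inside the
    # square of side 2 * eps centered on point i.
    neighbors = defaultdict(list)
    for i, (x1, y1) in enumerate(points):
        for j, (x2, y2) in enumerate(points):
            if i < j and abs(x1 - x2) <= eps and abs(y1 - y2) <= eps:
                neighbors[i].append(j)
                neighbors[j].append(i)

    # Core points have at least min_pts neighbors.
    core = {i for i in range(len(points)) if len(neighbors[i]) >= min_pts}

    labels = []
    for i in range(len(points)):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbors[i]):
            labels.append("border")  # not core, but next to a core point
        else:
            labels.append("noise")
    return labels


# Hypothetical toy data: a tight group, one point on its edge, one outlier.
toy_points = [(0, 0), (1, 0), (0, 1), (1, 1), (3, 1), (10, 10)]
toy_labels = classify_points(toy_points, eps=2, min_pts=3)
print(toy_labels)
```

With these toy values, the four points of the tight group have three or four neighbors each and become core points, (3, 1) has only two neighbors but both are core points, and (10, 10) has no neighbors at all.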

Once the data points have been classified this way, clustering proceeds as follows: each core point is placed in the same cluster as all the core points in its neighborhood, and each border point is placed in the same cluster as one of the core points in its neighborhood.
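
One way to picture the clustering rule is a breadth-first traversal that starts at an unassigned core point and spreads a cluster id through neighboring core points. Note that this is a swapped-in technique for illustration, not the relabeling loop the script in this article uses; the names `square_neighbors` and `form_clusters`, the `-1` noise marker, and the toy data are all assumptions.

```python
from collections import deque


def square_neighbors(points, eps):
    """For each point, list the other points inside its 2 * eps square."""
    return [
        [j for j, (x2, y2) in enumerate(points)
         if j != i and abs(x1 - x2) <= eps and abs(y1 - y2) <= eps]
        for i, (x1, y1) in enumerate(points)
    ]


def form_clusters(points, eps, min_pts):
    """Return one cluster id per point; -1 marks noise."""
    neighbors = square_neighbors(points, eps)
    is_core = [len(n) >= min_pts for n in neighbors]
    cluster_of = [-1] * len(points)
    next_id = 0
    for start in range(len(points)):
        # Only an unassigned core point can seed a new cluster.
        if not is_core[start] or cluster_of[start] != -1:
            continue
        cluster_of[start] = next_id
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for other in neighbors[current]:
                if cluster_of[other] != -1:
                    continue
                # Core and border neighbors both join the cluster ...
                cluster_of[other] = next_id
                # ... but only core points keep expanding it.
                if is_core[other]:
                    queue.append(other)
        next_id += 1
    return cluster_of


# Hypothetical toy data: two dense groups, one border point, one outlier.
toy_points = [(0, 0), (1, 0), (0, 1), (1, 1), (3, 1),
              (20, 20), (21, 20), (20, 21), (21, 21), (50, 50)]
cluster_ids = form_clusters(toy_points, eps=2, min_pts=3)
print(cluster_ids)
```

In this sketch a border point joins whichever cluster reaches it first, which is one reading of "one of the core points in its neighborhood".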

# coding=utf-8
import pylab as pl
from collections import defaultdict, Counter

# Read the data points from the file "points"; each line has the form "x#y".
points = [[int(eachpoint.split("#")[0]), int(eachpoint.split("#")[1])] for eachpoint in open("points", "r")]

# Find the neighbors of every data point. The neighborhood is a square
# with side length 2 * Eps centered on the point.
Eps = 10
surroundPoints = defaultdict(list)
for idx1, point1 in enumerate(points):
    for idx2, point2 in enumerate(points):
        if idx1 < idx2:
            if abs(point1[0] - point2[0]) <= Eps and abs(point1[1] - point2[1]) <= Eps:
                surroundPoints[idx1].append(idx2)
                surroundPoints[idx2].append(idx1)

# A point with at least MinPts neighbors in its neighborhood is a core point.
MinPts = 5
corePointIdx = [pointIdx for pointIdx, surPointIdxs in surroundPoints.items() if len(surPointIdxs) >= MinPts]

# A non-core point with a core point in its neighborhood is a border point.
borderPointIdx = []
for pointIdx, surPointIdxs in surroundPoints.items():
    if pointIdx not in corePointIdx:
        for onesurPointIdx in surPointIdxs:
            if onesurPointIdx in corePointIdx:
                borderPointIdx.append(pointIdx)
                break

# A noise point is neither a core point nor a border point.
noisePointIdx = [pointIdx for pointIdx in range(len(points)) if pointIdx not in corePointIdx and pointIdx not in borderPointIdx]

corePoint = [points[pointIdx] for pointIdx in corePointIdx]
borderPoint = [points[pointIdx] for pointIdx in borderPointIdx]
noisePoint = [points[pointIdx] for pointIdx in noisePointIdx]

# Optional: plot core (red), border (yellow), and noise (black) points.
# pl.plot([eachpoint[0] for eachpoint in corePoint], [eachpoint[1] for eachpoint in corePoint], 'or')
# pl.plot([eachpoint[0] for eachpoint in borderPoint], [eachpoint[1] for eachpoint in borderPoint], 'oy')
# pl.plot([eachpoint[0] for eachpoint in noisePoint], [eachpoint[1] for eachpoint in noisePoint], 'ok')

# Every point starts in its own group, labeled by its index.
groups = [idx for idx in range(len(points))]

# Put each core point in the same cluster as all the core points in its neighborhood.
for pointidx, surroundIdxs in surroundPoints.items():
    for oneSurroundIdx in surroundIdxs:
        if pointidx in corePointIdx and oneSurroundIdx in corePointIdx and pointidx < oneSurroundIdx:
            # Save the label being replaced, because groups[oneSurroundIdx]
            # itself changes partway through the loop.
            oldGroup = groups[oneSurroundIdx]
            for idx in range(len(groups)):
                if groups[idx] == oldGroup:
                    groups[idx] = groups[pointidx]

# Put each border point in the same cluster as one core point in its neighborhood.
for pointidx, surroundIdxs in surroundPoints.items():
    for oneSurroundIdx in surroundIdxs:
        if pointidx in borderPointIdx and oneSurroundIdx in corePointIdx:
            groups[pointidx] = groups[oneSurroundIdx]
            break

# Take the three largest clusters.
wantGroupNum = 3
finalGroup = Counter(groups).most_common(wantGroupNum)
finalGroup = [onecount[0] for onecount in finalGroup]

group1 = [points[idx] for idx in range(len(points)) if groups[idx] == finalGroup[0]]
group2 = [points[idx] for idx in range(len(points)) if groups[idx] == finalGroup[1]]
group3 = [points[idx] for idx in range(len(points)) if groups[idx] == finalGroup[2]]

pl.plot([eachpoint[0] for eachpoint in group1], [eachpoint[1] for eachpoint in group1], 'or')
pl.plot([eachpoint[0] for eachpoint in group2], [eachpoint[1] for eachpoint in group2], 'oy')
pl.plot([eachpoint[0] for eachpoint in group3], [eachpoint[1] for eachpoint in group3], 'og')

# Plot the noise points in black.
pl.plot([eachpoint[0] for eachpoint in noisePoint], [eachpoint[1] for eachpoint in noisePoint], 'ok')
pl.show()
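
The script reads its input from a file named "points" with one "x#y" pair per line, but the article does not include that file. The following is a hedged sketch of how a comparable test file might be generated; the helper names `write_sample_points` and `read_points`, the cluster centers, and every parameter value are assumptions, not the author's data.

```python
import os
import random
import tempfile


def write_sample_points(path, centers, per_center=30, spread=8, n_noise=10, seed=0):
    """Write integer points in 'x#y' format: dense blobs plus scattered noise."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    rows = []
    for cx, cy in centers:
        for _ in range(per_center):
            rows.append((cx + rng.randint(-spread, spread),
                         cy + rng.randint(-spread, spread)))
    for _ in range(n_noise):
        rows.append((rng.randint(0, 200), rng.randint(0, 200)))
    with open(path, "w") as handle:
        for x, y in rows:
            handle.write("%d#%d\n" % (x, y))
    return len(rows)


def read_points(path):
    """Parse an 'x#y' file back into [x, y] integer pairs."""
    with open(path, "r") as handle:
        return [[int(part) for part in line.split("#")]
                for line in handle if line.strip()]


# Demo in a temporary directory so no existing file is overwritten.
sample_path = os.path.join(tempfile.mkdtemp(), "points")
written = write_sample_points(sample_path, [(40, 40), (120, 60), (80, 150)])
loaded = read_points(sample_path)
print(written, len(loaded))
```

To try the script on such data, the same call can be pointed at a file named "points" in the working directory. Three centers are used here because the script plots the three largest clusters and would fail with an index error if fewer groups existed.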

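Running the script displays a scatter plot with the three largest clusters drawn in red, yellow, and green, and the noise points drawn in black.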

I hope this article will help you with Python programming.
