5. Unsupervised Learning: DBSCAN Clustering Algorithm and Its Application

Last Update:2017-06-04 Source: Internet

Author: User


DBSCAN method and application

1. Introduction to DBSCAN density clustering

The DBSCAN algorithm is a density-based clustering algorithm:
1. The number of clusters does not need to be specified in advance.
2. The final number of clusters is not known beforehand; it depends on the data and the parameters.
The DBSCAN algorithm divides data points into three categories:
1. Core point: a point whose neighborhood of radius eps contains at least MinPts points.
2. Boundary point: a point with fewer than MinPts points within radius eps, but which falls within the eps-neighborhood of a core point.
3. Noise point: a point that is neither a core point nor a boundary point.

As shown in the figure: the yellow points are boundary points, because fewer than MinPts points lie within radius eps of each of them (here MinPts is set to 5). The white point in the middle is a core point, because its eps-neighborhood contains the required MinPts (5) points; the points in its neighborhood are those yellow points.
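The three categories can be recovered from a fitted scikit-learn model, as in the minimal sketch below. The one-dimensional toy data and the values of eps and min_samples are assumptions chosen for illustration. Note that scikit-learn's min_samples counts the point itself.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: five close points, one point on the fringe, one far away.
X = np.array([[0.0], [0.1], [0.2], [0.3], [0.4], [0.8], [5.0]])

# Illustrative parameters; min_samples includes the point itself.
db = DBSCAN(eps=0.45, min_samples=5).fit(X)

# Core points are listed explicitly by the fitted model.
is_core = np.zeros(len(X), dtype=bool)
is_core[db.core_sample_indices_] = True

# Noise points carry the label -1; everything else that is not core is a boundary point.
is_noise = db.labels_ == -1
is_border = ~is_core & ~is_noise

kinds = np.where(is_core, "core", np.where(is_noise, "noise", "border"))
for value, kind in zip(X.ravel(), kinds):
    print(value, kind)
```

Here the point at 0.8 has too few neighbors to be a core point, but it lies within eps of the core point at 0.4, so it is a boundary point; the point at 5.0 is noise.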

2. Process of the DBSCAN algorithm

1. Mark every point as a core point, a boundary point, or a noise point;
2. Delete the noise points;
3. Connect with an edge every pair of core points that lie within eps of each other;
4. Each group of connected core points forms a cluster;
5. Assign each boundary point to the cluster of a core point associated with it (that is, a core point within whose radius it falls).
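The steps above can be sketched in plain NumPy. This is a simplified illustration, not scikit-learn's implementation: the function name dbscan_sketch is hypothetical, the neighborhood counts the point itself, distances are computed for all pairs (so it only suits small data sets), and a boundary point that is close to several clusters is assigned to the cluster of its nearest core point, which is one possible tie-breaking choice.

```python
import numpy as np

def dbscan_sketch(X, eps, min_pts):
    """Return one label per row of X; -1 marks noise."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances (O(n^2) memory, fine for small n).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = dist <= eps  # each point is its own neighbor

    # Step 1: core points have at least min_pts points in their neighborhood.
    is_core = neighbors.sum(axis=1) >= min_pts

    # Steps 3-4: clusters are connected components of the core-point graph.
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in np.flatnonzero(neighbors[p] & is_core):
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1

    # Step 5: a boundary point joins the cluster of its nearest core point within eps.
    core_idx = np.flatnonzero(is_core)
    for i in np.flatnonzero(~is_core):
        reachable = core_idx[neighbors[i, core_idx]]
        if reachable.size:
            nearest = reachable[np.argmin(dist[i, reachable])]
            labels[i] = labels[nearest]

    # Step 2: points still labelled -1 are noise and belong to no cluster.
    return labels

# Hypothetical toy data: two tight groups and one isolated point.
points = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2], [50.0]])
print(dbscan_sketch(points, eps=0.15, min_pts=2))
```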

3. Application examples

Data introduction

The data set consists of existing campus network logs from a university: the network usage records of 290 students, including user ID, device MAC address, IP address, online start time, online end time, online duration, and campus network package. Using this data, we analyze the students' Internet usage patterns.

Experimental purpose
Through DBSCAN clustering, we analyze the patterns in the students' online start time and online duration.

Technical Route
We use the sklearn.cluster.DBSCAN module.

An example of the data:

By clustering the online start time and the online duration separately, we obtain the distribution of when students go online and for how long.

1. Set up the project and import the required sklearn packages
import numpy as np
from sklearn.cluster import DBSCAN
Note: the main parameters of DBSCAN:
1. eps: the maximum distance between two samples for them to be considered neighbors
2. min_samples: the number of samples required in a point's neighborhood for it to be considered a core point
3. metric: the distance calculation method
Example: sklearn.cluster.DBSCAN(eps=0.5, min_samples=5, metric='euclidean')  # 'euclidean' means that Euclidean distance is used to compute the distance between sample points

3-1. Cluster on online start time: create a DBSCAN instance and fit it to get the labels:
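A minimal sketch of this step follows. The real log data is not reproduced here, so the start hours below are hypothetical stand-ins, and the eps and min_samples values are illustrative rather than tuned. The sketch also treats the hour as a plain number and ignores that 23:00 and 00:00 are adjacent.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical online start hours (0-23) standing in for the real log data.
start_hours = np.array([8, 8, 8, 9, 9, 9, 13, 13, 14, 14, 22, 22, 22, 23, 23, 3])

# DBSCAN expects a 2-D array of shape (n_samples, n_features).
X = start_hours.reshape(-1, 1).astype(float)

# Illustrative parameter values, not tuned for real data.
db = DBSCAN(eps=1.5, min_samples=3, metric="euclidean").fit(X)
labels = db.labels_  # one label per sample; -1 marks noise

# Count clusters, excluding the noise label.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("Labels:", labels)
print("Number of clusters:", n_clusters)
```

With these values the morning, midday, and evening hours form three clusters, and the single 3:00 record is labelled as noise.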

4. Output the labels and view the results
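One way to inspect the labels is to count the clusters, the share of noise points, and the size of each cluster. The helper summarize_labels and the label array below are hypothetical; in practice the labels would come from the labels_ attribute of a fitted DBSCAN instance.

```python
import numpy as np

def summarize_labels(labels):
    """Return (number of clusters, noise ratio, {cluster label: size})."""
    labels = np.asarray(labels)
    values, counts = np.unique(labels, return_counts=True)
    # The label -1 is reserved for noise and is not a cluster.
    sizes = {int(v): int(c) for v, c in zip(values, counts) if v != -1}
    noise_ratio = float(np.mean(labels == -1))
    return len(sizes), noise_ratio, sizes

# Hypothetical label array standing in for a real clustering result.
labels = np.array([0, 0, 0, 1, 1, -1, 2, 2, 2, 2])
n_clusters, noise_ratio, sizes = summarize_labels(labels)

print("Labels:", labels)
print("Estimated number of clusters:", n_clusters)
print("Noise ratio: {:.2%}".format(noise_ratio))
for cluster, size in sizes.items():
    print("Cluster", cluster, "has", size, "samples")
```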

To present the results more clearly, we can plot them as a histogram, which makes them easier to analyze. We use the hist function from the Matplotlib library to display the histogram:

5. Draw the histogram to analyze the experimental results:
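A minimal plotting sketch, assuming hypothetical start hours in place of the real data. One bin per hour is an illustrative choice; the Agg backend line only makes the script run without a display and can be dropped for interactive use.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove for interactive sessions
import matplotlib.pyplot as plt

# Hypothetical online start hours (0-23) standing in for the real log data.
hours = np.array([8, 8, 8, 9, 9, 9, 13, 13, 14, 14, 22, 22, 22, 23, 23, 3])

fig, ax = plt.subplots()
# One bin per hour of the day; hist returns the counts and the bin edges.
counts, bin_edges, _ = ax.hist(hours, bins=24, range=(0, 24))
ax.set_xlabel("Online start hour")
ax.set_ylabel("Number of records")
ax.set_title("Distribution of online start time")

# In an interactive session, call plt.show() here to display the figure.
print(counts)
```

Peaks in the histogram correspond to the dense regions that DBSCAN groups into clusters.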

6. Data Distribution vs Clustering

Here is a small machine learning tip. The data distribution on the left is not suitable for cluster analysis. If we want to cluster such data, we first need to apply a mathematical transformation to it, usually a logarithmic transformation. After the transformation, the data is better suited to cluster analysis.
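The effect of the transformation can be checked with a small sketch. The durations below are hypothetical, the helper skewness is an illustrative moment-based measure, and np.log1p (the logarithm of 1 + x) is used as an assumption so that a duration of zero would not cause an error.

```python
import numpy as np

def skewness(x):
    """Third standardized moment: about 0 for symmetric data, positive for a long right tail."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(((x - x.mean()) / x.std()) ** 3))

# Hypothetical online durations in seconds, with a long right tail.
durations = np.array([30, 45, 60, 90, 120, 300, 600, 1800, 3600, 7200, 36000], dtype=float)

# Logarithmic transformation; log1p computes log(1 + x).
log_durations = np.log1p(durations)

skew_raw = skewness(durations)
skew_log = skewness(log_durations)
print("Skewness before the transformation:", skew_raw)
print("Skewness after the transformation:", skew_log)
```

The transformed values are spread far more evenly, so a single eps value can describe neighborhoods among both short and long sessions.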

3-2. Cluster on online duration: create a DBSCAN instance and fit it to get the labels:
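A minimal sketch of this step, assuming the durations are log-transformed before clustering. The durations are hypothetical stand-ins for the real data, and the eps and min_samples values are illustrative only.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical online durations in seconds: short sessions, long sessions, one extreme value.
durations = np.array([60, 62, 65, 70, 75, 3600, 3700, 3800, 3900, 4000, 100000], dtype=float)

# Log-transform, then reshape to (n_samples, n_features) for DBSCAN.
X = np.log1p(durations).reshape(-1, 1)

# Illustrative parameter values, not tuned for real data.
db = DBSCAN(eps=0.3, min_samples=3, metric="euclidean").fit(X)
duration_labels = db.labels_  # -1 marks noise

n_duration_clusters = len(set(duration_labels)) - (1 if -1 in duration_labels else 0)
print("Labels:", duration_labels)
print("Number of clusters:", n_duration_clusters)
```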

4-2. Output the labels and view the results

We can also see that the clustering of online duration is not as distinct as the clustering of online start time.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
