K-means clustering algorithm using sklearn

Last Update:2018-05-08 Source: Internet

Author: User


First, the official documentation:
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#examples-using-sklearn-cluster-kmeans

A translated version of that documentation:
http://blog.csdn.net/xiaoyi_zhang/article/details/52269242

Here is an example found through a Baidu search (it will be removed on request if it infringes any rights):

    # -*- coding: utf-8 -*-
    from sklearn.cluster import KMeans
    # sklearn.externals.joblib was removed in scikit-learn 0.23;
    # the standalone joblib package provides the same dump/load functions.
    import joblib

    # Read a tab-separated file; the features start at the fourth column.
    with open('C:/test/final.dat', 'r') as final:
        data = [line.strip().split('\t') for line in final]
    feature = [[float(x) for x in row[3:]] for row in data]

    # Create the KMeans estimator and fit it
    clf = KMeans(n_clusters=9)
    s = clf.fit(feature)
    print(s)

    # The 9 cluster centers
    print(clf.cluster_centers_)

    # The cluster each sample belongs to
    print(clf.labels_)

    # Used to judge whether the number of clusters is appropriate: the smaller
    # the value, the tighter the clusters; pick the number of clusters at the
    # critical point (the "elbow")
    print(clf.inertia_)

    # Predict
    print(clf.predict(feature))

    # Save the model
    joblib.dump(clf, 'c:/km.pkl')

    # Load the saved model
    clf = joblib.load('c:/km.pkl')

    # Disabled in the original: compare inertia for different cluster counts
    # for i in range(5, 30, 1):
    #     clf = KMeans(n_clusters=i)
    #     s = clf.fit(feature)
    #     print(i, clf.inertia_)
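The example ends with a disabled loop that compares inertia_ across several cluster counts. The sketch below shows that idea in a self-contained form. It is an illustration only: the data comes from make_blobs rather than the final.dat file used above, and the three centers, the range of k, and random_state are all assumptions chosen for the demonstration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical stand-in for the feature matrix: 300 points around 3 centers.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

# Fit one model per candidate cluster count and record its inertia.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    print(k, round(model.inertia_, 1))
```

With data like this, inertia usually falls sharply until k reaches the number of real groups and only slowly afterwards; the k at that bend is the "critical point" the comment in the example refers to.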


A beginner-oriented explanation follows.
Reference: http://www.cnblogs.com/meelo/p/4272677.html
sklearn exposes a consistent interface across its machine learning algorithms, so using any of them typically takes the same few steps:
1. Initialize the model. Different algorithms take different parameters, and most parameters have default values.

(1) n_clusters: for K-means clustering the number of clusters must be specified; the default value is 8;
(2) max_iter: the maximum number of iterations for a single run; the default value is 300;
(3) n_init: setting it to 10 runs the algorithm 10 times with different initial centroids and keeps the run with the lowest inertia as the model;
(4) init='k-means++': chooses initial centroids that are spread apart from one another, which usually speeds up convergence. It does not choose n_clusters for you;
(5) tol: float, default value 1e-4. The relative tolerance used to declare convergence. Older releases define it with respect to inertia, newer ones with respect to the change in cluster centers between two consecutive iterations;
(6) n_jobs: the number of processes used in the computation. This parameter was deprecated in scikit-learn 0.23 and removed in 1.0;
(7) verbose: controls how much of the solving process is printed; the higher the value, the more detailed the output;
(8) copy_x: boolean, default value True. When pre-computing distances, it is more numerically accurate to center the data first. If this parameter is True, the original data is not modified. If False, the original data is modified in place and restored before the function returns. However, because the data mean is subtracted and then added back during the calculation, small numerical differences from the original data may remain.
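As a minimal sketch of how these parameters are passed, the block below builds a KMeans model with most of them written out explicitly. The toy data, the choice of three clusters, and random_state (which is not in the list above and is used here only to make the run repeatable) are assumptions for illustration. n_jobs is left out because recent releases no longer accept it.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: three loose groups of 50 two-dimensional points.
rng = np.random.RandomState(0)
X = np.vstack(
    [rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0.0, 5.0, 10.0)]
)

model = KMeans(
    n_clusters=3,        # must be chosen by the user; init does not pick it
    init="k-means++",    # how the initial centroids are selected
    n_init=10,           # number of runs with different initial centroids
    max_iter=300,        # upper bound on iterations within one run
    tol=1e-4,            # tolerance used to declare convergence
    verbose=0,           # 0 prints nothing; larger values print progress
    copy_x=True,         # leave the caller's array untouched
    random_state=0,      # assumption: fixed seed for repeatability
)
model.fit(X)

print(model.n_iter_)                 # iterations used by the best run
print(model.cluster_centers_.shape)  # one row per cluster
```

The number of clusters found is always the value given as n_clusters; changing init only changes where the search starts.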
Attributes:

(1) cluster_centers_: array of shape [n_clusters, n_features], the coordinates of each cluster center;
(2) labels_: the cluster label assigned to each sample;
(3) inertia_: float, the sum of squared distances from each sample to its closest cluster center.
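The three attributes can be inspected on a small fitted model. The following is a sketch on six hand-written points (an assumption made for illustration, not data from the original post). It also recomputes the inertia by hand as the sum of squared distances between each point and the center of its assigned cluster, so the definition can be checked against inertia_.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: two obvious groups of three points each.
X = np.array(
    [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float
)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(model.cluster_centers_.shape)  # (2, 2): n_clusters x n_features
print(model.labels_.shape)           # (6,): one label per sample

# Look up the center assigned to each sample, then sum the squared distances.
assigned_centers = model.cluster_centers_[model.labels_]
manual_inertia = ((X - assigned_centers) ** 2).sum()
print(model.inertia_, manual_inertia)
```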

2. For unsupervised machine learning, the input is only the feature matrix of the samples; clf.fit(X) trains the model on that data.
3. To assign previously unseen data to clusters, call the model's predict method.
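The three steps can be sketched end to end as follows. The training points, the new points, and random_state are assumptions made up for this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical training data: one group near (0, 0), another near (8, 8).
X_train = np.array(
    [[0, 0], [0, 1], [1, 0], [8, 8], [8, 9], [9, 8]], dtype=float
)
# Hypothetical unseen points, one near each group.
X_new = np.array([[0.5, 0.5], [9, 9]], dtype=float)

# Step 1: initialize the model with its parameters.
clf = KMeans(n_clusters=2, n_init=10, random_state=0)

# Step 2: fit on the feature matrix only; no target labels are passed.
clf.fit(X_train)

# Step 3: assign new samples to the nearest learned cluster center.
new_labels = clf.predict(X_new)
print(new_labels)
```

The label numbers themselves are arbitrary; what matters is that each new point receives the same label as the training points it sits closest to.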


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
