First, here is the official documentation:
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#examples-using-sklearn-cluster-kmeans
And a translation of that documentation:
http://blog.csdn.net/xiaoyi_zhang/article/details/52269242
Another example, found through a Baidu search (it will be removed on request if it infringes anyone's rights):
```python
# -*- coding: utf-8 -*-
from sklearn.cluster import KMeans
import joblib  # older code used sklearn.externals.joblib, which newer scikit-learn releases no longer ship

# Read the tab-separated data; the first three columns are skipped, the rest are features
with open('C:/test/final.dat', 'r') as final:
    data = [line.strip().split('\t') for line in final]
feature = [[float(x) for x in row[3:]] for row in data]

# Create a KMeans estimator
clf = KMeans(n_clusters=9)
s = clf.fit(feature)
print(s)
# The 9 cluster centers
print(clf.cluster_centers_)
# The cluster each sample belongs to
print(clf.labels_)
# Used to assess whether the number of clusters is appropriate: the smaller the value,
# the tighter the clusters; pick the number of clusters at the critical (elbow) point
print(clf.inertia_)
# Make predictions
print(clf.predict(feature))
# Save the model
joblib.dump(clf, 'c:/km.pkl')
# Load the saved model
clf = joblib.load('c:/km.pkl')

# Used to assess whether the number of clusters is appropriate (disabled in the original):
# for i in range(5, 30, 1):
#     clf = KMeans(n_clusters=i)
#     s = clf.fit(feature)
#     print(i, clf.inertia_)
```
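The disabled loop at the end of the script is the usual "elbow" heuristic: fit one model per candidate number of clusters, look at how inertia_ falls as k grows, and pick the k where the curve stops dropping sharply. Below is a self-contained sketch of that loop. Since the contents of final.dat are not shown, it assumes synthetic data from sklearn.datasets.make_blobs, and the helper name inertia_curve is hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def inertia_curve(X, k_values, random_state=0):
    """Hypothetical helper: fit one KMeans model per k and return {k: inertia_}."""
    curve = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(X)
        curve[k] = model.inertia_
    return curve


# Synthetic stand-in for final.dat (assumption): 300 points around 4 centers
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)

curve = inertia_curve(X, range(1, 9))
for k, inertia in curve.items():
    print(k, round(inertia, 1))
```

Plotting these values against k (for example with matplotlib) makes the elbow easier to spot; the printed table carries the same information.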
A beginner-oriented explanation follows.
Reference: http://www.cnblogs.com/meelo/p/4272677.html
scikit-learn provides a consistent interface for all of its machine learning algorithms; using any of them typically takes the following steps:
1. Initialize the model (estimator). Different algorithms take different parameters, and generally every parameter has a default value.
(1) For k-means clustering, the number of clusters n_clusters must be given; its default value is 8;
(2) max_iter is the maximum number of iterations in a single run; the default is 300;
(3) n_init set to 10 (the default in older releases) means the algorithm is run with 10 different random initializations of the centers, and the best result (the one with the lowest inertia) is kept as the model;
(4) init='k-means++' (the default) chooses the initial cluster centers so that they are spread apart, which usually speeds up convergence; it does not choose n_clusters for you;
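(5) tol: float, default 1e-4, the relative tolerance on how far the cluster centers move between two consecutive iterations; once the movement falls below it, the algorithm is considered to have converged;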
(6) n_jobs: the number of parallel jobs used in the computation (deprecated in version 0.23 and removed in newer releases);
(7) verbose controls how much progress information is printed while fitting; the higher the value, the more detail is printed;
(8) copy_x: bool, default True. When precomputing distances, it is numerically more accurate to center the data first (subtract its mean). If copy_x is True, the original data is not modified. If False, the original data is modified in place and restored before the function returns; however, because the mean is subtracted and then added back, small numerical differences from the original data may be introduced.
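To make init, n_init, max_iter and tol concrete, here is a from-scratch NumPy sketch of what they control. It is a simplified illustration, not scikit-learn's implementation: the seeding is plain k-means++ (scikit-learn uses a greedy variant that tries several candidates per step), tol is treated as an absolute threshold on the squared center shift (scikit-learn scales it by the data's variance), and the function names are hypothetical.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """Plain k-means++ seeding: the first center is a random sample; each further
    center is drawn with probability proportional to its squared distance from
    the nearest center chosen so far."""
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    for _ in range(1, k):
        chosen = np.array(centers)
        d2 = ((X[:, None, :] - chosen[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centers)


def lloyd(X, centers, max_iter, tol):
    """Lloyd iterations: assign points, move centers to cluster means, and stop
    after max_iter rounds or once the squared center shift is <= tol
    (an absolute threshold here, a simplification)."""
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        shift = ((new_centers - centers) ** 2).sum()
        centers = new_centers
        if shift <= tol:
            break
    # Final assignment and inertia (sum of squared distances) for the final centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return centers, labels, inertia


def simple_kmeans(X, n_clusters=8, n_init=10, max_iter=300, tol=1e-4, seed=0):
    """Run n_init independently seeded restarts and keep the lowest-inertia one."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        start = kmeans_pp_init(X, n_clusters, rng)
        result = lloyd(X, start, max_iter, tol)
        if best is None or result[2] < best[2]:
            best = result
    return best


# Toy data (assumption): two well-separated groups of three points each
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
centers, labels, inertia = simple_kmeans(X, n_clusters=2)
print(labels, round(inertia, 4))
```

Note that n_clusters is always an input here: k-means++ only decides where the k starting centers go, and n_init only repeats the whole run and keeps the best one.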
Attributes:
(1) cluster_centers_: array of shape [n_clusters, n_features], the coordinates of each cluster center;
(2) labels_: the cluster label assigned to each point;
(3) inertia_: float, the sum of squared distances from each sample to the center of its cluster.
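One way to check these attributes against their definitions is to recompute inertia_ by hand. The sketch below assumes synthetic data from make_blobs; the hand-computed squared-distance sum should match inertia_ up to floating-point rounding.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data (assumption): 200 samples with 3 features
X, _ = make_blobs(n_samples=200, n_features=3, centers=3, random_state=0)

clf = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(clf.cluster_centers_.shape)  # (n_clusters, n_features)
print(clf.labels_[:10])            # cluster index of the first 10 samples

# inertia_ recomputed by hand: squared distance from each sample to the center
# of the cluster it was assigned to, summed over all samples
assigned_centers = clf.cluster_centers_[clf.labels_]
manual_inertia = ((X - assigned_centers) ** 2).sum()
print(clf.inertia_, manual_inertia)
```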
For example, one of my runs produced the following result:
2. For unsupervised learning, the input data is just the features of the samples (no labels); calling clf.fit(X) passes the data to the model and trains it.
3. To assign new, unseen data to clusters, use the model's predict method.
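Steps 2 and 3 can be sketched as follows, assuming a small hand-made NumPy array as training data. The last lines confirm that predict simply returns the index of the nearest entry in cluster_centers_.

```python
import numpy as np
from sklearn.cluster import KMeans

# Step 2: unsupervised training data has features only, no labels (toy values, assumption)
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 0.8],
                    [8.0, 8.0], [8.5, 9.0], [9.0, 8.2]])
clf = KMeans(n_clusters=2, n_init=10, random_state=0)
clf.fit(X_train)

# Step 3: assign new, unseen samples to the learned clusters
X_new = np.array([[0.9, 1.1], [8.8, 8.7]])
pred = clf.predict(X_new)
print(pred)

# predict() returns the index of the closest center; check that by hand
dists = ((X_new[:, None, :] - clf.cluster_centers_[None, :, :]) ** 2).sum(axis=2)
print(dists.argmin(axis=1))
```

The cluster indices themselves are arbitrary (which group is 0 and which is 1 depends on initialization); only the grouping is meaningful.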
K-means clustering algorithm using scikit-learn