K-means clustering algorithm using sklearn

Source: Internet
Author: User

First, the official documentation:
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#examples-using-sklearn-cluster-kmeans

A translated version of the documentation:
http://blog.csdn.net/xiaoyi_zhang/article/details/52269242

An example found through a Baidu search (it will be removed on request if it infringes any rights):

    # -*- coding: utf-8 -*-
    from sklearn.cluster import KMeans
    # The original imported joblib from sklearn.externals, which newer
    # scikit-learn releases no longer ship; the standalone package is used here.
    import joblib

    # Each line of the tab-separated file is one sample;
    # columns from index 3 onward are the numeric features.
    with open('C:/test/final.dat', 'r') as final:
        data = [line.strip().split('\t') for line in final]
    feature = [[float(x) for x in row[3:]] for row in data]

    # Create the KMeans estimator and fit it
    clf = KMeans(n_clusters=9)
    s = clf.fit(feature)
    print(s)

    # The 9 cluster centers
    print(clf.cluster_centers_)

    # The cluster each sample belongs to
    print(clf.labels_)

    # Used to judge whether the number of clusters is appropriate: the smaller
    # the value, the tighter the clusters; choose the cluster count at the
    # critical point where the value stops dropping quickly
    print(clf.inertia_)

    # Predict
    print(clf.predict(feature))

    # Save the model
    joblib.dump(clf, 'c:/km.pkl')

    # Load the saved model
    clf = joblib.load('c:/km.pkl')

    # Disabled in the original: compare inertia across cluster counts
    # to find the critical point
    # for i in range(5, 30, 1):
    #     clf = KMeans(n_clusters=i)
    #     s = clf.fit(feature)
    #     print(i, clf.inertia_)
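The listing above ends with a loop that compares inertia across cluster counts. Since the file `final.dat` is not available, here is a minimal sketch of the same idea on synthetic data. The three blobs, the range of `k`, and `random_state` are assumptions made for illustration only; they are not from the original post.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the feature matrix: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
feature = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Fit one model per candidate cluster count and record its inertia.
inertias = {}
for k in range(1, 7):
    clf = KMeans(n_clusters=k, n_init=10, random_state=0).fit(feature)
    inertias[k] = clf.inertia_
    print(k, round(clf.inertia_, 1))
```

On data like this, inertia should fall steeply until `k` reaches the true number of groups and then flatten out. That bend is the "critical point" the comments in the listing refer to.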

An explanation for beginners follows.
Reference: http://www.cnblogs.com/meelo/p/4272677.html
sklearn provides a consistent interface across its machine learning algorithms, so using any of them typically takes the same few steps:
1. Initialize the estimator. Different algorithms take different parameters, and generally every parameter has a default value.

(1) n_clusters: the number of clusters for K-means to form; the default is 8;
(2) max_iter: the maximum number of iterations for a single run; the default is 300;
(3) n_init: the number of runs with different random centroid initializations; with n_init=10, the run with the lowest inertia is kept as the model;
(4) init='k-means++': chooses the initial cluster centers so that they are spread out, which speeds up convergence; it does not choose n_clusters, which you must still specify yourself;
(5) tol: float, default 1e-4; the relative tolerance used to declare convergence (defined with respect to inertia in the scikit-learn version described here);
(6) n_jobs: the number of processes to use for the computation (recent scikit-learn releases have removed this parameter from KMeans);
(7) verbose: controls how much of the fitting process is printed; the higher the value, the more detail;
(8) copy_x: boolean, default True. When distances are precomputed, centering the data first is more numerically accurate. If copy_x is True, the original data is not modified. If False, the original data is modified in place and restored before the function returns, but because the data mean is subtracted and then added back, small numerical differences may remain.
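The parameters above can be passed explicitly when creating the estimator. This is a minimal sketch that only restates the default values listed above. `random_state` is an addition made here for repeatability, and `n_jobs` is left out because recent scikit-learn releases no longer accept it for KMeans.

```python
from sklearn.cluster import KMeans

# Every value restates a default from the list above, except random_state,
# which is added here only to make repeated runs give the same result.
clf = KMeans(
    n_clusters=8,
    init="k-means++",
    n_init=10,
    max_iter=300,
    tol=1e-4,
    verbose=0,
    copy_x=True,
    random_state=0,
)

# get_params() returns the settings the estimator was created with.
params = clf.get_params()
print(params["n_clusters"], params["init"], params["max_iter"])
```

Nothing is computed at this point; the parameters are only stored until `fit` is called.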
Attributes:

(1) cluster_centers_: array of shape [n_clusters, n_features], the coordinates of each cluster center;
(2) labels_: the cluster label assigned to each point;
(3) inertia_: float, the sum of squared distances from each point to the center of its cluster.
For example, one run of my code produced the following result (the output is not preserved in this copy):
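The author's output is not shown here. As a stand-in (not the author's code or data), here is a minimal sketch that inspects the three attributes on six made-up points. It also recomputes inertia by hand to check that it is a sum of squared distances.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up data: two tight groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.2],
              [8.0, 8.0], [8.2, 7.8], [7.8, 8.2]])

clf = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(clf.cluster_centers_.shape)   # (n_clusters, n_features)
print(clf.labels_)                  # one cluster index per sample
print(clf.inertia_)

# Recompute inertia by hand: squared distance from each sample to the
# center of the cluster it was assigned to, summed over all samples.
assigned_centers = clf.cluster_centers_[clf.labels_]
manual_inertia = float(((X - assigned_centers) ** 2).sum())
print(manual_inertia)
```

The two printed inertia values should agree up to floating-point rounding. Which group receives label 0 and which receives label 1 is arbitrary.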

2. For unsupervised learning, the input data is just the feature matrix of the samples; calling clf.fit(X) passes the data to the estimator and trains it.
3. To assign unseen data to clusters, call the estimator's predict method.
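The three steps can be sketched end to end as follows. The training points, the new points, and `random_state` are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up training data: two groups of 2-D points.
X_train = np.array([[0.0, 0.0], [0.5, 0.2], [0.1, 0.6],
                    [9.0, 9.0], [9.5, 9.2], [9.1, 9.6]])

# Step 1: initialize the estimator.
clf = KMeans(n_clusters=2, n_init=10, random_state=0)

# Step 2: fit on the features only (no labels are passed).
clf.fit(X_train)

# Step 3: assign previously unseen points to the nearest learned center.
X_new = np.array([[0.3, 0.3], [9.2, 9.1]])
new_labels = clf.predict(X_new)
print(new_labels)
```

Each new point receives the label of the training group it lies closest to. The label numbers themselves carry no meaning beyond identifying the cluster.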
