TensorFlow Basics (not covered in this tutorial)
Installing the Required Python Libraries
Before starting the clustering experiment proper, we also need to install the support packages used for computation and plotting. Install Seaborn:
pip install seaborn
Install Matplotlib:
pip install matplotlib
Install python3-tk:
sudo apt-get install python3-tk -y
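To confirm that the packages can be imported (an optional check, not part of the original steps):

python3 -c "import seaborn, matplotlib, tkinter; print('ok')"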
Introduction to the K-means Clustering Algorithm Steps
The K-means algorithm is the most classical partition-based clustering method and one of the ten classical data mining algorithms. Its basic idea is to take K points in space as centroids and assign every object to the centroid nearest to it. The centroid of each cluster is then updated iteratively until the best clustering result is obtained; a centroid may be an actual data point or a virtual point. This is the most basic form of K-means, and you can study improved variants on your own; a minimal sketch of the two alternating steps is shown below.
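The following NumPy-only sketch illustrates the assignment and update steps described above (the function and variable names here are our own, and this is not the TensorFlow implementation used later):

import numpy as np

def kmeans_numpy(points, k, iterations=30):
    # Pick k of the input points as the initial centroids
    centroids = points[np.random.choice(len(points), k, replace=False)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its cluster
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels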
Test Data Preparation
This tutorial randomly generates 200 data points for K-means clustering. Let's first look at the form of the generated test data. Data generation code:
# -*- coding: utf-8 -*-
import matplotlib
matplotlib.use('Agg')
import numpy as np
from numpy.linalg import cholesky
import matplotlib.pyplot as plt

# Generate random test data
sampleNo = 200  # number of generated data points
# Two-dimensional normal distribution
mu = np.array([[1, 5]])
Sigma = np.array([[1, 0.5], [1.5, 3]])
R = cholesky(Sigma)
srcdata = np.dot(np.random.randn(sampleNo, 2), R) + mu
plt.plot(srcdata[:, 0], srcdata[:, 1], 'bo')
plt.savefig('data0.png')
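A side note on the Cholesky trick used above: for a symmetric positive-definite Sigma, R = cholesky(Sigma) is lower triangular with R·Rᵀ = Sigma, so samples generated as randn @ R.T have covariance Sigma (NumPy's cholesky only reads the lower triangle, which is why the asymmetric Sigma above still runs). A quick sanity check with a symmetric Sigma of our own choosing, illustrative and not part of the original tutorial:

import numpy as np
from numpy.linalg import cholesky

Sigma = np.array([[1.0, 0.5], [0.5, 3.0]])  # symmetric example (assumption)
R = cholesky(Sigma)
samples = np.random.randn(100000, 2) @ R.T
print(np.cov(samples, rowvar=False))  # should be close to Sigma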
Run the program above to view the generated random test data; the points follow a two-dimensional normal distribution, as saved in data0.png.
Implementation of K-means Clustering algorithm
Code:
# -*- coding: utf-8 -*-
import matplotlib
matplotlib.use('Agg')
import numpy as np
from numpy.linalg import cholesky
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import tensorflow as tf
from random import shuffle
from numpy import array

def KMeansCluster(vectors, noofclusters):
    noofclusters = int(noofclusters)
    assert noofclusters < len(vectors)
    # Dimensionality of each vector
    dim = len(vectors[0])
    # Helps pick random centroids from among the available vectors
    vector_indices = list(range(len(vectors)))
    shuffle(vector_indices)
    # Computation graph
    graph = tf.Graph()
    with graph.as_default():
        # Session for the computation
        sess = tf.Session()
        # Take a subset of the existing points as the initial centroids
        centroids = [tf.Variable(vectors[vector_indices[i]])
                     for i in range(noofclusters)]
        centroid_value = tf.placeholder("float64", [dim])
        cent_assigns = []
        for centroid in centroids:
            cent_assigns.append(tf.assign(centroid, centroid_value))
        # One cluster-assignment variable per input vector
        assignments = [tf.Variable(0) for i in range(len(vectors))]
        assignment_value = tf.placeholder("int32")
        cluster_assigns = []
        for assignment in assignments:
            cluster_assigns.append(tf.assign(assignment, assignment_value))
        # Node that computes the mean of its inputs
        mean_input = tf.placeholder("float", [None, dim])
        mean_op = tf.reduce_mean(mean_input, 0)
        # Node that computes the Euclidean distance between two vectors
        v1 = tf.placeholder("float", [dim])
        v2 = tf.placeholder("float", [dim])
        euclid_dist = tf.sqrt(tf.reduce_sum(tf.pow(tf.subtract(v1, v2), 2)))
        # Node that picks the index of the nearest centroid
        centroid_distances = tf.placeholder("float", [noofclusters])
        cluster_assignment = tf.argmin(centroid_distances, 0)
        # Initialize all state variables
        init_op = tf.global_variables_initializer()
        sess.run(init_op)
        # Cluster iterations: an expectation-maximization scheme; for
        # simplicity the number of iterations is fixed at 30
        noofiterations = 30
        for iteration_n in range(noofiterations):
            # Expectation step: iterate over all vectors
            for vector_n in range(len(vectors)):
                vect = vectors[vector_n]
                # Euclidean distance between this vector and each centroid
                distances = [sess.run(euclid_dist,
                                      feed_dict={v1: vect, v2: sess.run(centroid)})
                             for centroid in centroids]
                # Feed the distances to the cluster-assignment node
                assignment = sess.run(cluster_assignment,
                                      feed_dict={centroid_distances: distances})
                # Record the assignment for this vector
                sess.run(cluster_assigns[vector_n],
                         feed_dict={assignment_value: assignment})
            # Maximization step: recompute each centroid so that the
            # within-cluster sum of squares is minimized
            for cluster_n in range(noofclusters):
                # Collect all the vectors assigned to this cluster
                # (note: this assumes no cluster ever becomes empty)
                assigned_vects = [vectors[i] for i in range(len(vectors))
                                  if sess.run(assignments[i]) == cluster_n]
                # Compute the new cluster centroid
                new_location = sess.run(mean_op,
                                        feed_dict={mean_input: array(assigned_vects)})
                # Move the centroid to the new location
                sess.run(cent_assigns[cluster_n],
                         feed_dict={centroid_value: new_location})
        # Return the centroids and the per-vector assignments
        centroids = sess.run(centroids)
        assignments = sess.run(assignments)
        return centroids, assignments

# Generate random test data
sampleNo = 200  # number of generated data points
# The data follows a two-dimensional normal distribution
mu = np.array([[1, 5]])
Sigma = np.array([[1, 0.5], [1.5, 3]])
R = cholesky(Sigma)
srcdata = np.dot(np.random.randn(sampleNo, 2), R) + mu
plt.plot(srcdata[:, 0], srcdata[:, 1], 'bo')
plt.savefig('data.png')

# Run the K-means computation
k = 4
center, result = KMeansCluster(srcdata, k)
print(center)

# Plot the clustering result with seaborn
res = {"x": [], "y": [], "kmeans_res": []}
for i in range(len(result)):
    res["x"].append(srcdata[i][0])
    res["y"].append(srcdata[i][1])
    res["kmeans_res"].append(result[i])
pd_res = pd.DataFrame(res)
sns.lmplot("x", "y", data=pd_res, fit_reg=False, size=5, hue="kmeans_res")
plt.savefig('kmeans.png')
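Note that the script uses the TensorFlow 1.x API (tf.placeholder, tf.Session, tf.assign), which is no longer exposed in the default namespace of TensorFlow 2.x. If you are running TensorFlow 2, a minimal adjustment (a sketch assuming the tf.compat.v1 shim, everything else unchanged) is to import the compatibility layer instead:

import tensorflow.compat.v1 as tf  # TF 1.x API surface under TensorFlow 2
tf.disable_eager_execution()       # placeholders/sessions require graph mode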
After running the program you can view the randomly generated test data for this clustering experiment (saved as data.png) as well as the clustering result (saved as kmeans.png); the coordinates of the K cluster centers are printed to the terminal.
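If you also want a quick numeric summary in the terminal, counting the points assigned to each cluster is one option (a hypothetical addition, not in the original script):

import collections
print(collections.Counter(result))  # number of points per cluster id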
Reference: https://codesachin.wordpress.com/2015/11/14/k-means-clustering-with-tensorflow/