To run k-means clustering in R, call the kmeans function. The following small experiment clusters the iris dataset:
newiris <- iris
newiris$Species <- NULL  # remove the class labels from the training data
kc <- kmeans(newiris, 3)  # fit the clustering model with 3 clusters
fitted(kc)  # inspect the cluster center assigned to each observation
table(iris$Species, kc$cluster)  # cross-tabulate the true species against the clusters
# visualize the clustering result
# colors represent the clusters; point shapes represent the original species in the training data
plot(newiris[c("Sepal.Length", "Sepal.Width")], col = kc$cluster, pch = as.integer(iris$Species))
points(kc$centers[, c("Sepal.Length", "Sepal.Width")], col = 1:3, pch = 8, cex = 2)  # mark the cluster centers
The clustering result is visualized below:
The R help documentation for kmeans contains a very good example, reproduced below. Pay special attention to the equalities that the kmeans result satisfies:
require(graphics)
# a 2-dimensional example
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
(cl <- kmeans(x, 2))
plot(x, col = cl$cluster)
points(cl$centers, col = 1:2, pch = 8, cex = 2)
# sum of squares
# The scale function centers the data: centering means subtracting each column's mean from its values.
# It can also standardize the data: standardizing means dividing the centered data by the standard deviation, i.e. (value - mean) / standard deviation.
# See http://it.zhans.org/10/1834.htm.
ss <- function(x) sum(scale(x, scale = FALSE)^2)
## cluster centers "fitted" to each obs.:
fitted.x <- fitted(cl)
head(fitted.x)
resid.x <- x - fitted(cl)
## Equalities: ----------------------------------
cbind(cl[c("betweenss", "tot.withinss", "totss")], # the same two columns
      c(ss(fitted.x), ss(resid.x), ss(x)))
# k-means clustering satisfies the following conditions
stopifnot(all.equal(cl$totss, ss(x)),
          all.equal(cl$tot.withinss, ss(resid.x)),
          ## these three are the same:
          all.equal(cl$betweenss, ss(fitted.x)),
          all.equal(cl$betweenss, cl$totss - cl$tot.withinss),
          ## and hence also
          all.equal(ss(x), ss(fitted.x) + ss(resid.x)))
The clustering result is visualized below:
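To make the note on scale more concrete, here is a minimal sketch of centering and standardizing on a tiny matrix, compared against the same computation done by hand. The matrix m and all variable names are made up for illustration and are not part of the help example. Note that scale works column by column.

```r
# Small made-up matrix, used only for illustration
m <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)

centered     <- scale(m, scale = FALSE)  # subtract each column's mean
standardized <- scale(m)                 # subtract the mean, then divide by the column sd

# The same results computed by hand
centered_manual     <- sweep(m, 2, colMeans(m))
standardized_manual <- sweep(centered_manual, 2, apply(m, 2, sd), "/")

# Sum of squared deviations from the column means,
# which is what ss() in the example above computes
ss_m <- sum(centered^2)
```

Comparing centered with centered_manual (and likewise the standardized pair) using all.equal(..., check.attributes = FALSE) should return TRUE; the attributes differ only because scale records the centers and scales it used.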