The Programmer's Self-Accomplishment (selfup.cn) already has a k-means clustering example for Spark MLlib, but it is written in Java, so I wrote one in Scala as usual and am sharing it here. This is a result of my learning Spark MLlib; detailed material like this is genuinely hard to find, so I hope sharing it helps.
Test data (data1.txt, one space-separated 3D point per line):

0.0 0.0 0.0
0.1 0.1 0.1
0.2 0.2 0.2
9.0 9.0 9.0
9.1 9.1 9.1
9.2 9.2 9.2
15.1 15.1 15.1
18.0 17.0 19.0
20.0 21.0 22.0
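The program below reads this file from HDFS, so it assumes the data has already been uploaded, for example with hdfs dfs -put data1.txt /user/root/home/data1.txt (the path here just matches the one in the code; adjust it for your own cluster).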
package com.spark.firstApp

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object HelloSpark {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KMeans application")
    val sc = new SparkContext(conf)

    // Load and parse the data: one space-separated vector per line
    val data = sc.textFile("hdfs://192.168.0.10:9000/user/root/home/data1.txt")
    val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

    // Cluster the data into two classes using KMeans
    val numClusters = 2
    val numIterations = 20
    val clusters = KMeans.train(parsedData, numClusters, numIterations)

    // Evaluate clustering by computing the Within Set Sum of Squared Errors (WSSSE)
    val WSSSE = clusters.computeCost(parsedData)
    println("Within Set Sum of Squared Errors = " + WSSSE)

    // Predict which cluster a new point belongs to
    println("Prediction of (1.1, 2.1, 3.1): " + clusters.predict(Vectors.dense(1.1, 2.1, 3.1)))
  }
}
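Beyond the WSSSE and a single prediction, it is useful to inspect what the model actually learned. The following is a minimal sketch, assuming the sc, parsedData, and clusters values from the program above are still in scope; the HDFS model path is a hypothetical one chosen for this example, and KMeansModel.save/load requires Spark 1.4 or later.

import org.apache.spark.mllib.clustering.KMeansModel

// Print the cluster centers the model converged to
clusters.clusterCenters.zipWithIndex.foreach { case (center, i) =>
  println("Cluster " + i + " center: " + center)
}

// Show which cluster each input point was assigned to
// (collect() is fine here because the test data set is tiny)
parsedData.collect().foreach { point =>
  println(point + " -> cluster " + clusters.predict(point))
}

// Persist the model and load it back later (hypothetical path)
clusters.save(sc, "hdfs://192.168.0.10:9000/user/root/home/kmeansModel")
val sameModel = KMeansModel.load(sc, "hdfs://192.168.0.10:9000/user/root/home/kmeansModel")

With the sample data above and numClusters = 2, the points near the origin and the points from (9.0, 9.0, 9.0) onward should roughly split into two clusters, so the printed centers give a quick sanity check of the result.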
Sun Qiqung accompanies you in learning: the Spark MLlib k-means clustering algorithm.