Spark version 1.3.1
Scala version 2.11.6
Reference: the official documentation at http://spark.apache.org/docs/latest/mllib-clustering.html
After starting spark-shell, first import the required classes:
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
// Load and parse the data
val data = sc.textFile("/home/hadoop/hadoopdata/data3.txt") // one sample per line; the sample's features are separated by spaces
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()
// Train the model; replace numClusters, numIterations and parallRunNums with the values you choose
val clusters = KMeans.train(parsedData, numClusters, numIterations, parallRunNums)
// Cluster centers
val clusterCenters = clusters.clusterCenters
// Cluster label of each sample
val labels = clusters.predict(parsedData)
// Save the results; they are written to the result3 folder
labels.saveAsTextFile("/home/hadoop/hadoopdata/result3")
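To make what KMeans.train and predict compute a little more concrete, here is a minimal plain-Scala sketch of Lloyd's k-means iteration that runs without Spark. It is a simplified illustration under stated assumptions, not MLlib's implementation: the object name KMeansSketch, its helper functions and the toy data are hypothetical. The initial centers are picked by hand (MLlib uses k-means|| initialization by default), the loop runs a fixed number of iterations with no convergence check, and it performs a single run rather than the several parallel runs that MLlib's runs argument requests and compares.

```scala
// Hypothetical plain-Scala sketch of Lloyd's k-means; not MLlib's code.
object KMeansSketch {
  type Point = Array[Double]

  // Parse one line of space-separated features (mirrors the parsing step above).
  def parse(line: String): Point =
    line.trim.split(' ').filter(_.nonEmpty).map(_.toDouble)

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the nearest center: the label a prediction step returns for p.
  def closest(p: Point, centers: Seq[Point]): Int =
    centers.indices.minBy(i => sqDist(p, centers(i)))

  // Component-wise mean of a non-empty group of points.
  def mean(ps: Seq[Point]): Point = {
    val dim = ps.head.length
    Array.tabulate(dim)(d => ps.map(_(d)).sum / ps.size)
  }

  // One Lloyd step: assign each point to its nearest center, then move every
  // center to the mean of its points (an empty cluster keeps its old center).
  def step(points: Seq[Point], centers: Seq[Point]): Seq[Point] = {
    val groups = points.groupBy(p => closest(p, centers))
    centers.indices.map(i => groups.get(i).map(mean).getOrElse(centers(i)))
  }

  // Run a fixed number of iterations from the given initial centers.
  def train(points: Seq[Point], initial: Seq[Point], maxIterations: Int): Seq[Point] =
    (1 to maxIterations).foldLeft(initial)((cs, _) => step(points, cs))

  // Sum of squared distances from each point to its nearest center.
  def cost(points: Seq[Point], centers: Seq[Point]): Double =
    points.map(p => sqDist(p, centers(closest(p, centers)))).sum

  def main(args: Array[String]): Unit = {
    // Hypothetical toy data: two well-separated groups in 2-D.
    val lines = Seq("0.0 0.0", "0.1 0.1", "0.2 0.2", "9.0 9.0", "9.1 9.1", "9.2 9.2")
    val points = lines.map(parse)
    // Initial centers chosen by hand for this sketch.
    val centers = train(points, Seq(points(0), points(3)), maxIterations = 20)
    centers.foreach(c => println(c.mkString("[", ", ", "]")))
    println(points.map(p => closest(p, centers)).mkString(" ")) // prints 0 0 0 1 1 1
    println(cost(points, centers))
  }
}
```

The cost function computes the within-set sum of squared errors, the same quantity MLlib reports through KMeansModel.computeCost; comparing it across several values of numClusters is one common way to pick the number of clusters.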
Implementing a clustering algorithm with spark-shell