The KMeans algorithm is relatively simple, so I will skip a detailed introduction and go straight to the code.
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.clustering._

/**
  * Created by Administrator on 2017/7/11.
  */
object Kmenas {
  def main(args: Array[String]): Unit = {
    // Set up the runtime environment
    val conf = new SparkConf()
      .setAppName("Kmeans Test")
      .setMaster("spark://master:7077")
      .setJars(Seq("E:\\intellij\\projects\\machinelearning\\machinelearning.jar"))
    val sc = new SparkContext(conf)
    Logger.getRootLogger.setLevel(Level.WARN)

    // Read the sample data and parse each line into a dense vector
    val data = sc.textFile("hdfs://master:9000/ml/data/kmeans_data.txt")
    val parsedData = data.map(s => Vectors.dense(s.split(" ").map(_.toDouble))).cache()

    // Create the KMeans clustering model and train it
    val initMode = "k-means||"
    val numClusters = 2
    val numIterations = 500
    val model = new KMeans()
      .setInitializationMode(initMode)
      .setK(numClusters)
      .setMaxIterations(numIterations)
      .run(parsedData)

    // Print the first two coordinates of each cluster center
    val centers = model.clusterCenters
    println("Centers:")
    for (i <- 0 to centers.length - 1) {
      println(centers(i)(0) + "\t" + centers(i)(1))
    }

    // Compute the error (sum of squared distances to the nearest center)
    val error = model.computeCost(parsedData)
    println("Errors = " + error)
  }
}
Run result:
[Image: Spark Machine Learning (7): KMeans algorithm]
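The error printed by the program is the value returned by computeCost, which is the sum of squared distances from each point to its nearest cluster center. To make that number easier to interpret, here is a minimal plain-Scala sketch of the same calculation without Spark. The object name CostSketch, its helper functions, and the sample points and centers are all hypothetical and chosen only for illustration; they are not the contents of kmeans_data.txt, and this is not MLlib's implementation.

```scala
object CostSketch {
  type Point = Array[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the center closest to p.
  def nearest(centers: Seq[Point], p: Point): Int =
    centers.indices.minBy(i => squaredDistance(centers(i), p))

  // Sum of squared distances from each point to its nearest center.
  def cost(centers: Seq[Point], points: Seq[Point]): Double =
    points.map(p => squaredDistance(centers(nearest(centers, p)), p)).sum

  def main(args: Array[String]): Unit = {
    // Hypothetical data, assumed for this sketch only.
    val points = Seq(
      Array(0.0, 0.0), Array(0.0, 2.0),
      Array(10.0, 10.0), Array(10.0, 12.0))
    val centers = Seq(Array(0.0, 1.0), Array(10.0, 11.0))

    // Show which center each point is assigned to, then the total cost.
    points.foreach { p =>
      println(p.mkString(" ") + " -> cluster " + nearest(centers, p))
    }
    println("Errors = " + cost(centers, points))
  }
}
```

In this sketch every point lies at distance 1 from its nearest center, so the total is 4.0. A smaller value for the same number of clusters means the points sit closer to their centers.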