Implementation and optimization of K-medoids clustering algorithm based on Hadoop
East China Normal University Shang
Building on the strengths of the K-medoids algorithm and of the Hadoop platform, this paper presents Hk-medoids, a parallel K-medoids clustering algorithm implemented with MapReduce and the open-source Mahout project. Hk-medoids greatly improves on the computational speed of the traditional clustering algorithm. To further improve clustering efficiency, the paper optimizes Hk-medoids in four respects: refining MapReduce scheduling, applying a sampling method, presetting the initial cluster center points, and optimizing the data source. To verify the effectiveness of Hk-medoids and of these optimizations, we ran extensive experiments and compared and analyzed the algorithm's optimization rate and speedup ratio.
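As a rough illustration of how a K-medoids iteration can be split into map and reduce phases, the sketch below runs both phases in a single Python process. It is not the paper's Hk-medoids code and does not use Hadoop or Mahout; the function names (`map_assign`, `reduce_update`, `kmedoids_iteration`, `kmedoids`), the Euclidean distance, and the stopping rule are all assumptions made for the example.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_assign(point, medoids):
    """Map phase (hypothetical): emit (index of nearest medoid, point)."""
    nearest = min(range(len(medoids)), key=lambda i: dist(point, medoids[i]))
    return nearest, point


def reduce_update(members):
    """Reduce phase (hypothetical): choose the member whose total
    distance to all other members of the cluster is smallest."""
    return min(members, key=lambda c: sum(dist(c, p) for p in members))


def kmedoids_iteration(points, medoids):
    """One map/shuffle/reduce round, simulated in-process."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_assign(p, medoids)
        groups[key].append(value)  # shuffle: group points by medoid index
    # An empty cluster keeps its previous medoid (an assumption of this sketch).
    return [
        reduce_update(groups[i]) if groups[i] else medoids[i]
        for i in range(len(medoids))
    ]


def kmedoids(points, medoids, max_iter=20):
    """Repeat rounds until the medoids stop changing or max_iter is reached."""
    for _ in range(max_iter):
        updated = kmedoids_iteration(points, medoids)
        if updated == medoids:
            break
        medoids = updated
    return medoids
```

In a real Hadoop job the map calls would run on separate input splits and each reduce call would receive one cluster's points. The reduce step here compares every pair of points in a cluster, which is the kind of cost that sampling and preset initial centers are meant to reduce.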