Implementation and optimization of K-medoids clustering algorithm based on Hadoop
East China Normal University Shang
Building on the strengths of the K-medoids algorithm and the Hadoop platform, this paper presents Hk-medoids, a parallel K-medoids clustering algorithm implemented with MapReduce and the open source Mahout project, which greatly improves on the computational speed of the traditional clustering algorithm. To further improve clustering efficiency, the paper optimizes Hk-medoids in four respects: refining MapReduce scheduling, the sampling method, presetting the initial cluster center points, and optimizing the data source. To verify the effectiveness of Hk-medoids and its optimizations, we ran extensive experiments and compared and analyzed the algorithm's optimization rate and speedup ratio.
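The abstract does not show how K-medoids is split into map and reduce phases, so the following is only a minimal sketch of one common decomposition, not the paper's Hk-medoids code. It runs in plain Python in a single process rather than on Hadoop or Mahout. The function names (`map_assign`, `reduce_update`, `k_medoids`) and the choice of Manhattan distance are assumptions made for illustration. The map step assigns each point to its nearest medoid, and the reduce step picks, within each cluster, the member with the smallest total distance to the other members.

```python
from collections import defaultdict


def distance(a, b):
    """Manhattan distance between two equal-length points (an assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def map_assign(point, medoids):
    """Map step (hypothetical name): emit (index of nearest medoid, point)."""
    nearest = min(range(len(medoids)), key=lambda i: distance(point, medoids[i]))
    return nearest, point


def reduce_update(members):
    """Reduce step (hypothetical name): choose the member that minimizes
    the total distance to all members of the same cluster."""
    return min(members, key=lambda c: sum(distance(c, p) for p in members))


def k_medoids(points, initial_medoids, max_iter=20):
    """Repeat map and reduce until the medoids stop changing."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        # Shuffle phase, simulated in memory: group points by medoid index.
        groups = defaultdict(list)
        for p in points:
            key, value = map_assign(p, medoids)
            groups[key].append(value)
        # An empty cluster keeps its previous medoid.
        new_medoids = [
            reduce_update(groups[i]) if i in groups else medoids[i]
            for i in range(len(medoids))
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


if __name__ == "__main__":
    data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
    print(k_medoids(data, [(1, 2), (9, 8)]))
```

In this sketch the initial medoids are passed in explicitly, which loosely corresponds to the abstract's idea of presetting the initial cluster center points. The reduce step is quadratic in cluster size, which is one reason running it on a sample of the data can pay off.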