Research on the Random Forest Algorithm Based on the Hadoop Platform and Implementation of an Image Classification System
Liang of Xiamen University
Building on the open-source distributed computing platform Hadoop, this paper studies the parallelization of image processing and a distributed, parallel design of the random forest algorithm using the MapReduce programming model, with the goal of improving the efficiency of large-scale image classification. First, the paper introduces the Hadoop platform, including the HDFS file system and the MapReduce framework. It then analyzes the key techniques of image classification, summarizes features commonly used in image classification, describes in detail how SIFT features and the bag-of-visual-words (BoVW) feature representation are computed, and reviews several important pattern classification methods. Next, the paper focuses on the random forest algorithm and, by combining it with the MapReduce computing framework, improves the model construction process to raise the algorithm's running efficiency. Based on these results, the paper builds a prototype image classification system on the Hadoop platform consisting of five parts: an image processing interface for Hadoop, feature sampling based on the dense SIFT algorithm, BoVW representation of images, the spatial pyramid model, and classifier training with the improved random forest algorithm. Finally, experiments validate the effectiveness of parallel image processing on Hadoop and show that the parallel random forest algorithm takes less time than the single-machine version while still achieving good classification results. Overall, parallel image processing on Hadoop greatly speeds up the processing of massive image collections, and for image classification the improved random forest algorithm is significantly more efficient.
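The abstract does not give the details of the parallel design, so the following is only a minimal sketch of one common way to map random forest training onto a map/reduce pattern: each map task draws a bootstrap sample and trains one tree, and the reduce step collects the trees into a forest that predicts by majority vote. It is not the author's implementation. All names (`train_stump`, `map_train`, `reduce_forest`, `train_forest`, `predict`) are hypothetical, one-split decision stumps stand in for full decision trees, and Python's built-in `map` stands in for Hadoop map tasks running on separate nodes.

```python
import random
from collections import Counter


def train_stump(sample, n_features, rng):
    """Fit a one-split tree (a stump) using a random subset of features.

    Assumption: a stump stands in for a full decision tree to keep the sketch short.
    Returns (feature_index, threshold, left_label, right_label).
    """
    k = max(1, int(n_features ** 0.5))          # random feature subset size
    features = rng.sample(range(n_features), k)
    best = None
    for f in features:
        for x, _ in sample:
            t = x[f]
            left = [y for xi, y in sample if xi[f] <= t]
            right = [y for xi, y in sample if xi[f] > t]
            if not left or not right:
                continue
            l_label, l_hits = Counter(left).most_common(1)[0]
            r_label, r_hits = Counter(right).most_common(1)[0]
            score = l_hits + r_hits             # correctly classified samples
            if best is None or score > best[0]:
                best = (score, f, t, l_label, r_label)
    if best is None:                            # no valid split: predict the majority label
        label = Counter(y for _, y in sample).most_common(1)[0][0]
        return (0, float("inf"), label, label)
    return best[1:]


def map_train(task):
    """Map step: draw a bootstrap sample and train one tree on it."""
    seed, data, n_features = task
    rng = random.Random(seed)
    sample = [rng.choice(data) for _ in data]   # sampling with replacement
    return ("forest", train_stump(sample, n_features, rng))


def reduce_forest(pairs):
    """Reduce step: gather all trees emitted under the same key into one forest."""
    return [tree for _, tree in pairs]


def train_forest(data, n_features, n_trees, base_seed=0):
    tasks = [(base_seed + i, data, n_features) for i in range(n_trees)]
    # On a cluster each task would be an independent map task; here they run in sequence.
    return reduce_forest(map(map_train, tasks))


def predict(forest, x):
    """Classify x by majority vote over all trees."""
    votes = Counter(l if x[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated classes in two dimensions.
data = [((i * 0.1, i * 0.1), 0) for i in range(10)]
data += [((5 + i * 0.1, 5 + i * 0.1), 1) for i in range(10)]
forest = train_forest(data, n_features=2, n_trees=15)
print(predict(forest, (0.0, 0.0)), predict(forest, (9.0, 9.0)))
```

Because the trees are trained independently, the map tasks need no communication with each other, which is what makes this stage straightforward to distribute. In the image classification setting described above, each input vector would be an image's BoVW histogram instead of a toy 2-D point.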