Research on random forest algorithm based on Hadoop platform and implementation of image classification system

Keywords: Hadoop, random forest, image classification

Liang of Xiamen University

Building on the open-source distributed computing platform Hadoop, this paper studies how to parallelize the image processing pipeline and how to design a distributed, parallel random forest algorithm with the MapReduce computing model, with the goal of improving the efficiency of large-scale image classification. The paper first introduces the Hadoop platform, including the HDFS file system and the MapReduce framework. It then analyzes the key techniques of image classification, summarizes features commonly used in image classification, describes in detail how SIFT features and the bag-of-visual-words (BoVW) feature representation are computed, and reviews several important pattern classification methods. Next, the paper focuses on the random forest algorithm and, using the MapReduce framework, modifies the construction of the random forest model to improve the algorithm's running efficiency. Based on these results, the paper builds a prototype image classification system on the Hadoop platform in five parts: an image processing interface for Hadoop, feature sampling with the dense SIFT algorithm, image representation with the BoVW model, the spatial pyramid model, and classifier training with the improved random forest algorithm. Finally, the paper validates the parallel image processing on Hadoop and shows that the parallel random forest algorithm takes less time than the single-machine version of random forest while still achieving good classification results. Parallel image processing on Hadoop greatly increases the speed of processing massive image collections, and for image classification the improved random forest algorithm is significantly more efficient.
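The abstract does not give the details of the improved algorithm, so the following is only a minimal sketch of one common way to express random forest training in MapReduce terms, not the paper's method. It assumes that each mapper trains a few trees on bootstrap samples of its own data partition and that a single reducer gathers the trees into one forest. It runs locally in plain Python rather than on Hadoop, and all function names (`map_train`, `reduce_forest`, `run_job`, `predict_forest`) are hypothetical. Class labels are assumed to be non-tuple values such as integers or strings.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, rng, depth=0, max_depth=5):
    """Grow a tree; each split considers a random subset of sqrt(d) features."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority  # leaf node: a bare label
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))
    best = None  # (weighted impurity, feature index, threshold)
    for f in rng.sample(range(n_features), k):
        for t in sorted(set(r[f] for r in rows)):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority  # no usable split among the sampled features
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    left = build_tree([rows[i] for i in li], [labels[i] for i in li], rng, depth + 1, max_depth)
    right = build_tree([rows[i] for i in ri], [labels[i] for i in ri], rng, depth + 1, max_depth)
    return (f, t, left, right)  # internal node


def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def map_train(partition_id, records, n_trees):
    """Map step: train n_trees trees on bootstrap samples of one partition."""
    rng = random.Random(partition_id)  # per-partition seed, for repeatability
    for _ in range(n_trees):
        sample = [rng.choice(records) for _ in records]  # sample with replacement
        rows = [features for features, _ in sample]
        labels = [label for _, label in sample]
        yield "forest", build_tree(rows, labels, rng)


def reduce_forest(key, trees):
    """Reduce step: collect every tree emitted under the same key."""
    return list(trees)


def run_job(partitions, n_trees_per_mapper):
    """Local stand-in for the MapReduce driver: map, group by key, reduce."""
    grouped = {}
    for pid, records in enumerate(partitions):
        for key, tree in map_train(pid, records, n_trees_per_mapper):
            grouped.setdefault(key, []).append(tree)
    return reduce_forest("forest", grouped["forest"])


def predict_forest(forest, row):
    """Majority vote over all trees in the forest."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]
```

In this sketch, a call such as `run_job(partitions, 5)` takes a list of partitions, each a list of `(feature_tuple, label)` pairs, and returns the combined forest for use with `predict_forest`. Because every tree sees only one partition, trees trained this way can differ from those of a single-machine forest trained on the full data set, which is one of the trade-offs a distributed design has to weigh.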

