Research on a Parallel Shared Decision Tree Mining Algorithm Based on Hadoop

Keywords: Hadoop

Chen Taotao Chao Hansi

Shared knowledge mining speeds up the understanding of unknown things by learning the knowledge shared between different things and applying it to the unknown. To address the inefficiency of serial shared knowledge mining algorithms on large data sets, a parallel shared decision tree mining algorithm based on Hadoop (PSDT) is proposed. PSDT uses the traditional attribute table structure to achieve parallel mining, but it incurs too many I/O operations, which hurts its performance. A hybrid parallel shared decision tree mining algorithm (HPSDT) is therefore proposed. HPSDT uses a hybrid data structure: the attribute table structure is used when computing the splitting index, and the data record structure is used in the split phase. Data analysis shows that HPSDT simplifies the splitting process and that its I/O volume is about 0.34 times that of SDT. Experimental results show that both PSDT and HPSDT have good parallelism and scalability, that HPSDT outperforms PSDT, and that the advantage of HPSDT becomes more pronounced as the data set grows.
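
As a rough illustration of the hybrid layout described in the abstract, the Python sketch below scores candidate splits from per-attribute sorted lists (in the style of an attribute table) and then partitions the original data records directly in the split phase. It is not the authors' implementation. The Gini index stands in for the paper's splitting index, which the abstract does not specify. The code runs in a single process rather than on Hadoop, and every name in it (`build_attribute_lists`, `best_split`, `split_records`) is hypothetical.

```python
from collections import Counter

def build_attribute_lists(records, n_attrs):
    """Attribute-table view: one sorted list of (value, label, record_id) per attribute."""
    return [
        sorted((rec[a], rec[-1], rid) for rid, rec in enumerate(records))
        for a in range(n_attrs)
    ]

def gini(counts, total):
    """Gini impurity of a label histogram (assumed stand-in for the splitting index)."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def best_split(attr_lists):
    """Scan each sorted attribute list once; return (score, attr, threshold)."""
    best = None
    for a, entries in enumerate(attr_lists):
        total = len(entries)
        left = Counter()
        right = Counter(label for _, label, _ in entries)
        for i in range(total - 1):
            value, label, _ = entries[i]
            left[label] += 1
            right[label] -= 1
            next_value = entries[i + 1][0]
            if value == next_value:
                continue  # a split point only exists between distinct values
            n_left = i + 1
            n_right = total - n_left
            score = (n_left * gini(left, n_left)
                     + n_right * gini(right, n_right)) / total
            if best is None or score < best[0]:
                best = (score, a, (value + next_value) / 2)
    return best

def split_records(records, attr, threshold):
    """Split phase: partition whole records, so no attribute list has to be rewritten."""
    left = [r for r in records if r[attr] <= threshold]
    right = [r for r in records if r[attr] > threshold]
    return left, right

# Toy data: two numeric attributes followed by a class label.
records = [
    (1.0, 5.0, "a"),
    (2.0, 4.0, "a"),
    (3.0, 7.0, "b"),
    (4.0, 6.0, "b"),
]
attr_lists = build_attribute_lists(records, n_attrs=2)
score, attr, threshold = best_split(attr_lists)
left, right = split_records(records, attr, threshold)
```

Under these assumptions, the point of the design is that only the scoring step needs the sorted per-attribute view. The split itself touches each record once, which is one plausible reading of why the hybrid structure reduces I/O.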

