Research on a Parallel Shared Decision Tree Mining Algorithm Based on Hadoop
Chen Taotao, Chao Hansi
Shared knowledge mining aims to speed up the understanding of unknown things by learning the knowledge shared between different things and applying it to the unknown. Serial shared knowledge mining algorithms are inefficient on large datasets, so a parallel shared decision tree mining algorithm based on Hadoop (PSDT) is proposed. PSDT uses the traditional attribute table structure to achieve parallel mining, but it performs too many I/O operations, which degrades its performance. Therefore, a hybrid parallel shared decision tree mining algorithm (HPSDT) is also proposed. HPSDT uses a hybrid data structure: the attribute-element structure is used to compute the splitting index, and the data record structure is used in the splitting phase. Analysis shows that HPSDT simplifies the splitting process and that its I/O cost is about 0.34 times that of SDT. Experimental results show that both PSDT and HPSDT have good parallelism and scalability, that HPSDT outperforms PSDT, and that the advantage of HPSDT becomes more pronounced as the dataset grows.
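To illustrate the hybrid idea of computing the split from a per-attribute structure and then splitting the data records directly, here is a minimal single-machine sketch. It rests on several assumptions that the abstract does not state. The Gini index stands in for the splitting index, as in SPRINT-style attribute-list algorithms. Attributes are numeric, and each attribute entry is assumed to be a (value, label, record id) triple. The shared-knowledge criterion and the Hadoop/MapReduce distribution are not modeled. The function names `gini`, `best_split_from_attribute_list`, and `split_records` are hypothetical.

```python
from collections import Counter


def gini(counts):
    """Gini impurity of a class-count Counter (assumed stand-in splitting index)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split_from_attribute_list(attr_list):
    """Scan one attribute's (value, label, rid) entries and return (score, threshold).

    Hypothetical sketch: the entries are sorted by value once, and class counts are
    updated incrementally, so each candidate threshold costs O(#classes).
    """
    entries = sorted(attr_list)
    n = len(entries)
    total = Counter(label for _, label, _ in entries)
    left = Counter()
    best = (float("inf"), None)
    for i in range(n - 1):
        value, label, _ = entries[i]
        left[label] += 1
        next_value = entries[i + 1][0]
        if value == next_value:
            continue  # only split between distinct attribute values
        right = total - left  # Counter subtraction keeps positive counts only
        n_left = i + 1
        n_right = n - n_left
        score = (n_left * gini(left) + n_right * gini(right)) / n
        if score < best[0]:
            best = (score, (value + next_value) / 2)
    return best


def split_records(records, attr_index, threshold):
    """Split whole data records directly, instead of rewriting every attribute list."""
    left = [r for r in records if r[attr_index] <= threshold]
    right = [r for r in records if r[attr_index] > threshold]
    return left, right


# Usage: records are (attribute value, class label) tuples.
records = [(1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "b")]
attr_list = [(r[0], r[-1], rid) for rid, r in enumerate(records)]
best = best_split_from_attribute_list(attr_list)
left, right = split_records(records, 0, best[1])
print(best, left, right)
```

With this toy data the best split is the threshold 2.5, which has a weighted Gini of 0.0 and separates the two classes cleanly. In this sketch, splitting the record set once avoids partitioning and rewriting a separate sorted list for every attribute. That is the kind of I/O saving the hybrid structure targets, although the paper's actual mechanism and cost model may differ.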