A Multi-Table Join Strategy Based on Hadoop
Xu Chen Wang Li Shi bosom
When Hadoop processes multi-table joins, each round writes a large volume of intermediate results to local disk, which severely reduces the system's processing efficiency. To address this problem, a "substitution-query" method is proposed. The method builds an index on the tables to be joined, writes index information to the intermediate results in place of the full tuples, and lets that index information take part in the subsequent joins. This reduces the I/O cost of the intermediate results. The index information is managed and optimized with a buffer pool, two-level indexing, and multithreading, which speeds up index lookups. Finally, a comparison experiment against unmodified Hadoop was carried out on a TPC data set; the results show that the method reduces storage space by 35.5% and improves execution efficiency by 12.9%.
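The idea of joining on index information rather than on full tuples can be sketched in a few lines of Python. This is only an illustrative, in-memory sketch and not the implementation evaluated above: it does not use Hadoop, and all names (`build_index`, `join_round`, `materialize`, and the sample tables) are hypothetical. Each intermediate result is assumed to hold one row id per joined table, and the full tuples are looked up only when a join key or the final output is needed.

```python
from collections import defaultdict

def build_index(rows):
    """Assign each row a compact integer id (its position) for later lookup."""
    return dict(enumerate(rows))

def first_round(index):
    """Start the intermediate result with one reference per row of the first table."""
    return [(rid,) for rid in index]

def join_round(intermediate, indexes, left_pos, left_col, right_index, right_col):
    """Join the intermediate result with one more table, emitting row ids only.

    Position i of every reference tuple refers to a row of indexes[i].
    """
    # Hash the new table: join key -> row ids that carry this key.
    by_key = defaultdict(list)
    for rid, row in right_index.items():
        by_key[row[right_col]].append(rid)

    out = []
    for refs in intermediate:
        # "Query" step: resolve the reference to read the join key.
        key = indexes[left_pos][refs[left_pos]][left_col]
        for rid in by_key.get(key, []):
            # "Substitution" step: store the row id, not the tuple itself.
            out.append(refs + (rid,))
    return out

def materialize(intermediate, indexes):
    """Expand the references into full tuples once, after the last round."""
    return [
        tuple(value for pos, rid in enumerate(refs) for value in indexes[pos][rid])
        for refs in intermediate
    ]

# Hypothetical sample data.
customers = [(1, "Ann"), (2, "Bob")]        # (customer_id, name)
orders = [(10, 1), (11, 2), (12, 1)]        # (order_id, customer_id)
items = [(10, "pen"), (12, "ink")]          # (order_id, product)

indexes = [build_index(customers), build_index(orders), build_index(items)]

refs = first_round(indexes[0])
refs = join_round(refs, indexes, 0, 0, indexes[1], 1)   # customers join orders
refs = join_round(refs, indexes, 1, 0, indexes[2], 0)   # orders join items
result = materialize(refs, indexes)
```

In this sketch every round passes on short tuples of integers, so the size of the intermediate result does not grow with the width of the joined rows. The price is an extra lookup whenever a join key is read, which is the cost that the buffer pool and multithreading mentioned above are meant to reduce.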