When Hadoop processes multi-table joins, each round writes a large volume of intermediate results to local disk, which severely degrades the processing efficiency of the system. To address this problem, a "substitution-query" method is proposed. The joined tables are indexed, the tuples output into the intermediate results are replaced with index information, and the tables then take part in the multi-table join in the form of indexes, which reduces the I/O cost of the intermediate results. The index information is managed and optimized with a buffer pool, secondary sorting, and multithreading, which speeds up index queries. Finally, comparison experiments against the original Hadoop were run on a TPC dataset. The results show that the method reduces storage space by 35.5% and improves execution efficiency by 12.9%.
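The abstract does not describe the paper's data structures, so the following is only a minimal, single-process Python sketch of the general idea under assumed details. Each table is an in-memory dict keyed by a hypothetical row id. The "index information" carried through the join is just that row id. The buffer pool is a plain LRU cache. The names `BufferPool`, `build_index`, `join_step`, and `run_example`, as well as the toy tables, are illustrative assumptions. Secondary sorting, multithreading, and the actual Hadoop disk spills are not modeled.

```python
from collections import OrderedDict, defaultdict
from typing import Callable, Dict, List, Tuple

Row = Tuple
Table = Dict[int, Row]      # assumed layout: hypothetical row id -> full tuple
Partial = Tuple[int, ...]   # intermediate result: one row id per joined table


class BufferPool:
    """LRU cache in front of row lookups; a stand-in for the paper's buffer pool."""

    def __init__(self, table: Table, capacity: int = 2) -> None:
        self.table = table
        self.capacity = capacity
        self.cache: "OrderedDict[int, Row]" = OrderedDict()
        self.misses = 0

    def get(self, row_id: int) -> Row:
        if row_id in self.cache:
            self.cache.move_to_end(row_id)  # mark as most recently used
            return self.cache[row_id]
        self.misses += 1
        row = self.table[row_id]            # simulated read from storage
        self.cache[row_id] = row
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used row
        return row


def build_index(table: Table, key_pos: int) -> Dict[object, List[int]]:
    """Map each join-key value to the ids of the rows that carry it."""
    index: Dict[object, List[int]] = defaultdict(list)
    for row_id, row in table.items():
        index[row[key_pos]].append(row_id)
    return index


def join_step(partials: List[Partial],
              key_of: Callable[[Partial], object],
              right_index: Dict[object, List[int]]) -> List[Partial]:
    """Extend each intermediate result with matching right-table row ids only."""
    out: List[Partial] = []
    for ids in partials:
        for rid in right_index.get(key_of(ids), []):
            out.append(ids + (rid,))
    return out


def run_example() -> List[Row]:
    orders: Table = {0: (100, 10), 1: (101, 11), 2: (102, 10)}   # (order_key, cust_key)
    customers: Table = {0: (10, "alice", 1), 1: (11, "bob", 2)}  # (cust_key, name, nation_key)
    nations: Table = {0: (1, "FR"), 1: (2, "DE")}                 # (nation_key, name)
    o_pool, c_pool, n_pool = BufferPool(orders), BufferPool(customers), BufferPool(nations)

    partials: List[Partial] = [(rid,) for rid in orders]
    # orders.cust_key = customers.cust_key; intermediates hold row ids, not tuples
    partials = join_step(partials, lambda ids: o_pool.get(ids[0])[1],
                         build_index(customers, 0))
    # customers.nation_key = nations.nation_key
    partials = join_step(partials, lambda ids: c_pool.get(ids[1])[2],
                         build_index(nations, 0))
    # "query" phase: resolve the row ids back to attribute values via the pools
    return [(o_pool.get(o)[0], c_pool.get(c)[1], n_pool.get(n)[1])
            for o, c, n in partials]
```

Calling `run_example()` returns `[(100, 'alice', 'FR'), (101, 'bob', 'DE'), (102, 'alice', 'FR')]`. Between the two join rounds, only tuples of integers such as `(0, 0)` are kept rather than full rows. That reduction is what a substitution scheme of this kind would aim to shrink when intermediate results are spilled to disk. The cost is extra lookups at query time, which is where a buffer pool can help.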