Hadoop++: Improving the Local Performance of Hadoop

Source: Internet
Author: User

Hadoop++ is a non-invasive optimization of Hadoop MapReduce. It improves query and join performance by customizing user-defined functions, such as split, in the Hadoop framework rather than by changing Hadoop itself. The project is led by Professor Jens Dittrich at Saarland University, Germany. The project homepage is http://infosys.uni-saarland.de/hadoop?#.php.

Hadoop++ optimizes Hadoop in three aspects: Trojan index, Trojan join, and Trojan layout.

1. Trojan Index

The core of Trojan index is to organize the data into splits, each composed of data, an index, a header, and a footer, in that order. The footer acts as the split delimiter, and the last footer must sit at the very end of the file. A MapReduce job sorts the records and builds the index when the data is indexed. At query time, the split function starts from the end of the file and uses the footer information to identify each split. The itemize function then uses the index to quickly locate the records that satisfy the query's range condition.

Compared with database technology, Trojan index is similar to an index-organized table.
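The split layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Hadoop++ on-disk format: the byte encodings, the dense index (one entry per record), and the names build_split, split, and itemize as written here are hypothetical and chosen to mirror the description.

```python
import bisect
import struct

# Assumed encodings (hypothetical, not the real Hadoop++ format).
FOOTER = struct.Struct(">I")    # length of everything in the split before the footer
HEADER = struct.Struct(">II")   # length of the data area, length of the index area
ENTRY = struct.Struct(">iI")    # index entry: integer key, record offset in the data area
LENGTH = struct.Struct(">I")    # length prefix of each record value


def build_split(records):
    """Lay out one split as data, index, header, footer, sorted by key."""
    data = bytearray()
    index = bytearray()
    for key, value in sorted(records):
        index += ENTRY.pack(key, len(data))
        data += LENGTH.pack(len(value)) + value
    body = bytes(data) + bytes(index) + HEADER.pack(len(data), len(index))
    return body + FOOTER.pack(len(body))


def split(file_bytes):
    """Walk the footers backwards from the end of the file to find every split."""
    bodies = []
    end = len(file_bytes)
    while end > 0:
        (body_len,) = FOOTER.unpack_from(file_bytes, end - FOOTER.size)
        start = end - FOOTER.size - body_len
        bodies.append(file_bytes[start:end - FOOTER.size])
        end = start
    bodies.reverse()
    return bodies


def itemize(body, low, high):
    """Yield (key, value) for keys in [low, high], using the index instead of a scan."""
    data_len, index_len = HEADER.unpack_from(body, len(body) - HEADER.size)
    entries = [ENTRY.unpack_from(body, data_len + i)
               for i in range(0, index_len, ENTRY.size)]
    keys = [key for key, _ in entries]
    first = bisect.bisect_left(keys, low)
    last = bisect.bisect_right(keys, high)
    for key, offset in entries[first:last]:
        (n,) = LENGTH.unpack_from(body, offset)
        start = offset + LENGTH.size
        yield key, body[start:start + n]
```

A file here is just the concatenation of several build_split results. Calling split on it recovers the split bodies, and itemize(body, low, high) returns only the records in the requested key range.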

2. Trojan Join

Trojan join places related records from multiple tables into the same split, based on the join attribute, and organizes them in a structure similar to that of Trojan index. The records produced by itemize already contain the attributes of both sides of the join, so at query time you no longer need map, shuffle, and reduce phases to compute the join on the join attribute.

Compared with database technology, Trojan join is similar to multi-table clustering.
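The idea of co-partitioning can be illustrated with a small in-memory sketch. It assumes rows are Python dicts and that hash partitioning on the join attribute decides the split; the function names co_partition and itemize_join are hypothetical and do not come from Hadoop++.

```python
from collections import defaultdict


def co_partition(left, right, left_key, right_key, num_splits):
    """Place rows of both tables that share a join value into the same split."""
    splits = [defaultdict(lambda: ([], [])) for _ in range(num_splits)]
    for row in left:
        value = row[left_key]
        splits[hash(value) % num_splits][value][0].append(row)
    for row in right:
        value = row[right_key]
        splits[hash(value) % num_splits][value][1].append(row)
    return splits


def itemize_join(one_split):
    """Emit joined rows from a single split; no data from other splits is needed."""
    for value in sorted(one_split):
        left_rows, right_rows = one_split[value]
        for left_row in left_rows:
            for right_row in right_rows:
                yield {**left_row, **right_row}
```

Because every pair of matching rows lands in the same split at load time, each split can be joined on its own, which is what removes the shuffle step at query time.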

3. Trojan Layout

Similar to PAX, Trojan layout reorganizes the data inside each block so that attributes frequently accessed together by queries are stored together. Each replica of a block can use a different layout. The optimal layout is computed from the query workload, using an approach similar to a knapsack algorithm.

Compared with database technology, Trojan layout is similar to vertical partitioning. The highlight is that different replicas use different vertical partitions.
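A toy version of per-replica vertical partitioning is shown below. Note the substitution: instead of the knapsack-style optimizer mentioned above, this sketch uses a much simpler heuristic that groups attributes accessed by exactly the same set of queries. The function names and the representation of a workload as a list of attribute sets are assumptions made for illustration.

```python
from collections import defaultdict


def column_groups(attributes, queries):
    """Group attributes that are referenced by exactly the same queries.

    This is a simple stand-in heuristic, not the knapsack-like optimization.
    """
    by_signature = defaultdict(list)
    for attribute in attributes:
        signature = frozenset(
            i for i, query in enumerate(queries) if attribute in query
        )
        by_signature[signature].append(attribute)
    return sorted(by_signature.values())


def layouts_per_replica(attributes, workloads):
    """Compute one layout per replica, each tuned to that replica's queries."""
    return [column_groups(attributes, queries) for queries in workloads]


attributes = ["a", "b", "c", "d"]
workloads = [
    [{"a", "b"}],          # queries assigned to replica 0
    [{"c", "d"}, {"a"}],   # queries assigned to replica 1
]
for replica, layout in enumerate(layouts_per_replica(attributes, workloads)):
    print(replica, layout)
```

A query would then be routed to the replica whose column groups best match the attributes it reads, so the same block serves different access patterns from different replicas.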
