The execution process of a Hive query on MapReduce

Source: Internet
Author: User

A Hive query is first converted into a physical query plan. The plan typically contains several MapReduce jobs, and the output of one job can serve as the input of another. The MapReduce jobs that Hive builds for a query follow a fixed pattern: the Mapper class is org.apache.hadoop.hive.ql.exec.ExecMapper and the Reducer class is org.apache.hadoop.hive.ql.exec.ExecReducer.

The map process: different MapReduce jobs may use different InputFormat classes, and which InputFormat a job uses is decided when the physical query plan is generated. The ExecMapper initialization method builds an operator tree whose root is a MapOperator object. In the ExecMapper map method, the value of each key-value pair read from the InputFormat is handed to the root of the operator tree, the MapOperator. The MapOperator converts the received value into a record (row) and passes it down the tree to each child operator for processing. Along the way the record may be filtered or transformed, or it may have a key attached (the record itself is serialized as the value) and be emitted as input to the reduce process.

The reduce process: the key-value pairs emitted by the map process are grouped by key by the MapReduce framework and then passed to the reduce method of ExecReducer. The ExecReducer initialization method also builds an operator tree, and each group of key-value pairs passed to the reduce method is fed to that tree in order. The operators in the tree are notified at the start and at the end of each group.
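The flow described above can be sketched in a few lines of Python. This is a toy model, not Hive's code: the classes `Operator`, `MapRoot`, `Filter`, `ReduceSink` and `CountPerGroup`, the functions `run_map` and `run_reduce`, and the comma-separated input format are all assumptions made for illustration. It only shows the shape of the idea: values enter at the root of a map-side operator tree, rows are filtered and keyed on the way down, and the reduce-side tree is told when each key group starts and ends.

```python
from itertools import groupby


class Operator:
    """Toy node of an operator tree (hypothetical, not Hive's class)."""

    def __init__(self, children=None):
        self.children = children or []

    def forward(self, row):
        # Pass the row down to every child operator.
        for child in self.children:
            child.process(row)

    def process(self, row):
        self.forward(row)

    def start_group(self):
        for child in self.children:
            child.start_group()

    def end_group(self):
        for child in self.children:
            child.end_group()


class MapRoot(Operator):
    """Root of the map-side tree: converts a raw value into a row."""

    def process(self, value):
        self.forward(value.split(","))


class Filter(Operator):
    """Drops rows that do not satisfy the predicate."""

    def __init__(self, predicate, children=None):
        super().__init__(children)
        self.predicate = predicate

    def process(self, row):
        if self.predicate(row):
            self.forward(row)


class ReduceSink(Operator):
    """Attaches a key to the row and emits (key, row) as map output."""

    def __init__(self, key_fn, output):
        super().__init__()
        self.key_fn = key_fn
        self.output = output

    def process(self, row):
        self.output.append((self.key_fn(row), row))


class CountPerGroup(Operator):
    """Reduce-side operator: counts the rows in each key group."""

    def __init__(self, results):
        super().__init__()
        self.results = results
        self.key = None
        self.count = 0

    def start_group(self):
        self.count = 0

    def process(self, pair):
        self.key = pair[0]
        self.count += 1

    def end_group(self):
        self.results.append((self.key, self.count))


def run_map(values):
    """Map side: build the tree once, then push each input value to its root."""
    map_output = []
    sink = ReduceSink(key_fn=lambda row: row[0], output=map_output)
    root = MapRoot([Filter(lambda row: int(row[1]) > 0, [sink])])
    for value in values:
        root.process(value)
    return map_output


def run_reduce(map_output):
    """Reduce side: group by key, notify the tree at each group boundary."""
    results = []
    root = CountPerGroup(results)
    ordered = sorted(map_output, key=lambda kv: kv[0])  # stands in for the shuffle
    for _, group in groupby(ordered, key=lambda kv: kv[0]):
        root.start_group()
        for pair in group:
            root.process(pair)
        root.end_group()
    return results
```

Calling `run_reduce(run_map(["a,1", "b,2", "a,3", "c,0"]))` pushes four values through the map-side tree, drops the row whose second field is not positive, and counts the remaining rows per key on the reduce side. The sort-then-group step here merely stands in for the grouping that the MapReduce framework performs between the two phases.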
