A Brief Analysis of Hive Architecture Principles: The MapReduce Part

Source: Internet
Author: User

Transferred from: http://blog.csdn.net/yangbutao/article/details/8331937

The entire process consists mainly of parsing (building an abstract syntax tree, AST, using ANTLR), semantic analysis (the SemanticAnalyzer generates query blocks), logical plan generation (operator tree), logical plan optimization, physical plan generation (Task tree), and physical plan execution.
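The stages listed above can be pictured as a chain in which each stage consumes the artifact produced by the previous one. The following is a minimal Python sketch of that idea only; the function names and dictionary-based artifacts are assumptions made for illustration and do not correspond to Hive's actual Java classes.

```python
from typing import Callable, List, Tuple

# Each stage is a placeholder: it only records which artifact it produced.
def parse(query: str) -> dict:
    return {"kind": "AST", "source": query}

def analyze(ast: dict) -> dict:
    return {"kind": "QueryBlock", "source": ast}

def generate_logical_plan(query_block: dict) -> dict:
    return {"kind": "OperatorTree", "source": query_block}

def optimize(operator_tree: dict) -> dict:
    return {"kind": "OperatorTree", "optimized": True, "source": operator_tree}

def generate_physical_plan(operator_tree: dict) -> dict:
    return {"kind": "TaskTree", "source": operator_tree}

# Stage order follows the article's description.
STAGES: List[Tuple[str, Callable]] = [
    ("parse", parse),
    ("semantic analysis", analyze),
    ("logical plan generation", generate_logical_plan),
    ("logical plan optimization", optimize),
    ("physical plan generation", generate_physical_plan),
]

def compile_query(query: str):
    """Run every stage in order; return the final artifact and a trace."""
    artifact = query
    trace = []
    for name, stage in STAGES:
        artifact = stage(artifact)
        trace.append((name, artifact["kind"]))
    return artifact, trace
```

Calling compile_query on any query string returns a task tree placeholder plus a trace of the intermediate artifact kinds, which makes the hand-off between stages visible.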

The following figure (original author unknown) gives a brief overview of the process.

The emphasis here is on physical plan generation and execution.

The physical plan is generated from the logical operator tree. The physical plan is executed by Task objects: each Task has a Work object, and the Work holds the description of the physical plan.

The main Work classes are FetchWork, MoveWork, MapredWork, CopyWork, DDLWork, FunctionWork, ExplainWork, and ConditionalWork.
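The split between a Work (a description) and a Task (the thing that executes it) can be sketched as below. This is a toy Python model under stated assumptions: the fields on each Work, the TASK_FOR_WORK registry, and the task_for helper are hypothetical and chosen for illustration; only the class names come from the article.

```python
from dataclasses import dataclass


@dataclass
class MapredWork:
    """Description of a MapReduce job: data only, no execution logic."""
    mapper: str
    reducer: str


@dataclass
class MoveWork:
    """Description of a file move."""
    source: str
    target: str


class Task:
    """Executes the plan fragment described by its Work object."""

    def __init__(self, work):
        self.work = work

    def execute(self) -> int:
        raise NotImplementedError


class MapRedTask(Task):
    def execute(self) -> int:
        # A real task would submit a job; this toy only reports its intent.
        print(f"run job with {self.work.mapper} / {self.work.reducer}")
        return 0


class MoveTask(Task):
    def execute(self) -> int:
        print(f"move {self.work.source} -> {self.work.target}")
        return 0


# Hypothetical registry pairing each Work type with the Task type that runs it.
TASK_FOR_WORK = {MapredWork: MapRedTask, MoveWork: MoveTask}


def task_for(work) -> Task:
    """Wrap a Work description in the Task that knows how to execute it."""
    return TASK_FOR_WORK[type(work)](work)
```

Keeping the description separate from the executor means the plan can be inspected or serialized without running anything.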

Executing the physical plan invokes the execute method of each Task.

The main Task classes are FetchTask, ConditionalTask, CopyTask, DDLTask, ExplainTask, MapRedTask, and MoveTask.
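One way to picture plan execution is a driver that walks the task tree and calls execute on each task once all of its parents have finished. The sketch below assumes a simple breadth-first walk and a nonzero return code meaning failure; the Task class, add_dependent, and run_tasks are hypothetical names for illustration and not Hive's actual driver logic.

```python
from collections import deque


class Task:
    """Toy task: execute() just returns a preset exit code."""

    def __init__(self, name, exit_code=0):
        self.name = name
        self.exit_code = exit_code
        self.parents = []
        self.children = []

    def add_dependent(self, child):
        # child must not start until this task has finished
        self.children.append(child)
        child.parents.append(self)

    def execute(self) -> int:
        return self.exit_code


def run_tasks(roots):
    """Execute tasks so that every task runs after all of its parents.

    Returns (exit_code, names_of_executed_tasks); stops at the first failure.
    """
    executed = []
    done = set()
    queue = deque(roots)
    while queue:
        task = queue.popleft()
        if task in done:
            continue
        if any(parent not in done for parent in task.parents):
            continue  # re-queued later, when its last parent finishes
        code = task.execute()
        executed.append(task.name)
        if code != 0:
            return code, executed
        done.add(task)
        queue.extend(task.children)
    return 0, executed
```

For example, a MapRedTask followed by a dependent MoveTask runs in that order, and a failing first task prevents the second from running at all.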

MapRedTask implements the MapReduce client: based on its Work description (MapredWork), it generates a plan XML file, which is passed as one of the parameters of the hadoop jar [params] command to MapReduce for execution (ExecMapper, ExecReducer).
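The hand-off described above, where a plan is written to an XML file and its path is passed on a command line, can be sketched as follows. The XML layout, the jar name, the driver class name, and the -plan flag below are placeholders assumed for illustration; they should not be read as Hive's actual plan format or command line. The command is only built, never run.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_plan_xml(work: dict, directory: str) -> str:
    """Serialize a work description to plan.xml and return its path."""
    root = ET.Element("mapredwork")
    for key, value in sorted(work.items()):
        prop = ET.SubElement(root, "property", name=key)
        prop.text = str(value)
    path = os.path.join(directory, "plan.xml")
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
    return path


def read_plan_xml(path: str) -> dict:
    """What the launched job would do first: load the plan back."""
    root = ET.parse(path).getroot()
    return {prop.get("name"): prop.text for prop in root.findall("property")}


def build_command(plan_path: str,
                  jar: str = "example-exec.jar",      # placeholder jar name
                  driver: str = "ExampleDriver"):     # placeholder class name
    """Build (but do not run) a 'hadoop jar' style argument list."""
    return ["hadoop", "jar", jar, driver, "-plan", plan_path]


with tempfile.TemporaryDirectory() as workdir:
    plan_file = write_plan_xml(
        {"mapper": "ExecMapper", "reducer": "ExecReducer"}, workdir)
    command = build_command(plan_file)
    restored = read_plan_xml(plan_file)
```

The point of the round trip is that the client and the job share nothing but the file: whatever the mapper and reducer need to know must be recoverable from the serialized plan.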

The following diagram illustrates the process of data processing in the MapReduce process:

FileFormat: when you define a table, you specify the storage format of the data with STORED AS, such as TEXTFILE, SEQUENCEFILE, or RCFILE. You can also supply a custom storage format (STORED AS INPUTFORMAT ... OUTPUTFORMAT ...).

The file format determines how a record (Writable) is stored in a file: it provides file reading on the map side and file writing on the reduce side.
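The role of a file format, reading records on the map side and writing them on the reduce side, can be illustrated with a newline-delimited text format. This is a simplified Python sketch; TextRecordReader and TextRecordWriter are hypothetical names, and records are modeled as raw bytes rather than Hadoop Writable objects.

```python
import io


class TextRecordReader:
    """Map side: turn a byte stream into a sequence of records."""

    def __init__(self, stream):
        self.stream = stream

    def __iter__(self):
        for line in self.stream:
            yield line.rstrip(b"\n")  # one line is one record


class TextRecordWriter:
    """Reduce side: append records to a byte stream, one per line."""

    def __init__(self, stream):
        self.stream = stream

    def write(self, record: bytes) -> None:
        self.stream.write(record + b"\n")


# Write two records, then read them back from the same bytes.
buffer = io.BytesIO()
writer = TextRecordWriter(buffer)
writer.write(b"1,alice")
writer.write(b"2,bob")

records = list(TextRecordReader(io.BytesIO(buffer.getvalue())))
```

Note that the reader and writer know nothing about columns: they only decide where one record ends and the next begins.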

SerDe: converts data between formats, that is, between the Writable record and the object used by the operators.
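A SerDe sits one layer above the file format: it turns a raw record into a row object the operators can work with, and back again. The sketch below is a toy delimited-text SerDe in Python; the DelimitedSerDe class and its column list are assumptions for illustration. The Ctrl-A (\x01) separator mirrors the default field delimiter of Hive text tables.

```python
class DelimitedSerDe:
    """Convert between delimited byte records and typed row dictionaries."""

    def __init__(self, columns, delimiter="\x01"):
        # columns: list of (name, type) pairs, e.g. [("id", int), ("name", str)]
        self.columns = columns
        self.delimiter = delimiter

    def deserialize(self, record: bytes) -> dict:
        """Raw record -> row object with typed fields."""
        fields = record.decode("utf-8").split(self.delimiter)
        return {name: cast(value)
                for (name, cast), value in zip(self.columns, fields)}

    def serialize(self, row: dict) -> bytes:
        """Row object -> raw record, fields in schema order."""
        text = self.delimiter.join(str(row[name]) for name, _ in self.columns)
        return text.encode("utf-8")


serde = DelimitedSerDe([("id", int), ("name", str)])
row = serde.deserialize(b"7\x01alice")
record = serde.serialize(row)
```

Here the schema lives in the SerDe rather than in the file, so the same bytes could be interpreted differently by changing only the column list.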
