Pig System Analysis (4): Logical Plan Optimizer


Optimization process

One tenet of the Pig philosophy is "Pigs are domestic animals": the user keeps sufficient control. For the optimization of the logical execution plan specifically, users can choose which optimization rules to apply according to their own situation (which can also be read as a sign that this area still has a lot of untapped potential).

Before it is compiled into a physical execution plan, the logical execution plan is processed by LogicalPlanOptimizer, which matches it against a series of optimization rules. Each matching rule transforms the original plan, and the result is a new, optimized execution plan. The whole process is as shown in the figure:

Pig's logical optimizer works by simplifying, merging, inserting, and reordering the LogicalRelationalOperator nodes in the logical execution plan. Each optimization rule is described below.
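
One way to observe the effect of these rules is Pig's EXPLAIN command, which prints the plans for an alias. The following is a minimal sketch; the input paths and field names are made up for illustration, and it assumes that the logical plan EXPLAIN prints is the one produced after the optimizer has run.

```pig
-- Hypothetical inputs; any small script will do.
A = LOAD 'input1' AS (a0, a1);
B = LOAD 'input2' AS (b0, b1);
C = JOIN A BY a0, B BY b0;
D = FILTER C BY a1 > 0 AND b1 > 0;

-- Print the logical, physical, and execution plans for D.
EXPLAIN D;
```

Comparing the order of operators in the printed logical plan with the order in the script gives a rough picture of which rules fired.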

Rule-based Optimizer

PartitionFilterOptimizer

Pushes partition filter conditions down to the loader. This requires loader support; for example, HCatLoader supports pushing down filters on partition fields. See the LoadMetadata interface described earlier.
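
A minimal sketch of a script this rule applies to. The table name web_logs and the partition column datestamp are hypothetical, and the package name of HCatLoader depends on the HCatalog version in use.

```pig
-- 'web_logs' is assumed to be a table partitioned by 'datestamp'.
A = LOAD 'web_logs' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- A filter on the partition column can be handed to the loader,
-- so only the matching partitions are read.
B = FILTER A BY datestamp == '20110924';
```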

FilterLogicExpressionSimplifier

Simplifies the logical expressions in FILTER statements. There are many rules here, and the work is delegated to LogicalExpressionProxy: constant folding, rewriting AND/OR operations using De Morgan's laws, and normalizing logical formulas to DNF (disjunctive normal form).
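
The kinds of rewrite listed above can be illustrated with a pair of logically equivalent filters. This is only a sketch of the idea with made-up fields; the expression Pig actually produces may be shaped differently.

```pig
A = LOAD 'input1' AS (a0:int, a1:int);

-- As written: a constant expression and a negated conjunction.
B = FILTER A BY (a0 > 2 + 3) AND NOT (a1 > 0 AND a0 < 10);

-- After constant folding (2 + 3 becomes 5) and De Morgan's law
-- (NOT (p AND q) becomes NOT p OR NOT q):
B2 = FILTER A BY (a0 > 5) AND (a1 <= 0 OR a0 >= 10);

-- The same condition written in DNF, an OR of AND terms:
B3 = FILTER A BY (a0 > 5 AND a1 <= 0) OR (a0 > 5 AND a0 >= 10);
```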

SplitFilter

Splits the conditions in a FILTER statement so that they can be pushed down separately. For example:

A = LOAD 'input1' AS (a0, a1);
B = LOAD 'input2' AS (b0, b1);
C = JOIN A BY a0, B BY b0;
D = FILTER C BY a1 > 0 AND b1 > 0;

The filter conditions on A and B in D can be separated, so that the two conditions can each be pushed down independently:

X = FILTER C BY a1 > 0;
D = FILTER X BY b1 > 0;

PushUpFilter

Moves filter conditions earlier in the data flow (along the data-flow DAG, toward the data source) to reduce the amount of data transferred.
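
Continuing the SplitFilter example, the following sketch shows where the two separated filters could end up once they are moved ahead of the join. The aliases FA and FB are introduced here for illustration only.

```pig
A = LOAD 'input1' AS (a0, a1);
B = LOAD 'input2' AS (b0, b1);

-- Each filter now runs on its own input, before the join,
-- so fewer records reach the JOIN.
FA = FILTER A BY a1 > 0;
FB = FILTER B BY b1 > 0;
D = JOIN FA BY a0, FB BY b0;
```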

FilterAboveForeach

Moves a FILTER above the FOREACH that precedes it when the fields used by the filter condition are already available in the input of the FOREACH, so that the filter is applied earlier.
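
A small sketch of the swap between FILTER and FOREACH, using hypothetical fields. It assumes the FOREACH only projects the field the filter uses and does not compute it.

```pig
A = LOAD 'input1' AS (a0, a1, a2);

-- As written: project first, then filter.
B = FOREACH A GENERATE a0, a1;
C = FILTER B BY a1 > 0;

-- Equivalent form with the filter first: a1 already exists in A,
-- so the FOREACH processes fewer records.
FA = FILTER A BY a1 > 0;
C2 = FOREACH FA GENERATE a0, a1;
```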

ImplicitSplitInserter

Inserts a SPLIT operator (for more details, see the split section in "Other optimizations" below).
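
A sketch of a script in which one alias feeds two downstream operators, which is the situation where a split would be inserted implicitly. Paths and fields are made up for illustration.

```pig
A = LOAD 'input1' AS (a0, a1);

-- A is consumed twice; no SPLIT appears in the script, but the plan
-- needs one after A so that the records flow to both branches.
B = FILTER A BY a0 > 0;
C = FILTER A BY a1 > 0;

STORE B INTO 'out_b';
STORE C INTO 'out_c';
```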

MergeFilter

After PushUpFilter has run, merges adjacent filter conditions to reduce the number of FILTER statements.
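
A minimal sketch of two consecutive filters and the single filter they are equivalent to, with hypothetical fields.

```pig
A = LOAD 'input1' AS (a0, a1);

-- As written: two FILTER statements in a row.
B = FILTER A BY a0 > 0;
C = FILTER B BY a1 > 0;

-- Merged form: one FILTER with both conditions joined by AND.
C2 = FILTER A BY a0 > 0 AND a1 > 0;
```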

PushDownForEachFlatten

Moves the FLATTEN in a FOREACH later (pushing it down the data-flow DAG) to reduce the amount of data handled by a subsequent JOIN. When FLATTEN is applied to a bag, one record becomes multiple records, which slows down the JOIN that follows. After optimization, the FLATTEN is performed after the JOIN.
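
The following sketch shows the two orderings side by side. The schemas are hypothetical, and it assumes the join key (a1) is not produced by the FLATTEN, since otherwise the FLATTEN could not be delayed.

```pig
A = LOAD 'input1' AS (a0:bag{t:(x:int)}, a1:int);
B = LOAD 'input2' AS (b0:int, b1:int);

-- As written: flatten first, so every element of the bag a0
-- becomes its own record before the join.
C = FOREACH A GENERATE FLATTEN(a0) AS x, a1;
D = JOIN C BY a1, B BY b0;

-- Equivalent form with the FLATTEN after the JOIN: the join sees
-- one record per original row of A.
J = JOIN A BY a1, B BY b0;
D2 = FOREACH J GENERATE FLATTEN(A::a0) AS x, A::a1, B::b0, B::b1;
```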

LimitOptimizer

Moves the LIMIT statement as early as possible in the data flow to reduce the amount of data transferred.
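
A sketch of the idea with a simple projection, using made-up fields. It assumes the FOREACH contains no FLATTEN, so it emits exactly one record per input record. LIMIT without ORDER BY returns an arbitrary set of rows, so the two forms agree on the number of rows rather than on which rows are returned.

```pig
A = LOAD 'input1' AS (a0, a1);

-- As written: project every record, then keep 10.
B = FOREACH A GENERATE a0;
C = LIMIT B 10;

-- With the LIMIT applied earlier, only 10 records are projected.
LA = LIMIT A 10;
C2 = FOREACH LA GENERATE a0;
```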
