Read with me. PostgreSQL Source (13)--executor (--join node of query execution Module (top))

Source: Internet
Author: User
Tags joins postgresql


Join node


The Join node has the following three types:


T_NestLoopState,
    T_MergeJoinState,
    T_HashJoinState,


The connection type node corresponds to the join operation in the relational algebra, and the following connection types are defined in PostgreSQL (take T1 JOIN T2 as an example):


    • 1) Inner join: An inner connection that connects all tuples of the T1 to all tuples in T2 that satisfy the join condition.

    • 2) left Outer join: On the inner connection basis, for those T1 tuples that can not find a connection to the T2 tuple, connect to it with an empty value tuple.

    • 3) Right Outer join: On the inner connection basis, for those T2 tuples that can not find a connection to the T1 tuple, connect to it with an empty value tuple.

    • 4) Full Outer join: All-out connection, on the basis of an internal connection, for those T1 tuples that could not find a connection to the T2 tuple, and those T2 tuples that could not find a connection T1 tuple, are connected with an empty value tuple.

    • 5) Semi Join: Similar in operation, when a tuple of T1 is able to find a tuple in T2 that satisfies the join condition, it returns the T1 tuple, but does not connect to the matching T2 tuple.

    • 6) Anti join: Type not in operation, returns the connection of the T1 tuple to the empty tuple when a tuple in T2 that satisfies the join condition is not found in the T1 group.


Let's take a look at what the Postgres user manual says:



Conditional connections:


T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 ON boolean_expression
T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 USING ( join column list )
T1 NATURAL { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2


So it seems that only the first four connections (INNER, left JOIN, right join, full join) are shown in the SQL statement, and the latter two are actually used internally as Postgres, such as semi JOIN, As I said before, for Subqueryscan nodes, it is possible to convert any and exist clauses to semi-joins. A half connection is a semi join.



For the connection you specify, the PostgreSQL internal meeting machine acts with a different connection operation .



Here, Postgres implements three connection operations: nested Loops (Nest loops), merge joins (merge joins), and hash joins (hash joins).



The merge connection algorithm can achieve the above six kinds of connections, and nested loops and hash connections can only achieve Inner join, left Outer join, Semi join and Antijoin four connections.



As shown in 6-52, the connection node has a common parent join, the join inherits all the properties of plan, and the extension defines the type of JoinType to store the connection, joinqual the condition for storing the connection.


typedef struct Join
{
    Plan        plan;
    JoinType    jointype;
    List       *joinqual;       /* JOIN quals (in addition to plan.qual) */
} Join;


The corresponding execution state node joinstate defines the JoinType storage connection type, joinqual stores the status chain list after the connection condition is initialized.


typedef struct JoinState
{
    PlanState   ps;
    JoinType    jointype;
    List       *joinqual;       /* JOIN quals (in addition to ps.qual) */
} JoinState;
1.NestLoop node


The Nestloop node implements a nested loop join method that enables the connection of the inner join, the left Outer join, the Semi join, and the anti join in four ways.
Examples are as follows:


postgres = # explain select a. *, b. * from test_dm a join test_dm2 b on a.id> b.id;
                                    QUERY PLAN
-------------------------------------------------- ------------------------------
  Nested Loop (cost = 0.00..150000503303.17 rows = 3333339000000 width = 137)
    Join Filter: (a.id> b.id)
    -> Seq Scan on test_dm2 b (cost = 0.00..223457.17 rows = 10000017 width = 69)
    -> Materialize (cost = 0.00..27346.00 rows = 1000000 width = 68)
          -> Seq Scan on test_dm a (cost = 0.00..22346.00 rows = 1000000 width = 68)
(5 lines)
typedef struct NestLoop
{
     Join join;
     List * nestParams; / * list of NestLoopParam nodes * /
} NestLoop;


The Nestloop node expands the Nestparams field based on the join node, which is a list of some executor parameters that are useful for passing the current row execution value of the external sub-plan to the internal sub-plan. At present the main transfer form is var type, the definition of this data structure in:


src/include/nodes/primnodes.hVar - expression node representing a variable (ie, a table column)


The following is the definition of the state node nestloopstate.


typedef struct NestLoopState
{
    JoinState   js;             /* its first field is NodeTag */
    bool        nl_NeedNewOuter;            //true if need new outer tuple on next call
    bool        nl_MatchedOuter;            //true if found a join match for current outer tuple
    TupleTableSlot *nl_NullInnerTupleSlot;  //prepared null tuple for left outer joins
} NestLoopState;


Initializes the Nestloopstate node in the Nestloop node's initialization process (Execendnestloop function), constructs the expression context these are self-contained, and the join conditions (joinqual fields) in the node are also processed. Translates to the joinqual linked list in the corresponding state node joinstate. And a nl_nullinnertupleslot is initialized for the left join and anti join. Why



Because for T1 JOIN T2, when a tuple of T1 is not found in T2 that satisfies the join condition, both connections return the T1 tuple's connection to the empty tuple, which is implemented by Nl_nullinnertupleslot.



Finally, the following two actions are performed:


    • 1) Mark Nl_neednewouter as true to indicate the need to get the left child node tuple.

    • 2) Mark Nl_matchedouter as false to indicate that the right child node tuple that matches the current left child node tuple is not found.


This is what initialization is all about.



Next is the execution of the Nestloop (Execnestloop function).



The basic idea of a loop nesting connection is as follows (in the Case of table R (left relationship) and table S (right Relationship) connection):


FOR each tuple s in S DO
        FOR each tuple r in R DO

            IF r and s join to make a tuple t THEN
            output t;


To implement this method iteratively, fields Nl_neednewouter and nl_matchedouter are defined in Nestloopstate. When the tuple is in the inner loop, Nl_neednewouter is false, and Nl_neednewouter is set to true at the end of the inner loop. To be able to handle the left Outer join and the anti join, you need to know whether the inner loop finds an inner tuple that satisfies the join condition, which is recorded by Nl_matchedouter and is marked true when the inner loop finds a tuple that meets the criteria.



The Nestloop execution process is done primarily by the Execnestloop function. The function is primarily a large loop as mentioned above.



The loop performs the following actions:


    • <1> If Nl_neednewouter is true, the tuple is fetched from the left child node, and if the obtained tuple is null, the empty tuple is returned and the execution process is ended. If Nlneednewouter is false, proceed to step 2.

    • <2> gets the tuple from the right child node, if NULL indicates that the inner scan is complete, set Nl_neednewouter to True, skip step 3 to continue the loop.

    • <3> determines if the right child node tuple meets the join condition with the current left child node tuple, and returns the result of the connection if compliant.


The above process can complete the recursive execution of inner join. However, in order to support several other connections, the following two special processes are required:


    • 1) Mark Nl_matchedouter as true when a tuple matching the join condition is found. When the inner layer is scanned, the nl_matchedouter can be judged to see if a tuple satisfying the join condition has been found, and a connection to the empty tuple (Nl_nullinnertupleslot) is required when processing the left Outer join and anti join. Then set the Nlmatchedouter to False.

    • 2) When a tuple that satisfies the matching criteria is found, the semi join and anti join methods need to be set Nl_neednewouter to true. The difference is that the anti join needs not satisfy the join condition to return, so skip back to the connection result to continue the execution loop.


The cleanup process of the Nestloop node (execendnestloop function) has no special handling, just recursive invocation of the cleanup process of the left and right child nodes.


2.MergeJoin node


Mergejoin realizes the merging connection algorithm to the sort relation, and the sender of the merging connection is already arranged. The pseudo-code implemented by the Mergejoin algorithm in PostgreSQL is as follows:


Join {
        get initial outer and inner tuples              INITIALIZE
        do forever {
            while (outer != inner) {                    SKIP_TEST
                if (outer < inner)
                    advance outer                       SKIPOUTER_ADVANCE
                else
                    advance inner                       SKIPINNER_ADVANCE
            }
            mark inner position                         SKIP_TEST
            do forever {
                while (outer == inner) {
                    join tuples                         JOINTUPLES
                    advance inner position              NEXTINNER
                }
                advance outer position                  NEXTOUTER
                if (outer == mark)                      TESTOUTER
                    restore inner position to mark      TESTOUTER
                else
                    break   // return to top of outer loop
            }
        }
    }


The algorithm initializes the left and right child nodes first, and then performs the following operations (where the comparison of the size refers to the comparison of the value of the Connection property):


    • 1) scan to the first matching location, if the left Dial Hand node (outer) is large, get the tuple from the Right child node (inner), and if the right child node is large, get the tuple from the left child node.

    • 2) mark the current position of the right child node.

    • 3) loop execution left dial hand node = = Right child node, if compliant, connect the tuple, and get the next right child node tuple, otherwise exit the loop to perform step 4.

    • 4) Gets the next left child node tuple.

    • 5) If the right child node at the left dial hand node = = Mark (indicating that the left child node is equal to the previous one), the right child node scan position needs to be rolled back to the scan position and 冋 step 3; otherwise, skip to step 1.


In order to illustrate the join algorithm of merge sort, we give the partial execution process with inner join as an example, two current points to the sender's present tuple, Mark is used to mark the location of the scan.



1) First find the left and right sequence first matching position, in the current (outer) =0 less than current (inner), so outer's current backward movement.






2), when a match is found, connect, use Mark to mark the current inner scan position, and move the inner to the back.






3) then judge that current (outer) = 1 is less than current (inner) = 2, move the current of outer backwards and determine if outer is the same as Mark (this is to find the current of outer in the same situation as the previous one).






4) Show current (outer) = 2 Not equal to mark (inner) = 1, continue the scanning process.






5) Determine whether two current is the same, found that Currem (outer) =2 is equal to present (inner) = 2, then make the connection, the same mark inner the existing position, and inner the cuirent to move backwards, as shown in. The current (inner) = 2 still satisfies the join condition, so inner's current continues to move backwards after the connection is complete.






6) as shown, current (outer) =2 is less than current (inner) = 5, the current pointer of outer is moved backwards.






7) At this point, the current (outer) and Mark (inner) are equal, then inner's current point to Mark's location, re-obtain the inner tuple to match, as shown in.






8) Repeat the matching pattern until one of the inner or outer is scanned, indicating that the connection is complete.



The Mergejoin node is defined as follows:


typedef struct MergeJoin
{
    Join        join;
    List       *mergeclauses;   /* mergeclauses as expression trees */
    /* these are arrays, but have the same length as the mergeclauses list: */
    Oid        *mergeFamilies;  /* per-clause OIDs of btree opfamilies */
    Oid        *mergeCollations;    /* per-clause OIDs of collations */
    int        *mergeStrategies;    /* per-clause ordering (ASC or DESC) */
    bool       *mergeNullsFirst;    /* per-clause nulls ordering */
} MergeJoin;


This node extends the definition of several MERGEXXX fields based on the join. Where mergeclauses storage is used to calculate whether the left and right sub-node tuples match the expression chain list, mergefamilies, Mergecollations, mergestrategies, Mergenullsfirst are corresponding to the expression list, Indicates the operator class for each of these operators, the executing policy (ASC or DEC), and the null-value ordering policy.



During initialization, the mergejoinstate structure is constructed using Mergejoin:


typedef struct MergeJoinState
{
    JoinState   js;             /* its first field is NodeTag */
    int         mj_NumClauses;
    MergeJoinClause mj_Clauses; /* array of length mj_NumClauses */
    int         mj_JoinState;
    bool        mj_ExtraMarks;
    bool        mj_ConstFalseJoin;
    bool        mj_FillOuter;
    bool        mj_FillInner;
    bool        mj_MatchedOuter;
    bool        mj_MatchedInner;
    TupleTableSlot *mj_OuterTupleSlot;
    TupleTableSlot *mj_InnerTupleSlot;
    TupleTableSlot *mj_MarkedTupleSlot;
    TupleTableSlot *mj_NullOuterTupleSlot;
    TupleTableSlot *mj_NullInnerTupleSlot;
    ExprContext *mj_OuterEContext;
    ExprContext *mj_InnerEContext;
} MergeJoinState;


Set the values for the following variables by judging the connection type:



1) Mj_fillouter: True indicates that there is no need to ignore the left sub-node tuple that does not have a match, and it needs to be connected to an empty tuple, which is true for the leave join, ANTI join, and full join.



2) Mj_fillinner: True indicates that the right child node tuple that does not have a match cannot be ignored, and it needs to be connected to an empty tuple, which is true when the correct join, full join.



3) Mj_innertupleslot: The empty tuple generated for the right child node tuple, constructed when the Mj_fillouter is true.



4) Mj_outertupleslot: The empty tuple generated for the left Dial hand node tuple, constructed when the Mj_fillinner is true.



In addition to this, you need to set the variable Mj_matchedouter (mj_matchedinner) that marks the current left (right) child node tuple to a tuple that can be connected to false to store the fields of the left (right) child node tuple mj_ Nulloutertupleslot (Mj_innertupleslot) is set to NULL, and the Mj_markedtupleslot is allocated storage space.



There is still a hashjoin, I have seen a half-day to see not quite understand, the next chapter of it ~



Read with me. PostgreSQL Source (13)--executor (--join node of query execution Module (top))


Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.