PostgreSQL code Analysis, query optimization section, process_duplicate

PostgreSQL code Analysis, query optimization section, process_duplicate_ors

Last Update:2017-06-30 Source: Internet

Author: User

Tags postgresql

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Here the parts of the canonical predicate expression are sorted out, and the order of reading is as follows:

One, PostgreSQL code Analysis, query optimization section, canonicalize_qual

Two, PostgreSQL code Analysis, query Optimization section, Pull_ands () and Pull_ors ()

Third, PostgreSQL code Analysis. Query optimization section, Process_duplicate_ors

/ *
 * process_duplicate_ors
 * Given a list of exprs which are ORed together, try to apply
 * the inverse OR distributive law.
 *
 * Returns the resulting expression (could be an AND clause, an OR
 * clause, or maybe even a single subexpression).
 * /

/ *
 * Suppose we have four tables, each of which is TEST_A, TEST_B, TEST_C, TEST_D, each table has a column,
 * That is, TEST_A has a column A, TEST_B has a column B, and so on.
 *
 * This function handles such situations. For a selection, SELECT * FROM TEST_A, TEST_B, TEST_C, TEST_D
 * WHERE (A = 1 AND B = 1) OR (A = 1 AND C = 1) OR (A = 1 AND D = 1);
 *
 * WHERE condition in the statement:
 * (A = 1 AND B = 1) OR (A = 1 AND C = 1) OR (A = 1 AND D = 1)
 * Can be rewritten as:
 * (A = 1) AND (B = 1 OR C = 1 OR D = 1)
 * This is the main function of this function.
 *
 * The parameter of this function is a list. For the above WHERE condition, the structure of orlist is as follows:
 * There is one element in orlist, which is BoolExpr of type OR_EXPR. The structure in BoolExpr is as follows:
typedef struct BoolExpr
{
Expr xpr; = slightly
BoolExprType boolop; = OR_EXPR
List * args; = 3 conditions in OR, namely (A = 1 AND B = 1) OR (A = 1 AND C = 1) OR (A = 1 AND D = 1)
bool plusFlag; = slightly
} BoolExpr;
 *
 * Detailed implementation of the following analysis functions, the approximate steps are:
 * 1) Analyze the common terms in each OR. 2) extract common items, 3) merge the remaining items into AND.
 * /
static Expr *
process_duplicate_ors (List * orlist)
{
List * reference = NIL;
int num_subclauses = 0;
List * winners;
List * neworlist;
ListCell * temp;

if (orlist == NIL)
return NULL; / * probably ca n‘t happen * /

/ * Assume there is only one. .
.

.

. Then forget it */
if (list_length (orlist) == 1) / * single-expression OR (can this
* happen?

) * /
return linitial (orlist);

/ *
* Choose the shortest AND clause as the reference list --- obviously, any
* subclause not in this clause is n‘t in all the clauses. If we find a
* clause that ’s not an AND, we can treat it as a one-element AND clause,
* which necessarily wins as shortest.
* /
/ *
* "Find the shortest".


* In the WHERE statement:
* (A = 1 AND B = 1) OR (A = 1 AND C = 1) OR (A = 1 AND D = 1)
* OR operation concatenates three sub-statements. Find the shortest one. Since there is a common term, the shortest one must be the same.
* Includes public items. Then by finding the shortest one. The number of comparisons can be reduced in subsequent operations.


* In the WHERE statement above. The three sub-statements are the same length. According to the following operation process, for example, (A = 1 AND B = 1) should be found.
* Is the first.


* /
foreach (temp, orlist)
{
Expr * clause = (Expr *) lfirst (temp);

if (and_clause ((Node *) clause))
{
List * subclauses = ((BoolExpr *) clause)-> args;
int nclauses = list_length (subclauses);
Ranch
/ *
* The length in the sub-statement is inferred here, for example for the (A = 1 AND B = 1) sub-statement,
* He is actually two sub-sentences connected by an AND, then his length is 2.
*
* Record the shortest sub-statements through nclauses, assuming shorter (nclauses <num_subclauses),
* Then replace it with the shortest.


* /
if (reference == NIL || nclauses <num_subclauses)
{
reference = subclauses;
num_subclauses = nclauses;
}
}
else
{
/ *
			 * another situation. It is possible that the clause is not an AND statement, so it does not seem to conform to the rules.
* Think of him as a whole, then this is the shortest element.
*
******************************
* Suppose the code runs here. Then there are only two cases:
* One is ... WHERE (A = 1 AND B = 1) OR (A = 1 AND C = 1) OR (A = 1).
* One is ... WHERE ((A = 1 OR C = 1) AND B = 1) OR (A = 1 OR C = 1).
* Assuming that in both cases, the following simplifications can be made, for example:
* The first case is simplified as A = 1
* Simplified in another case (A = 1 OR C = 1)
*
* The third case is to be added ...
* /
reference = list_make1 (clause);
break;
}
}

/ *
* Just in case, eliminate any duplicates in the reference list.
* /
/ * Find the shortest. Save to List * /
reference = list_union (NIL, reference);

/ *
* Check each element of the reference list to see if it ’s in all the OR
* clauses. Build a new list of winning clauses.
* /
/ *
* "Find Public Items".
*
* NOTE: At this time, the advantages brought by "find the shortest" can be reflected. Outer loops will be less frequent.
*
* Suppose the WHERE statement is:
* (A = 1 AND B = 1) OR (A = 1 AND C = 1) OR (A = 1 AND D = 1)
* It must be found in "find the shortest" (A = 1 AND B = 1).


* Then the outer layer will have two loops ... (foreach (temp, reference)), and the variables of the two loops are
* A = 1 and B = 1.
* The inner layer has three loops ... (foreach (temp2, orlist)). The variables of the three cycles are
* (A = 1 AND B = 1) and (A = 1 AND C = 1) and (A = 1 AND D = 1)
*
* The ratio is as follows:
* If the outer loop is now running for the first time, it will look for the common item of A = 1, and if the inner loop is also running for the first time,
* That is, find (A = 1 AND B = 1) whether there is a common item A = 1. Found to exist (list_member),
* Then infer the second clause of the inner loop in turn ...
*
* As the example above, in detail. The operations performed by these loops are:
* Outer layer for the first time:
* Infer whether A = 1 is in (A = 1 AND B = 1), in, infer the next
* Infer whether A = 1 is in (A = 1 AND C = 1), in, infer the next
* Infer whether A = 1 is present (A = 1 AND D = 1). Now, A = 1 is a public item, record (winners = lappend ...)
* Outer layer for the second time:
* Infer whether B = 1 is present (A = 1 AND B = 1). Infer next
* Infer whether B = 1 is in (A = 1 AND C = 1), not in, jump out of the loop, no need to infer the next one.
* Infer whether B = 1 is present (A = 1 AND D = 1). Not running, since the previous one does not contain a public item, extraction is not possible.
* /
winners = NIL;
foreach (temp, reference)
{
Expr * refclause = (Expr *) lfirst (temp);
bool win = true;
ListCell * temp2;

foreach (temp2, orlist)
{
Expr * clause = (Expr *) lfirst (temp2);

if (and_clause ((Node *) clause))
{
if (! list_member (((BoolExpr *) clause)-> args, refclause))
{
win = false;
break;
}
}
else
{
if (! equal (refclause, clause))
{
win = false;
break;
}
}
}

if (win)
winners = lappend (winners, refclause);
}

/ *
* If no winners, we ca n’t transform the OR
* /
if (winners == NIL)
return make_orclause (orlist);

/ *
* Generate new OR list consisting of the remaining sub-clauses.
*
* If any clause degenerates to empty, then we have a situation like (A
* AND B) OR (A), which can be reduced to just A --- that is, the
* additional conditions in other arms of the OR are irrelevant.
*
* Note that because we use list_difference, any multiple occurrences of a
* winning clause in an AND sub-clause will be removed automatically.
* /
/ *
* "Extract public items".


* Use list_difference to delete public items, implement Details are not repeated here.
* /
neworlist = NIL;
foreach (temp, orlist)
{
Expr * clause = (Expr *) lfirst (temp);

if (and_clause ((Node *) clause))
{
List * subclauses = ((BoolExpr *) clause)-> args;
Ranch
/ * Look here ... look here ..., eliminate public items * /
subclauses = list_difference (subclauses, winners);
if (subclauses! = NIL)
{
/ * After elimination, the remaining stitching is up. Spliced into: (B = 1 OR C = 1 OR D = 1) * /
if (list_length (subclauses) == 1)
neworlist = lappend (neworlist, linitial (subclauses));
else
neworlist = lappend (neworlist, make_andclause (subclauses));
}
else
{
/ *
* This is explained in the substatement. One of them is a public item, that is, for example the following form:
* ... WHERE (A = 1 AND B = 1) OR (A = 1)
*
* At this time, the common term is A = 1, the first clause is (A = 1 AND B = 1), and the second clause is (A = 1),
* The second clause goes through list_difference. The result returned is NULL.
* For such a situation, it can actually be simplified as: A = 1, because (A = 1 AND B = 1) must satisfy the case of A = 1.
* /
neworlist = NIL; / * degenerate case, see above * /
break;
}
}
else
{
if (! list_member (winners, clause))
neworlist = lappend (neworlist, clause);
else
{
neworlist = NIL; / * degenerate case, see above * /
break;
}
}
}

/ *
* Append reduced OR to the winners list, if it ’s not degenerate, handling
* the special case of one element correctly (can that really happen?).
* Also be careful to maintain AND / OR flatness in case we pulled up a
* sub-sub-OR-clause.
* /
if (neworlist! = NIL)
{
if (list_length (neworlist) == 1)
winners = lappend (winners, linitial (neworlist));
else
/ * neworlist should be (B = 1 OR C = 1 OR D = 1), so use make_orclause * /
winners = lappend (winners, make_orclause (pull_ors (neworlist)));
}

/ *
* And return the constructed AND clause, again being wary of a single
* element and AND / OR flatness.
* /
if (list_length (winners) == 1)
return (Expr *) linitial (winners);
else
/ * The returned form is: (A = 1) AND (B = 1 OR C = 1 OR D = 1), so make_andclause * /
return make_andclause (pull_ands (winners));
}

PostgreSQL code Analysis, query optimization section, process_duplicate_ors

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More