SQLite3 tree-structured recursive queries with WITH

Source: Internet
Author: User
Tags create index sqlite version control system

In a recursive common table expression, the table named by the CTE table name is called the "recursive table". The recursive table must appear exactly once in the FROM clause of the recursive query, and must not appear anywhere else in either the initial query or the recursive query, including in subqueries.

The initial query may be a compound query, but it may not contain ORDER BY, LIMIT, or OFFSET.

The recursive query must be a simple query, not a compound query. The recursive query is allowed to include ORDER BY, LIMIT, and OFFSET.

The basic algorithm for computing the content of the recursive table is as follows:
1. Execute the initial query and put the results into a queue.
2. While the queue is not empty:
   a. Extract a single record from the queue.
   b. Insert this record into the recursive table.
   c. Pretend that the record just extracted is the only record in the recursive table, then run the recursive query and add all of its results to the queue.
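
The steps above can be modeled in a few lines of plain Python. This is only an illustrative sketch of the queue procedure, not SQLite's implementation; the function name `run_recursive` and its parameters are invented for this example, and none of the UNION, LIMIT, OFFSET, or ORDER BY rules are modeled.

```python
from collections import deque

def run_recursive(initial_rows, recursive_step):
    """Toy model of the basic algorithm.

    initial_rows: rows produced by the initial query.
    recursive_step: function that receives ONE row (as if it were the only
        row in the recursive table) and returns the rows the recursive
        query would produce for it.
    """
    queue = deque(initial_rows)            # step 1
    recursive_table = []
    while queue:                           # step 2
        row = queue.popleft()              # extract one row (FIFO in this sketch)
        recursive_table.append(row)        # insert it into the recursive table
        queue.extend(recursive_step(row))  # run the recursive query on that row
    return recursive_table

# Hypothetical usage: the step adds 1 to the row as long as the value is below 5.
rows = run_recursive([(1,)], lambda r: [(r[0] + 1,)] if r[0] < 5 else [])
print(rows)  # [(1,), (2,), (3,), (4,), (5,)]
```

The sketch always takes the oldest row first; SQLite itself only promises a particular extraction order when an ORDER BY clause is present.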

The basic procedure above may be modified by the following additional rules:

If a UNION operator connects the initial query and the recursive query, a record is added to the queue only if no identical record has previously been added to the queue. Duplicate records are discarded before they are added to the queue, even if the earlier copy has already been extracted from the queue by the recursion step.
If the operator is UNION ALL, every record generated by the initial query and the recursive query is always added to the queue, even if it is a duplicate.
When deciding whether a record is a duplicate, a NULL value compares equal to another NULL value and not equal to any other value.
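
A small experiment with Python's built-in `sqlite3` module may make the difference concrete. The CTE name `t` and the modulo step are made up for illustration; the LIMIT in the second query is there only to cut off a recursion that would otherwise never end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# UNION: (x+1) % 3 cycles through 0, 1, 2. Repeats are discarded before
# they reach the queue, so the recursion ends on its own.
union_rows = [r[0] for r in conn.execute(
    "WITH RECURSIVE t(x) AS (VALUES(0) UNION SELECT (x+1) % 3 FROM t) "
    "SELECT x FROM t"
)]

# UNION ALL keeps repeats, so the same recursion needs a LIMIT to stop.
union_all_rows = [r[0] for r in conn.execute(
    "WITH RECURSIVE t(x) AS (SELECT 0 UNION ALL SELECT (x+1) % 3 FROM t LIMIT 7) "
    "SELECT x FROM t"
)]

# For duplicate detection NULL equals NULL, so only one NULL row is produced.
null_rows = [r[0] for r in conn.execute(
    "WITH RECURSIVE t(x) AS (VALUES(NULL) UNION SELECT NULL FROM t) "
    "SELECT x FROM t"
)]

print(sorted(union_rows))  # [0, 1, 2]
print(union_all_rows)      # [0, 1, 2, 0, 1, 2, 0]
print(null_rows)           # [None]
```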

The LIMIT clause, if present, determines the maximum number of records that will ever be added to the recursive table in step 2b. Once the limit is reached, the recursion stops. A limit of 0 means that no records are ever added to the recursive table.
A negative limit means that there is no limit on the number of records added to the recursive table.

The OFFSET clause, if present with a positive value N, prevents the first N records from being added to the recursive table. The first N records are still processed by the recursive query; they are just not added to the recursive table. Records do not count toward the LIMIT until all OFFSET records have been skipped.
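
The LIMIT and OFFSET rules can be checked with a short script using Python's `sqlite3` module. The helper `cnt` and the `WHERE x < 100` bound are assumptions made for this sketch so that the unlimited case still terminates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def cnt(tail):
    """Run a counting CTE with the given LIMIT/OFFSET text appended."""
    sql = (
        "WITH RECURSIVE cnt(x) AS "
        "(SELECT 1 UNION ALL SELECT x+1 FROM cnt WHERE x < 100 " + tail + ") "
        "SELECT x FROM cnt"
    )
    return [r[0] for r in conn.execute(sql)]

print(cnt("LIMIT 5"))           # [1, 2, 3, 4, 5]
print(cnt("LIMIT 5 OFFSET 2"))  # [3, 4, 5, 6, 7]
print(cnt("LIMIT 0"))           # []
print(len(cnt("LIMIT -1")))     # 100 (only the WHERE clause stops the recursion)
```

With `OFFSET 2`, rows 1 and 2 are still fed to the recursive query (otherwise 3 could never be produced), but they are not added to the recursive table and do not count toward the limit.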


If an ORDER BY clause is present, it determines the order in which records are extracted from the queue in step 2a. If there is no ORDER BY clause, the order in which records are extracted is undefined. In the current implementation, the queue is a FIFO queue if the ORDER BY clause is omitted, but applications should not rely on this fact, as it may change.

Recursive Query Example:
The following query returns all integers from 1 to 1000000.

WITH RECURSIVE
  cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE x<1000000)
SELECT * FROM cnt;

Consider how this query works.

The initial query runs first and returns a single record with a single field, 1, which is added to the queue.
In step 2a, this record is extracted from the queue and added to the recursive table.
Next, the recursive query runs, and step 2c adds the resulting single record, 2, to the queue. The queue still holds one record, so step 2 repeats.
Record 2 is extracted and added to the recursive table by steps 2a and 2b. Then record 2 is treated as if it were the entire content of the recursive table and the recursive query runs again, producing record 3, which is added to the queue.
This repeats 999,999 times until the queue finally contains a record with the value 1000000. That record is extracted and added to the recursive table, but this time the WHERE clause causes the recursive query to return no records, so the queue is left empty and the recursion ends.
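
As a quick check, the counting query can be run from Python's `sqlite3` module. To avoid fetching a million rows into Python, this sketch aggregates over the CTE instead of selecting every row; that aggregation is an addition for the example, not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
count, lowest, highest = conn.execute(
    "WITH RECURSIVE "
    "  cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE x < 1000000) "
    "SELECT count(*), min(x), max(x) FROM cnt"
).fetchone()
print(count, lowest, highest)  # 1000000 1 1000000
```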

Optimization considerations:

In the discussion above, statements like "insert the record into the recursive table" should be understood conceptually, not literally. It sounds as if SQLite accumulates a huge table containing 1 million rows and then goes back and scans that table from top to bottom to produce the result.
What actually happens is that the query optimizer sees that the values in the "cnt" recursive table are used only once. So as each record is added to the recursive table, it is immediately returned as a result of the main SELECT statement and then discarded.
SQLite does not accumulate a temporary table of 1 million rows, and very little memory is needed to run the example above.
However, if the example had used UNION instead of UNION ALL, SQLite would have had to keep all previously generated records in order to check for duplicates.
For this reason, programmers should use UNION ALL instead of UNION where feasible.


Make some changes to the example above, as follows:

WITH RECURSIVE
  cnt(x) AS (SELECT 1 UNION ALL SELECT x+1 FROM cnt LIMIT 1000000)
SELECT * FROM cnt;

There are two differences in this variation. The initial query is "SELECT 1" instead of "VALUES(1)", but those are just different syntaxes for the same thing. The other difference is that the recursion is stopped by a LIMIT rather than a WHERE clause. Using LIMIT means that once 1 million rows have been added to the recursive table (and returned by the main SELECT, thanks to the query optimizer), the recursion stops immediately, no matter how many records remain in the queue.
In more complex queries, it can be difficult to ensure that the WHERE clause will eventually cause the queue to drain and the recursion to end, but a LIMIT clause will always stop the recursion. So it is good practice to always include a LIMIT clause as a safety measure if an upper bound on the size of the recursion is known.
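
To illustrate the safety-net role of LIMIT, here is a sketch using Python's `sqlite3` module on data that contains a cycle. The `edge` table, its columns, and the CTE name `reach` are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical edge list containing a cycle: a -> b -> a
conn.executescript("""
    CREATE TABLE edge(src TEXT, dst TEXT);
    INSERT INTO edge VALUES ('a', 'b');
    INSERT INTO edge VALUES ('b', 'a');
""")

# Without the LIMIT this UNION ALL recursion would never finish,
# because the cycle keeps refilling the queue.
visited = [r[0] for r in conn.execute("""
    WITH RECURSIVE reach(n) AS (
      SELECT 'a'
      UNION ALL
      SELECT edge.dst FROM edge JOIN reach ON edge.src = reach.n
      LIMIT 6
    )
    SELECT n FROM reach
""")]
print(visited)  # ['a', 'b', 'a', 'b', 'a', 'b']
```

Using UNION instead of UNION ALL would also end this particular recursion, at the cost of remembering every row already produced.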



Hierarchical Query Example:

Create a table that describes the members of an organization and the chain of relationships within the organization

CREATE TABLE org(
  name TEXT PRIMARY KEY,
  boss TEXT REFERENCES org,
  height INT
  -- other content omitted
);

Every member of the organization has a name, and most members have exactly one boss. The head of the entire organization has a NULL boss. The records of the org table form a tree.

Here is a query that calculates the average height of everyone in Alice's organizational unit, including Alice:

WITH RECURSIVE
  works_for_alice(n) AS (
    VALUES('Alice')
    UNION
    SELECT name FROM org, works_for_alice
     WHERE org.boss=works_for_alice.n
  )
SELECT avg(height) FROM org
 WHERE org.name IN works_for_alice;
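
The query can be tried out on a handful of rows with Python's `sqlite3` module. The people and heights below are made-up sample data; Zed is placed above Alice only to show that records outside her unit are excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE org(
      name TEXT PRIMARY KEY,
      boss TEXT REFERENCES org,
      height INT
    );
    -- Made-up sample data: Zed is the head, Alice reports to Zed.
    INSERT INTO org VALUES ('Zed',   NULL,    200);
    INSERT INTO org VALUES ('Alice', 'Zed',   170);
    INSERT INTO org VALUES ('Bob',   'Alice', 180);
    INSERT INTO org VALUES ('Cindy', 'Alice', 160);
    INSERT INTO org VALUES ('Dave',  'Bob',   175);
""")

avg_height = conn.execute("""
    WITH RECURSIVE
      works_for_alice(n) AS (
        VALUES('Alice')
        UNION
        SELECT name FROM org, works_for_alice
         WHERE org.boss = works_for_alice.n
      )
    SELECT avg(height) FROM org
     WHERE org.name IN works_for_alice
""").fetchone()[0]
print(avg_height)  # 171.25, the mean of Alice, Bob, Cindy and Dave
```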


The following example uses two common table expressions in a single WITH clause. The table below represents a family tree:

CREATE TABLE family(
  name TEXT PRIMARY KEY,
  mom TEXT REFERENCES family,
  dad TEXT REFERENCES family,
  born DATETIME,
  died DATETIME  -- NULL if still alive
  -- other content
);

This family table is similar to the earlier org table, except that each member has two parents. We want to know all of Alice's living ancestors, from oldest to youngest. An ordinary common table expression, "parent_of", is defined first.
This ordinary common table expression is a view that can be used to find all parents of any individual. It is then used in the recursive common table expression "ancestor_of_alice".
The recursive common table expression is then used in the final query:

WITH RECURSIVE
  parent_of(name, parent) AS
    (SELECT name, mom FROM family UNION SELECT name, dad FROM family),
  ancestor_of_alice(name) AS
    (SELECT parent FROM parent_of WHERE name='Alice'
     UNION ALL
     SELECT parent FROM parent_of JOIN ancestor_of_alice USING(name))
SELECT family.name FROM ancestor_of_alice, family
 WHERE ancestor_of_alice.name=family.name
   AND died IS NULL
 ORDER BY born;
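
Here is a small run of the family query using Python's `sqlite3` module. The four people and their dates are invented sample data: Dora is Beth's mother, and Beth and Carl are Alice's parents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE family(
      name TEXT PRIMARY KEY,
      mom TEXT REFERENCES family,
      dad TEXT REFERENCES family,
      born DATETIME,
      died DATETIME  -- NULL if still alive
    );
    -- Made-up sample data
    INSERT INTO family VALUES ('Dora',  NULL,   NULL,   '1930-01-01', NULL);
    INSERT INTO family VALUES ('Carl',  NULL,   NULL,   '1950-01-01', '2010-01-01');
    INSERT INTO family VALUES ('Beth',  'Dora', NULL,   '1955-01-01', NULL);
    INSERT INTO family VALUES ('Alice', 'Beth', 'Carl', '1980-01-01', NULL);
""")

living = [r[0] for r in conn.execute("""
    WITH RECURSIVE
      parent_of(name, parent) AS
        (SELECT name, mom FROM family UNION SELECT name, dad FROM family),
      ancestor_of_alice(name) AS
        (SELECT parent FROM parent_of WHERE name = 'Alice'
         UNION ALL
         SELECT parent FROM parent_of JOIN ancestor_of_alice USING(name))
    SELECT family.name FROM ancestor_of_alice, family
     WHERE ancestor_of_alice.name = family.name
       AND died IS NULL
     ORDER BY born
""")]
print(living)  # ['Dora', 'Beth']
```

Carl is an ancestor too, but he is filtered out by `died IS NULL`. Unknown parents show up as NULL names in the recursive table; they match nothing in the join, so they neither extend the recursion nor appear in the result.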

Queries Against a Graph:
A version control system typically stores the evolving versions of a project as a directed acyclic graph (DAG). Each version of the project is called a check-in, and a single check-in may have zero or more parents.
Most check-ins, except the first, have a single parent, but in the case of a merge, a check-in may have two, three, or more parents. A schema to keep track of check-ins and the order in which they occur might look like this:

CREATE TABLE checkin(
  id INTEGER PRIMARY KEY,
  mtime INTEGER -- timestamp when this checkin occurred
);
CREATE TABLE derivedfrom(
  xfrom INTEGER NOT NULL REFERENCES checkin, -- parent checkin
  xto INTEGER NOT NULL REFERENCES checkin,   -- derived checkin
  PRIMARY KEY(xfrom,xto)
);
CREATE INDEX derivedfrom_back ON derivedfrom(xto,xfrom);


This graph is acyclic, and we assume that the mtime of every child check-in is no earlier than the mtime of all of its parents. But unlike the earlier examples, this graph may have multiple paths of different lengths between any two check-ins.
We want to know the twenty most recent ancestors in time (out of the thousands and thousands of ancestors in the whole DAG) of the check-in @BASELINE. A query similar to this one is used by the Fossil version control system to display the N most recent ancestors of a check-in.
Example: http://www.sqlite.org/src/timeline?p=trunk&n=30


WITH RECURSIVE
  ancestor(id,mtime) AS (
    SELECT id, mtime FROM checkin WHERE id=@BASELINE
    UNION
    SELECT derivedfrom.xfrom, checkin.mtime
      FROM ancestor, derivedfrom, checkin
     WHERE ancestor.id=derivedfrom.xto
       AND checkin.id=derivedfrom.xfrom
     ORDER BY checkin.mtime DESC
     LIMIT 20
  )
SELECT * FROM checkin JOIN ancestor USING(id);

The "ORDER BY checkin.mtime DESC" term in the recursive query makes the query run much faster by preventing it from following branches that merged check-ins from long ago. The ORDER BY forces the recursive query to focus on the most recent check-ins, which are exactly the ones we want.
Without the ORDER BY in the recursive query, one would be forced to compute the complete set of thousands of ancestors, sort them all by mtime, and then take the first twenty. The ORDER BY essentially sets up a priority queue that makes the recursive query look at the most recent ancestors first, which allows a LIMIT clause to restrict the scope of the query to just the check-ins of interest.
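
The priority-queue behavior can be observed on a tiny, made-up history with Python's `sqlite3` module. In this sketch the limit is reduced to 3, the parameter is bound as `:baseline` (Python's documented named-parameter style) rather than `@BASELINE`, and the outer query selects only the id with an explicit ORDER BY so the printed order is well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE checkin(
      id INTEGER PRIMARY KEY,
      mtime INTEGER
    );
    CREATE TABLE derivedfrom(
      xfrom INTEGER NOT NULL REFERENCES checkin,
      xto INTEGER NOT NULL REFERENCES checkin,
      PRIMARY KEY(xfrom, xto)
    );
    CREATE INDEX derivedfrom_back ON derivedfrom(xto, xfrom);
    -- Made-up history: 1 -> 2, 2 forks into 3 and 4, both merge into 5, 5 -> 6.
    INSERT INTO checkin VALUES (1,10),(2,20),(3,30),(4,40),(5,50),(6,60);
    INSERT INTO derivedfrom VALUES (1,2),(2,3),(2,4),(3,5),(4,5),(5,6);
""")

rows = conn.execute("""
    WITH RECURSIVE
      ancestor(id, mtime) AS (
        SELECT id, mtime FROM checkin WHERE id = :baseline
        UNION
        SELECT derivedfrom.xfrom, checkin.mtime
          FROM ancestor, derivedfrom, checkin
         WHERE ancestor.id = derivedfrom.xto
           AND checkin.id = derivedfrom.xfrom
         ORDER BY checkin.mtime DESC
         LIMIT 3
      )
    SELECT id FROM checkin JOIN ancestor USING(id) ORDER BY id DESC
""", {"baseline": 6}).fetchall()
recent = [r[0] for r in rows]
print(recent)  # [6, 5, 4]
```

Note that the baseline check-in itself counts toward the limit, and that check-in 4 is chosen over check-in 3 because its mtime is larger.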

Controlling Depth-First Versus Breadth-First Search of a Tree Using ORDER BY:
The ORDER BY clause in the recursive query can be used to control whether a tree is searched depth-first or breadth-first. To illustrate this, we use a variation of the org table from the example above, without the height column, and with the following data inserted.


CREATE TABLE org(
  name TEXT PRIMARY KEY,
  boss TEXT REFERENCES org
) WITHOUT ROWID;
INSERT INTO org VALUES('Alice',NULL);
INSERT INTO org VALUES('Bob','Alice');
INSERT INTO org VALUES('Cindy','Alice');
INSERT INTO org VALUES('Dave','Bob');
INSERT INTO org VALUES('Emma','Bob');
INSERT INTO org VALUES('Fred','Cindy');
INSERT INTO org VALUES('Gail','Cindy');

Here is a query that shows the tree structure using a breadth-first pattern:

WITH RECURSIVE
  under_alice(name,level) AS (
    VALUES('Alice',0)
    UNION ALL
    SELECT org.name, under_alice.level+1
      FROM org JOIN under_alice ON org.boss=under_alice.name
     ORDER BY 2
  )
SELECT substr('..........',1,level*3) || name FROM under_alice;


The "ORDER BY 2" (which means the same as "ORDER BY under_alice.level+1") causes higher levels of the organization chart (with smaller "level" values) to be processed first, resulting in a breadth-first search.
The output is:

Alice
...Bob
...Cindy
......Dave
......Emma
......Fred
......Gail


However, if we add the "DESC" modifier to the ORDER BY clause, lower levels of the organization (with larger "level" values) are processed first by the recursive query, resulting in a depth-first search:


WITH RECURSIVE
  under_alice(name,level) AS (
    VALUES('Alice',0)
    UNION ALL
    SELECT org.name, under_alice.level+1
      FROM org JOIN under_alice ON org.boss=under_alice.name
     ORDER BY 2 DESC
  )
SELECT substr('..........',1,level*3) || name FROM under_alice;

The result of the modified query is:

Alice
...Bob
......Dave
......Emma
...Cindy
......Fred
......Gail

When ORDER BY is omitted from the recursive query, the queue behaves as a FIFO, which results in a breadth-first search.
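
Both traversal orders can be reproduced side by side with Python's `sqlite3` module. The helper `walk` is invented for this sketch; it simply substitutes the ORDER BY term into the query text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE org(
      name TEXT PRIMARY KEY,
      boss TEXT REFERENCES org
    ) WITHOUT ROWID;
    INSERT INTO org VALUES('Alice',NULL);
    INSERT INTO org VALUES('Bob','Alice');
    INSERT INTO org VALUES('Cindy','Alice');
    INSERT INTO org VALUES('Dave','Bob');
    INSERT INTO org VALUES('Emma','Bob');
    INSERT INTO org VALUES('Fred','Cindy');
    INSERT INTO org VALUES('Gail','Cindy');
""")

def walk(order_by):
    """Run the under_alice query with the given ORDER BY term."""
    sql = (
        "WITH RECURSIVE under_alice(name, level) AS ("
        "  VALUES('Alice', 0)"
        "  UNION ALL"
        "  SELECT org.name, under_alice.level + 1"
        "    FROM org JOIN under_alice ON org.boss = under_alice.name"
        "   ORDER BY " + order_by +
        ") SELECT substr('..........', 1, level*3) || name FROM under_alice"
    )
    return [r[0] for r in conn.execute(sql)]

breadth_first = walk("2")
depth_first = walk("2 DESC")
print("\n".join(breadth_first))
print("\n".join(depth_first))
```

The first listing groups people by level (breadth-first); the second finishes Bob's subtree before moving on to Cindy (depth-first).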
