Detailed description of start with... connect by statement
Simple tree query for Oracle (recursive query)
Deptid |
Paredeptid |
Name |
Number |
Number |
Char (40 bytes) |
Department ID |
Parent department ID (Department ID) |
Department name |
Tracing data to the root node through a subnode.
SQL code
- Select * from persons. Dept start with deptid = 76 connect by prior paredeptid = deptid
Select * from persons. Dept start with deptid = 76 connect by prior paredeptid = deptid
Traverse sub-nodes through the root node.
SQL code
- Select * from persons. Dept start with paredeptid = 0 connect by prior deptid = paredeptid
Select * from persons. Dept start with paredeptid = 0 connect by prior deptid = paredeptid
You can use the level keyword to query the level.
SQL code
- Select a. *, level from persons. dept a start with paredeptid = 0 connect by prior deptid = paredeptid
Select a. *, level from persons. dept a start with paredeptid = 0 connect by prior deptid = paredeptid
Review the usage of start with... connect by. What follows start with is the seeds of recursion.
The seeds of recursion are the places where recursion starts. Connect by is followed by "prior". If the default value is: only the starting row that meets the condition can be queried, and no recursive query is performed;
The fields placed after connect by prior indicate the direction of the query.
Exercise: Obtain the top node through the subnode
SQL code
- Select first_value (deptid) over (order by level DESC rows unbounded preceding) as firstdeptid from persons. Dept start with deptid = 76 connect by prior paredeptid = deptid
Select first_value (deptid) over (order by level DESC rows unbounded preceding) as firstdeptid from persons. Dept start with deptid = 76 connect by prior paredeptid = deptid
/*************************************** ****************************************
You can use the start with... connect by... clause to implement SQL hierarchical query.
Since Oracle 9i, you can use the sys_connect_by_path function to display the content of the parent node to the current row in the form of "path" or a list of hierarchical elements.
Since Oracle 10g, there are other new features about hierarchical queries. For example, sometimes users are more concerned with the content with the lowest level in each layer branch.
Then you can use the pseudo-column function connect_by_isleaf to determine whether the current row is a leaf. If it is a leaf, "1" is displayed in the pseudo column ",
If it is not a leaf but a branch (for example, the current content is the father of another row), "0" is displayed ".
In Oracle versions earlier than 10 Gb, if a ring loop occurs in your tree (for example, a child node references a Father's Day node ),
Oracle returns an error message: "ORA-01436: connect by loop in user data ". If the reference to the father is not deleted, the query operation cannot be performed.
In Oracle 10 Gb, you only need to specify "nocycle" to perform any query operation. There is also a pseudo column related to this keyword -- connect_by_iscycle,
If the content of a Father's Day point is referenced in the current row and a loop occurs in the tree, "1" is displayed in the pseudo column of the row; otherwise, "0" is displayed ".
The start with .. connect by clause can be used to select data that has a hierarchical relationship
(Usually some sort of parent-> child, boss-> employee or thing-> parts ).
It is also being used when an SQL Execution Plan is explained.
Syntax:
Select... [start with initial-condition] connect by [nocycle] recurse-Condition
Level
With level it is possible to show the level in the hierarchical relation of all the data.
-- Oracle 9i
Sys_connect_by_path
With sys_connect_by_path it is possible to show the entire path from the top level down to the 'actual' child.
-- Oracle 10g
Connect_by_root
Connect_by_root is a new operator that comes with Oracle 10g and enhances the ability to perform hierarchical queries.
Connect_by_is_leaf
Connect_by_isleaf is a new operator that comes with Oracle 10g and enhances the ability to perform hierarchical queries.
Connect_by_iscycle
Connect_by_is_cycle is a new operator that comes with Oracle 10g and enhances the ability to perform hierarchical queries.
-- Start with... connect...
How must a start with... connect by select statement be read and interpreted?
If Oracle encounters such an SQL statement, it proceeds as described in the following pseudo code.
For REC in (select * From some_table) loop
If fullfills_start_with_condition (REC) then
Recurse (REC, Rec. Child );
End if;
End loop;
Procedure recurse (REC in matches_select_stmt, new_parent in field_type) is
Begin
Append_result_list (REC );
For rec_recurse in (select * From some_table) loop
If fullfills_connect_by_condition (rec_recurse.child, new_parent) then
Recurse (rec_recurse, rec_recurse.child );
End if;
End loop;
End procedure recurse;
Created by zhouwf0726 2006.
**************************************** ***************************************/
-- Create a test table and add Test Data
Create Table Test (superid varchar2 (20), Id varchar2 (20 ));
Insert into test values ('0', '1 ');
Insert into test values ('0', '2 ');
Insert into test values ('1', '11 ');
Insert into test values ('1', '12 ');
Insert into test values ('2', '21 ');
Insert into test values ('2', '22 ');
Insert into test values ('11', '20140901 ');
Insert into test values ('11', '20140901 ');
Insert into test values ('12', '123 ');
Insert into test values ('12', '123 ');
Insert into test values ('21', '123 ');
Insert into test values ('21', '123 ');
Insert into test values ('22', '123 ');
Insert into test values ('22', '123 ');
Commit;
-- Hierarchical query example
Select level | 'lay', lpad ('', level * 5) | ID
From Test
Start with superid = '0' connect by prior id = superid;
Select level | 'level', connect_by_isleaf, lpad ('', level * 5) | ID
From Test
Start with superid = '0' connect by prior id = superid;
-- Two previous examples in "database string grouping plus four" are provided to understand start with... connect...
-- Function: grouping by superid to connect IDs ";"
-- Implementation: In the following two examples, the connect by connection is achieved by constructing two pseudo columns.
/* ------ Method One ------*/
Select superid, ltrim (max (sys_connect_by_path (ID, ';'), ';') from (
Select superid, ID, row_number () over (partition by superid order by superid) id1,
Row_number () over (order by superid) + dense_rank () over (order by superid) Id2
From Test
)
Start with id1 = 1 connect by prior Id2 = ID2-1
Group by superid order by superid;
/* ------ Method two ------*/
Select distinct superid, ltrim (first_value (ID) over (partition by superid order by l desc ),';')
From (
Select superid, Level L, sys_connect_by_path (ID, ';') ID
From (
Select superid, ID, superid | rownum parent_rn, superid | to_char (rownum-1) Rn
From Test
)
Connect by prior parent_rn = rn
);
-- The following example adds the numbers on each digit of an integer. Through this example, we can understand connect by again.
Create or replace function f_digit_add (innum integer) return number
Is
Outnum integer;
Begin
If innum <0 then
Return 0;
End if;
Select sum (Nm) into outnum from (
Select substr (innum, rownum, 1) nm from dual connect by rownum <length (innum)
);
Return outnum;
End f_digit_add;
/
Select f_digit_add (123456) from dual;
/*************************************** **************************************** ***
**************************************** **************************************** ***
The following is an error that the author mentioned in this example about how SQL solves the directed graph problem.
Select * from fares connect by prior arrive = depart start with depart = 'lhr ';
Error:
ORA-01436: connect by loop in user data
You can use the nocycle parameter of connect by in Oracle10g or later versions. If you are interested, you can use an SQL statement to implement the directed graph problem!
**************************************** **************************************** ***
**************************************** **************************************** **/
A common high-level computer science problem can be described under the scope of "Directed Graph. A directed graph is a finite set of nodes connected by a group of vectors and edges.
For example, a node can be considered as a "city", and each vector can be considered as a "Route" between two cities ".
There are many algorithms and papers on how to solve the traversal problem of each possible route and find the shortest path or the minimum cost path.
Most of these algorithms are procedural or recursive. However, the declarative language of SQL makes it easier to solve complex digraphs,
There is no need for much code.
Let's use the route between two cities as an example to create a table to save some hypothetical data:
Create Table airports
(
Code char (3) Constraint airports_pk primary key,
Description varchar2 (200)
);
Insert into airports values ('lhr', 'London Heathrow, UK ');
Insert into airports values ('jfk ', 'New York-Kennedy, USA ');
Insert into airports values ('gru', 'sao Paulo, Brazil ');
Create Table fares
(
Depart char (3 ),
Arrive char (3 ),
Price number,
Constraint fares_pk primary key (depart, arrive ),
Constraint fares_depart_fk foreign key (depart) References airports,
Constraint fares_arrive_fk foreign key (arrive) References airports
);
Insert into fares values ('lhr', 'jfk, 700 );
Insert into fares values ('jfk ', 'gru', 600 );
Insert into fares values ('lhr', 'gru', 1500 );
Insert into fares values ('gru', 'lhr', 1600 );
You cannot use the connect by syntax to solve how to migrate data from London to Sao Paulo, because data in the figure generates a loop (from Sao Paulo ):
Select * from fares connect by prior arrive = depart start with depart = 'lhr ';
Error:
ORA-01436: connect by loop in user data
To solve the directed graph problem, we need to create a temporary table to save all possible paths between two nodes. We must note that we do not copy paths that have already been processed,
In this case, we do not want to go back to the same location at the beginning. I also want to track the number of flights required to reach the destination and the descriptions of the routes taken.
Use the following script to create a temporary table:
Create global temporary table faretemp
(
Depart char (3 ),
Arrive char (3 ),
Hops integer,
Route varchar2 (30 ),
Price number,
Constraint faretemp_pk primary key (depart, arrive)
);
A simple view can slightly simplify the code used in this example. The view can calculate a path from the faretemp table based on a single range in the fares table.
Data that reaches the next voyage:
Create or replace view nexthop
As
Select SRC. Depart,
DST. arrive,
SRC. Hops + 1 hops,
SRC. Route | ',' | DST. Arrive route,
SRC. Price + DST. Price
From faretemp SRC, fares DST
Where SRC. Arrive = DST. depart
And DST. arrive! = SRC. depart;
/
Show errors;
This algorithm is quite simple. First, fill the faretemp table with the data in the fares table as the initial range. Then, retrieve all the data we just inserted,
Use them to create all possible two-hop paths. Repeat this process until a new path is created between the two nodes.
The loop process exits after all possible paths between nodes are described. If we are only interested in a certain starting condition,
We can also limit the first insertion to reduce the amount of data loaded. The code for path discovery is as follows:
Truncate table faretemp;
Begin
-- Initial connections
Insert into faretemp
Select depart, arrive, 1, depart | ',' | arrive, price from fares;
While SQL % rowcount> 0 Loop
Insert into faretemp
Select depart, arrive, hops, route, price from nexthop
Where (depart, arrive)
Not in (select depart, arrive from faretemp );
End loop;
End;
/
Show errors;
Select * From faretemp order by depart, arrive;
You can view the output in table.
There is a small problem with the previous data. Data is a set of Shortest Paths (minimum number of flights) between points. However, the flight from London to Sao Paulo is not the cheapest.
To solve the cheapest cost problem, we need to make an improvement on our cycle. We should use this route to replace the original route when we find a cheaper route in a flight.
The modified code is as follows:
Truncate table faretemp;
Declare
Rochelle count integer;
Begin
-- Initial connections
Insert into faretemp
Select depart, arrive, 1, depart | ',' | arrive, price from fares;
Rochelle count: = SQL % rowcount;
While l_count> 0 Loop
Update faretemp
Set (hops, route, price) =
(Select hops, route, price from nexthop
Where depart = faretemp. depart
And arrive = faretemp. Arrive)
Where (depart, arrive) in
(Select depart, arrive from nexthop
Where price <faretemp. Price );
Rochelle count: = SQL % rowcount;
Insert into faretemp
Select depart, arrive, hops, route, price from nexthop
Where (depart, arrive)
Not in (select depart, arrive from faretemp );
Rochelle count: = Rochelle count + SQL % rowcount;
End loop;
End;
/
Show errors;
Select * From faretemp order by depart, arrive;
You may view the output in table B.
The algorithm finds that LHR, JFK, and Gru routes are cheaper than LHR and Gru routes, so the former replaces the latter. The cycle will be at no lower cost,
And exit when there are no other possible routes.