Transferred from: http://www.oracle.com/technetwork/cn/articles/hartley-recursive-086819-zhs.html
Recursive database processing, also known as bill-of-materials (BOM) processing or parts explosion, applies to a wide range of applications, including human resources, manufacturing, financial markets, and education. The data involved in such processing is called tree-structured or hierarchical data. Oracle Database has long supported recursion through proprietary syntax (the CONNECT BY clause). Oracle Database 11g Release 2 adds support for recursion through subquery factoring, which offers a new and better way to solve an old problem: querying hierarchical data.
Table Expressions
First, let's review the SQL syntax on which the new 11g Release 2 feature builds. We use education as the illustrative domain for recursive processing. Our first example uses the following COURSE table:
CNO  CNAME                   CRED  CLABFEE  CDEPT
---  ----------------------  ----  -------  -----
C11  INTRO TO CS             3              CIS
C22  DATA STRUCTURES         3              CIS
C33  DISCRETE MATHEMATICS    3     0        CIS
C44  DIGITAL CIRCUITS        3     0        CIS
C55  COMPUTER ARCH.          3              CIS
C66  RELATIONAL DATABASE     3              CIS
C77  COMPUTER PROGRAMMING 1  3              CIS
P11  EMPIRICISM              3              PHIL
P22  RATIONALISM             3              PHIL
P33  EXISTENTIALISM          3              PHIL
P44  SOLIPSISM               6     0        PHIL
Each row in the table describes a course that is uniquely identified by the CNO column. Each course is offered by a department (CDEPT), carries the number of credits (CRED) a student earns by completing it, and has a lab fee (CLABFEE) charged to students who enroll in it. The following query shows the credit and lab-fee combinations of the courses offered by the philosophy department:
SELECT CRED, CLABFEE FROM COURSE WHERE CDEPT = 'PHIL';
CRED  CLABFEE
----  -------
3
3
3
6     0
Observation: The input to a query is a table, and the output of a query is also a table, the result table. The result table can itself be the target of a query: enclose the query in parentheses and place it in the FROM clause of another SELECT. Such a query can be called a table expression because it produces a table. It can also be called a subquery because it is a query within another query.
Support for subqueries makes it possible to write a query for the same problem in several ways. Suppose we want to find the courses offered by other departments whose credit and lab-fee combination matches that of some course offered by the philosophy department (that is, the output of the first query). Queries 1, 2, and 3 (given at the end of this article) provide three solutions. Executing any of the three queries produces the following result set:
CNO  CNAME                   CRED  CLABFEE  CDEPT
---  ----------------------  ----  -------  -----
C77  COMPUTER PROGRAMMING 1  3              CIS
C55  COMPUTER ARCH.          3              CIS
C11  INTRO TO CS             3              CIS
C22  DATA STRUCTURES         3              CIS
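The equivalence of these formulations can be checked outside Oracle. The following is a minimal sketch that uses Python's built-in sqlite3 module instead of Oracle Database, so it illustrates the query shapes rather than Oracle's behavior. The table is a cut-down COURSE table, and the lab fee values are assumptions made for this sketch, not figures taken from the article.

```python
import sqlite3

# Illustrative data only: the lab fees below are assumed values for this sketch.
rows = [
    ("C11", "INTRO TO CS", 3, 100, "CIS"),
    ("C33", "DISCRETE MATHEMATICS", 3, 0, "CIS"),
    ("C66", "RELATIONAL DATABASE", 3, 500, "CIS"),
    ("P11", "EMPIRICISM", 3, 100, "PHIL"),
    ("P44", "SOLIPSISM", 6, 0, "PHIL"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE course (cno TEXT PRIMARY KEY, cname TEXT, "
    "cred INTEGER, clabfee INTEGER, cdept TEXT)"
)
conn.executemany("INSERT INTO course VALUES (?, ?, ?, ?, ?)", rows)

# Correlated subquery with an existence test (same shape as Query 1).
exists_sql = """
    SELECT c1.cno FROM course c1
    WHERE c1.cdept <> 'PHIL'
      AND EXISTS (SELECT 1 FROM course c2
                  WHERE c2.clabfee = c1.clabfee
                    AND c2.cred = c1.cred
                    AND c2.cdept = 'PHIL')
    ORDER BY c1.cno
"""

# Join against a table expression in the FROM clause (same shape as Query 3).
join_sql = """
    SELECT c1.cno
    FROM course c1,
         (SELECT cred, clabfee FROM course WHERE cdept = 'PHIL') c2
    WHERE c1.clabfee = c2.clabfee
      AND c1.cred = c2.cred
      AND c1.cdept <> 'PHIL'
    ORDER BY c1.cno
"""

exists_result = [r[0] for r in conn.execute(exists_sql)]
join_result = [r[0] for r in conn.execute(join_sql)]
print(exists_result, join_result)
```

With this assumed data, only C11 shares a credit and lab-fee combination with a philosophy course, and both formulations return it. Note that the join form can return a course more than once if several philosophy courses share the same combination, which the EXISTS form cannot.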
Subquery Factoring
Subqueries can be taken a step further. Consider a query on a view. Conceptually, a view defines a result table against which a query can be executed. Suppose you could write an expression that associates a name with a result table; a query that uses that name would then be a query against that result table. Subquery factoring (also known as common table expressions) embodies this idea. The WITH clause assigns a name to a subquery block, and you can then use that name to reference the block in a query.
Using this approach, Query 4 finds the department with the highest total lab fees. The query contains a two-level aggregation. First, the SUM function is applied per department through the GROUP BY clause to determine each department's total fees. Second, the department with the highest total is determined from those per-department totals. DTOTAL is a named query that is designed to be referenced more than once. Without subquery factoring, the SELECT against the COURSE table would have to be coded twice, in two separate FROM clauses. Because the result of SUM(CLABFEE) is a derived value, the TOTFEE column alias is used in the subquery, and subsequent references to DTOTAL use this alias. In Oracle Database 11g Release 2, you can name the columns in the query name declaration (that is, supply the column aliases there) instead of renaming them inside the subquery:
WITH DTOTAL (CDEPT, TOTFEE) AS
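As a rough illustration of a named query that is referenced twice, here is a minimal sketch in Python's built-in sqlite3 module (not Oracle Database). It declares the column names next to the query name, as described above. The table contents, including the lab fees, are assumed values for this sketch only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (cno TEXT, clabfee INTEGER, cdept TEXT)")
# Illustrative fees only (assumed for this sketch).
conn.executemany("INSERT INTO course VALUES (?, ?, ?)", [
    ("C11", 100, "CIS"), ("C66", 500, "CIS"),
    ("P11", 100, "PHIL"), ("P33", 200, "PHIL"),
])

sql = """
    WITH dtotal (cdept, totfee) AS (
        -- first level of aggregation: total lab fee per department
        SELECT cdept, SUM(clabfee) FROM course GROUP BY cdept
    )
    SELECT cdept, totfee
    FROM dtotal                                      -- first reference
    WHERE totfee = (SELECT MAX(totfee) FROM dtotal)  -- second reference
"""
top_departments = conn.execute(sql).fetchall()
print(top_departments)
```

The grouped SELECT is written once and named; both the outer query and the MAX subquery read from that name instead of repeating the aggregation.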
Release 2 further extends the WITH clause to support recursive queries through a feature called the recursive WITH clause. First, let's take a quick look at Oracle's proprietary recursive syntax.
Recursion the "Old" Way
The process of retrieving data from a tree structure is called recursive processing. Traditionally, Oracle Database has supported recursive processing with two proprietary clauses, CONNECT BY and START WITH. CONNECT BY indicates that rows are to be retrieved in tree-structured order, and the condition specified in the clause expresses the parent-child relationship. If PRIOR appears in front of the parent column, the traversal goes down the tree; if PRIOR appears in front of the child column, the traversal goes up. START WITH specifies the starting point of the traversal, known as the seed. You can enter the tree at any node, and the START WITH clause determines that entry node.
To illustrate recursion, we use a modified version of the COURSE table in which a course can have another course as its prerequisite. A course never has more than one immediate prerequisite; however, a course can be a prerequisite for multiple courses. This relationship is recursive because it relates an entity to another entity of the same type. The relationship is represented in the COURSEX table, shown below.
CNO  PCNO  CNAME                   CRED  CLABFEE  CDEPT
---  ----  ----------------------  ----  -------  -----
C11        INTRO TO CS             3     100      CIS
C33  C11   DISCRETE MATHEMATICS    3     0        CIS
C22  C33   DATA STRUCTURES         3              CIS
C44  C33   DIGITAL CIRCUITS        3     0        CIS
C55  C44   COMPUTER ARCH.          3              CIS
C66  C22   RELATIONAL DATABASE     3              CIS
C77  C33   INTRO TO PROGRAMMING 1  3              CIS
P11        EMPIRICISM              3              PHIL
P22  P11   RATIONALISM             3              PHIL
P33  P11   EXISTENTIALISM          3              PHIL
P44        SOLIPSISM               6     0        PHIL
PCNO is the foreign key that establishes the relationship. If a course has no prerequisites, the foreign key value is NULL.
One of the main characteristics of a recursive relationship is that it can be represented as a tree structure. With such a structure, the terms "parent" and "child" describe the relationship between nodes in the tree. In Figure 1, C11 is the parent of C33, and C33 is a child of C11. Nodes that have no parent (such as C11 and P11) correspond to courses that have no prerequisite. These nodes sit at the top of the tree and act as root nodes. Nodes that have no children (such as C66 and C55) appear at the bottom of the tree and are called leaf nodes.
Figure 1: A recursive relationship represented as a tree structure
Query 5 uses recursive processing to retrieve the course number and course name of course C22 and of every course that is a direct or indirect prerequisite for it. Running the query produces the following output:
CNO  PCNO  CNAME
---  ----  --------------------
C22  C33   DATA STRUCTURES
C33  C11   DISCRETE MATHEMATICS
C11  -     INTRO TO CS
The LEVEL pseudocolumn can be referenced in a SELECT statement that uses CONNECT BY. The tree is always entered at level 1. The level increases as the traversal moves away from the seed and decreases as it moves back toward the seed.
Recursion the "New" Way
To recurse through subquery factoring, you define a named subquery with the WITH clause and then write a query against that named subquery. Query 6 uses the new recursive WITH clause feature to produce the same results as the CONNECT BY query shown in Query 5. The named subquery contains two query blocks combined by a UNION ALL operation. The first query block is the initialization subquery (also known as the anchor member); it is coded non-recursively and includes the seed that determines the starting point of the traversal. The system processes this subquery first. The second query block is the recursive subquery, which adds rows to the result based on their relationship to rows already in the result. The trick is to define how new rows relate to old rows: new rows are identified by joining the named query to the same table that the anchor member reads. UNION ALL combines the anchor member with the recursive subquery and ensures that duplicate rows are not eliminated from the result. The two query blocks must be union compatible, which requires in particular that both select the same number of columns.
The aliases in the list that follows the query name become the column names of the named query's result table. These aliases can be referenced in the recursive subquery and in subsequent queries against the named query.
Recursion requires a termination condition. Each time the recursive subquery is executed, it reads the temporary view established by the common table expression, but it sees only the rows that the previous iteration added to that view. The system keeps evaluating the recursive subquery until an iteration adds no new rows to the temporary view.
Let's now walk through, conceptually, how this process works for Query 6. First, the initialization subquery is executed to seed the temporary view. Executing this subquery adds the following row to the temporary view (named C here):
C22  C33  DATA STRUCTURES
After the initialization subquery is executed, the recursive subquery is executed by joining against the contents of the temporary view. In effect, the following query is executed:
SELECT X.CNO, X.PCNO, X.CNAME
FROM (SELECT CNO, PCNO, CNAME FROM COURSEX WHERE CNO = 'C22') C, COURSEX X
WHERE C.PCNO = X.CNO;
Executing the query adds the following row to the temporary view:
C33  C11  DISCRETE MATHEMATICS
The recursive subquery is executed again, joining against the row just added to the temporary view. In effect, the following query is executed:
SELECT X.CNO, X.PCNO, X.CNAME
FROM (SELECT X.CNO, X.PCNO, X.CNAME
      FROM (SELECT CNO, PCNO, CNAME FROM COURSEX WHERE CNO = 'C22') C, COURSEX X
      WHERE C.PCNO = X.CNO) C, COURSEX X
WHERE C.PCNO = X.CNO;
Executing the query adds the following row to the temporary view:
C11  -    INTRO TO CS
The recursive subquery is executed once more, joining against the newly added row. This time the query produces no rows, so nothing is added to the temporary view and processing stops. This event is the termination condition.
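The iteration just described can be imitated by hand. The sketch below uses Python's built-in sqlite3 module (not Oracle Database) and a cut-down COURSEX table. It first runs the seed query and then keeps joining only the rows added in the previous round until a round adds nothing, and finally runs the equivalent recursive common table expression for comparison. The variable names are hypothetical, and SQLite spells the clause WITH RECURSIVE, unlike Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coursex (cno TEXT PRIMARY KEY, pcno TEXT, cname TEXT)")
conn.executemany("INSERT INTO coursex VALUES (?, ?, ?)", [
    ("C11", None, "INTRO TO CS"),
    ("C33", "C11", "DISCRETE MATHEMATICS"),
    ("C22", "C33", "DATA STRUCTURES"),
    ("C44", "C33", "DIGITAL CIRCUITS"),
])

# Initialization subquery: the seed row.
new_rows = conn.execute(
    "SELECT cno, pcno, cname FROM coursex WHERE cno = 'C22'"
).fetchall()

manual_result = []
rounds_that_added_rows = 0
while new_rows:                      # terminate when a round adds no rows
    manual_result.extend(new_rows)
    rounds_that_added_rows += 1
    next_rows = []
    # Recursive step: join only the rows added by the previous round.
    for _cno, pcno, _cname in new_rows:
        next_rows.extend(conn.execute(
            "SELECT cno, pcno, cname FROM coursex WHERE cno = ?", (pcno,)
        ).fetchall())
    new_rows = next_rows

# The same traversal written as a recursive common table expression.
cte_result = conn.execute("""
    WITH RECURSIVE c (cno, pcno, cname) AS (
        SELECT cno, pcno, cname FROM coursex WHERE cno = 'C22'
        UNION ALL
        SELECT x.cno, x.pcno, x.cname FROM c, coursex x WHERE c.pcno = x.cno
    )
    SELECT cno, pcno, cname FROM c
""").fetchall()

print(manual_result)
print(rounds_that_added_rows)
```

The loop stops after the row for C11 is processed, because its PCNO is NULL and a comparison with NULL matches no row. That empty round plays the role of the termination condition.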
Traversal Direction
The condition specified in the recursive subquery expresses the parent-child relationship. Qualifying the parent column (CNO) with the named query (C) makes the traversal go down the tree. The starting point of the traversal is determined by the seed in the initialization subquery. You can also walk up the tree to access information stored in parent and ancestor nodes: qualifying the child column (PCNO) with the named query makes the traversal go up.
The LEVEL pseudocolumn can be used only with the CONNECT BY clause. However, the same effect can be achieved by introducing an additional alias into the query. Query 7 demonstrates this technique, using an alias named LVL to record the level, that is, the distance from the seed. Executing the query produces the following results:
LVL  CNO  PCNO  CNAME
---  ---  ----  --------------------
1    C22  C33   DATA STRUCTURES
2    C33  C11   DISCRETE MATHEMATICS
3    C11        INTRO TO CS
The seed of the query is C22, so the LVL value of the corresponding row in the result table is 1. As Figure 1 shows, course C33 is the parent of C22, so the LVL value for that row is 2. Course C11 is the parent of C33, one more level up from the seed, so the LVL value in the last row of the result table is 3.
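To make the effect of the join condition concrete, here is a minimal sketch in Python's built-in sqlite3 module (not Oracle Database). It runs the same recursive query twice over a cut-down COURSEX table and changes only which column is qualified by the named query C. The helper walk and the constants UP and DOWN are hypothetical names introduced for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coursex (cno TEXT PRIMARY KEY, pcno TEXT)")
conn.executemany("INSERT INTO coursex VALUES (?, ?)", [
    ("C11", None), ("C33", "C11"), ("C22", "C33"),
    ("C44", "C33"), ("C66", "C22"),
])

# The join condition decides the direction; both strings are fixed constants.
UP = "c.pcno = x.cno"     # from a course to its prerequisite
DOWN = "c.cno = x.pcno"   # from a course to the courses that require it

def walk(seed, join_condition):
    """Return (lvl, cno) pairs reachable from seed, with the seed at level 1."""
    sql = f"""
        WITH RECURSIVE c (lvl, cno, pcno) AS (
            SELECT 1, cno, pcno FROM coursex WHERE cno = ?
            UNION ALL
            SELECT c.lvl + 1, x.cno, x.pcno
            FROM c, coursex x
            WHERE {join_condition}
        )
        SELECT lvl, cno FROM c ORDER BY lvl, cno
    """
    return conn.execute(sql, (seed,)).fetchall()

up_from_c22 = walk("C22", UP)
down_from_c33 = walk("C33", DOWN)
print(up_from_c22)
print(down_from_c33)
```

Walking up from C22 visits C22, C33, and C11 at levels 1, 2, and 3, as in the Query 7 result. Walking down from C33 visits C33 at level 1, C22 and C44 at level 2, and C66 at level 3.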
Recursion and Cycles
A special case in hierarchical data is a cycle, which occurs when a descendant is also an ancestor. If CONNECT BY detects a cycle, it reports an error for the recursive query. In Oracle Database, you can make the system return the query results despite the cycle by specifying NOCYCLE; without this parameter, the query fails because the data contains a cycle. The CONNECT_BY_ISCYCLE pseudocolumn indicates whether the current row has a child that is also its ancestor.
The following HAS_A_CYCLE table contains a cycle: C22 and C33 are prerequisites of each other, so each is the parent of the other.
CNO  PCNO
---  ----
C11
C22  C11
C33  C22
C22  C33
Executing a recursive CONNECT BY query without the NOCYCLE parameter results in the following error:
ORA-01436: CONNECT BY loop in user data
Recursion through subquery factoring uses the CYCLE clause to mark cycles during processing. In this clause you list columns of the named query, and the system uses those columns to detect cycles. With recursive subquery factoring, the notion of a cycle is also broader: a cycle exists if the values of the cycle columns in one of a row's ancestors are the same as the values of the cycle columns in the current row. The columns used to detect cycles are not limited to the columns that define the recursive relationship.
The SET clause generates a column in the result, called the cycle mark, whose value indicates whether a cycle was detected for the current row. If a cycle is detected, the search for that row's children stops. If no cycle is detected, the cycle mark is set to the specified default value. The value of the cycle mark must be a single character. As with CONNECT BY, if the query does not include cycle detection (that is, there is no CYCLE clause), an error occurs as soon as a cycle is encountered. Query 8 includes a CYCLE clause to detect cycles and continue processing. As the result table below shows, the cycle mark is accessible as a column of the result, although it is not in scope inside the named query itself.
CNO  PCNO  CYCLEMARKER
---  ----  -----------
C11        N
C22  C11   N
C33  C22   N
C22  C33   Y
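The CYCLE clause is Oracle syntax, and SQLite, which Python's built-in sqlite3 module wraps, has no equivalent. The sketch below therefore swaps in a different technique to imitate the behavior described above: it carries a path string of visited CNO values, marks a row 'Y' when its CNO already appears among its ancestors, and stops expanding marked rows. The path and lvl columns are additions made for this sketch and are not part of Query 8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE has_a_cycle (cno TEXT, pcno TEXT)")
conn.executemany("INSERT INTO has_a_cycle VALUES (?, ?)", [
    ("C11", None), ("C22", "C11"), ("C33", "C22"), ("C22", "C33"),
])

sql = """
    WITH RECURSIVE c (lvl, cno, pcno, path, cyclemarker) AS (
        SELECT 1, cno, pcno, '/' || cno || '/', 'N'
        FROM has_a_cycle
        WHERE cno = 'C11'
        UNION ALL
        SELECT c.lvl + 1, x.cno, x.pcno,
               c.path || x.cno || '/',
               -- emulate the cycle mark: has this CNO been seen on the path?
               CASE WHEN instr(c.path, '/' || x.cno || '/') > 0
                    THEN 'Y' ELSE 'N' END
        FROM c, has_a_cycle x
        WHERE x.pcno = c.cno
          AND c.cyclemarker = 'N'   -- do not expand a row marked as a cycle
    )
    SELECT cno, pcno, cyclemarker FROM c ORDER BY lvl
"""
marked_rows = conn.execute(sql).fetchall()
for row in marked_rows:
    print(row)
```

Without the condition on cyclemarker, the recursive member would keep alternating between C22 and C33 and the query would not terminate. In Oracle, the CYCLE clause performs both the marking and the stopping.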
Search Order
Another enhancement to recursive processing is the ability to specify the traversal order. You can specify either DEPTH FIRST or BREADTH FIRST ordering. In a depth-first traversal, a node's children are returned before that node's siblings (that is, nodes with the same parent). In a breadth-first traversal, all the rows at one level of the hierarchy are returned before the traversal descends to the next level, so a node's siblings are returned before its children. Siblings are ordered by the values in the columns listed after the BY keyword, in ascending (ASC) or descending (DESC) order.
Use the SET clause to expose the order in which nodes are visited during the search. It introduces a column alias that can be used in the final query to display or sort the results. Whereas the LEVEL value in Oracle's CONNECT BY processing rises and falls as the traversal moves away from or back toward the seed, the value of the alias in the SET clause increases continuously throughout the traversal. Query 9 illustrates a depth-first search in which siblings are ordered by their lab fee values. Executing the query produces the following results:
CNO  PCNO  CLABFEE  XX
---  ----  -------  --
C11                 1
C33  C11   0        2
C77  C33            3
C22  C33            4
C66  C22            5
C44  C33   0        6
C55  C44            7
Observe the order of the sibling nodes C77, C22, and C44. They appear in the output in order of each course's lab fee. Of these courses, C77 has the highest lab fee, so it appears first in the specified descending sequence.
To motivate the use of a breadth-first traversal, we make the following assumptions:
- Courses with no prerequisites are first-year courses.
- Courses with one prerequisite are second-year courses.
- Courses with several prerequisites are third- or fourth-year courses.
A breadth-first traversal of the curriculum produces an ordering of courses that reflects the year-by-year progression of the school's bachelor's program. Changing the search in Query 9 to BREADTH FIRST produces the following output:
CNO  PCNO  CLABFEE  XX
---  ----  -------  --
C11                 1
C33  C11   0        2
C77  C33            3
C22  C33            4
C44  C33   0        5
C66  C22            6
C55  C44            7
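The two visit orders can be approximated outside Oracle. SQLite, used here through Python's built-in sqlite3 module, has no SEARCH clause, so this sketch swaps in a different mechanism: an ORDER BY on the recursive member, which controls which pending row is expanded next. Ordering by level descending gives a depth-first visit, and ordering by level ascending gives a breadth-first one. The prerequisite links follow the COURSEX table, but the lab fees are assumed values, and visit_order is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coursex (cno TEXT PRIMARY KEY, pcno TEXT, clabfee INTEGER)")
# Prerequisite links follow the COURSEX table; the lab fees are assumed values.
conn.executemany("INSERT INTO coursex VALUES (?, ?, ?)", [
    ("C11", None, 100), ("C33", "C11", 0), ("C22", "C33", 50),
    ("C44", "C33", 0), ("C55", "C44", 100), ("C66", "C22", 500),
    ("C77", "C33", 200),
])

def visit_order(level_direction):
    """Return course numbers in visit order; level_direction is 'ASC' or 'DESC'."""
    if level_direction not in ("ASC", "DESC"):
        raise ValueError("level_direction must be 'ASC' or 'DESC'")
    sql = f"""
        WITH RECURSIVE c (lvl, cno, pcno, clabfee) AS (
            SELECT 0, cno, pcno, clabfee FROM coursex WHERE cno = 'C11'
            UNION ALL
            SELECT c.lvl + 1, x.cno, x.pcno, x.clabfee
            FROM c, coursex x
            WHERE c.cno = x.pcno
            -- column 1 is lvl, column 4 is clabfee
            ORDER BY 1 {level_direction}, 4 DESC
        )
        SELECT cno FROM c
    """
    return [row[0] for row in conn.execute(sql)]

depth_first = visit_order("DESC")    # deepest pending row is expanded next
breadth_first = visit_order("ASC")   # shallowest pending row is expanded next
print(depth_first)
print(breadth_first)
```

With these assumed fees, the two lists should follow the same course order as the depth-first and breadth-first result tables above. One difference from Oracle's SEARCH clause is that this ordering compares all pending rows on a level by fee, not only siblings that share a parent; the two happen to coincide for this small tree.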
Traversing a Network
The method used to traverse a tree structure can also be used to traverse a network. A network structure is made up of many-to-many relationships. For example, if a course may have multiple prerequisites and may also be a prerequisite for multiple courses, you need a separate table to represent the relationship. You can traverse such a table with either the older CONNECT BY syntax or the new subquery factoring syntax.
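As a rough sketch of what such a traversal could look like, the example below uses Python's built-in sqlite3 module (not Oracle Database) and a hypothetical PREREQ table that holds one row per course and prerequisite pair. Neither the table name nor its contents come from the article, and the data is assumed to contain no cycles. Because a course can be reached along more than one path in a network, the final query removes duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical relationship table: one row per (course, prerequisite) pair.
conn.execute("CREATE TABLE prereq (cno TEXT, pcno TEXT, PRIMARY KEY (cno, pcno))")
conn.executemany("INSERT INTO prereq VALUES (?, ?)", [
    ("C66", "C22"), ("C66", "C33"),   # C66 has two prerequisites
    ("C22", "C33"),                   # C33 is a prerequisite for two courses
    ("C33", "C11"),
])

sql = """
    WITH RECURSIVE c (cno) AS (
        SELECT pcno FROM prereq WHERE cno = 'C66'        -- direct prerequisites
        UNION ALL
        SELECT p.pcno FROM c, prereq p WHERE p.cno = c.cno
    )
    SELECT DISTINCT cno FROM c ORDER BY cno
"""
all_prerequisites = [row[0] for row in conn.execute(sql)]
print(all_prerequisites)
```

Here C33 is reached both directly from C66 and through C22, so it appears twice in the intermediate result and once after DISTINCT.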
Summary
The new recursive WITH clause feature of Oracle Database 11g Release 2 provides a fresh way to work with hierarchical data. It also provides more powerful cycle detection and lets you choose depth-first or breadth-first traversal when processing the data.
In this article, we have briefly described these features using a few very simple use cases. For more details, see the Resources section.
Query 1: Correlated subquery using an existence test

SELECT *
FROM COURSE C1
WHERE CDEPT <> 'PHIL'
  AND EXISTS (SELECT *
              FROM COURSE C2
              WHERE C2.CLABFEE = C1.CLABFEE
                AND C2.CRED = C1.CRED
                AND C2.CDEPT = 'PHIL');

Query 2: Subquery returning multiple columns

SELECT *
FROM COURSE
WHERE CDEPT <> 'PHIL'
  AND (CRED, CLABFEE) IN (SELECT CRED, CLABFEE
                          FROM COURSE
                          WHERE CDEPT = 'PHIL');

Query 3: Join using a table expression

SELECT C1.*
FROM COURSE C1,
     (SELECT CRED, CLABFEE FROM COURSE WHERE CDEPT = 'PHIL') C2
WHERE C1.CLABFEE = C2.CLABFEE
  AND C1.CRED = C2.CRED
  AND C1.CDEPT <> 'PHIL';

Query 4: Subquery factoring example

WITH DTOTAL AS
  (SELECT CDEPT, SUM(CLABFEE) AS TOTFEE
   FROM COURSE
   GROUP BY CDEPT)
SELECT CDEPT, TOTFEE
FROM DTOTAL
WHERE TOTFEE = (SELECT MAX(TOTFEE) FROM DTOTAL);

Query 5: Recursive processing using CONNECT BY

SELECT CNO, PCNO, CNAME
FROM COURSEX
CONNECT BY CNO = PRIOR PCNO
START WITH CNO = 'C22';

Query 6: Using the recursive WITH clause

WITH C (CNO, PCNO, CNAME) AS
  (SELECT CNO, PCNO, CNAME          -- initialization subquery
   FROM COURSEX
   WHERE CNO = 'C22'                -- seed
   UNION ALL
   SELECT X.CNO, X.PCNO, X.CNAME    -- recursive subquery
   FROM C, COURSEX X
   WHERE C.PCNO = X.CNO)
SELECT CNO, PCNO, CNAME
FROM C;

Query 7: Reporting the level in a recursive subquery

WITH C (LVL, CNO, PCNO, CNAME) AS
  ((SELECT 1, CNO, PCNO, CNAME
    FROM COURSEX
    WHERE CNO = 'C22')
   UNION ALL
   (SELECT C.LVL + 1, X.CNO, X.PCNO, X.CNAME
    FROM C, COURSEX X
    WHERE C.PCNO = X.CNO))
SELECT LVL, CNO, PCNO, CNAME
FROM C;

Query 8: Handling cycles in a recursive subquery

WITH C (CNO, PCNO) AS
  (SELECT CNO, PCNO                 -- initialization subquery
   FROM HAS_A_CYCLE
   WHERE CNO = 'C11'                -- seed
   UNION ALL
   SELECT X.CNO, X.PCNO             -- recursive subquery
   FROM C, HAS_A_CYCLE X
   WHERE X.PCNO = C.CNO)
CYCLE CNO SET CYCLEMARKER TO 'Y' DEFAULT 'N'
SELECT CNO, PCNO, CYCLEMARKER
FROM C;

Query 9: Recursion using the SEARCH clause

WITH C (LVL, CNO, PCNO, CLABFEE) AS
  (SELECT 0, CNO, PCNO, CLABFEE
   FROM COURSEX
   WHERE CNO = 'C11'
   UNION ALL
   SELECT C.LVL + 1, X.CNO, X.PCNO, X.CLABFEE
   FROM C, COURSEX X
   WHERE C.CNO = X.PCNO)
SEARCH DEPTH FIRST BY CLABFEE DESC SET XX
SELECT CNO, PCNO, CLABFEE, XX
FROM C
ORDER BY XX;
Resources
- Oracle/SQL: A Professional Programmer's Guide, Tim Hartley and Tim Martyn (McGraw-Hill, 1992)
- SQL:1999 Understanding Relational Language Components, Jim Melton and Alan Simon (Morgan Kaufmann, 2002)
- What's New in the SQL Language Reference, Oracle Database 11g Release 2