Principle and Implementation of topological sorting

Last Update:2014-12-04 Source: Internet

Author: User


This article covers topological sorting from the following aspects:

- Definition and preconditions of topological sorting
- Its relationship to the concepts of partial order and total order in discrete mathematics
- Typical implementation algorithms
  - Kahn's algorithm
  - DFS-based algorithm
- Uniqueness of the solution
- A practical example
    
Definition and preconditions:
Definition: a topological sort arranges the vertices of a directed graph in a linear order such that, for every directed edge uv from vertex u to vertex v, u appears before v in the result.
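
The definition can be turned directly into a check. The following is a minimal sketch, not part of the original article: the class name TopoOrderCheck and the edge-array representation are assumptions made for illustration. It only tests the "u before v" condition for each edge and assumes the order lists every vertex once.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: verifies the defining property of a topological order.
final class TopoOrderCheck {
    /** Returns true if every edge {u, v} has u placed before v in the order. */
    static boolean isTopologicalOrder(List<Integer> order, int[][] edges) {
        // Record the position of each vertex in the proposed order.
        Map<Integer, Integer> position = new HashMap<>();
        for (int i = 0; i < order.size(); i++) {
            position.put(order.get(i), i);
        }
        for (int[] e : edges) {
            Integer pu = position.get(e[0]);
            Integer pv = position.get(e[1]);
            // A missing vertex or a "backwards" edge violates the definition.
            if (pu == null || pv == null || pu >= pv) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {0, 2}, {1, 3}, {2, 3}};
        System.out.println(isTopologicalOrder(List.of(0, 1, 2, 3), edges)); // true
        System.out.println(isTopologicalOrder(List.of(0, 2, 1, 3), edges)); // true
        System.out.println(isTopologicalOrder(List.of(1, 0, 2, 3), edges)); // false
    }
}
```

The first two calls show that one graph can have more than one valid order, which is the uniqueness question discussed later.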
    
If this definition seems abstract, consider a classic example: choosing courses. Anyone who has read a data structures textbook has probably seen it. Suppose I want to take a machine learning course, but first I must take some foundational courses, such as introduction to computer science, C programming, data structures, and algorithms. Deciding the order in which to take these courses is a topological sort. Each course is a vertex in a directed graph, and each directed edge records that one course must be taken before another. For a handful of courses we do this in our heads without noticing; stated as an algorithm, it is topological sorting.
    
So can every directed graph be topologically sorted? No. Continuing the example: suppose machine learning were a prerequisite for introduction to computer science, while introduction to computer science is (directly or indirectly) a prerequisite for machine learning. The two courses would then depend on each other, so neither could come first and no topological order would exist. In a directed graph this situation is called a cycle (loop). Therefore, a directed graph can be topologically sorted only if it is a directed acyclic graph (DAG).
    
Partial order and total order:
Partial order and total order (also called full order) are concepts from discrete mathematics.
I will not go into the formal definitions here; they can be found in any discrete mathematics textbook.
    
Both concepts can be illustrated with the example above. Suppose that after taking algorithms we can choose either machine learning or computer graphics. Then there is no required order between machine learning and computer graphics. Among all the courses we can choose, the relationship between any two courses is therefore either determined (one must come before the other) or undetermined (no required order), but never contradictory (a cycle). That is what a partial order means. In graph terms, no two vertices of the directed graph lie on a common cycle, and it does not matter whether a given pair of vertices is connected. A directed acyclic graph therefore always describes a partial order.
    
Once partial order is understood, total order is easy. A total order is a partial order in which, additionally, every pair of vertices has a determined order. In the graph, one of the two vertices must lead to the other, in one direction only; if they led to each other in both directions, they would form a cycle. A total order is thus a special case of a partial order. Back to the course example: if machine learning can be taken only after computer graphics (perhaps the course covers machine learning algorithms for graphics), then those two courses have a definite order, and what was only a partial order becomes a total order.
    
Partial and total orders appear in many places.
For example, sorting several distinct integers gives a unique result (in ascending order, here and below). Nobody would doubt that, but looking at this familiar fact in terms of partial and total order is instructive.
    
So how do partial and total order explain the uniqueness of a sorting result?
The order between two different integers is fixed: 1 is always less than 4, and nobody would claim that 1 is greater than or equal to 4. In other words, distinct integers are totally ordered. For a totally ordered structure (for example, an array of distinct integers), the result of linearizing it (sorting) is unique. One property we use to evaluate a sorting algorithm is stability: whether elements with equal values keep, in the output, the relative order in which they appeared in the input. Quicksort, for example, is unstable, because equal elements may end up in a different relative order than before sorting. Partial order explains this: the order between elements with equal values is undetermined, so they may appear in any relative order in the result. A stable sort such as insertion sort implicitly adds a second comparison for equal elements, namely their order of appearance: the element that appears earlier is placed before the one that appears later. This implicit comparison turns the partial order into a total order and so guarantees a unique result.
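
To make the tie-breaking idea concrete, here is a small sketch that is not from the original article. Each value is tagged with its input position (the class name StableSortDemo and the {key, index} array encoding are assumptions for illustration). Sorting by key alone with List.sort, which the Java documentation specifies as stable, gives the same result as sorting by the explicit total order (key, then original index).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo: a stable sort behaves like a sort on (key, original index).
final class StableSortDemo {
    /** Tags each key with the index at which it appeared: {key, index}. */
    static List<int[]> tag(int[] keys) {
        List<int[]> items = new ArrayList<>();
        for (int i = 0; i < keys.length; i++) {
            items.add(new int[] {keys[i], i});
        }
        return items;
    }

    /** Compares keys only; ties keep input order because List.sort is stable. */
    static List<int[]> sortByKey(int[] keys) {
        List<int[]> items = tag(keys);
        items.sort(Comparator.comparingInt((int[] it) -> it[0]));
        return items;
    }

    /** Makes the implicit tie-break explicit: (key, index) is a total order. */
    static List<int[]> sortByKeyThenIndex(int[] keys) {
        List<int[]> items = tag(keys);
        items.sort(Comparator.comparingInt((int[] it) -> it[0])
                             .thenComparingInt(it -> it[1]));
        return items;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int[] it : sortByKey(new int[] {3, 1, 3, 2, 1})) {
            sb.append(it[0]).append('@').append(it[1]).append(' ');
        }
        System.out.println(sb.toString().trim()); // 1@1 1@4 2@3 3@0 3@2
    }
}
```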
    
Extending this to topological sorting: the result is unique exactly when the vertices are totally ordered. Without a total order, the result of a topological sort is not unique. As shown later, if the topological order is unique, it also traces out a Hamilton path.
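
One way to test uniqueness on a concrete graph, which is not taken from the original article, is to run the in-degree based procedure described in the next section and watch the set of candidates: if two vertices are ever available at the same time, either could come next, so the order is not unique. The sketch below assumes the graph is given as adjacency lists; the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: a DAG has a unique topological order exactly when
// there is never more than one vertex with in-degree 0 to choose from.
final class UniqueTopoOrder {
    /** adj.get(v) lists the vertices w with an edge v -> w. Throws on a cycle. */
    static boolean hasUniqueOrder(List<List<Integer>> adj) {
        int n = adj.size();
        int[] indegree = new int[n];
        for (List<Integer> targets : adj) {
            for (int w : targets) {
                indegree[w]++;
            }
        }
        Deque<Integer> ready = new ArrayDeque<>();
        for (int v = 0; v < n; v++) {
            if (indegree[v] == 0) {
                ready.add(v);
            }
        }
        int processed = 0;
        boolean unique = true;
        while (!ready.isEmpty()) {
            if (ready.size() > 1) {
                unique = false; // two candidates could be emitted in either order
            }
            int v = ready.remove();
            processed++;
            for (int w : adj.get(v)) {
                if (--indegree[w] == 0) {
                    ready.add(w);
                }
            }
        }
        if (processed != n) {
            throw new IllegalArgumentException("graph has a cycle");
        }
        return unique;
    }
}
```

For the chain 0 -> 1 -> 2 this returns true; for a diamond (0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3) it returns false, because vertices 1 and 2 become available together.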
    
Typical implementation algorithms:
Kahn's algorithm:
The pseudocode below is excerpted from the Wikipedia description of Kahn's algorithm:

```
L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edges
while S is non-empty do
    remove a node n from S
    insert n into L
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into S
if graph has edges then
    return error (graph has at least one cycle)
else
    return L (a topologically sorted order)
```
    
The algorithm is quite intuitive. The key is to maintain a set of vertices whose in-degree is 0:
Each time, take a vertex out of the set (there is no special rule for which one; picking at random works, and so does a queue or a stack; the same applies below) and append it to the result list.
Then iterate over all edges leaving that vertex. Remove each edge and look at the vertex at its other end: if that vertex's in-degree is 0 after the edge is removed, add it to the zero in-degree set as well. Then take the next vertex out of the set, and so on.

When the set is empty, check whether any edges remain in the graph. If so, the graph contains at least one cycle. If not, return the result list; its order is a topological order of the graph.
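
The remark that any retrieval rule works can be tried out with a small self-contained sketch. It is not the article's implementation: it assumes plain adjacency lists instead of a graph class, and it picks a PriorityQueue as the container so that the smallest available vertex is always taken first, which makes the output deterministic. The name KahnSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of the steps above on adjacency lists.
final class KahnSketch {
    /**
     * adj.get(v) lists the vertices w with an edge v -> w.
     * Returns a topological order, or throws if the graph has a cycle.
     */
    static List<Integer> sort(List<List<Integer>> adj) {
        int n = adj.size();
        int[] indegree = new int[n];
        for (List<Integer> targets : adj) {
            for (int w : targets) {
                indegree[w]++;
            }
        }
        // Any container works here; a PriorityQueue hands back the smallest
        // available vertex each time.
        PriorityQueue<Integer> ready = new PriorityQueue<>();
        for (int v = 0; v < n; v++) {
            if (indegree[v] == 0) {
                ready.add(v);
            }
        }
        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int v = ready.poll();
            order.add(v);
            for (int w : adj.get(v)) {
                // "Removing" edge v -> w is modelled by decrementing w's in-degree.
                if (--indegree[w] == 0) {
                    ready.add(w);
                }
            }
        }
        // Vertices left unprocessed are exactly those stuck behind a cycle.
        if (order.size() != n) {
            throw new IllegalArgumentException("graph has at least one cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        // Edges: 2 -> 0, 2 -> 1, 0 -> 3, 1 -> 3
        List<List<Integer>> adj =
            List.of(List.of(3), List.of(3), List.of(0, 1), List.<Integer>of());
        System.out.println(sort(adj)); // [2, 0, 1, 3]
    }
}
```

Swapping the PriorityQueue for a stack or a FIFO queue may change which valid order is produced, but not the fact that the result is a topological order.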
    
Implementation code:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Digraph is the article's directed-graph class (not shown here); it is
// expected to provide getV(), getE() and adj(v).
public class KahnTopological
{
    private List<Integer> result; // stores the sorted vertices
    private Queue<Integer> setOfZeroIndegree; // vertices whose in-degree is 0
    private int[] indegrees; // current in-degree of each vertex
    private int edges; // number of edges not yet removed
    private Digraph di;

    public KahnTopological(Digraph di)
    {
        this.di = di;
        this.edges = di.getE();
        this.indegrees = new int[di.getV()];
        this.result = new ArrayList<>();
        this.setOfZeroIndegree = new ArrayDeque<>();

        // Count the in-degree of every vertex
        for (int v = 0; v < di.getV(); v++)
        {
            // for each edge v -> w
            for (int w : di.adj(v))
            {
                indegrees[w]++;
            }
        }

        // Initialize the set of vertices with in-degree 0
        for (int i = 0; i < indegrees.length; i++)
        {
            if (0 == indegrees[i])
            {
                setOfZeroIndegree.add(i);
            }
        }
        process();
    }

    private void process()
    {
        while (!setOfZeroIndegree.isEmpty())
        {
            int v = setOfZeroIndegree.remove();

            // Add the current vertex to the result
            result.add(v);

            // Traverse all edges leaving v
            for (int w : di.adj(v))
            {
                // Remove the edge, represented by decrementing the edge count
                edges--;
                if (0 == --indegrees[w]) // in-degree dropped to 0: add w to the set
                {
                    setOfZeroIndegree.add(w);
                }
            }
        }
        // If edges remain in the graph, it contains a cycle
        if (0 != edges)
        {
            throw new IllegalArgumentException("Has Cycle!");
        }
    }

    public Iterable<Integer> getResult()
    {
        return result;
    }
}
```
      
Result of topological sorting:
2 -> 8 -> 0 -> 3 -> 7 -> 1 -> 5 -> 6 -> 9 -> 4 -> 11 -> 10 -> 12
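
The classes in this article rely on a Digraph type whose source is not shown. To try the code, something like the following minimal stand-in would do. It is an assumption, not the author's class: it covers only the calls getV(), getE() and adj(v), and the addEdge method is a hypothetical way to build the graph.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal stand-in for the Digraph type used in this article.
final class Digraph {
    private final int vertexCount;
    private int edgeCount;
    private final List<List<Integer>> adjacency;

    /** Creates a graph with vertices 0 .. vertexCount - 1 and no edges. */
    Digraph(int vertexCount) {
        this.vertexCount = vertexCount;
        this.adjacency = new ArrayList<>();
        for (int i = 0; i < vertexCount; i++) {
            adjacency.add(new ArrayList<>());
        }
    }

    /** Adds the directed edge from -> to (assumed helper, not in the article). */
    void addEdge(int from, int to) {
        adjacency.get(from).add(to);
        edgeCount++;
    }

    /** Number of vertices. */
    int getV() {
        return vertexCount;
    }

    /** Number of edges. */
    int getE() {
        return edgeCount;
    }

    /** Vertices reachable from v by a single edge. */
    Iterable<Integer> adj(int v) {
        return adjacency.get(v);
    }
}
```

With this in place, a graph could be built with a few addEdge calls and passed to the constructor of KahnTopological.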
      
Complexity analysis:
Initializing the zero in-degree set requires traversing the whole graph, visiting every vertex and every edge, so it costs O(E + V).
The main loop then also visits every vertex and every edge once, which again costs O(E + V).
The overall complexity of Kahn's algorithm is therefore O(E + V).
      
Topological sorting based on DFS:
Besides the intuitive Kahn's algorithm, depth-first traversal can also be used for topological sorting. In this case a stack is used to record the result.
The pseudocode below is also excerpted from Wikipedia:

```
L ← Empty list that will contain the sorted nodes
S ← Set of all nodes with no outgoing edges
for each node n in S do
    visit(n)

function visit(node n)
    if n has not been visited yet then
        mark n as visited
        for each node m with an edge from m to n do
            visit(m)
        add n to L
```
A DFS implementation is simple and is naturally written recursively. To turn DFS into a topological sort, only one extra line is needed: the last line of the pseudocode above, add n to L.
Note that a vertex is added to the result list at the moment the visit method is about to return.
The algorithm is very simple to implement, but harder to understand.
The key question is: why does adding each vertex at the end of the visit method guarantee that the result is a topological order?
In the Java implementation below, the method is called dfs and follows outgoing edges, and a vertex is added when its dfs call is about to return. Because dfs is recursive, it keeps calling itself as long as the current vertex has an edge to an unvisited vertex. So when dfs returns for a vertex, that vertex has no edges left to follow: in a DAG, every vertex it points to has already been added to the result. In particular, the first vertex to be added is the last vertex on a path.
      
Below is a simple proof of correctness:
Consider any edge v -> w. When dfs(v) is called, there are three cases:
1. dfs(w) has not been called yet, that is, w is unmarked. dfs(v) then calls dfs(w), and dfs(v) returns only after dfs(w) has returned.
2. dfs(w) has already been called and has returned, that is, w is marked.
3. dfs(w) has been called but has not yet returned when dfs(v) is called.

Note that case 3 cannot occur when topological sorting is possible: if it did, dfs(v) would have been reached from inside dfs(w), so a path from w to v exists. Together with the edge from v to w, this forms a cycle, so the graph is not a directed acyclic graph (DAG), and as we know, a graph that is not a DAG cannot be topologically sorted.

In cases 1 and 2, w is added to the result list before v. So for every edge v -> w, the edge points from a vertex added later to a vertex added earlier. To obtain the natural order, store the result in a stack: reading the stack from the top down, every edge v -> w then points from a vertex that appears earlier to a vertex that appears later.
        
Implementation code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Digraph and DirectedDepthFirstCycleDetection are the article's own classes
// and are not shown here.
public class DirectedDepthFirstOrder
{
    // visited array, required by the DFS implementation
    private boolean[] visited;
    // Stack that holds the final result. A Deque filled with push() iterates
    // from the most recently pushed vertex to the oldest.
    private Deque<Integer> reversePost;

    /**
     * Topological sorting constructor
     */
    public DirectedDepthFirstOrder(Digraph di, boolean detectCycle)
    {
        // DirectedDepthFirstCycleDetection checks whether the directed graph contains a cycle
        DirectedDepthFirstCycleDetection detect = new DirectedDepthFirstCycleDetection(di);
        if (detectCycle && detect.hasCycle())
        {
            throw new IllegalArgumentException("Has cycle");
        }

        this.visited = new boolean[di.getV()];
        this.reversePost = new ArrayDeque<>();
        for (int i = 0; i < di.getV(); i++)
        {
            if (!visited[i])
            {
                dfs(di, i);
            }
        }
    }

    private void dfs(Digraph di, int v)
    {
        visited[v] = true;
        for (int w : di.adj(v))
        {
            if (!visited[w])
            {
                dfs(di, w);
            }
        }
        // Add the current vertex to the result just before dfs returns
        reversePost.push(v);
    }

    public Iterable<Integer> getReversePost()
    {
        return reversePost;
    }
}
```
Complexity analysis:
The complexity is the same as that of DFS, O(E + V). In detail: the graph must first be confirmed to be a directed acyclic graph, which can itself be done with a DFS-based algorithm in O(E + V); the topological sort that follows is also a DFS and costs O(E + V).

Running the DFS-based algorithm on the same directed graph as before gives:
8 -> 7 -> 2 -> 3 -> 0 -> 6 -> 9 -> 10 -> 11 -> 12 -> 1 -> 5 -> 4
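
The recursive version can run into the call-stack limit on graphs with very long paths. As a hedged alternative that is not part of the original article, the same reverse postorder can be produced with an explicit stack. The sketch assumes adjacency lists and a DAG as input; the name IterativeDfsTopo is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: DFS-based topological sort without recursion.
final class IterativeDfsTopo {
    /** adj.get(v) lists the successors of v; the graph is assumed to be a DAG. */
    static List<Integer> sort(List<List<Integer>> adj) {
        int n = adj.size();
        boolean[] visited = new boolean[n];
        int[] nextChild = new int[n]; // index of the next successor to explore
        Deque<Integer> callStack = new ArrayDeque<>();
        Deque<Integer> reversePost = new ArrayDeque<>();
        for (int s = 0; s < n; s++) {
            if (visited[s]) {
                continue;
            }
            visited[s] = true;
            callStack.push(s);
            while (!callStack.isEmpty()) {
                int v = callStack.peek();
                List<Integer> successors = adj.get(v);
                if (nextChild[v] < successors.size()) {
                    int w = successors.get(nextChild[v]++);
                    if (!visited[w]) {
                        visited[w] = true;
                        callStack.push(w);
                    }
                } else {
                    // All successors are finished: this is the moment the
                    // recursive dfs(v) would return.
                    callStack.pop();
                    reversePost.push(v);
                }
            }
        }
        // Iterating the deque runs from the last vertex pushed to the first.
        return new ArrayList<>(reversePost);
    }

    public static void main(String[] args) {
        // Edges: 2 -> 0, 2 -> 1, 0 -> 3, 1 -> 3
        List<List<Integer>> adj =
            List.of(List.of(3), List.of(3), List.of(0, 1), List.<Integer>of());
        System.out.println(sort(adj)); // [2, 1, 0, 3]
    }
}
```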
        
Summary of the two algorithms:
The two algorithms store the result in a list and a stack, respectively.
In the DFS-based algorithm, a vertex is added to the result when it has no remaining outgoing edges to follow (in effect, its out-degree is 0). This looks like the opposite of Kahn's algorithm, which adds vertices whose in-degree is 0. The two ideas are like two sides of a coin: they seem to conflict, but they do not. One builds the result from the in-degree side, the other from the out-degree side.

Implementation differences:
Kahn's algorithm does not need to check in advance that the graph is a DAG. If the graph is not a DAG, some edges remain unremoved after the zero in-degree set becomes empty, which shows that the graph contains a cycle. The DFS-based algorithm needs to establish that the graph is a DAG first. It can, of course, be adjusted so that cycle detection and topological sorting happen in the same pass, since cycle detection can also be built on DFS.
Both algorithms run in O(V + E).
        
Cycle detection combined with topological sorting:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Digraph is the article's directed-graph class (not shown here).
public class DirectedDepthFirstTopoWithCircleDetection
{
    private boolean[] visited;
    // Marks the vertices currently on the dfs call stack, used for cycle detection
    private boolean[] onStack;
    // Used to reconstruct the cycle when one exists
    private int[] edgeTo;
    private Deque<Integer> reversePost;
    private Deque<Integer> cycle;

    /**
     * Topological sorting constructor
     */
    public DirectedDepthFirstTopoWithCircleDetection(Digraph di)
    {
        this.visited = new boolean[di.getV()];
        this.onStack = new boolean[di.getV()];
        this.edgeTo = new int[di.getV()];
        this.reversePost = new ArrayDeque<>();
        for (int i = 0; i < di.getV(); i++)
        {
            if (!visited[i])
            {
                dfs(di, i);
            }
        }
    }

    private void dfs(Digraph di, int v)
    {
        visited[v] = true;
        // On entering dfs, record the current vertex as being on the call stack
        onStack[v] = true;
        for (int w : di.adj(v))
        {
            if (hasCycle())
            {
                return;
            }
            if (!visited[w])
            {
                edgeTo[w] = v;
                dfs(di, w);
            }
            else if (onStack[w])
            {
                // w has been visited and is still on the call stack: there is a cycle
                cycle = new ArrayDeque<>();
                cycle.push(w);
                // Walk back along edgeTo from v until w is reached
                for (int start = v; start != w; start = edgeTo[start])
                {
                    cycle.push(start);
                }
                cycle.push(w);
            }
        }
        // Just before dfs returns, add the vertex to the topological order
        // and take it off the call stack
        reversePost.push(v);
        onStack[v] = false;
    }

    private boolean hasCycle()
    {
        return (null != cycle);
    }

    public Iterable<Integer> getReversePost()
    {
        if (!hasCycle())
        {
            return reversePost;
        }
        else
        {
            throw new IllegalArgumentException("Has Cycle: " + getCycle());
        }
    }

    public Iterable<Integer> getCycle()
    {
        return cycle;
    }
}
```
        
Uniqueness of the topological sorting solution:
Hamilton path:
A Hamilton path is a path that visits every vertex in the graph exactly once. This article only explains the relationship between Hamilton paths and topological sorting and does not cover their definition and applications in more depth.

As mentioned above, when every pair of vertices in a DAG has a determined order, the topological order of that DAG is unique. The vertices then form a total order, and linearizing a totally ordered structure always gives a unique result (for example, sorting a batch of integers with a stable sorting algorithm always gives the same result).

Note that a graph that is not a DAG can also contain a Hamilton path. Because we want to use topological sorting for the test, we discuss only how to decide whether a DAG contains a Hamilton path; every graph mentioned below is a DAG.

Now that we know how Hamilton paths relate to topological sorting, how can we quickly detect whether a graph has a Hamilton path?
By the discussion above, a Hamilton path exists exactly when the vertices of the graph are totally ordered, and a total order means that the order between every pair of vertices is determined. So we could design an algorithm that goes through every pair of vertices and checks whether the pair is ordered. If every pair is ordered, the vertex set is totally ordered and the graph has a Hamilton path.
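
The pairwise check just described could look roughly like the sketch below. It is an illustration only, with assumed names and an adjacency-list input. "Ordered" is interpreted here as "one vertex can reach the other along directed edges", and reachability is computed with one DFS per vertex.

```java
import java.util.List;

// Hypothetical brute-force sketch: test every pair of vertices for an order.
final class PairwiseOrderCheck {
    /** Marks every vertex reachable from v (including v itself). */
    private static void markReachable(List<List<Integer>> adj, int v, boolean[] seen) {
        seen[v] = true;
        for (int w : adj.get(v)) {
            if (!seen[w]) {
                markReachable(adj, w, seen);
            }
        }
    }

    /**
     * Returns true if, for every pair (u, v), one vertex can reach the other.
     * The input is assumed to be a DAG given as adjacency lists.
     */
    static boolean isTotallyOrdered(List<List<Integer>> adj) {
        int n = adj.size();
        boolean[][] reach = new boolean[n][n];
        for (int v = 0; v < n; v++) {
            markReachable(adj, v, reach[v]); // one DFS per vertex
        }
        for (int u = 0; u < n; u++) {
            for (int v = u + 1; v < n; v++) {
                if (!reach[u][v] && !reach[v][u]) {
                    return false; // u and v have no determined order
                }
            }
        }
        return true;
    }
}
```

This runs one traversal per vertex and then inspects on the order of V squared pairs, which is the inefficiency the following paragraphs set out to remove.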
        
But such an algorithm is clearly inefficient and does not scale to large vertex sets. An inefficient solution usually means we have not exploited some property of the problem. So let us look at the problem again and ask which property we have not used. Sorting integers again serves as the example:
Suppose we need to sort the three integers 3, 2, and 1. Following the idea above, we would compare (1, 2), (2, 3), and (1, 3), three comparisons in total, yet we know the comparison between 1 and 3 is redundant. How do we know? Because we subconsciously rely on the fact that integer comparison is transitive. A computer cannot use transitivity subconsciously, so we have to tell it in some other way. That is why we can choose a sorting algorithm more efficient than insertion sort, such as merge sort or quicksort, and go from O(n^2) to O(n log n). We can also exploit specific features of the problem and use a more specialized method, such as radix sort.
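
The saving that transitivity buys can be seen in a tiny sketch (the names are hypothetical and not from the original article). Checking whether an array is in ascending order needs only the n - 1 adjacent comparisons; comparing all n(n - 1)/2 pairs gives the same answer.

```java
// Hypothetical sketch: adjacent comparisons suffice because <= is transitive.
final class AdjacentCheck {
    /** Uses n - 1 comparisons: each element against its right neighbour. */
    static boolean isSorted(int[] a) {
        for (int i = 0; i + 1 < a.length; i++) {
            if (a[i] > a[i + 1]) {
                return false;
            }
        }
        return true;
    }

    /** Uses n(n - 1)/2 comparisons: every pair, ignoring transitivity. */
    static boolean isSortedAllPairs(int[] a) {
        for (int i = 0; i < a.length; i++) {
            for (int j = i + 1; j < a.length; j++) {
                if (a[i] > a[j]) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSorted(new int[] {1, 2, 3}));         // true
        System.out.println(isSortedAllPairs(new int[] {1, 2, 3})); // true
        System.out.println(isSorted(new int[] {3, 2, 1}));         // false
    }
}
```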
        
Back to the question. What we have not used so far is the transitivity of a total order. How can we use it? The simplest idea is often the most practical: sort. Once the vertices are sorted, checking only adjacent elements uses transitivity implicitly. So we first topologically sort the vertices of the graph, and then check every pair of adjacent vertices in the result for an order relationship (in a directed graph this relationship is a directed edge, so we check whether there is a directed edge from each vertex to the next). If every adjacent pair is connected this way, the graph has a Hamilton path; otherwise it does not.
        
Implementation code:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hamilton path detection for a DAG.
 * Digraph is the article's directed-graph class (not shown here).
 */
public class DAGHamiltonPath
{
    private boolean hamiltonPathPresent;
    private Digraph di;
    private KahnTopological kts;

    // Uses Kahn's algorithm for the topological sort
    public DAGHamiltonPath(Digraph di, KahnTopological kts)
    {
        this.di = di;
        this.kts = kts;
        process();
    }

    private void process()
    {
        // Copy the topological order into an indexable list
        List<Integer> topoResult = new ArrayList<>();
        for (int v : kts.getResult())
        {
            topoResult.add(v);
        }

        // Check each pair of adjacent vertices in order. If any pair is not
        // joined by an edge, there is no Hamilton path.
        for (int i = 0; i < topoResult.size() - 1; i++)
        {
            if (!hasPath(topoResult.get(i), topoResult.get(i + 1)))
            {
                hamiltonPathPresent = false;
                return;
            }
        }
        hamiltonPathPresent = true;
    }

    // Returns true if there is a direct edge start -> end
    private boolean hasPath(int start, int end)
    {
        for (int w : di.adj(start))
        {
            if (w == end)
            {
                return true;
            }
        }
        return false;
    }

    public boolean hasHamiltonPath()
    {
        return hamiltonPathPresent;
    }
}
```

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
