Breadth-first search (BFS) Algorithm

Source: Internet
Author: User
Breadth-first search (BFS) is one of the simplest graph search algorithms and the prototype for many important graph algorithms. Dijkstra's single-source shortest-path algorithm and Prim's minimum-spanning-tree algorithm both use ideas similar to those of breadth-first search.

Given a graph G = (V, E) and a distinguished source vertex s, breadth-first search systematically explores the edges of G to discover every vertex reachable from s. It computes the distance (the minimum number of edges) from s to each of these vertices. It also produces a breadth-first tree rooted at s that contains all reachable vertices. For any vertex v reachable from s, the path from s to v in the breadth-first tree corresponds to a shortest path from s to v in G, that is, a path containing the fewest edges. The algorithm works on both directed and undirected graphs.

The algorithm is called breadth-first because it expands the frontier between discovered and undiscovered vertices uniformly across the breadth of the frontier. That is, it discovers all vertices at distance k from s before discovering any vertex at distance k + 1.
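The layer-by-layer behavior can be sketched in Python as follows. This is an illustrative sketch rather than the procedure analyzed in this article: the function name bfs_layers, the dictionary-based adjacency lists, and the example graph are all assumptions made for the illustration.

```python
def bfs_layers(adj, s):
    """Group the vertices reachable from s by their distance k from s."""
    seen = {s}
    layers = [[s]]                      # layers[k] holds the vertices at distance k
    while True:
        next_layer = []
        for u in layers[-1]:
            for v in adj[u]:
                if v not in seen:       # first time v is encountered
                    seen.add(v)
                    next_layer.append(v)
        if not next_layer:              # nothing new was discovered
            return layers
        layers.append(next_layer)


# Hypothetical undirected example graph (each edge is listed in both directions).
example = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
layers = bfs_layers(example, "s")
```

Every vertex in layers[k] is discovered before any vertex in layers[k + 1], which is the ordering the name breadth-first refers to.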

To keep track of progress, breadth-first search colors each vertex white, gray, or black. All vertices start out white and may later become gray and then black. A vertex is discovered the first time it is encountered during the search, at which point it becomes non-white. Gray and black vertices have therefore both been discovered, but the algorithm distinguishes between them to ensure that the search proceeds in breadth-first order. If (u, v) ∈ E and vertex u is black, then vertex v is either gray or black; that is, all vertices adjacent to a black vertex have been discovered. Gray vertices may have some adjacent white vertices; they represent the frontier between discovered and undiscovered vertices.

Breadth-first search constructs a breadth-first tree, which initially contains only its root, the source vertex s. Whenever a white vertex v is discovered while scanning the adjacency list of an already discovered vertex u, the vertex v and the edge (u, v) are added to the tree. We say that u is the predecessor, or parent, of v in the breadth-first tree. Since a vertex is discovered at most once, it has at most one parent. Ancestor and descendant relationships are defined relative to the root s as usual: if u is on the path in the tree from the root s to vertex v, then u is an ancestor of v and v is a descendant of u.

The procedure BFS below assumes that the input graph G = (V, E) is represented by adjacency lists. It maintains several additional data structures for each vertex. For each vertex u ∈ V, the color is stored in color[u] and the parent of u is stored in π[u]. If u has no parent (for example, if u = s or u has not been discovered), then π[u] = nil. The distance from the source s to vertex u computed by the algorithm is stored in d[u]. The algorithm uses a first-in, first-out queue Q to manage the set of gray vertices. Here head[Q] denotes the element at the head of Q, enqueue(Q, v) appends v to the tail of the queue, dequeue(Q) removes the head element, and adj[u] denotes the set of vertices adjacent to u.
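One possible way to set up these data structures in Python is sketched below. The helper make_adjacency_lists, the use of dictionaries keyed by vertex name, and the small example graph are assumptions made here for illustration; collections.deque stands in for the FIFO queue Q, and None plays the role of nil.

```python
from collections import deque
import math


def make_adjacency_lists(vertices, edges, directed=False):
    """Build adj[u], the list of vertices adjacent to u."""
    adj = {u: [] for u in vertices}
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)    # an undirected edge appears in both lists
    return adj


vertices = ["s", "a", "b", "c"]
edges = [("s", "a"), ("s", "b"), ("a", "c")]
adj = make_adjacency_lists(vertices, edges)

# Per-vertex attributes, shown in their state before the search starts.
color = {u: "white" for u in vertices}
d = {u: math.inf for u in vertices}
pi = {u: None for u in vertices}

# FIFO queue operations used by the procedure.
Q = deque()
Q.append("s")           # enqueue(Q, s)
Q.append("a")           # enqueue(Q, a)
head = Q[0]             # head[Q]: inspect the head without removing it
removed = Q.popleft()   # dequeue(Q): remove the head element
```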

procedure BFS(G, s);
begin
 1.  for each node u ∈ V[G] − {s} do
       begin
 2.      color[u] ← white;
 3.      d[u] ← ∞;
 4.      π[u] ← nil;
       end;
 5.  color[s] ← gray;
 6.  d[s] ← 0;
 7.  π[s] ← nil;
 8.  Q ← {s};
 9.  while Q ≠ ∅ do
       begin
10.      u ← head[Q];
11.      for each node v ∈ adj[u] do
12.        if color[v] = white then
             begin
13.            color[v] ← gray;
14.            d[v] ← d[u] + 1;
15.            π[v] ← u;
16.            enqueue(Q, v);
             end;
17.      dequeue(Q);
18.      color[u] ← black;
       end;
end;

Figure 1 shows BFS running on an example graph. The dark edges are the tree edges produced by BFS. The value inside each node u is d[u], and the queue Q shown is the queue at the start of each iteration of the while loop in lines 9-18. The distance from the source is shown below each node in the queue.

Figure 1. Execution of BFS on an undirected graph

BFS works as follows. Lines 1-4 color every vertex white, set d[u] to infinity, and set the parent of every vertex to nil. Line 5 colors the source vertex s gray, since it is considered discovered when the procedure begins. Line 6 initializes d[s] to 0, and line 7 sets the parent of the source to nil. Line 8 initializes Q to contain only the source vertex s. From then on, Q always contains exactly the gray vertices.

The main loop of the procedure is in lines 9-18. It iterates as long as there are gray vertices in Q, that is, discovered vertices whose adjacency lists have not yet been fully examined. Line 10 takes the gray vertex u at the head of the queue. The for loop in lines 11-16 considers each vertex v in the adjacency list of u. If v is white, it has not yet been discovered, and the algorithm discovers it by executing lines 13-16: v is colored gray, its distance d[v] is set to d[u] + 1, u is recorded as its parent, and v is placed at the tail of Q. When all vertices on the adjacency list of u have been examined, lines 17-18 remove u from Q and color it black.
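The walkthrough above can be transcribed into Python roughly as follows. This is a minimal sketch under stated assumptions: the graph is a dictionary mapping each vertex to its adjacency list, None stands for nil, math.inf stands for ∞, and the example graph (including the isolated vertex "x") is hypothetical. The comments give the corresponding pseudocode line numbers.

```python
from collections import deque
import math

WHITE, GRAY, BLACK = "white", "gray", "black"


def bfs(adj, s):
    """Breadth-first search from s; returns the d, pi, and color maps."""
    color = {u: WHITE for u in adj}     # lines 1-4: initialize every vertex
    d = {u: math.inf for u in adj}
    pi = {u: None for u in adj}
    color[s] = GRAY                     # line 5
    d[s] = 0                            # line 6 (pi[s] is already None, line 7)
    q = deque([s])                      # line 8
    while q:                            # line 9
        u = q[0]                        # line 10: head of the queue
        for v in adj[u]:                # line 11
            if color[v] == WHITE:       # line 12
                color[v] = GRAY         # line 13
                d[v] = d[u] + 1         # line 14
                pi[v] = u               # line 15
                q.append(v)             # line 16
        q.popleft()                     # line 17
        color[u] = BLACK                # line 18
    return d, pi, color


# Hypothetical undirected example graph; "x" is not reachable from "s".
adj = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
    "x": [],
}
d, pi, color = bfs(adj, "s")
```

After the call, every vertex reachable from "s" is black, while "x" stays white with d["x"] equal to infinity and no parent.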

Analysis

Before proving the properties of breadth-first search, we take on the easier job of analyzing its running time on an input graph G = (V, E). After initialization, no vertex is ever colored white again, so the test in line 12 ensures that each vertex is enqueued at most once and hence dequeued at most once. Enqueuing and dequeuing each take O(1) time, so the total time devoted to queue operations is O(V). Because the adjacency list of each vertex is scanned only when the vertex is dequeued, each adjacency list is scanned at most once. Since the sum of the lengths of all adjacency lists is Θ(E), at most O(E) time is spent scanning adjacency lists. The overhead for initialization is O(V), so the total running time of BFS is O(V + E). Thus breadth-first search runs in time linear in the size of the adjacency-list representation of G.
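The counting argument can be checked on a small instance with an instrumented search. The sketch below is an illustration only: it uses a set of discovered vertices in place of the three colors and removes the head of the queue at the start of each iteration, and the name bfs_with_counters and the example graph are assumptions made here.

```python
from collections import deque


def bfs_with_counters(adj, s):
    """Run BFS from s and count the operations used in the analysis."""
    discovered = {s}
    q = deque([s])
    counts = {"enqueue": 1, "dequeue": 0, "scanned": 0}
    while q:
        u = q.popleft()
        counts["dequeue"] += 1
        for v in adj[u]:
            counts["scanned"] += 1          # one adjacency-list entry examined
            if v not in discovered:
                discovered.add(v)
                q.append(v)
                counts["enqueue"] += 1
    return counts


# Hypothetical connected undirected graph with 5 vertices and 5 edges.
adj = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
counts = bfs_with_counters(adj, "s")
num_vertices = len(adj)
total_list_length = sum(len(neighbors) for neighbors in adj.values())
```

On this graph each vertex is enqueued and dequeued exactly once, and the number of adjacency-list entries examined equals the total length of the lists, in line with the O(V + E) bound.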

Shortest Path

At the beginning of this section we claimed that breadth-first search finds the distance from a given source vertex s ∈ V to each reachable vertex in a graph G = (V, E). Define the shortest-path distance δ(s, v) from s to v as the minimum number of edges in any path from vertex s to vertex v, or ∞ if there is no path from s to v. A path of length δ(s, v) from s to v is called a shortest path from s to v. (A later discussion generalizes shortest paths to weighted graphs, in which every edge has a real-valued weight and the weight of a path is the sum of the weights of its edges. Weighted graphs are not considered here.) Before showing that breadth-first search actually computes shortest-path distances, we examine an important property of shortest-path distances.

Theorem 1

Let G = (V, E) be a directed or undirected graph, and let s ∈ V be an arbitrary vertex. Then, for any edge (u, v) ∈ E,

δ(s, v) ≤ δ(s, u) + 1

Proof:

If u is reachable from s, then so is v. In this case, the shortest path from s to v cannot be longer than the shortest path from s to u followed by the edge (u, v), so the inequality holds. If u is not reachable from s, then δ(s, u) = ∞, and the inequality still holds.

We want to show that BFS computes d[v] = δ(s, v) for each vertex v ∈ V. We first show that d[v] is an upper bound on δ(s, v).

Theorem 2

Let G = (V, E) be a directed or undirected graph, and suppose that BFS is run on G from a given source vertex s ∈ V. Then upon termination, for each vertex v ∈ V, the value d[v] computed by BFS satisfies d[v] ≥ δ(s, v).

Proof:

We use induction on the number of times a vertex is placed in the queue Q. The inductive hypothesis is that d[v] ≥ δ(s, v) for all v ∈ V.

The basis of the induction is the situation immediately after s is placed in Q in line 8 of BFS. The inductive hypothesis holds here, because d[s] = 0 = δ(s, s) and d[v] = ∞ ≥ δ(s, v) for all v ∈ V − {s}.

For the inductive step, consider a white vertex v that is discovered during the search from a vertex u. The inductive hypothesis implies that d[u] ≥ δ(s, u). From the assignment in line 14 and from Theorem 1, we obtain

d[v] = d[u] + 1 ≥ δ(s, u) + 1 ≥ δ(s, v)

Vertex v is then inserted into the queue Q. It is never inserted again, because it is now gray and the then clause of lines 13-16 is executed only for white vertices. Thus the value of d[v] never changes again, and the inductive hypothesis is maintained.

To prove that d[v] = δ(s, v), we must first show more precisely how the queue Q operates during the course of BFS. The following theorem shows that at all times there are at most two distinct d values in the queue.

Theorem 3

Suppose that during the execution of BFS on a graph G = (V, E), the queue Q contains the vertices <v_1, v_2, ..., v_r>, where v_1 is the head of Q and v_r is the tail. Then d[v_r] ≤ d[v_1] + 1 and d[v_i] ≤ d[v_{i+1}] for i = 1, 2, ..., r − 1.

Proof:

The proof is by induction on the number of queue operations. Initially, the queue contains only the vertex s, so the theorem certainly holds.

For the inductive step, we must prove that the theorem still holds after a vertex is either dequeued or enqueued. If the head v_1 of the queue is dequeued, the new head is v_2 (if the queue becomes empty, the theorem holds vacuously). Then d[v_r] ≤ d[v_1] + 1 ≤ d[v_2] + 1, and the remaining inequalities are unaffected, so the theorem holds with v_2 as the head. Enqueuing a vertex requires a closer look at BFS. In line 16, when the vertex v is enqueued and becomes v_{r+1}, the head v_1 of Q is in fact the vertex u whose adjacency list is currently being scanned. Thus d[v_{r+1}] = d[v] = d[u] + 1 = d[v_1] + 1. We also have d[v_r] ≤ d[v_1] + 1 = d[u] + 1 = d[v] = d[v_{r+1}], so d[v_r] ≤ d[v_{r+1}], and the remaining inequalities are unaffected. Thus the theorem also holds when v is enqueued.
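The queue property can also be observed experimentally. The following sketch asserts both inequalities after every enqueue and dequeue and reports the largest number of distinct d values seen in the queue at any time. The function name bfs_checking_queue and the example graph are assumptions made for this illustration; a passing run on one graph is only a sanity check, not a substitute for the proof.

```python
from collections import deque


def bfs_checking_queue(adj, s):
    """Run BFS from s, asserting the queue property after every operation.

    Returns the largest number of distinct d values seen in the queue.
    """
    d = {s: 0}                      # a vertex is discovered once it has a d value
    q = deque([s])
    max_distinct = 1

    def check():
        nonlocal max_distinct
        values = [d[x] for x in q]
        if values:
            assert values == sorted(values)         # d[v_i] <= d[v_{i+1}]
            assert values[-1] <= values[0] + 1      # d[v_r] <= d[v_1] + 1
            max_distinct = max(max_distinct, len(set(values)))

    while q:
        u = q[0]
        for v in adj[u]:
            if v not in d:
                d[v] = d[u] + 1
                q.append(v)
                check()             # property after an enqueue
        q.popleft()
        check()                     # property after a dequeue
    return max_distinct


# Hypothetical undirected example graph.
adj = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
max_distinct = bfs_checking_queue(adj, "s")
```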

We can now prove that breadth-first search correctly computes shortest-path distances.

Theorem (Correctness of breadth-first search)

Let G = (V, E) be a directed or undirected graph, and suppose that BFS is run on G from a given source vertex s ∈ V. Then, during its execution, BFS discovers every vertex v ∈ V that is reachable from the source s, and upon termination d[v] = δ(s, v) for all v ∈ V. Moreover, for any vertex v ≠ s that is reachable from s, one of the shortest paths from s to v is a shortest path from s to π[v] followed by the edge (π[v], v).

Proof:

We start with the case in which v is unreachable from s. By Theorem 2, d[v] ≥ δ(s, v) = ∞, so d[v] can never be set to a finite value in line 14. By induction, there cannot be a first vertex whose d value is set to ∞ by line 14, so line 14 is executed only for vertices that receive finite d values. Therefore, if v is unreachable, it is never discovered.

The main part of the proof concerns the vertices reachable from s. Let V_k denote the set of vertices at distance k from s, that is, V_k = {v ∈ V : δ(s, v) = k}. The proof proceeds by induction on k. As the inductive hypothesis, we assume that for each vertex v ∈ V_k there is exactly one point during the execution of BFS at which:

  • v is colored gray;
  • d[v] is set to k;
  • if v ≠ s, then π[v] is set to u for some u ∈ V_{k-1};
  • v is inserted into the queue Q.

As noted earlier, there is certainly at most one such point.

The basis is k = 0. We have V_0 = {s}, since the source s is the only vertex at distance 0 from s. During initialization, s is colored gray, d[s] is set to 0, and s is placed into Q, so the inductive hypothesis holds.

For the inductive step, we first note that the queue Q is never empty until the algorithm terminates and that, once a vertex u is inserted into the queue, neither d[u] nor π[u] ever changes. By Theorem 3, therefore, if vertices are inserted into the queue in the order v_1, v_2, ..., v_r, then the corresponding sequence of distances is monotonically nondecreasing: d[v_i] ≤ d[v_{i+1}] for i = 1, 2, ..., r − 1.

Now consider an arbitrary vertex v ∈ V_k, where k ≥ 1. The monotonicity property, combined with d[v] ≥ k (by Theorem 2) and the inductive hypothesis, implies that v must be discovered, if it is discovered at all, after all vertices in V_{k-1} have been enqueued.

Since δ(s, v) = k, there is a path of k edges from s to v, and thus there exists a vertex u ∈ V_{k-1} such that (u, v) ∈ E. Without loss of generality, let u be the first such vertex to be colored gray, which must happen since, by induction, all vertices in V_{k-1} are colored gray. BFS enqueues every gray vertex, so u must eventually appear as the head of the queue in line 10. When u is at the head, its adjacency list is scanned and v is discovered. (The vertex v could not have been discovered earlier, since it is not adjacent to any vertex in V_j for j < k − 1, otherwise v could not belong to V_k, and by assumption u is the first vertex discovered in V_{k-1} to which v is adjacent.) Line 13 colors v gray, line 14 establishes d[v] = d[u] + 1 = k, line 15 sets π[v] to u, and line 16 inserts v into the queue. Since v is an arbitrary vertex in V_k, the inductive hypothesis is proved.

To conclude the proof of the theorem, observe that if v ∈ V_k, then by what we have just seen, π[v] ∈ V_{k-1}. Thus we can obtain a shortest path from s to v by taking a shortest path from s to π[v] and then traversing the edge (π[v], v).
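As a sanity check on the statement just proved, the distances found by BFS can be compared with distances computed by a deliberately naive method. In the sketch below, brute_force_distances repeatedly relaxes every edge (the idea behind Bellman-Ford with unit weights, swapped in here only as an independent reference). The function names and the directed example graph are assumptions made for illustration, and agreement on one graph does not replace the proof.

```python
from collections import deque
import math


def bfs_distances(adj, s):
    """BFS from s; returns the distance map d and the parent map pi."""
    d = {u: math.inf for u in adj}
    pi = {u: None for u in adj}
    d[s] = 0
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if d[v] == math.inf:        # v is still undiscovered
                d[v] = d[u] + 1
                pi[v] = u
                q.append(v)
    return d, pi


def brute_force_distances(adj, s):
    """Fewest-edge distances from s by repeated relaxation of every edge."""
    dist = {u: math.inf for u in adj}
    dist[s] = 0
    for _ in range(len(adj) - 1):       # a shortest path has at most |V| - 1 edges
        for u in adj:
            for v in adj[u]:
                if dist[u] + 1 < dist[v]:
                    dist[v] = dist[u] + 1
    return dist


# Hypothetical directed graph; "x" has an edge to "s" but is unreachable from "s".
adj = {
    "s": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["d"],
    "d": ["s"],
    "x": ["s"],
}
d, pi = bfs_distances(adj, "s")
reference = brute_force_distances(adj, "s")
```

Here d agrees with reference on every vertex, and each discovered vertex v other than "s" satisfies d[v] = d[pi[v]] + 1, matching the last sentence of the theorem.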

Breadth-first trees

The procedure BFS builds a breadth-first tree as it searches the graph, as illustrated in Figure 1. The tree is represented by the π field of each vertex. More formally, for a graph G = (V, E) with source s, we define the predecessor subgraph of G as G_π = (V_π, E_π), where

V_π = {v ∈ V : π[v] ≠ nil} ∪ {s}

and

E_π = {(π[v], v) ∈ E : v ∈ V_π − {s}}

The predecessor subgraph G_π is a breadth-first tree if V_π consists of the vertices reachable from s and, for all v ∈ V_π, there is a unique simple path from s to v in G_π that is also a shortest path from s to v in G. A breadth-first tree is in fact a tree, since it is connected and |E_π| = |V_π| − 1. The edges in E_π are called tree edges.
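The two set definitions translate almost directly into Python set comprehensions. The sketch below is illustrative: the function name predecessor_subgraph is an assumption, and the parent map pi is a hypothetical example of what a BFS from "s" might leave behind, with None standing for nil.

```python
def predecessor_subgraph(pi, s):
    """Return (V_pi, E_pi) built from the parent map pi and the source s."""
    v_pi = {v for v, parent in pi.items() if parent is not None} | {s}
    e_pi = {(pi[v], v) for v in v_pi - {s}}
    return v_pi, e_pi


# Hypothetical parent map; "x" was never discovered, so its parent is None.
pi = {"s": None, "a": "s", "b": "s", "c": "a", "d": "c", "x": None}
v_pi, e_pi = predecessor_subgraph(pi, "s")
```

For this parent map, V_π contains the five discovered vertices and E_π contains four edges, so |E_π| = |V_π| − 1 as expected for a tree.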

The following theorem shows that, after BFS has been run from a source s on a graph G, the predecessor subgraph is a breadth-first tree.

Theorem 4

When applied to a directed or undirected graph G = (V, E), the procedure BFS constructs π so that the predecessor subgraph G_π = (V_π, E_π) is a breadth-first tree.

Proof:

Line 15 of BFS sets π[v] = u only if (u, v) ∈ E and δ(s, v) < ∞, that is, only if v is reachable from s. Thus V_π consists of the vertices in V reachable from s. Since G_π forms a tree, it contains a unique path from s to each vertex in V_π. Applying the correctness theorem above inductively, we conclude that every such path is a shortest path. (End of proof.)

The following procedure prints the vertices on a shortest path from s to v, assuming that BFS has already been run to compute the shortest-path tree.

procedure Print_Path(G, s, v);
begin
  if v = s
    then write(s)
    else if π[v] = nil
      then writeln('no path from ', s, ' to ', v, ' exists.')
      else
        begin
          Print_Path(G, s, π[v]);
          write(v);
        end;
end;

Since each recursive call is for a path one vertex shorter than the previous one, this procedure runs in time linear in the number of vertices on the printed path.
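A Python counterpart of this recursion is sketched below. It differs from the pseudocode in one deliberate way: instead of printing, it returns the list of vertices on the path, or None when no path exists, which makes the result easier to check. The function name path_vertices and the parent map pi are assumptions made for the illustration.

```python
def path_vertices(pi, s, v):
    """Return the vertices on the tree path from s to v, or None if none exists."""
    if v == s:
        return [s]
    if pi[v] is None:               # v was never discovered from s
        return None
    prefix = path_vertices(pi, s, pi[v])
    if prefix is None:              # defensive: parent chain does not reach s
        return None
    prefix.append(v)                # constant-time step, so total work stays linear
    return prefix


# Hypothetical parent map, as a BFS from "s" might leave it.
pi = {"s": None, "a": "s", "b": "s", "c": "a", "d": "c", "x": None}
path_to_d = path_vertices(pi, "s", "d")
path_to_x = path_vertices(pi, "s", "x")
```

Appending to the list returned by the recursive call, rather than building a new list at each level, keeps the running time proportional to the length of the path.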

 
