The Floyd-Warshall algorithm (also known as Floyd's algorithm) is a classical algorithm for finding the shortest paths between all pairs of vertices in a weighted graph, and it is designed around the idea of dynamic programming. The algorithm as we know it today was published in 1962 by the computer scientist (and Turing Award winner) Robert Floyd. Before that, Bernard Roy (1959) and Stephen Warshall (1962) had independently proposed essentially the same algorithm. In this post, we will mainly discuss a parallel implementation of the Floyd algorithm based on MPI.
Serial implementation
Since the Floyd algorithm is very well known and there is plenty of material explaining its rationale, I do not intend to repeat much about the algorithm itself here. If you are not familiar with it, you can refer to Chapter 8, Section 4 of The Beauty of Algorithms: The Principles Hidden Behind Data Structures (C++ edition).
To serve as a baseline for the parallel version, we first give a serial C++ implementation of the Floyd algorithm. For the program below, we adopt the following conventions:
- The graph is directed;
- When initializing, if there is no edge directly connecting two nodes, the distance between them is set to -1 (NOT_CONNECTED);
- Node labels start at 1, that is, there is a 1st node, a 2nd node, and so on, but no node 0.
Sample code is shown below:
#include <cstdio>
#include <cstdlib>
#include <cstring>
using namespace std;

#define MAX 10
#define NOT_CONNECTED -1

int distance[MAX][MAX];

// number of nodes
int nodesCount;

// initialize all distances to NOT_CONNECTED
void Initialize()
{
    memset(distance, NOT_CONNECTED, sizeof(distance));
    for (int i = 0; i < MAX; ++i)
        distance[i][i] = 0;
}

int main()
{
    Initialize();

    // get the nodes count
    scanf("%d", &nodesCount);

    // edges count
    int m;
    scanf("%d", &m);

    while (m--) {
        // nodes - let the indexation begin from 1
        int a, b;
        // edge weight
        int c;
        scanf("%d-%d-%d", &a, &c, &b);
        distance[a][b] = c;
    }

    // Floyd-Warshall: try every node k as an intermediate point
    for (int k = 1; k <= nodesCount; ++k) {
        for (int i = 1; i <= nodesCount; ++i) {
            if (distance[i][k] != NOT_CONNECTED) {
                for (int j = 1; j <= nodesCount; ++j) {
                    if (distance[k][j] != NOT_CONNECTED &&
                        (distance[i][j] == NOT_CONNECTED ||
                         distance[i][k] + distance[k][j] < distance[i][j])) {
                        distance[i][j] = distance[i][k] + distance[k][j];
                    }
                }
            }
        }
    }

    for (int i = 1; i <= nodesCount; ++i) {
        for (int j = 1; j <= nodesCount; ++j) {
            printf("%d ", distance[i][j]);
        }
        printf("\n");
    }

    return 0;
}
The above code reads a file that represents a weighted graph: the first line gives the number of nodes, the second line gives the number of edges, and each remaining line describes one weighted edge. For example, the contents of the file graph.txt are as follows:
4
5
1-1-2
1-10-4
2-2-3
2-3-4
3-1-4
where 1-1-2 means that the edge from node 1 to node 2 has weight 1.
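For example, once graph.txt has been read in, the distance matrix (rows and columns indexed 1 through 4) is initialized as follows before the relaxation loops run, with 0 on the diagonal and -1 marking pairs of nodes with no direct edge:

 0   1  -1  10
-1   0   2   3
-1  -1   0   1
-1  -1  -1   0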
Two other graph files that can be used for testing: graph2.txt
5
9
1-5-2
2-2-3
1-3-3
5-1-1
4-1-5
1-2-4
4-4-3
3-7-5
2-3-5
graph3.txt
5
7
1-13-5
1-6-4
5-2-2
1-6-2
2-3-3
4-1-3
4-5-5
Running our program produces the following results:
$ g++-5 floyd_s.cpp -o a.out
$ ./a.out < graph3.txt
0 6 7 6 11
-1 0 3 -1 -1
-1 -1 0 -1 -1
-1 7 1 0 5
-1 2 5 -1 0
$ ./a.out < graph2.txt
0 5 3 2 3
4 0 2 6 3
8 13 0 10 7
2 7 4 0 1
1 6 4 3 0
$ ./a.out < graph.txt
0 1 3 4
-1 0 2 3
-1 -1 0 1
-1 -1 -1 0
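As a quick sanity check, the 11 at the end of the first row of the graph3.txt output is the shortest distance from node 1 to node 5: the direct edge 1-13-5 costs 13, but the path through node 4 (edges 1-6-4 and 4-5-5) costs only 6 + 5 = 11.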
Parallel implementation
Now let us discuss the idea behind the parallel implementation. The basic idea is to partition the big matrix by rows, so that each processor (or compute node; note that here "node" means a node of the distributed supercomputer, not a node in the graph) is responsible for several rows of the matrix. For example, if our matrix is 16 × 16 and we want to compute on four processors in parallel, we split the whole matrix by rows into four smaller matrices A, B, C, and D, and each process is then responsible for one of them.
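To make the decomposition concrete, here is a minimal, self-contained sketch of the block-row partition under the same conventions as the programs in this post (1-based rows, and the last rank absorbing any remainder rows); the names n, slice, rowStart, and rowEnd are illustrative and are not taken from the program below:

#include <cstdio>

int main()
{
    int n = 16;    // total number of rows (e.g., a 16 x 16 matrix)
    int size = 4;  // number of MPI processes

    int slice = n / size;  // base number of rows per rank
    for (int rank = 0; rank < size; ++rank) {
        int rowStart = 1 + slice * rank;
        // the last rank also takes the remainder rows
        // when size does not divide n evenly
        int rowEnd = (rank == size - 1) ? n : slice * (rank + 1);
        printf("rank %d owns rows %d..%d\n", rank, rowStart, rowEnd);
    }
    return 0;
}

With n = 16 and size = 4 this prints four blocks of four rows each, matching the A, B, C, D division described above.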
Below is the MPI-based parallel Floyd algorithm that I implemented in C++.
// author: http://blog.csdn.net/baimafujinji/
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include "mpi.h"
using namespace std;

#define MAX 10
#define NOT_CONNECTED -1

int distances[MAX][MAX];
int result[MAX][MAX];

// number of nodes
int nodesCount;

// initialize all distances to NOT_CONNECTED
void Initialize()
{
    memset(distances, NOT_CONNECTED, sizeof(distances));
    memset(result, NOT_CONNECTED, sizeof(result));
    for (int i = 0; i < MAX; ++i)
        distances[i][i] = 0;
}

int main(int argc, char *argv[])
{
    Initialize();

    // every rank tries to read the input; under mpirun usually only
    // rank 0 actually receives stdin, hence the broadcasts below

    // get the nodes count
    scanf("%d", &nodesCount);

    // edges count
    int m;
    scanf("%d", &m);

    while (m--) {
        // nodes - let the indexation begin from 1
        int a, b;
        // edge weight
        int c;
        scanf("%d-%d-%d", &a, &c, &b);
        distances[a][b] = c;
    }

    int size, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // each rank is responsible for `slice` rows;
    // the last rank also takes the remainder
    int slice = nodesCount / size;

    MPI_Bcast(distances, MAX * MAX, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&nodesCount, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&slice, 1, MPI_INT, 0, MPI_COMM_WORLD);

    // Floyd-Warshall
    int sent = 1;
    for (int k = 1; k <= nodesCount; ++k) {
        // find the rank that owns pivot row k ...
        int th = 1;
        for (; th <= size; th++) {
            if (1 + slice * (th - 1) <= k && k <= slice * th)
                sent = th;
        }
        // ... remainder rows beyond slice * size belong to the last rank
        if (1 + slice * (th - 1) <= k && k <= nodesCount)
            sent = size;
        // broadcast the pivot row from its owner to every rank
        MPI_Bcast(&distances[k], nodesCount + 1, MPI_INT, sent - 1, MPI_COMM_WORLD);

        if (rank != size - 1) {
            // non-last ranks relax exactly `slice` rows
            for (int i = 1 + slice * rank; i <= slice * (rank + 1); ++i) {
                if (distances[i][k] != NOT_CONNECTED) {
                    for (int j = 1; j <= nodesCount; ++j) {
                        if (distances[k][j] != NOT_CONNECTED &&
                            (distances[i][j] == NOT_CONNECTED ||
                             distances[i][k] + distances[k][j] < distances[i][j])) {
                            distances[i][j] = distances[i][k] + distances[k][j];
                        }
                    }
                }
            }
        } else {
            // the last rank relaxes its slice plus the remainder rows
            for (int i = 1 + slice * rank; i <= nodesCount; ++i) {
                if (distances[i][k] != NOT_CONNECTED) {
                    for (int j = 1; j <= nodesCount; ++j) {
                        if (distances[k][j] != NOT_CONNECTED &&
                            (distances[i][j] == NOT_CONNECTED ||
                             distances[i][k] + distances[k][j] < distances[i][j])) {
                            distances[i][j] = distances[i][k] + distances[k][j];
                        }
                    }
                }
            }
        }
    }

    // broadcast every finished row so that all ranks hold the full matrix
    for (int k = 1; k <= nodesCount; ++k) {
        int th = 1;
        for (; th <= size; th++) {
            if (1 + slice * (th - 1) <= k && k <= slice * th)
                sent = th;
        }
        if (1 + slice * (th - 1) <= k && k <= nodesCount)
            sent = size;
        MPI_Bcast(&distances[k], nodesCount + 1, MPI_INT, sent - 1, MPI_COMM_WORLD);
    }

    // after the broadcasts all ranks hold identical matrices,
    // so this reduce simply collects the result on rank 0
    MPI_Reduce(distances, result, MAX * MAX, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 1; i <= nodesCount; i++) {
            for (int j = 1; j <= nodesCount; j++) {
                printf("%d ", result[i][j]);
            }
            printf("\n");
        }
        printf("\n");
    }

    /* shut down MPI */
    MPI_Finalize();
    return 0;
}
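One detail worth calling out in the program above is how the owner of the pivot row is determined. In iteration k, every rank needs row k of the distance matrix to relax its own rows, so the rank that owns that row must broadcast it before anyone can proceed. The small loop over th computes exactly that owner. The following sketch expresses the same computation as a standalone helper; the name ownerOfRow is illustrative and does not appear in the original code:

// Which rank owns (and therefore must broadcast) pivot row k.
// Rows 1..slice belong to rank 0, the next slice rows to rank 1, and so on;
// any remainder rows past slice * size also belong to the last rank.
// Assumes slice >= 1, i.e., nodesCount >= size.
int ownerOfRow(int k, int slice, int size)
{
    int owner = (k - 1) / slice;          // block index of row k (0-based)
    if (owner >= size) owner = size - 1;  // remainder rows go to the last rank
    return owner;
}

// Equivalent use inside the k-loop of the program above:
// MPI_Bcast(&distances[k], nodesCount + 1, MPI_INT,
//           ownerOfRow(k, slice, size), MPI_COMM_WORLD);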
Once the code is complete, let's verify that the output of this parallel program is consistent with that of the serial implementation.
$ mpirun -n 3 ./a.out < graph3.txt
0 6 7 6 11
-1 0 3 -1 -1
-1 -1 0 -1 -1
-1 7 1 0 5
-1 2 5 -1 0
$ mpirun -n 3 ./a.out < graph2.txt
0 5 3 2 3
4 0 2 6 3
8 13 0 10 7
2 7 4 0 1
1 6 4 3 0
$ mpirun -n 3 ./a.out < graph.txt
0 1 3 4
-1 0 2 3
-1 -1 0 1
-1 -1 -1 0
As you can see, our parallel program outputs the expected results.
(End of this article)
Floyd-Warshall algorithm and its parallel implementation (based on MPI)