Floyd-Warshall algorithm and its parallel implementation (based on MPI)


The Floyd-Warshall algorithm (also known as Floyd's algorithm) is a classical algorithm, designed around the idea of dynamic programming, for finding the shortest paths between every pair of vertices in a weighted graph. The algorithm as we know it today was published in 1962 by computer scientist (and Turing Award winner) Robert Floyd. Before that, however, Bernard Roy (1959) and Stephen Warshall (1962) had each independently proposed a similar algorithm. In this article we will mainly discuss a parallel implementation of the Floyd algorithm based on MPI.
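For reference, the dynamic-programming recurrence at the heart of the algorithm can be stated as follows. Let $d^{(k)}(i,j)$ denote the length of the shortest path from $i$ to $j$ whose intermediate vertices are all drawn from $\{1, \dots, k\}$, and let $w(i,j)$ be the weight of the edge from $i$ to $j$ (taken as infinite if no such edge exists). Then

$$d^{(0)}(i,j) = w(i,j), \qquad d^{(k)}(i,j) = \min\left( d^{(k-1)}(i,j),\; d^{(k-1)}(i,k) + d^{(k-1)}(k,j) \right),$$

and $d^{(n)}(i,j)$ is the required shortest distance for a graph with $n$ vertices. Because level $k$ of the recurrence depends only on level $k-1$, the whole computation can be carried out in place in a single $n \times n$ matrix, which is exactly what the implementations below do.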


Serial implementation

Given that the Floyd algorithm is very well known and there is plenty of material explaining its rationale, we do not intend to dwell on the algorithm itself. If you are not familiar with it, you can refer to Chapter 8, Section 4 of The Beauty of Algorithms: The Principles Hidden Behind Data Structures (C++ edition).

But to serve as a baseline against which the parallel version can be checked, we first give a serial C++ implementation of the Floyd algorithm. The following program adopts these conventions:

    • The graph is directed;
    • During initialization, the distance between two nodes is set to -1 (NOT_CONNECTED) if there is no edge directly connecting them;
    • Node numbering starts at 1, i.e. there are nodes 1, 2, and so on, but no node 0.

Sample code is shown below:

#include <cstdio>
#include <cstdlib>
#include <cstring>
using namespace std;

#define MAX 10
#define NOT_CONNECTED -1

int distance[MAX][MAX];

//number of nodes
int nodesCount;

//initialize all distances to NOT_CONNECTED
void Initialize()
{
    memset(distance, NOT_CONNECTED, sizeof(distance));
    for (int i=0; i<MAX; ++i)
        distance[i][i]=0;
}

int main()
{
    Initialize();

    //get the nodes count
    scanf("%d", &nodesCount);

    //edges count
    int m;
    scanf("%d", &m);

    while (m--){
        //nodes - let the indexation begin from 1
        int a, b;
        //edge weight
        int c;
        scanf("%d-%d-%d", &a, &c, &b);
        distance[a][b]=c;
    }

    //Floyd-Warshall
    for (int k=1; k<=nodesCount; ++k){
        for (int i=1; i<=nodesCount; ++i){
            if (distance[i][k]!=NOT_CONNECTED){
                for (int j=1; j<=nodesCount; ++j){
                    if (distance[k][j]!=NOT_CONNECTED &&
                        (distance[i][j]==NOT_CONNECTED ||
                         distance[i][k]+distance[k][j]<distance[i][j])){
                        distance[i][j]=distance[i][k]+distance[k][j];
                    }
                }
            }
        }
    }

    for (int i=1; i<=nodesCount; ++i){
        for (int j=1; j<=nodesCount; ++j){
            printf("%d ", distance[i][j]);
        }
        printf("\n");
    }
    return 0;
}

The above program reads its input from a file describing a weighted graph: the first line gives the number of nodes, the second line gives the number of edges, and each subsequent line describes one weighted edge. For example, the contents of the file graph.txt are as follows:

4
5
1-1-2
1-10-4
2-2-3
2-3-4
3-1-4

where a line such as 1-1-2 means that the edge from node 1 to node 2 has weight 1 (note that the weight appears between the two endpoints).
Here are two other graph files that can be used for testing: graph2.txt

5
9
1-5-2
2-2-3
1-3-3
5-1-1
4-1-5
1-2-4
4-4-3
3-7-5
2-3-5

and graph3.txt

5
7
1-13-5
1-6-4
5-2-2
1-6-2
2-3-3
4-1-3
4-5-5

Running our program on these inputs produces the following results:

$ g++-5 floyd_s.cpp -o a.out
$ ./a.out < graph3.txt
0 6 7 6 11
-1 0 3 -1 -1
-1 -1 0 -1 -1
-1 7 1 0 5
-1 2 5 -1 0
$ ./a.out < graph2.txt
0 5 3 2 3
4 0 2 6 3
8 13 0 10 7
2 7 4 0 1
1 6 4 3 0
$ ./a.out < graph.txt
0 1 3 4
-1 0 2 3
-1 -1 0 1
-1 -1 -1 0
Parallel implementation


Now let us discuss the idea behind the parallel implementation. The serial algorithm is a triply nested loop over an n × n matrix, i.e. O(n³) work, and for a fixed k the two inner loops can be carried out independently row by row. The basic idea is therefore to partition the distance matrix by rows, with each processor (or compute node; note that this means a node of the distributed machine, not a node of the graph) responsible for several rows of the matrix. For example, if our matrix is 16 × 16 and we want to compute in parallel on four processors, we split the whole matrix by rows into four submatrices A, B, C, and D, and each process is then responsible for one of them. The communication required is that at the k-th iteration every process needs the current k-th row of the matrix, so the process that owns that row broadcasts it to all the others before the iteration proceeds; this is exactly what the MPI_Bcast calls in the code below do.
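As a minimal sketch of this partitioning scheme (a hypothetical helper written here for illustration, not part of the program below), the mapping from a row index to the rank that owns it can be expressed as follows; it assumes the same block-row layout as the program, where slice = nodesCount / size rows go to each rank and the last rank additionally takes any remainder rows:

// Illustrative helper (not in the original program): which MPI rank owns
// row i (1-based) when nodesCount rows are split into blocks of
// slice = nodesCount / size rows, the remainder going to the last rank.
// Assumes size <= nodesCount, so that slice >= 1.
int ownerOfRow(int i, int nodesCount, int size)
{
    int slice = nodesCount / size;      // rows per rank (integer division)
    int owner = (i - 1) / slice;        // 0-based candidate rank
    if (owner > size - 1)
        owner = size - 1;               // remainder rows belong to the last rank
    return owner;
}

For instance, with nodesCount = 5 and size = 3 (as in the mpirun runs further below), slice is 1, so rank 0 owns row 1, rank 1 owns row 2, and rank 2 owns rows 3 through 5.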



Below is my C++ implementation of the MPI-based parallel Floyd algorithm.

//author: http://blog.csdn.net/baimafujinji/
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include "mpi.h"
using namespace std;

#define MAX 10
#define NOT_CONNECTED -1

int distances[MAX][MAX];
int result[MAX][MAX];

//number of nodes
int nodesCount;

//initialize all distances to NOT_CONNECTED
void Initialize()
{
    memset(distances, NOT_CONNECTED, sizeof(distances));
    memset(result, NOT_CONNECTED, sizeof(result));
    for (int i=0; i<MAX; ++i)
        distances[i][i]=0;
}

int main(int argc, char *argv[])
{
    Initialize();

    //get the nodes count
    scanf("%d", &nodesCount);

    //edges count
    int m;
    scanf("%d", &m);

    while (m--){
        //nodes - let the indexation begin from 1
        int a, b;
        //edge weight
        int c;
        scanf("%d-%d-%d", &a, &c, &b);
        distances[a][b]=c;
    }

    int size, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    //rows per process; the last process also takes the remainder rows
    int slice = nodesCount / size;
    MPI_Bcast(distances, MAX*MAX, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&nodesCount, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&slice, 1, MPI_INT, 0, MPI_COMM_WORLD);

    //Floyd-Warshall
    int sent = 1;
    for (int k=1; k<=nodesCount; ++k){
        //find which process owns row k ...
        int th = 1;
        for (; th <= size; th++){
            if (1+slice*(th-1) <= k && k <= slice*th)
                sent = th;
        }
        if (1+slice*(th-1) <= k && k <= nodesCount)
            sent = size;
        //... and let it broadcast row k before this iteration proceeds
        MPI_Bcast(&distances[k], nodesCount+1, MPI_INT, sent-1, MPI_COMM_WORLD);

        if (rank != size-1){
            //each process relaxes only the rows of its own slice
            for (int i=1+slice*rank; i<=slice*(rank+1); ++i){
                if (distances[i][k]!=NOT_CONNECTED){
                    for (int j=1; j<=nodesCount; ++j){
                        if (distances[k][j]!=NOT_CONNECTED &&
                            (distances[i][j]==NOT_CONNECTED ||
                             distances[i][k]+distances[k][j]<distances[i][j])){
                            distances[i][j]=distances[i][k]+distances[k][j];
                        }
                    }
                }
            }
        }
        else{
            //the last process also handles the remainder rows up to nodesCount
            for (int i=1+slice*rank; i<=nodesCount; ++i){
                if (distances[i][k]!=NOT_CONNECTED){
                    for (int j=1; j<=nodesCount; ++j){
                        if (distances[k][j]!=NOT_CONNECTED &&
                            (distances[i][j]==NOT_CONNECTED ||
                             distances[i][k]+distances[k][j]<distances[i][j])){
                            distances[i][j]=distances[i][k]+distances[k][j];
                        }
                    }
                }
            }
        }
    }

    //synchronize all rows again so every process holds the full matrix
    for (int k=1; k<=nodesCount; ++k){
        int th = 1;
        for (; th <= size; th++){
            if (1+slice*(th-1) <= k && k <= slice*th)
                sent = th;
        }
        if (1+slice*(th-1) <= k && k <= nodesCount)
            sent = size;
        MPI_Bcast(&distances[k], nodesCount+1, MPI_INT, sent-1, MPI_COMM_WORLD);
    }

    //combine the copies element-wise on process 0
    MPI_Reduce(distances, result, MAX*MAX, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);

    if (rank==0){
        for (int i=1; i<=nodesCount; i++){
            for (int j=1; j<=nodesCount; j++){
                printf("%d ", result[i][j]);
            }
            printf("\n");
        }
        printf("\n");
    }

    /* shut down MPI */
    MPI_Finalize();
    return 0;
}
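Note that building the program requires an MPI compiler wrapper rather than plain g++. Assuming an MPI implementation is installed (Open MPI provides the mpic++ wrapper; MPICH provides mpicxx), and supposing the source file is saved as floyd_p.cpp (a file name chosen here purely for illustration), the compilation step would look something like:

$ mpic++ floyd_p.cpp -o a.out

after which the program can be launched with mpirun as shown below.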

Once the code is complete, let's verify that its results are consistent with those of the serial implementation.

$ mpirun -n 3 ./a.out < graph3.txt
0 6 7 6 11
-1 0 3 -1 -1
-1 -1 0 -1 -1
-1 7 1 0 5
-1 2 5 -1 0
$ mpirun -n 3 ./a.out < graph2.txt
0 5 3 2 3
4 0 2 6 3
8 13 0 10 7
2 7 4 0 1
1 6 4 3 0
$ mpirun -n 3 ./a.out < graph.txt
0 1 3 4
-1 0 2 3
-1 -1 0 1
-1 -1 -1 0

As can be seen, the parallel program produces the expected results.

(End of this article)

