1. Preparation
When MPI is used for parallel computing, the work can be divided either by task or by data, depending on what the program needs. Given the nature of a matrix product, the decomposition here is by data: each compute node works on a different part of the matrices. The data is split by rows, because in C a two-dimensional array is stored in row-major order, so the elements of a row, and of a block of consecutive rows, are contiguous in memory. (Note: Fortran stores arrays column-major, so there you would split by columns instead.)
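To make the row-major point concrete, here is a minimal stand-alone sketch; the names width and rows_per_block are illustrative and do not appear in the program later in this post. Because a block of consecutive rows is one contiguous region of memory, it can later be handed to a single MPI_Send.

#include <stdlib.h>

int main(void)
{
    int width = 4;                  /* a small width x width matrix            */
    int rows_per_block = 2;         /* rows handled by one process             */
    float *m = (float *)malloc(sizeof(float) * width * width);

    /* element (i, j) lives at m[i * width + j]: C stores rows contiguously    */
    m[1 * width + 3] = 5.0f;

    /* the block that starts at row 2 occupies the contiguous range
       m + 2 * width .. m + (2 + rows_per_block) * width - 1, so the whole
       block could be sent with one MPI_Send of rows_per_block * width floats  */
    float *block = m + 2 * width;
    (void)block;                    /* silence the unused-variable warning     */

    free(m);
    return 0;
}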
2. MPI Program Framework
An MPI program is launched from the command line, so the MPI calls usually live in the main function, the program's entry point. Four functions appear in almost every MPI program: MPI_Init, MPI_Comm_rank, MPI_Comm_size, and MPI_Finalize. They are used as follows:
Remember to link against mpi.lib.
1 # include "MPI. h "2 3 int main (INT argc, char * argv []) 4 {5 Int myid, numprocs; 6 mpi_init (& argc, & argv ); // MPI initialize 7 mpi_comm_rank (mpi_comm_world, & myid); // obtain the current process number 8 mpi_comm_size (mpi_comm_world, & numprocs ); // obtain the number of processes 9 // MPI calculation process 10 mpi_finalize (); // end 11}
3. Matrix Multiplication
The matrix product is computed by splitting one matrix into blocks, having each process compute the product for its block, and finally sending the partial results back to the main process.
Suppose we compute M * N. Before the computation starts, the full matrix N is sent to every slave process; the matrix M is then split by rows and each slave process receives one block of rows. Each slave multiplies its block of M by N and sends the resulting rows back to the main process. For convenience, M is divided into as many blocks as there are processes. All blocks except the last have the same size; the last block holds whatever rows remain, so it is at least as large as the others, because the number of rows is not necessarily a multiple of the number of processes. The last block is computed by the main process itself, while the other blocks are computed by the slave processes.
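To spell out the row ranges this gives, here is a small stand-alone sketch; the variable names mirror the program below, but the snippet itself is not part of it, and the value of numprocs is just an example.

#include <stdio.h>

int main(void)
{
    int width = 1000;               /* number of rows in M                      */
    int numprocs = 6;               /* example process count                    */
    int line = width / numprocs;    /* rows in each regular block               */

    /* slave process k (k = 1 .. numprocs-1) gets rows (k-1)*line .. k*line-1   */
    for (int k = 1; k < numprocs; k++)
        printf("process %d: rows %d .. %d\n", k, (k - 1) * line, k * line - 1);

    /* the main process keeps the remainder, which can be larger than line
       rows when width is not divisible by numprocs                             */
    printf("process 0: rows %d .. %d\n", (numprocs - 1) * line, width - 1);
    return 0;
}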
Two matrices, M and N, are declared. N is needed by every process, whereas M is used only by the main process and could be defined there alone. The remaining variables belong to either the main process or the slave processes and should be defined wherever they are actually needed.
The code is given below; it covers matrix initialization, data transmission, and the matrix product itself.
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// generate a matrix
void matgen(float *a, int width)
{
    int i, j;
    for (i = 0; i < width; i++)
    {
        for (j = 0; j < width; j++)
        {
            // a[i*width + j] = (float)rand() / RAND_MAX + (float)rand() / (RAND_MAX * RAND_MAX); // values in 0~1
            a[i*width + j] = 1.00;
        }
    }
}

int main(int argc, char *argv[])
{
    float *M, *N, *P, *buffer, *ans;
    int width = 1000;
    int myid, numprocs;
    MPI_Status status;

    MPI_Init(&argc, &argv);                    // initialize MPI
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);      // get the rank of the current process
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);  // get the total number of processes
    double clockStart = (double)clock();       // start timing

    int line = width / numprocs;  // split the data into (numprocs) blocks; the main process also handles one block
    M = (float *)malloc(sizeof(float) * width * width);
    N = (float *)malloc(sizeof(float) * width * width);
    P = (float *)malloc(sizeof(float) * width * width);
    // the buffers must be at least as large as the data to be handled; if they are larger, only the valid part is used
    buffer = (float *)malloc(sizeof(float) * width * line);  // one block of rows of M
    ans = (float *)malloc(sizeof(float) * width * line);     // result of one block

    // the main process initializes the matrices, sends N to every slave process,
    // and sends one block of rows of M to each slave process
    if (myid == 0)
    {
        // initialize the matrices
        matgen(M, width);
        matgen(N, width);
        // send matrix N to the other (slave) processes
        // MPI_Bcast(N, width*width, MPI_FLOAT, 0, MPI_COMM_WORLD);
        for (int i = 1; i < numprocs; i++)
        {
            MPI_Send(N, width*width, MPI_FLOAT, i, 0, MPI_COMM_WORLD);
        }
        // send the blocks of rows of M to the slave processes in turn
        for (int m = 1; m < numprocs; m++)
        {
            /*
            // copy the block of rows into buffer, then send it to slave process m
            for (int i = (m-1)*line; i < m*line; i++)
            {
                for (int j = 0; j < width; j++)
                {
                    buffer[(i - (m-1)*line)*width + j] = M[i*width + j];
                }
            }
            // send the buffered block to process m
            MPI_Send(buffer, line*width, MPI_FLOAT, m, 1, MPI_COMM_WORLD);
            */
            MPI_Send(M + (m-1)*line*width, width*line, MPI_FLOAT, m, 1, MPI_COMM_WORLD);
        }
        // receive the results computed by the slave processes
        for (int k = 1; k < numprocs; k++)
        {
            MPI_Recv(ans, line*width, MPI_FLOAT, k, 3, MPI_COMM_WORLD, &status);
            // copy the block of results into P
            for (int i = 0; i < line; i++)
            {
                for (int j = 0; j < width; j++)
                {
                    P[((k-1)*line + i)*width + j] = ans[i*width + j];
                }
            }
        }
        // compute the remaining rows of M
        for (int i = (numprocs-1)*line; i < width; i++)
        {
            for (int j = 0; j < width; j++)
            {
                float temp = 0.0;
                for (int k = 0; k < width; k++)
                    temp += M[i*width + k] * N[k*width + j];
                P[i*width + j] = temp;
            }
        }
        // check the result: sum one row of M and one row of P
        float sum1 = 0;
        float sum2 = 0;
        for (int i = 0; i < width; i++)
        {
            sum1 += M[i];
            sum2 += P[600*width + i];
        }
        printf("sum1 = %f sum2 = %f\n", sum1, sum2);
        // report the elapsed time
        double clockEnd = (double)clock();
        printf("myid: %d time: %.2fs\n", myid, (clockEnd - clockStart) / CLOCKS_PER_SEC);
    }
    // the slave processes receive their data, compute, and send the result back to the main process
    else
    {
        // receive matrix N
        // MPI_Bcast(N, width*width, MPI_FLOAT, 0, MPI_COMM_WORLD);
        MPI_Recv(N, width*width, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &status);
        // receive one block of rows of M
        MPI_Recv(buffer, width*line, MPI_FLOAT, 0, 1, MPI_COMM_WORLD, &status);
        // compute the product of the block and N
        for (int i = 0; i < line; i++)
        {
            for (int j = 0; j < width; j++)
            {
                float temp = 0.0;
                for (int k = 0; k < width; k++)
                    temp += buffer[i*width + k] * N[k*width + j];
                ans[i*width + j] = temp;
            }
        }
        // send the computed block back to the main process
        MPI_Send(ans, line*width, MPI_FLOAT, 0, 3, MPI_COMM_WORLD);
    }
    MPI_Finalize();   // shut down MPI

    return 0;
}
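The commented-out MPI_Bcast lines above hint at a simpler way to distribute N: a broadcast is a collective operation, so every process calls the same function and the loop of matched MPI_Send/MPI_Recv pairs disappears. Below is a minimal, self-contained sketch of that variant; it is only an illustration of the call, not part of the program above.

#include "mpi.h"
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int myid, width = 1000;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    float *N = (float *)malloc(sizeof(float) * width * width);
    if (myid == 0)
    {
        /* in a real program rank 0 would fill N here, e.g. with matgen */
    }
    /* every rank calls MPI_Bcast; when it returns, all ranks hold rank 0's N */
    MPI_Bcast(N, width * width, MPI_FLOAT, 0, MPI_COMM_WORLD);

    free(N);
    MPI_Finalize();
    return 0;
}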
You can now test the program. To avoid typing the launch command in a command-prompt window every time, you can create a .bat batch file. The method is as follows:
Create a new text file, put the launch command in it (for example, mpiexec -n 5 xxx.exe), and rename the file to xxx.bat. Double-clicking the .bat file then runs the MPI program, although the console output will not stay visible. This approach is handy when a program with a GUI has to launch another program, or when you want to change the run parameters: you only edit the .bat file, which is much more convenient than retyping the command at a prompt.
The program above can still be optimized. A closer look shows that each process actually needs only the following data:
Main process: M, N, P (stores the result), line
Slave processes: N, buffer (stores one block of rows of M), ans (stores the computed block), line (rows per block)
So buffer can receive its block of M directly, and in the main process the matrix P can receive each slave's result (ans) directly into the corresponding rows, with no intermediate copy.
As follows:
for (int k = 1; k < numprocs; k++)
{
    MPI_Recv(P + (k-1)*line*width, line*width, MPI_FLOAT, k, 3, MPI_COMM_WORLD, &status);
}
4. Aside
Besides the method above, there is another arrangement in which all of the computation is done by the slave processes and the main process is only responsible for managing the data. Done this way, the computation takes less time than the method above; verify it for yourself.
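As a rough sketch of that idea (an illustration only, not the author's implementation): the main process broadcasts N, hands every block of M to a slave, and receives each result straight into P. The sketch assumes width is divisible by numprocs - 1 and uses the same variable names as the listing above.

#include "mpi.h"
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int myid, numprocs, width = 1000;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

    int line = width / (numprocs - 1);              /* rows per slave process */
    float *N = (float *)malloc(sizeof(float) * width * width);
    /* in a real program rank 0 fills N (and M below), e.g. with matgen, before this point */
    MPI_Bcast(N, width * width, MPI_FLOAT, 0, MPI_COMM_WORLD);

    if (myid == 0)                                  /* main process: no computing, only data management */
    {
        float *M = (float *)malloc(sizeof(float) * width * width);
        float *P = (float *)malloc(sizeof(float) * width * width);
        for (int k = 1; k < numprocs; k++)          /* one block of M per slave */
            MPI_Send(M + (k - 1) * line * width, line * width, MPI_FLOAT, k, 1, MPI_COMM_WORLD);
        for (int k = 1; k < numprocs; k++)          /* results go straight into P */
            MPI_Recv(P + (k - 1) * line * width, line * width, MPI_FLOAT, k, 3, MPI_COMM_WORLD, &status);
        free(M); free(P);
    }
    else                                            /* slave processes do all the multiplication */
    {
        float *buffer = (float *)malloc(sizeof(float) * line * width);
        float *ans = (float *)malloc(sizeof(float) * line * width);
        MPI_Recv(buffer, line * width, MPI_FLOAT, 0, 1, MPI_COMM_WORLD, &status);
        for (int i = 0; i < line; i++)
            for (int j = 0; j < width; j++)
            {
                float temp = 0.0f;
                for (int k = 0; k < width; k++)
                    temp += buffer[i * width + k] * N[k * width + j];
                ans[i * width + j] = temp;
            }
        MPI_Send(ans, line * width, MPI_FLOAT, 0, 3, MPI_COMM_WORLD);
        free(buffer); free(ans);
    }
    free(N);
    MPI_Finalize();
    return 0;
}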