A crude but practical method is to add the following code to the program:
      tmp = 0
      do while (tmp .eq. 0)
         call sleep(2)
      enddo
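It acts like a breakpoint: each process waits in the loop until a debugger is attached and the loop variable is set to a nonzero value. When debugging an MPI program, it also helps determine whether the code before this point contains an error that makes the program crash and exit: if the processes reach the loop and wait, that code ran without a fatal error. I personally find it very useful.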
[root@c0109 zlt]# cat hello.f
      program hello
      implicit none
      include 'mpif.h'
      integer myid, numprocs, ierr
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
      print *, 'Hello world, process ', myid
      call MPI_FINALIZE(ierr)
      end
[root@c0109 zlt]# mpif90 -g hello.f -o hello
[root@c0109 zlt]# mpirun -gdb -np 2 hello
0-1: (gdb) run
0-1: Continuing.
0: Hello world, process 0
1: Hello world, process 1
0-1:
0-1: Program exited normally.
0-1: (gdb)
The following command opens four xterm windows, each running GDB on one process, so every process can be debugged just like a serial program under GDB. Note that it needs an X display, so it does not work from a plain SSH terminal without X11 forwarding.
[root@c0109 zlt]# mpirun -np 4 xterm -e gdb my_mpi_application
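Because each xterm needs an X display, it can help to check for one before launching. The following is only a minimal sketch and not part of the original setup: the function name xterm_gdb_cmd is hypothetical, and it assumes that the DISPLAY environment variable is set whenever an X server (or SSH X11 forwarding) is available. It prints the command instead of running it, so the command can be inspected first.

```shell
# xterm_gdb_cmd NP PROGRAM   (hypothetical helper name)
# Prints the mpirun command that opens one xterm running gdb per process.
# Fails with a message if no X display is configured, since xterm cannot
# open a window in that case.
xterm_gdb_cmd() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "no X display found: set DISPLAY or log in with X11 forwarding" >&2
        return 1
    fi
    printf 'mpirun -np %s xterm -e gdb %s\n' "$1" "$2"
}

# Usage: print the command for four processes, or report the missing display.
xterm_gdb_cmd 4 my_mpi_application || true
```

Checking DISPLAY is a heuristic: it shows that a display is configured, not that the connection to it actually works.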
Open a new terminal window and list the hello processes:
[root@c0108 test]# ps aux | grep hello
root  4342  0.0  0.0  11108   592 ?      S   /root/zmpi/test/hello 4297
root  4359  0.0  0.1 136188  5160 ?      S   python2 /usr/local/bin/mpdgdbdrv.py hello
root  4360  0.0  0.1 136188  5160 ?      S   python2 /usr/local/bin/mpdgdbdrv.py hello
root  4361  0.0  0.1  81124  7348 ?      S   gdb -q hello
root  4362  0.0  0.1  81124  7348 ?      S   gdb -q hello
root  4388  0.0  0.0  65328   772 pts/1  R+  grep hello
Once you have found the corresponding process IDs, you can use GDB to debug each process separately.
0-1: (gdb) run 4362
0-1: Continuing.
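The lookup-then-debug step can also be sketched as a small helper for the case where a separate GDB is attached to each running process. This is only an illustration under stated assumptions: the function name attach_cmds is hypothetical, pgrep is assumed to be installed, and attaching relies on GDB's -p PID option. The helper prints the commands rather than starting the debuggers, since each interactive GDB session needs its own terminal.

```shell
# attach_cmds PID...   (hypothetical helper name)
# Prints one "gdb -q -p PID" command per process ID given, so that each
# command can be run in its own terminal (one debugger per MPI process).
attach_cmds() {
    for pid in "$@"; do
        printf 'gdb -q -p %s\n' "$pid"
    done
}

# Usage: print attach commands for every process named exactly "hello".
# The substitution is left unquoted on purpose so that each PID becomes
# a separate argument; nothing is printed if no such process exists.
attach_cmds $(pgrep -x hello)
```

Using pgrep -x avoids matching the grep command itself, which shows up in the ps aux | grep hello listing.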
The following command runs the MPI program under Valgrind to check for memory errors:
[root@c0109 zlt]# mpirun -np 2 valgrind ./hello
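When several processes run under Valgrind at once, their reports are interleaved on the terminal. One possible way to separate them, shown here as a sketch only, is Valgrind's --log-file option, whose %p placeholder is replaced by the process ID. The function name valgrind_cmd and the default log prefix vg are hypothetical and chosen purely for illustration; the helper prints the command instead of running it.

```shell
# valgrind_cmd NP PROGRAM [LOG_PREFIX]   (hypothetical helper name)
# Prints an mpirun command line that wraps every process in valgrind and
# writes one log file per process.
valgrind_cmd() {
    np=$1
    prog=$2
    prefix=${3:-vg}    # hypothetical default prefix for the log files
    # %%p prints a literal %p, which valgrind itself expands to the PID.
    printf 'mpirun -np %s valgrind --log-file=%s.%%p.log %s\n' \
        "$np" "$prefix" "$prog"
}

# Usage: print the command for two processes of ./hello.
valgrind_cmd 2 ./hello
```

Each process then writes its report to a file such as vg.PID.log, which can be read on its own.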
Serial debugging:
[root@c0108 zlt]# gcc -g hello.c -o hello
[root@c0108 zlt]# gdb
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5_5.2)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
(gdb) file hello
Reading symbols from /root/zmpi/zlt/hello...done.
(gdb) r                      # run the program (short for run); command-line arguments, if any, go after it
Starting program: /root/zmpi/zlt/hello
Hello world!

Program exited with code 01.
(gdb) list                   # same as l; list the source code
1       #include <stdio.h>
2       #include <stdlib.h>
3
4       int main() {
5           char *buf;
6           buf = "Hello world!";
7           printf("%s\n", buf);
8           return 1;
9       }
(gdb) break 3                # same as break hello.c:3; set a breakpoint at line 3 of the source
Breakpoint 1 at 0x4004a0: file hello.c, line 3.
(gdb) info break             # show breakpoint information
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000004004a0 in main at hello.c:3
(gdb) run
Starting program: /root/zmpi/zlt/hello

Breakpoint 1, main () at hello.c:6
6           buf = "Hello world!";
(gdb) n                      # next: execute a single statement; c (continue) resumes the program
7           printf("%s\n", buf);
(gdb) n
Hello world!
8           return 1;
(gdb) print buf              # print the value of the variable buf
$1 = 0x4005b8 "Hello world!"
(gdb) what is buf
No symbol "is" in current context.
(gdb) whatis buf             # show the type of the variable buf
type = char *
(gdb) delete breakpoint 1
(gdb) info break
No breakpoints or watchpoints.
(gdb) quit                   # exit gdb
A debugging session is active.

        Inferior 2 [process 22747] will be killed.

Quit anyway? (y or n) y
[root@c0108 zlt]#
Appendix: Use GDB to debug the program
http://dsec.pku.edu.cn/~yuhj/wiki/gdb.html