MPI debugging: a collection of error messages

Source: Internet
Author: User

If you are writing a program in Fortran, we recommend adding "implicit none", especially when there is a lot of code: the compiler will then catch many problems at compile time.
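As a minimal sketch of why this helps (the module and function names are made up for illustration and are not from the original program), consider a small summation routine. With "implicit none" in effect, a mistyped variable name is rejected by the compiler instead of silently becoming a new variable:

```fortran
module sum_demo
  implicit none   ! every name must be declared explicitly
contains
  ! Returns 1 + 2 + ... + n.
  function sum_to(n) result(total)
    integer, intent(in) :: n
    integer :: total, i
    total = 0
    do i = 1, n
      ! Without "implicit none", a typo such as "totl = totl + i" would
      ! compile: totl would be an implicitly typed REAL variable and
      ! total would stay 0. With "implicit none" the compiler reports
      ! the undeclared name instead.
      total = total + i
    end do
  end function sum_to
end module sum_demo
```

Placing the statement at module level covers every procedure contained in the module, so it only has to be written once per module.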

1,

  • [root@c0108 parallel]# mpiexec -n 5 ./simple
  • aborting job:
  • Fatal error in MPI_Irecv: Invalid rank, error stack:
  • MPI_Irecv(143): MPI_Irecv(buf=0x25dab60, count=0, MPI_DOUBLE_PRECISION, src=5, tag=99, MPI_COMM_WORLD, request=0x7fffa02ca86c) failed
  • MPI_Irecv(95): Invalid rank has value 5 but must be nonnegative and less than 5
  • rank 4 in job 5 c0108_52041 caused collective abort of all ranks
  • exit status of rank 4: return code 13




The output above says that rank 5 is invalid. Because "mpiexec -n 5 ./simple" starts five processes, ranked 0, 1, 2, 3 and 4, the problem must lie in the code itself. It is not necessarily the rank expression that is wrong, though: an argument may have been passed incorrectly, and MPI tends to produce errors that look inexplicable at first.

My code contains only a few MPI_Irecv calls, so I debugged by adding a print statement before each one to find the line that triggers the error, as shown below:


      print *, myid + 1, '20140901'   ! debug marker

      call MPI_Irecv(P(1, 1, location), IMAX*Jmax*min(ITSP, Ke-Myke),
     &               MPI_DOUBLE_PRECISION, myid + 1, rely,
     &               MPI_COMM_WORLD, req, ierr)
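One common cause of this message, and only a guess for this program, is that the last rank (here rank 4) also posts a receive from myid + 1, which does not exist. A minimal sketch of a guard follows; the module name, the function name and the null_rank argument are hypothetical and not part of the original code:

```fortran
module neighbor_demo
  implicit none
contains
  ! Returns the rank "above" myid, or null_rank when myid is already
  ! the last rank in a job of nprocs processes.
  pure function upper_neighbor(myid, nprocs, null_rank) result(nbr)
    integer, intent(in) :: myid, nprocs, null_rank
    integer :: nbr
    if (myid + 1 < nprocs) then
      nbr = myid + 1
    else
      nbr = null_rank
    end if
  end function upper_neighbor
end module neighbor_demo
```

In an MPI program one would pass MPI_PROC_NULL as null_rank and use the result as the source argument of MPI_Irecv. A receive from MPI_PROC_NULL completes without transferring any data, so the last rank needs no special branch around the call.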

2,

  • [root@c0109 test]# mpiexec -n 5 ./simple
  • rank 3 in job 22 c0109_000064 caused collective abort of all ranks
  • exit status of rank 3: killed by signal 11
  • [root@c0109 test]#

Signal 11, officially known as a "segmentation fault", means that the program accessed a memory location that was not assigned to it. That's usually a bug in the program.




3,

  • [root@c0108 test]# mpirun -np 4 ./simple
  • aborting job:
  • Fatal error in MPI_Wait: Invalid MPI_Request, error stack:
  • MPI_Wait(139): MPI_Wait(request=0x7fff1f675228, status0x7fff1f675218) failed
  • MPI_Wait(75): Invalid MPI_Request
  • rank 2 in job 24 c0108_52041 caused collective abort of all ranks
  • exit status of rank 2: return code 13



Solution:

Generally this happens because MPI_Test or MPI_Wait was supplied a request that is unknown to MPICH (the request wasn't the one returned by MPICH when you made the isend/irecv/send_init/recv_init call). In other words, the MPI_Irecv does not match the MPI_Wait(req, status, ierr) call, and the request handle is invalid.

If there are many MPI_Wait() calls, you can comment them out one at a time to pin down the one that fails. In addition, if it is a Fortran program, first check the declarations of the request and status variables:

integer req, status(MPI_STATUS_SIZE), ierr
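The pairing described above can be sketched as follows. This is an illustrative program rather than the original code: the buffer size, the tag 99 and the choice of ranks 0 and 1 are assumptions, and it needs an MPI library that provides the "mpi" Fortran module. The request is initialised to MPI_REQUEST_NULL so that ranks which post no communication can still call MPI_Wait safely:

```fortran
program wait_demo
  use mpi
  implicit none
  integer :: req, status(MPI_STATUS_SIZE), ierr
  integer :: myid, nprocs
  double precision :: buf(4)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Waiting on a null request returns immediately, so every rank
  ! starts with a handle that is valid to pass to MPI_Wait.
  req = MPI_REQUEST_NULL
  buf = 0.0d0

  if (nprocs > 1) then
    if (myid == 0) then
      ! req is set by MPI_Irecv; the same variable goes to MPI_Wait.
      call MPI_Irecv(buf, 4, MPI_DOUBLE_PRECISION, 1, 99, &
                     MPI_COMM_WORLD, req, ierr)
    else if (myid == 1) then
      buf = 1.0d0
      call MPI_Isend(buf, 4, MPI_DOUBLE_PRECISION, 0, 99, &
                     MPI_COMM_WORLD, req, ierr)
    end if
  end if

  ! Wait on exactly the handle returned by the nonblocking call.
  call MPI_Wait(req, status, ierr)

  call MPI_Finalize(ierr)
end program wait_demo
```

When several nonblocking calls are outstanding, each needs its own request variable (or its own element of a request array); reusing one variable before waiting on it overwrites a handle that is still in use.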


4,

aborting job:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(195): Initialization failed
MPID_Init(170): failure during portals initialization failed (321): progress_init failed (653): Out of memory



There is not enough memory on the nodes for the program plus MPI buffers to fit.



You can decrease the amount of memory that MPI uses for buffers by setting the MPICH_UNEX_BUFFER_SIZE environment variable.
