The program uses the non-blocking communication functions mpi_isend() and mpi_irecv(), and completes them with mpi_wait().
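For reference, the calling pattern described above can be sketched roughly as follows. This is not the original program, only a minimal illustration under stated assumptions: the ring-style choice of neighbors, the iteration count niter, and the buffer names sendbuf and recvbuf are all hypothetical, while the message length (5000), the tag (77), and the datatype are taken from the error log.

```fortran
program isend_irecv_wait
  use mpi
  implicit none
  integer, parameter :: n = 5000      ! message length, as in the error log
  integer, parameter :: tag = 77      ! message tag, as in the error log
  integer, parameter :: niter = 10    ! hypothetical iteration count
  double precision :: sendbuf(n), recvbuf(n)
  integer :: ierr, rank, nprocs, dest, src, iter
  integer :: req_recv, req_send
  integer :: status(MPI_STATUS_SIZE)

  call mpi_init(ierr)
  call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)
  call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Assumed ring topology: send to the next rank, receive from the previous.
  dest = mod(rank + 1, nprocs)
  src  = mod(rank - 1 + nprocs, nprocs)

  do iter = 1, niter
     sendbuf = dble(rank + iter)

     ! Post the receive first, then the send; both calls return immediately.
     call mpi_irecv(recvbuf, n, MPI_DOUBLE_PRECISION, src, tag, &
                    MPI_COMM_WORLD, req_recv, ierr)
     call mpi_isend(sendbuf, n, MPI_DOUBLE_PRECISION, dest, tag, &
                    MPI_COMM_WORLD, req_send, ierr)

     ! Complete BOTH requests in every iteration before the buffers
     ! and the request handles are reused.
     call mpi_wait(req_recv, status, ierr)
     call mpi_wait(req_send, status, ierr)
  end do

  if (rank == 0) print *, 'first element received from rank', src, ':', recvbuf(1)

  call mpi_finalize(ierr)
end program isend_irecv_wait
```

The point of the sketch is that every request returned by mpi_isend() or mpi_irecv() is completed by a matching mpi_wait() in the same iteration. A request handle that is overwritten before it has been completed or freed keeps holding resources inside the MPI library, so it may be worth checking that the real program waits on the send requests as well as the receive requests.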
The following error occurs after more than 5,000 iterations:
5280 -1.272734378291617E-004 1.271885446338949E-004 1.93516788631215 -0.246120726174522 9.005226840169125E-006 1.00000247207768
[cli_3]: aborting job:
Fatal error in MPI_Isend: Internal MPI error!, error stack:
MPI_Isend(145): MPI_Isend(buf=0x12e37e40, count=5000, MPI_DOUBLE_PRECISION, dest=4, tag=77, MPI_COMM_WORLD, request=0x1890221c) failed
(unknown)(): Internal MPI error!
[cli_2]: aborting job:
Fatal error in MPI_Isend: Internal MPI error!, error stack:
MPI_Isend(145): MPI_Isend(buf=0x12dbdd20, count=5000, MPI_DOUBLE_PRECISION, dest=3, tag=77, MPI_COMM_WORLD, request=0x1890221c) failed
(unknown)(): Internal MPI error!
[cli_5]: aborting job:
Fatal error in MPI_Isend: Internal MPI error!, error stack:
MPI_Isend(145): MPI_Isend(buf=0x12f2c080, count=5000, MPI_DOUBLE_PRECISION, dest=6, tag=77, MPI_COMM_WORLD, request=0x1890221c) failed
(unknown)(): Internal MPI error!
[cli_6]: aborting job:
Fatal error in MPI_Isend: Internal MPI error!, error stack:
MPI_Isend(145): MPI_Isend(buf=0x12fa61a0, count=5000, MPI_DOUBLE_PRECISION, dest=7, tag=77, MPI_COMM_WORLD, request=0x1890221c) failed
(unknown)(): Internal MPI error!
rank 6 in job 2 v3901_33329 caused collective abort of all ranks
  exit status of rank 6: return code 13
rank 5 in job 2 v3901_33329 caused collective abort of all ranks
  exit status of rank 5: return code 13
rank 2 in job 2 v3901_33329 caused collective abort of all ranks
  exit status of rank 2: return code 13
I still don't know the cause. Sometimes an out-of-memory error occurs instead. Is it because the mpi_isend() calls have exhausted the memory? No.