First, hybrid programming differs from writing pure MPI code. MPI_Init() must be replaced with MPI_Init_thread(), and you also need to check whether the environment provides the required level of thread support.
Second, the program cannot rely on the default OpenMP thread count, because with Torque the qsub script cannot set environment variables on the compute nodes. The default number of OpenMP threads is taken from the OMP_NUM_THREADS environment variable. For better applicability and portability, the thread count is instead passed in as a command-line argument and set at run time with omp_set_num_threads().
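The argument handling described above can be sketched as a small helper. This is only an illustration: parse_thread_count is a hypothetical name, not part of OpenMP or MPI, and it uses strtol instead of atoi so that a malformed or non-positive argument falls back to a safe default rather than reaching omp_set_num_threads().

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert a command-line argument to a thread count.
   Returns fallback when arg is NULL, empty, not a whole number, or < 1. */
int parse_thread_count (const char *arg, int fallback)
{
    if (arg == NULL || *arg == '\0')
        return fallback;

    char *end = NULL;
    errno = 0;
    long value = strtol (arg, &end, 10);

    /* Reject trailing garbage, overflow, and non-positive counts. */
    if (errno != 0 || *end != '\0' || value < 1 || value > INT_MAX)
        return fallback;

    return (int) value;
}
```

In main, the call might look like omp_set_num_threads (parse_thread_count (argc > 1 ? argv[1] : NULL, 1)).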
Test code:
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#include <mpi.h>

int rank, size;

int main (int argc, char* argv[])
{
    int provided;

    /* Request FUNNELED: only the main thread makes MPI calls. */
    MPI_Init_thread (&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided < MPI_THREAD_FUNNELED)
    {
        /* The library cannot give the requested thread support level. */
        printf ("provided %d < required %d\n", provided, MPI_THREAD_FUNNELED);
        MPI_Finalize ();
        return 1;
    }

    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &size);

    char name[MPI_MAX_PROCESSOR_NAME];
    int length;
    MPI_Get_processor_name (name, &length);

    /* The thread count comes from the first command-line argument. */
    int omp_num_threads = 1;
    if (argc > 1)
    {
        omp_num_threads = atoi (argv[1]);
        if (omp_num_threads < 1)
        {
            omp_num_threads = 1;  /* atoi returns 0 for invalid input */
        }
    }
    printf ("%s \n", name);

    omp_set_num_threads (omp_num_threads);
    #pragma omp parallel
    {
        printf ("%d omp thread from %d mpi process\n", omp_get_thread_num (), rank);
    }

    MPI_Finalize ();
    return 0;
}
Third, note that for maximum speedup the ppn (processors per node) value should equal the number of OpenMP threads started on each node.
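The relation between ppn and the thread count is simple arithmetic, sketched below. It assumes that each node runs a fixed number of MPI processes and that every process should get an equal share of the cores reserved by ppn. The name threads_per_process is a hypothetical helper, not a Torque or OpenMP call.

```c
/* Hypothetical helper: cores reserved on a node (ppn) divided evenly
   among the MPI processes placed on that node.
   Returns 0 when the inputs are invalid or do not divide evenly. */
int threads_per_process (int ppn, int procs_per_node)
{
    if (ppn < 1 || procs_per_node < 1 || ppn % procs_per_node != 0)
        return 0;
    return ppn / procs_per_node;
}
```

With ppn=2 and one MPI process per node, this gives 2 OpenMP threads per process, so no core is left idle and none is oversubscribed.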
Submission script:
#!/bin/sh -f
#PBS -N hybrid
#PBS -l nodes=2:ppn=2
#PBS -l walltime=00:05:00
#PBS -l mem=1024mb
#PBS -o moe.txt
#PBS -e mee.txt
#PBS -q default
cd $PBS_O_WORKDIR
MVA='/usr/mpi/intel/mvapich-1.2.0/bin/mpirun '
eval $MVA -n 2 ./hybrid 2
That is the basic framework. On a local machine, the simplest approach is to set the OMP_NUM_THREADS environment variable.
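For the environment variable to take effect locally, the program must not override it: any call to omp_set_num_threads() replaces the value read from OMP_NUM_THREADS. One possible arrangement, sketched here with a hypothetical helper named requested_threads, is to call omp_set_num_threads() only when a thread count was actually given on the command line.

```c
#include <stdlib.h>

/* Hypothetical helper: returns the thread count given as the first
   command-line argument, or 0 when none (or an invalid one) was given.
   A result of 0 means "do not call omp_set_num_threads", so the OpenMP
   runtime keeps its default, which honors OMP_NUM_THREADS. */
int requested_threads (int argc, char *argv[])
{
    if (argc < 2)
        return 0;

    int n = atoi (argv[1]);
    return n > 0 ? n : 0;
}
```

In main, the result would be used as: int n = requested_threads (argc, argv); if (n > 0) omp_set_num_threads (n);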