Introduction: CPU Parallel Computing and GPU Parallel Computing
I recently took a course called "C++ and Parallel Computing". It uses the multi-CPU (multi-process) parallel model, implemented through the MPI interface for C/C++. Since I used CUDA C/C++ for parallel computing last semester, I will summarize these two frameworks here and share my understanding of parallel computing.
1 Basic Principles of Parallel Computing
Parallel computing is generally classified along two dimensions: Instruction (or Program) and Data. Combining them yields the classic parallel modes (S stands for Single, M for Multiple): SISD, SIMD, MISD, and MIMD.
Every mode except SISD counts as parallel computing. Here we will focus on SPMD (Single Program, Multiple Data).
SPMD is the simplest parallel computing mode. SP (Single Program) means the programmer only writes one copy of the code; MD (Multiple Data) means that code must process different pieces of data separately, and parallelism requires that those pieces be processed simultaneously. In plain terms, one piece of code is copied many times, and each copy runs on its own portion of the data, achieving parallel processing. This raises a question: how is the data stored?
1.1 Data Storage
Data storage can be divided into two categories: distributed storage and shared memory.
Distributed storage means that different processes/instructions process different data without interfering with each other. The multi-CPU MPI parallel interface uses this model.
Shared memory means that different processes/instructions may modify the same piece of data at the same time. This makes communication between processes simple; the disadvantage is that it easily causes read/write conflicts, which must be handled with care. GPU-based CUDA C/C++ parallel computing uses this model.
2 MPI: Multi-CPU Parallel Computing
Given the rise of multi-core CPUs in personal computers in recent years, using multiple CPUs to work on the same task is the simplest form of parallel computing, and its representative is MPI. MPI stands for Message Passing Interface; it is a message-passing specification (or convention). The features of MPI can be summarized as follows:
1. MPI belongs to the SPMD framework;
2. Data is stored in a distributed manner;
3. The key to writing an MPI program is to grasp the passing of messages between processes.
The simplest MPI "Hello, World!" program is as follows (saved as hello.c):
#include <stdio.h>
#include <mpi.h>                     /* MPI library */

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);          /* start MPI parallel computing */
    printf("Hello, world!\n");
    MPI_Finalize();                  /* end MPI parallel computing */
    return 0;
}
Compiling an MPI program differs from compiling an ordinary C/C++ program. For a C MPI program the compiler wrapper is mpicc; for a C++ MPI program it is mpicxx. For example, to compile the C MPI program above:
$mpicc hello.c -o hello
Similarly, running a C/C++ MPI program requires the launcher mpirun or mpiexec. The most common form is:
$mpirun -np 3 hello
-np is an optional flag giving the number of processes to start (default 1). The command above starts three processes, so three lines of "Hello, world!" are printed to the screen.
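Point 3 above said the key to MPI is the passing of messages between processes. As a sketch of what that looks like (it requires an MPI installation, compiled with mpicc and launched with mpirun; the sum-of-squares computation and variable names are purely illustrative), each process can send a private value to process 0, which collects them:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I?  */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes?  */

    int value = rank * rank;                /* each process owns its own data */
    if (rank != 0) {
        /* worker processes send their result to process 0 */
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else {
        int sum = value, recv;
        for (int src = 1; src < size; src++) {
            MPI_Recv(&recv, 1, MPI_INT, src, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            sum += recv;
        }
        printf("sum of squares of ranks = %d\n", sum);
    }
    MPI_Finalize();
    return 0;
}
```

Run with, say, mpirun -np 4, each process computes its own value (distributed storage), and the only way process 0 can see the others' results is through explicit MPI_Send/MPI_Recv messages.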
3 CUDA C/C++: The Most Popular CPU + GPU Parallel Computing Language
The rise of GPU parallel computing owes much to the arrival of the big-data era, as traditional multi-CPU parallel computing falls far short of big-data needs. The biggest feature of a GPU is its huge number of computing cores, usually thousands. Each core can take on the computing role of a CPU, although a single GPU core is generally less capable than a CPU core.
CUDA, short for Compute Unified Device Architecture, is the CPU + GPU hybrid programming framework proposed by NVIDIA, the most famous GPU manufacturer. CUDA C/C++ has the following features:
1. It is also an SPMD framework;
2. It combines advantages of both distributed storage and shared memory;
3. Managing GPU memory bandwidth is the key to making full use of GPU computing resources.
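To make the feature list concrete, here is the classic vector-addition sketch in CUDA C. It is illustrative only: it assumes an NVIDIA GPU and the nvcc compiler, it uses unified (managed) memory so both CPU and GPU see the same arrays, and the names vec_add and n are made up for the example.

```cuda
#include <stdio.h>

/* Kernel: SPMD again - every GPU thread runs this same code, and the
 * thread's global index selects which element it processes. */
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    /* unified (managed) memory: visible to both CPU and GPU */
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2 * i; }

    vec_add<<<(n + 255) / 256, 256>>>(a, b, c, n);  /* launch ~n threads */
    cudaDeviceSynchronize();                        /* wait for the GPU */

    printf("c[10] = %f\n", c[10]);                  /* 10 + 20 = 30 */
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Note how one line of kernel code is executed by roughly a thousand threads at once: exactly the "many weak cores" picture described above.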
Generally, a well-optimized CUDA C/C++ program runs several to dozens of times faster than the equivalent traditional CPU program. Because of this, in the hot field of deep learning, more and more researchers and engineers use GPUs and CUDA for parallel acceleration.
4 Other Thoughts
Parallel computing, especially using GPUs and CUDA for parallel acceleration, is an attractive technology. However, in my personal experience, writing a parallel program is much harder than writing a serial one. The difficulties are mainly the following:
1. Parallel programs require more code, which increases the workload;
2. The execution progress of each process in a parallel program is unknown, which makes debugging harder;
3. You need a much firmer grasp of the hardware architecture and memory.
However, considering the attractive prospects of parallel computing, these difficulties are worth overcoming. For a machine learning researcher at least, a speedup of dozens of times greatly reduces the time spent on experiments. So why not?
Welcome to the discussion!
References:
1. SPMD: http://en.wikipedia.org/wiki/SPMD
2. MPI: https://www.sharcnet.ca/help/index.php/Getting_Started_with_MPI