Preface
How does the GPU execute work in parallel, and how does its implementation differ from CPU multithreading?
This article analyzes these questions in more detail.
GPU Parallel Computing Architecture
The core of GPU parallel programming is the thread: a single instruction stream in a program. Threads combined together form a parallel computing grid, that is, a parallel program. The figure below shows the computing grids of a multi-core CPU and of a GPU:
The difference between the two will be explored later.
The next figure shows the GPU's parallel computing architecture at a finer granularity:
The figure shows that the computing grid is composed of multiple stream multiprocessors, each of which contains many thread blocks.
The following is a detailed analysis of some of the concepts in the GPU computing grid.
1. Threads
A thread is the smallest execution unit in GPU computation; a single thread performs the smallest logically meaningful operation.
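As a minimal CUDA sketch of this idea (the kernel name and sizes here are illustrative, not from the original), each thread computes its own global index and performs one small operation on one array element:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Sketch: each thread handles exactly one array element, the smallest
// logically meaningful unit of work described above.
__global__ void addOne(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // stay inside the array
        data[i] += 1.0f;
}

int main()
{
    const int n = 1024;
    float *d;
    cudaMalloc((void **)&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    addOne<<<4, 256>>>(d, n);  // grid of 4 blocks x 256 threads = 1024 threads
    cudaDeviceSynchronize();

    cudaFree(d);
    return 0;
}
```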
2. Warps
The warp (thread bundle) is the basic execution unit in the GPU. The GPU is a set of SIMD processors, so the threads within a warp execute in lockstep; the concept was introduced to hide the latency of memory reads and writes.
On current NVIDIA graphics cards the warp size is 32; it cannot be changed by the programmer, and there is no reason to change it.
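A hedged sketch of how a thread can locate itself within its warp (the kernel name is illustrative; warpSize is the CUDA built-in variable, equal to 32 as stated above):

```cuda
#include <cstdio>

// Sketch only: each thread derives which warp it belongs to and its
// "lane" inside that warp.
__global__ void whereAmI()
{
    int warpId = threadIdx.x / warpSize;  // warp number within the block
    int laneId = threadIdx.x % warpSize;  // position within the warp
    if (laneId == 0)                      // one line of output per warp
        printf("block %d: warp %d begins at thread %d\n",
               blockIdx.x, warpId, threadIdx.x);
}

int main()
{
    whereAmI<<<1, 64>>>();   // 64 threads = exactly two warps
    cudaDeviceSynchronize(); // flush device-side printf
    return 0;
}
```

Launching 64 threads in one block produces exactly two warps, so this kernel prints two lines.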
3. Thread Blocks
A thread block contains multiple threads, and all of the threads within a block can communicate and synchronize through shared memory. A block has a maximum number of threads, which depends on the graphics card model.
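A sketch of block-level cooperation, assuming 256 threads per block (all names and sizes are illustrative): the threads of each block sum their inputs through shared memory, with __syncthreads() acting as the block-wide barrier mentioned above.

```cuda
#include <cuda_runtime.h>

// Sketch: the 256 threads of each block cooperate through shared memory
// to sum their 256 input values into one partial sum per block.
__global__ void blockSum(const float *in, float *out)
{
    __shared__ float buf[256];                 // one slot per thread
    int tid = threadIdx.x;
    buf[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();                           // all writes done before reading

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            buf[tid] += buf[tid + stride];
        __syncthreads();                       // keep the block in step
    }
    if (tid == 0)
        out[blockIdx.x] = buf[0];              // one partial sum per block
}

int main()
{
    const int blocks = 4, threads = 256, n = blocks * threads;
    float *in, *out;
    cudaMalloc((void **)&in, n * sizeof(float));
    cudaMalloc((void **)&out, blocks * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));      // fill with real data as needed
    blockSum<<<blocks, threads>>>(in, out);
    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

The tree reduction halves the number of active threads at each step, so a 256-thread block finishes in log2(256) = 8 steps.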
4. Stream Multiprocessors
A stream multiprocessor (SM) is roughly the equivalent of a CPU core and is responsible for executing warps; it executes only one warp at a time.
5. Stream Processor
The stream processor is responsible only for executing threads, and its structure is relatively simple.
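As a side note, the model-dependent figures mentioned above, such as the number of stream multiprocessors, the warp size, and the per-block thread limit, can be queried at runtime through the CUDA runtime API. A minimal sketch:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Query the limits discussed above for device 0 instead of hard-coding them.
int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    printf("GPU:                    %s\n", prop.name);
    printf("Stream multiprocessors: %d\n", prop.multiProcessorCount);
    printf("Warp size:              %d\n", prop.warpSize);
    printf("Max threads per block:  %d\n", prop.maxThreadsPerBlock);
    return 0;
}
```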
Differences Between GPU and CPU in Parallel Computing
1. Number of tasks
The CPU is suited to running a small number of tasks, while the GPU is suited to running a very large number of tasks.
2. Complexity of tasks
The CPU is suited to tasks with complex logic, while the GPU is suited to tasks whose logic is relatively simple, i.e., that can be described with few branching statements.
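To see why branching is costly on the GPU, consider this sketch (the kernel name and sizes are illustrative): because a warp executes as SIMD, threads that take different sides of a branch cannot run at the same time, and the hardware serializes the two paths.

```cuda
#include <cuda_runtime.h>

// Sketch of warp divergence: even and odd lanes of the same warp take
// different paths, so the SIMD hardware runs the two paths one after
// the other instead of in parallel.
__global__ void divergent(float *data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i % 2 == 0)
        data[i] *= 2.0f;   // half the warp runs this path first...
    else
        data[i] += 1.0f;   // ...then the other half runs this one
}

int main()
{
    float *d;
    cudaMalloc((void **)&d, 256 * sizeof(float));
    cudaMemset(d, 0, 256 * sizeof(float));
    divergent<<<1, 256>>>(d);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```

The more branches a task has, the more such serialized passes each warp needs, which is why logically simple tasks suit the GPU.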
3. Threading Support Methods
Because threads on a CPU share a common register file, the CPU must save a thread's register contents to RAM when the thread is switched out, and restore them from RAM when the thread runs again.
Each thread on the GPU has its own set of registers, so the GPU can switch threads much faster.
Of course, the CPU remains far stronger at single-thread processing.
4. Processor Allocation principle
CPUs are generally scheduled by round-robin time slicing, where each thread runs in turn for a fixed time slice; the GPU's strategy is instead to swap warps in and out quickly whenever their threads block, for example on a memory access.
5. Data throughput
Each stream multiprocessor in the GPU plays the role of a CPU core. A GPU typically has around 16 stream multiprocessors, and each can operate on 32 numbers (one warp) at a time, so such a GPU can process 16 × 32 = 512 numbers simultaneously.
Summary
1. Understanding CUDA's threading model is the basis for parallel GPU programming.
2. It is very important to organize the thread structure according to the kind of data being processed. This is not easy, especially when some of the data must be shared between threads.