Part 3: The Computing Architecture of GPU Parallel Programming


Preface

How does the GPU achieve parallelism, and how does its approach differ from CPU multithreading?

This article analyzes these questions in some detail.

GPU Parallel Computing Architecture

The core of GPU parallel programming is the thread: a thread is a single instruction stream within a program, and many threads together constitute a parallel computing grid. The figure below showed the computing grids of a multi-core CPU and of a GPU:

[Figure: computing grids of a multi-core CPU and a GPU]

The difference between the two is explored later in this article.
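To make the grid concrete, here is a minimal sketch of how a CUDA program describes and launches one (the kernel name myKernel and all dimensions here are illustrative assumptions, not taken from the original article):

    #include <cuda_runtime.h>

    __global__ void myKernel() { }  // hypothetical empty kernel

    int main() {
        // A computing grid of 4 x 2 thread blocks, each block holding 8 x 8 threads.
        dim3 grid(4, 2);    // 8 blocks in total
        dim3 block(8, 8);   // 64 threads per block
        myKernel<<<grid, block>>>();  // launches 8 * 64 = 512 threads
        cudaDeviceSynchronize();      // wait for the kernel to finish
        return 0;
    }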

The next figure showed the GPU's parallel computing architecture at a finer granularity:

[Figure: fine-grained GPU parallel computing architecture]

As that figure showed, the computing grid is composed of multiple streaming multiprocessors, each of which in turn contains n thread blocks.

The following is a more detailed analysis of the concepts that make up the GPU computing grid.

1. Threads

A thread is the smallest execution unit in GPU computation; each thread performs a single, logically minimal operation.
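For example, in CUDA each thread typically handles just one element of the data. The kernel below is a minimal sketch (the names vecAdd, a, b, and c are our own, not from the article):

    // Each thread performs one minimal operation: adding a single pair of elements.
    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's global index
        if (i < n)                                      // guard threads past the end of the data
            c[i] = a[i] + b[i];
    }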

2. Warps (Thread Bundles)

The warp (thread bundle) is the basic execution unit in the GPU. The GPU is essentially a set of SIMD processors, so the threads within a warp execute in lockstep. The concept was introduced to hide the latency of memory reads and writes.

At present, on Nvidia graphics cards this value is 32; it cannot be changed, nor should it be.
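Inside a kernel, a thread can work out which warp it belongs to from its index and the built-in warpSize constant. A minimal sketch (the kernel and variable names are our own):

    #include <cstdio>

    __global__ void warpInfo() {
        int warpId = threadIdx.x / warpSize;  // which warp of the block this thread is in
        int laneId = threadIdx.x % warpSize;  // this thread's lane within its warp
        if (laneId == 0)                      // one printout per warp
            printf("block %d, warp %d\n", blockIdx.x, warpId);
    }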

3. Thread Blocks

A thread block contains multiple threads, and all threads within a thread block can communicate and synchronize through shared memory, as the sketch below shows. A thread block also has a maximum number of threads, which depends on the graphics card model.
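A minimal sketch of block-level cooperation through shared memory (the reversal kernel is our own illustrative example, assuming a launch with blockDim.x == 256):

    // Reverses 256 elements per block; assumes blockDim.x == 256.
    __global__ void reverseBlock(float *data) {
        __shared__ float tile[256];                 // visible to every thread of this block
        int t = threadIdx.x;
        int base = blockIdx.x * blockDim.x;
        tile[t] = data[base + t];                   // stage one element into shared memory
        __syncthreads();                            // wait until the whole block has written
        data[base + t] = tile[blockDim.x - 1 - t];  // read back in reverse order
    }

Without the __syncthreads() barrier, a thread could read a tile slot before the thread responsible for it had written it, which is why synchronization and shared memory go hand in hand.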

4. Streaming Multiprocessors

The streaming multiprocessor (SM) is the GPU's counterpart to a CPU core and is responsible for executing warps. An SM executes only one warp at a time.
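Limits such as the number of SMs, the maximum threads per block, and the warp size can be queried at run time with the standard CUDA runtime API; a minimal host-side sketch:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);  // properties of GPU 0
        printf("streaming multiprocessors: %d\n", prop.multiProcessorCount);
        printf("max threads per block:     %d\n", prop.maxThreadsPerBlock);
        printf("warp size:                 %d\n", prop.warpSize);
        return 0;
    }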

5. Streaming Processors

The streaming processor (SP) is responsible only for executing threads, and its structure is relatively simple.

The difference between GPU and CPU in parallel computing

1. Number of tasks

The CPU is suited to a small number of tasks, while the GPU is suited to a large number of tasks.

2. Complexity of tasks

The CPU is suited to logically complex tasks, while the GPU is suited to logically simple tasks (ones that can be described in relatively few statements).

3. How Threads Are Supported

Because threads on a CPU share a common register file, the CPU saves a thread's register contents to RAM when switching away from it and restores them from RAM when the thread resumes.

Each thread on the GPU has its own set of registers, so switching between threads is much faster.

Of course, the CPU remains far stronger at single-thread processing.

4. Processor Allocation Principles

CPUs generally schedule by time-slice round-robin, with each thread running for a fixed time slice in turn, while the GPU's strategy is to swap a warp out quickly whenever its threads block and swap a ready one in.

5. Data throughput

Each streaming multiprocessor in the GPU is comparable to a CPU core. A GPU typically has around 16 streaming multiprocessors, and each can operate on 32 numbers at a time, so such a GPU can work on 16 × 32 = 512 numbers simultaneously.

Summary

1. Understanding CUDA's thread model is the foundation of GPU parallel programming.

2. It is very important to organize the thread structure according to the type of data being processed. This is not easy, especially when some of the data must be shared among threads.

