Architecture course report

I. CUDA Concept
CUDA, in full Compute Unified Device Architecture, is a revolutionary parallel computing architecture. It spans both hardware and software technologies: it unifies the general-purpose programming model of the GPU and introduces shared memory to speed up computation. CUDA programs are written in a C-like language rather than through graphics APIs, and its unified processing architecture reduces programming difficulty, which makes CUDA well suited to general-purpose GPU computing.

GPU is short for Graphics Processing Unit. GPUs were first used for graphics rendering, and graphics processing places high demands on computing performance, so the GPU grew into a high-performance parallel processor. Because of this parallel computing performance, developers kept improving both the hardware and the software, the GPU's programmability gradually increased, and general-purpose GPU computing came into being. Since the GPU offers more raw parallel computing power than the CPU, it provides a new option for scientific computing applications.

A comparison of the two architectures shows that the GPU has far more processing units than the CPU.

II. CUDA Architecture
[Figure: the overall structure of CUDA]
1. Software Layer
The CUDA software stack consists of the following layers:
A. Hardware driver
B. Application programming interface (API) and its runtime
C. Two high-level general-purpose math libraries (cuFFT and cuBLAS)


Among them, A belongs to the driver-layer API, while B and C belong to the runtime-layer API. The runtime-layer API is built on top of the driver-layer API, which it encapsulates and extends, much like the relationship between pure C++ and the STL: pure C++ is the underlying layer on which the STL is implemented.
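
As a minimal sketch (an illustration added here, not code from the original report), the same device allocation can be written against either layer; the cuda* calls below belong to the runtime-layer API and the cu* calls to the driver-layer API:

    #include <cuda.h>          /* driver-layer API (cu*) */
    #include <cuda_runtime.h>  /* runtime-layer API (cuda*) */

    int main(void) {
        size_t bytes = 1024 * sizeof(float);

        /* Runtime-layer API: context creation is implicit. */
        float *d_buf = NULL;
        cudaMalloc((void **)&d_buf, bytes);
        cudaFree(d_buf);

        /* Driver-layer API: everything the runtime hides is explicit. */
        CUdevice dev;
        CUcontext ctx;
        CUdeviceptr d_ptr;
        cuInit(0);                  /* initialize the driver */
        cuDeviceGet(&dev, 0);       /* pick device 0 */
        cuCtxCreate(&ctx, 0, dev);  /* create a context by hand */
        cuMemAlloc(&d_ptr, bytes);  /* allocate device memory */
        cuMemFree(d_ptr);
        cuCtxDestroy(ctx);
        return 0;
    }

The sketch shows the encapsulation concretely: the three driver calls before cuMemAlloc are exactly what cudaMalloc performs behind the scenes.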

2. Hardware Layer


SP: streaming processor, the most basic processing unit at the lowest layer. All instructions and tasks are ultimately executed on SPs, and the essence of GPU parallel computing is many SPs processing the same operation simultaneously;
SM: streaming multiprocessor. Multiple SPs plus other related resources form an SM, where the other resources are mainly on-chip storage such as shared memory and registers;
Warp: the scheduling unit when the GPU executes a program. The current CUDA warp size is 32, and the threads in the same warp execute the same instruction on different data;

Grid, block, thread: when programming with CUDA, a grid is divided into multiple blocks, and each block is divided into multiple threads. For example:
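The following minimal sketch (added here for illustration, not from the original report) launches a grid of 4 blocks with 64 threads each; inside the kernel, each thread recovers its global index from its block and thread coordinates, and its lane within the 32-thread warp that schedules it:

    #include <cuda_runtime.h>
    #include <stdio.h>

    __global__ void whoAmI(void) {
        /* Global index: which thread am I across the whole grid? */
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        /* Lane within the 32-thread warp that schedules this thread. */
        int lane = threadIdx.x % warpSize;
        if (i < 8)  /* print only a few threads to keep output short */
            printf("block %d, thread %d -> global %d (warp lane %d)\n",
                   blockIdx.x, threadIdx.x, i, lane);
    }

    int main(void) {
        whoAmI<<<4, 64>>>();   /* grid of 4 blocks, 64 threads per block */
        cudaDeviceSynchronize();
        return 0;
    }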
An SM executes the warps of only one block at a time; warps from other blocks are executed after the warps of the current block finish. When partitioning, it is best to give each block a reasonable number of warps, so that an SM can execute warps in turn and overlap their latencies, much as interleaved memory banks achieve a degree of parallelism; this improves computing efficiency. In addition, the number of blocks should be chosen according to the number of SMs on the GPU, so that every SM has work to do and device utilization improves. When allocating resources, the per-block resource budget (such as registers and shared memory) must also be considered.
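
As a small, hedged illustration (not part of the original report), the CUDA runtime API can report the SM count at run time, which a program might use when choosing a block count; cudaGetDeviceProperties and the multiProcessorCount field are standard runtime API:

    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);  /* properties of device 0 */
        printf("SMs: %d, warp size: %d\n",
               prop.multiProcessorCount, prop.warpSize);
        /* One common heuristic: launch a small multiple of the SM count
           so every SM has blocks to work on. */
        int blocks = 4 * prop.multiProcessorCount;
        printf("suggested block count: %d\n", blocks);
        return 0;
    }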

III. CUDA Programming
CUDA is designed for data-parallel computing (the parallel execution of one program across many data elements) with high computational density (a high ratio of arithmetic operations to memory operations).
1. Basic CUDA Program Pattern
* Allocate host memory and device (video) memory
* Initialize the host memory
* Copy the data to be computed from host memory to device memory
* Perform the kernel computation
* Copy the computed results from device memory back to host memory
* Process the data copied back to host memory
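The minimal vector-addition program below (an illustrative sketch, not code from the original report) walks through each step of this pattern; the numbered comments mirror the list above:

    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)              /* guard against threads beyond the array */
            c[i] = a[i] + b[i];
    }

    int main(void) {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);

        /* 1. Allocate host memory and device memory. */
        float *h_a = (float *)malloc(bytes);
        float *h_b = (float *)malloc(bytes);
        float *h_c = (float *)malloc(bytes);
        float *d_a, *d_b, *d_c;
        cudaMalloc((void **)&d_a, bytes);
        cudaMalloc((void **)&d_b, bytes);
        cudaMalloc((void **)&d_c, bytes);

        /* 2. Initialize the host memory. */
        for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

        /* 3. Copy input data from host memory to device memory. */
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

        /* 4. Perform the kernel computation. */
        int threads = 256;
        int blocks = (n + threads - 1) / threads;  /* round up */
        vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

        /* 5. Copy the results from device memory back to host memory. */
        cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

        /* 6. Process the data on the host. */
        printf("c[0] = %f\n", h_c[0]);  /* expect 3.0 */

        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
        free(h_a); free(h_b); free(h_c);
        return 0;
    }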
CUDA supports thread-level parallelism, and its threads are created, scheduled, and executed dynamically in hardware. On a CPU these operations are heavyweight, but in CUDA they are lightweight: thread parallelism on the CPU is largely implemented in software, whereas CUDA relies mainly on hardware, which is what makes it fast and efficient. The CUDA programming model uses the CPU as the host and the GPU as a coprocessor or device: the CPU controls the program's overall serial logic and task scheduling, while the GPU runs the highly threaded, data-parallel, high-performance parts. The two work together organically, with the CPU directing the GPU's work; the GPU's structure gives it a huge inherent advantage in high-performance computing, as the hardware composition described above shows. In general, a CUDA program contains both a serial part and a parallel part. The parallel part is called the kernel, which is simply a data-parallel code segment executed on the GPU. Ideally, the serial code would do no more than clean up after the previous kernel and launch the next one. In practice, however, the serial part still accounts for a relatively large share of many programs, which limits the adoption of CUDA to some extent.

IV. CUDA Prospects
GPUs are gradually pushing parallel computing into the mainstream, the "marriage" of parallel computing and heterogeneous processor systems will be the trend of the times, and CUDA is at the heart of this revolution. As more and more developers adopt CUDA, software supporting CUDA will gradually penetrate every aspect of our lives. Because the GPU specializes in dense, data-parallel computation, CUDA is very well suited to applications that need high-performance parallel computing over big data. Currently, in addition to C, CUDA also provides a Fortran application interface, and in the future it is expected to support more high-level programming languages, such as C++, Java, and Python. Today, CUDA technologies and ideas are used mainly in graphics, animation, games, geology, biology, physical simulation, and other fields. Given the general-purpose character of the GPU and the simple, efficient development environment that CUDA provides, we can imagine further application fields for CUDA:
◇ Search engines, text classification, and related algorithms
◇ Databases and data mining
◇ Data analysis in telecommunications, finance, and securities
◇ Mathematical and statistical analysis
◇ Biomedical engineering
◇ Image and speech recognition
Computing in these fields is large-scale, data-intensive computing that demands a high-performance, highly parallel computing architecture. We therefore have reason to believe that, driven by CUDA, the GPU can open up a new era of its own in these fields.


V. CUDA Summary
GPU general-purpose computing generally refers to compute-intensive, highly parallel computation, and CUDA was developed on exactly this kind of general-purpose computing architecture. CUDA is a revolutionary parallel unified computing device architecture supported by a series of hardware and software technologies. Using C as its programming language, CUDA exposes a large set of high-performance computing capabilities, enabling developers to build more efficient dense-data computing solutions on top of the GPU's powerful computing power.
Nowadays, the concept of big data is everywhere. In the big data era, we must not only think in terms of data but also have the technical means to process it. The development of CUDA provides technical support for solving, in parallel, those big data problems that have high computational density.
