Understanding OpenCL from CUDA

Source: Internet
Author: User

Much like writing C++ as a freshman or a compiler as a sophomore, I have now been writing CUDA for a few months. Having learned something about CUDA's lower layers, I decided it was time to study OpenCL, which should deepen my understanding of heterogeneous programming.

1. Overview
OpenCL stands for Open Computing Language, an open industry standard for parallel program development on heterogeneous platforms. Such a platform may include multi-core CPUs, GPUs, mixed CPU/GPU systems, and other kinds of compute devices such as DSPs and the Cell/B.E. processor. OpenCL is maintained by the Khronos Group.
The relationship between OpenCL and CUDA is straightforward: the former is a vendor-neutral standard for heterogeneous programming, while the latter is NVIDIA's proprietary GPU programming model and API. OpenCL programs can therefore target graphics cards from both NVIDIA and AMD.

2. Understanding the OpenCL Framework

2.1 Platform Model

[1 Host] - [1 .. n Devices]
[1 Device] - [1 .. n Compute Units (CUs)]
[1 CU] - [1 .. n Processing Elements (PEs)]

The host manages all the computing resources of the platform. The application sends commands from the host to the processing elements of each OpenCL device; all processing elements within one compute unit execute the same instruction stream, which may run in SIMD or SPMD fashion. Every OpenCL application starts and ends on the host, while the actual computation happens on the PEs.
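The host discovers this hierarchy at startup through the platform API. The following sketch (assuming an OpenCL SDK providing `CL/cl.h` is installed; error handling omitted) enumerates platforms and devices and prints each device's compute-unit count:

```c
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(8, platforms, &num_platforms);

    for (cl_uint p = 0; p < num_platforms; ++p) {
        cl_device_id devices[8];
        cl_uint num_devices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 8, devices, &num_devices);

        for (cl_uint d = 0; d < num_devices; ++d) {
            char name[256];
            cl_uint cus = 0;
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
            /* one device contains 1..n compute units (CUs) */
            clGetDeviceInfo(devices[d], CL_DEVICE_MAX_COMPUTE_UNITS,
                            sizeof(cus), &cus, NULL);
            printf("platform %u, device %u: %s (%u compute units)\n",
                   p, d, name, cus);
        }
    }
    return 0;
}
```

The number of processing elements per CU is not exposed directly; it is an implementation detail of the device.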

2.2 Memory Model

Memory regions:
Global memory: every work-item in the index space can read and write any element of this memory. OpenCL C provides built-in functions for caching global buffers.
Constant memory: every work-item in the index space can read any element of this memory. The host allocates and initializes the constant buffer, which remains unchanged during kernel execution.
Local memory: memory belonging to a work-group; all work-items in the same work-group can share it. An implementation may allocate dedicated on-chip memory for it, or map it onto a region of a global buffer.
Private memory: visible only to the current work-item; one work-item's private variables are completely invisible to other work-items.
So far this largely matches the memory hierarchy of CUDA: OpenCL's local memory corresponds to CUDA's shared memory, and OpenCL's private memory corresponds to CUDA's per-thread local variables.
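The four regions map directly onto address-space qualifiers in OpenCL C. A minimal illustrative kernel (device code, compiled by the OpenCL runtime rather than a host compiler; the kernel and parameter names are made up for illustration):

```c
/* OpenCL C device code -- not compilable with a plain host C compiler. */
__constant float scale = 2.0f;                 /* constant memory: read-only    */

__kernel void demo(__global const float *in,   /* global memory: all work-items */
                   __global float *out,
                   __local float *tile)        /* local memory: one work-group  */
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);
    float x = in[gid];                         /* 'x' lives in private memory   */

    tile[lid] = x * scale;                     /* shared within the work-group  */
    barrier(CLK_LOCAL_MEM_FENCE);              /* like __syncthreads() in CUDA  */

    out[gid] = tile[lid];
}
```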

Memory access:
There are two ways to move data between host and device: memory copying and memory mapping.
Copying means the host explicitly writes data from host memory into OpenCL device memory, or reads data from device memory back into host memory.
Mapping uses the corresponding OpenCL API to map an OpenCL memory object into an address range visible to the host; the host can then read and write the memory directly through the mapped pointer. When it is done, it must unmap the object with the matching API call. Like copying, mapping comes in blocking and non-blocking variants.

2.3 Execution Model

The OpenCL execution model has two parts: the host program, which runs on the host, and the kernels, which run on OpenCL devices. The host program defines the context and manages the execution of kernels on the devices.
The most important part of the execution model is how the index space of work-items (the "thread grid") is laid out, which works the same way as in CUDA and can be carried over directly.

2.4 Programming Model

OpenCL supports both a data-parallel and a task-parallel programming model.
In the data-parallel model, the same sequence of instructions acts on different elements of a memory object; a uniform operation is defined over many memory elements according to the instruction sequence.
In the task-parallel model, each work-item executes its kernel completely independently of all other work-items. Conceptually, each work-item then runs in a work-group of size one, containing only itself. Tasks can still be run in parallel by:
-Using the vector data types supported by the OpenCL device
-Enqueuing multiple kernels that may execute concurrently
-Executing native kernels concurrently
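The first bullet refers to OpenCL C's built-in vector types (`float2`, `float4`, and so on), which express data parallelism inside a single work-item. An illustrative device-code kernel (the name `saxpy4` is made up):

```c
/* OpenCL C device code: one work-item processes four floats at once. */
__kernel void saxpy4(float a,
                     __global const float4 *x,
                     __global float4 *y)
{
    int i = get_global_id(0);
    y[i] = a * x[i] + y[i];   /* component-wise across all four lanes */
}
```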

OpenCL provides synchronization at two levels:
-Synchronization between all work-items in the same work-group
-Synchronization between different command queues in the same context, and between commands in the same command queue
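The two levels correspond to a kernel-side barrier and host-side queue/event operations. A sketch of both (the names `queue`, `kernel`, `gsize`, and `lsize` are assumed for illustration):

```c
/* Work-group level, inside an OpenCL C kernel: every work-item in the
 * group must reach the barrier before any of them may continue. */
barrier(CLK_LOCAL_MEM_FENCE);

/* Host level: order commands within and across command queues. */
clFinish(queue);                 /* block until every queued command is done */

cl_event done;
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &gsize, &lsize, 0, NULL, &done);
clWaitForEvents(1, &done);       /* wait on one specific command's event     */
```

There is no barrier across work-groups inside a kernel; synchronization between work-groups has to happen at kernel boundaries on the host side, just as in CUDA.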

"Understanding OpenCL from CUDA" is a summary of the first two chapters of the OpenCL documentation, written after studying the relevant CUDA programming material, with the knowledge already covered in the CUDA programming guide left out. Now that I know what OpenCL is, I will move on to an OpenCL "Hello World". A single Hello World may not mean much on its own, but the symbolic first program has to be written.
