Detailed introduction to writing CUDA programs using Python

This article introduces how to write CUDA programs in Python. I found the approach quite good, so I am sharing it here as a reference. Let's take a look.

There are two ways to write a CUDA program using Python:

* Numba
* PyCUDA

NumbaPro is no longer recommended; it has been split up and its functionality folded into Accelerate and Numba.

Example

Numba

Numba optimizes Python code through a JIT (just-in-time) compilation mechanism. It compiles for the local machine's hardware, supports both CPU and GPU targets, and integrates with NumPy. To run Python code on the GPU, you only need to add the appropriate decorator above the function.

As follows:

import numpy as np
from timeit import default_timer as timer
from numba import vectorize

# Compile vectorAdd as a CUDA ufunc: it runs element-wise on the GPU.
@vectorize(["float32(float32, float32)"], target='cuda')
def vectorAdd(a, b):
    return a + b

def main():
    N = 320000000
    A = np.ones(N, dtype=np.float32)
    B = np.ones(N, dtype=np.float32)
    C = np.zeros(N, dtype=np.float32)

    start = timer()
    C = vectorAdd(A, B)
    vectorAdd_time = timer() - start

    print("c[:5] = " + str(C[:5]))
    print("c[-5:] = " + str(C[-5:]))
    print("vectorAdd took %f seconds" % vectorAdd_time)

if __name__ == '__main__':
    main()
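One caveat about this timing: the very first call to vectorAdd also pays one-time costs such as CUDA context initialization and, depending on the Numba version, JIT compilation, so it is not a pure measurement of the element-wise add itself. A minimal sketch of a fairer measurement, reusing the same vectorAdd definition as above, warms the function up on a small array before timing:

import numpy as np
from timeit import default_timer as timer
from numba import vectorize

@vectorize(["float32(float32, float32)"], target='cuda')
def vectorAdd(a, b):
    return a + b

N = 320000000
A = np.ones(N, dtype=np.float32)
B = np.ones(N, dtype=np.float32)

# Warm-up: the first GPU call pays one-time costs (CUDA context setup,
# possible JIT compilation), so exclude it from the measurement.
warm = np.ones(16, dtype=np.float32)
vectorAdd(warm, warm)

start = timer()
C = vectorAdd(A, B)
print("vectorAdd took %f seconds" % (timer() - start))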

PyCUDA

In PyCUDA, the kernel function is written in CUDA C/C++ and compiled into GPU code at runtime; the Python side handles data transfer and kernel launches, as shown below:

import pycuda.autoinit
import pycuda.driver as drv
import numpy as np
from timeit import default_timer as timer
from pycuda.compiler import SourceModule

# The kernel is plain CUDA C, compiled at runtime by SourceModule.
# N is declared as int because the Python side passes it as np.int32.
mod = SourceModule("""
__global__ void func(float *a, float *b, int N)
{
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= N) {
    return;
  }
  float temp_a = a[i];
  float temp_b = b[i];
  a[i] = (temp_a * 10 + 2) * ((temp_b + 2) * 10 - 5) * 5;
  // a[i] = a[i] + b[i];
}
""")
func = mod.get_function("func")

def test(N):
  # N = 1024 * 1024 * 90
  # float: 4M = 1024 * 1024
  print("N = %d" % N)

  N = np.int32(N)
  a = np.random.randn(N).astype(np.float32)
  b = np.random.randn(N).astype(np.float32)

  # copy a to aa so the CPU can compute the same result for checking
  aa = np.empty_like(a)
  aa[:] = a

  # GPU run
  nThreads = 256
  nBlocks = int((N + nThreads - 1) / nThreads)
  start = timer()
  func(
      drv.InOut(a), drv.In(b), N,
      block=(nThreads, 1, 1), grid=(nBlocks, 1))
  run_time = timer() - start
  print("gpu run time %f seconds " % run_time)

  # CPU run
  start = timer()
  aa = (aa * 10 + 2) * ((b + 2) * 10 - 5) * 5
  run_time = timer() - start
  print("cpu run time %f seconds " % run_time)

  # check result
  r = a - aa
  print(min(r), max(r))

def main():
  for n in range(1, 10):
    N = 1024 * 1024 * (n * 10)
    print("------------%d---------------" % n)
    test(N)

if __name__ == '__main__':
  main()

Comparison

Numba uses decorators to mark functions for acceleration (you can also write the kernel itself in Python; a sketch follows the list below), which is similar in spirit to OpenACC. PyCUDA requires you to write the kernel yourself and compiles it at runtime; the kernel itself is implemented in CUDA C/C++. In tests, the speedup from the two approaches is roughly the same. However, Numba is more of a black box, and you do not know exactly what it does internally, whereas PyCUDA is very explicit. The two approaches therefore suit different situations:

* If you do not care about CUDA programming and just want to speed up your own algorithms, it is better to use Numba directly.

* If you want to learn or study CUDA programming, or to experiment with the feasibility of an algorithm under CUDA, use PyCUDA.

* If the program will eventually be ported to C/C++, you should use PyCUDA, since the kernels it uses are already written in CUDA C/C++.
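As mentioned above, Numba also lets you write the kernel itself in Python rather than relying on @vectorize. Here is a minimal sketch using Numba's cuda.jit decorator; the kernel name and sizes are illustrative, and the grid/block sizing follows the same ceil(N / threads-per-block) pattern used in the PyCUDA example:

import numpy as np
from numba import cuda

# A CUDA kernel written directly in Python: each thread adds one element.
@cuda.jit
def vector_add_kernel(a, b, c):
    i = cuda.grid(1)  # global thread index
    if i < c.size:
        c[i] = a[i] + b[i]

N = 1024 * 1024
a = np.ones(N, dtype=np.float32)
b = np.ones(N, dtype=np.float32)
c = np.zeros(N, dtype=np.float32)

threads_per_block = 256
blocks = (N + threads_per_block - 1) // threads_per_block

# Launch the kernel; Numba copies the host arrays to the device and back.
vector_add_kernel[blocks, threads_per_block](a, b, c)

print(c[:5])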

The above is a detailed introduction to writing CUDA programs in Python. I hope it serves as a useful reference.
