How to Write CUDA Programs in Python, Explained in Detail

Source: Internet
Author: User
Tags: numba
This article shows how to write CUDA programs in Python. We found the approach useful and are sharing it here for reference.

There are two ways to write CUDA programs in Python:

* Numba
* PyCUDA

NumbaPro is now deprecated; its features have been split between Accelerate and Numba.

Example

Numba

Numba optimizes Python code through just-in-time (JIT) compilation. It can optimize for the native hardware, supports both CPU and GPU targets, and integrates with NumPy, so that Python code can run on the GPU. You only need to add the appropriate decorator above the function, as shown below:

    import numpy as np
    from timeit import default_timer as timer
    from numba import vectorize

    # Compile an element-wise float32 addition for the GPU
    @vectorize(["float32(float32, float32)"], target='cuda')
    def vectoradd(a, b):
        return a + b

    def main():
        n = 320000000
        a = np.ones(n, dtype=np.float32)
        b = np.ones(n, dtype=np.float32)
        c = np.zeros(n, dtype=np.float32)

        start = timer()
        c = vectoradd(a, b)
        vectoradd_time = timer() - start

        print("c[:5] = " + str(c[:5]))
        print("c[-5:] = " + str(c[-5:]))
        print("vectoradd took %f seconds" % vectoradd_time)

    if __name__ == '__main__':
        main()
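Numba can also express an explicit kernel in Python instead of relying on `vectorize`. The following is a minimal sketch of that style, not code from the original article: the names `gpu_add` and `add_kernel`, the array size, and the 256 threads per block are all assumptions chosen for illustration. It assumes `numba` and `numpy` are installed and returns `None` when no CUDA device is usable.

```python
# Sketch only: names and sizes below are illustrative assumptions.
try:
    import numpy as np
    from numba import cuda
    HAVE_GPU = cuda.is_available()
except ImportError:
    HAVE_GPU = False


def gpu_add(n=1_000_000, threads_per_block=256):
    """Add two float32 vectors of ones on the GPU; return None if no GPU is usable."""
    if not HAVE_GPU:
        return None

    @cuda.jit
    def add_kernel(a, b, out):
        i = cuda.grid(1)      # global index of this thread in a 1-D grid
        if i < out.size:      # skip threads that fall past the end of the array
            out[i] = a[i] + b[i]

    a = np.ones(n, dtype=np.float32)
    b = np.ones(n, dtype=np.float32)

    # Explicit host-to-device copies; the output buffer lives on the device
    d_a = cuda.to_device(a)
    d_b = cuda.to_device(b)
    d_out = cuda.device_array_like(a)

    # Enough blocks to cover all n elements (ceiling division)
    blocks = (n + threads_per_block - 1) // threads_per_block
    add_kernel[blocks, threads_per_block](d_a, d_b, d_out)
    return d_out.copy_to_host()


if __name__ == "__main__":
    result = gpu_add()
    print("no CUDA device found" if result is None else result[:5])
```

Compared with `vectorize`, this form makes the thread indexing and the memory transfers visible, at the cost of more code.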

PyCUDA

PyCUDA kernels are written in CUDA C/C++ and compiled at run time into GPU code. Python code interacts with the GPU code as follows:

    import pycuda.autoinit  # creates the CUDA context on import
    import pycuda.driver as drv
    import numpy as np
    from timeit import default_timer as timer
    from pycuda.compiler import SourceModule

    # CUDA C kernel, compiled at run time.
    # N is declared as int so that it matches the np.int32 passed from Python.
    mod = SourceModule("""
    __global__ void func(float *a, float *b, int N)
    {
        const int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= N) {
            return;
        }
        float temp_a = a[i];
        float temp_b = b[i];
        a[i] = (temp_a * 10 + 2) * ((temp_b + 2) * 10 - 5) * 5;
        // a[i] = a[i] + b[i];
    }
    """)

    func = mod.get_function("func")

    def test(n):
        # Each float32 takes 4 bytes, so 1024 * 1024 elements occupy 4 MB
        print("n = %d" % n)

        a = np.random.randn(n).astype(np.float32)
        b = np.random.randn(n).astype(np.float32)

        # Copy a to aa for the CPU reference run
        aa = np.empty_like(a)
        aa[:] = a

        # GPU run
        n_threads = 256  # assumed value; the original number was lost in the source
        n_blocks = (n + n_threads - 1) // n_threads
        start = timer()
        func(drv.InOut(a), drv.In(b), np.int32(n),
             block=(n_threads, 1, 1), grid=(n_blocks, 1))
        run_time = timer() - start
        print("GPU run time %f seconds" % run_time)

        # CPU run
        start = timer()
        aa = (aa * 10 + 2) * ((b + 2) * 10 - 5) * 5
        run_time = timer() - start
        print("CPU run time %f seconds" % run_time)

        # Check the result
        r = a - aa
        print(r.min(), r.max())

    def main():
        for k in range(1, 10):
            # The multiplier of k was lost in the source; 10 is assumed here
            n = 1024 * 1024 * (k * 10)
            print("------------%d---------------" % k)
            test(n)

    if __name__ == '__main__':
        main()
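The block count and the index guard in the kernel can be checked on the CPU without a GPU. The sketch below emulates the index computation `blockIdx.x * blockDim.x + threadIdx.x` in plain Python; the helper names `launch_config` and `simulate_launch`, and the values 1000 and 256, are hypothetical and chosen only for illustration.

```python
def launch_config(n, threads_per_block):
    """Return (blocks, threads) such that blocks * threads covers n elements."""
    blocks = (n + threads_per_block - 1) // threads_per_block  # ceiling division
    return blocks, threads_per_block


def simulate_launch(n, threads_per_block):
    """Emulate the kernel's index computation and return the indices that pass the guard."""
    blocks, threads = launch_config(n, threads_per_block)
    touched = []
    for block_idx in range(blocks):
        for thread_idx in range(threads):
            # Same formula as blockIdx.x * blockDim.x + threadIdx.x
            i = block_idx * threads + thread_idx
            if i >= n:  # same bounds check as in the kernel
                continue
            touched.append(i)
    return blocks, touched


blocks, touched = simulate_launch(n=1000, threads_per_block=256)
print(blocks, blocks * 256, len(touched))  # prints: 4 1024 1000
```

With 1000 elements and 256 threads per block, four blocks launch 1024 threads, and the guard makes the last 24 threads return without touching memory. Without the guard, those threads would write past the end of the array.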

Comparison

Numba accelerates the functions you mark with decorators (you can also write kernel functions in Python), which is similar to OpenACC. PyCUDA requires you to write the kernel yourself; the kernel is compiled at run time, and the underlying implementation is based on C/C++. In our tests, the two approaches ran at almost the same speed. However, Numba is more of a black box, since it is hard to tell what it does internally, whereas PyCUDA is explicit about what runs on the GPU. The two approaches therefore suit different uses:

* If you just want to speed up your algorithm and do not care about CUDA programming, use Numba directly.

* If you want to learn or study CUDA programming, or to test whether an algorithm is feasible in CUDA, use PyCUDA.

* If you plan to port the program to C/C++ later, use PyCUDA, because PyCUDA kernels are already written in CUDA C/C++.
