Exploration of CUDA C Programming

Last Update:2018-12-03 Source: Internet

Author: User


Abstract: This article describes the basics of building a Windows console application and a dynamic link library (DLL) in CUDA C, and of calling a CUDA C DLL from .NET.

1. Write a Windows console application in CUDA C

Let's start learning CUDA C with a simple example.

Open Visual Studio and create a CUDAWinApp project. Name the project vector and the solution cudademo. Click "OK", then "Next", select "Empty project", and click "Finish". The CUDA project is now created.

Right-click the vector project and select "Add", "New Item", "Code", and then "CUDA". Enter a name for the new file, for example vector.cu, and click "Add".

Add the following program, which adds two vectors, to vector.cu.

    // System headers
    #include <stdio.h>
    #include <stdlib.h>
    // CUDA support; cutil.h ships with the legacy CUDA SDK samples and
    // defines the CUT_* and CUDA_SAFE_CALL macros used below
    #include <cuda.h>
    #include <cutil.h>

    __global__ void vecAdd(float* A, float* B, float* C);
    __host__ void runVecAdd(int argc, char** argv);

    int main(int argc, char** argv)
    {
        runVecAdd(argc, argv);
        CUT_EXIT(argc, argv);
    }

    __host__ void runVecAdd(int argc, char** argv)
    {
        // Initialize the host memory data
        const unsigned int N = 8;                        // vector dimension
        const unsigned int memSize = sizeof(float) * N;  // bytes per vector
        float* h_A = (float*)malloc(memSize);
        float* h_B = (float*)malloc(memSize);
        float* h_C = (float*)malloc(memSize);
        for (unsigned int i = 0; i < N; i++) {
            h_A[i] = i;
            h_B[i] = i;
        }

        // Device (video) memory pointers
        float *d_A, *d_B, *d_C;

        // Initialize the device
        CUT_DEVICE_INIT(argc, argv);
        CUDA_SAFE_CALL(cudaMalloc((void**)&d_A, memSize));
        CUDA_SAFE_CALL(cudaMalloc((void**)&d_B, memSize));
        CUDA_SAFE_CALL(cudaMalloc((void**)&d_C, memSize));
        CUDA_SAFE_CALL(cudaMemcpy(d_A, h_A, memSize, cudaMemcpyHostToDevice));
        CUDA_SAFE_CALL(cudaMemcpy(d_B, h_B, memSize, cudaMemcpyHostToDevice));

        // One block of N threads. The third launch parameter is the size of
        // the dynamic shared memory: three float arrays of N elements each.
        vecAdd<<<1, N, 3 * memSize>>>(d_A, d_B, d_C);
        CUT_CHECK_ERROR("Kernel execution failed");

        CUDA_SAFE_CALL(cudaMemcpy(h_C, d_C, memSize, cudaMemcpyDeviceToHost));
        for (unsigned int i = 0; i < N; i++) {
            printf("%.0f ", h_C[i]);
        }

        free(h_A);
        free(h_B);
        free(h_C);
        CUDA_SAFE_CALL(cudaFree(d_A));
        CUDA_SAFE_CALL(cudaFree(d_B));
        CUDA_SAFE_CALL(cudaFree(d_C));
    }

    __global__ void vecAdd(float* A, float* B, float* C)
    {
        // All extern __shared__ arrays start at the same address, so declare
        // one buffer and partition it into three arrays of blockDim.x floats
        extern __shared__ float s_mem[];
        float* s_A = s_mem;
        float* s_B = s_A + blockDim.x;
        float* s_C = s_B + blockDim.x;

        // Copy from global memory to shared memory
        const unsigned int i = threadIdx.x;
        s_A[i] = A[i];
        s_B[i] = B[i];
        // Calculate
        s_C[i] = s_A[i] + s_B[i];
        // Copy the result back to global memory
        C[i] = s_C[i];
    }

Because this article is about building projects rather than CUDA programming itself, the programming model is beyond its scope. For background on the CUDA programming model, see CUDA for GPU High-Performance Computing [2].

Build and run the vector project. The output is shown in Figure 1.

Figure 1 Execution result of the vector project

2. Write a DLL module in CUDA C

More often, your software uses CUDA only to accelerate one part of a program. In that case, you can build the CUDA C code as a DLL that exposes an interface. Next we will turn the example from Section 1 into a DLL.

Add a new CUDA project to the cudademo solution (you can also create a new solution). Name the project vecadd_dynamic. Select "DLL" as the application type and "Empty project" under additional options.

Step 1: Add a header file with the same name as the project, which makes maintenance easier. Here I add vecadd_dynamic.h to the project with the following code:

    #ifndef _VECADD_DYNAMIC_H_
    #define _VECADD_DYNAMIC_H_

    // Exported function. extern "C" keeps the exported name undecorated so
    // that it matches the entry in the .def file.
    extern "C" __declspec(dllexport) void VecAdd(float* h_A, float* h_B, float* h_C, int N);

    #endif

Step 2: Add a .cpp file named vecadd_dynamic.cpp with the following code:

    #include <windows.h>   // BOOL, APIENTRY, HMODULE, DWORD, LPVOID
    #include "vecadd_dynamic.h"

    #ifdef _MANAGED
    #pragma managed(push, off)
    #endif

    // DLL entry point; no per-process or per-thread setup is needed here
    BOOL APIENTRY DllMain(HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved)
    {
        return TRUE;
    }

    #ifdef _MANAGED
    #pragma managed(pop)
    #endif

Step 3: Add a .def file. This file ensures that compilers from other vendors can call the functions in this DLL, which matters because your program may be built with compilers from several vendors. Name the file vecadd_dynamic.def and add:

    EXPORTS
    VecAdd

Step 4: Add a .cu file named vecadd_dynamic.cu. It is best to add this file directly under the project node rather than under the Source Files filter or any other existing filter, as shown in Figure 2.

Figure 2 File organization of the vecadd_dynamic project

Add the following code to the .cu file to implement the exported function.

    #include <stdio.h>          // fprintf, printf
    #include <cuda_runtime.h>   // CUDA runtime API
    #include <cutil.h>          // CUDA_SAFE_CALL (legacy CUDA SDK utility header)

    #if __DEVICE_EMULATION__
    bool InitCUDA(void) { return true; }
    #else
    // Select the first device that supports CUDA
    bool InitCUDA(void)
    {
        int count = 0;
        int i = 0;
        cudaGetDeviceCount(&count);
        if (count == 0) {
            fprintf(stderr, "There is no device.\n");
            return false;
        }
        for (i = 0; i < count; i++) {
            cudaDeviceProp prop;
            if (cudaGetDeviceProperties(&prop, i) == cudaSuccess) {
                if (prop.major >= 1) { break; }
            }
        }
        if (i == count) {
            fprintf(stderr, "There is no device supporting CUDA.\n");
            return false;
        }
        cudaSetDevice(i);
        printf("CUDA initialized.\n");
        return true;
    }
    #endif

    // Kernel: one thread per element
    __global__ void D_VecAdd(float *g_A, float *g_B, float *g_C, int N)
    {
        unsigned int i = threadIdx.x;
        if (i < N) { g_C[i] = g_A[i] + g_B[i]; }
    }

    // Exported host function: copies the inputs to the device, runs the
    // kernel, and copies the result back. extern "C" keeps the name
    // undecorated so that it matches the .def entry.
    extern "C" void VecAdd(float* h_A, float* h_B, float* h_C, int N)
    {
        if (!InitCUDA()) { return; }
        float *g_A, *g_B, *g_C;
        unsigned int size = N * sizeof(float);
        CUDA_SAFE_CALL(cudaMalloc((void**)&g_A, size));
        CUDA_SAFE_CALL(cudaMalloc((void**)&g_B, size));
        CUDA_SAFE_CALL(cudaMalloc((void**)&g_C, size));
        CUDA_SAFE_CALL(cudaMemcpy(g_A, h_A, size, cudaMemcpyHostToDevice));
        CUDA_SAFE_CALL(cudaMemcpy(g_B, h_B, size, cudaMemcpyHostToDevice));
        D_VecAdd<<<1, N>>>(g_A, g_B, g_C, N);
        CUDA_SAFE_CALL(cudaMemcpy(h_C, g_C, size, cudaMemcpyDeviceToHost));
        cudaFree(g_A);
        cudaFree(g_B);
        cudaFree(g_C);
    }

Step 5: If you have completed the four steps above correctly, all that remains is to build the project. Anyone who has used Visual Studio knows how to do this. After a successful build, vecadd_dynamic.dll appears in the Debug folder under your solution directory.

3. Use the DLL compiled with CUDA C in .NET

The following describes how to use vecadd_dynamic.dll in a managed program.

Step 1: Add a C++/CLR Windows Forms application to the cudademo solution and name the project netdemo (you can also create a new solution, and the project name is arbitrary).

Step 2: Add a button to the form; its name is arbitrary. I set its display text to "Call cuda_dll" and add a click event handler, which is where the code that calls VecAdd() will go. Also add a text box to the form to display the result of the call.

Step 3: Implement the code. Add a header file to the netdemo project; I name it Win32.h. This file imports the VecAdd() function. Add the following code to it:

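    #pragma once

    namespace Win32
    {
        using namespace System::Runtime::InteropServices;

        // Import VecAdd from the CUDA DLL. The DLL function uses the default
        // C calling convention, so it is stated explicitly here.
        [DllImport("VecAdd_dynamic.dll", EntryPoint="VecAdd", CharSet=CharSet::Auto,
                   CallingConvention=CallingConvention::Cdecl)]
        extern "C" void VecAdd(float* h_A, float* h_B, float* h_C, int N);
    }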

In Form1.h, add the following code after #pragma once and before namespace netdemo:

    #include "Win32.h"
    #include <stdlib.h>   // malloc, free

Add the following code to the button's click event handler (for example, button1_Click()):

    int N = 8;
    float* h_A = (float*)malloc(N * sizeof(float));
    float* h_B = (float*)malloc(N * sizeof(float));
    float* h_C = (float*)malloc(N * sizeof(float));
    for (int i = 0; i < N; i++) {
        h_A[i] = (float)i;
        h_B[i] = (float)i;
    }

    // Call the function exported by the CUDA DLL
    Win32::VecAdd(h_A, h_B, h_C, N);

    // Build a comma-separated string of the results and show it
    String^ result = String::Empty;
    for (int i = 0; i < N; i++) {
        result += Convert::ToString(h_C[i]) + ", ";
    }
    this->textBox1->Text = result;

    free(h_A);
    free(h_B);
    free(h_C);

Step 4: Run the netdemo project and click "Call cuda_dll". The result is shown in Figure 3.

Figure 3 Result of running netdemo

You can now call CUDA C code from a console application, from a DLL, and from a .NET program.

Reference

[1] Jeffrey Richter, Christophe Nasarre. Windows Core Programming (Fifth Edition) [M]. Beijing: Tsinghua University Press, 2008.

[2] Zhang Shu, Ruian Li. CUDA for GPU High-Performance Computing [M]. Beijing: China Water Conservancy and Hydropower Press, 2009.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
