CUDA 4.0 programming practices

Source: Internet
Author: User
Tags: nvcc

CUDA 4.0 is quite different from the earlier CUDA 2.3. For one thing, the cubin format has changed to ELF, so cubin files can no longer be fed to decuda. I checked my graphics card with GPU-Z: the GT218 supports OpenCL, CUDA, and DirectCompute 4.1. Everything is installed, including VS2008, so let's get started.

Below is a simple Hello World in CUDA.

/************************************************************************
 *  [!output PROJECT_NAME].cu
 *  This is an example of a CUDA program.
 ************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/************************************************************************/
/* Example                                                              */
/************************************************************************/
__global__ static void HelloCUDA(char* result, int num, clock_t* time)
{
    int i = 0;
    char p_HelloCUDA[] = "Hello CUDA!";
    clock_t start = clock();

    for (i = 0; i < num; i++) {
        result[i] = p_HelloCUDA[i];
    }

    *time = clock() - start;
}

int main(int argc, char** argv)
{
    char        *device_result  = 0;
    clock_t     *time           = 0;
    char        host_result[12] = {0};
    clock_t     time_used       = 0;
    int deviceCount;
    int device;

    cudaGetDeviceCount(&deviceCount);
    for (device = 0; device < deviceCount; ++device)
    {
        cudaDeviceProp deviceProp;
        cudaGetDeviceProperties(&deviceProp, device);
        printf("Device %d has compute capability %d.%d.\n",
               device, deviceProp.major, deviceProp.minor);
    }

    cudaMalloc((void**) &device_result, sizeof(char) * 11);
    cudaMalloc((void**) &time, sizeof(clock_t));

    HelloCUDA<<<1, 1, 0>>>(device_result, 11, time);

    cudaMemcpy(host_result, device_result, sizeof(char) * 11, cudaMemcpyDeviceToHost);
    cudaMemcpy(&time_used, time, sizeof(clock_t), cudaMemcpyDeviceToHost);

    cudaFree(device_result);
    cudaFree(time);

    printf("%s,%d\n", host_result, (int)time_used);
    return 0;
}
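Note that the sample above never checks any return codes, so a failed allocation or launch is easy to miss. As a minimal sketch (not part of the original post), the runtime calls could be wrapped with a hypothetical CUDA_CHECK macro like this; only the macro name and the example main are assumptions, the API calls themselves are standard CUDA runtime functions:

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Hypothetical helper: abort with the CUDA error string if a runtime call fails. */
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err = (call);                                         \
        if (err != cudaSuccess) {                                         \
            fprintf(stderr, "%s failed: %s\n", #call,                     \
                    cudaGetErrorString(err));                             \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

int main(void)
{
    char* device_result = 0;
    /* Same allocation as in the sample above, now with its return code checked. */
    CUDA_CHECK(cudaMalloc((void**)&device_result, sizeof(char) * 11));
    CUDA_CHECK(cudaFree(device_result));
    return 0;
}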

Compilation is done from the command line. Running nvcc.exe --help > nvcc.txt dumps the help text to a file for easy reading. For the program above, the batch file looks like this (save it as make.bat and double-click to run):

@echo off
set myFun=sample
call "%VS90COMNTOOLS%vsvars32.bat"
set include=%CUDA_INC_PATH%;%include%
set lib=%CUDA_LIB_PATH%;%lib%
set path=%CUDA_BIN_PATH%;%path%
echo ------------------===By GoldenSpider 2011-10-8===------------------
nvcc %myFun%.cu -c -Xcompiler "/MD " -o "%myFun%.obj"
link /OUT:"%myFun%.exe" /SUBSYSTEM:console /nologo %myFun%.obj cudart.lib kernel32.lib msvcrt.lib
echo -------------------------------------------------------------------
echo Good Job, Compiler Success!! Run EXE(Y/?)
pause
%myFun%.exe
pause

Output (images cannot be uploaded here, so this is the result copied from the cmd window):

Setting environment for using Microsoft Visual Studio 2008 x86 tools.
------------------===By GoldenSpider 2011-10-8===------------------
sample.cu
tmpxft_00000cf0_00000000-3_sample.cudafe1.gpu
tmpxft_00000cf0_00000000-8_sample.cudafe2.gpu
sample.cu
tmpxft_00000cf0_00000000-3_sample.cudafe1.cpp
tmpxft_00000cf0_00000000-14_sample.ii
-------------------------------------------------------------------
Good Job, Compiler Success!! Run EXE(Y/?)
Press any key to continue...
Device 0 has compute capability 1.2.
Hello CUDA!,8876
Press any key to continue...

That covers the basics. But what if I want to compile with VC 6.0, or even write the host code in assembly? The idea is simple: use the CUDA driver API. The device code is compiled by nvcc into PTX or cubin, while the host code is compiled by VC (or an assembler) and treats the PTX/cubin purely as data. The SDK sample vectorAddDrv works exactly this way. First, compile the device code to PTX:

call "%VS90COMNTOOLS%vsvars32.bat"set include=%CUDA_INC_PATH%;%include%set lib=%CUDA_LIB_PATH%;%lib%set path=%CUDA_BIN_PATH%;%path%nvcc -ptx  VecAdd.cu

Then compile and link the host code:

@echo off
call "E:\Microsoft Visual Studio\VC98\Bin\vcvars32.bat"
set include=%CUDA_INC_PATH%;%include%
set lib=%CUDA_LIB_PATH%;%lib%
set myHost=main
cl /c /MD %myHost%.cpp
link /SUBSYSTEM:console /nologo %myHost%.obj cuda.lib kernel32.lib msvcrt.lib
%myHost%.exe
pause

Execution result:

cuDeviceGet returns: 0
cuCtxCreate returns: 0
cuModuleLoad returns: 0
allocating d_a returns: 0
copy data for a returns: 0
getting the function handle returns: 0
kernel launch returns: 0
copy from device to host returns: 0
2.1000 ....

Looking at the import table of the resulting executable:

Import table section: .rdata
----------------------------------------------------------
Imported DLL: nvcuda.dll
----------------------------------------------------------
OriginalFirstThunk    000020FC
TimeDateStamp         00000000
ForwarderChain        00000000
FirstThunk            00002044
----------------------------------------------------------
Import ordinal    Imported function name
----------------------------------------------------------
00000084    cuInit
00000059    cuDeviceGetCount
00000057    cuDeviceGet
0000000D    cuCtxCreate_v2
000000E0    cuModuleLoad
0000008E    cuMemAlloc_v2
000000C6    cuMemcpyHtoD_v2
000000DB    cuModuleGetFunction
00000088    cuLaunchKernel
000000BE    cuMemcpyDtoH_v2

----------------------------------------------------------
Imported DLL: msvcrt.dll
----------------------------------------------------------
OriginalFirstThunk    000020B8
TimeDateStamp         00000000
ForwarderChain        00000000
FirstThunk            00002000
The runtime library it links against is no longer msvcr90.dll. The VecAdd.cu used above is as follows:

__global__ void VecAdd(const float* A, const float* B, float* C, int N)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
        C[i] = A[i] + B[i];
}
