CUDA from Getting Started to Mastery

Source: Internet
Author: User
Tags: prepare, cuda toolkit, nvcc

http://blog.csdn.net/augusdi/article/details/12833235



CUDA from Entry to Mastery (0): Foreword

At my advisor's request, I started learning CUDA programming in 2012 through a High Performance Computing course and later applied the technique to a real project, speeding up the processing program by a factor of more than 1,000. Clearly, parallel computing on a graphics card is an ideal choice for applications that chase speed. With less than a year left before graduation, these skills would otherwise leave with me, so I plan to open a CUDA column this summer, going step by step from introduction to proficiency and sharing some design experience along the way, in the hope of offering some guidance to classmates who want to learn CUDA. My ability is limited and mistakes are unavoidable; discussion is welcome.

PS: Applying for a column apparently requires at least 15 original posts first... never mind, I'll just write enough posts, apply then, and move these over afterwards.

CUDA from Entry to Mastery (1): Setting Up the Environment

NVIDIA launched CUDA (Compute Unified Device Architecture) in 2006 so that its GPUs could be used for general-purpose computing, extending parallel computing from large clusters down to ordinary graphics cards. This lets a user run fairly large-scale parallel programs on nothing more than a laptop with a GeForce card.

The advantage of using a graphics card is that, compared with a large cluster, it is very cheap while its performance is still outstanding. Take my laptop's GeForce 610M as an example: running the deviceQuery sample program reports the following hardware parameters:

Its compute capability works out to roughly 48 CUDA cores x 0.95 GHz = 45.6 GFLOPS. The laptop's CPU parameters are as follows:

CPU compute capability (4 cores): 2.5 GHz x 4 = 10 GFLOPS. So the graphics card's compute performance is roughly 4 to 5 times that of the quad-core i5 CPU, and we can take full advantage of this resource to accelerate time-consuming parts of our applications.
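For reference, here is a minimal sketch (not the SDK's deviceQuery program itself) of how these numbers can be read with cudaGetDeviceProperties. The 48-cores-per-SM value is an assumption about this particular Fermi-class card, used only to reproduce the rough cores x clock estimate above.

#include "cuda_runtime.h"
#include <stdio.h>

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {  // device 0
        fprintf(stderr, "cudaGetDeviceProperties failed\n");
        return 1;
    }

    // clockRate is reported in kHz; convert to GHz for readability.
    double clockGHz = prop.clockRate / 1.0e6;

    printf("Device name     : %s\n", prop.name);
    printf("Multiprocessors : %d\n", prop.multiProcessorCount);
    printf("Clock rate      : %.2f GHz\n", clockGHz);

    // Assuming 48 CUDA cores per SM for this Fermi-class part, a rough
    // estimate in the same spirit as 48 x 0.95 = 45.6 GFLOPS is:
    int coresPerSM = 48;
    printf("Rough estimate  : %.1f GFLOPS\n",
           prop.multiProcessorCount * coresPerSM * clockGHz);

    return 0;
}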

Well, as the saying goes, to do a good job one must first sharpen one's tools. To program the GPU with CUDA, we need to prepare the following:

1. A hardware platform, that is, the graphics card. If you are not using an NVIDIA card, then sorry, the other vendors do not support CUDA.

2. An operating system. I have used both Windows XP and Windows 7 without problems; this blog uses Windows 7.

3. A C compiler. VS2008 is recommended and is what this blog uses.

4. The CUDA compiler NVCC. After free registration on the official website you can download the CUDA Toolkit from the CUDA download page; the latest version is 5.0, which is the version this blog uses.

5. Other tools (such as Visual Assist, to help with code highlighting).

Now we can start installing the software. Installing VS2008 takes quite a while, and the full edition is recommended (NVIDIA's site says the Express edition also works); the process needs no elaboration. CUDA Toolkit 5.0 contains all the necessary ingredients: the NVCC compiler, documentation, sample programs, the CUDA runtime libraries, the CUDA header files, and so on.

Once installation is complete, we find this icon on the desktop:

Yes, that's it. Double-click to run it and you will see a long list of sample programs. Find the Simple OpenGL sample and run it to see the effect:

Click the yellow Run marker on the right and you will see a wonderful three-dimensional sine surface; drag with the left mouse button to change the viewing angle, and drag with the right button to zoom. If it runs successfully, your environment is basically set up.

Possible problems:

1. You use a Remote Desktop connection to log in to another server that has a CUDA-capable graphics card, but your remote session cannot run CUDA programs. This is because Remote Desktop uses your local graphics resources; the remote session cannot see the server's graphics card, so you get the error that there is no CUDA-capable device. Solutions: (1) install two graphics cards in the remote server, one used only for display and the other for computation; or (2) do not log in through the graphical interface, but through a command-line interface such as Telnet.

2. There are two or more CUDA-capable graphics cards, and you need to decide which one the program runs on. This has to be controlled in the program by selecting a card that meets certain conditions, such as a higher clock frequency, larger video memory, or a higher compute capability version; a later post in this series covers the details, and a rough sketch is given below. Well, that's enough for now; in the next section we will see how to program the GPU in VS2008.
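As a rough illustration only (not the author's code from the later post), the sketch below enumerates devices and picks one by compute capability; the zero-device check at the top is also how the "no CUDA-capable device" situation from problem 1 shows up inside a program.

#include "cuda_runtime.h"
#include <stdio.h>

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        // No CUDA-capable device is visible to this session
        // (e.g. inside a Remote Desktop login, as in problem 1).
        fprintf(stderr, "No CUDA-capable device found.\n");
        return 1;
    }

    // Pick the device with the highest compute capability; clockRate or
    // totalGlobalMem could be compared the same way if preferred.
    int best = 0;
    cudaDeviceProp bestProp;
    cudaGetDeviceProperties(&bestProp, 0);
    for (int i = 1; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        if (prop.major > bestProp.major ||
            (prop.major == bestProp.major && prop.minor > bestProp.minor)) {
            best = i;
            bestProp = prop;
        }
    }

    cudaSetDevice(best);  // subsequent CUDA calls in this thread use this device
    printf("Using device %d: %s (compute capability %d.%d)\n",
           best, bestProp.name, bestProp.major, bestProp.minor);
    return 0;
}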

CUDA from Entry to Mastery (2): The First CUDA Program

Continuing from where we left off: we have run a sample successfully, so now let's see how each part of such a program is written. Of course we start simple. Most programming languages begin with a HelloWorld example, but our graphics card cannot talk; it can only do simple arithmetic. So for the HelloWorld of CUDA programs, I think the most suitable choice is vector addition.

Open VS2008, select File -> New -> Project, and the following dialog box pops up; set it up as follows:

Then click OK and you go straight into the project.

In the project we see only one .cu file, whose contents are as follows:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>

cudaError_t addWithCuda(int *c, const int *a, const int *b, size_t size);

__global__ void addKernel(int *c, const int *a, const int *b)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

int main()
{
    const int arraySize = 5;
    const int a[arraySize] = { 1, 2, 3, 4, 5 };
    const int b[arraySize] = { 10, 20, 30, 40, 50 };
    int c[arraySize] = { 0 };

    // Add vectors in parallel.
    cudaError_t cudaStatus = addWithCuda(c, a, b, arraySize);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "addWithCuda failed!");
        return 1;
    }

    printf("{1,2,3,4,5} + {10,20,30,40,50} = {%d,%d,%d,%d,%d}\n",
        c[0], c[1], c[2], c[3], c[4]);

    // cudaThreadExit must be called before exiting in order for profiling and
    // tracing tools such as Nsight and Visual Profiler to show complete traces.
    cudaStatus = cudaThreadExit();
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaThreadExit failed!");
        return 1;
    }

    return 0;
}
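The listing in the original post is cut off at this point; the template's .cu file goes on to define the addWithCuda helper declared above. As a rough sketch of what that helper does (variable names here are illustrative, not necessarily the template's exact text): it allocates device buffers, copies the inputs to the GPU, launches one thread per element, and copies the result back, checking the cudaError_t status after each step.

// Sketch of the addWithCuda helper (assumed structure, not verbatim template code).
cudaError_t addWithCuda(int *c, const int *a, const int *b, size_t size)
{
    int *dev_a = 0;
    int *dev_b = 0;
    int *dev_c = 0;
    cudaError_t cudaStatus = cudaSuccess;

    // Allocate GPU buffers for the two input vectors and one output vector.
    cudaStatus = cudaMalloc((void**)&dev_a, size * sizeof(int));
    if (cudaStatus != cudaSuccess) goto Error;
    cudaStatus = cudaMalloc((void**)&dev_b, size * sizeof(int));
    if (cudaStatus != cudaSuccess) goto Error;
    cudaStatus = cudaMalloc((void**)&dev_c, size * sizeof(int));
    if (cudaStatus != cudaSuccess) goto Error;

    // Copy the input vectors from host memory to GPU memory.
    cudaStatus = cudaMemcpy(dev_a, a, size * sizeof(int), cudaMemcpyHostToDevice);
    if (cudaStatus != cudaSuccess) goto Error;
    cudaStatus = cudaMemcpy(dev_b, b, size * sizeof(int), cudaMemcpyHostToDevice);
    if (cudaStatus != cudaSuccess) goto Error;

    // Launch one thread per element in a single block.
    addKernel<<<1, size>>>(dev_c, dev_a, dev_b);

    // Wait for the kernel to finish and pick up any launch errors.
    cudaStatus = cudaThreadSynchronize();
    if (cudaStatus != cudaSuccess) goto Error;

    // Copy the result vector from GPU memory back to host memory.
    cudaStatus = cudaMemcpy(c, dev_c, size * sizeof(int), cudaMemcpyDeviceToHost);

Error:
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return cudaStatus;
}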
