-Section 1.2:
-Updated figure
The illustration was added to better explain that CUDA is not just a language but a platform: other language platforms or programming environments can be built on top of CUDA. CUDA has its own ISA and PTX code, so do not think of CUDA simply as a programming language. In principle you could even develop your own chips or hardware based on the CUDA architecture; of course, that would require detailed CUDA materials... which, at least for now, are not available...
-Section 2.5:
-Mentioned the Fermi Architecture
This notes that Fermi is the 2.x architecture, whereas earlier GPUs were 1.x; Fermi is an improvement over them.
-Section 3.1:
-Heavily rewritten to clarify binary, PTX, application, and C++ compatibility
-__noinline__ behaves differently for compute capability 2.0 and higher
This section introduces the relationship between nvcc and binary code, PTX, applications, and C++. CUDA kernels can be written in CUDA C or directly in PTX, the assembly-like instruction set; see the PTX manual for more details;
3.1.1 describes the nvcc compilation process in detail: how a .cu file or CUDA program is compiled into an object file, and how the host C/C++ part is handed off to the host C or C++ compiler.
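As a minimal sketch of the compilation flow 3.1.1 describes (file name, kernel name, and flags here are illustrative, not from the guide):

```cuda
// add.cu -- a minimal CUDA program; nvcc splits this file into device code
// (the kernel) and host code (main), compiles the kernel to PTX/cubin, and
// hands the host part to the host C/C++ compiler.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) data[i] += 1;                        // device code path
}

int main() {
    const int n = 256;
    int *d;                                       // device pointer
    cudaMalloc((void **)&d, n * sizeof(int));
    cudaMemset(d, 0, n * sizeof(int));
    addOne<<<(n + 127) / 128, 128>>>(d, n);       // kernel launch
    cudaThreadSynchronize();                      // CUDA 3.0-era sync call
    cudaFree(d);
    return 0;
}
// Compile (illustrative):  nvcc -arch=sm_20 add.cu -o add
// Emit PTX only:           nvcc -ptx add.cu
```

The two nvcc invocations in the trailing comments show the two outputs the section discusses: a runnable binary versus the intermediate PTX.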
3.1.2 describes binary files and the meaning of -code: for example, a compute capability 1.3 tag indicates that the binary can only run on 1.3 or later hardware.
3.1.3 explains that PTX instructions can generally be executed, but some instructions can only run on devices of higher compute capability;
3.1.4 describes application compatibility: how binary files and PTX code of different versions will run on future hardware. The manual recommends shipping PTX, because PTX is translated again at run time, so the application automatically adapts to new hardware features. Current hardware may compile a single PTX instruction into several hardware instructions when the native instruction is not yet supported; once later hardware can execute those PTX instructions directly, the generated code changes accordingly, which is why this section is provided;
3.1.5 describes the supported C++ features. Not all of C++ is supported; the details can be found in the appendix below;
-Section 3.2:
-Clarified that a CUDA context is created under the hood when initializing
the runtime, and therefore CUDA resources are only valid in the
host thread that initialized the runtime
-Updated graphics interoperability sections to new API
This means that every resource in the current CUDA runtime lives in the same context. As discussed later, one host thread controls one GPU;
-Section 3.2.1
-Mentioned 40-bit address space for devices of compute capability 2.0
Devices of compute capability 2.0 have a 40-bit address space;
-Section 3.2.5.3
-Mentioned atomics to mapped page-locked memory
This points out that atomic operations on mapped page-locked memory are atomic only from the device's point of view; they are not atomic with respect to the host or other devices;
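A hedged sketch of mapped page-locked memory as 3.2.5.3 describes it (variable and kernel names are illustrative; the caveat above applies: the device-side atomicAdd() is not atomic with respect to the host):

```cuda
__global__ void countHits(int *counter) {
    // Atomic with respect to other device threads only, not the host.
    atomicAdd(counter, 1);
}

int main() {
    int *h_counter, *d_counter;
    cudaSetDeviceFlags(cudaDeviceMapHost);          // enable host mapping
    cudaHostAlloc((void **)&h_counter, sizeof(int), cudaHostAllocMapped);
    *h_counter = 0;
    // Device-side alias of the same page-locked host allocation.
    cudaHostGetDevicePointer((void **)&d_counter, h_counter, 0);
    countHits<<<4, 64>>>(d_counter);                // 256 atomic increments
    cudaThreadSynchronize();                        // CUDA 3.0-era sync call
    // Only after synchronization is it safe to read *h_counter on the host.
    cudaFreeHost(h_counter);
    return 0;
}
```

Reading `*h_counter` while the kernel is still running is exactly the unsafe case the section warns about.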
-Section 3.2.6
-Added concurrent kernel execution and concurrent data transfer for devices
of compute capability 2.0
Previously, only one kernel could execute at a time; now multiple kernels can execute concurrently;
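On compute capability 2.0 devices, independent kernels launched into different streams may overlap. A minimal sketch (the two kernels are illustrative placeholders):

```cuda
__global__ void kernelA(float *x) { x[threadIdx.x] *= 2.0f; }
__global__ void kernelB(float *y) { y[threadIdx.x] += 1.0f; }

int main() {
    float *a, *b;
    cudaMalloc((void **)&a, 64 * sizeof(float));
    cudaMalloc((void **)&b, 64 * sizeof(float));
    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);
    // On a 2.0 device these two launches may execute concurrently;
    // on 1.x hardware they are serialized one after the other.
    kernelA<<<1, 64, 0, s0>>>(a);
    kernelB<<<1, 64, 0, s1>>>(b);
    cudaThreadSynchronize();        // wait for both streams
    cudaStreamDestroy(s0); cudaStreamDestroy(s1);
    cudaFree(a); cudaFree(b);
    return 0;
}
```

Concurrency is only possible because the two kernels touch disjoint data and sit in different streams; launches in the same stream still serialize.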
-Section 3.3
-Updated graphics interoperability sections to new API
Some new functions were added;
-New section 3.4 about interoperability between runtime and driver APIs
-Chapter 4 and 5 mostly rewritten with additional information
-Part of Appendix A moved to new Appendix G with additional information
-Section B.1.4
-Mentioned that kernel parameters are passed via constant memory for
devices of compute capability 2.0
-Section B.6
-Added new functions __syncthreads_count(), __syncthreads_and(), and
__syncthreads_or()
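These barrier variants synchronize the block like __syncthreads() and additionally evaluate a per-thread predicate. A hedged sketch using __syncthreads_count() (kernel and variable names are illustrative):

```cuda
__global__ void countPositives(const float *in, int *out) {
    // Every thread evaluates a predicate; __syncthreads_count() acts as a
    // barrier and returns, to all threads, how many predicates were non-zero.
    int pred = in[threadIdx.x] > 0.0f;
    int positives = __syncthreads_count(pred);
    if (threadIdx.x == 0)
        *out = positives;   // one thread records the block-wide tally
}
```

__syncthreads_and() instead returns non-zero only if the predicate held for all threads, and __syncthreads_or() if it held for any thread.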
-Section B.10
-Mentioned atomics to mapped page-locked memory
-Section B.11
-Added new function __ballot()
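__ballot() is the compute capability 2.0 warp vote function that collects one predicate bit per lane. A small sketch (names are illustrative):

```cuda
__global__ void warpMask(const int *in, unsigned int *out) {
    // __ballot(pred) returns a 32-bit mask in which bit N is set when
    // lane N of the warp passed a non-zero predicate.
    unsigned int mask = __ballot(in[threadIdx.x] != 0);
    if ((threadIdx.x & 31) == 0)
        out[threadIdx.x / 32] = mask;   // lane 0 records its warp's mask
}
```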
-New section B.12 on the profiler counter function
-New section B.14 on launch bounds
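The launch bounds qualifier lets the programmer tell the compiler the intended launch configuration so it can budget registers. A minimal sketch (the kernel is illustrative):

```cuda
// __launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor)
__global__ void
__launch_bounds__(256, 2)   // at most 256 threads/block; aim for 2 blocks/SM
scale(float *x, float s) {
    x[blockIdx.x * blockDim.x + threadIdx.x] *= s;
}
```

Launching this kernel with more than 256 threads per block would then fail, which is the trade-off for the tighter register allocation.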
-Section C.1.1
-Updated error for some functions
-Updated based on fmad being fused for compute capability 2.0
-Section C.1.2
-atomicAdd() works with single-precision floating-point numbers for devices
of compute capability 2.0
-Updated error for some functions
-Section C.2.1
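The single-precision atomicAdd() mentioned above is new in compute capability 2.0; 1.x devices only offer the integer overloads. A small sketch (names are illustrative):

```cuda
__global__ void sumAll(const float *in, float *sum) {
    // atomicAdd on float requires compute capability 2.0 or higher;
    // each thread folds its element into the single accumulator.
    atomicAdd(sum, in[blockIdx.x * blockDim.x + threadIdx.x]);
}
```

Note that the order of the additions is unspecified, so the floating-point result can vary slightly between runs.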
-Added new functions
-Section C.2.2
-Added new functions
-New section D.6 about classes with non-virtual member functions for devices
of compute capability 2.0
-New Appendix E for nvcc specifics (moved __noinline__ and #pragma unroll to
this appendix and added __restrict__)
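The three nvcc-specific qualifiers collected in Appendix E can be sketched together as follows (the kernel and helper are illustrative, not from the guide):

```cuda
__device__ __noinline__ float square(float x) {  // ask nvcc not to inline;
    return x * x;                                // honored differently on
}                                                // 2.0+, per Section 3.1

__global__ void poly(const float * __restrict__ in,   // no-alias promise,
                     float * __restrict__ out, int n)  // enables reordering
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = 0.0f;
    #pragma unroll 4              // unroll the following loop 4 times
    for (int k = 0; k < 8; ++k)
        acc += square(in[i]) + k;
    out[i] = acc;
}
```

__restrict__ on both pointers tells the compiler they never alias, which is the usual precondition for the loads being hoisted out of the unrolled loop.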
Note:
The 3.0 update looks ahead to some new features, but the overall change is not large. The 3.0 guide itself is quite good, with a lot of detailed explanations; if you have time, that part is worth reading.
PS: after watching the VS2010 advertisement, I can't help but sigh: who will be my next line of code......