-Section 1.2:
-Updated figure
The illustration was added to better explain that CUDA is not just a language but a platform: other language platforms or programming environments can be built on top of CUDA. CUDA has its own ISA and PTX code, so do not think of CUDA simply as a programming language. In principle you could even develop your own chips or hardware based on the CUDA architecture; of course, that would require detailed CUDA materials... which, at least for now, are not available...
-Section 2.5:
-Mentioned the Fermi Architecture
This notes that Fermi is the 2.x architecture, whereas earlier GPUs were 1.x; Fermi is an improvement over them.
-Section 3.1:
-Heavily rewritten to clarify binary, PTX, application, and C++ compatibility
-__noinline__ behaves differently for compute capability 2.0 and higher
This section introduces the relationship between nvcc and binary code, PTX, applications, and C++. CUDA kernels can be written in CUDA C or directly in PTX, the assembly-like instruction set; see the PTX manual for more details;
3.1.1 describes the nvcc compilation process in detail: how a .cu file or CUDA program is compiled into an object file, and how the host C/C++ part is handed off to the host C or C++ compiler.
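As a minimal sketch of the compilation flow 3.1.1 describes (file name, kernel name, and flags here are illustrative, not from the guide):

```cuda
// add.cu -- a minimal CUDA program; nvcc splits this file into device code
// (the kernel) and host code (main), compiles the kernel to PTX/cubin, and
// hands the host part to the host C/C++ compiler.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) data[i] += 1;                        // device code path
}

int main() {
    const int n = 256;
    int *d;                                       // device pointer
    cudaMalloc((void **)&d, n * sizeof(int));
    cudaMemset(d, 0, n * sizeof(int));
    addOne<<<(n + 127) / 128, 128>>>(d, n);       // kernel launch
    cudaThreadSynchronize();                      // CUDA 3.0-era sync call
    cudaFree(d);
    return 0;
}
// Compile (illustrative):  nvcc -arch=sm_20 add.cu -o add
// Emit PTX only:           nvcc -ptx add.cu
```

The two nvcc invocations in the trailing comments show the two outputs the section discusses: a runnable binary versus the intermediate PTX.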
3.1.2 describes binary files and the meaning of -code: for example, a compute capability 1.3 tag indicates that the binary can only run on 1.3 or later hardware.
3.1.3 explains that PTX instructions can generally be executed, but some instructions can only run on devices of higher compute capability;
3.1.4 describes application compatibility: how binary files and PTX code of different versions will run on future hardware. The manual recommends shipping PTX, because PTX is translated again at run time, so the application automatically adapts to new hardware features. Current hardware may compile a single PTX instruction into several hardware instructions when the native instruction is not yet supported; once later hardware can execute those PTX instructions directly, the generated code changes accordingly, which is why this section is provided;
3.1.5 describes the supported C++ features. Not all of C++ is supported; the details can be found in the appendix below;
-Section 3.2:
-Clarified that a CUDA context is created under the hood when initializing
the runtime, and therefore CUDA resources are only valid in the
host thread that initialized the runtime
-Updated graphics interoperability sections to new API
This means that every resource in the current CUDA runtime lives in the same context. As discussed later, one host thread controls one GPU;
-Section 3.2.1
-Mentioned 40-bit address space for devices of compute capability 2.0
Devices of compute capability 2.0 have a 40-bit address space;
-Section 3.2.5.3
-Mentioned atomics to mapped page-locked memory
This points out that atomic operations on mapped page-locked memory are atomic only from the device's point of view; they are not atomic with respect to the host or other devices;
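A hedged sketch of mapped page-locked memory as 3.2.5.3 describes it (variable and kernel names are illustrative; the caveat above applies: the device-side atomicAdd() is not atomic with respect to the host):

```cuda
__global__ void countHits(int *counter) {
    // Atomic with respect to other device threads only, not the host.
    atomicAdd(counter, 1);
}

int main() {
    int *h_counter, *d_counter;
    cudaSetDeviceFlags(cudaDeviceMapHost);          // enable host mapping
    cudaHostAlloc((void **)&h_counter, sizeof(int), cudaHostAllocMapped);
    *h_counter = 0;
    // Device-side alias of the same page-locked host allocation.
    cudaHostGetDevicePointer((void **)&d_counter, h_counter, 0);
    countHits<<<4, 64>>>(d_counter);                // 256 atomic increments
    cudaThreadSynchronize();                        // CUDA 3.0-era sync call
    // Only after synchronization is it safe to read *h_counter on the host.
    cudaFreeHost(h_counter);
    return 0;
}
```

Reading `*h_counter` while the kernel is still running is exactly the unsafe case the section warns about.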
-Section 3.2.6
-Added concurrent kernel execution and concurrent data transfer for devices
of compute capability 2.0
Previously, only one kernel could execute at a time; now multiple kernels can execute concurrently;
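On compute capability 2.0 devices, independent kernels launched into different streams may overlap. A minimal sketch (the two kernels are illustrative placeholders):

```cuda
__global__ void kernelA(float *x) { x[threadIdx.x] *= 2.0f; }
__global__ void kernelB(float *y) { y[threadIdx.x] += 1.0f; }

int main() {
    float *a, *b;
    cudaMalloc((void **)&a, 64 * sizeof(float));
    cudaMalloc((void **)&b, 64 * sizeof(float));
    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);
    // On a 2.0 device these two launches may execute concurrently;
    // on 1.x hardware they are serialized one after the other.
    kernelA<<<1, 64, 0, s0>>>(a);
    kernelB<<<1, 64, 0, s1>>>(b);
    cudaThreadSynchronize();        // wait for both streams
    cudaStreamDestroy(s0); cudaStreamDestroy(s1);
    cudaFree(a); cudaFree(b);
    return 0;
}
```

Concurrency is only possible because the two kernels touch disjoint data and sit in different streams; launches in the same stream still serialize.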
-Section 3.3
-Updated graphics interoperability sections to new API
Some new functions were added;
-New section 3.4 about interoperability between runtime and driver APIs
-Chapter 4 and 5 mostly rewritten with additional information
-Part of Appendix A moved to new Appendix G with additional information
-Section B.1.4
-Mentioned that kernel parameters are passed via constant memory for
devices of compute capability 2.0
-Section B.6
-Added new functions __syncthreads_count(), __syncthreads_and(), and
__syncthreads_or()
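These barrier variants synchronize the block like __syncthreads() and additionally evaluate a per-thread predicate. A hedged sketch using __syncthreads_count() (kernel and variable names are illustrative):

```cuda
__global__ void countPositives(const float *in, int *out) {
    // Every thread evaluates a predicate; __syncthreads_count() acts as a
    // barrier and returns, to all threads, how many predicates were non-zero.
    int pred = in[threadIdx.x] > 0.0f;
    int positives = __syncthreads_count(pred);
    if (threadIdx.x == 0)
        *out = positives;   // one thread records the block-wide tally
}
```

__syncthreads_and() instead returns non-zero only if the predicate held for all threads, and __syncthreads_or() if it held for any thread.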
-Section B.10
-Mentioned atomics to mapped page-locked memory
-Section B.11
-Added new function __ballot()
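__ballot() is the compute capability 2.0 warp vote function that collects one predicate bit per lane. A small sketch (names are illustrative):

```cuda
__global__ void warpMask(const int *in, unsigned int *out) {
    // __ballot(pred) returns a 32-bit mask in which bit N is set when
    // lane N of the warp passed a non-zero predicate.
    unsigned int mask = __ballot(in[threadIdx.x] != 0);
    if ((threadIdx.x & 31) == 0)
        out[threadIdx.x / 32] = mask;   // lane 0 records its warp's mask
}
```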
-New section B.12 on the profiler counter function
-New section B.14 on launch bounds
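The launch bounds qualifier lets the programmer tell the compiler the intended launch configuration so it can budget registers. A minimal sketch (the kernel is illustrative):

```cuda
// __launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor)
__global__ void
__launch_bounds__(256, 2)   // at most 256 threads/block; aim for 2 blocks/SM
scale(float *x, float s) {
    x[blockIdx.x * blockDim.x + threadIdx.x] *= s;
}
```

Launching this kernel with more than 256 threads per block would then fail, which is the trade-off for the tighter register allocation.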
-Section C.1.1
-Updated error for some functions
-Updated based on fmad being fused for compute capability 2.0
-Section C.1.2
-atomicAdd() works with single-precision floating-point numbers for devices
of compute capability 2.0
-Updated error for some functions
-Section C.2.1
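The single-precision atomicAdd() mentioned above is new in compute capability 2.0; 1.x devices only offer the integer overloads. A small sketch (names are illustrative):

```cuda
__global__ void sumAll(const float *in, float *sum) {
    // atomicAdd on float requires compute capability 2.0 or higher;
    // each thread folds its element into the single accumulator.
    atomicAdd(sum, in[blockIdx.x * blockDim.x + threadIdx.x]);
}
```

Note that the order of the additions is unspecified, so the floating-point result can vary slightly between runs.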
-Added new functions
-Section C.2.2
-Added new functions
-New section D.6 about classes with non-virtual member functions for devices
of compute capability 2.0
-New Appendix E for nvcc specifics (moved __noinline__ and #pragma unroll to
this appendix and added __restrict__)
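The three nvcc-specific qualifiers collected in Appendix E can be sketched together as follows (the kernel and helper are illustrative, not from the guide):

```cuda
__device__ __noinline__ float square(float x) {  // ask nvcc not to inline;
    return x * x;                                // honored differently on
}                                                // 2.0+, per Section 3.1

__global__ void poly(const float * __restrict__ in,   // no-alias promise,
                     float * __restrict__ out, int n)  // enables reordering
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = 0.0f;
    #pragma unroll 4              // unroll the following loop 4 times
    for (int k = 0; k < 8; ++k)
        acc += square(in[i]) + k;
    out[i] = acc;
}
```

__restrict__ on both pointers tells the compiler they never alias, which is the usual precondition for the loads being hoisted out of the unrolled loop.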
Note:
The 3.0 update looks ahead to some new features, but the overall change is not large. The 3.0 guide itself is quite good, with a lot of detailed explanations; if you have time, that part is worth reading.
PS: after watching the VS2010 advertisement, I can't help but sigh: who will be my next line of code......