#pragma unroll

Last Update:2018-07-26 Source: Internet

Author: User

Tags nvcc


The sample programs that ship with CUDA use #pragma unroll in many places. The notes I collected on it are organized below:

1. Description given in the official CUDA C Programming Guide v6.5:

By default, the compiler unrolls small loops with a known trip count. The #pragma unroll directive however can be used to control unrolling of any given loop. It must be placed immediately before the loop and only applies to that loop. It is optionally followed by a number that specifies how many times the loop must be unrolled. For example, in this code sample:

    #pragma unroll 5
    for (int i = 0; i < n; ++i)

The loop would be unrolled 5 times. The compiler would also insert code to ensure correctness (in the example above, to ensure that there will only be n iterations if n is less than 5, for example). It is up to the programmer to make sure that the specified unroll number gives the best performance.

#pragma unroll 1 will prevent the compiler from ever unrolling a loop. If no number is specified after #pragma unroll, the loop is completely unrolled if its trip count is constant, otherwise it is not unrolled at all.

In other words: by default the compiler unrolls small loops with a known trip count, and #pragma unroll can be used to control the unrolling of any given loop. The directive must be placed immediately before the loop it controls, and may be followed by a number giving the unroll count.

The compiler guarantees correctness; whether the chosen unroll count performs well is up to the programmer.

With an argument of 1, the compiler does not unroll the loop. With no argument, the compiler fully unrolls the loop if its trip count is constant, and does not unroll it at all if the trip count is not constant.
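The "code to ensure correctness" mentioned in the guide can be pictured with a hand-unrolled loop. The sketch below is only an illustration written in plain C++, not what nvcc actually generates; the function names sum_rolled and sum_unrolled5 are hypothetical. It shows a loop body repeated 5 times per trip, followed by a remainder loop that keeps the total at exactly n iterations when n is not a multiple of 5 (including n less than 5).

```cpp
#include <cstddef>
#include <vector>

// Reference version: the ordinary (rolled) loop.
long sum_rolled(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// Hand-written picture of an unroll factor of 5.
// The main loop processes five elements per trip; the remainder loop
// handles the 0 to 4 elements left over, so exactly n iterations' worth
// of work is done even when n < 5.
long sum_unrolled5(const std::vector<int>& v) {
    const std::size_t n = v.size();
    long total = 0;
    std::size_t i = 0;
    for (; i + 5 <= n; i += 5) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
        total += v[i + 4];
    }
    for (; i < n; ++i) {  // remainder: fewer than five elements remain
        total += v[i];
    }
    return total;
}
```

With a 3-element input the main loop never runs and the remainder loop does all the work, which is the n less than 5 case the guide refers to.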

2. #pragma unroll usage

The #pragma directive mainly changes how the compiler behaves during compilation. Plenty of material online covers its other parameters, so I only want to describe #pragma unroll briefly here, since what is available online is sparse and rather general. Consider the following code:

    int main() {
        int a[100];
        #pragma unroll 4
        for (int i = 0; i < 100; i++) {
            a[i] = i;
        }
        return 0;
    }

Loops account for most of a program's run time. When the compiler encounters #pragma unroll, it unrolls the loop that follows as it compiles. For example, a loop with only a few iterations such as

    for (int i = 0; i < 4; i++)
        cout << "Hello World" << endl;

can be expanded to:

    cout << "Hello World" << endl;
    cout << "Hello World" << endl;
    cout << "Hello World" << endl;
    cout << "Hello World" << endl;

This makes the program run more efficiently. Of course, most compilers now perform this optimization automatically; #pragma unroll lets you control how far the compiler unrolls a loop. Going back to the program at the beginning of this section, its loop unrolled by 4 takes the form:

    for (int i = 0; i < 100; i += 4) {
        a[i] = i;
        a[i + 1] = i + 1;
        a[i + 2] = i + 2;
        a[i + 3] = i + 3;
    }
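As a quick sanity check on the unrolled form, the rolled and unrolled loops can be compared directly. The following is a minimal sketch in plain C++; the names kSize, kFactor, fill_rolled and fill_unrolled4 are hypothetical and chosen only for this illustration. Because 100 is a multiple of 4, no remainder loop is needed here, which the static_assert makes explicit.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kSize = 100;   // array length from the example
constexpr std::size_t kFactor = 4;   // unroll factor from "#pragma unroll 4"

// The hand-unrolled loop below has no remainder handling, so it is only
// valid when the trip count is a multiple of the unroll factor.
static_assert(kSize % kFactor == 0, "trip count must be a multiple of 4");

// Original loop: one element per iteration.
std::array<int, kSize> fill_rolled() {
    std::array<int, kSize> a{};
    for (std::size_t i = 0; i < kSize; ++i) {
        a[i] = static_cast<int>(i);
    }
    return a;
}

// Unrolled by 4: four elements per iteration, a quarter as many trips.
std::array<int, kSize> fill_unrolled4() {
    std::array<int, kSize> a{};
    for (std::size_t i = 0; i < kSize; i += kFactor) {
        a[i]     = static_cast<int>(i);
        a[i + 1] = static_cast<int>(i + 1);
        a[i + 2] = static_cast<int>(i + 2);
        a[i + 3] = static_cast<int>(i + 3);
    }
    return a;
}
```

Both functions produce the same array; the unrolled one simply executes the loop condition and increment 25 times instead of 100.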

3. CUDA compilation

CUDA's compiler is NVCC, a driver that integrates the various compilation tools implementing the different stages of compilation. NVCC's basic workflow is to separate the device code from the host code and then compile the device code into a binary form, a cubin object. At run time, an application can ignore the generated host code and load and execute the device code through the CUDA driver API.

The compiler front end processes CUDA source code according to C++ syntax rules. C++ is fully supported in host code, but in the CUDA version described here only the C subset of C++ is supported in device code: C++ features such as classes, inheritance, and declaring variables inside basic blocks are not allowed in kernels (later CUDA releases support much more of C++ in device code). Because C++ rules apply, a void pointer cannot be assigned to a non-void pointer without a type conversion.
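The void pointer rule is the one most likely to surprise someone coming from C. A minimal sketch in plain C++ follows; the helper name make_sequence is hypothetical. In C the result of malloc converts to int* implicitly, while under C++ rules the cast has to be written out.

```cpp
#include <cstddef>
#include <cstdlib>

// Allocates n ints with malloc and fills them with 0, 1, ..., n-1.
// Returns nullptr if the allocation fails; the caller releases the
// memory with std::free.
int* make_sequence(std::size_t n) {
    // int* p = std::malloc(n * sizeof(int));  // rejected under C++ rules:
    //                                         // void* does not convert to int*
    int* p = static_cast<int*>(std::malloc(n * sizeof(int)));
    if (p == nullptr) {
        return nullptr;
    }
    for (std::size_t i = 0; i < n; ++i) {
        p[i] = static_cast<int>(i);
    }
    return p;
}
```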

For more information on NVCC, see: http://download.csdn.net/source/1173428

__noinline__

A __device__ function is inlined by default. The __noinline__ qualifier can be used as a hint to the compiler not to inline the function. The compiler does not honor __noinline__ for functions with pointer parameters or for functions with large parameter lists.

#pragma unroll

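By default, the compiler unrolls small loops with a known trip count. #pragma unroll can specify how many times a loop is unrolled (this older description leaves it to the programmer to make sure that unrolling does not affect correctness, whereas the v6.5 guide quoted in section 1 says the compiler inserts code to ensure it), for example:

    #pragma unroll 5
    for (int i = 0; i < n; ++i)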

#pragma unroll must be placed immediately before the loop it applies to.

#pragma unroll 1 prevents the compiler from unrolling the loop.

If no number is specified, the loop is fully unrolled when its trip count is constant, and is not unrolled at all when its trip count is not constant.
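The three forms of the directive can be put side by side. The sketch below is written in plain C++ so that it also builds with an ordinary host compiler; the macros UNROLL_FULL, UNROLL_4 and UNROLL_NONE are hypothetical helpers, not part of CUDA, and they assume that nvcc accepts the standard _Pragma operator spelling of the directive. When the file is not compiled by nvcc the macros expand to nothing and the loops are left as written. In real CUDA code these loops would normally sit inside __global__ or __device__ functions.

```cpp
#include <cstddef>

// Hypothetical helper macros: emit the pragma only when nvcc is compiling
// (nvcc defines __CUDACC__); otherwise expand to nothing.
#if defined(__CUDACC__)
#define UNROLL_FULL _Pragma("unroll")
#define UNROLL_4    _Pragma("unroll 4")
#define UNROLL_NONE _Pragma("unroll 1")
#else
#define UNROLL_FULL
#define UNROLL_4
#define UNROLL_NONE
#endif

// No number, constant trip count: request complete unrolling.
int dot4(const int* a, const int* b) {
    int acc = 0;
    UNROLL_FULL
    for (int i = 0; i < 4; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}

// Explicit number, trip count known only at run time: request a factor of 4.
void scale(float* x, std::size_t n, float k) {
    UNROLL_4
    for (std::size_t i = 0; i < n; ++i) {
        x[i] *= k;
    }
}

// Number 1: ask the compiler to leave the loop rolled.
int count_nonzero(const int* a, std::size_t n) {
    int count = 0;
    UNROLL_NONE
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != 0) {
            ++count;
        }
    }
    return count;
}
```

In each function the macro sits directly before the for statement, matching the rule that the directive must immediately precede the loop it applies to. The directive only changes how the loop is compiled, so the functions return the same results with or without it.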

This article is an English version of an article originally written in Chinese on aliyun.com.
