#pragma unroll

Last updated: 2018-07-26 Source: Internet

Uploaded by: User


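The CUDA sample programs use #pragma unroll in quite a few places, so I collected and organized the following material on it:

1. Description in the official documentation, CUDA C PROGRAMMING GUIDE v6.5: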

By default, the compiler unrolls small loops with a known trip count. The #pragma unroll directive however can be used to control unrolling of any given loop. It must be placed immediately before the loop and only applies to that loop. It is optionally followed by a number that specifies how many times the loop must be unrolled. For example, in this code sample:

```cpp
#pragma unroll 5
for (int i = 0; i < n; ++i)
```

the loop will be unrolled 5 times. The compiler will also insert code to ensure correctness (in the example above, to ensure that there will only be n iterations if n is less than 5, for example). It is up to the programmer to make sure that the specified unroll number gives the best performance.
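As a rough source-level picture of what "unrolled 5 times, plus code to ensure correctness" means, the sketch below unrolls a simple fill loop by 5 by hand and adds a cleanup loop for the leftover iterations, so exactly n elements are written even when n is less than 5 or not a multiple of 5. The function names are hypothetical, and the code nvcc actually generates may be structured differently; this only illustrates the idea.

```cpp
#include <cassert>

// Hypothetical reference loop: one element per iteration.
void fill_rolled(int* a, int n) {
    for (int i = 0; i < n; ++i) {
        a[i] = i;
    }
}

// Hypothetical hand-unrolled version (factor 5). The main loop handles
// five elements per iteration; the cleanup loop handles the n % 5
// leftover elements, so exactly n elements are written for any n >= 0.
void fill_unrolled_by_5(int* a, int n) {
    assert(n >= 0);
    int i = 0;
    // Main body: runs only while at least 5 iterations remain.
    for (; i + 4 < n; i += 5) {
        a[i]     = i;
        a[i + 1] = i + 1;
        a[i + 2] = i + 2;
        a[i + 3] = i + 3;
        a[i + 4] = i + 4;
    }
    // Remainder: at most 4 more iterations (all of them when n < 5).
    for (; i < n; ++i) {
        a[i] = i;
    }
}
```

Both functions write the same values for every n, which is the guarantee the guide describes; only the number of loop-control steps differs.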

#pragma unroll 1 will prevent the compiler from ever unrolling a loop. If no number is specified after #pragma unroll, the loop is completely unrolled if its trip count is constant, otherwise it is not unrolled at all.

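In short: by default, the compiler unrolls small loops whose trip count is known, and #pragma unroll can be used to control the unrolling of any given loop. It must be placed immediately before the loop it controls and can optionally be followed by an unroll count.

The compiler guarantees correctness; choosing an unroll count that gives good performance is up to the programmer.

A count of 1 prevents the compiler from unrolling the loop. With no count, the loop is fully unrolled if its trip count is a constant; otherwise it is not unrolled at all.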
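The three forms described above might look like this in practice. This is a host-side C++ sketch with hypothetical function names; in CUDA the same pragmas would normally sit on loops inside __global__ or __device__ functions. nvcc and Clang recognize #pragma unroll, while compilers that do not recognize it ignore it (possibly with a warning), so the results are the same either way; only the generated code differs.

```cpp
#include <cassert>

// Constant trip count (4): plain "#pragma unroll" allows full unrolling.
int sum4(const int (&v)[4]) {
    int total = 0;
#pragma unroll
    for (int i = 0; i < 4; ++i) {
        total += v[i];
    }
    return total;
}

// Runtime trip count: "#pragma unroll 1" asks the compiler not to unroll.
int sum_n(const int* v, int n) {
    assert(n >= 0);
    int total = 0;
#pragma unroll 1
    for (int i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}

// Runtime trip count with an explicit factor: unroll by 4. The compiler,
// not the programmer, handles an n that is not a multiple of 4.
int sum_n_by4(const int* v, int n) {
    assert(n >= 0);
    int total = 0;
#pragma unroll 4
    for (int i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}
```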

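2. Using #pragma unroll

#pragma directives change how the compiler behaves. Other pragmas are well documented online, so I only want to briefly explain #pragma unroll, since material on it is scarce and rather vague. Consider the following code:

```cpp
int main() {
    int a[100];
#pragma unroll 4
    for (int i = 0; i < 100; i++) {
        a[i] = i;
    }
    return 0;
}
```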

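Loops are where programs typically spend much of their running time. When the compiler encounters #pragma unroll, it unrolls the loop that follows. For example, a loop with only a few iterations,

```cpp
for (int i = 0; i < 4; i++)
    std::cout << "hello world" << std::endl;
```

can be unrolled into:

```cpp
std::cout << "hello world" << std::endl;
std::cout << "hello world" << std::endl;
std::cout << "hello world" << std::endl;
std::cout << "hello world" << std::endl;
```

Unrolling removes the per-iteration loop overhead (incrementing the counter, testing the condition, and branching), so the program usually runs faster. Most modern compilers already perform this optimization automatically; #pragma unroll lets you control how far the compiler unrolls a loop. Returning to the first program, its loop unrolled by 4 looks like this:

```cpp
for (int i = 0; i < 100; i += 4) {
    a[i] = i;
    a[i + 1] = i + 1;
    a[i + 2] = i + 2;
    a[i + 3] = i + 3;
}
```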
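To check that the unrolled form is equivalent to the original, the sketch below wraps both versions in functions (hypothetical names) and fills an array with each. The hand-unrolled version only covers the case the example shows: 100 is a multiple of 4, so no remainder loop is needed. For other lengths it would need a cleanup loop, which is what a compiler inserts on its own.

```cpp
#include <cassert>

// The article's loop wrapped in a function; the pragma requests unrolling by 4.
void fill_with_pragma(int* a, int n) {
#pragma unroll 4
    for (int i = 0; i < n; i++) {
        a[i] = i;
    }
}

// Hand-unrolled equivalent of the expansion above. Assumes n is a
// multiple of 4 (true for n = 100); there is no remainder loop.
void fill_by_hand(int* a, int n) {
    assert(n % 4 == 0);
    for (int i = 0; i < n; i += 4) {
        a[i] = i;
        a[i + 1] = i + 1;
        a[i + 2] = i + 2;
        a[i + 3] = i + 3;
    }
}
```

Both leave a[i] == i for every i in [0, 100); the unrolled loop simply executes 25 iterations instead of 100.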

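3. CUDA compilation

The CUDA compiler is nvcc. nvcc integrates a set of compilation tools, each implementing a different stage of compilation. Its basic workflow is to separate device code from host code and compile the device code into binary form (a cubin object). An application can then either ignore the generated host code and load and execute the device code through the CUDA driver API, or link against the generated host code, which launches the compiled kernels through the CUDA runtime.

The CUDA front end follows C++ syntax rules. Host code supports full C++, but in early CUDA releases device code supported only the C subset of C++: C++ features such as classes, inheritance, and declaring variables within basic blocks were not allowed in kernels (newer releases support much more of C++ in device code, including classes). As in C++ and unlike C, a void pointer cannot be assigned to a non-void pointer without a cast.

For more on nvcc, see: http://download.csdn.net/source/1173428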
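The void-pointer rule is easy to trip over when porting C code. The sketch below (hypothetical helper name, plain C++) shows the conversion that C accepts implicitly but C++, and therefore CUDA source, requires to be written explicitly.

```cpp
#include <cassert>
#include <cstdlib>

// Allocates space for `count` ints. malloc returns void*; in C that
// converts to int* implicitly, but in C++ an explicit cast is required.
int* make_buffer(std::size_t count) {
    assert(count > 0);
    void* raw = std::malloc(count * sizeof(int));
    // int* p = raw;                  // ill-formed in C++: no implicit void* -> int*
    int* p = static_cast<int*>(raw);  // OK: explicit conversion
    return p;
}
```

The caller releases the buffer with std::free as usual.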

__noinline__

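__device__ functions are inlined by default. The __noinline__ qualifier hints to the compiler that a given function should not be inlined. The compiler will not honor __noinline__ for functions with pointer parameters or with large parameter lists.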

#pragma unroll

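By default, the compiler unrolls small loops with a known trip count. #pragma unroll specifies how many times a loop is unrolled (in early CUDA versions the programmer had to make sure that unrolling by this factor was correct; per the v6.5 guide quoted above, the compiler now inserts code to preserve correctness). For example:

```cpp
#pragma unroll 5
for (...)
```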

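#pragma unroll must be placed immediately before the loop it applies to.

#pragma unroll 1 prevents the compiler from unrolling the loop.

If no count is specified, a loop with a constant trip count is fully unrolled, and a loop whose trip count is not constant is not unrolled.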

This article was originally written in Chinese and published on aliyun.com.
