Software Prefetch Scheduling Distance


Partial translation of the Intel optimization manual. By G-Spider, 2010-12-14. Corrections are welcome.

http://blog.csdn.net/G_Spider

 

Software Prefetch Scheduling Distance

Determining the ideal prefetch placement in the code depends on many architectural parameters, including the amount of memory to be prefetched, cache lookup latency, system memory latency, and an estimate of the computation cycles. The ideal distance for prefetching data is processor- and platform-dependent. If the distance is too short, the prefetch will not hide the latency of the fetch behind computation. If the prefetch is too far ahead, the prefetched data may be flushed out of the cache by the time it is required.

Because prefetch distance is not a well-defined metric, this discussion defines a new term, prefetch scheduling distance (PSD), which is expressed as a number of loop iterations. For large loops, the prefetch scheduling distance can be set to 1 (that is, schedule prefetch instructions one iteration ahead). For small loop bodies (that is, loop iterations with little computation), the prefetch scheduling distance must be more than one iteration.
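The reasoning behind "1 for large loops, more than 1 for small loops" can be sketched with a rough model. This is an assumption for illustration, not the simplified equation from the manual: the prefetch must be issued at least one memory latency before the data is used, so the distance in iterations is that latency divided by the cycles one iteration takes, rounded up. The function name `estimate_psd` and its parameters are hypothetical.

```c
/* Rough model (an assumption, not the manual's equation):
 * PSD = ceil(memory latency / cycles per iteration), never less than 1. */
static unsigned estimate_psd(unsigned mem_latency_cycles,
                             unsigned cycles_per_iteration)
{
    if (cycles_per_iteration == 0)
        return 1; /* degenerate input: fall back to one iteration ahead */

    /* Integer ceiling division. */
    unsigned psd = (mem_latency_cycles + cycles_per_iteration - 1)
                   / cycles_per_iteration;

    return psd < 1 ? 1 : psd;
}
```

With made-up numbers, `estimate_psd(200, 400)` gives 1 (a large loop body already covers the latency), while `estimate_psd(200, 70)` gives 3 (a small loop body needs to prefetch several iterations ahead). Real values have to be measured on the target processor and platform.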

 

A simplified equation for computing PSD can be derived from a mathematical model. For the simplified equation, the complete mathematical model, and the methodology for determining prefetch distance, see Appendix E, “Summary of Rules and Suggestions.”

 

Example 7-3 illustrates the use of a prefetch within the loop body. The prefetch scheduling distance is set to 3, ESI is effectively the pointer to a line, EDX is the address of the data being referenced, and XMM1-XMM4 hold the data used in computation. Example 7-3 uses two independent cache lines of data per iteration. The PSD would need to be increased or decreased if more or fewer than two cache lines are used per iteration.

 

 

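Example 7-3. Prefetch Scheduling Distance
top_loop:
 prefetchnta [edx + esi + 128*3]      ; prefetch stream 1, 3 iterations (3 * 128 bytes) ahead
 prefetchnta [edx*4 + esi + 128*3]    ; prefetch stream 2, same distance
 ......
 ......
 movaps xmm1, [edx + esi]             ; load stream 1 data for the current iteration
 movaps xmm2, [edx*4 + esi]           ; load stream 2 data for the current iteration
 movaps xmm3, [edx + esi + 16]        ; next 16 bytes of stream 1
 movaps xmm4, [edx*4 + esi + 16]      ; next 16 bytes of stream 2
 ......
 ......
 add esi, 128                         ; advance by one iteration (128 bytes)
 cmp esi, ecx                         ; compare the offset with the loop bound in ecx
jl top_loop                           ; repeat while esi < ecx (signed)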
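For readers more comfortable with C, the same pattern can be sketched as follows. This is an illustration, not code from the manual. It assumes a GCC- or Clang-compatible compiler, where `__builtin_prefetch(p, 0, 0)` requests a read prefetch with minimal temporal locality (typically emitted as `prefetchnta` on x86), and a 4-byte `float`. The names `sum_two_streams`, `PSD`, `FLOATS_PER_ITER`, and `PREFETCH_NTA` are hypothetical, and the loop body (a dot product over the full 128-byte chunk) stands in for the computation elided in the listing.

```c
#include <stddef.h>

#define FLOATS_PER_ITER 32 /* 128 bytes per iteration, assuming 4-byte float */
#define PSD 3              /* prefetch scheduling distance, in iterations */

/* __builtin_prefetch is a GCC/Clang builtin; other compilers get a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_NTA(p) __builtin_prefetch((p), 0, 0)
#else
#define PREFETCH_NTA(p) ((void)(p))
#endif

/* Dot product of two streams, processed in 128-byte chunks.
 * Elements past the last full chunk are ignored in this sketch. */
static float sum_two_streams(const float *a, const float *b, size_t n)
{
    const size_t ahead = (size_t)PSD * FLOATS_PER_ITER; /* 3 * 128 bytes */
    float total = 0.0f;

    for (size_t i = 0; i + FLOATS_PER_ITER <= n; i += FLOATS_PER_ITER) {
        /* Prefetch the data that will be used PSD iterations from now.
         * The bounds check keeps the address inside the arrays. */
        if (i + ahead < n) {
            PREFETCH_NTA(&a[i + ahead]);
            PREFETCH_NTA(&b[i + ahead]);
        }

        /* Computation on the current iteration's data. */
        for (size_t j = 0; j < FLOATS_PER_ITER; ++j)
            total += a[i + j] * b[i + j];
    }
    return total;
}
```

Changing `PSD` is the only edit needed to experiment with other distances. Because the cache-line size is platform-dependent, one prefetch per 128-byte chunk may not cover the whole chunk on every processor.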

 

 

 

 

 

 

 

 

 

 

 
