Partial translation of the Intel optimization documentation. By G-Spider, 2010-12-14. Corrections are welcome.
http://blog.csdn.net/G_Spider
Software Prefetch Scheduling Distance
Determining the ideal prefetch placement in the code depends on many architectural parameters, including the amount of memory to be prefetched, cache lookup latency, system memory latency, and an estimate of the computation cycles. The ideal distance for prefetching data is processor- and platform-dependent. If the distance is too short, the prefetch will not hide the latency of the fetch behind computation. If the prefetch is too far ahead, the prefetched data may be flushed out of the cache by the time it is required.
Since prefetch distance is not a well-defined metric, for this discussion we define a new term, prefetch scheduling distance (PSD), which is expressed as a number of loop iterations. For large loops, the prefetch scheduling distance can be set to 1 (that is, schedule prefetch instructions one iteration ahead). For small loop bodies (that is, loop iterations with little computation), the prefetch scheduling distance must be more than one iteration.
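To make the "large loop versus small loop" rule concrete, here is a minimal sketch in C. It is not the equation from the Intel appendix. It only assumes that the memory latency to be hidden and the cost of one loop iteration are both known in cycles, and divides one by the other, rounding up. The function name `prefetch_distance_iters` and both parameters are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper (not from the Intel manual): estimate how many
 * iterations ahead a prefetch should be issued so that an assumed memory
 * latency is covered by loop computation.
 *
 *   latency_cycles  - assumed latency of the fetch, in cycles
 *   cycles_per_iter - assumed cost of one loop iteration, in cycles
 *
 * The result is latency / cost rounded up, and never less than 1. */
unsigned prefetch_distance_iters(unsigned latency_cycles,
                                 unsigned cycles_per_iter)
{
    assert(cycles_per_iter > 0);
    unsigned psd = (latency_cycles + cycles_per_iter - 1) / cycles_per_iter;
    return psd > 0 ? psd : 1;
}
```

With an assumed latency of 200 cycles, a loop body costing 400 cycles gives a distance of 1, while a body costing 50 cycles gives 4. That matches the text: the less work each iteration does, the further ahead the prefetch has to be issued.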
A simplified equation to compute PSD is deduced from the mathematical model. For the simplified equation, the complete mathematical model, and the methodology for determining prefetch distance, see Appendix E, “Summary of Rules and Suggestions.”
Example 7-3 illustrates the use of a prefetch within the loop body. The prefetch scheduling distance is set to 3, ESI is effectively the pointer to a line, EDX is the address of the data being referenced, and XMM1-XMM4 hold the data used in computation. Example 7-4 uses two independent cache lines of data per iteration. The PSD would need to be increased or decreased if more or fewer than two cache lines are used per iteration.
Example 7-3. Prefetch Scheduling Distance
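top_loop:
    prefetchnta [edx + esi + 128*3]      ; prefetch first stream, 3 iterations (3*128 bytes) ahead
    prefetchnta [edx*4 + esi + 128*3]    ; prefetch second stream, same distance
    ......
    ......
    movaps xmm1, [edx + esi]             ; load 16 bytes from the first stream
    movaps xmm2, [edx*4 + esi]           ; load 16 bytes from the second stream
    movaps xmm3, [edx + esi + 16]        ; next 16 bytes of the first stream
    movaps xmm4, [edx*4 + esi + 16]      ; next 16 bytes of the second stream
    ......
    ......
    add esi, 128                         ; advance the offset by 128 bytes per iteration
    cmp esi, ecx                         ; compare the offset with the limit in ecx
    jl top_loop                          ; repeat while esi < ecx (signed compare)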
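
For readers who do not write assembly, the loop in Example 7-3 can be approximated in C. This is only a sketch under several assumptions: it uses the GCC/Clang builtin `__builtin_prefetch` (locality 0 requests a non-temporal prefetch, the closest match to `prefetchnta`), the function name `sum_two_streams` is hypothetical, and a plain sum stands in for the computation that the original listing elides. Unlike the assembly, it checks the array bound before forming the prefetch address, so no pointer past the end of the array is ever computed.

```c
#include <assert.h>
#include <stddef.h>

enum {
    FLOATS_PER_ITER = 32, /* 32 floats = 128 bytes, as in "add esi, 128" */
    PSD = 3               /* prefetch scheduling distance, in iterations */
};

/* Hypothetical example: sums two float streams in 128-byte steps,
 * prefetching each stream PSD iterations ahead. Trailing elements that
 * do not fill a whole 128-byte step are ignored. */
float sum_two_streams(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    float total = 0.0f;
    for (size_t i = 0; i + FLOATS_PER_ITER <= n; i += FLOATS_PER_ITER) {
        size_t ahead = i + (size_t)PSD * FLOATS_PER_ITER;
        if (ahead < n) {
            /* One prefetch per stream per iteration, mirroring the listing.
             * Arguments: address, 0 = read, 0 = no temporal locality. */
            __builtin_prefetch(a + ahead, 0, 0);
            __builtin_prefetch(b + ahead, 0, 0);
        }
        /* Stand-in for the elided computation in the original example. */
        for (size_t j = 0; j < FLOATS_PER_ITER; ++j)
            total += a[i + j] + b[i + j];
    }
    return total;
}
```

Changing `PSD` is the C counterpart of changing the `128*3` displacement in the assembly. Whether 3 is a good value depends on the processor and platform, as the text above explains.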