Translation of the Intel optimization documentation by G-Spider. Corrections are welcome.
http://blog.csdn.net/G_Spider
Software Prefetch Scheduling Distance
Determining the ideal prefetch placement in the code depends on many architectural parameters, including the amount of memory to be prefetched, cache lookup latency, system memory latency, and an estimate of the computation cycles. The ideal distance for prefetching data is processor- and platform-dependent. If the distance is too short, the prefetch will not hide the latency of the fetch behind computation. If the prefetch is too far ahead, the prefetched data may be flushed out of the cache by the time it is required.
Since prefetch distance is not a well-defined metric, for this discussion we define a new term, prefetch scheduling distance (PSD), which is expressed as a number of iterations. For large loops, the prefetch scheduling distance can be set to 1 (that is, schedule prefetch instructions one iteration ahead). For small loop bodies (that is, loop iterations with little computation), the prefetch scheduling distance must be more than one iteration.
A simplified equation to compute PSD is deduced from the mathematical model. For the simplified equation, the complete mathematical model, and the methodology of prefetch distance determination, see Appendix E, "Summary of Rules and Suggestions."
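The appendix is not reproduced here. As a rough sketch only, assuming the simplified equation has the form psd = ceil((Nlookup + Nxfer * (Npref + Nst)) / (CPI * Ninst)), it could be evaluated as follows. The function name `psd_estimate` and the figures in the note after the code are hypothetical and chosen for illustration; the formula and the parameter definitions should be checked against the manual before relying on them.

```c
#include <assert.h>

/*
 * Hypothetical helper: estimates the prefetch scheduling distance (PSD),
 * in loop iterations, from the ASSUMED simplified form
 *   psd = ceil((n_lookup + n_xfer * (n_pref + n_st)) / (cpi * n_inst))
 *
 *   n_lookup : clocks of memory lookup latency
 *   n_xfer   : clocks to transfer one cache line
 *   n_pref   : cache lines prefetched per iteration
 *   n_st     : cache lines stored per iteration
 *   cpi      : average clocks per instruction
 *   n_inst   : instructions in one loop iteration
 */
unsigned psd_estimate(double n_lookup, double n_xfer,
                      unsigned n_pref, unsigned n_st,
                      double cpi, unsigned n_inst)
{
    assert(cpi > 0.0 && n_inst > 0);

    /* Clocks needed to bring in the data for one iteration. */
    double latency = n_lookup + n_xfer * (double)(n_pref + n_st);
    /* Clocks of computation available in one iteration. */
    double loop_clocks = cpi * (double)n_inst;

    double ratio = latency / loop_clocks;
    unsigned psd = (unsigned)ratio;   /* truncates toward zero */
    if ((double)psd < ratio) {
        ++psd;                        /* round up to a whole iteration */
    }
    return psd;
}
```

With hypothetical values of 60 clocks of lookup latency, 25 clocks per line transfer, two prefetched lines, no stores, a CPI of 1.5 and 20 instructions per iteration, the estimate is ceil(110 / 30) = 4 iterations. With 200 instructions per iteration it drops to 1, in line with the remark that large loops can use a PSD of 1.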
Example 7-3 illustrates the use of a prefetch within the loop body. The prefetch scheduling distance is set to 3, ESI is effectively the pointer to a line, EDX is the address of the data being referenced, and XMM1-XMM4 hold the data used in computation. Example 7-4 uses two independent cache lines of data per iteration. The PSD would need to be increased or decreased if more or fewer than two cache lines are used per iteration.
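Example 7-3. Prefetch Scheduling Distance
top_loop:
    prefetchnta [edx + esi + 128*3]      ; prefetch first stream, 3 iterations of 128 bytes ahead
    prefetchnta [edx*4 + esi + 128*3]    ; prefetch second stream, same distance
    ......
    ......
    movaps xmm1, [edx + esi]             ; load from first stream, current iteration
    movaps xmm2, [edx*4 + esi]           ; load from second stream, current iteration
    movaps xmm3, [edx + esi + 16]        ; next 16 bytes of first stream
    movaps xmm4, [edx*4 + esi + 16]      ; next 16 bytes of second stream
    ......
    ......
    add esi, 128                         ; advance 128 bytes per iteration
    cmp esi, ecx                         ; ecx holds the end offset
    jl top_loop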
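
For readers more comfortable with C, the same pattern can be sketched as below. This is an illustration under stated assumptions, not the manual's code: `dot_two_streams` and its constants are hypothetical names, the loop body (a dot product) merely stands in for the elided computation, and `__builtin_prefetch` is a GCC/Clang builtin rather than a standard C function. Its third argument of 0 requests a non-temporal hint, which these compilers typically lower to `prefetchnta` on x86.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants mirroring the assembly example. */
static const size_t BLOCK_FLOATS = 32; /* 128 bytes per stream per iteration */
static const size_t PSD = 3;           /* prefetch scheduling distance, in iterations */

/* Dot product of two independent streams a and b of n floats each,
 * prefetching both streams PSD iterations ahead of the current one. */
float dot_two_streams(const float *a, const float *b, size_t n)
{
    assert((a != NULL && b != NULL) || n == 0);

    float acc = 0.0f;
    for (size_t i = 0; i < n; i += BLOCK_FLOATS) {
        size_t ahead = i + PSD * BLOCK_FLOATS;
        if (ahead < n) {
            /* rw = 0 (read), locality = 0 (non-temporal hint). */
            __builtin_prefetch(&a[ahead], 0, 0);
            __builtin_prefetch(&b[ahead], 0, 0);
        }

        /* Work on the current 128-byte block of each stream. */
        size_t end = (n - i < BLOCK_FLOATS) ? n : i + BLOCK_FLOATS;
        for (size_t j = i; j < end; ++j) {
            acc += a[j] * b[j];
        }
    }
    return acc;
}
```

The 128-byte stride is kept from the assembly. Whether a single prefetch per stream covers the whole stride depends on the cache-line size of the target processor. The bounds check before prefetching is there only to keep the pointer arithmetic inside the arrays.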