7.3 Hardware prefetching (Speculative Execution in High Performance Computer Architectures)
This section describes hardware data prefetching. Hardware prefetchers sit at one of two ends: the processor end or the memory end. Processor-end prefetchers mainly bring data into the L1 or L2 cache, while memory-end prefetchers are mainly implemented in the memory controller. There are three main prefetching algorithms: sequential or stride prefetching, correlation prefetching, and content-based prefetching.
Sequential or stride prefetching
A stream buffer prefetches lines according to three rules:
1. If the requested line is in the cache, nothing happens.
2. If the line is not in the cache but is found at the head of a stream buffer, the line is moved into the cache and the head advances to the next entry.
3. If the line is in neither place, a new stream buffer is allocated.
Prefetched lines are placed in the stream buffer rather than in the cache. Because only the head of each buffer is checked, non-unit-stride accesses miss the head and cause stream buffers to be allocated repeatedly.
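The three stream buffer rules can be sketched as a small software model. This is only an illustration under stated assumptions, not a description of any real hardware: the class name `StreamBufferPrefetcher`, the 64-byte line size, the buffer count and depth, and the oldest-first replacement of buffers are all hypothetical choices, and cache eviction is not modeled.

```python
from collections import deque

LINE_SIZE = 64  # bytes per cache line (assumed value)


class StreamBufferPrefetcher:
    """Toy model of a cache backed by a few sequential stream buffers."""

    def __init__(self, num_buffers=4, depth=4):
        self.cache = set()  # line numbers currently cached (no eviction modeled)
        self.depth = depth
        # When all buffers are in use, the oldest one is dropped (assumption).
        self.buffers = deque(maxlen=num_buffers)

    def access(self, addr):
        line = addr // LINE_SIZE
        # Rule 1: the line is already in the cache, so nothing happens.
        if line in self.cache:
            return "cache hit"
        # Rule 2: the line is at the head of a stream buffer.
        for buf in self.buffers:
            if buf and buf[0] == line:
                next_line = buf[-1] + 1   # keep the stream running ahead
                buf.popleft()             # head advances to the next entry
                buf.append(next_line)
                self.cache.add(line)      # line moves into the cache
                return "stream hit"
        # Rule 3: miss everywhere, so fetch the line and allocate a new
        # stream buffer holding the next `depth` sequential lines.
        self.cache.add(line)
        self.buffers.append(deque(range(line + 1, line + 1 + self.depth)))
        return "miss"


unit = StreamBufferPrefetcher()
print([unit.access(a) for a in (0, 64, 128, 0)])

strided = StreamBufferPrefetcher()
print([strided.access(a) for a in (0, 128, 256, 384)])
```

In this model the unit-stride sequence hits the stream buffer after its first miss, while the two-line stride never matches a buffer head and allocates a new buffer on every access.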
Correlation prefetching
To address the weaknesses of sequential and stride prefetching, correlation prefetching introduces a correlation table that records which misses followed a given miss. When that miss occurs again, all of the lines recorded in its table row are prefetched. There are two variants, base and replicated; the steps shown in the figure illustrate the difference between them.
Disadvantage: storing the correlation table takes a large amount of memory. It may need 1-2 MB of SRAM, which is very expensive.
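A minimal sketch of the table lookup described above, covering only the simplest case where each row stores the immediate successors of a miss. The text gives no details of the replicated variant, so it is not modeled. The class name `CorrelationPrefetcher`, the number of successors per row, the row limit, and the most-recent-first ordering are assumptions made for illustration.

```python
from collections import OrderedDict


class CorrelationPrefetcher:
    """Toy correlation table: miss line -> lines that missed right after it."""

    def __init__(self, num_succ=2, max_rows=1024):
        self.table = OrderedDict()  # rows kept in least-recently-updated order
        self.num_succ = num_succ    # successors remembered per row (assumed)
        self.max_rows = max_rows    # table capacity in rows (assumed)
        self.last_miss = None

    def on_miss(self, line):
        """Record the miss and return the lines to prefetch for it."""
        # Learn: the current miss is a successor of the previous miss.
        if self.last_miss is not None:
            succ = self.table.setdefault(self.last_miss, [])
            if line in succ:
                succ.remove(line)
            succ.insert(0, line)        # most recent successor first
            del succ[self.num_succ:]    # keep only num_succ entries
            self.table.move_to_end(self.last_miss)
            if len(self.table) > self.max_rows:
                self.table.popitem(last=False)  # evict the oldest row
        self.last_miss = line
        # Predict: prefetch everything recorded in this miss's row.
        return list(self.table.get(line, []))


pf = CorrelationPrefetcher()
for miss in (10, 20, 30, 10, 25, 10):
    print(miss, "->", pf.on_miss(miss))
```

Even this toy version shows where the cost comes from: every distinct miss address can claim a row, and each row holds several line addresses, so the table grows with the miss footprint of the program.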
Content-based prefetching
This method examines the contents of a fetched cache line to see whether any value looks like a pointer or address that is likely to be accessed soon. If so, that address is prefetched. The heuristic divides the cache line into 4-byte words and splits each word into several bit fields: compare bits, filter bits, and align bits.
1. The compare bits of the word are compared with the compare bits of the miss address. If they match, the word is treated as an address with the same base as the miss address and becomes a prefetch candidate.
2. If the compare bits are all 0, the filter bits are checked: the word counts as a likely address only if the filter bits are not all 0. If the compare bits are all 1 and the filter bits are all 0, the word is also considered a likely address.
3. The align bits must be 00.
Disadvantages: prefetches are issued based on the content of the last miss, so each prefetch may depend on the content returned by an earlier miss. The scheme also does not work for indirect matrix accesses or for index (subscript) based accesses.
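The heuristic can be sketched as a scan over the 4-byte words of a fetched line. The field widths used here (8 compare bits at the top of the word, 4 filter bits below them, 2 align bits at the bottom), the little-endian layout, and the function names are assumptions made for illustration; the rules themselves follow the text as stated, including the all-ones case.

```python
import struct

WORD_BITS = 32
COMPARE_BITS = 8  # uppermost bits of the word (assumed width)
FILTER_BITS = 4   # bits directly below the compare bits (assumed width)
ALIGN_BITS = 2    # lowest bits; must be 00 for a 4-byte-aligned pointer


def compare_field(word):
    return word >> (WORD_BITS - COMPARE_BITS)


def filter_field(word):
    shift = WORD_BITS - COMPARE_BITS - FILTER_BITS
    return (word >> shift) & ((1 << FILTER_BITS) - 1)


def looks_like_address(word, miss_addr):
    """Apply the three rules to one 32-bit word."""
    # Rule 3: the align bits must be 00.
    if word & ((1 << ALIGN_BITS) - 1):
        return False
    # Rule 1: the compare bits must match those of the miss address.
    cmp_bits = compare_field(word)
    if cmp_bits != compare_field(miss_addr):
        return False
    # Rule 2: all-zero or all-one compare bits also match small integers,
    # so the filter bits are consulted as described in the text.
    filt = filter_field(word)
    if cmp_bits == 0:
        return filt != 0
    if cmp_bits == (1 << COMPARE_BITS) - 1:
        return filt == 0
    return True


def scan_line(line_bytes, miss_addr):
    """Return the words in a cache line that look like addresses."""
    words = struct.unpack("<%dI" % (len(line_bytes) // 4), line_bytes)
    return [w for w in words if looks_like_address(w, miss_addr)]


line = struct.pack("<4I", 0x12ABCDE0, 0x12ABCDE1, 0x00000040, 0x7F000000)
print([hex(w) for w in scan_line(line, 0x12345600)])
```

Only the first word survives in this example: the second fails the alignment check, and the other two have compare bits that differ from those of the miss address.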