Cache Basics and a Brief Introduction to the OR1200 ICache


The following is excerpted from the book "Bubu Jingxin (步步惊芯): The Internal Design and Analysis of a Soft-Core Processor".

12.1 Cache Basics

12.1.1 The Role of the Cache

Processor designers often claim that the processor they are designing can execute an enormous number of instructions per second, and that each instruction consumes only xx clock cycles, yet when we actually use the processor we find that this is not the case. For example, Figure 11.8 showed that when the program runs on the simple SOPC built earlier, the l.movhi instruction, which by design needs only one clock cycle in its execution stage, actually takes 6 clock cycles to complete. The reason reality does not match the design is that a real computer is a system composed of many modules and devices. Readers will be familiar with the "shortest plank" effect: the amount of water a bucket can hold is determined by the length of its shortest plank. The same holds here: if the other modules are very slow, the whole system cannot be fast, no matter how fast the processor is. One of the most important modules affecting system speed is the memory. In the simple SOPC, fetching an instruction from memory takes more than one clock cycle, which forces the processor to stall while it waits for the instruction, and this is why l.movhi ends up taking multiple clock cycles.

The operating frequency of processors has advanced rapidly, but the speed of storage (main memory, hard disks, and so on) has grown much more slowly. It is certainly possible to keep programs and data in storage inside the processor (as in the minimal system built in Chapter 2 of this book, which stores programs and data in the QMEM of the OR1200); an instruction can then be fetched in one clock cycle and stored data can be read in two clock cycles. But storage space inside the processor is extremely precious, so to strike a balance between price and performance, modern computers generally adopt a multi-level storage hierarchy, as shown in Figure 12.1. Hard disks are the cheapest, so very large capacities can be used, but they are the slowest. The cache is usually SRAM inside the processor: it has the highest cost and the most limited capacity, but it is very fast and can generally complete an access within one clock cycle. Main memory sits between the two: faster than the hard disk but slower than the cache, more expensive than the hard disk but cheaper than the cache.

The cache stores data that has been accessed recently or is likely to be accessed soon (where "data" covers both instructions and data). When the CPU needs some data, it first looks in the cache. If the data is found in the cache, it is taken directly from the cache. If the required data is not found in the cache, the search continues in main memory; once it is found, the data and the data near it are stored in the cache, so that the next time the same or nearby content is needed it can be found in the cache.

The cache improves processor performance by exploiting the principle of locality in programs, which has two forms, temporal locality and spatial locality:

    • Temporal locality: if a piece of data has just been accessed, it is very likely to be accessed again in the near future. A typical example is a loop: the loop body's code is executed repeatedly by the processor until the loop ends. The first time the loop body is executed its code is read from memory and at the same time stored in the cache, so in later iterations the CPU finds the code it needs in the cache, which speeds up instruction fetch.
    • Spatial locality: if a piece of data has just been accessed, the data near it is very likely to be accessed soon. A typical example is an array: its elements are usually accessed by the program one after another in order.
12.1.2 The Structure and Working Process of the Cache

The cache is managed in units of blocks: memory is divided into blocks of equal size, and data is transferred into the cache one block at a time. The cache contains a directory table whose entries are called lines. A line corresponds to one block in memory and stores the high part of that block's address, which is called the tag; a line also holds the contents of the memory block and a valid flag bit V, as shown in Figure 12.2. If the memory block size is 16 bytes and the directory table has 512 lines, the result is what is called an 8KB cache.
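To make this structure concrete, here is a minimal, hypothetical Verilog sketch of the storage such an 8KB direct-mapped cache with 16-byte blocks and 32-bit addresses would need (the module and signal names are invented for illustration and are not taken from the OR1200 sources):

module cache_line_sketch;
    // Directory table: one entry per line, holding the valid flag V and the
    // tag (the high part of the corresponding memory block's address).
    reg         valid [0:511];     // V bit for each of the 512 lines
    reg  [18:0] tag   [0:511];     // 19-bit tag per line
    // Each line also stores the 16-byte contents of its memory block.
    reg [127:0] data  [0:511];
endmodule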

From the description of the cache's role in Section 12.1.1, the main operations a cache involves are lookup and storage. The lookup operation takes an address and determines whether the corresponding data is in the cache; the store operation places data read from memory into the cache. This raises two questions:

(1) When data is brought into the cache, which location (which line of the directory table) does it occupy?

(2) How does the cache use the directory table to find data?

Different answers to these two questions give rise to different cache mapping methods. There are three: fully associative mapping, direct mapping, and set-associative mapping. This is similar to the situation with the MMU's mapping methods. Only direct mapping is implemented in the OR1200 processor, so this book describes only direct mapping; interested readers can consult other books for the remaining cache mapping methods. Direct mapping means that each block in memory can be placed in only one particular location in the cache, which makes lookup easy. The lookup process under direct mapping is shown in Figure 12.3; here it is still assumed that the memory block is 16 bytes and the cache directory table has 512 lines.


The address sent to the cache for lookup is usually a physical address (in the OR1200, for example, the cache sits after the MMU, so the address fed to the cache is a physical address). Because the block size is 16 bytes and the cache has 512 lines, bits 4-12 of the physical address are used as an index to read out one line. The tag stored in that line is then compared with the high 19 bits of the physical address (bits 13-31); if they are equal and the line's valid flag bit V is 1, the result is a cache hit (Cache Hit); otherwise it is a cache miss (Cache Miss).
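As an illustration of this lookup (a hypothetical sketch for the 8KB / 16-byte-block configuration; the names are invented and this is not the OR1200's actual RTL), the address split and hit test could look like this:

module dm_lookup_sketch (
    input  wire [31:0] paddr,       // physical address delivered by the MMU
    input  wire [19:0] dir_entry,   // directory entry read at the index: {V, tag}
    output wire        hit
);
    wire [3:0]  offset = paddr[3:0];    // byte offset inside the 16-byte block
    wire [8:0]  index  = paddr[12:4];   // bits 4-12 select one of the 512 lines
    wire [18:0] tag    = paddr[31:13];  // bits 13-31, compared with the stored tag

    wire        v          = dir_entry[19];
    wire [18:0] stored_tag = dir_entry[18:0];

    // Cache hit when the tags match and the line's valid flag V is 1.
    assign hit = v && (stored_tag == tag);
endmodule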

On a read hit, the lower 4 bits of the physical address (the offset within the block) select the required data from the 16 bytes stored in the line and return it to the processor. On a read miss, the data at the corresponding address, together with the rest of its memory block, must be read from memory; the required data is sent to the CPU, and at the same time everything that was read is written into the cache. The directory-table index to be written is again determined by bits 4-12 of the address.
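Continuing the sketch (again with invented names and the same assumed configuration), the read-hit word selection and the miss refill could be written as:

module dm_read_refill_sketch (
    input  wire         clk,
    input  wire [31:0]  paddr,
    input  wire         refill,        // a missed block has just arrived from memory
    input  wire [127:0] refill_data,   // the 16 bytes of that block
    output reg  [31:0]  cpu_data       // word handed back to the CPU on a hit
);
    // Cache storage, as in the structure sketch above.
    reg [127:0] data_ram [0:511];      // one 16-byte block per line
    reg [19:0]  dir_ram  [0:511];      // directory table entries {V, tag}

    wire [8:0]   index     = paddr[12:4];
    wire [127:0] line_data = data_ram[index];

    // Read hit: the block offset (here bits 3:2) picks one of the four
    // 32-bit words of the line and returns it to the processor.
    always @(*) begin
        case (paddr[3:2])
            2'd0: cpu_data = line_data[31:0];
            2'd1: cpu_data = line_data[63:32];
            2'd2: cpu_data = line_data[95:64];
            2'd3: cpu_data = line_data[127:96];
        endcase
    end

    // Read miss: once the block has been fetched from memory, write all of it
    // into the line selected by bits 4-12 and record {V = 1, tag} in the directory.
    always @(posedge clk) begin
        if (refill) begin
            data_ram[index] <= refill_data;
            dir_ram [index] <= {1'b1, paddr[31:13]};
        end
    end
endmodule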

This process should make the meaning of direct mapping clearer: under direct mapping, each block of memory can be placed in only one particular location in the cache.

Suppose the processor performs a write operation and the destination address is found in the cache; this is called a write hit. When the processor changes data in the cache, the corresponding data in memory must also be updated, which brings in the write policy. The commonly used policies are write-through and write-back. Different write policies lead to different ways of working for the cache; this is explained in the next chapter, when the DCache is analysed.

12.2 A Brief Introduction to the Caches in the OR1200

The OR1200 processor uses a Harvard architecture, with a separate instruction cache (ICache) and data cache (DCache). As Figure 1.6 shows, the ICache is located after the IMMU and the DCache after the DMMU, so the addresses sent to the ICache and the DCache are physical addresses.

The ICache handles only reads, while the DCache can be both read and written. Because the write path makes the DCache more complex, the two are analysed separately: this chapter dissects only the ICache, and the DCache is analysed in the next chapter.

The ICache involves the macro definitions shown below. The ICache can be configured as 512B, 4KB, 8KB, 16KB, or 32KB; the default is 8KB. This chapter uses the default configuration, a point that will not be repeated in the later analysis.

With this configuration the memory block is 16 bytes and direct mapping is used. The ICache directory table has 512 lines, so bits 4-12 of the address are used as the index into the ICache directory table.

or1200_defines.v

//`define OR1200_NO_IC                 // Whether there is an ICache; commented out by default, i.e. the ICache is present
//`define OR1200_IC_1W_512B
//`define OR1200_IC_1W_4KB
`define OR1200_IC_1W_8KB               // Configures the ICache size; the default is 8KB
//`define OR1200_IC_1W_16KB
//`define OR1200_IC_1W_32KB

`ifdef OR1200_IC_1W_32KB               // If the ICache is configured as 32KB, the memory block size is 32 bytes
 `define OR1200_ICLS   5
`else
 `define OR1200_ICLS   4               // In all other cases the memory block size is 16 bytes
`endif

`ifdef OR1200_IC_1W_8KB                // If the ICache is configured as 8KB, the following macros are defined
 `define OR1200_ICSIZE   13                          // The ICache is 8KB, so the address width is 13
 `define OR1200_ICINDX   `OR1200_ICSIZE-2            // 11
 `define OR1200_ICINDXH  `OR1200_ICSIZE-1            // 12
 `define OR1200_ICTAGL   `OR1200_ICINDXH+1           // 13; the high 9 bits of the 13-bit address are the index of the ICache directory table
 `define OR1200_ICTAG    `OR1200_ICSIZE-`OR1200_ICLS // 9
 `define OR1200_ICTAG_W  20                          // Width of the tag, which holds the high 19 bits of the physical address plus the valid flag bit V
`endif
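As a quick sanity check of the numbers implied by these macros, the geometry of the default configuration works out as follows (a hypothetical helper module, not part of or1200_defines.v):

module icache_geometry_sketch;
    localparam CACHE_BYTES = 8192;                          // OR1200_IC_1W_8KB
    localparam BLOCK_BYTES = 16;                            // OR1200_ICLS = 4, i.e. 2^4 bytes per block
    localparam NUM_LINES   = CACHE_BYTES / BLOCK_BYTES;     // 8192 / 16 = 512 lines
    localparam OFFSET_BITS = 4;                             // address bits 0-3 select a byte within the block
    localparam INDEX_BITS  = 9;                             // address bits 4-12 index the directory table (2^9 = 512)
    localparam TAG_BITS    = 32 - INDEX_BITS - OFFSET_BITS; // 19, address bits 13-31; plus the V bit gives OR1200_ICTAG_W = 20
endmodule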


