In computer architecture research, a simulator is an important and effective tool. gem5 combines the strengths of GEMS and M5 and is simple and convenient to use. It offers two modes, SE and FS. In SE (syscall emulation) mode, gem5 runs a program we have compiled and reports data such as cache and CPU statistics. In FS (full system) mode, gem5 simulates a whole operating system, so we can load our own modified Linux kernel and run it on the simulated machine. The following describes how to use SE mode to run a compiled ARM executable and obtain its memory-access trace. We can then feed that trace to a cache simulator to get the cache hit rate and other data.
First, use gem5 to run the compiled executable, taking hello as an example:
build/ARM/gem5.opt configs/example/se.py -c hello
To obtain a trace of the run, add a few options:
build/ARM/gem5.opt --outdir=memaccess --debug-flags=MemoryAccess --debug-file=memoryaccess.out configs/example/se.py -c hello
Explanation of the options:
--outdir=memaccess sets the folder in which the generated memory-access trace file is stored.
--debug-flags=MemoryAccess selects which trace to collect; here we ask for the MemoryAccess trace (mind the capitalization, otherwise gem5 will report that it does not recognize the flag).
--debug-file=memoryaccess.out sets the name of the trace file, here memoryaccess.out.
Besides MemoryAccess, traces for other components such as the MMC and DRAM can be obtained in the same way.
memoryaccess.out (this file is fairly large and slow to open)
It is divided into two parts:
Part 1: 0---------------------------------0, the tick does not advance.
At tick 0:
This part shows the hello binary being loaded into gem5's memory.
Part 2: 0--500--1000---------------------------End
The tick advances 0, 500, 1000, and so on, in increments of 500. This is the program actually executing, in order. The entries are informative and easy to read.
Once we have the trace of the executable, we know the exact order in which the program executed and the memory operation performed at each step: ifetch (instruction fetch), write (data write), read (data read), and so on.
Each entry also includes the physical address that the operation touched.
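Below is the cache simulator. It models a set-associative cache with LRU replacement. The configuration described here is 4-way set associative with 4 sets and a default block size of 32 bytes (note that the call in main() passes 8 as the associativity argument). The code is as follows: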
#include <climits>
#include <cstring>
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

// One cache way: the tag it holds, its last-use timestamp, and a dirty bit.
typedef struct cache {
    int index;  // tag of the cached block, -1 if the way is empty
    int time;   // timestamp of the last access, -1 if the way is empty
    int dirty;  // reserved, not used by this simulator
} cache_t;

int sets;       // number of sets
int a;          // associativity (ways per set)
int blocksize;  // block size in bytes
string l;       // replacement policy name (only LRU is implemented)
cache_t **i_cache, **d_cache;
string fileName;
int i_num, wd_num, d_num, rd_num, i_hit, wd_hit, rd_hit;

// Parse a hexadecimal string, skipping a leading "0x" if present.
int stringToInt(string s) {
    int res = 0;
    size_t i = (s.size() > 1 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) ? 2 : 0;
    for (; i < s.size(); i++) {
        char c = s[i];
        int digit = (c >= 'a') ? 10 + c - 'a' : (c >= 'A') ? 10 + c - 'A' : c - '0';
        res = res * 16 + digit;
    }
    return res;
}

// Integer log2 for powers of two; avoids rounding errors from log(x)/log(2).
int log2Int(int v) {
    int bits = 0;
    while (v > 1) { v >>= 1; bits++; }
    return bits;
}

// Store the configuration and allocate empty instruction and data caches.
void simu_i_cache(int sets1, int a1, int blocksize1, string l1, string fileName1) {
    sets = sets1; a = a1; blocksize = blocksize1; l = l1; fileName = fileName1;
    i_cache = new cache_t*[sets];
    d_cache = new cache_t*[sets];
    for (int i = 0; i < sets; i++) {
        i_cache[i] = new cache_t[a];
        memset(i_cache[i], -1, a * sizeof(cache_t));
        d_cache[i] = new cache_t[a];
        memset(d_cache[i], -1, a * sizeof(cache_t));
    }
}

// Set number: block address modulo the number of sets.
int findSet(int addr) { return (addr >> log2Int(blocksize)) % sets; }

// Tag: the address bits above the block offset and the set bits.
int findIndex(int addr) { return addr >> log2Int(blocksize) >> log2Int(sets); }

// Look up addr in cache c and return true on a hit.
// On a miss, the least recently used way of the set is replaced.
bool visitCache(cache_t **c, int addr) {
    int mySet = findSet(addr), myIndex = findIndex(addr);
    int maxTime = -1, minTime = INT_MAX, insertIndex = -1, hitIndex = -1;
    for (int i = 0; i < a; i++) {
        if (c[mySet][i].index == myIndex) hitIndex = i;
        if (minTime > c[mySet][i].time) { minTime = c[mySet][i].time; insertIndex = i; }
        if (maxTime < c[mySet][i].time) maxTime = c[mySet][i].time;
    }
    if (hitIndex >= 0) {
        c[mySet][hitIndex].time = maxTime + 1;  // refresh on a hit so eviction is true LRU
        return true;
    }
    c[mySet][insertIndex].index = myIndex;
    c[mySet][insertIndex].time = maxTime + 1;
    return false;
}

bool visitICache(int addr) { return visitCache(i_cache, addr); }
bool visitDCache(int addr) { return visitCache(d_cache, addr); }

// Replay the trace file and print the hit ratios.
void readFile() {
    ifstream fin(fileName.c_str());
    int type;
    string number;
    while (fin >> type >> number) {
        int addr = stringToInt(number);
        switch (type) {
            case 0: i_num++; if (visitICache(addr)) i_hit++; break;  // break added: no fall-through into case 1
            case 1: d_num++; rd_num++; if (visitDCache(addr)) rd_hit++; break;
            case 2: d_num++; wd_num++; if (visitDCache(addr)) wd_hit++; break;
            default: cout << "error" << endl;
        }
    }
    cout << "instruction hit ratio: " << (double)i_hit / i_num << endl;
    cout << "write data hit ratio: " << (double)wd_hit / wd_num << endl;
    cout << "read data hit ratio: " << (double)rd_hit / rd_num << endl;
}

int main() {
    simu_i_cache(4, 8, 32, "LRU", "output.txt");  // 4 sets, 8 ways, 32-byte blocks
    readFile();
}
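To make the set and tag computation in the simulator concrete, here is a minimal sketch that splits an address into block offset, set number, and tag. The names AddressParts, log2Exact, and splitAddress are hypothetical helpers introduced for illustration; they are not part of the listing above. The sketch assumes the block size and the number of sets are powers of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the pieces of an address as the cache sees them.
struct AddressParts {
    std::uint32_t offset;  // byte offset inside the block
    std::uint32_t set;     // set the block maps to
    std::uint32_t tag;     // remaining high bits, compared on lookup
};

// Integer log2; the assert enforces the power-of-two assumption.
inline unsigned log2Exact(std::uint32_t v) {
    assert(v != 0 && (v & (v - 1)) == 0 && "value must be a power of two");
    unsigned bits = 0;
    while (v > 1) { v >>= 1; ++bits; }
    return bits;
}

// Split addr for a cache with the given block size (bytes) and set count.
inline AddressParts splitAddress(std::uint32_t addr,
                                 std::uint32_t blockSize,
                                 std::uint32_t numSets) {
    const unsigned offsetBits = log2Exact(blockSize);
    const unsigned setBits = log2Exact(numSets);
    const std::uint32_t block = addr >> offsetBits;  // block address
    return AddressParts{addr & (blockSize - 1),      // low bits: offset
                        block & (numSets - 1),       // next bits: set
                        block >> setBits};           // high bits: tag
}
```

With 32-byte blocks and 4 sets, splitAddress(0xa94, 32, 4) and splitAddress(0xa98, 32, 4) both give set 0 and tag 21, so the two addresses share a block and the second access would hit once the first has loaded it. The address 0xaa0 begins the next block and maps to set 1.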
The simulator's input file is output.txt, which holds the trace produced by gem5. Because the raw trace is large, it was preprocessed first.
Sample of output.txt:
0 0xa94
1 0xa94
0 0xa98
0 0xa98
0 0xa9c
0 0xa9c
0 0xaa0
0 0xaa0
1 0x5edd4
0 0xaa4
0 0xaa4
0 0xaa8
0 0xaa8
0 0xaac
0 0xaac
2 0x5edd4
0 0xab0
0 0xab4
Here 0 denotes an instruction fetch, 1 a data read, and 2 a data write; the number is followed by the physical address in hexadecimal.
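The article does not show the preprocessing step, so the following is only a sketch of how one raw trace line might be turned into the "type address" record described above. It assumes, without confirmation from the article, that each raw line contains one of the keywords IFetch, Read, or Write, and the text "address 0x" followed by hexadecimal digits. The function name convertTraceLine is hypothetical. Adjust the keywords if the trace from your gem5 version looks different.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Convert one raw trace line into "<type> 0x<addr>", where the type is
// 0 = instruction fetch, 1 = data read, 2 = data write.
// Returns false for lines that do not match the ASSUMED raw format.
bool convertTraceLine(const std::string& line, std::string& out) {
    int type;
    if (line.find("IFetch") != std::string::npos) type = 0;
    else if (line.find("Read") != std::string::npos) type = 1;
    else if (line.find("Write") != std::string::npos) type = 2;
    else return false;  // not a memory-access line

    const std::string key = "address 0x";  // assumed marker before the address
    const std::string::size_type pos = line.find(key);
    if (pos == std::string::npos) return false;

    // Collect the run of hexadecimal digits that follows the marker.
    const std::string::size_type start = pos + key.size();
    std::string::size_type end = start;
    while (end < line.size() &&
           std::isxdigit(static_cast<unsigned char>(line[end]))) {
        ++end;
    }
    if (end == start) return false;  // marker present but no digits
    assert(end > start);             // at least one hex digit was consumed

    out = std::to_string(type) + " 0x" + line.substr(start, end - start);
    return true;
}
```

Calling this on every line of memoryaccess.out and writing each successful result to output.txt would give a file in the format the simulator reads. Lines that do not describe a memory access are skipped.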
Implementation of a cache simulator based on gem5 simulation traces