Chap4. Processor architecture: instruction execution stages
- Fetch: Read the instruction bytes from memory, using the PC value as the address. Extract the two four-bit fields of the instruction specifier byte, called icode (instruction code) and ifun (instruction function). The stage may also fetch a register specifier byte, giving one or both of the register operand specifiers rA and rB, and a four-byte constant word valC. It computes valP, the address of the next instruction in sequential order: valP equals the PC value plus the length of the fetched instruction.
- Decode: Read up to two operands from the register file, giving the values valA and valB.
- Execute: The ALU either performs the operation specified by the instruction, computes the effective address of a memory reference, or increments or decrements the stack pointer. The result is called valE.
- Memory: Write data to memory or read data from memory. A value read from memory is called valM.
- Write back: Write up to two results to the register file.
- PC update: Set the PC to the address of the next instruction.
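The fetch-stage arithmetic above can be sketched in a few lines of C. This is only an illustration under the assumptions stated in these notes (a one-byte icode/ifun specifier, an optional one-byte register specifier, an optional four-byte constant); the function names `icode_of`, `ifun_of`, and `next_pc` are hypothetical and do not come from any real simulator.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* icode: the high four bits of the first instruction byte. */
uint8_t icode_of(uint8_t byte0) {
    return (uint8_t)((byte0 >> 4) & 0xF);
}

/* ifun: the low four bits of the first instruction byte. */
uint8_t ifun_of(uint8_t byte0) {
    return (uint8_t)(byte0 & 0xF);
}

/* valP = PC + instruction length.
 * Length = 1 byte (icode/ifun)
 *        + 1 byte if a register specifier byte is present
 *        + 4 bytes if a constant word valC is present. */
uint32_t next_pc(uint32_t pc, int need_regids, int need_valc) {
    return pc + 1u + (need_regids ? 1u : 0u) + (need_valc ? 4u : 0u);
}
```

For example, an instruction that carries both a register byte and a constant is six bytes long, so `next_pc(0x100, 1, 1)` yields `0x106`.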
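Chap5. Optimization blockers:
1. Memory aliasing: two pointers may refer to the same memory location, so the compiler must assume that a write through one pointer can change the value read through the other.
2. Function calls: a call may have side effects, such as modifying a global variable, so the compiler cannot safely merge or remove calls.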
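A small sketch of both blockers, in the spirit of the textbook's examples. The names `twiddle1`, `twiddle2`, `f`, `func1`, `func2`, and `counter` are illustrative. In each pair the second function looks like a valid optimization of the first, but the two differ when the pointers alias or when the callee has a side effect, which is why the compiler cannot make the transformation on its own.

```c
/* Memory aliasing: the two versions agree only when xp != yp. */
void twiddle1(long *xp, long *yp) {
    *xp += *yp;
    *xp += *yp;
}

void twiddle2(long *xp, long *yp) {
    *xp += 2 * *yp;
}

/* Function calls: f() modifies a global, so the number of calls matters. */
long counter = 0;

long f(void) {
    return counter++;
}

long func1(void) {
    return f() + f() + f() + f();   /* four calls, four side effects */
}

long func2(void) {
    return 4 * f();                 /* one call, one side effect */
}
```

With `x = 3` and both arguments pointing at `x`, `twiddle1` leaves `x == 12` while `twiddle2` leaves `x == 9`. Starting from `counter == 0`, `func1` returns 6 and `func2` returns 0.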
Code optimization:
1. Eliminate loop inefficiencies: code motion (move computations whose result does not change out of the loop).
2. Reduce procedure calls.
3. Eliminate unnecessary memory references: use local variables (held in registers) to store intermediate results.
Superscalar: the processor can perform multiple operations per clock cycle, and out of order, which means that instructions may execute in an order different from their order in the machine-level program.
4. Loop unrolling: reduces the number of operations that do not directly contribute to the program result (such as loop indexing and conditional branching), exposes more instruction-level parallelism, and can shorten the critical path of the whole computation.
5. Write code suitable for conditional moves.
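The first four techniques can be sketched on a simple summation. The `vec` type and the function names below are hypothetical, chosen only for this illustration. `sum_naive` calls `vec_length` on every iteration and updates `*dest` in memory each time; `sum_local` applies code motion and a local accumulator; `sum_unrolled` additionally unrolls the loop by two with two independent accumulators. Whether any of these is actually faster depends on the compiler and machine, so treat it as a sketch rather than a measured result.

```c
#include <stddef.h>

/* Hypothetical vector type used only for this sketch. */
typedef struct {
    size_t len;
    long *data;
} vec;

size_t vec_length(const vec *v) {
    return v->len;
}

/* Baseline: length recomputed and *dest read/written on every iteration. */
void sum_naive(const vec *v, long *dest) {
    *dest = 0;
    for (size_t i = 0; i < vec_length(v); i++)
        *dest += v->data[i];
}

/* Code motion (length hoisted), fewer calls, and a local accumulator. */
void sum_local(const vec *v, long *dest) {
    size_t n = vec_length(v);
    const long *d = v->data;
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += d[i];
    *dest = acc;
}

/* 2x loop unrolling with two independent accumulators. */
void sum_unrolled(const vec *v, long *dest) {
    size_t n = vec_length(v);
    const long *d = v->data;
    long acc0 = 0, acc1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += d[i];
        acc1 += d[i + 1];
    }
    for (; i < n; i++)      /* handle a leftover element when n is odd */
        acc0 += d[i];
    *dest = acc0 + acc1;
}
```

The two accumulators in `sum_unrolled` form separate dependency chains, which is what lets a superscalar processor overlap the additions. For integer data the result is identical to the baseline; for floating-point data, reordering the additions could change rounding.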
Amdahl's Law: S = 1 / ((1 - A) + A/K). S: overall speedup; A: fraction of the total execution time taken by the part being improved; K: factor by which that part is sped up. Therefore, to speed up the whole system substantially, a large fraction of the system must be made faster.
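As a quick numeric check of the formula, here is a minimal C helper. The name `amdahl_speedup` is hypothetical, and the function assumes 0 <= a <= 1 and k > 0.

```c
/* Overall speedup when a fraction `a` of the run time is sped up by factor `k`.
 * Assumes 0 <= a <= 1 and k > 0. */
double amdahl_speedup(double a, double k) {
    return 1.0 / ((1.0 - a) + a / k);
}
```

With A = 0.6 and K = 3, the speedup is 1 / (0.4 + 0.2), about 1.67. Even as K grows without bound, the speedup for A = 0.6 cannot exceed 1 / 0.4 = 2.5, which is why the improved part must cover a large fraction of the run time.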
Chap6. Memory hierarchy
- Registers: cache 4-byte or 8-byte words. Access time is 0 clock cycles; managed by the compiler.
- TLB (translation lookaside buffer): caches address translations (page table entries). Access time is 0 clock cycles; managed by the hardware MMU (Memory Management Unit).
- L1 cache: caches 64-byte blocks. Access time is 1 clock cycle; managed by hardware.
- L2 cache: caches 64-byte blocks. Access time is 10 clock cycles; managed by hardware.
- L3 cache: caches 64-byte blocks. Access time is 30 clock cycles; managed by hardware.
- Virtual memory: caches 4-KB pages. Access time is 100 clock cycles; managed by hardware + OS.
- Buffer cache: caches parts of files. Access time is 100 clock cycles; managed by the OS.
- Disk cache: caches disk sectors. Access time is 100,000 clock cycles; managed by the controller firmware.
- Network cache: caches parts of network files. Access time is 10,000,000 clock cycles; managed by the AFS/NFS client.
As the list shows, each level of the hierarchy caches part of the level below it. The higher the level, the faster the access, the higher the cost per byte, and the smaller the capacity.
The core concept behind caching is locality:
- Spatial locality: once a memory location has been referenced, nearby locations are likely to be referenced in the near future.
- Temporal locality: once a memory location has been referenced, it is likely to be referenced again in the near future.
Cache organization is described by the tuple (S, E, B, m):
- S: number of cache sets.
- E: number of cache lines per set. More lines per set means higher associativity, which lowers the chance of thrashing caused by conflict misses and can raise the hit rate, but it increases the hit time and the miss penalty.
- B: block size in bytes. Larger blocks exploit spatial locality better and can raise the hit rate, but they also raise the miss penalty, because more data is transferred on each miss.
- m: number of physical address bits (the address space holds M = 2^m addresses).
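Spatial locality can be illustrated with two ways of summing the same two-dimensional array. This is a minimal sketch; the array dimensions and function names are arbitrary choices for the example. C stores arrays in row-major order, so `sum_row_major` touches memory with stride 1, while `sum_col_major` jumps `COLS` elements between consecutive accesses and is therefore likely to miss more often once the array is larger than the cache.

```c
#define ROWS 4
#define COLS 8

/* Stride-1 access: consecutive references fall in the same cache block. */
long sum_row_major(long a[ROWS][COLS]) {
    long sum = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            sum += a[i][j];
    return sum;
}

/* Stride-COLS access: same result, poorer spatial locality. */
long sum_col_major(long a[ROWS][COLS]) {
    long sum = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            sum += a[i][j];
    return sum;
}
```

Both functions return the same value; only the order of memory references differs. The local variable `sum` is also an example of temporal locality, since it is referenced on every iteration.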
- Direct-mapped cache: E = 1
- Set associative cache: E > 1
- Fully associative cache: S = 1
Memory address: t + s + b = m
- t: tag bits. The tag identifies a line within a set, because multiple memory blocks map to the same set. t = m - s - b
- s: set index bits. 2^s = S
- b: block offset bits. 2^b = B
Why the middle bits are used as the set index: adjacent memory blocks then map to different sets, so the cache can be fully used when a program with good spatial locality is running. If the high bits were used instead, contiguous blocks would map to the same set.
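The address split can be sketched with shifts and masks. The struct and function names here are hypothetical, and the sketch assumes 32-bit addresses with s + b < 32.

```c
#include <stdint.h>

/* Hypothetical result type: the three fields of a cache address. */
typedef struct {
    uint32_t tag;
    uint32_t set;
    uint32_t offset;
} cache_addr;

/* Split addr into tag / set index / block offset.
 * s = number of set index bits, b = number of block offset bits.
 * Assumes s + b < 32 so that every shift is well defined. */
cache_addr split_address(uint32_t addr, unsigned s, unsigned b) {
    cache_addr r;
    r.offset = addr & ((1u << b) - 1u);         /* low b bits    */
    r.set    = (addr >> b) & ((1u << s) - 1u);  /* middle s bits */
    r.tag    = addr >> (s + b);                 /* remaining high bits */
    return r;
}
```

For a tiny cache with S = 4 and B = 2 (s = 2, b = 1), address 13 (binary 1101) splits into tag 1, set 2, offset 1. With s = 2 and b = 4, the consecutive blocks starting at 0x00, 0x10, and 0x20 land in sets 0, 1, and 2, which is the effect of indexing with the middle bits.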