Chapter 6: The Memory Hierarchy
A memory system is a hierarchical structure of storage devices with different capacities, costs, and access times.
1. CPU registers: small capacity, high cost, fast access. 2. Cache memories: staging areas between the CPU and main memory. 3. Main memory and disks: large capacity, low cost, slow access.
Random access memory comes in two varieties: static (SRAM) and dynamic (DRAM). SRAM is faster and more expensive than DRAM; SRAM is used for cache memories, DRAM for main memory.
SRAM stores each bit in a bistable storage cell. As long as power is applied, the cell retains its value.
DRAM stores each bit as a charge on a capacitor. A DRAM cell loses its charge within 10-100 ms, so the memory system must periodically refresh it (or otherwise guarantee correctness, e.g. with error correction).
A DRAM chip's organization is usually written as d × w: the chip has d supercells, each w bits wide. The d supercells are further organized as an r × c rectangular array (d = r × c) so that the chip needs fewer address pins: addressing d = 16 supercells linearly would require 4 pins, but a 4 × 4 array needs only 2 pins, because the row and column addresses are sent one after the other over the same pins. This two-dimensional organization has a drawback, however: sending the address in two steps increases the access time.
Enhanced DRAMs include: FPM DRAM, EDO DRAM, SDRAM, DDR SDRAM (DDR, DDR2, DDR3), RDRAM, and VRAM.
DRAM history in PCs: FPM DRAM until about 1995, EDO DRAM from 1996 to 1999, SDRAM and DDR from about 2000 to 2010, and DDR3 since about 2010.
Nonvolatile memory: SRAM and DRAM are both volatile.
ROMs are called read-only memories for historical reasons; many ROMs are in fact writable. ROMs are classified by the number of times they can be reprogrammed (written) and by the mechanism used to reprogram them.
ROMs include: PROM, EPROM, EEPROM, and flash memory (based on EEPROM).
Programs stored in ROM devices are often referred to as firmware.
Bus transactions: a read transaction transfers data from main memory to the CPU; a write transaction transfers data from the CPU to main memory.
Disks are widely used storage devices that hold large amounts of data. Key terms: platters, surfaces, spindle, RPM, tracks, sectors, cylinders.
For SRAM and DRAM, the prefixes K, M, G, and T are usually powers of 1024; for disks, they are powers of 1000.
I/O devices, such as graphics cards, monitors, mice, keyboards, and disks, are connected to the CPU and main memory via an I/O bus such as PCI.
The system bus and the memory bus are specific to the CPU, but an I/O bus such as PCI is designed to be independent of the underlying CPU.
The I/O bus is always slower than the system bus and the memory bus, but it can accommodate a wide variety of third-party I/O devices. A host bus adapter connects one or more disks to the I/O bus; the two most common interfaces are SCSI and SATA, with SCSI disks typically faster and more expensive than SATA disks.
The CPU uses a technique called memory-mapped I/O: in a system that uses this technique, a block of addresses in the address space is reserved for communicating with I/O devices. Each such address is called an I/O port. When a device is connected to the bus, it is associated with (mapped to) one or more ports.
Steps that occur when the CPU reads data from disk: (1) the CPU initiates the disk read by writing the command, logical block number, and destination memory address to the memory-mapped address associated with the disk; (2) the disk controller reads the sector and performs a DMA transfer into main memory; (3) when the DMA transfer is complete, the disk controller notifies the CPU with an interrupt.
Modern disks present their geometry in a simpler abstract view: a sequence of B-byte logical blocks. The disk controller maintains a mapping between logical block numbers and actual (physical) disk sectors; in this view, a logical block corresponds to a sector.
An SSD package consists of one or more flash chips and a flash translation layer.
A basic fact of memory and disk technology: increasing density is easier than reducing access time.
DRAM and disk performance lag behind CPU performance. Modern computers use SRAM-based caches to try to bridge this processor-memory gap. This approach works because of the locality of applications.
Locality
Well-written computer programs tend to have good locality. That is, they tend to reference data items that are near other recently referenced data items, or the very data items that were themselves recently referenced.
This tendency, known as the principle of locality, is an enduring concept that has enormous impact on the design and performance of hardware and software systems.
Locality usually has two different forms: temporal locality and spatial locality.
The difference between the two: temporal locality concerns the same memory location; spatial locality concerns nearby memory locations. Temporal locality: a memory location that has been referenced once is likely to be referenced again multiple times in the near future. Spatial locality: if a memory location is referenced once, the program is likely to reference a nearby memory location in the near future.
For spatial locality, a stride-1 reference pattern is called a sequential reference pattern and has the best spatial locality; visiting the elements of an array one after another is an example.
For multidimensional arrays, traversal in row-major order gives the best spatial locality.
Simple rules for qualitatively evaluating the locality in a program:
- Programs that repeatedly reference the same variables have good temporal locality.
- For programs with a stride-k reference pattern, the smaller the stride, the better the spatial locality. Programs with a stride-1 reference pattern have good spatial locality.
- Loops have good temporal and spatial locality with respect to instruction fetches. The smaller the loop body and the greater the number of loop iterations, the better the locality.
Memory modules
DRAM chips are packaged in memory modules that plug into expansion slots on the motherboard. A common package is the 168-pin dual inline memory module (DIMM), which transfers data to and from the memory controller in 64-bit chunks; there is also the 72-pin single inline memory module (SIMM), which transfers data in 32-bit chunks.
Non-volatile memory
DRAM and SRAM lose their information if the power is turned off, so they are volatile.
Nonvolatile memories retain their information even when powered off. For historical reasons they are collectively referred to as read-only memories (ROMs).
ROMs are distinguished by the number of times they can be reprogrammed and by the mechanism used to reprogram them.
Disk storage
A disk is a storage device that holds large amounts of data.
1. Disk Construction
Disks are constructed from platters. Each platter has two surfaces coated with magnetic recording material. A rotating spindle in the center of the platter spins it at a fixed rotational rate.
Each surface consists of a collection of concentric rings called tracks. Each track is divided into a set of sectors, and each sector holds an equal number of data bits encoded in the magnetic material. Sectors are separated by gaps; the gaps store formatting information that identifies sectors but hold no data bits.
2. Disk capacity
3. Disk operation
A read/write head reads and writes the bits stored on the magnetic surface. The head is attached to the end of an actuator arm, which moves back and forth along the radial axis so the drive can position the head over any track on the surface. This mechanical motion is called a seek.
4. Logical disk blocks (important)
5. Connecting to I/O devices
Input/output (I/O) devices, such as graphics cards, monitors, mice, keyboards, and disks, are connected to the CPU and main memory via the I/O bus.
Although the I/O bus is slower than the system bus and memory bus, it can accommodate a wide variety of third-party I/O devices.
Locality
A well-written computer program usually has good locality. Such programs tend to reference data items that are near other recently referenced data items, or the very data items that were themselves recently referenced. This tendency is called the principle of locality.
There are two different forms of locality: Temporal locality and Spatial locality.
Programs with good locality run faster than programs with poor locality.
Memory Hierarchy
The memory hierarchy is the approach to organizing memory systems that these properties suggest, and all modern computer systems use it.
In general, the cache is a small and fast memory device that acts as a buffer area for data objects stored in larger, slower devices. The process of using cache is called caching.
The central idea of the memory hierarchy is that for each k, the faster and smaller storage devices at level k serve as caches for the larger and slower storage devices at level k+1.
This means that each layer of the hierarchy caches data objects from the lower layer.
It is important to emphasize that data cannot skip levels: data is always copied back and forth between level k and level k+1 in block-sized transfer units. For example, transfers between L1 and L0 use 1-word blocks, while transfers between L2 and L1 use 8-16-word blocks.
Some concepts: cache hits, cache misses, victim blocks, replacement policies, cold caches, compulsory misses, cold misses, placement policies, conflict misses, capacity misses.
- The register file, at the top of the hierarchy, is managed by the compiler.
- The L1, L2, and L3 caches are managed entirely by hardware logic built into the caches.
- DRAM main memory serves as a cache for data blocks stored on disk and is managed jointly by the operating system and the address translation hardware on the CPU.
- In a distributed file system such as AFS, the local disk serves as a cache, managed by the AFS client process running on the local machine.
In general, cache-based memory hierarchies work because slower storage is cheaper per byte than faster storage, and because programs tend to exhibit locality.
Cache Memory
Cache organization (S, E, B, m): cache sets, cache lines, block mapping
In general, a cache's organization can be characterized by the tuple (S, E, B, m): S sets, E lines per set, B bytes per block, and m address bits. The cache size (capacity) C is the sum of the sizes of all its blocks; tag and valid bits are not included. Thus C = S × E × B.
Direct-Mapped Caches
A cache determines whether a request hits, and then extracts the requested word, in three steps: set selection, line matching, and word extraction.
Caches are classified by E, the number of cache lines per set. A cache with exactly one line per set is called a direct-mapped cache.
1. Set selection in direct-mapped caches
2. Line matching and word selection in direct-mapped caches
3. Line replacement on misses in direct-mapped caches
If the cache misses, the requested block must be fetched from the next level of the memory hierarchy, and the new block is stored in one of the cache lines of the set indicated by the set index bits.
Replacement policy: the newly fetched line simply replaces the current line.
Set Associative Caches
Each set holds more than one cache line.
1. Set selection in set associative caches
As in a direct-mapped cache, the set index bits identify the set.
2. Line matching and word selection in set associative caches
This is more complex than in a direct-mapped cache: the tag must be compared against the tag of every line in the set (with its valid bit checked) to find a match.
3. Line replacement on misses in set associative caches
Fully Associative Caches
A fully associative cache consists of a single set (E = C/B) that contains all of the cache lines.
Basic structure:
1. Set selection in fully associative caches
There is only one set, so set selection is trivial: the address contains no set index bits, only a tag and a block offset.
2. Line matching and word selection in fully associative caches
The same as for set associative caches.
Writing Cache-Friendly Code
Programs with good locality tend to have lower miss rates, and programs with lower miss rates tend to run faster than programs with higher miss rates.
Basic approaches we use to ensure that code is cache friendly:
Make the common case go fast. Programs usually spend most of their time in a few core functions, and those functions usually spend most of their time in a few loops.
Minimize the number of cache misses in each inner loop. (Repeated references to local variables are good; stride-1 reference patterns are good.)
Data quoted from the PDF version of the textbook Computer Systems: A Programmer's Perspective.
Issue: line replacement on set associative cache misses is not yet fully understood.
Information Security System Design Foundation, Week 6 study summary