20145216 Shi Yao "Information Security System Design Fundamentals" Week 7 Study Summary — Summary of Teaching Contents: Chapter 6, The Memory Hierarchy
A memory system is a hierarchy of storage devices with different capacities, costs, and access times: CPU registers, cache memories, main memory, and disk.
Section 1: Storage Technologies
I. Random Access Memory (RAM)
1. RAM comes in two varieties:
- SRAM (static RAM): faster and more expensive; used for cache memories, both on and off the CPU chip
- DRAM (dynamic RAM): used for main memory and for the frame buffers of graphics systems
2. Non-volatile memory: ROM
(1) Varieties
- PROM: programmable ROM, can be programmed exactly once
- EPROM: erasable programmable ROM, can be erased and reprogrammed on the order of 1,000 times
- EEPROM: electrically erasable PROM, can be reprogrammed on the order of 10^5 times
(2) Flash memory
Based on EEPROM, it provides fast and durable non-volatile storage for a wide range of electronic devices.
Found in: digital cameras, mobile phones, music players, PDAs, laptops, desktops, and server computer systems
(3) Firmware
Programs stored in ROM devices are often referred to as firmware; when a computer system is powered up, it runs the firmware stored in ROM.
3. RAM loses its data when power is removed, so it is volatile; ROMs are non-volatile and are collectively called read-only memories (even though some types can in fact be written).
II. Disk Storage
1. Disk geometry
- Surface: each platter has two surfaces
- Spindle: the center of the platter, around which it rotates
- Rotational rate: typically 5,400-15,000 RPM
- Track: a concentric circle on a surface
- Sector: each track is partitioned into a set of sectors
- Data bits: each sector contains an equal number of data bits, typically 512 bytes
- Gap: gaps between sectors store formatting bits used to identify sectors
- Disk drives are also called disks or rotating disks
- Cylinder: the set of tracks on all surfaces that are equidistant from the center of the spindle
2. Disk capacity: the maximum number of bits that can be recorded on a disk
(1) Determining factors:
- Recording density: bits per inch of track
- Track density: tracks per inch of radius
- Areal density: bits per square inch (capacity is increased by increasing areal density)
(2) Modern high-capacity disks: multiple zone recording
The set of cylinders is partitioned into disjoint subsets (recording zones), each containing a contiguous collection of cylinders;
every track in each cylinder of a zone has the same number of sectors, determined by the number of sectors that the innermost track of the zone can contain.
Note: floppy disks still use the old-fashioned approach, with a constant number of sectors per track.
(3) Capacity formula:
Disk capacity = (bytes/sector) × (average sectors/track) × (tracks/surface) × (surfaces/platter) × (platters/disk)
3. Disk operation
Disks read and write data in sector-sized blocks.
The access time has three components:
(1) Seek time
The time it takes to move the arm so the read/write head is over the target track.
Depends on the previous position of the head and the speed at which the arm moves across the surface.
Typically 3-9 ms; can be as high as 20 ms.
(2) Rotational latency
The drive waits for the first bit of the target sector to rotate under the read/write head.
Depends on the platter position and the rotational rate.
Maximum rotational latency = (1/RPM) × (60 secs / 1 min)
The average rotational latency is half the maximum.
(3) Transfer time
Depends on the rotational rate and the number of sectors per track.
Average transfer time = (1/RPM) × (1 / average sectors per track) × (60 secs / 1 min)
The average time to access a disk sector is the sum of the average seek time, the average rotational latency, and the average transfer time.
From the example on page 393 of the textbook, two conclusions follow:
1. The access time is dominated by the seek time and the rotational latency.
2. Estimating the total access time as twice the average seek time is simple and reasonable, since the average rotational latency is roughly equal to the average seek time and the transfer time is negligible.
4. Logical disk blocks
The triple (surface, track, sector) uniquely identifies the corresponding physical sector.
Analogy: memory can be viewed as an array of bytes; a disk can be viewed as an array of blocks.
5. Connecting I/O devices (the I/O bus)
The I/O bus connects the CPU, main memory, and the I/O devices.
6. Accessing disks
DMA (direct memory access): the device can perform read and write bus transactions on its own, without CPU intervention.
See the chart on page 395 for the detailed process.
III. Solid State Disks (SSDs)
An SSD is a storage technology based on flash memory. It differs from rotating disks in that a solid state disk has no moving parts.
1. Composition
An SSD package consists of one or more flash memory chips plus a flash translation layer:
- Flash chips: correspond to the mechanical drive of a rotating disk
- Flash translation layer (a hardware/firmware device): corresponds to the disk controller
2. Reading and writing
(1) Sequential reads and writes
The speeds are comparable; sequential reads are slightly faster than sequential writes.
(2) Random reads and writes
Random writes are about an order of magnitude slower than random reads.
Reason: this is determined by the basic properties of the underlying flash memory.
A flash memory consists of a sequence of B blocks, each consisting of P pages. Pages are typically 512 bytes to 4 KB; a block consists of 32-128 pages, so blocks are 16 KB to 512 KB.
Data is read and written in units of pages, but a page can be written only after the entire block containing it has been erased, which is what makes random writes expensive.
Section 2: Locality
Principle of locality:
Well-written programs tend to reference data items that are near other recently referenced data items, or the recently referenced data items themselves.
Two forms:
- Temporal locality
- Spatial locality
Applications:
1. Hardware level:
Cache memories hold the most recently referenced instructions and data items, speeding up accesses to main memory.
2. Operating system level:
The system uses main memory as a cache for the most recently referenced blocks of the virtual address space, and also as a cache for the most recently used disk blocks in the disk file system.
3. Application level:
Web browsers cache the most recently referenced documents on the local disk.
I. Locality of references to program data
1. Reference patterns with stride k
Definition: visiting every k-th element of a contiguous vector is called a stride-k reference pattern.
The stride-1 reference pattern (sequentially accessing each element of a vector), sometimes called a sequential reference pattern, is a common and important source of spatial locality in programs.
In general, spatial locality decreases as the stride increases.
2. Multidimensional arrays (a two-dimensional example)

```c
int sumarrayrows(int a[M][N])
{
    int i, j, sum = 0;
    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}
```

versus

```c
int sumarraycols(int a[M][N])
{
    int i, j, sum = 0;
    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}
```

The first function scans the array in row-major order and the second in column-major order. Since C stores arrays in row-major order in memory, the first has good spatial locality while the second has poor spatial locality.
Because the loop bodies execute many times, the code also has good temporal locality.
II. Locality of instruction fetches
Program instructions are stored in memory, and the CPU must fetch (read) these instructions.
An important property that distinguishes code from program data is that code is normally not modified at run time.
III. Summary of locality
Simple principles for qualitatively evaluating the locality in a program:
- A program that references the same variables repeatedly has good temporal locality
- For a program with a stride-k reference pattern, the smaller the stride, the better the spatial locality
- Loops have good temporal and spatial locality with respect to instruction fetches; the smaller the loop body and the larger the number of loop iterations, the better the locality
Section 3: The Memory Hierarchy
Each level of the hierarchy serves as a "cache" for the level below it.
I. Caching
Cache: a small, fast storage device that acts as a staging area for the data objects stored in a larger, slower device.
Caching: the process of using a cache.
Data is always copied back and forth between level k and level k+1 in block-sized transfer units. The block size is fixed between any particular pair of adjacent levels, but other pairs of levels can have different block sizes.
In general, the lower the level, the larger the blocks.
1. Cache hits
When a program needs a data object d from level k+1, it first looks for d in one of the blocks currently stored at level k. If d happens to be cached at level k, we have a cache hit.
The program reads d directly from level k, which is faster than reading d from level k+1.
2. Cache misses
That is, data object d is not cached at level k.
The level-k cache then fetches the block containing d from the level k+1 cache; if the level-k cache is already full, an existing block may be overwritten.
Overwriting a block is called replacing or evicting it.
Replacement policies:
- Random replacement: evict a randomly chosen block
- Least recently used (LRU): evict the block whose last access lies furthest in the past
3. Kinds of cache misses
(1) Compulsory (cold) misses
The level-k cache is empty (called a cold cache), so any access to a data object misses.
These are usually transient events that disappear once repeated accesses have warmed up the cache (that is, repeated memory accesses have filled it) and it reaches a steady state.
(2) Conflict misses
Because of a restrictive placement policy that maps blocks at level k+1 to a small subset of the positions at level k, the cache may not be full, yet the particular positions the blocks map to are occupied, causing misses.
(3) Capacity misses
When the size of the working set exceeds the size of the cache, the cache experiences capacity misses; the cache is simply too small to hold the working set.
4. Cache management
Some form of logic must manage the cache; that logic can be hardware, software, or a combination of the two.
Section 4: Cache Memories
L1 cache:
Between the CPU register file and main memory; accessed in about 2-4 clock cycles
L2 cache:
Between the L1 cache and main memory; accessed in about 10 clock cycles
L3 cache:
Between the L2 cache and main memory; accessed in about 30-40 clock cycles
I. Generic cache memory organization
A cache is an array of cache sets whose organization can be described by the tuple (S, E, B, m):
- S: there are S = 2^s cache sets in the array
- E: each set contains E cache lines
- B: each line holds a data block of B = 2^b bytes
- m: each memory address has m bits, forming M = 2^m distinct addresses
In addition there are tag bits and a valid bit:
- Valid bit: each line has one valid bit indicating whether the line contains meaningful information
- Tag bits: t = m - (b + s) bits that uniquely identify the block stored in the cache line
- Set index bits: s
- Block offset bits: b
This organization partitions the m address bits into t tag bits, s set index bits, and b block offset bits.
1. Cache size/capacity C
Refers to the aggregate size of all the data blocks, excluding the tag bits and valid bits, so:
C = S × E × B
2. How it works
The parameters s and b partition the m address bits into the three fields above; then:
II. Direct-mapped caches
Caches are classified by E, the number of cache lines per set; a cache with E = 1 is called a direct-mapped cache. Taking it as the example:
The process by which the cache determines whether a request hits, and then extracts the requested word, has three steps:
(1) set selection, (2) line matching, (3) word extraction.
1. Set selection
The cache extracts the s set index bits from the middle of the address of w.
Set index bits: an unsigned integer corresponding to a set number. Analogy: the cache is a one-dimensional array of sets, and the set index bits form an index into that array.
2. Line matching
Note that there are two conditions that together are necessary and sufficient for a cache hit:
- The line has its valid bit set
- The tag in the cache line matches the tag in w's address
3. Word selection
The same kind of analogy: a block is an array of bytes, and the byte offset is an index into that array. My understanding: it can also be compared to an array subscript, or to the offset in an effective address, and so on.
4. Line replacement on misses
Replace the current line with the newly fetched line.
5. A direct-mapped cache in action
- The tag and index bits together uniquely identify each block in memory
- Blocks that map to the same cache set are uniquely distinguished by the tag bits
※ Note the sequence of actions the CPU performs for a read on pages 413-414 of the textbook:
1. Use the index bits to determine which set the access targets. 2. Check whether the line in that set is valid: (1) if it is invalid, the cache misses; the cache fetches the requested block from memory (or the next lower level), stores it in that set, sets the valid bit to 1, and returns the requested value; (2) if it is valid, compare the tags: if they match, the cache hits and returns the requested value; if not, replace the line and then return the value.
6. Conflict misses in direct-mapped caches
(1) Thrashing:
The cache repeatedly loads and evicts the same sets of cache blocks.
(2) Cause:
The blocks all map to the same cache set.
(3) Workaround:
Put B bytes of padding at the end of each array (B bytes is the length of one block; one line holds one block, so the padding effectively separates the rows) so that the arrays map to different sets.
Why index with the middle bits? See Practice Problem 6.12 on page 415 and the margin note on page 416. If the high-order bits were used as the index, contiguous memory blocks would map to the same cache set, so while a program scanned an array sequentially the cache could hold only a block-sized chunk of the array at any time.
III. Set associative caches
An E-way set associative cache has 1 < E < C/B.
1. Set selection
The same as for a direct-mapped cache.
2. Line matching and word selection
Think of each line as a (key, value) pair: the key is the tag together with the valid bit, and the value is returned when the key matches.
Important idea: any line in the set can contain any of the memory blocks that map to that set, so the cache must search every line in the set.
The criteria for a match are still the two jointly necessary and sufficient conditions:
- 1. The line is valid
- 2. The tags match
3. Line replacement
An empty line is filled first; if there is no empty line, a replacement policy is applied:
- Random replacement
- Least frequently used (LFU): replace the line that has been referenced the fewest times over some window in the past
- Least recently used (LRU): replace the line whose last access lies furthest in the past
IV. Fully associative caches (E = C/B)
1. Set selection
There is only one set, set 0 by default; there are no index bits, and the address is partitioned into just a tag and a block offset.
2. Line matching and word selection
The same as for a set associative cache.
Because the matching hardware must compare against many tags in parallel, the fully associative organization is suitable only for small caches.
V. Issues with writes
1. On a write hit, how to update the copy in the next lower level:
(1) Write-through: immediately write w's cache block to the next lower level.
Drawback: every write causes bus traffic.
(2) Write-back: write the updated block to the next lower level only when the replacement algorithm is about to evict it.
- Advantage: exploits locality and significantly reduces bus traffic
- Drawback: added complexity; the cache must maintain an extra dirty (modified) bit for each cache line
2. How to handle write misses:
(1) Write-allocate (usually paired with write-back)
Load the block from the next lower level into the cache, then update the cache block.
(2) No-write-allocate (usually paired with write-through)
Bypass the cache and write the word directly to the next lower level.
VI. Anatomy of a real cache hierarchy:
Caches hold both data and instructions.
- Holds instructions only: i-cache
- Holds program data only: d-cache
- Holds both instructions and data: unified cache
VII. Performance impact of cache parameters
1. Metrics:
- Miss rate = number of misses / number of references
- Hit rate = 1 - miss rate
- Hit time
- Miss penalty: the additional time required because of a miss
2. Effects of the parameters:
- Cache size: a larger cache raises the hit rate but also increases the hit time
- Block size: larger blocks exploit more spatial locality and can raise the hit rate, but fewer cache lines means less exploitation of temporal locality, and larger blocks increase the miss penalty
- Associativity: a larger E reduces thrashing but raises cost, hit time, miss penalty, and control-logic complexity. The trade-off: when the miss penalty is low, use low associativity; when the miss penalty is high, use high associativity
- Write strategy: the lower a cache is in the hierarchy, the more likely it is to use write-back rather than write-through
Problems encountered in studying:
What is the difference between cache lines, sets, and blocks?
Resolution:
Summarizing the relevant points of the chapter yields these conclusions:
- A block is a fixed-size packet of information that moves back and forth between a cache and main memory (or the next lower-level cache)
- A line is a container in a cache that stores a block, as well as other information such as the valid bit and the tag bits
- A set is a collection of one or more lines. A set in a direct-mapped cache consists of a single line; sets in set associative and fully associative caches consist of multiple lines
- In direct-mapped caches, sets and lines really are equivalent; in associative caches, however, sets and lines are different things, and the two terms cannot be used interchangeably
- Because a line always stores a single block, the terms "line" and "block" are often used interchangeably
Code hosting
Link: https://git.oschina.net/sjy519/linux-program-C/tree/master
Other (reflections, thoughts, etc., optional)
Studying this chapter gave me a deeper understanding of how memory works. But when reading the specific example on page 412 of the textbook I found it hard to follow, and I could not work out the related exercises either, so I searched online for explanations of the relevant points and only slowly came to understand direct-mapped caches. I can only say that I have not fully digested this material yet.
Learning progress bar

| | Lines of code (new/cumulative) | Blog posts (new/cumulative) | Study hours (new/cumulative) | Important growth |
|---|---|---|---|---|
| Goal | 3,000 lines | 30 articles | 300 hours | |
| Week 1 | 0/0 | 1/2 | 25/40 | Learned Linux basics and core commands |
| Week 2 | 0/0 | 0/2 | 0/40 | |
| Week 3 | 300/300 | 3/5 | 40/80 | Learned the vim, gcc, gdb commands; learned information representation and processing |
| Week 5 | 200/500 | 1/6 | 45/125 | Learned the machine-level representation of programs |
| Week 6 | 150/650 | 1/7 | 40/165 | Learned processor architecture |
| Week 7 | 100/750 | 1/8 | 40/205 | Learned the memory hierarchy |