20145216 Shi Yao "Information Security System Design Fundamentals" Week 7 Study Summary — Summary of Teaching Contents: Chapter 6, The Memory Hierarchy
A memory system is a hierarchy of storage devices with different capacities, costs, and access times: CPU registers, cache memories, main memory, and disk.
Section 1: Storage Technologies
I. Random Access Memory (RAM)
1. RAM comes in two varieties:
- SRAM (static RAM): faster and more expensive; used for cache memories, both on and off the CPU chip
- DRAM (dynamic RAM): used for main memory and for the frame buffers of graphics systems
2. Non-volatile memory: ROM
(1) Varieties
- PROM: programmable ROM, can be programmed exactly once
- EPROM: erasable programmable ROM, can be erased and reprogrammed on the order of 1,000 times
- EEPROM: electrically erasable PROM, can be reprogrammed on the order of 10^5 times
(2) Flash memory
Based on EEPROM, it provides fast and durable non-volatile storage for a wide range of electronic devices.
Found in: digital cameras, mobile phones, music players, PDAs, laptops, desktops, and server computer systems
(3) Firmware
Programs stored in ROM devices are often referred to as firmware; when a computer system is powered up, it runs the firmware stored in ROM.
3. RAM loses its data when power is removed, so it is volatile; ROMs are non-volatile and are collectively called read-only memories (even though some types can in fact be written).
II. Disk Storage
1. Disk geometry
- Surface: each platter has two surfaces
- Spindle: the center of the platter, around which it rotates
- Rotational rate: typically 5,400-15,000 RPM
- Track: a concentric circle on a surface
- Sector: each track is partitioned into a set of sectors
- Data bits: each sector contains an equal number of data bits, typically 512 bytes
- Gap: gaps between sectors store formatting bits used to identify sectors
- Disk drives are also called disks or rotating disks
- Cylinder: the set of tracks on all surfaces that are equidistant from the center of the spindle
2. Disk capacity: the maximum number of bits that can be recorded on a disk
(1) Determining factors:
- Recording density: bits per inch of track
- Track density: tracks per inch of radius
- Areal density: bits per square inch (capacity is increased by increasing areal density)
(2) Modern high-capacity disks: multiple zone recording
The set of cylinders is partitioned into disjoint subsets (recording zones), each containing a contiguous collection of cylinders;
every track in each cylinder of a zone has the same number of sectors, determined by the number of sectors that the innermost track of the zone can contain.
Note: floppy disks still use the old-fashioned approach, with a constant number of sectors per track.
(3) Capacity formula:
Disk capacity = (bytes/sector) × (average sectors/track) × (tracks/surface) × (surfaces/platter) × (platters/disk)
3. Disk operation
Disks read and write data in sector-sized blocks.
The access time has three components:
(1) Seek time
The time it takes to move the arm so the read/write head is over the target track.
Depends on the previous position of the head and the speed at which the arm moves across the surface.
Typically 3-9 ms; can be as high as 20 ms.
(2) Rotational latency
The drive waits for the first bit of the target sector to rotate under the read/write head.
Depends on the platter position and the rotational rate.
Maximum rotational latency = (1/RPM) × (60 secs / 1 min)
The average rotational latency is half the maximum.
(3) Transfer time
Depends on the rotational rate and the number of sectors per track.
Average transfer time = (1/RPM) × (1 / average sectors per track) × (60 secs / 1 min)
The average time to access a disk sector is the sum of the average seek time, the average rotational latency, and the average transfer time.
From the example on page 393 of the textbook, two conclusions follow:
1. The access time is dominated by the seek time and the rotational latency.
2. Estimating the total access time as twice the average seek time is simple and reasonable, since the average rotational latency is roughly equal to the average seek time and the transfer time is negligible.
4. Logical disk blocks
The triple (surface, track, sector) uniquely identifies the corresponding physical sector.
Analogy: memory can be viewed as an array of bytes; a disk can be viewed as an array of blocks.
5. Connecting I/O devices (the I/O bus)
The I/O bus connects the CPU, main memory, and the I/O devices.
6. Accessing disks
DMA (direct memory access): the device can perform read and write bus transactions on its own, without CPU intervention.
See the chart on page 395 for the detailed process.
III. Solid State Disks (SSDs)
An SSD is a storage technology based on flash memory. It differs from rotating disks in that a solid state disk has no moving parts.
1. Composition
An SSD package consists of one or more flash memory chips plus a flash translation layer:
- Flash chips: correspond to the mechanical drive of a rotating disk
- Flash translation layer (a hardware/firmware device): corresponds to the disk controller
2. Reading and writing
(1) Sequential reads and writes
The speeds are comparable; sequential reads are slightly faster than sequential writes.
(2) Random reads and writes
Random writes are about an order of magnitude slower than random reads.
Reason: this is determined by the basic properties of the underlying flash memory.
A flash memory consists of a sequence of B blocks, each consisting of P pages. Pages are typically 512 bytes to 4 KB; a block consists of 32-128 pages, so blocks are 16 KB to 512 KB.
Data is read and written in units of pages, but a page can be written only after the entire block containing it has been erased, which is what makes random writes expensive.
Section 2: Locality
Principle of locality:
Well-written programs tend to reference data items that are near other recently referenced data items, or the recently referenced data items themselves.
Two forms:
- Temporal locality
- Spatial locality
Applications:
1. Hardware level:
Cache memories hold the most recently referenced instructions and data items, speeding up accesses to main memory.
2. Operating system level:
The system uses main memory as a cache for the most recently referenced blocks of the virtual address space, and also as a cache for the most recently used disk blocks in the disk file system.
3. Application level:
Web browsers cache the most recently referenced documents on the local disk.
I. Locality of references to program data
1. Reference patterns with stride k
Definition: visiting every k-th element of a contiguous vector is called a stride-k reference pattern.
The stride-1 reference pattern (sequentially accessing each element of a vector), sometimes called a sequential reference pattern, is a common and important source of spatial locality in programs.
In general, spatial locality decreases as the stride increases.
2. Multidimensional arrays (a two-dimensional example)

```c
int sumarrayrows(int a[M][N])
{
    int i, j, sum = 0;
    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}
```

versus

```c
int sumarraycols(int a[M][N])
{
    int i, j, sum = 0;
    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}
```

The first function scans the array in row-major order and the second in column-major order. Since C stores arrays in row-major order in memory, the first has good spatial locality while the second has poor spatial locality.
Because the loop bodies execute many times, the code also has good temporal locality.
II. Locality of instruction fetches
Program instructions are stored in memory, and the CPU must fetch (read) these instructions.
An important property that distinguishes code from program data is that code is normally not modified at run time.
III. Summary of locality
Simple principles for qualitatively evaluating the locality in a program:
- A program that references the same variables repeatedly has good temporal locality
- For a program with a stride-k reference pattern, the smaller the stride, the better the spatial locality
- Loops have good temporal and spatial locality with respect to instruction fetches; the smaller the loop body and the larger the number of loop iterations, the better the locality
Section 3: The Memory Hierarchy
Each level of the hierarchy serves as a "cache" for the level below it.
I. Caching
Cache: a small, fast storage device that acts as a staging area for the data objects stored in a larger, slower device.
Caching: the process of using a cache.
Data is always copied back and forth between level k and level k+1 in block-sized transfer units. The block size is fixed between any particular pair of adjacent levels, but other pairs of levels can have different block sizes.
In general, the lower the level, the larger the blocks.
1. Cache hits
When a program needs a data object d from level k+1, it first looks for d in one of the blocks currently stored at level k. If d happens to be cached at level k, we have a cache hit.
The program reads d directly from level k, which is faster than reading d from level k+1.
2. Cache misses
That is, data object d is not cached at level k.
The level-k cache then fetches the block containing d from the level k+1 cache; if the level-k cache is already full, an existing block may be overwritten.
Overwriting a block is called replacing or evicting it.
Replacement policies:
- Random replacement: evict a randomly chosen block
- Least recently used (LRU): evict the block whose last access lies furthest in the past
3. Kinds of cache misses
(1) Compulsory (cold) misses
The level-k cache is empty (called a cold cache), so any access to a data object misses.
These are usually transient events that disappear once repeated accesses have warmed up the cache (that is, repeated memory accesses have filled it) and it reaches a steady state.
(2) Conflict misses
Because of a restrictive placement policy that maps blocks at level k+1 to a small subset of the positions at level k, the cache may not be full, yet the particular positions the blocks map to are occupied, causing misses.
(3) Capacity misses
When the size of the working set exceeds the size of the cache, the cache experiences capacity misses; the cache is simply too small to hold the working set.
4. Cache management
Some form of logic must manage the cache; that logic can be hardware, software, or a combination of the two.
Section 4: Cache Memories
L1 cache:
Between the CPU register file and main memory; accessed in about 2-4 clock cycles
L2 cache:
Between the L1 cache and main memory; accessed in about 10 clock cycles
L3 cache:
Between the L2 cache and main memory; accessed in about 30-40 clock cycles
I. Generic cache memory organization
A cache is an array of cache sets whose organization can be described by the tuple (S, E, B, m):
- S: there are S = 2^s cache sets in the array
- E: each set contains E cache lines
- B: each line holds a data block of B = 2^b bytes
- m: each memory address has m bits, forming M = 2^m distinct addresses
In addition there are tag bits and a valid bit:
- Valid bit: each line has one valid bit indicating whether the line contains meaningful information
- Tag bits: t = m - (b + s) bits that uniquely identify the block stored in the cache line
- Set index bits: s
- Block offset bits: b
This organization partitions the m address bits into t tag bits, s set index bits, and b block offset bits.
1. Cache size/capacity C
Refers to the aggregate size of all the data blocks, excluding the tag bits and valid bits, so:
C = S × E × B
2. How it works
The parameters s and b partition the m address bits into the three fields above; then:
II. Direct-mapped caches
Caches are classified by E, the number of cache lines per set; a cache with E = 1 is called a direct-mapped cache. Taking it as the example:
The process by which the cache determines whether a request hits, and then extracts the requested word, has three steps:
(1) set selection, (2) line matching, (3) word extraction.
1. Set selection
The cache extracts the s set index bits from the middle of the address of w.
Set index bits: an unsigned integer corresponding to a set number. Analogy: the cache is a one-dimensional array of sets, and the set index bits form an index into that array.
2. Line matching
Note that there are two conditions that together are necessary and sufficient for a cache hit:
- The line has its valid bit set
- The tag in the cache line matches the tag in w's address
3. Word selection
The same kind of analogy: a block is an array of bytes, and the byte offset is an index into that array. My understanding: it can also be compared to an array subscript, or to the offset in an effective address, and so on.
4. Line replacement on misses
Replace the current line with the newly fetched line.
5. A direct-mapped cache in action
- The tag and index bits together uniquely identify each block in memory
- Blocks that map to the same cache set are uniquely distinguished by the tag bits
※ Note the sequence of actions the CPU performs for a read on pages 413-414 of the textbook:
1. Use the index bits to determine which set the access targets. 2. Check whether the line in that set is valid: (1) if it is invalid, the cache misses; the cache fetches the requested block from memory (or the next lower level), stores it in that set, sets the valid bit to 1, and returns the requested value; (2) if it is valid, compare the tags: if they match, the cache hits and returns the requested value; if not, replace the line and then return the value.
6. Conflict misses in direct-mapped caches
(1) Thrashing:
The cache repeatedly loads and evicts the same sets of cache blocks.
(2) Cause:
The blocks all map to the same cache set.
(3) Workaround:
Put B bytes of padding at the end of each array (B bytes is the length of one block; one line holds one block, so the padding effectively separates the rows) so that the arrays map to different sets.
Why index with the middle bits? See Practice Problem 6.12 on page 415 and the margin note on page 416. If the high-order bits were used as the index, contiguous memory blocks would map to the same cache set, so while a program scanned an array sequentially the cache could hold only a block-sized chunk of the array at any time.
III. Set associative caches
An E-way set associative cache has 1 < E < C/B.
1. Set selection
The same as for a direct-mapped cache.
2. Line matching and word selection
Think of each line as a (key, value) pair: the key is the tag together with the valid bit, and the value is returned when the key matches.
Important idea: any line in the set can contain any of the memory blocks that map to that set, so the cache must search every line in the set.
The criteria for a match are still the two jointly necessary and sufficient conditions:
- 1. The line is valid
- 2. The tags match
3. Line replacement
An empty line is filled first; if there is no empty line, a replacement policy is applied:
- Random replacement
- Least frequently used (LFU): replace the line that has been referenced the fewest times over some window in the past
- Least recently used (LRU): replace the line whose last access lies furthest in the past
IV. Fully associative caches (E = C/B)
1. Set selection
There is only one set, set 0 by default; there are no index bits, and the address is partitioned into just a tag and a block offset.
2. Line matching and word selection
The same as for a set associative cache.
Because the matching hardware must compare against many tags in parallel, the fully associative organization is suitable only for small caches.
V. Issues with writes
1. On a write hit, how to update the copy in the next lower level:
(1) Write-through: immediately write w's cache block to the next lower level.
Drawback: every write causes bus traffic.
(2) Write-back: write the updated block to the next lower level only when the replacement algorithm is about to evict it.
- Advantage: exploits locality and significantly reduces bus traffic
- Drawback: added complexity; the cache must maintain an extra dirty (modified) bit for each cache line
2. How to handle write misses:
(1) Write-allocate (usually paired with write-back)
Load the block from the next lower level into the cache, then update the cache block.
(2) No-write-allocate (usually paired with write-through)
Bypass the cache and write the word directly to the next lower level.
VI. Anatomy of a real cache hierarchy:
Caches hold both data and instructions.
- Holds instructions only: i-cache
- Holds program data only: d-cache
- Holds both instructions and data: unified cache
VII. Performance impact of cache parameters
1. Metrics:
- Miss rate = number of misses / number of references
- Hit rate = 1 - miss rate
- Hit time
- Miss penalty: the additional time required because of a miss
2. Effects of the parameters:
- Cache size: a larger cache raises the hit rate but also increases the hit time
- Block size: larger blocks exploit more spatial locality and can raise the hit rate, but fewer cache lines means less exploitation of temporal locality, and larger blocks increase the miss penalty
- Associativity: a larger E reduces thrashing but raises cost, hit time, miss penalty, and control-logic complexity. The trade-off: when the miss penalty is low, use low associativity; when the miss penalty is high, use high associativity
- Write strategy: the lower a cache is in the hierarchy, the more likely it is to use write-back rather than write-through
Problems encountered in studying:
What is the difference between cache lines, sets, and blocks?
Resolution:
Summarizing the relevant points of the chapter yields these conclusions:
- A block is a fixed-size packet of information that moves back and forth between a cache and main memory (or the next lower-level cache)
- A line is a container in a cache that stores a block, as well as other information such as the valid bit and the tag bits
- A set is a collection of one or more lines. A set in a direct-mapped cache consists of a single line; sets in set associative and fully associative caches consist of multiple lines
- In direct-mapped caches, sets and lines really are equivalent; in associative caches, however, sets and lines are different things, and the two terms cannot be used interchangeably
- Because a line always stores a single block, the terms "line" and "block" are often used interchangeably
Code hosting
Link: https://git.oschina.net/sjy519/linux-program-C/tree/master
Other (reflections, thoughts, etc., optional)
Studying this chapter gave me a deeper understanding of how memory works. But when reading the specific example on page 412 of the textbook I found it hard to follow, and I could not work out the related exercises either, so I searched online for explanations of the relevant points and only slowly came to understand direct-mapped caches. I can only say that I have not fully digested this material yet.
Learning progress bar

| | Lines of code (new/cumulative) | Blog posts (new/cumulative) | Study hours (new/cumulative) | Important growth |
|---|---|---|---|---|
| Goal | 3,000 lines | 30 articles | 300 hours | |
| Week 1 | 0/0 | 1/2 | 25/40 | Learned Linux basics and core commands |
| Week 2 | 0/0 | 0/2 | 0/40 | |
| Week 3 | 300/300 | 3/5 | 40/80 | Learned the vim, gcc, gdb commands; learned information representation and processing |
| Week 5 | 200/500 | 1/6 | 45/125 | Learned the machine-level representation of programs |
| Week 6 | 150/650 | 1/7 | 40/165 | Learned processor architecture |
| Week 7 | 100/750 | 1/8 | 40/205 | Learned the memory hierarchy |