20145225 "Information Security system Design Fundamentals" 7th Week Study Summary

Last Update:2016-10-31 Source: Internet

Author: User


Chapter 6: Memory Hierarchy

A memory system is a hierarchical structure of storage devices with different capacities, costs, and access times.

From fastest to slowest: CPU registers, cache memory, main memory, disk.

Section 1: Storage Technologies

I. Random Access Memory (RAM)

RAM classification:

Static RAM (SRAM): faster and more expensive; used for cache memories, both on and off the CPU chip
Dynamic RAM (DRAM): used for main memory and for the frame buffer of a graphics system

1. Conventional DRAM

(1) Supercells

The cells (bits) in a DRAM chip are partitioned into d supercells, each consisting of w DRAM cells, so a d x w DRAM stores dw bits in total.
The supercells are organized as a rectangular array with r rows and c columns, where rc = d.
Each supercell has an address of the form (i, j), where i denotes the row and j denotes the column.

(2) Information flow

Information flows into and out of the chip through pins, each of which carries a 1-bit signal.

(3) Memory controller

This circuit can transfer w bits at a time to and from each DRAM chip.

RAS (row access strobe) request: carries row address i
CAS (column access strobe) request: carries column address j
RAS and CAS requests share the same DRAM address pins.

2. Memory Modules

DRAM chips are packaged in memory modules that plug into expansion slots on the motherboard.

168-pin dual inline memory module (DIMM): transfers data to and from the memory controller in 64-bit chunks.
72-pin single inline memory module (SIMM): transfers data in 32-bit chunks.

Reading the contents of a memory module:

Main memory can be aggregated by connecting multiple memory modules to the memory controller. When the controller receives an address A, it selects the module k that contains A, converts A to its (i, j) form, and sends (i, j) to module k.

Exercise 6.1 shows that the array layout should be as close to square as possible, which minimizes the number of address pins. Note that the organization is written d x w, and that rc = d.
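The reasoning behind the near-square layout can be sketched in a few lines. Because RAS and CAS requests share the address pins, the wider of the row and column addresses determines how many pins are needed. The function names below are hypothetical, and d is assumed to be a power of two.

```python
def best_layout(d):
    """Return (r, c) with r * c == d, both powers of two, as close to square as possible."""
    assert d > 0 and d & (d - 1) == 0, "this sketch assumes d is a power of two"
    total_bits = d.bit_length() - 1      # log2(d) address bits in total
    row_bits = total_bits // 2           # split the bits as evenly as possible
    col_bits = total_bits - row_bits
    return 1 << row_bits, 1 << col_bits


def address_pins(r, c):
    """RAS and CAS share the pins, so the wider of the two addresses sets the pin count."""
    return max((r - 1).bit_length(), (c - 1).bit_length())


r, c = best_layout(16)
print(r, c, address_pins(r, c))          # square layout for 16 supercells
print(address_pins(1, 16))               # a single-row layout needs more pins
```

For 16 supercells the square 4 x 4 layout needs 2 address pins, while a 1 x 16 layout would need 4.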

3. Enhanced DRAMs

Fast page mode DRAM (FPM DRAM): allows consecutive accesses to the same row to be served directly from the row buffer. (A conventional DRAM handling four reads from the same row discards the row after each read and then re-reads it.)
Extended data out DRAM (EDO DRAM): allows the individual CAS signals to be spaced closer together in time.
Synchronous DRAM (SDRAM): replaces many of the control signals with the rising edges of the same external clock signal that drives the memory controller; faster than its asynchronous predecessors.
Double data-rate synchronous DRAM (DDR SDRAM): doubles the speed of the DRAM by using both clock edges as control signals. Variants are distinguished by the size of the prefetch buffer: DDR (2 bits), DDR2 (4 bits), DDR3 (8 bits).
Rambus DRAM (RDRAM)
Video RAM (VRAM): used in the frame buffers of graphics systems. It is similar in spirit to FPM DRAM, with two differences:
1. VRAM output is produced by shifting the entire contents of the internal buffer in sequence.
2. VRAM allows concurrent reads and writes to the memory.
4. Nonvolatile Memory (ROM)

RAM loses its data when the power is turned off; it is volatile.

Nonvolatile memories retain their data when powered off. They are collectively referred to as read-only memories (ROMs), even though some types can be written as well as read.

(1) Classification

PROM: programmable ROM; can be programmed exactly once
EPROM: erasable programmable ROM; can be erased and reprogrammed on the order of 1,000 times
EEPROM: electrically erasable PROM; can be reprogrammed on the order of 10^5 times

(2) Flash memory

Based on EEPROM; provides fast and durable nonvolatile storage for a wide range of electronic devices.

Found in: digital cameras, mobile phones, music players, PDAs, and laptop, desktop, and server computer systems

(3) Firmware

Programs stored in ROM devices are often referred to as firmware. When a computer system is powered on, it runs the firmware stored in ROM.

5. Accessing Main Memory

(1) Buses

A bus is a collection of parallel wires that carry address, data, and control signals.

Bus classification:

A. System bus: connects the CPU to the I/O bridge

B. Memory bus: connects the I/O bridge to main memory

C. I/O bus (see 6.1.2.4 for details)

The I/O bridge translates the electrical signals of the system bus into the electrical signals of the memory bus, and also connects the system bus and memory bus to the I/O bus.

II. Disk Storage

1. Disk Geometry

Platter
Surface: each platter has two surfaces
Spindle: at the center of the platter; spins the platter
Rotational rate: typically 5,400 to 15,000 revolutions per minute (RPM)
Tracks: concentric rings on each surface
Sectors: each track is partitioned into a collection of sectors
Data bits: each sector holds an equal number of data bits, typically 512 bytes
Gaps: store the formatting bits that identify sectors
Disk drive: often called simply a disk, or a rotating disk
Cylinder: the collection of tracks on all surfaces that are equidistant from the center of the spindle

2. Disk Capacity: the maximum number of bits that can be recorded on a disk

(1) Determining factors:

Recording density: bits per inch of track
Track density: tracks per inch
Areal density: bits per square inch (the product of recording density and track density)

Increasing the areal density increases capacity.

(2) Modern high-capacity disks: multiple zone recording

The set of cylinders is partitioned into disjoint subsets (recording zones), each consisting of a contiguous collection of cylinders.

Each track in each cylinder of a zone has the same number of sectors, which is determined by the number of sectors that fit on the innermost track of the zone.

Note: floppy disks still use the old-fashioned approach, with a constant number of sectors per track.

(3) Calculation formula:
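The formula itself did not survive in these notes. The usual form multiplies bytes per sector, average sectors per track, tracks per surface, surfaces per platter, and platters per disk. The sketch below assumes that form; the function name and the sample figures are illustrative and are not taken from this summary.

```python
def disk_capacity_bytes(bytes_per_sector, avg_sectors_per_track,
                        tracks_per_surface, surfaces_per_platter, platters):
    """Capacity as the product of the five geometry factors."""
    return (bytes_per_sector * avg_sectors_per_track * tracks_per_surface
            * surfaces_per_platter * platters)


# Illustrative disk: 5 platters, 2 surfaces each, 20,000 tracks per surface,
# 300 sectors per track on average, 512 bytes per sector.
capacity = disk_capacity_bytes(512, 300, 20_000, 2, 5)
print(capacity, "bytes =", capacity / 10**9, "GB")
```

Disk vendors quote capacity in units of 10^9 bytes, so this sample disk holds 30.72 GB.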

3. Disk Operation

Disks read and write data in sector-size blocks.

Access time has three components:

(1) Seek time

The time required to move the arm so that the head is positioned over the target track.

Depends on the previous position of the read/write head and the speed at which the arm moves across the surface.

Typically 3 to 9 ms; a single seek can take as long as 20 ms.

(2) Rotational latency

The time the drive waits for the first bit of the target sector to pass under the read/write head.

Depends on the position of the surface when the head arrives and on the rotational speed of the disk.

Maximum rotational latency = (1/RPM) x (60 s/1 min)

The average rotational latency is half the maximum.

(3) Transfer time

Depends on the rotational speed and the number of sectors per track.

Average transfer time = (1/RPM) x (1/(average sectors per track)) x (60 s/1 min)

The average time to access the contents of a disk sector is the sum of the average seek time, the average rotational latency, and the average transfer time.
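The three formulas above can be combined in a short calculation. This is a minimal sketch; the function name and the sample drive (7,200 RPM, 9 ms average seek, 400 sectors per track on average) are illustrative assumptions.

```python
def avg_access_time_ms(rpm, avg_seek_ms, avg_sectors_per_track):
    """Return (total, rotational latency, transfer time), all in milliseconds."""
    ms_per_rotation = 60_000 / rpm                      # (1/RPM) x (60 s/1 min), in ms
    avg_rotation_ms = ms_per_rotation / 2               # half of the maximum latency
    avg_transfer_ms = ms_per_rotation / avg_sectors_per_track
    total_ms = avg_seek_ms + avg_rotation_ms + avg_transfer_ms
    return total_ms, avg_rotation_ms, avg_transfer_ms


total, rotation, transfer = avg_access_time_ms(7200, 9, 400)
print(f"rotation {rotation:.3f} ms, transfer {transfer:.3f} ms, total {total:.3f} ms")
```

With these sample figures the transfer time is about 0.02 ms, negligible next to the seek time and rotational latency.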

Based on the example on page 393 of the textbook, we can conclude:

1. Access time is dominated by the seek time and the rotational latency.

2. Doubling the seek time is a simple and reasonable way to estimate disk access time.

3. Logical Disk Blocks

A (surface, track, sector) triple uniquely identifies the corresponding physical sector.

Analogy: memory can be viewed as an array of bytes, and a disk can be viewed as an array of blocks.

4. Connecting I/O Devices (I/O bus)

The I/O bus connects the CPU, main memory, and I/O devices.

Universal Serial Bus (USB) controller: USB 2.0 has a maximum bandwidth of 60 MB/s; USB 3.0 has a maximum bandwidth of 600 MB/s
Graphics card (adapter)
Host bus adapter

5. Accessing Disks

DMA: direct memory access

The device performs a read or write bus transaction on its own, without any involvement of the CPU.

See the figure on page 395 for the detailed process.

III. Solid State Disks

A solid state disk (SSD) is a flash-based storage technology. Unlike a rotating disk, an SSD has no moving parts.

1. Composition

An SSD package consists of one or more flash memory chips and a flash translation layer:

Flash memory chips: replace the mechanical drive of a rotating disk
Flash translation layer (a hardware/firmware device): plays the same role as a disk controller

2. Reads and Writes

(1) Sequential reads and writes

Speeds are comparable; sequential reads are slightly faster than sequential writes.

(2) Random reads and writes

Writes are an order of magnitude slower than reads.

Reason: this is determined by the basic properties of the underlying flash memory.

A flash memory consists of a sequence of B blocks, each of which consists of P pages. Pages are typically 512 bytes to 4 KB, a block consists of 32 to 128 pages, and the total block size ranges from 16 KB to 512 KB.

Data is read and written in units of pages. A page can be written only after the entire block it belongs to has been erased, which is what makes random writes slow.

3. Advantages

Built from semiconductor memory, with no moving parts, so:
Random access times are much faster than for rotating disks
Lower power consumption
More rugged

4. Disadvantages

Flash blocks wear out after repeated writes
More expensive per byte

IV. Storage Technology Trends

Different storage technologies have different price and performance trade-offs.
The price and performance properties of different storage technologies are changing at dramatically different rates.
It is easier to increase density (and thereby reduce cost) than to decrease access time.
DRAM and disk performance lag behind CPU performance.

Section 2: Locality

Principle of locality:

Well-written computer programs tend to reference data items that are near other recently referenced data items, or that were themselves recently referenced.

Classification:

Temporal locality
Spatial locality

Applications:

1. Hardware level:

Cache memories hold the most recently referenced instructions and data items, which speeds up accesses to main memory.

2. Operating system level:

The system uses main memory as a cache of the most recently referenced chunks of the virtual address space, and also as a cache of the most recently used disk blocks in the disk file system.

3. Application level:

Web browsers keep recently referenced documents on the local disk.

I. Locality of References to Program Data

1. Stride-k reference pattern

Definition: visiting every k-th element of a contiguous vector is called a stride-k reference pattern.

Stride-1 reference pattern: visiting each element of a vector sequentially, sometimes called a sequential reference pattern. It is a common and important source of spatial locality in programs.

In general, spatial locality decreases as the stride increases.

Consider two versions of a function that sums a two-dimensional array: one scans the array in row-major order and the other in column-major order. C arrays are laid out in memory in row-major order, so the first version has good spatial locality and the second has poor spatial locality.

Because the loop body is executed many times, it also enjoys good temporal locality.
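The difference between the two scan orders shows up in the byte offsets they touch. The sketch below uses Python to model the C layout rule that a[i][j] lives at offset (i * cols + j) * elem_size; the function names and the 2 x 3 array of 4-byte elements are illustrative assumptions.

```python
def byte_offsets(rows, cols, elem_size, row_major_scan):
    """Offsets from the array base touched while scanning a rows x cols C array."""
    if row_major_scan:
        order = [(i, j) for i in range(rows) for j in range(cols)]
    else:
        order = [(i, j) for j in range(cols) for i in range(rows)]
    # C stores a[i][j] at base + (i * cols + j) * elem_size.
    return [(i * cols + j) * elem_size for i, j in order]


def strides(offsets):
    """Distance in bytes between consecutive references."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


row_scan = byte_offsets(2, 3, 4, row_major_scan=True)
col_scan = byte_offsets(2, 3, 4, row_major_scan=False)
print(row_scan, strides(row_scan))   # constant stride of one element
print(col_scan, strides(col_scan))   # jumps of a whole row at a time
```

The row-major scan advances by exactly one element per reference (stride 1), while the column-major scan jumps by a full row between references, so it touches a new cache block far more often on large arrays.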

II. Locality of Instruction Fetches

Program instructions are stored in memory and must be fetched (read) by the CPU.

An important property of code that distinguishes it from program data is that it is rarely modified at run time.

III. Summary of Locality

Simple rules for qualitatively evaluating the locality in a program:

Programs that repeatedly reference the same variables enjoy good temporal locality.
For programs with stride-k reference patterns, the smaller the stride, the better the spatial locality.
Loops have good temporal and spatial locality with respect to instruction fetches. The smaller the loop body and the greater the number of loop iterations, the better the locality.

Section 3: The Memory Hierarchy

Each level of storage in the hierarchy serves as a cache for the next lower level.

I. Caching

Cache: a small, fast storage device that acts as a staging area for the data objects stored in a larger, slower device.

Caching: the process of using a cache.

Data is always copied back and forth between level k and level k+1 in block-size transfer units. The block size is fixed between any particular pair of adjacent levels, but other pairs of levels can have different block sizes.

In general, the lower the level (the farther from the CPU), the larger the block.

1. Cache Hits

When a program needs a data object d from level k+1, it first looks for d in one of the blocks currently stored at level k. If d happens to be cached at level k, we have a cache hit.

The program reads d directly from level k, which is faster than reading it from level k+1.

2. Cache Misses

A cache miss occurs when the data object d is not cached at level k.

The level k cache then fetches the block containing d from level k+1, possibly overwriting an existing block if the level k cache is already full.

Overwriting an existing block is called replacing or evicting the block.

Replacement policies:

Random replacement policy: choose a random victim block
Least recently used (LRU) replacement policy: choose the block whose last access lies furthest in the past
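The hit, miss, and eviction behavior described above can be sketched with a small LRU cache. The class and its two-block capacity are hypothetical; `OrderedDict` from the Python standard library keeps the blocks in order of last use.

```python
from collections import OrderedDict


class LRUCache:
    """Toy level-k cache that holds a fixed number of blocks from level k+1."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()          # block number -> data, least recent first

    def access(self, block):
        """Return True on a hit; on a miss, fetch the block and return False."""
        if block in self.blocks:
            self.blocks.move_to_end(block)   # mark as most recently used
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        self.blocks[block] = f"data for block {block}"
        return False


cache = LRUCache(capacity=2)
results = [cache.access(b) for b in [1, 2, 1, 3, 2]]
print(results)                 # True marks a hit
print(list(cache.blocks))      # blocks still cached, least recently used first
```

In this trace the access to block 3 evicts block 2 (block 1 was used more recently), so the final access to block 2 misses again.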

3. Kinds of Cache Misses

(1) Compulsory misses (cold misses)

If the level k cache is empty (a cold cache), any access to a data object misses.

Cold misses are usually transient events; they do not occur in steady state, after repeated memory accesses have warmed up the cache (that is, filled it with blocks).

(2) Conflict misses

A restrictive placement policy confines each block from level k+1 to a small subset of the blocks at level k. A conflict miss occurs when the cache is not full, but the referenced blocks map to the same subset and keep evicting one another.

(3) Capacity misses

When the size of the working set exceeds the size of the cache, the cache experiences capacity misses; the cache is simply too small to hold the working set.

4. Cache Management

Some form of logic must manage the cache. That logic can be hardware, software, or a combination of the two.

II. Summary of Memory Hierarchy Concepts

Section 4: Cache Memories

L1 cache:

Sits between the CPU register file and main memory; access time of 2 to 4 clock cycles

L2 cache:

Sits between the L1 cache and main memory; access time of about 10 clock cycles

L3 cache:

Sits between the L2 cache and main memory; access time of about 30 to 40 clock cycles

I. Generic Cache Memory Organization

A cache is an array of cache sets whose organization can be described by the tuple (S, E, B, m):

S: the array contains S = 2^s cache sets
E: each set contains E cache lines
B: each line holds a data block of B = 2^b bytes
m: each memory address has m bits, which form M = 2^m unique addresses

Each line also carries a valid bit and tag bits:

Valid bit: one per line; indicates whether the line contains meaningful information
Tag bits: t = m - (b + s) bits that uniquely identify the block stored in the cache line
Set index bits: s
Block offset bits: b

The cache organization partitions the m address bits into t tag bits, s set index bits, and b block offset bits.

1. Cache Size (Capacity) C

The aggregate size of all the blocks, not counting the tag bits and valid bits:

C = S x E x B

2. How It Works

The parameters S and B partition the m address bits into three fields. Then:

The s set index bits identify the set in which the word must be stored.
The t tag bits identify which line in that set, if any, contains the word (a line contains the word if and only if its valid bit is set and its tag bits match the tag bits in the address).
The b block offset bits give the offset of the word within the B-byte data block.

Exercise 6.10: just remember the quantitative relationships among these parameters.
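The relationships among the parameters can be checked with a short calculation. This is a minimal sketch with hypothetical function names; it assumes S and B are powers of two, and the sample values are illustrative.

```python
def cache_params(m, C, B, E):
    """Derive (S, t, s, b) from the address width, capacity, block size, and lines per set."""
    S = C // (B * E)              # from C = S x E x B
    s = S.bit_length() - 1        # log2(S), assuming S is a power of two
    b = B.bit_length() - 1        # log2(B), assuming B is a power of two
    t = m - (s + b)
    return S, t, s, b


def split_address(addr, s, b):
    """Split an address into its (tag, set index, block offset) fields."""
    offset = addr & ((1 << b) - 1)
    set_index = (addr >> b) & ((1 << s) - 1)
    tag = addr >> (s + b)
    return tag, set_index, offset


print(cache_params(m=32, C=1024, B=4, E=1))    # (S, t, s, b)
print(split_address(0b1101, s=2, b=1))         # (tag, set index, block offset)
```

For a 1,024-byte direct-mapped cache with 4-byte blocks and 32-bit addresses this gives S = 256, t = 22, s = 8, b = 2.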

II. Direct-Mapped Caches

Caches are grouped into classes according to E, the number of cache lines per set. A cache with E = 1 is called a direct-mapped cache. Taking it as an example:

The process by which a cache determines whether a request is a hit and then extracts the requested word consists of three steps:

1. Set selection
2. Line matching
3. Word extraction

1. Set selection

The cache extracts the s set index bits from the middle of the address of w.

Set index bits: interpreted as an unsigned integer that corresponds to a set number.

Analogy: view the cache as an array of sets; the set index bits are an index into this array.

2. Line matching

A cache hit requires two conditions, which are both necessary and together sufficient:

The valid bit of the line is set
The tag in the cache line matches the tag in the address of w

3. Word selection

The same analogy applies: view the block as an array of bytes; the byte offset is an index into this array.

In my understanding, this can also be compared to an array subscript, or to the offset in an effective address.

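4. Line replacement on a miss

Replace the current line with the newly fetched line ...

5. A direct-mapped cache in action

The tag and index bits together uniquely identify each block in memory.
Blocks that map to the same cache set are uniquely identified by the tag.

※ Note the sequence of actions the cache takes as the CPU performs a series of reads (textbook pages 413-414):

1. Use the set index bits to determine which set is targeted.
2. Check whether the line in that set is valid:
(1) If it is invalid, the access is a cache miss. The cache fetches the requested block from memory or the next lower level, stores it in that set, sets the valid bit to 1, and returns the requested value.
(2) If it is valid, compare the tags:
    - If the tags match, the access is a cache hit; return the requested value.
    - If they do not match, replace the line, then return the requested value.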
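The steps above can be sketched as a tiny simulator. The class is hypothetical and tracks only valid bits and tags, not the data itself. The parameters (4 sets, 2-byte blocks, 4-bit addresses) and the read sequence are illustrative choices.

```python
class DirectMappedCache:
    """Direct-mapped cache (E = 1) with 2**s sets and 2**b-byte blocks."""

    def __init__(self, s, b):
        self.s, self.b = s, b
        self.valid = [False] * (1 << s)
        self.tags = [0] * (1 << s)

    def read(self, addr):
        # Step 1: set selection from the middle bits of the address.
        set_index = (addr >> self.b) & ((1 << self.s) - 1)
        tag = addr >> (self.s + self.b)
        # Step 2: line matching needs a set valid bit and a matching tag.
        if self.valid[set_index] and self.tags[set_index] == tag:
            return "hit"
        # Miss: fetch the block, store it in the set, and set the valid bit.
        self.valid[set_index] = True
        self.tags[set_index] = tag
        return "miss"


cache = DirectMappedCache(s=2, b=1)
trace = [cache.read(addr) for addr in [0, 1, 13, 8, 0]]
print(trace)
```

Addresses 0 and 8 map to the same set with different tags, so reading 8 evicts the block that holds 0 and the final read of 0 misses again.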

6. Conflict Misses in Direct-Mapped Caches

(1) Thrashing:

The cache repeatedly loads and evicts the same sets of cache blocks.

(2) Cause:

The blocks map to the same cache set.

(3) Fix:

Put B bytes of padding at the end of each array (B bytes is the length of one block, and one line holds one block, so this shifts the arrays apart by one line) so that they map to different sets.
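The effect of the padding can be demonstrated by counting misses for a dot-product style access pattern that alternates between two arrays. Everything here is an illustrative assumption: a direct-mapped cache with 2 sets and 16-byte blocks, two arrays of eight 4-byte elements, and hypothetical function names.

```python
def count_misses(addresses, s, b):
    """Count misses in a direct-mapped cache with 2**s sets and 2**b-byte blocks."""
    lines = {}                                   # set index -> tag currently cached
    misses = 0
    for addr in addresses:
        set_index = (addr >> b) & ((1 << s) - 1)
        tag = addr >> (s + b)
        if lines.get(set_index) != tag:          # empty set or a different block
            lines[set_index] = tag
            misses += 1
    return misses


def dot_product_trace(x_base, y_base, n=8, elem_size=4):
    """Addresses referenced by a loop that reads x[i] and then y[i]."""
    trace = []
    for i in range(n):
        trace += [x_base + i * elem_size, y_base + i * elem_size]
    return trace


# Unpadded: x occupies bytes 0..31, so y starts at 32 and x[i], y[i] share a set.
unpadded = count_misses(dot_product_trace(0, 32), s=1, b=4)
# Padded: 16 bytes of padding after x move y to byte 48.
padded = count_misses(dot_product_trace(0, 48), s=1, b=4)
print(unpadded, padded)
```

Without padding all 16 references miss, because x[i] and y[i] keep evicting each other. With one block of padding only 4 of the 16 references miss.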

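Why index with the middle bits? See Exercise 6.12 on page 415 and the aside on page 416. If the high-order bits were used as the index, contiguous memory blocks would map to the same cache set, so at any point in time the cache would hold only a block-size chunk of a sequentially scanned array.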

III. Set Associative Caches

E-way set associative cache: 1 < E < C/B

1. Set selection

The same as in a direct-mapped cache.

2. Line matching and word selection

Each set behaves like a small associative memory of (key, value) pairs: the key is the tag together with the valid bit, and the value returned on a match is the block contents.

Key idea: any line in the set can hold any memory block that maps to the set, so the cache must search every line in the set.

The criteria for a match are still the same two necessary and sufficient conditions:

1. The valid bit is set
2. The tag matches

3. Line replacement on a miss

If the set has an empty line, the empty line is used; if there is no empty line, a replacement policy is applied:

Random replacement
Least frequently used (LFU) policy: replace the line that has been referenced the fewest times over some past time window.
Least recently used (LRU) policy: replace the line whose last access lies furthest in the past.
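A set associative lookup with LRU replacement can be sketched as follows. The class is hypothetical; it stores only the tags in each set (a stored tag implies a valid line) and keeps them ordered from least to most recently used. The parameters (2 sets, 2-byte blocks, E = 2) are illustrative.

```python
class SetAssociativeCache:
    """E-way set associative cache with LRU replacement; tracks tags only."""

    def __init__(self, s, b, E):
        self.s, self.b, self.E = s, b, E
        self.sets = [[] for _ in range(1 << s)]   # per set: tags, least recent first

    def read(self, addr):
        set_index = (addr >> self.b) & ((1 << self.s) - 1)
        tag = addr >> (self.s + self.b)
        lines = self.sets[set_index]
        if tag in lines:                 # every line in the set is searched
            lines.remove(tag)
            lines.append(tag)            # now the most recently used line
            return "hit"
        if len(lines) == self.E:         # no empty line: evict the LRU line
            lines.pop(0)
        lines.append(tag)
        return "miss"


cache = SetAssociativeCache(s=1, b=1, E=2)
trace = [cache.read(addr) for addr in [0, 8, 0, 8, 4, 0]]
print(trace)
```

Addresses 0, 8, and 4 all map to set 0. The first two can coexist in the two lines of the set, so the repeated reads of 0 and 8 hit; the read of 4 then evicts the least recently used line, which holds 0.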

IV. Fully Associative Caches (E = C/B)

1. Set selection

There is only one set (set 0 by default), so there are no set index bits; the address is partitioned into only a tag and a block offset.

2. Line matching and word selection

The same as in a set associative cache.

Appropriate only for small caches.

V. Issues with Writes

1. On a write hit, there are two ways to update the copy in the next lower level:

(1) Write-through: immediately write the cache block containing w to the next lower level.

Disadvantage: every write causes bus traffic.

(2) Write-back: write the updated block to the next lower level only when the replacement algorithm evicts it.

Advantage: exploits locality and significantly reduces bus traffic
Disadvantage: added complexity; the cache must maintain an additional dirty bit for each cache line

2. Handling write misses

(1) Write-allocate (usually paired with write-back)

Load the corresponding block from the next lower level into the cache, then update the cache block.

(2) No-write-allocate (usually paired with write-through)

Bypass the cache and write the word directly to the next lower level.
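The difference in bus traffic between the two policies can be illustrated by counting writes that reach the next lower level. This is a deliberately simplified sketch: the function is hypothetical, the cache holds a single block, and block loads caused by write-allocate are not counted.

```python
def lower_level_writes(block_writes, policy):
    """Count writes sent to the next lower level for a sequence of written block numbers."""
    if policy == "write-through":
        return len(block_writes)          # every write is passed down immediately
    if policy != "write-back":
        raise ValueError(f"unknown policy: {policy}")
    writes = 0
    cached_block, dirty = None, False
    for block in block_writes:
        if block != cached_block:
            if dirty:
                writes += 1               # evicting a modified block writes it back
            cached_block = block          # write-allocate: bring the new block in
        dirty = True                      # the cached block now differs from the lower level
    return writes


sequence = [7, 7, 7, 9]                   # three writes to block 7, then one to block 9
print(lower_level_writes(sequence, "write-through"))
print(lower_level_writes(sequence, "write-back"))
```

Write-through sends all four writes down, while write-back sends only one, when block 7 is evicted. The still-dirty block 9 would be written back on its own eviction, which this sketch does not model.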

VI. Real Cache Hierarchies

Caches can hold instructions as well as data.

Holds instructions only: i-cache
Holds program data only: d-cache
Holds both instructions and data: unified cache

VII. Performance Impact of Cache Parameters

1. Performance metrics:

Miss rate = number of misses / number of references
Hit rate = 1 - miss rate
Hit time: the time to deliver a word in the cache to the CPU
Miss penalty: the additional time required because of a miss
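These metrics are often combined into an average access time, computed as hit time plus miss rate times miss penalty. That combination is a common rule of thumb and is not stated in these notes; the function names and the sample numbers below are illustrative assumptions.

```python
def miss_rate(misses, references):
    """Miss rate = number of misses / number of references."""
    return misses / references


def hit_rate(misses, references):
    """Hit rate = 1 - miss rate."""
    return 1 - miss_rate(misses, references)


def average_access_time(hit_time, miss_rate_value, miss_penalty):
    """Assumed combination: every access pays the hit time, and misses also pay the penalty."""
    return hit_time + miss_rate_value * miss_penalty


rate = miss_rate(misses=3, references=100)
cycles = average_access_time(hit_time=4, miss_rate_value=rate, miss_penalty=100)
print(rate, hit_rate(3, 100), cycles)
```

With a 3% miss rate, a 4-cycle hit time, and a 100-cycle miss penalty, the average access takes about 7 cycles, so even a small miss rate has a large effect.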

2. Specific impacts:

Cache size: a larger cache raises the hit rate but also increases the hit time.
Block size: larger blocks exploit spatial locality and raise the hit rate, but they mean fewer cache lines, which hurts temporal locality, and a larger miss penalty.
Associativity: a larger E reduces thrashing, but raises cost, hit time, miss penalty, and the complexity of the control logic. The trade-off: use lower associativity where the miss penalty is low and higher associativity where the miss penalty is high.
Write strategy: the further down the hierarchy, the more likely a cache is to use write-back rather than write-through.

Several concepts in this section are easily confused, which leads to mistakes when working problems. The distinctions are as follows:

Section 5: Writing Cache-Friendly Code

1. Basic approach:

Make the common case go fast
Minimize the number of cache misses in each inner loop

2. Key points:

Repeated references to local variables are good (temporal locality)
Stride-1 reference patterns are good (spatial locality)

Section 6: The Memory Mountain

Every computer has a unique memory mountain that characterizes the capabilities of its memory system.

That is, the performance of the memory system is expressed as a mountain that is a function of temporal and spatial locality.

Goal: make the program run on the peaks rather than in the valleys.

Approach: exploit temporal locality so that heavily used words are fetched from the L1 cache, and exploit spatial locality so that as many words as possible are accessed from a single L1 cache line.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
