CSAPP (4): Memory hierarchy


The memory system is a hierarchy of storage devices with different capacities, costs, and access times.

(i) Types of storage devices

(ii) Access to main memory

Read and write operations are initiated by the bus interface circuitry on the CPU.

Consider first the flow of data for a read operation, e.g., a load instruction that reads the word at address A into a register:

1. The CPU puts the address A on the system bus, and the I/O bridge passes the signal along to the memory bus;

2. Main memory senses the address signal on the memory bus, reads the address from the bus, fetches the data word from the DRAM, and writes the data onto the memory bus. The I/O bridge translates the memory bus signal into a system bus signal;

3. The CPU senses the data on the system bus, reads it from the bus, and copies it into the destination register.

For a write operation, e.g., a store instruction that writes a word to address A:

1. The CPU puts the address A on the system bus; the I/O bridge passes the signal to the memory bus; main memory reads the address from the bus and waits for the data to arrive;

2. The CPU copies the data word onto the system bus;

3. Main memory reads the data word from the memory bus and stores it.

This is the process of transferring data between the CPU and main memory. Going one level deeper, one could ask how main memory itself reads and writes data based on the signals it sees on the bus.
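To make this concrete, the transactions above are what happens underneath an ordinary load and store. A minimal sketch (the variable and function names are made up for illustration, and caches and compiler optimizations are ignored; volatile is used so the compiler does not keep the values in registers):

```c
volatile long x = 1;
volatile long y;

void copy_x_to_y(void)
{
    long tmp = x;   /* load:  triggers a read transaction for address &x  */
    y = tmp;        /* store: triggers a write transaction for address &y */
}
```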

(iii) Access to disk

When the operating system performs an I/O operation, it sends a command to the disk controller asking it to read a particular logical block number. Firmware on the controller performs a fast table lookup that translates the logical block number into a (surface, track, sector) triple, reads the data into a buffer on the controller, and then copies it to main memory.

The CPU uses a technique called memory-mapped I/O to send commands to I/O devices. In a system with memory-mapped I/O, a block of addresses in the address space is reserved for communicating with I/O devices; each such address is called an I/O port.

A disk read proceeds as follows:

1. The CPU initiates the read by writing to the controller's I/O ports: the first instruction sends a command word that tells the disk to start a read, along with other parameters (for example, whether to interrupt the CPU when the read completes); the second instruction specifies the logical block number to read; the third instruction specifies the main memory address where the contents of the disk sector should be stored, and this last write initiates the disk read (see the sketch after this list).

2. The disk controller receives the command, translates the logical block number into a sector address, reads the sector contents, and then transfers the contents directly to main memory without any CPU intervention. This is called a DMA (direct memory access) transfer.

3. When the DMA transfer is complete, the disk controller notifies the CPU with an interrupt signal, which causes the CPU to pause its current work and jump to an operating system routine. That routine records that the I/O has completed and then returns control to the point where the CPU was interrupted.
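Here is a hypothetical sketch of step 1 in C, assuming a memory-mapped disk controller. The port addresses, register layout, and command bits are invented for illustration; real devices define their own, and real drivers run inside the kernel:

```c
#include <stdint.h>

/* Hypothetical I/O ports of a memory-mapped disk controller. */
#define DISK_CMD_PORT ((volatile uint32_t *)0xA0000000u)  /* command word            */
#define DISK_LBN_PORT ((volatile uint32_t *)0xA0000004u)  /* logical block number    */
#define DISK_DST_PORT ((volatile uint32_t *)0xA0000008u)  /* main-memory (DMA) addr  */

#define CMD_READ        0x1u
#define CMD_IRQ_ON_DONE 0x2u   /* interrupt the CPU when the read completes */

void start_disk_read(uint32_t logical_block, uint32_t dest_addr)
{
    *DISK_CMD_PORT = CMD_READ | CMD_IRQ_ON_DONE; /* first store: command word         */
    *DISK_LBN_PORT = logical_block;              /* second store: logical block number */
    *DISK_DST_PORT = dest_addr;                  /* third store: destination address,  */
                                                 /* which initiates the disk read      */
    /* The controller now performs the DMA transfer and interrupts the CPU
     * when it finishes; the CPU is free to do other work in the meantime. */
}
```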

(iv) Locality

A well-written program shows good locality.

Temporal locality: in a program with good temporal locality, a memory location that has been referenced once is likely to be referenced again in the near future;

Spatial locality: in a program with good spatial locality, if a memory location is referenced once, the program is likely to reference a nearby memory location in the near future.

Simple principles for evaluating the locality of a program:

1. A program that repeatedly references the same variable has good temporal locality;

2. For a reference pattern with stride k, the smaller the stride, the better the spatial locality (see the sketch after this list);

3. With respect to instruction fetches, loops have good temporal and spatial locality: the smaller the loop body and the greater the number of loop iterations, the better the locality.
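As a concrete illustration of principle 2, here is a minimal sketch in the spirit of the textbook's row-sum/column-sum example (the array dimensions are arbitrary). C stores arrays in row-major order, so the first version has a stride-1 reference pattern and good spatial locality, while the second strides through memory N elements at a time:

```c
#define M 1024
#define N 1024

/* Stride-1: the inner loop visits elements in the order they are laid out
 * in memory, so spatial locality is good. */
int sum_array_rows(int a[M][N])
{
    int sum = 0;
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Stride-N: the inner loop jumps N elements through memory on every step,
 * so spatial locality is poor even though the same values are summed. */
int sum_array_cols(int a[M][N])
{
    int sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}
```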

(v) The memory hierarchy

The design rests on two facts: (1) different storage technologies have widely different access times, and the speed gap between the CPU and main memory keeps growing; (2) well-written programs tend to have good locality. Together these facts suggest a way to organize memory systems: the memory hierarchy.

The memory hierarchy rests on one central concept: caching.

(vi) Cache memory

Early computer systems had a memory hierarchy of only three levels: CPU registers, DRAM main memory, and disk storage. As the gap between the CPU and main memory grew, system designers were forced to insert an L1 cache, and later an L2 cache, between them.

A few more remarks:

How big is the gap between the cache size C and the size of main memory M? It shows up in the tag: because the cache is very small relative to main memory, the number of tag bits t must be large.

For the same total number of lines, what is the difference between organizing them by S (more sets) and organizing them by E (more lines per set)? Lines organized by S can be indexed directly by address bits, like an array; lines organized by E are matched by tag, like a map.

Depending on E, caches fall into different classes: a cache with E = 1 is called a direct-mapped cache, and a cache with E > 1 is called a set-associative cache.
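To make the address decomposition concrete, here is a small sketch, assuming a cache with S = 2^s sets and B = 2^b bytes per block on a 64-bit machine; the parameter values (s = 6, b = 6, i.e. 64 sets of 64-byte blocks) and the example address are made up. The set-index bits select a set directly, and the tag bits are then compared against the E lines in that set:

```c
#include <stdint.h>
#include <stdio.h>

enum { S_BITS = 6, B_BITS = 6 };   /* hypothetical: 64 sets, 64-byte blocks */

static uint64_t block_offset(uint64_t addr) { return addr & ((1ull << B_BITS) - 1); }
static uint64_t set_index(uint64_t addr)    { return (addr >> B_BITS) & ((1ull << S_BITS) - 1); }
static uint64_t tag_of(uint64_t addr)       { return addr >> (B_BITS + S_BITS); }

int main(void)
{
    uint64_t addr = 0x7ffec0ffee40ull;   /* arbitrary example address */
    printf("tag=%#llx set=%llu offset=%llu\n",
           (unsigned long long)tag_of(addr),
           (unsigned long long)set_index(addr),
           (unsigned long long)block_offset(addr));
    return 0;
}
```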

For a direct-mapped cache, on a miss the new block must be fetched from the next lower level and stored in the cache. Replacement is trivial: because there is only one line per set, the existing line is simply replaced.

Because there is only one line per set, conflict misses are common when a program accesses arrays whose sizes are powers of 2: blocks from the different arrays map to the same sets, so alternating references keep evicting each other (thrashing), as in the sketch below.
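A sketch in the spirit of the textbook's conflict-miss example. Assume 4-byte floats, a direct-mapped cache with two sets of 16-byte blocks (32 bytes in total), and x laid out in memory immediately before y. Then x[i] and y[i] always map to the same set, so each reference to y[i] evicts the block holding x[i] and vice versa; the loop thrashes and every access misses:

```c
float dotprod(float x[8], float y[8])
{
    float sum = 0.0f;

    for (int i = 0; i < 8; i++)
        sum += x[i] * y[i];   /* alternating x/y references keep evicting each other */

    return sum;
}
```

Padding one of the arrays (for example, declaring x as float x[12]) changes the mapping so that corresponding blocks land in different sets and the thrashing disappears.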

Set-associative caches work on the same basic principle as direct-mapped caches. What is worth noting is line replacement on a miss: there are random replacement policies, least-frequently-used (LFU) policies, and least-recently-used (LRU) policies. All of them require extra time and hardware. However, the further down the memory hierarchy you go, the higher the cost of a miss, and the more worthwhile it becomes to spend on a better replacement policy.

The discussion above covers only reads; what about writes?

After updating the copy of a word w in the cache, how should the copy of w in the next lower level of the hierarchy be updated?

1. Write-through: the cache block containing w is immediately written to the copy in the next lower level.

2. Write-back: the updated block is written to the next lower level only when the replacement algorithm is about to evict it. The cache must then maintain an additional dirty bit for each cache line, indicating whether the block has been modified.

How should write misses be handled?

1. Write-allocate: load the corresponding block from the next lower level into the cache, and then update the cache block.

2. No-write-allocate: bypass the cache and write the word directly to the next lower level.

Write-through caches are typically no-write-allocate, and write-back caches are typically write-allocate.
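The following toy sketch (not from the original text; the types and helper functions are invented) shows how a write-back, write-allocate cache line handles a store, including the dirty bit mentioned above:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64

/* One line of a toy write-back, write-allocate cache. */
struct line {
    bool     valid;
    bool     dirty;              /* the extra "modified" bit mentioned above */
    uint64_t tag;
    uint8_t  data[BLOCK_SIZE];
};

/* Stubs standing in for the next lower level of the hierarchy. */
static void lower_write(const struct line *ln) { (void)ln; /* flush the block */ }
static void lower_read(struct line *ln, uint64_t tag)
{
    memset(ln->data, 0, BLOCK_SIZE);   /* pretend the block was fetched */
    ln->tag   = tag;
    ln->valid = true;
    ln->dirty = false;
}

/* Write-back + write-allocate handling of a 4-byte store into a line. */
void write_word(struct line *ln, uint64_t tag, int offset, uint32_t word)
{
    if (!ln->valid || ln->tag != tag) {   /* write miss                       */
        if (ln->valid && ln->dirty)
            lower_write(ln);              /* write-back: flush only on evict  */
        lower_read(ln, tag);              /* write-allocate: fetch the block  */
    }
    memcpy(&ln->data[offset], &word, sizeof word);  /* update the cached copy */
    ln->dirty = true;                     /* lower level is updated lazily    */
}
```

A write-through, no-write-allocate cache would instead forward every store to the lower level immediately and skip the fetch on a miss, so it needs no dirty bit.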

(vii) Writing code with good locality

1. Focus your attention on the inner loop, where most of the computation and memory access takes place;

2. Read data objects in the order in which they are stored in memory, with stride 1, to maximize spatial locality;

3. Once a data object has been read from memory, use it as many times as possible, thereby maximizing temporal locality. (All three rules are illustrated in the sketch below.)
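The three rules come together in the classic loop-reordered matrix multiplication (a sketch in the spirit of the textbook's discussion; N and the function name are chosen for illustration, and C is assumed to be zero-initialized):

```c
#define N 512

void matmul_kij(double A[N][N], double B[N][N], double C[N][N])
{
    for (int k = 0; k < N; k++) {
        for (int i = 0; i < N; i++) {
            /* Rule 3: load A[i][k] once and reuse it N times. */
            double r = A[i][k];
            /* Rule 1: the bulk of the work is in this inner loop.
             * Rule 2: B[k][j] and C[i][j] are accessed with stride 1,
             * matching C's row-major layout. */
            for (int j = 0; j < N; j++)
                C[i][j] += r * B[k][j];
        }
    }
}
```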
