In-Depth Understanding of Computer Systems (2): Principles of Memory and Cache [Illustrated]

Source: Internet
Author: User

Note (1) on "In-Depth Understanding of Computer Systems" described how programs are composed, executed, and controlled. The next step is how they run. I skipped the chapters "Processor Architecture" and "Optimizing Program Performance", and my notes on those two chapters are still postponed!

One great use of "In-Depth Understanding of Computer Systems" is that it gives us many definitions and scientific explanations. These will become my theoretical basis, instead of Internet chatter from self-proclaimed veterans. The book does, after all, come from Carnegie Mellon University, one of the strongest universities in computer science.

This post skips Chapter 4 (processor architecture) and Chapter 5 (optimizing program performance), and it does not discuss the cache mechanism in full detail.

VI. Memory Hierarchy

It is worth showing the memory hierarchy once more:

What do the cache terms "line" (row), "block", and "set" (group) mean? Refer to the "physical logic diagram of the cache" near the bottom of this blog post.

Word: a machine word. In Intel's IA32 terminology a "word" is 16 bits, although the native word size of an IA32 machine (the size of an int or a pointer) is 32 bits.

Block: a fixed-size packet of information that is transferred back and forth between the cache and main memory. A block is typically 32 to 64 bytes. In main memory a block holds nothing but data, which is why it is simply called a block.

Line (row): the container in the cache that stores a block together with other information, such as a valid bit and tag bits. A line therefore always holds exactly one block, and "line" and "block" are often used interchangeably. Strictly speaking, the block plus the cache's bookkeeping information is what is called a "line".

Set (group): a collection of one or more lines.

--- My earlier knowledge stopped at RAM and ROM. Now we can go one step further and divide RAM into static RAM (SRAM) and dynamic RAM (DRAM). SRAM is mainly used for caches, and DRAM for main memory.

SRAM stores each bit in a bistable memory cell. Each cell is implemented with a six-transistor circuit that can stay indefinitely in either of two different voltage configurations, or states. In other words, as long as it has power, it keeps its value forever. Even when interference such as electrical noise disturbs the voltages, the circuit returns to its stable value once the interference is removed. The book illustrates bistability with an inverted pendulum.

Could the same principle explain why electromagnetic interference garbles a TV picture or disrupts a cell phone signal?

DRAM stores each bit as charge on a capacitor, which is very small, typically around 30 femtofarads. Unlike SRAM, a DRAM cell is sensitive to interference: once the capacitor voltage is disturbed, it never recovers on its own. The charge also leaks away over time, so DRAM must be refreshed constantly. Exposure to light changes the capacitor voltage as well. In fact, the sensors in digital cameras and camcorders are essentially arrays of DRAM cells. [Surprised]

In any case, SRAM and DRAM are both volatile: they lose their contents when the power is turned off.

The following describes the DRAM read mechanism:

The disadvantage of organizing the cells as a two-dimensional array is that the address must be sent in two steps (first the row, then the column), which increases the access time.

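--- Difference between DDR2 and DDR3 memory: both are double data-rate synchronous DRAM (DDR SDRAM), and they are distinguished by the size of the prefetch buffer, 4 bits for DDR2 and 8 bits for DDR3. DDR3 also comes in different speed grades, such as 1333 MHz and 1600 MHz.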

--- Disk, skipped.

--- The performance of DRAM and disks keeps improving, but it lags behind that of the CPU.

This is why the amount of cache between the CPU and main memory keeps increasing.

--- Manufacturers found that they could no longer raise the CPU clock frequency as before: they had hit the "power wall", because the chips would consume too much power. This led to multi-core processors. The clock rate of multi-core CPUs dropped slightly and the curve then flattened out, yet the effective CPU cycle time keeps improving at close to the previous rate.

--- How the cache operates (p. 406)

--- Cache hit.

Why does a program start much faster the second time than the first? Because of cache hits. After the first launch, a lot of its data remains in the various levels of the storage hierarchy. If the program is started again at that point, less data has to be moved and the startup time drops: some of the data is already in a cache and does not need to be fetched from memory again.

As an analogy, consider irrigating wheat through a 100-meter water channel (an ordinary earthen channel, not a cement one) with a fixed flow of water.

The first time the channel is used, the water advances slowly and takes about 10 minutes to travel the 100 meters, because some of it is lost as it soaks into the soil beneath the channel. That water can be thought of as the water in the "cache".

When we irrigate the wheat a second time shortly afterward, the water gets through in about two minutes, because compared with the first time much less water has to soak into the soil under the channel. If a long time passes, say two days, the water will have to soak in all over again.

On a third irrigation (after a long time), some water again has to soak into the soil under the channel, so it takes 10 minutes once more.

In computer terms: when a load instruction tells the CPU to read a word from memory address a, the CPU sends address a to the cache. If the cache holds a copy of the word at address a, it immediately sends that word to the CPU. This is much faster than reading it from memory.

--- Core i7 cache hierarchy

Note: a cache that holds only instructions is called an i-cache, and a cache that holds only program data is called a d-cache. A cache that holds both instructions and data is called a unified cache.

As stated in Chapter 1, caches are crucial. In particular, if a program is optimized to get more cache hits, its speed improves greatly.

Physical logic diagram of the cache:

--- Direct-mapped cache (p. 410)

Caches are divided into different classes according to E, the number of cache lines per set. A cache with exactly one line per set (E = 1) is called a direct-mapped cache. Note that although a single integer may live in a register, an integer array will be held in the cache.

--- Conflict misses. See the following example:

float dotprod(float x[8], float y[8])
{
    float sum = 0.0;
    int i;

    for (i = 0; i < 8; i++)
        sum += x[i] * y[i];
    return sum;
}

In the first iteration, the reference to x[0] misses, and the block containing x[0] to x[3] is loaded into set (group) 0 of the cache. The next reference, to y[0], misses as well, so the block containing y[0] to y[3] is copied into set 0, overwriting the x values that were just loaded. In the next iteration, the reference to x[1] misses again, which loads x[0] to x[3] back and overwrites y[0] to y[3]. This is called a conflict miss, or thrashing. Essentially, the blocks of x and y map to the same set. Programmers can avoid this thrashing, but I think the compiler should solve this problem.

The book does not contain many errors, but the ones it has still spoil the reading experience.

P. 406, Figure 6-24: the cell for 10 is missing, and the cell for 12 appears twice.

P. 403, Figure 6-21 (a): the function should be named: int sumarraycol(int a[M][N]).

P. 199: the last word of the first line should be "offset".
