If you read the previous article, you may be wondering why the program produced the result it did. Now, let's step into the world of CPUs.
In the SMP (symmetric multiprocessing) era, multiple CPUs work together to raise computing power further. But how do the CPUs coordinate their memory accesses?
+--------------+      +--------------+
|     CPU0     |      |     CPU1     |
+--------------+      +--------------+
 ^        |            ^        |
 |        |            |        |
 |        V            |        V
 |    +--------+       |    +--------+
 |<-->| Store  |       |<-->| Store  |
 |    | Buffer |       |    | Buffer |
 |    +--------+       |    +--------+
 |        |            |        |
 |        V            |        V
+--------------+      +--------------+
|    Cache     |      |    Cache     |
+--------------+      +--------------+
       |                     |
       |                     |
 +------------+        +------------+
 | Invalidate |        | Invalidate |
 |   Queue    |        |   Queue    |
 +------------+        +------------+
       |    Interconnect     |
 +----------------------------------+
                  |
      +-----------------------+
      |        Memory         |
      +-----------------------+
The diagram above shows an architecture commonly used in modern CPUs. Because the CPU computes far faster than memory can be accessed, a memory access is a relatively slow step in executing an instruction. So between the CPU and main memory there are several levels of cache, usually called L1, L2, and L3, each slower to access than the one before it. The diagram shows only a single level of cache. When the CPU accesses memory, it first looks in its own cache to see whether that memory is already cached; if it is, the access completes directly from the cache.
With a cache in place, the CPU runs much faster. Because cache is relatively expensive, it is generally small: on my Pentium E5800 machine, the first-level cache is only 128 KBytes.
The cache is made up of cache lines. On modern Intel CPUs a cache line is typically 64 bytes. The cache is usually multi-way (set-associative); my CPU's is 4-way. I don't want to dive into how a multi-way cache works here, so interested readers can look it up on their own.
When the data that is needed is not in the cache, this is called a cache miss, and the data is usually loaded from memory. The cache is filled one cache line at a time, aligned to the cache line size: the cache-line-sized block of memory that contains the address, typically 64 bytes, is loaded into the cache. This means that keeping data in cache-line-aligned memory improves computational efficiency.
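The alignment rule can be made concrete with a little address arithmetic. The following is only a minimal sketch, assuming a 64-byte line; the function names are made up for illustration and do not come from any library.

```python
CACHE_LINE_SIZE = 64  # bytes; assumed here, the typical value mentioned above


def cache_line_base(addr: int, line_size: int = CACHE_LINE_SIZE) -> int:
    """Return the start address of the cache line that contains addr."""
    # The mask trick only works when line_size is a power of two.
    return addr & ~(line_size - 1)


def cache_line_offset(addr: int, line_size: int = CACHE_LINE_SIZE) -> int:
    """Return the byte offset of addr inside its cache line."""
    return addr & (line_size - 1)


def lines_touched(addr: int, size: int, line_size: int = CACHE_LINE_SIZE) -> int:
    """Count how many cache lines an object of `size` bytes at `addr` spans."""
    first = cache_line_base(addr, line_size)
    last = cache_line_base(addr + size - 1, line_size)
    return (last - first) // line_size + 1


if __name__ == "__main__":
    print(hex(cache_line_base(0x1234)))   # 0x1200
    print(cache_line_offset(0x1234))      # 52
    print(lines_touched(0x1200, 8))       # 1: aligned 8-byte value, one line
    print(lines_touched(0x123C, 8))       # 2: the same value straddles two lines
```

The last two calls hint at why alignment matters: an 8-byte value that crosses a line boundary needs two cache lines instead of one.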
When multiple CPUs access the same memory, the same memory contents end up in the caches of several CPUs, and the question is how to keep those copies consistent. For example, when the same address is cached by two caches at once and one copy is modified, there must be some means of notifying the other cache promptly, so that if a CPU then needs to fetch the data, it gets the latest value.
The cache brings these problems with it. Is it possible to say that certain memory should not be cached? Or, more precisely, can you control how memory is cached when you access it? The answer is yes: we can specify the memory type.
The MTRRs (memory type range registers) let you specify the memory type for ranges of physical addresses. The commonly used types are:
1. UC (uncacheable), which indicates that the memory cannot be cached.
2. WT (write-through), which means that both the cache and memory are updated on a write.
3. WB (write-back), which means that only the cache is updated on a write, and memory is written at an appropriate later time.
4. WC (write combining), which means that the memory is not cached, and writes may be deferred and combined.
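The difference between WT and WB in the list above is easiest to see in a toy model. This is a sketch under heavy assumptions: a cache that holds a single value, a dict standing in for memory, and a hypothetical `ToyCache` class invented for illustration. It does not model real hardware or the MTRRs themselves.

```python
class ToyCache:
    """A one-entry toy cache in front of a dict that stands in for memory."""

    def __init__(self, memory: dict, policy: str):
        if policy not in ("WT", "WB"):
            raise ValueError("policy must be 'WT' or 'WB'")
        self.memory = memory
        self.policy = policy
        self.addr = None      # address currently cached, if any
        self.value = None
        self.dirty = False    # only ever set under WB

    def write(self, addr: int, value: int) -> None:
        if self.addr != addr:
            self.evict()                  # make room for the new address
        self.addr, self.value = addr, value
        if self.policy == "WT":
            self.memory[addr] = value     # write-through: memory updated now
        else:
            self.dirty = True             # write-back: memory updated later

    def evict(self) -> None:
        """Drop the cached entry, writing it back first if it is dirty."""
        if self.addr is not None and self.dirty:
            self.memory[self.addr] = self.value
        self.addr, self.value, self.dirty = None, None, False


if __name__ == "__main__":
    wt_mem, wb_mem = {0x100: 0}, {0x100: 0}
    wt, wb = ToyCache(wt_mem, "WT"), ToyCache(wb_mem, "WB")
    wt.write(0x100, 42)
    wb.write(0x100, 42)
    print(wt_mem[0x100], wb_mem[0x100])   # 42 0
    wb.evict()
    print(wb_mem[0x100])                  # 42
```

Under WB, memory stays stale until the entry is evicted, which is exactly why other CPUs need some way to find out about the newer copy.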
In practice, for the sake of efficiency, ordinary memory normally uses the WB type, which means the memory is cached. This calls for some means of keeping the caches consistent with one another; the technical term is cache coherence.
So how do you know whether a cache line in your cache also exists in the caches of other CPUs? How do you record that your own cache line has been modified and needs to be written back? Clearly some state tags are needed to record these things. This is the MESI protocol.
In the MESI protocol, each cache line is in one of 4 states, encoded in 2 bits:
1. M (Modified)
This line of data is valid and has been modified, so it is inconsistent with the data in memory; the data exists only in this cache.
2. E (Exclusive)
This line of data is valid, the data is consistent with the data in memory, and the data exists only in this cache.
3. S (Shared)
This line of data is valid, the data is consistent with the data in memory, and the data may exist in many caches.
4. I (Invalid)
This line of data is invalid.
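The four states can be sketched as a small state machine. This is a simplified textbook view of MESI, not the behavior of any particular CPU: the event names, the `next_state` function, and the numeric values chosen for the 2-bit encoding are all assumptions made for illustration.

```python
from enum import Enum


class State(Enum):
    # Four states fit in 2 bits; these particular values are arbitrary.
    MODIFIED = 0
    EXCLUSIVE = 1
    SHARED = 2
    INVALID = 3


class Event(Enum):
    LOCAL_READ = 1     # this CPU reads the line
    LOCAL_WRITE = 2    # this CPU writes the line
    REMOTE_READ = 3    # another CPU reads the line
    REMOTE_WRITE = 4   # another CPU writes (takes ownership of) the line


def next_state(state: State, event: Event, others_have_copy: bool = False) -> State:
    """Return the state of one cache line in this CPU's cache after an event."""
    if event is Event.LOCAL_READ:
        if state is State.INVALID:
            # A miss: the line arrives Shared if another cache holds it.
            return State.SHARED if others_have_copy else State.EXCLUSIVE
        return state                      # a hit leaves the state unchanged
    if event is Event.LOCAL_WRITE:
        # Other copies (if any) must be invalidated before the write.
        return State.MODIFIED
    if event is Event.REMOTE_READ:
        if state in (State.MODIFIED, State.EXCLUSIVE):
            return State.SHARED           # a Modified line is written back first
        return state
    return State.INVALID                  # REMOTE_WRITE: our copy is now stale


if __name__ == "__main__":
    s = State.INVALID
    for ev in (Event.LOCAL_READ, Event.LOCAL_WRITE,
               Event.REMOTE_READ, Event.REMOTE_WRITE):
        s = next_state(s, ev)
        print(ev.name, "->", s.name)
```

Starting from Invalid, the loop walks the line through Exclusive, Modified, Shared, and back to Invalid, touching each of the four states once.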
The truth is being revealed step by step ...
Go into the Cache of the CPU