Memory, registers, and cache: differences and connections


1. Registers are components of the central processing unit (CPU). They are high-speed storage elements of limited capacity that can hold instructions, data, and addresses. The control unit of the CPU contains registers such as the instruction register (IR) and the program counter (PC); the arithmetic and logic unit contains registers such as the accumulator (ACC).
2. Memory covers a very wide range and is generally divided into read-only memory (ROM), random-access memory (RAM), and cache memory (cache).

3. Registers are internal components of the CPU. They have very high read and write speeds, so data transfer between registers is very fast.
4. Cache: The cache is a small but high-speed memory that sits between the CPU and main memory. Because the CPU is much faster than main memory, the CPU has to wait for some time whenever it accesses data directly from memory. The cache holds the data the CPU has just used or written back; when the CPU needs that data again, it can be fetched directly from the cache. This reduces CPU wait time and improves the efficiency of the system. The cache is further divided into a level-1 cache (L1 cache) and a level-2 cache (L2 cache). The L1 cache is integrated within the CPU; the L2 cache was usually soldered onto the motherboard but is now also integrated into the CPU, with common capacities of 256 KB or 512 KB.

Summary: In general, data travels the path memory → cache → registers; the cache exists to compensate for the speed gap between the CPU and memory.

http://blog.csdn.net/csuyishuan/article/details/52073421

First, look at the pyramid of the computer's storage system (the memory hierarchy).

Next, let's walk through the levels of a computer's storage system.

Register

Registers are internal components of the CPU: they are where the CPU takes its instructions and data from while it operates, and they can hold instructions, data, and addresses. A CPU usually contains general-purpose registers as well as special-function registers such as the instruction register (IR), the program counter (PC), and the stack pointer (SP).

Cache

The cache temporarily holds data from memory: when a register needs some data that resides in memory, it can often be taken directly from the cache instead, which increases speed. The cache is a partial copy of memory.

CPU <---> registers <---> Cache <---> memory

Registers work in a very simple way, with only two steps: (1) Find the relevant bits, (2) read the bits.

The way memory works is much more complicated:

(1) A pointer to the data is found. (The pointer may be stored in a register, so this step can include the full work of a register access.)

(2) The pointer is sent to the memory management unit (MMU), which translates the virtual address into an actual physical address.

(3) The physical address is sent to the memory controller, which finds out which memory bank the address belongs to.

(4) The memory block (chunk) holding the data is determined, and the data is read from that block.

(5) The data is sent back to the memory controller, then back to the CPU, and then put to use.

Memory's workflow involves many more steps than a register's. Each step creates a delay, and the delays accumulate to make memory much slower than registers.

To mitigate the huge speed difference between registers and memory, hardware designers have made many efforts, including building caches into the CPU, optimizing how the CPU works, fetching several words from memory in a single transfer, and so on.
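To make the effect concrete, here is a minimal C sketch (my illustration, not from the original article): summing a 2D array row by row walks memory sequentially and keeps hitting the cache, while summing it column by column jumps a whole row ahead on every access and misses far more often. The array size and timing method are illustrative choices.

    #include <stdio.h>
    #include <time.h>

    #define N 4096

    static int a[N][N];                      /* 64 MB of zero-initialized data */

    int main(void) {
        long sum = 0;
        clock_t t0, t1;

        t0 = clock();
        for (int i = 0; i < N; i++)          /* row-major: 4-byte stride */
            for (int j = 0; j < N; j++)
                sum += a[i][j];
        t1 = clock();
        printf("row-major:    %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

        t0 = clock();
        for (int j = 0; j < N; j++)          /* column-major: N*4-byte stride */
            for (int i = 0; i < N; i++)
                sum += a[i][j];
        t1 = clock();
        printf("column-major: %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

        return (int)(sum & 1);               /* use sum so the loops are kept */
    }

On a typical machine the column-major pass runs several times slower, even though both loops perform exactly the same additions.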



RAM (memory)

That is, main memory, the unit that stores working data. It temporarily holds the data the CPU is operating on, as well as data exchanged with external storage such as the hard disk.

Hard disk

Source: Zhihu
Link: http://www.zhihu.com/question/20075426/answer/16354329

The execution of an assembly instruction goes roughly as follows (not absolute; different platforms differ):

Fetch (read the instruction), decode (convert the instruction into micro-instructions), fetch operands (read the operands from memory), execute (the various calculations, handled by the ALU), and write back (write the result back to memory). On some platforms the first two steps are merged into one, and some instructions have no operand-fetch or write-back step.

Now a word about CPU clock frequency: first of all, the frequency is absolutely not equal to the number of instructions executed per second, because each instruction has a different execution cost. For example, on the x86 platform the assembly instruction INC is faster than ADD; the exact clock cycles for each instruction can be found in the Intel manuals.

Why mention the clock frequency? Because each step of the execution process above takes at least one clock cycle. If an addition that operates on memory needs 5 clock cycles, then a 500 MHz CPU can execute at most 500 MHz / 5 = 100 million such instructions per second.

Careful observation shows that the steps above do not include any register operations: for the CPU, reading and writing registers takes essentially no extra time. If an instruction only touches registers (such as MOV BX, AX), then the number of instructions executed per second is theoretically equal to the clock frequency, because the registers are part of the CPU itself.

Below the registers come the caches at each level: the L1 cache, L2, even L3, plus the TLB (the TLB can also be regarded as a cache), and then memory. Having explained why registers are fast, let's now see why the rest are slower:

The access speed differs at each cache level. In theory the L1 cache (level-1 cache) is as fast as the CPU's registers, but the L1 cache has a problem: when its contents need to be synchronized with memory, a piece of the cache (the term is a cache line) must be locked while the cache or memory content is updated, and during that time the cache block is inaccessible. So the L1 cache is not as fast as registers, because it is frequently unavailable for short periods.
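One visible consequence of this cache-line granularity is false sharing; here is a hedged C sketch (my illustration, not from the source) using POSIX threads. Two threads increment two different counters, but if the counters share one cache line, each increment invalidates the other core's copy of the line and both threads stall; padding the counters onto separate 64-byte lines (a typical x86 line size, assumed here) removes the contention.

    #include <pthread.h>
    #include <stdio.h>

    #define ITERS 100000000L

    struct counters {
        volatile long a;
        char pad[56];       /* remove this padding to put a and b on one line */
        volatile long b;
    };

    static struct counters c;

    static void *inc_a(void *arg) { (void)arg; for (long i = 0; i < ITERS; i++) c.a++; return NULL; }
    static void *inc_b(void *arg) { (void)arg; for (long i = 0; i < ITERS; i++) c.b++; return NULL; }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, inc_a, NULL);
        pthread_create(&t2, NULL, inc_b, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("a=%ld b=%ld\n", c.a, c.b);
        return 0;
    }

Compile with -pthread; with the padding removed, the run typically takes noticeably longer because the two cores keep stealing the same cache line from each other.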

Below the L1 cache come the L2 cache and possibly an L3 cache. They have the same locking and synchronization problem as L1, and L2 is slower than L1, L3 slower than L2, so the speed drops further at each level.

Finally, memory. The clock frequency of memory today is around 1333 or 1600 MHz, far lower than the CPU's, so memory starts out slower; moreover, the CPU cannot simply talk to memory whenever it wants, for the reasons below.

Memory communicates not only with the CPU but also, through the DMA controller, with other hardware. When the CPU initiates a memory request, it first sends a signal that says, in effect, "I want to access some data; are you busy?" If memory is busy at that moment, the communication has to wait; if not, it can proceed normally. The time cost of this request signal alone is enough to execute several assembly instructions, so this is one reason memory is slow.

Another reason: the channel between memory and the CPU is also limited; it is called "bus bandwidth". Remember that this bandwidth is not reserved for memory alone: traffic to video memory and other devices travels the same road. Because the road is shared, every request must first compete for the bandwidth; seizing the bandwidth takes time, and waiting when the bandwidth is taken costs time as well.

These two factors together make CPU access to memory slower than access to the cache.

Here is a more intuitive example:

For the CPU to read the value of register AX, only one step is needed: "Bring me AX", and AX hands it over.
For the CPU to read a value from the L1 cache, 1-3 steps (or more) are needed: lock a cache line, take the data out, unlock; if the line is already locked, it is slower.
For the CPU to read a value from the L2 cache, it first asks the L1 cache, and L1 says: I don't have it, it's in L2. L2 then locks the line, copies the data into L1, the L1 read procedure above is performed, and then the line is unlocked.
Reading from the L3 cache is the same, except the data is copied from L3 to L2, from L2 to L1, and from L1 to the CPU.
Fetching from memory is the most complex: notify the memory controller to occupy bus bandwidth, tell memory to lock, issue the memory read request, wait for the response, save the response data into L3 (or into L2 if there is no L3), then copy from L3/L2 to L1, then from L1 to the CPU, and finally release the bus lock.
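This latency ladder can be observed with a rough pointer-chasing sketch like the one below (an illustration, not a rigorous benchmark; sizes and iteration counts are arbitrary choices). The program chases a single random cycle through arrays of growing size: while the working set fits in L1 each hop costs only a few nanoseconds, and the cost per hop climbs step by step as the array spills into L2, L3, and finally RAM.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define HOPS 10000000L

    int main(void) {
        for (size_t n = 1 << 10; n <= 1 << 24; n <<= 2) {
            size_t *next = malloc(n * sizeof *next);
            if (!next) return 1;
            for (size_t i = 0; i < n; i++) next[i] = i;
            /* Sattolo's shuffle: j < i guarantees one big cycle, so the
             * chase really visits all n slots instead of a short loop. */
            for (size_t i = n - 1; i > 0; i--) {
                size_t j = rand() % i;
                size_t t = next[i]; next[i] = next[j]; next[j] = t;
            }
            size_t p = 0;
            clock_t t0 = clock();
            for (long h = 0; h < HOPS; h++) p = next[p];
            clock_t t1 = clock();
            printf("%9zu KB: %6.1f ns/hop (p=%zu)\n",
                   n * sizeof *next / 1024,
                   (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / HOPS, p);
            free(next);
        }
        return 0;
    }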

The difference between the disk cache and the memory cache

Memory cache

Cache (pronounced like "cash") originally meant a kind of RAM that can be accessed faster than ordinary random-access memory. Unlike the DRAM technology used for system main memory, a cache usually uses the expensive but faster SRAM technology.

Principle
The word cache, in this sense, derives from a 1967 paper in an electronics journal, whose author borrowed the French word "cache", meaning a hidden place for safekeeping, for the field of computer engineering.

When the CPU needs to process data, it looks in the cache first. If the data has been staged there by an earlier operation, it does not have to be read from random-access memory (main memory). Since the CPU generally runs much faster than main memory, and a main-memory access (the time it takes to reach main memory) spans several clock cycles, going all the way to main memory means wasting several CPU cycles on waiting.

cache" is to adapt the speed of data access to the CPU's processing speed, based on the principle of "local sexual behavior of program execution and data access" in memory, that is, the time and space of a certain program execution, The code being accessed is concentrated in part. To get the most out of caching, not only rely on "staging the data you've just accessed", but also use the hardware-implemented instruction prediction and data prefetching techniques-to take the data you want to use in advance from memory to the cache, whenever possible.

The CPU cache was once an advanced technology used only on supercomputers, but today's computers use AMD or Intel microprocessors that integrate data caches and instruction caches of varying sizes on the chip, commonly known as the L1 cache (level-1 on-die cache). The L2 cache, larger than L1, used to sit outside the CPU (on the motherboard or a CPU interface card) but has since become a standard component inside the CPU; more expensive CPUs also carry an L3 cache (level-3 on-die cache), larger still than L2.

Expansion of the concept
Today the concept of a cache has been extended: there is not only the cache between the CPU and main memory, but also the cache between memory and the hard disk (the disk cache), and even, between the hard disk and the network, caches in a certain sense, such as the temporary Internet files folder or other network content caches. Any structure that sits between two kinds of hardware with a large speed difference and coordinates their data-transfer speeds can be called a cache.

Address mapping and translation
Main article: CPU cache § Associativity
Because main memory is far larger than the CPU cache, the two must correspond to each other through certain rules. Address mapping refers to loading main-memory blocks into the cache according to those rules. Address translation means that, once main-memory blocks have been loaded into the cache under a given mapping, each access to the CPU cache must convert the main memory's physical address (or a virtual address) into a CPU cache address before the data can be reached.
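A small sketch of this mapping for a set-associative cache (the 64-byte line and 64-set geometry below, a common 32 KB 8-way L1 shape, is only an assumed example): the address splits into a byte offset within a line, a set index, and a tag that is compared against the tags stored in that set.

    #include <stdio.h>
    #include <stdint.h>

    #define LINE_SIZE 64     /* bytes per cache line (assumed) */
    #define NUM_SETS  64     /* 32 KB / 64 B lines / 8 ways = 64 sets */

    int main(void) {
        uint64_t addr = 0x7ffd12345678;                  /* example address */
        uint64_t offset = addr % LINE_SIZE;              /* byte within line */
        uint64_t set    = (addr / LINE_SIZE) % NUM_SETS; /* which set        */
        uint64_t tag    = addr / LINE_SIZE / NUM_SETS;   /* identity check   */
        printf("addr=0x%llx -> tag=0x%llx set=%llu offset=%llu\n",
               (unsigned long long)addr, (unsigned long long)tag,
               (unsigned long long)set, (unsigned long long)offset);
        return 0;
    }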

Cache replacement policy
Main articles: CPU cache § Replacement policy; paging; page-replacement algorithms
Main memory is much larger than the CPU cache, and disk capacity is much larger than main memory, so every level of cache faces the same problem: when the limited cache space is full and new content needs to be brought in, how should part of the existing content be selected and discarded to make room for the new? Several algorithms address this problem, such as least recently used (LRU), FIFO, least frequently used (LFU), and not most recently used (NMRU). They have different efficiency and cost at different cache levels, and the most suitable one must be chosen for each specific situation.
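To make LRU concrete, here is a toy sketch in C (real caches only approximate LRU in hardware; the 4-entry size and the access trace are invented for illustration). Every slot carries a timestamp of its last use; on a miss with no free slot, the entry with the oldest timestamp is evicted.

    #include <stdio.h>

    #define SLOTS 4

    static int  keys[SLOTS];
    static long stamps[SLOTS];
    static int  used[SLOTS];
    static long now = 0;

    /* Returns 1 on a hit, 0 on a miss (loading/evicting as needed). */
    static int lru_access(int key) {
        int victim = 0;
        for (int i = 0; i < SLOTS; i++)
            if (used[i] && keys[i] == key) {   /* hit: refresh the stamp */
                stamps[i] = ++now;
                return 1;
            }
        for (int i = 0; i < SLOTS; i++)        /* miss: prefer a free slot */
            if (!used[i]) { victim = i; goto load; }
        for (int i = 1; i < SLOTS; i++)        /* else evict the oldest */
            if (stamps[i] < stamps[victim]) victim = i;
    load:
        used[victim] = 1;
        keys[victim] = key;
        stamps[victim] = ++now;
        return 0;
    }

    int main(void) {
        int trace[] = {1, 2, 3, 4, 1, 5, 1, 2};
        for (int i = 0; i < 8; i++)
            printf("access %d: %s\n", trace[i], lru_access(trace[i]) ? "hit" : "miss");
        return 0;
    }

Running the trace, the access to 5 evicts key 2 (the least recently used entry at that point), so the final access to 2 misses again.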

Disk cache

(Figure: a hard disk's 16 MB buffer)
A disk cache (disk caching) actually means allocating part of memory to the software as a staging area (this memory area is called the "memory pool"): data being written is collected there, and only when the data in the memory pool reaches a certain level is it saved to the hard disk. This reduces the number of actual disk operations and effectively protects the disk from the damage of repeated read and write operations.

Disk caching reduces the number of times the CPU has to read the disk drive through I/O. It improves disk I/O efficiency by keeping the most frequently used disk content in a piece of memory: since a memory access is an electronic action while a disk access is a mechanical I/O action, disk I/O appears to become faster.

The same technique can be used for writes: the content to be written is first placed in memory, and only later, when the system has idle time, is this in-memory data actually written out to disk.
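A familiar user-space analogue of this write-behind idea is stdio's buffering: fprintf/fwrite calls accumulate in a memory buffer, and a real disk write is issued only when the buffer fills or the stream is flushed or closed. A minimal sketch, with an arbitrary 64 KB buffer size:

    #include <stdio.h>

    int main(void) {
        static char buf[64 * 1024];
        FILE *f = fopen("out.dat", "w");
        if (!f) { perror("fopen"); return 1; }

        setvbuf(f, buf, _IOFBF, sizeof buf);   /* fully buffered stream */

        for (int i = 0; i < 100000; i++)
            fprintf(f, "record %d\n", i);      /* lands in buf, not on disk */

        fclose(f);                             /* flushes the buffer to disk */
        return 0;
    }

The operating system adds its own page cache underneath, so even fclose only hands the data to the kernel; it reaches the platter when the kernel decides to write it back.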

Size
Today's disks typically have a 32 MB or 64 MB cache; older hard drives had 8 MB or 16 MB.

