High concurrency (5): Understanding the cache coherence protocol and its impact on concurrent programming


As a cross-platform language, Java must run on many different underlying hardware systems. To hide these hardware differences and present upper-layer developers with a consistent interface, it defines an intermediate-layer model: the Java Memory Model. The JMM shields programmers from the implementation details of the underlying hardware and supports most mainstream hardware platforms. To understand the Java Memory Model and the techniques used to handle high concurrency, some basic hardware knowledge is necessary. This article introduces the hardware concepts most relevant to concurrent programming.


A basic CPU computation proceeds as follows:

1. The program and its data are loaded into main memory.

2. Instructions and data are loaded into the CPU's cache.

3. The CPU executes the instructions and writes the results to its cache.

4. The data in the cache is written back to main memory.


Two problems are apparent in this process:

1. Each core of a modern CPU has its own L1 cache, which we can think of as a private storage space. When different cores need to access the same memory address, the value at that address ends up with multiple copies across the cores' caches. How are these copies kept in sync?

2. The CPU reads and writes to the cache, not directly to main memory. A main-memory access typically takes tens to hundreds of clock cycles, while an L1 cache read or write takes only one or two cycles and an L2 cache access a few dozen. So when is a value the CPU has written to the cache written back to main memory? And if multiple cores are operating on the same memory address, how is this time gap handled? A minimal Java sketch of the visibility problem this creates follows this list.
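
To make problem 2 concrete, here is a minimal, runnable Java sketch (the class name is invented for illustration). Without any synchronization, the reader thread is allowed to keep working from a stale cached copy of flag, so there is no guarantee when, or even whether, the writer's update becomes visible:

    public class StaleReadDemo {
        static boolean flag = false; // deliberately NOT volatile

        public static void main(String[] args) throws InterruptedException {
            Thread reader = new Thread(() -> {
                while (!flag) {
                    // May spin forever: the JIT and the hardware are both
                    // allowed to keep serving a cached value of flag.
                }
                System.out.println("reader observed flag = true");
            });
            reader.start();

            Thread.sleep(100);
            flag = true; // when does this write reach the reader? Unspecified.
            reader.join();
        }
    }

Whether this program actually hangs depends on the JIT compiler and the hardware, which is exactly the point: the write-back time is unspecified. Declaring flag volatile removes the problem.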


Different hardware structures handle the first problem differently. Let's first look at the concept of the interconnect.

The interconnect is the medium through which processors communicate with main memory and with one another. There are two basic interconnect structures: SMP (symmetric multiprocessing) and NUMA (non-uniform memory access).

SMP systems are very common because they are the easiest to build, and many small servers use this structure. Processors and memory are interconnected by a bus, and both processors and memory have bus control units responsible for sending and listening to information broadcast on the bus. At any given moment, only one processor (or storage controller) can broadcast on the bus, although all processors can listen.

It is easy to see that contention for the bus is the bottleneck of the SMP structure.


In the NUMA structure, a series of nodes are interconnected through a point-to-point network, like a small internet. Each node contains one or more processors and a local memory. One node's local memory is visible to the other nodes, and the local memories of all nodes together form a global memory shared by all processors. Note that NUMA's local memory is therefore shared rather than private, which distinguishes it from SMP. NUMA's point-to-point network requires more complex protocols than a bus, and a processor can access its own node's memory faster than the memory of other nodes. Because NUMA scales well, many medium and large servers currently use the NUMA structure.


For upper-layer programmers, the most important thing to understand is that the interconnect is a shared, limited resource: how well a program uses it directly affects execution performance.


Having looked at the different interconnect structures, let's turn to the cache coherence protocol. It mainly deals with the problem of multiple processors operating on the same main-memory address.

MESI is a mainstream cache coherence protocol that has been used in the Pentium and PowerPC processors. It defines four states for a cache block:

  • Modified: the cache block has been modified and must be written back to main memory; no other processor may cache this block.
  • Exclusive: the cache block has not been modified, and no other processor has loaded it.
  • Shared: the cache block has not been modified and may be loaded by other processors.
  • Invalid: the data in the cache block is invalid.

The following example walks through the state transitions of the MESI cache coherence protocol.

1. In step a, processor A reads data from address a, saves the data in its cache, and sets the block to Exclusive.

2. In step b, processor B tries to read from the same address a; A detects the address conflict and responds with the relevant data. Both A's and B's copies are now loaded in the Shared state.

3. In step c, B wants to write to the shared address a; it changes its block's state to Modified and broadcasts a notice reminding A to set its copy of the cache block to Invalid.

4. In step d, A tries to read from a again and broadcasts its request; B sends the modified data both to A and back to main memory, and sets both copies to Shared in response. A toy software model of this walkthrough follows the list.
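
Below is a toy Java model of these transitions, written purely for illustration: real MESI runs in the hardware cache controllers, and the bus snooping here is simulated with plain method calls on a single modeled cache block. All class and method names are invented.

    import java.util.ArrayList;
    import java.util.List;

    public class MesiDemo {

        enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

        static class Processor {
            final String name;
            final List<Processor> bus;   // all processors snooping the "bus"
            State state = State.INVALID; // state of the one modeled cache block

            Processor(String name, List<Processor> bus) {
                this.name = name;
                this.bus = bus;
                bus.add(this);
            }

            void read() {
                if (state == State.INVALID) {
                    boolean othersHaveCopy = false;
                    for (Processor p : bus) {
                        if (p != this && p.snoopRead()) othersHaveCopy = true;
                    }
                    // Exclusive if we are the only cacher, Shared otherwise.
                    state = othersHaveCopy ? State.SHARED : State.EXCLUSIVE;
                }
                System.out.println(name + " reads, state = " + state);
            }

            void write() {
                // Broadcast an invalidation so every other copy is dropped.
                for (Processor p : bus) {
                    if (p != this) p.snoopInvalidate();
                }
                state = State.MODIFIED;
                System.out.println(name + " writes, state = " + state);
            }

            // Another processor is reading this block: if we hold a dirty
            // copy, we supply the data (and write it back), then drop to
            // Shared.
            boolean snoopRead() {
                if (state == State.INVALID) return false;
                if (state == State.MODIFIED) {
                    System.out.println(name + " flushes modified data to requester and memory");
                }
                state = State.SHARED;
                return true;
            }

            void snoopInvalidate() {
                if (state != State.INVALID) {
                    state = State.INVALID;
                    System.out.println(name + " invalidates its copy");
                }
            }
        }

        public static void main(String[] args) {
            List<Processor> bus = new ArrayList<>();
            Processor a = new Processor("A", bus);
            Processor b = new Processor("B", bus);

            a.read();  // step a: A loads the block, state = EXCLUSIVE
            b.read();  // step b: A snoops and shares, both copies = SHARED
            b.write(); // step c: B broadcasts invalidation, B = MODIFIED, A = INVALID
            a.read();  // step d: B flushes to A and memory, both copies = SHARED
        }
    }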


For more details on cache coherence protocols, see http://blog.csdn.net/realxie/article/details/7317630


One of the biggest problems with a cache coherence protocol is that it can trigger a coherence traffic storm. As we saw earlier, only one processor can use the bus at a time. When a large number of cache blocks are modified, or when the same cache block is modified over and over, a large volume of coherence traffic is generated, occupying the bus and delaying other, normal read and write requests.


The most common example: if multiple threads keep issuing CAS operations on the same variable, the constant stream of modifications generates heavy coherence traffic, because every CAS operation sends a broadcast notification to the other processors, which hurts program performance. A sketch follows below.
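
Here is a minimal, runnable Java sketch of the pattern (the thread and iteration counts are arbitrary): every incrementAndGet() is a CAS loop on the same AtomicLong, so under contention each successful CAS invalidates the other cores' cached copies of that line, and the coherence traffic grows with the thread count.

    import java.util.concurrent.atomic.AtomicLong;

    public class CasContentionDemo {
        static final AtomicLong counter = new AtomicLong();

        public static void main(String[] args) throws InterruptedException {
            int threads = Runtime.getRuntime().availableProcessors();
            Thread[] workers = new Thread[threads];
            for (int i = 0; i < threads; i++) {
                workers[i] = new Thread(() -> {
                    for (int j = 0; j < 1_000_000; j++) {
                        counter.incrementAndGet(); // a CAS on one shared address
                    }
                });
                workers[i].start();
            }
            for (Thread t : workers) {
                t.join();
            }
            System.out.println("counter = " + counter.get());
        }
    }

The result is always correct; timing this loop with different thread counts is one way to make the cost of the contention visible.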


We will look at how to optimize this pattern in a later article.


As for the second problem, the time gap between the cache and main memory is usually handled with memory barriers. A dedicated article will cover them; a short sketch follows below.
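
As a preview, one hedged sketch of how Java surfaces these barriers to the programmer: informally, a volatile write acts like a store barrier and a volatile read like a load barrier (the exact instructions vary by platform), which is enough to guarantee that the reader below prints data = 42 once it sees ready = true.

    public class BarrierDemo {
        static int data = 0;
        static volatile boolean ready = false; // the volatile access is the barrier

        public static void main(String[] args) {
            new Thread(() -> {
                data = 42;    // ordinary write...
                ready = true; // ...published by the volatile store
            }).start();

            new Thread(() -> {
                while (!ready) { /* volatile load on every iteration */ }
                System.out.println("data = " + data); // guaranteed to print 42
            }).start();
        }
    }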


Please indicate the source when reprinting: http://blog.csdn.net/iter_zc


