False sharing and cache line padding, from Java 6 and Java 7 to Java 8

Much has already been written about false sharing. In multi-threaded programming, especially when several threads process lists and arrays, you need to watch out for it; otherwise the multi-threaded version may run slower than a single thread. The techniques for reducing false sharing differ from one Java version to the next, and code that works on one version can silently stop working on another, so always test. This article is a summary.

 

What is false sharing?

The clearest explanation of false sharing I have found is the article "Dissecting the Disruptor: Why Is It So Fast? (3) False Sharing":

 

The cache system stores data in units called cache lines. A cache line is a power-of-2 number of contiguous bytes, typically 32 to 256 bytes; the most common cache line size is 64 bytes. When multiple threads modify mutually independent variables that happen to share the same cache line, they unwittingly degrade each other's performance. This is false sharing. Write contention on cache lines is the single most limiting factor for the scalability of parallel threads running on an SMP system. Some people describe false sharing as a silent performance killer, because it is far from obvious from looking at the code whether it will occur.

To make scalability grow linearly with the number of threads, you must ensure that no two threads write to the same variable or the same cache line. Two threads writing to the same variable can be found by reading the code. To determine whether independent variables share a cache line, you need to know the memory layout or use a tool that reports it. Intel VTune is one such profiling tool. In this article, I will explain the memory layout of Java objects and how to pad cache lines to avoid false sharing.

Figure 1 (in the quoted article) illustrates the problem of false sharing. The thread running on core 1 wants to update variable X, and the thread on core 2 wants to update variable Y. Unfortunately, these two variables are in the same cache line. Each thread has to compete for ownership of the cache line in order to update its variable. If core 1 gains ownership, the cache subsystem invalidates the corresponding cache line on core 2. When core 2 then gains ownership and performs its update, core 1 is told to invalidate its copy of the cache line. The line ping-pongs back and forth through the L3 cache, which greatly hurts performance. If the competing cores sit on different sockets, the traffic must also cross the socket interconnect, and the problem can be even worse.
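
The "same cache line" condition described above is just integer division on addresses. The following is a minimal sketch of that arithmetic, assuming a 64-byte line; the class name CacheLineMath and the addresses are made up for illustration and do not come from the quoted article.

```java
// Minimal sketch: which 64-byte cache line does a byte address fall into?
// The line size and the addresses below are illustrative assumptions.
final class CacheLineMath {
    static final int LINE_SIZE = 64; // assumed; the most common size

    // Index of the cache line that contains the given byte address.
    static long lineIndex(long address) {
        return address / LINE_SIZE;
    }

    // True if both addresses land in the same cache line.
    static boolean sameLine(long a, long b) {
        return lineIndex(a) == lineIndex(b);
    }

    public static void main(String[] args) {
        long x = 0x1000;  // hypothetical address of variable X
        long y = x + 8;   // Y is stored right after X
        long z = x + 64;  // Z is one full line further on
        System.out.println(sameLine(x, y)); // true: writes to X and Y contend
        System.out.println(sameLine(x, z)); // false: X and Z are independent
    }
}
```

Two adjacent long fields are only 8 bytes apart, so unless something is placed between them they will almost always share a line.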

 

Solution in Java 6

The solution to false sharing is cache line padding: pad the object so that it occupies exactly 64 bytes of memory, or an integer multiple of 64 bytes. This keeps two such objects from ending up in the same cache line. "Dissecting the Disruptor: Why Is It So Fast? (3) False Sharing" gives an example of cache line padding:

public final class FalseSharing
    implements Runnable
{
    public final static int NUM_THREADS = 4; // change
    public final static long ITERATIONS = 500L * 1000L * 1000L;
    private final int arrayIndex;

    private static VolatileLong[] longs = new VolatileLong[NUM_THREADS];
    static
    {
        for (int i = 0; i < longs.length; i++)
        {
            longs[i] = new VolatileLong();
        }
    }

    public FalseSharing(final int arrayIndex)
    {
        this.arrayIndex = arrayIndex;
    }

    public static void main(final String[] args) throws Exception
    {
        final long start = System.nanoTime();
        runTest();
        System.out.println("duration = " + (System.nanoTime() - start));
    }

    private static void runTest() throws InterruptedException
    {
        Thread[] threads = new Thread[NUM_THREADS];

        for (int i = 0; i < threads.length; i++)
        {
            threads[i] = new Thread(new FalseSharing(i));
        }

        for (Thread t : threads)
        {
            t.start();
        }

        for (Thread t : threads)
        {
            t.join();
        }
    }

    public void run()
    {
        long i = ITERATIONS + 1;
        while (0 != --i)
        {
            longs[arrayIndex].value = i;
        }
    }

    public final static class VolatileLong
    {
        public volatile long value = 0L;
        public long p1, p2, p3, p4, p5, p6; // comment out
    }
}

 

VolatileLong adds six unused fields, p1 through p6. Together with the 8-byte value field and the object header, which also occupies 8 bytes, they bring the memory occupied by the object up to 64 bytes (or an integer multiple of 64 bytes). This avoids loading more than one object into a cache line. However, this approach only works on Java 6 and earlier.
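
The 64-byte figure can be checked with simple arithmetic. The sketch below is a rough estimate, not a measurement: it assumes the 8-byte object header stated above (a 32-bit JVM) and HotSpot's default 8-byte object alignment, and the class name PaddedSizeEstimate is invented for this example.

```java
// Rough size estimate under the article's assumptions; not a real JVM measurement.
final class PaddedSizeEstimate {
    static final int HEADER_BYTES = 8; // object header on a 32-bit JVM, per the article
    static final int ALIGNMENT = 8;    // HotSpot's default object alignment

    // Estimated size of an object that holds only `longFields` long fields.
    static int estimate(int longFields) {
        int raw = HEADER_BYTES + longFields * Long.BYTES;
        // Round up to the next multiple of the alignment.
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    public static void main(String[] args) {
        System.out.println(estimate(1)); // value only: 16
        System.out.println(estimate(7)); // value + p1..p6: 64
        System.out.println(estimate(9)); // value + p1..p8: 80
    }
}
```

Without padding, four 16-byte objects fit in one 64-byte line, which is exactly the situation the benchmark provokes. For real layouts, which depend on the JVM and its settings, a layout inspection tool such as OpenJDK's JOL is a more reliable source than arithmetic like this.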

 

(Note: if the object is larger than 64 bytes, for example if 16 more bytes are added by declaring public long p1, p2, p3, p4, p5, p6, p7, p8;, false sharing should in theory also be avoided. In practice, however, execution was still several times slower in this case, only better than with no padding at all. I have not figured out why. Judging by these tests, the object size has to be an integer multiple of 64 bytes.)

 

Solution in Java 7

The example above no longer works in Java 7, because Java 7 optimizes away unused fields; see "False Sharing & Java 7".

 

As a result, padding cache lines is harder in Java 7. You need to use inheritance to keep the padding from being optimized away. I did not find the example in "False Sharing & Java 7" very good, so I reworked it to make it more general:

// FalseSharing.java
public final class FalseSharing implements Runnable {
    public static int NUM_THREADS = 4; // change
    public final static long ITERATIONS = 500L * 1000L * 1000L;
    private final int arrayIndex;
    private static VolatileLong[] longs;

    public FalseSharing(final int arrayIndex) {
        this.arrayIndex = arrayIndex;
    }

    public static void main(final String[] args) throws Exception {
        Thread.sleep(10000);
        System.out.println("starting....");
        // Optional first argument overrides the number of threads.
        if (args.length == 1) {
            NUM_THREADS = Integer.parseInt(args[0]);
        }

        longs = new VolatileLong[NUM_THREADS];
        for (int i = 0; i < longs.length; i++) {
            longs[i] = new VolatileLong();
        }
        final long start = System.nanoTime();
        runTest();
        System.out.println("duration = " + (System.nanoTime() - start));
    }

    private static void runTest() throws InterruptedException {
        Thread[] threads = new Thread[NUM_THREADS];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new FalseSharing(i));
        }
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
    }

    public void run() {
        // Each thread repeatedly writes to its own VolatileLong.
        long i = ITERATIONS + 1;
        while (0 != --i) {
            longs[arrayIndex].value = i;
        }
    }
}

// VolatileLongPadding.java
public class VolatileLongPadding {
    public volatile long p1, p2, p3, p4, p5, p6; // note
}

// VolatileLong.java
public class VolatileLong extends VolatileLongPadding {
    public volatile long value = 0L;
}

 

Putting the padding in a base class keeps it from being optimized away. (There does not seem to be any principled reason for this; it simply happens to sidestep Java 7's memory optimization.) Still, this approach is rather awkward. To borrow another blogger's words: being a Java programmer is really not easy.
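
The base-class trick can be extended to pad both sides of the hot field. The following is only a sketch of that variation, not code from the article or from "False Sharing & Java 7": all class names are hypothetical, and it relies on the assumption that HotSpot lays out superclass fields before subclass fields, so the value ends up between the two padding blocks.

```java
// Sketch only: class names are hypothetical and not from the original article.
// Assumes the JVM places superclass fields before subclass fields.
class LeftPadding {
    protected long p1, p2, p3, p4, p5, p6, p7; // padding in front of the hot field
}

class HotValue extends LeftPadding {
    protected volatile long value; // the field that threads actually write
}

class RightPadding extends HotValue {
    protected long q1, q2, q3, q4, q5, q6, q7; // padding behind the hot field
}

final class PaddedLong extends RightPadding {
    long get() {
        return value;
    }

    void set(long v) {
        value = v;
    }

    public static void main(String[] args) {
        PaddedLong counter = new PaddedLong();
        counter.set(42L);
        System.out.println(counter.get()); // 42
    }
}
```

Padding on both sides costs more memory per object, but it does not depend on what the JVM happens to allocate next to the object. As with every variant here, whether it helps has to be confirmed by measuring on the target JVM.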

 

 

Solution in Java 8

In Java 8, cache line padding is finally supported natively. Java 8 adds the @Contended annotation, and the JVM automatically pads annotated classes out to cache lines. The preceding example can be changed to:

// FalseSharing.java
public final class FalseSharing implements Runnable {
    public static int NUM_THREADS = 4; // change
    public final static long ITERATIONS = 500L * 1000L * 1000L;
    private final int arrayIndex;
    private static VolatileLong[] longs;

    public FalseSharing(final int arrayIndex) {
        this.arrayIndex = arrayIndex;
    }

    public static void main(final String[] args) throws Exception {
        Thread.sleep(10000);
        System.out.println("starting....");
        if (args.length == 1) {
            NUM_THREADS = Integer.parseInt(args[0]);
        }

        longs = new VolatileLong[NUM_THREADS];
        for (int i = 0; i < longs.length; i++) {
            longs[i] = new VolatileLong();
        }
        final long start = System.nanoTime();
        runTest();
        System.out.println("duration = " + (System.nanoTime() - start));
    }

    private static void runTest() throws InterruptedException {
        Thread[] threads = new Thread[NUM_THREADS];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new FalseSharing(i));
        }
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
    }

    public void run() {
        long i = ITERATIONS + 1;
        while (0 != --i) {
            longs[arrayIndex].value = i;
        }
    }
}

// VolatileLong.java
import sun.misc.Contended;

@Contended
public class VolatileLong {
    public volatile long value = 0L;
}

 

At run time you must add the JVM option -XX:-RestrictContended for the @Contended annotation to take effect. Many articles omit this, so their examples do not actually work.
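
Because a missing flag fails silently, a benchmark can check for it at startup. The sketch below is one possible way to do that, using the standard RuntimeMXBean API to read the JVM's input arguments; the class name ContendedFlagCheck and the warning text are invented for this example, and the check only sees flags that the JVM reports as input arguments.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Sketch: warn if the JVM was started without -XX:-RestrictContended.
final class ContendedFlagCheck {
    // True if the given JVM arguments disable the @Contended restriction.
    static boolean restrictContendedDisabled(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:-RestrictContended");
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (restrictContendedDisabled(jvmArgs)) {
            System.out.println("-XX:-RestrictContended is set");
        } else {
            System.out.println("Warning: -XX:-RestrictContended is not set; "
                    + "@Contended on application classes will be ignored");
        }
    }
}
```

With the benchmark above, a launch would then look like java -XX:-RestrictContended FalseSharing 4, where the trailing 4 is the thread count read from args[0].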

 

The @Contended annotation can also be applied to fields. I will write a separate article later that describes its usage in detail.

 

(Note: the code above was tested on a 32-bit JDK. On a 64-bit JDK the object header size is different; I will retest when I have time.)

 

References

http://mechanical-sympathy.blogspot.com/2011/07/false-sharing.html

http://mechanical-sympathy.blogspot.hk/2011/08/false-sharing-java-7.html

http://robsjava.blogspot.com/2014/03/what-is-false-sharing.html
