False sharing and cache line padding, from Java 6 and Java 7 to Java 8


Much has been written about false sharing. In multithreaded programming, especially when multiple threads work on lists and arrays, we should pay close attention to false sharing; otherwise we not only fail to realize the benefits of multithreading, performance can even end up worse than a single-threaded version. As Java has evolved, the technique for reducing false sharing has changed from version to version, and code that used to work may quietly stop working, so be careful to test. This article summarizes the approaches.

What is false sharing

The clearest explanation of false sharing I have seen is the article "Dissecting the Disruptor: Why is it so fast? (Part 3) False sharing". I will quote its explanation of false sharing directly:

Caches are organized in cache lines. A cache line is a power-of-two number of contiguous bytes, typically 32 to 256 bytes; the most common cache line size is 64 bytes. When multiple threads modify variables that are logically independent of each other, but those variables happen to share the same cache line, the threads inadvertently degrade each other's performance. This is false sharing. Write contention on cache lines is the most important limiting factor on the scalability of parallel threads running in an SMP system. Some people describe false sharing as a silent performance killer, because it is hard to tell from the code whether false sharing is occurring.

For scalability to grow linearly with the number of threads, you must ensure that no two threads write to the same variable or to the same cache line. Two threads writing the same variable can be spotted in the code; but to determine whether separate variables share a cache line, you need to know the memory layout, or find a tool that tells you. Intel VTune is one such analysis tool. In this article I will explain the memory layout of Java objects and how we can pad cache lines to avoid false sharing.

Figure 1 illustrates the false sharing problem. A thread running on core 1 wants to update variable X, while a thread on core 2 wants to update variable Y. Unfortunately, the two variables lie in the same cache line. Each thread must compete for ownership of the cache line in order to update its variable. If core 1 gains ownership, the cache subsystem invalidates the corresponding cache line on core 2; when core 2 then gains ownership and performs its update, core 1 has to invalidate its copy in turn. Ownership bounces back and forth through the L3 cache, which severely hurts performance. If the competing cores sit in different sockets, the traffic must additionally cross the socket interconnect, and the problem gets even worse.

The approach under Java 6

The way to eliminate false sharing is cache line padding: pad an object so that it occupies exactly 64 bytes (or an integer multiple of 64 bytes), which guarantees that no cache line holds more than one such object. "Dissecting the Disruptor: Why is it so fast? (Part 3) False sharing" gives an example of cache line padding:

public final class FalseSharing implements Runnable {
    public final static int NUM_THREADS = 4; // change to match the number of cores
    public final static long ITERATIONS = 500L * 1000L * 1000L;
    private final int arrayIndex;

    private static VolatileLong[] longs = new VolatileLong[NUM_THREADS];
    static {
        for (int i = 0; i < longs.length; i++) {
            longs[i] = new VolatileLong();
        }
    }

    public FalseSharing(final int arrayIndex) {
        this.arrayIndex = arrayIndex;
    }

    public static void main(final String[] args) throws Exception {
        final long start = System.nanoTime();
        runTest();
        System.out.println("duration = " + (System.nanoTime() - start));
    }

    private static void runTest() throws InterruptedException {
        Thread[] threads = new Thread[NUM_THREADS];

        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new FalseSharing(i));
        }

        for (Thread t : threads) {
            t.start();
        }

        for (Thread t : threads) {
            t.join();
        }
    }

    public void run() {
        long i = ITERATIONS + 1;
        while (0 != --i) {
            longs[arrayIndex].value = i;
        }
    }

    public final static class VolatileLong {
        public volatile long value = 0L;
        public long p1, p2, p3, p4, p5, p6; // comment out to see the effect of false sharing
    }
}

VolatileLong is padded with the unused fields p1 through p6; together with the 8 bytes taken by the object header, this stretches the object to exactly 64 bytes (or an integer multiple of 64 bytes), which prevents two of these objects from being loaded into the same cache line. However, this technique only works on Java 6 and earlier.

(Note: if the padding pushes the object past a multiple of 64 bytes, for example adding 16 extra bytes with public long p1, p2, p3, p4, p5, p6, p7, p8;, it should in theory still avoid false sharing. In my tests, however, that version ran several times slower than the one with no padding at all, and I have not yet figured out why. So, based on these tests, pad to an exact integer multiple of 64 bytes.)
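To check what layout the JVM actually produces, the object can be inspected with OpenJDK's JOL tool. The original article does not mention JOL; the following is just a minimal sketch of how such a check might look, assuming the org.openjdk.jol:jol-core dependency is on the classpath:

import org.openjdk.jol.info.ClassLayout;

public class LayoutCheck {

    // Same shape as the padded VolatileLong from the Java 6 example above.
    public static final class VolatileLong {
        public volatile long value = 0L;
        public long p1, p2, p3, p4, p5, p6;
    }

    public static void main(String[] args) {
        // Prints the header size, field offsets, and total instance size,
        // so you can see whether an instance really spans 64 bytes
        // and whether the padding fields survived.
        System.out.println(ClassLayout.parseClass(VolatileLong.class).toPrintable());
    }
}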

The approach under Java 7

The example above no longer works under Java 7, because Java 7 optimizes away unused fields; see "False Sharing && Java 7".

As a result, cache line padding becomes more cumbersome under Java 7: you have to use inheritance to keep the padding from being optimized away. I did not find the example in "False Sharing && Java 7" very good, so I reworked it to make it more generic:

public final class FalseSharing implements Runnable {
    public static int NUM_THREADS = 4; // change to match the number of cores
    public final static long ITERATIONS = 500L * 1000L * 1000L;
    private final int arrayIndex;

    private static VolatileLong[] longs;

    public FalseSharing(final int arrayIndex) {
        this.arrayIndex = arrayIndex;
    }

    public static void main(final String[] args) throws Exception {
        Thread.sleep(10000);
        System.out.println("starting....");
        if (args.length == 1) {
            NUM_THREADS = Integer.parseInt(args[0]);
        }

        longs = new VolatileLong[NUM_THREADS];
        for (int i = 0; i < longs.length; i++) {
            longs[i] = new VolatileLong();
        }

        final long start = System.nanoTime();
        runTest();
        System.out.println("duration = " + (System.nanoTime() - start));
    }

    private static void runTest() throws InterruptedException {
        Thread[] threads = new Thread[NUM_THREADS];

        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new FalseSharing(i));
        }

        for (Thread t : threads) {
            t.start();
        }

        for (Thread t : threads) {
            t.join();
        }
    }

    public void run() {
        long i = ITERATIONS + 1;
        while (0 != --i) {
            longs[arrayIndex].value = i;
        }
    }
}
public class VolatileLongPadding {
    public volatile long p1, p2, p3, p4, p5, p6; // comment out to see the effect of false sharing
}

public class VolatileLong extends VolatileLongPadding {
    public volatile long value = 0L;
}

Placing the padding fields in a base class keeps them from being optimized away. (There is no deep meaning to this; it is simply a quirk of Java 7's field-elimination optimization that inheritance happens to sidestep.) Still, the approach is a bit annoying; to borrow another blogger's words: it's hard being a Java programmer.

The approach under Java 8

In Java 8, cache line padding finally gets native support: Java 8 adds the @Contended annotation, and a class or field carrying this annotation is automatically padded onto its own cache line. The example above can be rewritten as:

public final class FalseSharing implements Runnable {
    public static int NUM_THREADS = 4; // change to match the number of cores
    public final static long ITERATIONS = 500L * 1000L * 1000L;
    private final int arrayIndex;

    private static VolatileLong[] longs;

    public FalseSharing(final int arrayIndex) {
        this.arrayIndex = arrayIndex;
    }

    public static void main(final String[] args) throws Exception {
        Thread.sleep(10000);
        System.out.println("starting....");
        if (args.length == 1) {
            NUM_THREADS = Integer.parseInt(args[0]);
        }

        longs = new VolatileLong[NUM_THREADS];
        for (int i = 0; i < longs.length; i++) {
            longs[i] = new VolatileLong();
        }

        final long start = System.nanoTime();
        runTest();
        System.out.println("duration = " + (System.nanoTime() - start));
    }

    private static void runTest() throws InterruptedException {
        Thread[] threads = new Thread[NUM_THREADS];

        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new FalseSharing(i));
        }

        for (Thread t : threads) {
            t.start();
        }

        for (Thread t : threads) {
            t.join();
        }
    }

    public void run() {
        long i = ITERATIONS + 1;
        while (0 != --i) {
            longs[arrayIndex].value = i;
        }
    }
}
import sun.misc.Contended;

@Contended
public class VolatileLong {
    public volatile long value = 0L;
}

When running, the JVM flag -XX:-RestrictContended must be supplied, otherwise the @Contended annotation has no effect on application classes. Many articles leave this out, and their examples therefore don't actually work.
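Because a missing flag fails silently, a small startup check can save some head-scratching. The following sketch is my own addition, not from the original article; it only inspects the explicit JVM input arguments, so treat it as a heuristic rather than a definitive test:

import java.lang.management.ManagementFactory;
import java.util.List;

public class ContendedFlagCheck {
    public static void main(String[] args) {
        // The JVM arguments as passed on the command line.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (!jvmArgs.contains("-XX:-RestrictContended")) {
            System.err.println("Warning: -XX:-RestrictContended was not passed; "
                    + "@Contended on application classes will be ignored.");
        }
    }
}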

@Contended can also be applied to individual fields; I will cover that usage in detail in a later article.
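As a rough preview of the field-level form (my own sketch, not the promised article): @Contended can be placed on individual fields, and an optional group name puts several fields into one padded region that is isolated from everything outside the group. The class and field names below are made up for illustration:

import sun.misc.Contended;

// Still requires -XX:-RestrictContended for application classes.
public class Counters {

    // An annotated field is isolated in its own padded region.
    @Contended
    public volatile long readCount;

    // Fields sharing a group name are padded together as a group,
    // isolated from fields outside the group.
    @Contended("writerGroup")
    public volatile long writeCount;

    @Contended("writerGroup")
    public volatile long flushCount;

    // An unannotated field gets no special padding.
    public volatile long misc;
}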

Reference

http://mechanical-sympathy.blogspot.com/2011/07/false-sharing.html

http://mechanical-sympathy.blogspot.hk/2011/08/false-sharing-java-7.html

http://robsjava.blogspot.com/2014/03/what-is-false-sharing.html
