Intuition tells us that atomic variables are always faster than synchronized operations. I believed that too, until a test I ran while implementing an ID generator showed that it is not always the case.
Test code:
```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentAdder {

    private static final AtomicInteger ATOMIC_INTEGER = new AtomicInteger(0);
    private static int count = 0;
    private static final Object LOCK = new Object();
    private static volatile long start;

    public static void main(final String[] args) {
        // how many increments each thread performs
        int round = 10000000;
        // number of threads
        int threadN = 20;
        start = System.currentTimeMillis();
        // comment out one of the two calls to benchmark the other
        atomicAdder(threadN, round);
        syncAdder(threadN, round);
    }

    static void atomicAdder(int threadN, int addTimes) {
        int stop = threadN * addTimes;
        List<Thread> list = new ArrayList<Thread>();
        for (int i = 0; i < threadN; i++) {
            list.add(startAtomic(addTimes, stop));
        }
        for (Thread each : list) {
            each.start();
        }
    }

    static Thread startAtomic(final int addTimes, final int stop) {
        Thread ret = new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < addTimes; i++) {
                    int v = ATOMIC_INTEGER.incrementAndGet();
                    if (stop == v) {
                        System.out.println("value: " + v);
                        System.out.println("elapsed (ms): " + (System.currentTimeMillis() - start));
                        System.exit(1);
                    }
                }
            }
        });
        ret.setDaemon(false);
        return ret;
    }

    static void syncAdder(int threadN, int addTimes) {
        int stop = threadN * addTimes;
        List<Thread> list = new ArrayList<Thread>();
        for (int i = 0; i < threadN; i++) {
            list.add(startSync(addTimes, stop));
        }
        for (Thread each : list) {
            each.start();
        }
    }

    static Thread startSync(final int addTimes, final int stop) {
        Thread ret = new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < addTimes; i++) {
                    synchronized (LOCK) {
                        count++;
                        if (stop == count) {
                            System.out.println("value: " + count);
                            System.out.println("elapsed (ms): " + (System.currentTimeMillis() - start));
                            System.exit(1);
                        }
                    }
                }
            }
        });
        ret.setDaemon(false);
        return ret;
    }
}
```
This is a very simple accumulator: n threads increment a shared counter concurrently, each thread incrementing it r times.
To benchmark each path separately, comment out one of the two calls and run them one at a time:

atomicAdder(threadN, round); // atomic increment
syncAdder(threadN, round);   // synchronized increment
My machine: i5-2520M, 2.5 GHz, four cores.
n = 20
r = 10000000
Results:
Atomic increment: 15344 ms
Synchronized increment: 10647 ms
Which raises the question: why is the synchronized version roughly 50% faster than the atomic one?
A quick review of how locking works in Java (the built-in synchronized monitor behaves much like an explicit lock): a thread that wants a lock first checks whether the lock is held. If it is, the thread joins the lock's wait queue; if not, it acquires the lock.
In this test, each thread that holds the lock increments the counter and then immediately tries to reacquire the lock, before the waiting threads have even been woken up, so the current thread grabs the lock again. This is the starvation problem that unfair locks can cause.
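That acquire path can be illustrated with a minimal test-and-set spinlock sketch (my own illustration of the concept, not how HotSpot monitors are actually implemented; real monitors park waiting threads rather than spin):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal test-and-set spinlock: check whether the lock is held and,
// if not, take it atomically; otherwise keep retrying. A thread that
// releases and immediately calls lock() again usually wins the race
// before any waiter notices the release -- the unfairness described above.
class SpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        // CAS from "free" to "held"; busy-wait while another thread holds it
        while (!held.compareAndSet(false, true)) {
            Thread.yield(); // back off instead of burning the CPU
        }
    }

    void unlock() {
        held.set(false); // volatile-style write makes the release visible
    }
}
```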
But can this alone explain a 50% performance gap? In theory, at any moment some thread is always incrementing successfully, so the total time of the two versions should be roughly the same.
So what speeds up the synchronized version, or rather, what slows down the atomic one?
Next, I ran perf against each version separately.
The first run is the atomic version; the second is the synchronized version.
```
$ perf stat -e cs -e L1-dcache-load-misses java ConcurrentAdder 1 1000000
value: 100000000
elapsed (ms): 8580

 Performance counter stats for 'java ConcurrentAdder 1 1000000':

            21,841 cs
       233,140,754 L1-dcache-load-misses

       8.633037253 seconds time elapsed

$ perf stat -e cs -e L1-dcache-load-misses java ConcurrentAdder 2 1000000
value: 100000000
elapsed (ms): 5749

 Performance counter stats for 'java ConcurrentAdder 2 1000000':

            55,522 cs
        28,160,673 L1-dcache-load-misses

       5.811499179 seconds time elapsed
```
As expected, the synchronized version has more context switches than the atomic one, which is understandable: the lock itself causes threads to be parked and switched out.
On the other hand, the atomic version suffers an order of magnitude more L1 data-cache misses than the synchronized one.
That was the revelation: atomic operations generate cache-coherence traffic, so the cache line holding the counter is invalidated constantly. For the MESI cache-coherence protocol, see: http://en.wikipedia.org/wiki/MESI_protocol
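Conceptually, incrementAndGet is a compare-and-set (CAS) retry loop, and every successful CAS is a write that invalidates the counter's cache line on all other cores, which is where those L1 misses come from. A sketch of the idea (my own illustration, not the JDK source):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasSketch {
    // Conceptual equivalent of AtomicInteger.incrementAndGet: read the
    // current value, then try to CAS it to current + 1, retrying on failure.
    // Each successful CAS writes the variable, invalidating the cache line
    // on every other core (MESI), so contended readers keep missing L1.
    static int incrementAndGet(AtomicInteger counter) {
        for (;;) {
            int current = counter.get(); // may be an L1 miss if another core just wrote
            int next = current + 1;
            if (counter.compareAndSet(current, next)) {
                return next;
            }
            // CAS failed: another thread won the race; retry against the new value
        }
    }
}
```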
The synchronized version, by contrast, tends to reacquire the lock repeatedly on the same CPU within a time slice, so its cached copy of the counter is not invalidated.
Printing the thread ID on every increment confirms this: the atomic version's increments are spread across threads far more evenly.
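That observation can be reproduced with a small sketch (my own, with arbitrary thread and sample counts) that records which thread performed each synchronized increment and counts how often consecutive increments came from the same thread; under the unfair monitor this count tends to be high. Exact numbers vary by machine and run:

```java
public class RunLengthDemo {
    static final Object LOCK = new Object();

    // Records the id of the thread performing each of `samples` synchronized
    // increments, then returns how many consecutive increments came from the
    // same thread. A high value means one thread kept re-winning the lock.
    static int sameThreadRuns(int threads, final int samples) throws InterruptedException {
        final long[] owners = new long[samples];
        final int[] count = {0};
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(new Runnable() {
                public void run() {
                    while (true) {
                        synchronized (LOCK) {
                            if (count[0] >= samples) return;
                            owners[count[0]++] = Thread.currentThread().getId();
                        }
                    }
                }
            });
            ts[t].start();
        }
        for (Thread t : ts) t.join();
        int same = 0;
        for (int i = 1; i < samples; i++) {
            if (owners[i] == owners[i - 1]) same++;
        }
        return same;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("consecutive same-thread increments: "
                + sameThreadRuns(4, 100000) + " / 99999");
    }
}
```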
Back to the original question: why do we assume atomic operations are faster than locking? The example in this article is very special. In a typical business scenario, after one increment a thread runs through plenty of other business logic before incrementing again, spanning many CPU time slices. A synchronized counter then rarely gets to reacquire the lock immediately, so it pays both the cost of waiting for the lock and the cache-coherence cost. In the general case, therefore, the synchronized counter is much slower.
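When a heavily contended counter really is the bottleneck, JDK 8's LongAdder (not used in the article's test, but directly relevant to it) takes a third route: it stripes the count across several internal cells so that concurrent threads mostly write different cache lines, trading read cost for write throughput. A minimal sketch:

```java
import java.util.concurrent.atomic.LongAdder;

public class LongAdderDemo {
    // Increments a LongAdder from `threads` threads, `perThread` times each,
    // then returns the exact total. Under contention each thread mostly hits
    // its own internal cell, avoiding the single-cache-line ping-pong that
    // a shared AtomicInteger suffers from.
    static long parallelCount(int threads, final int perThread) throws InterruptedException {
        final LongAdder adder = new LongAdder();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        adder.increment();
                    }
                }
            });
            ts[t].start();
        }
        for (Thread t : ts) t.join();
        return adder.sum(); // exact once all updating threads have finished
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(parallelCount(4, 1000000)); // prints 4000000
    }
}
```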
On the efficiency of Java atomic variables and synchronization: a result that overturns your assumptions