Talk about high concurrency (vii) Implementation of several spin locks (ii)


The previous article, Talking about high concurrency (VI): Implementation of several spin locks (I), implemented two basic spin locks: TASLock and TTASLock. Their problem is that frequent CAS operations generate a large amount of cache-coherence traffic, resulting in poor lock performance.


An improvement over TTASLock is BackoffLock, which backs a thread off under high contention, reducing both the contention and the cache-coherence traffic. But BackoffLock still has three fundamental problems:

1. There is still considerable cache-coherence traffic: since all threads spin on the same shared variable, every successful lock acquisition generates coherence traffic.

2. Because of the backoff, a thread cannot learn promptly that the lock has been released; this delay lengthens the time it takes to acquire the lock.

3. There is no starvation guarantee: some threads may never acquire the lock.


This article implements two queue-based locks that address the three issues above.

The basic idea is to organize the threads into a queue, which brings four advantages:

1. Each thread only needs to check the state of its predecessor, scattering the spin from one shared variable to many and reducing cache-coherence traffic.

2. A thread learns of the lock's release promptly, because its predecessor hands the lock directly to it.

3. The queue provides first-come, first-served fairness.

4. No starvation: every thread in the queue is guaranteed its turn to run.


Queue locks fall into two categories: one based on bounded queues, the other on unbounded queues.
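Both queue locks below implement a small Lock interface that the article never shows. A minimal sketch of that interface, exercised by a trivial TASLock in the style of the previous article (the interface shape, class names, and the smoke test are assumptions, not the article's code), might look like this:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Assumed minimal Lock interface: the article's locks only need these two methods.
interface Lock {
    void lock();
    void unlock();
}

// A TASLock in the style of the previous article: spin on getAndSet until
// we are the thread that flips the state from false to true.
class TASLock implements Lock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    public void lock() {
        while (state.getAndSet(true)) {} // spin until acquisition succeeds
    }

    public void unlock() {
        state.set(false);
    }
}

public class LockDemo {
    static int counter = 0; // deliberately unsynchronized; the lock protects it

    public static void main(String[] args) throws InterruptedException {
        final Lock lock = new TASLock();
        Thread[] ts = new Thread[8];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < 1000; j++) {
                        lock.lock();
                        counter++;
                        lock.unlock();
                    }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        // 8 threads x 1000 increments; mutual exclusion makes this exact
        System.out.println(counter);
    }
}
```

If the lock provides mutual exclusion, the program prints 8000 every run; with no lock the count would usually come up short.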


First, look at the lock based on a bounded queue. ArrayLock has three features:

1. Threads are organized by a volatile array.

2. An atomic variable, tail, marks the tail of the queue.

3. A ThreadLocal variable gives each thread an index indicating its position in the queue.


package com.test.lock;

import java.util.concurrent.atomic.AtomicInteger;

/**
 * Bounded queue lock, using a volatile array to organize threads.
 * Drawbacks: the capacity n must be known in advance, and no more than n
 * threads may contend for the same lock at the same time.
 * With L locks, the space complexity is O(Ln).
 */
public class ArrayLock implements Lock {

    // volatile array holding the lock flags;
    // flags[i] == true means the thread in slot i may acquire the lock
    private volatile boolean[] flags;

    // points to the slot for the next arriving thread
    private AtomicInteger tail;

    // total capacity
    private final int capacity;

    private ThreadLocal<Integer> mySlotIndex = new ThreadLocal<Integer>() {
        protected Integer initialValue() {
            return 0;
        }
    };

    public ArrayLock(int capacity) {
        this.capacity = capacity;
        flags = new boolean[capacity];
        tail = new AtomicInteger(0);
        // by default the first slot may acquire the lock
        flags[0] = true;
    }

    @Override
    public void lock() {
        int slot = tail.getAndIncrement() % capacity;
        mySlotIndex.set(slot);
        // flags[slot] == true indicates the lock has been acquired;
        // the volatile variable guarantees the release is seen promptly
        while (!flags[slot]) {}
    }

    @Override
    public void unlock() {
        int slot = mySlotIndex.get();
        flags[slot] = false;
        flags[(slot + 1) % capacity] = true;
    }

    public String toString() {
        return "ArrayLock";
    }
}


As can be seen, the drawbacks of the bounded queue lock are:

1. The number of threads n must be known in advance; if more than n threads contend for the same lock, thread state in the array is overwritten.

2. The space complexity is O(Ln).

3. The threads' lock states are stored in one shared volatile array, so cache-coherence problems remain. When the CPU reads memory, it reads the width of the data bus at a time: a 64-bit bus reads 64 bits per access.

So for an array of type boolean, where each boolean occupies 1 byte, a single read fetches 8 boolean variables, and those 64 bits fit in one cache line, meaning 8 boolean variables can share a cache line. When a CAS operation modifies one of those variables and invalidates the line, all 8 variables are effectively invalidated together.

The solution is to space the flags 8 slots apart, e.g. flags[0] for thread 1 and flags[8] for thread 2. The trade-off is more space.
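As a sketch of that padding idea (the stride of 8 follows the article's figure; on real hardware a cache line is typically 64 bytes, so a larger stride would be needed in practice, and all names here are illustrative):

```java
// Illustrative sketch: pad the flag array so adjacent slots do not share
// a cache line. STRIDE = 8 follows the article's 64-bit figure; padding to
// a full 64-byte line (STRIDE = 64 for booleans) is the more common choice.
class PaddedFlags {
    private static final int STRIDE = 8;
    private final boolean[] flags;

    PaddedFlags(int capacity) {
        // capacity logical slots, each occupying STRIDE physical entries
        flags = new boolean[capacity * STRIDE];
    }

    // logical slot i lives at flags[i * STRIDE];
    // the entries in between are padding and never written
    boolean get(int i)         { return flags[i * STRIDE]; }
    void set(int i, boolean v) { flags[i * STRIDE] = v; }
}

public class PaddingDemo {
    public static void main(String[] args) {
        PaddedFlags f = new PaddedFlags(4);
        f.set(0, true);
        System.out.println(f.get(0));
        System.out.println(f.get(1));
    }
}
```

Slot 0 reads back true and slot 1 false, exactly as with a plain array; only the physical layout changes, at the cost of STRIDE times the space.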


The unbounded queue lock overcomes several problems of the bounded queue lock:

1. It uses a linked list instead of an array, giving an unbounded queue.

2. Two ThreadLocal variables act as pointers: one to the thread's own node, one to the previous node.

3. An atomic reference variable points to the tail of the queue.

4. The space complexity is lower: with L locks and n threads, each holding at most one lock at a time, it is O(L + n).

5. A thread can acquire the same lock many times without increasing the space complexity.

6. When a thread terminates, the GC reclaims its nodes automatically.


package com.test.lock;

import java.util.concurrent.atomic.AtomicReference;

/**
 * Unbounded queue lock, using a linked list to organize threads.
 * With L locks and n threads, the space complexity is O(L + n).
 */
public class CLHLock implements Lock {

    // atomic reference pointing to the tail of the queue
    private AtomicReference<QNode> tail;

    // two pointers per thread: one to its own node, one to the previous node
    ThreadLocal<QNode> myNode;
    ThreadLocal<QNode> myPreNode;

    public CLHLock() {
        tail = new AtomicReference<QNode>(new QNode());
        myNode = new ThreadLocal<QNode>() {
            protected QNode initialValue() {
                return new QNode();
            }
        };
        myPreNode = new ThreadLocal<QNode>() {
            protected QNode initialValue() {
                return null;
            }
        };
    }

    @Override
    public void lock() {
        QNode node = myNode.get();
        node.lock = true;
        // CAS-based atomic swap of the tail guarantees atomicity
        QNode preNode = tail.getAndSet(node);
        myPreNode.set(preNode);
        // spin only on the previous node's state, reducing cache-coherence
        // traffic; the volatile field makes the release visible promptly
        while (preNode.lock) {}
    }

    @Override
    public void unlock() {
        QNode node = myNode.get();
        node.lock = false;
        // Point myNode at the predecessor so the same thread can reuse the
        // lock: the node it just released is still referenced by its
        // successor, so the next lock() call must not hand that node out again.
        myNode.set(myPreNode.get());
    }

    public static class QNode {
        volatile boolean lock;
    }

    public String toString() {
        return "CLHLock";
    }
}

Now let's test both locks for correctness and for the average time to acquire the lock.

To verify correctness, we designed this test: 50 threads each perform ++ on a volatile variable. Since ++ on a volatile variable is not atomic, without locking several threads may increment the variable at the same time, and the final output is unpredictable.

Then we use the two locks, acquiring the lock before incrementing the volatile variable. Because a volatile variable prevents reordering and guarantees visibility, if the lock is correct, that is, only one thread increments the variable at a time, the output must be the sequence 1 through 50.

First, the case without locking:

package com.test.lock;

public class Main {

    // private static Lock lock = new ArrayLock(50);
    private static Lock lock = new CLHLock();
    // private static TimeCost timeCost = new TimeCost(new TTASLock());

    private static volatile int value = 0;

    public static void method() {
        // lock.lock();
        System.out.println("value:" + ++value);
        // lock.unlock();
    }

    public static void main(String[] args) {
        // 50 threads increment the volatile variable concurrently
        for (int i = 0; i < 50; i++) {
            Thread t = new Thread(new Runnable() {
                @Override
                public void run() {
                    method();
                }
            });
            t.start();
        }
    }
}

Execution result: we can see that multiple threads did increment the volatile variable at the same time, and the result is unpredictable:

value:1 value:1 value:2 value:3 value:4 value:5 value:6 value:7 value:8 value:9 value:10 value:11 value:13 value:12 value:14 value:15 value:16 value:17 value:18 value:19 value:20 value:21 value:22 value:23 value:24 value:25 value:26 value:27 value:28 value:29 value:30 value:31 value:32 value:33 value:34 value:35 value:36 value:37 value:38 value:37 value:39 value:40 value:41 value:42 value:43 value:44 value:45 value:46 value:47 value:48 value:50

Now using the bounded queue lock:

package com.test.lock;

public class Main {

    private static Lock lock = new ArrayLock(50);
    // private static Lock lock = new CLHLock();
    // private static TimeCost timeCost = new TimeCost(new TTASLock());

    private static volatile int value = 0;

    public static void method() {
        lock.lock();
        System.out.println("value:" + ++value);
        lock.unlock();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 50; i++) {
            Thread t = new Thread(new Runnable() {
                @Override
                public void run() {
                    method();
                }
            });
            t.start();
        }
    }
}

The execution result is the increasing sequence from 1 to 50, showing that the lock guarantees only one thread increments the volatile variable at any moment; the lock is correct:

value:1 value:2 value:3 value:4 value:5 value:6 value:7 value:8 value:9 value:10 value:11 value:12 value:13 value:14 value:15 value:16 value:17 value:18 value:19 value:20 value:21 value:22 value:23 value:24 value:25 value:26 value:27 value:28 value:29 value:30 value:31 value:32 value:33 value:34 value:35 value:36 value:37 value:38 value:39 value:40 value:41 value:42 value:43 value:44 value:45 value:46 value:47 value:48 value:49 value:50

The unbounded queue lock is also correct; for reasons of length, its run is not shown here.


Then look at the average time to acquire the lock.

package com.test.lock;

public class Main {

    private static Lock lock = new TimeCost(new CLHLock());
    // private static Lock lock = new CLHLock();
    // private static TimeCost timeCost = new TimeCost(new TTASLock());

    private static volatile int value = 0;

    public static void method() {
        lock.lock();
        // System.out.println("value:" + ++value);
        lock.unlock();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            Thread t = new Thread(new Runnable() {
                @Override
                public void run() {
                    method();
                }
            });
            t.start();
        }
    }
}

With 100 threads contending:

ArrayLock's average time to acquire the lock: 719550 ns

CLHLock's average time to acquire the lock: 488577 ns
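The TimeCost wrapper used in the timing test never appears in the article; a plausible minimal sketch (the class shape, the counters, and the averageNanos method are assumptions) is a decorator that times each lock() call:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Assumed minimal Lock interface, as used by the article's locks.
interface Lock {
    void lock();
    void unlock();
}

// Hypothetical sketch of the TimeCost decorator: it wraps another Lock and
// accumulates the time spent inside lock() to report an average.
class TimeCost implements Lock {
    private final Lock inner;
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong acquisitions = new AtomicLong();

    TimeCost(Lock inner) { this.inner = inner; }

    public void lock() {
        long start = System.nanoTime();
        inner.lock();
        totalNanos.addAndGet(System.nanoTime() - start);
        acquisitions.incrementAndGet();
    }

    public void unlock() { inner.unlock(); }

    // average time to acquire the lock, in nanoseconds
    long averageNanos() {
        long n = acquisitions.get();
        return n == 0 ? 0 : totalNanos.get() / n;
    }
}

public class TimeCostDemo {
    public static void main(String[] args) {
        // wrap a trivial test-and-set lock and exercise it once
        final AtomicBoolean state = new AtomicBoolean(false);
        TimeCost timed = new TimeCost(new Lock() {
            public void lock() { while (state.getAndSet(true)) {} }
            public void unlock() { state.set(false); }
        });
        timed.lock();
        timed.unlock();
        System.out.println(timed.averageNanos() >= 0);
    }
}
```

Because it implements the same Lock interface, it can wrap any of the locks in this series without changing the test code.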


It can be seen that by spinning on multiple shared variables, queue spin locks reduce coherence traffic and perform better than TASLock and TTASLock. CLHLock in turn has better scalability and performance than ArrayLock, making it an excellent spin lock implementation.


