Talking about high concurrency (7): implementing several spin locks (2)


Last Update:2018-07-28 Source: Internet

Author: User

Tags cas static class volatile


Part (6) of this series, "Implementing several spin locks (1)", implemented two basic spin locks, TASLock and TTASLock. Their problem is frequent CAS operations, which trigger a large amount of cache-coherence traffic and lead to poor lock performance.

One improvement on TTASLock is BackoffLock, which makes threads back off under high contention, reducing both competition and cache-coherence traffic. But BackoffLock has three major problems:

1. There is still a lot of cache-coherence traffic, because all threads spin on the same shared variable and every successful lock acquisition generates coherence traffic.

2. Because of the backoff, a thread does not learn right away that the lock has been released; this delay makes lock acquisition take longer.

3. There is no guarantee against starvation; some threads may never acquire the lock.
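For reference, the backoff idea can be sketched roughly as follows. This is a hypothetical illustration rather than the BackoffLock from the earlier article: the class name, the delay bounds, and the runDemo helper are assumptions, and Thread.yield() is added only to keep the demo responsive on machines with few cores.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch: a TTAS lock with randomized exponential backoff.
class BackoffLockSketch {
    private static final int MIN_DELAY_MICROS = 1;    // assumed lower bound
    private static final int MAX_DELAY_MICROS = 1024; // assumed upper bound

    // The single shared variable that every thread spins on.
    private final AtomicBoolean held = new AtomicBoolean(false);

    public void lock() {
        int limit = MIN_DELAY_MICROS;
        while (true) {
            // Test: spin on a plain read while the lock looks taken.
            while (held.get()) {
                Thread.yield();
            }
            // Test-and-set: a single CAS attempt once the lock looks free.
            if (held.compareAndSet(false, true)) {
                return;
            }
            // Lost the race: sleep for a random delay, then double the limit.
            int delayMicros = ThreadLocalRandom.current().nextInt(limit) + 1;
            LockSupport.parkNanos(delayMicros * 1_000L);
            limit = Math.min(MAX_DELAY_MICROS, limit * 2);
        }
    }

    public void unlock() {
        held.set(false);
    }

    // Demo helper: each worker increments a shared counter under the lock.
    static int runDemo(int threads, int perThread) {
        final BackoffLockSketch lock = new BackoffLockSketch();
        final int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }
}
```

All three problems are visible in the sketch: every thread reads and writes the same held variable, a sleeping thread cannot notice a release until its delay expires, and nothing orders the waiting threads.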

This article implements two queue-based locks to solve the three problems above. The main idea is to organize the threads into a queue, which has four advantages:

1. Each thread only needs to check the state of its predecessor, so the spin variable is spread from one location to many, reducing cache-coherence traffic.

2. A thread is notified promptly when the lock is released.

3. The queue provides first-come-first-served fairness.

4. There is no starvation; every thread in the queue is guaranteed to get its turn.

Queue locks fall into two categories: those based on bounded queues and those based on unbounded queues.

First, the queue lock based on a bounded queue. ArrayLock has three features:

1. It organizes threads with a volatile array.

2. It uses an atomic variable, tail, to mark the tail of the queue.

3. It gives each thread an index through a ThreadLocal variable, indicating the thread's position in the queue.

package com.test.lock;

import java.util.concurrent.atomic.AtomicInteger;

/**
 * Bounded queue lock that uses a volatile array to organize threads.
 * The drawback is that the number of threads n must be known in advance:
 * the number of threads acquiring the same lock at once must not exceed n.
 * With L locks, the space complexity is O(Ln).
 */
public class ArrayLock implements Lock {
    // Lock flags; flags[i] == true means the thread in slot i may take the lock.
    // Note: volatile covers the array reference only, not the individual elements.
    private volatile boolean[] flags;

    // Index of the slot after the most recently enqueued thread
    private AtomicInteger tail;

    // Total capacity
    private final int capacity;

    private ThreadLocal<Integer> mySlotIndex = new ThreadLocal<Integer>() {
        protected Integer initialValue() {
            return 0;
        }
    };

    public ArrayLock(int capacity) {
        this.capacity = capacity;
        flags = new boolean[capacity];
        tail = new AtomicInteger(0);
        // By default, the first slot may take the lock
        flags[0] = true;
    }

    @Override
    public void lock() {
        int slot = tail.getAndIncrement() % capacity;
        mySlotIndex.set(slot);
        // flags[slot] == true means the lock has been acquired; the volatile
        // variable is meant to make the release visible promptly
        while (!flags[slot]) {
        }
    }

    @Override
    public void unlock() {
        int slot = mySlotIndex.get();
        flags[slot] = false;
        flags[(slot + 1) % capacity] = true;
    }

    public String toString() {
        return "ArrayLock";
    }
}

As we can see, the bounded queue lock has these drawbacks:

1. It must know the number of threads in advance; if more than n threads try to acquire the same lock at once, slot state gets overwritten.

2. The space complexity is O(Ln).

3. Cache-coherence traffic may still occur, because the acquisition state of all threads is kept in one shared array. A CPU does not load memory one variable at a time; it loads a whole cache line, which is commonly 64 bytes. A boolean array element typically occupies 1 byte, so a single cache line can hold dozens of adjacent flags. When one thread writes its flag, the whole cache line is invalidated in the other cores' caches, so threads spinning on neighboring flags are disturbed as well (false sharing).

The solution is to spread the flags out so that each one falls in its own cache line, for example by using only every 64th element: flags[0] for thread 1, flags[64] for thread 2, and so on. The cost is that it consumes more space.
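The spreading idea can be sketched as below. This is a hypothetical variant, not the author's code: it assumes a 64-byte cache line, stores the flags in an AtomicIntegerArray (so each element access has volatile semantics) and uses only every 16th int. The class name, the STRIDE constant, and the runDemo helper are illustrative, and Thread.yield() is there only to keep the demo responsive on machines with few cores.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sketch: array queue lock with one flag per (assumed) cache line.
class PaddedArrayLockSketch {
    // 16 ints * 4 bytes = 64 bytes, the assumed cache-line size.
    private static final int STRIDE = 16;

    private final AtomicIntegerArray flags; // 1 = slot may take the lock, 0 = wait
    private final AtomicInteger tail = new AtomicInteger(0);
    private final int capacity;
    private final ThreadLocal<Integer> mySlot = ThreadLocal.withInitial(() -> 0);

    PaddedArrayLockSketch(int capacity) {
        this.capacity = capacity;
        this.flags = new AtomicIntegerArray(capacity * STRIDE);
        flags.set(0, 1); // the first slot may take the lock
    }

    public void lock() {
        // floorMod keeps the slot non-negative even if tail overflows.
        int slot = Math.floorMod(tail.getAndIncrement(), capacity);
        mySlot.set(slot);
        while (flags.get(slot * STRIDE) == 0) {
            Thread.yield();
        }
    }

    public void unlock() {
        int slot = mySlot.get();
        flags.set(slot * STRIDE, 0);
        flags.set(((slot + 1) % capacity) * STRIDE, 1);
    }

    // Demo helper: each worker increments a shared counter under the lock.
    static int runDemo(int threads, int capacity, int perThread) {
        final PaddedArrayLockSketch lock = new PaddedArrayLockSketch(capacity);
        final int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }
}
```

Two flags that are 64 bytes apart can never share a 64-byte line, whatever the array's starting address, which is why a fixed stride is enough here. The capacity passed in must still be at least the number of threads contending at once.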

The unbounded queue lock overcomes several of the bounded queue lock's problems:

1. It uses a linked list instead of an array to implement an unbounded queue.

2. It uses two ThreadLocal variables as pointers, one to the thread's own node and one to its predecessor's node.

3. It uses an atomic reference variable to point to the tail of the queue.

4. The space complexity is lower: with L locks and n threads, each thread acquiring only one lock at a time, the space complexity is O(L + n).

5. A thread can acquire the same lock many times without increasing the space used.

6. When a thread finishes, the GC reclaims its memory automatically.

package com.test.lock;

import java.util.concurrent.atomic.AtomicReference;

/**
 * Unbounded queue lock that uses a linked list to organize threads.
 * With L locks and n threads, the space complexity is O(L + n).
 */
public class CLHLock implements Lock {
    // Atomic variable pointing to the tail of the queue
    private AtomicReference<QNode> tail;

    // Two pointers: one to this thread's own node, one to its predecessor's node
    ThreadLocal<QNode> myNode;
    ThreadLocal<QNode> myPreNode;

    public CLHLock() {
        tail = new AtomicReference<QNode>(new QNode());
        myNode = new ThreadLocal<QNode>() {
            protected QNode initialValue() {
                return new QNode();
            }
        };
        myPreNode = new ThreadLocal<QNode>() {
            protected QNode initialValue() {
                return null;
            }
        };
    }

    @Override
    public void lock() {
        QNode node = myNode.get();
        node.lock = true;
        // Atomic swap: enqueue this node and obtain the previous tail in one step
        QNode preNode = tail.getAndSet(node);
        myPreNode.set(preNode);
        // The volatile field ensures the release is seen promptly.
        // Spinning only on the predecessor's state reduces cache-coherence traffic.
        while (preNode.lock) {
        }
    }

    @Override
    public void unlock() {
        QNode node = myNode.get();
        node.lock = false;
        // Point myNode at the predecessor's node so that this thread can use the
        // lock again: the original node is still referenced as preNode by the
        // next thread in the queue, so myNode.get() must not return it on the
        // next lock() call.
        myNode.set(myPreNode.get());
    }

    public static class QNode {
        volatile boolean lock;
    }

    public String toString() {
        return "CLHLock";
    }
}
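The node recycling in unlock() is what lets one thread take the same lock many times without allocating new nodes. A compact, self-contained sketch of that behavior follows. The class and helper names are hypothetical, and Thread.yield() is added only so the demo stays responsive on machines with few cores.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical compact CLH-style lock used to exercise repeated acquisition.
class ClhReuseSketch {
    static final class Node {
        volatile boolean locked;
    }

    private final AtomicReference<Node> tail = new AtomicReference<>(new Node());
    private final ThreadLocal<Node> myNode = ThreadLocal.withInitial(Node::new);
    private final ThreadLocal<Node> myPred = new ThreadLocal<>();

    void lock() {
        Node node = myNode.get();
        node.locked = true;
        Node pred = tail.getAndSet(node); // enqueue and learn the predecessor
        myPred.set(pred);
        while (pred.locked) {
            Thread.yield();
        }
    }

    void unlock() {
        Node node = myNode.get();
        node.locked = false;
        // The successor may still be reading `node`, so this thread adopts the
        // predecessor's node, which no other thread references any more.
        myNode.set(myPred.get());
    }

    // Demo helper: every worker acquires the same lock perThread times.
    static int runDemo(int threads, int perThread) {
        final ClhReuseSketch lock = new ClhReuseSketch();
        final int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }
}
```

If unlock() kept the original node instead, the next lock() call by the same thread would set locked back to true on a node its successor may still be spinning on, and the two threads could end up waiting on each other.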

Next we test both locks for correctness and for the average time to acquire the lock.

To verify correctness we design a test case: 50 threads each perform ++ on a volatile variable. Because ++ on a volatile variable is not atomic, without a lock several threads may increment it at the same time and the final result is unpredictable. Then we repeat the test with each of the two locks, acquiring the lock before the ++. Because volatile prevents reordering and guarantees visibility, if the lock is acquired correctly, that is, only one thread increments the variable at any moment, the output must be 1 to 50 in order.
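Instead of reading the printed output by eye, the same check could be done in code. The sketch below is a hypothetical harness: it records each incremented value in a list and verifies that the list is exactly 1..n in order. A simple CAS spin lock stands in for the lock under test so that the example is self-contained; all names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical harness: checks that increments made under a lock come out as 1..n.
class SequenceCheckSketch {
    private static volatile int value = 0;

    // Stand-in spin lock; the lock under test would be swapped in here.
    private static final AtomicBoolean held = new AtomicBoolean(false);

    private static void lock() {
        while (!held.compareAndSet(false, true)) {
            Thread.yield();
        }
    }

    private static void unlock() {
        held.set(false);
    }

    static boolean printsInOrder(int threads) {
        value = 0;
        // Only touched while holding the lock, so a plain ArrayList is enough.
        final List<Integer> seen = new ArrayList<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                lock();
                try {
                    seen.add(++value);
                } finally {
                    unlock();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        if (seen.size() != threads) {
            return false;
        }
        for (int i = 0; i < threads; i++) {
            if (seen.get(i) != i + 1) {
                return false;
            }
        }
        return true;
    }
}
```

With a correct lock, printsInOrder(50) should return true on every run. Without the lock() and unlock() calls it can return false, or even fail with an exception from the unsynchronized list.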

First, the case without locking.

package com.test.lock;

public class Main {
    // private static Lock lock = new ArrayLock(/* capacity */);

    private static Lock lock = new CLHLock();

    private static TimeCost timeCost = new TimeCost(new TTASLock());

    private static volatile int value = 0;

    public static void method() {
        // Locking is disabled for this run
        // lock.lock();
        System.out.println("value:" + ++value);
        // lock.unlock();
    }

    public static void main(String[] args) {
        // 50 threads, as described in the test case above
        for (int i = 0; i < 50; i++) {
            Thread t = new Thread(new Runnable() {
                @Override
                public void run() {
                    method();
                }
            });
            t.start();
        }
    }
}

Run result: several threads do in fact perform ++ on the volatile variable at the same time, and the result is unpredictable.

value:1
value:1
value:2
value:3
value:4
value:5
value:6
value:7
value:8
value:9
value:10
value:11
value:13
value:12
value:14
value:15
value:16
value:17
value:18
value:19
value:20
value:21
value:22
value:23
value:24
value:25
value:26
value:27
value:28
value:29
value:30
value:31
value:32
value:33
value:34
value:35
value:36
value:37
value:38
value:37
value:39
value:40
value:41
value:42
value:43
value:44
value:45
value:46
value:47
value:48

Using the bounded queue lock:

package com.test.lock;

public class Main {
    // The capacity must be at least the number of threads; 100 is an assumed value
    private static Lock lock = new ArrayLock(100);

    // private static Lock lock = new CLHLock();

    private static TimeCost timeCost = new TimeCost(new TTASLock());

    private static volatile int value = 0;

    public static void method() {
        lock.lock();
        System.out.println("value:" + ++value);
        lock.unlock();
    }

    public static void main(String[] args) {
        // 50 threads, as described in the test case above
        for (int i = 0; i < 50; i++) {
            Thread t = new Thread(new Runnable() {
                @Override
                public void run() {
                    method();
                }
            });
            t.start();
        }
    }
}

The output is 1 to 50 in order, which shows that the lock ensures only one thread performs ++ on the volatile variable at any moment, so the lock is correct.

value:1
value:2
value:3
value:4
value:5
value:6
value:7
value:8
value:9
value:10
value:11
value:12
value:13
value:14
value:15
value:16
value:17
value:18
value:19
value:20
value:21
value:22
value:23
value:24
value:25
value:26
value:27
value:28
value:29
value:30
value:31
value:32
value:33
value:34
value:35
value:36
value:37
value:38
value:39
value:40
value:41
value:42
value:43
value:44
value:45
value:46
value:47
value:48
value:49
value:50

The unbounded queue lock also produces the correct result; for brevity the code is not repeated here.

Next, the average time to acquire the lock.

package com.test.lock;

public class Main {
    // TimeCost wraps a lock and measures the time spent acquiring it
    private static Lock lock = new TimeCost(new CLHLock());

    // private static Lock lock = new CLHLock();

    private static TimeCost timeCost = new TimeCost(new TTASLock());

    private static volatile int value = 0;

    public static void method() {
        lock.lock();
        System.out.println("value:" + ++value);
        lock.unlock();
    }

    public static void main(String[] args) {
        // 100 threads, matching the measurement reported below
        for (int i = 0; i < 100; i++) {
            Thread t = new Thread(new Runnable() {
                @Override
                public void run() {
                    method();
                }
            });
            t.start();
        }
    }
}

With 100 concurrent threads:

ArrayLock: the average time to acquire the lock is 719550 ns

CLHLock: the average time to acquire the lock is 488577 ns

As you can see, queue locks reduce cache-coherence traffic by having threads spin on multiple shared variables, which gives better performance than TASLock and TTASLock. CLHLock scales and performs better than ArrayLock, and is a good spin lock implementation.
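The TimeCost wrapper used for these measurements comes from the earlier article and is not shown here. A wrapper of that kind might look roughly like the sketch below, which is purely hypothetical: the SimpleLock interface, the TimedLock and SpinLock classes, and the way the average is computed are all assumptions, not the author's implementation.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a decorator that measures average lock-acquisition time.
class TimeCostSketch {
    // Assumed minimal lock interface, for illustration only.
    interface SimpleLock {
        void lock();
        void unlock();
    }

    static final class TimedLock implements SimpleLock {
        private final SimpleLock delegate;
        private final AtomicLong totalNanos = new AtomicLong();
        private final AtomicLong count = new AtomicLong();

        TimedLock(SimpleLock delegate) {
            this.delegate = delegate;
        }

        @Override
        public void lock() {
            long start = System.nanoTime();
            delegate.lock();
            // Time from the request until the lock was actually obtained.
            totalNanos.addAndGet(System.nanoTime() - start);
            count.incrementAndGet();
        }

        @Override
        public void unlock() {
            delegate.unlock();
        }

        long acquisitions() {
            return count.get();
        }

        long averageNanos() {
            long n = count.get();
            return n == 0 ? 0 : totalNanos.get() / n;
        }
    }

    // Stand-in spin lock so the sketch is self-contained.
    static final class SpinLock implements SimpleLock {
        private final AtomicBoolean held = new AtomicBoolean(false);

        @Override
        public void lock() {
            while (!held.compareAndSet(false, true)) {
                Thread.yield();
            }
        }

        @Override
        public void unlock() {
            held.set(false);
        }
    }

    // Demo helper: each thread acquires and releases the timed lock once.
    static TimedLock runDemo(int threads) {
        final TimedLock lock = new TimedLock(new SpinLock());
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                lock.lock();
                lock.unlock();
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return lock;
    }

    public static void main(String[] args) {
        TimedLock lock = runDemo(100);
        System.out.println("average acquire time: " + lock.averageNanos() + " ns");
    }
}
```

Numbers from a harness like this depend heavily on the machine, the JVM warm-up state, and what the critical section does (printing to the console is slow), so they are best read as relative comparisons between locks rather than absolute costs.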

This article is an English version of an article originally published in Chinese on aliyun.com.
