High Concurrency (7): Implementing Several Spin Locks (Part 2)


In High Concurrency (6): Implementing Several Spin Locks (Part 1), we implemented two basic spin locks: TASLock and TTASLock. Their shared problem is frequent CAS operations, which generate a large amount of cache-coherence traffic and lead to poor lock performance.
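For reference, the TTASLock from the previous article can be sketched roughly as follows. This is a minimal sketch from memory of Part 1, not that article's exact code; the Lock interface is repeated here so the snippet stands alone:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Assumed Lock interface from the series (repeated for self-containment)
interface Lock {
    void lock();
    void unlock();
}

/**
 * Rough sketch of a test-and-test-and-set lock: spin on a plain read
 * first, and only attempt the atomic getAndSet when the lock looks free.
 */
class TTASLock implements Lock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    @Override
    public void lock() {
        while (true) {
            while (state.get()) {}          // local spin until the lock looks free
            if (!state.getAndSet(true)) {   // then try to grab it atomically
                return;
            }
        }
    }

    @Override
    public void unlock() {
        state.set(false);
    }
}
```

Every failed getAndSet here still invalidates the cache line holding `state` in every spinning core, which is exactly the coherence-traffic problem described above.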


One improvement over TTASLock is BackoffLock, which makes a thread back off when lock contention is high, reducing competition and cache-coherence traffic. However, BackoffLock has three major problems:

1. There is still a large amount of cache-coherence traffic, because all threads spin on the same shared variable, and every successful lock acquisition generates coherence traffic.

2. Because of the backoff, a thread cannot learn of a lock release promptly; the resulting delay lengthens lock-acquisition time.

3. There is no starvation guarantee: some threads may never manage to acquire the lock.
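The BackoffLock being discussed can be sketched roughly like this. This is an illustrative sketch only; the MIN_DELAY/MAX_DELAY bounds and the exact backoff strategy are assumptions, not the original article's code:

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicBoolean;

// Assumed Lock interface from the series (repeated for self-containment)
interface Lock {
    void lock();
    void unlock();
}

/**
 * Sketch of a TTAS lock with randomized exponential backoff: on a failed
 * acquisition attempt, sleep a random time and double the backoff limit.
 */
class BackoffLock implements Lock {
    private static final int MIN_DELAY = 1;   // ms, illustrative
    private static final int MAX_DELAY = 64;  // ms, illustrative
    private final AtomicBoolean state = new AtomicBoolean(false);
    private final Random random = new Random();

    @Override
    public void lock() {
        int limit = MIN_DELAY;
        while (true) {
            while (state.get()) {}            // spin until the lock looks free
            if (!state.getAndSet(true)) {
                return;                       // acquired
            }
            // contention detected: back off for a random time, doubling the limit
            try {
                Thread.sleep(random.nextInt(limit));
            } catch (InterruptedException ignored) {}
            if (limit < MAX_DELAY) {
                limit *= 2;
            }
        }
    }

    @Override
    public void unlock() {
        state.set(false);
    }
}
```

The sketch makes the three problems visible: all threads still spin on `state`, a sleeping thread cannot observe a release until it wakes, and nothing orders the waiting threads.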


This article implements two queue-based locks that solve the three problems above. The main idea is to organize the waiting threads into a queue, which has four advantages:

1. Each thread spins only on the status of its predecessor, spreading the spin variables from one to many and reducing cache-coherence traffic.

2. A lock release is observed promptly by the next thread, since the successor spins continuously instead of backing off.

3. The queue provides first-come, first-served fairness.

4. No starvation: every thread in the queue is guaranteed its turn to acquire the lock.


Queue locks come in two varieties: bounded-queue locks and unbounded-queue locks.
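Both queue locks below implement the same Lock interface used throughout this series. The interface itself is never shown in the article, so the following is a presumed minimal sketch:

```java
// Presumed minimal Lock interface for the series (an assumption; the
// article only shows classes that implement it, not the interface itself)
interface Lock {
    void lock();
    void unlock();
}
```

Any of the spin locks in this series can then be swapped in wherever a `Lock` is expected, which is how the test harness later switches between implementations.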


First, let's look at the lock based on a bounded queue. ArrayLock has three features:

1. Threads are organized around a volatile boolean array.

2. An atomic integer, tail, marks the tail of the queue.

3. A ThreadLocal variable gives each thread an index number indicating its slot in the queue.


```java
package com.test.lock;

import java.util.concurrent.atomic.AtomicInteger;

/**
 * A bounded-queue lock that uses a volatile array to organize threads.
 * The disadvantage is that the capacity n must be known in advance:
 * no more than n threads may contend for the same lock at once.
 * With L locks, the space complexity is O(Ln).
 */
public class ArrayLock implements Lock {
    // volatile array storing the lock flags; flags[i] == true means
    // the thread in slot i may acquire the lock
    private volatile boolean[] flags;
    // points to the slot for the next arriving thread
    private AtomicInteger tail;
    // total capacity
    private final int capacity;

    private ThreadLocal<Integer> mySlotIndex = new ThreadLocal<Integer>() {
        protected Integer initialValue() {
            return 0;
        }
    };

    public ArrayLock(int capacity) {
        this.capacity = capacity;
        flags = new boolean[capacity];
        tail = new AtomicInteger(0);
        // by default, the first slot may acquire the lock
        flags[0] = true;
    }

    @Override
    public void lock() {
        int slot = tail.getAndIncrement() % capacity;
        mySlotIndex.set(slot);
        // flags[slot] == true means the lock has been acquired; the volatile
        // variable helps a release be noticed promptly
        while (!flags[slot]) {}
    }

    @Override
    public void unlock() {
        int slot = mySlotIndex.get();
        flags[slot] = false;
        flags[(slot + 1) % capacity] = true;
    }

    public String toString() {
        return "ArrayLock";
    }
}
```

 

We can see that the disadvantages of the bounded queue lock are:

1. The number of threads must be known in advance. If more than n threads contend for the same lock at once, slot states will be overwritten.

2. The space complexity is O(Ln) for L locks.

3. Because the threads' lock states are stored in a shared volatile array, there may still be cache-coherence traffic, caused by false sharing. When the CPU reads memory, it reads the full width of the data bus at once; with a 64-bit bus, it reads 64 bits at a time. A boolean value occupies 1 byte, so 8 boolean variables are read together, and a cache block is also 64 bits long, meaning one cache block holds 8 boolean variables. Therefore, when a CAS operation modifies one variable and invalidates its cache block, as many as 8 variables are effectively invalidated along with it.

The solution is to space the flags 8 slots apart, for example flag[0] for thread 1 and flag[8] for thread 2. The drawback is that this consumes more space.
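Such a padded variant can be sketched like this. The PaddedArrayLock name and the STRIDE constant are illustrative assumptions; only the indexing differs from ArrayLock above:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Assumed Lock interface from the series (repeated for self-containment)
interface Lock {
    void lock();
    void unlock();
}

/**
 * Sketch of an ArrayLock variant that spreads the flags 8 slots apart,
 * so each thread's flag lands in a different group of 8 booleans and
 * false sharing is reduced, at the cost of 8x the array space.
 */
class PaddedArrayLock implements Lock {
    private static final int STRIDE = 8;  // booleans per cache block, per the text
    private volatile boolean[] flags;
    private final AtomicInteger tail = new AtomicInteger(0);
    private final int capacity;
    private final ThreadLocal<Integer> mySlotIndex = new ThreadLocal<Integer>();

    PaddedArrayLock(int capacity) {
        this.capacity = capacity;
        flags = new boolean[capacity * STRIDE];
        flags[0] = true;  // first slot may acquire the lock
    }

    @Override
    public void lock() {
        // same slot logic as ArrayLock, but scaled by the stride
        int slot = (tail.getAndIncrement() % capacity) * STRIDE;
        mySlotIndex.set(slot);
        while (!flags[slot]) {}
    }

    @Override
    public void unlock() {
        int slot = mySlotIndex.get();
        flags[slot] = false;
        flags[(slot + STRIDE) % (capacity * STRIDE)] = true;
    }
}
```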


The unbounded-queue lock overcomes these problems of the bounded-queue lock:

1. It uses a linked list instead of an array, so the queue is unbounded.

2. Two ThreadLocal variables serve as pointers: one to the thread's own node, the other to its predecessor's node.

3. An atomic reference variable points to the tail of the queue.

4. The space complexity is reduced: with L locks and n threads, each thread holding at most one lock at a time, it is O(L + n).

5. A thread can acquire the same lock multiple times without increasing the space complexity.

6. When a thread terminates, the GC automatically reclaims its node's memory.


```java
package com.test.lock;

import java.util.concurrent.atomic.AtomicReference;

/**
 * An unbounded-queue lock (CLH lock) that organizes threads in a linked list.
 * With L locks and n threads, the space complexity is O(L + n).
 */
public class CLHLock implements Lock {
    // atomic reference pointing to the tail of the queue
    private AtomicReference<QNode> tail;
    // two pointers: one to this thread's own node, one to its predecessor
    ThreadLocal<QNode> myNode;
    ThreadLocal<QNode> myPreNode;

    public CLHLock() {
        tail = new AtomicReference<QNode>(new QNode());
        myNode = new ThreadLocal<QNode>() {
            protected QNode initialValue() {
                return new QNode();
            }
        };
        myPreNode = new ThreadLocal<QNode>() {
            protected QNode initialValue() {
                return null;
            }
        };
    }

    @Override
    public void lock() {
        QNode node = myNode.get();
        node.lock = true;
        // getAndSet is an atomic CAS-based operation, guaranteeing atomicity
        QNode preNode = tail.getAndSet(node);
        myPreNode.set(preNode);
        // lock is a volatile field, so the release is noticed promptly;
        // spinning only on the predecessor's node reduces coherence traffic
        while (preNode.lock) {}
    }

    @Override
    public void unlock() {
        QNode node = myNode.get();
        node.lock = false;
        // Point myNode at the predecessor's node so this thread can reuse it
        // the next time it acquires the lock: the node it just released is
        // still referenced as the predecessor of its successor, so fetching
        // the original node via myNode.get() next time would be unsafe.
        myNode.set(myPreNode.get());
    }

    public static class QNode {
        volatile boolean lock;
    }

    public String toString() {
        return "CLHLock";
    }
}
```

We will test the two locks in terms of correctness and average lock acquisition time.

Let's design a test case to verify correctness: 50 threads each increment a volatile variable. Since ++ on a volatile variable is not atomic, without a lock multiple threads may perform the increment simultaneously, and the final result is unpredictable. With either of the two locks, each thread acquires the lock before incrementing. Because volatile prevents reordering and guarantees visibility, we can conclude that if the lock works correctly, that is, only one thread increments the variable at a time, the printed values must be 1 through 50 in order.

First, look at the case without a lock:

```java
package com.test.lock;

public class Main {
    //private static Lock lock = new ArrayLock(150);
    private static Lock lock = new CLHLock();
    //private static TimeCost timeCost = new TimeCost(new TTASLock());
    private static volatile int value = 0;

    public static void method() {
        //lock.lock();
        System.out.println("Value: " + ++value);
        //lock.unlock();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 50; i++) {
            Thread t = new Thread(new Runnable() {
                @Override
                public void run() {
                    method();
                }
            });
            t.start();
        }
    }
}
```

Running result: we can see that the volatile variable ++ is indeed performed by multiple threads concurrently, and the result is unpredictable:

```
Value: 1
Value: 1
Value: 2
Value: 3
Value: 4
Value: 5
Value: 6
Value: 7
Value: 8
Value: 9
Value: 10
Value: 11
Value: 13
Value: 12
Value: 14
Value: 15
Value: 16
Value: 17
Value: 18
Value: 19
Value: 20
Value: 21
Value: 22
Value: 23
Value: 24
Value: 25
Value: 26
Value: 27
Value: 28
Value: 29
Value: 30
Value: 31
Value: 32
Value: 33
Value: 34
Value: 35
Value: 36
Value: 37
Value: 38
Value: 37
Value: 39
Value: 40
Value: 41
Value: 42
Value: 43
Value: 44
Value: 45
Value: 46
Value: 47
Value: 48
Value: 50
```

Now using the bounded-queue lock:

```java
package com.test.lock;

public class Main {
    private static Lock lock = new ArrayLock(100);
    //private static Lock lock = new CLHLock();
    //private static TimeCost timeCost = new TimeCost(new TTASLock());
    private static volatile int value = 0;

    public static void method() {
        lock.lock();
        System.out.println("Value: " + ++value);
        lock.unlock();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 50; i++) {
            Thread t = new Thread(new Runnable() {
                @Override
                public void run() {
                    method();
                }
            });
            t.start();
        }
    }
}
```

The running result is 1 to 50 in ascending order, which shows the lock correctly guarantees that only one thread performs the volatile ++ at a time:

```
Value: 1
Value: 2
Value: 3
Value: 4
Value: 5
Value: 6
Value: 7
Value: 8
Value: 9
Value: 10
Value: 11
Value: 12
Value: 13
Value: 14
Value: 15
Value: 16
Value: 17
Value: 18
Value: 19
Value: 20
Value: 21
Value: 22
Value: 23
Value: 24
Value: 25
Value: 26
Value: 27
Value: 28
Value: 29
Value: 30
Value: 31
Value: 32
Value: 33
Value: 34
Value: 35
Value: 36
Value: 37
Value: 38
Value: 39
Value: 40
Value: 41
Value: 42
Value: 43
Value: 44
Value: 45
Value: 46
Value: 47
Value: 48
Value: 49
Value: 50
```

The unbounded-queue lock also runs correctly; to save space, its test code is not shown here.


Let's look at the average lock acquisition time.

```java
package com.test.lock;

public class Main {
    private static Lock lock = new TimeCost(new CLHLock());
    //private static Lock lock = new CLHLock();
    //private static TimeCost timeCost = new TimeCost(new TTASLock());
    private static volatile int value = 0;

    public static void method() {
        lock.lock();
        //System.out.println("Value: " + ++value);
        lock.unlock();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            Thread t = new Thread(new Runnable() {
                @Override
                public void run() {
                    method();
                }
            });
            t.start();
        }
    }
}
```
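The TimeCost decorator used above is never shown in the article; presumably it wraps another Lock and tracks the average time spent in lock(). A minimal sketch under that assumption (the field and method names are illustrative):

```java
import java.util.concurrent.atomic.AtomicLong;

// Assumed Lock interface from the series (repeated for self-containment)
interface Lock {
    void lock();
    void unlock();
}

/**
 * Hypothetical sketch of the TimeCost decorator: delegates to another
 * Lock and keeps a running average of the time spent acquiring it.
 */
class TimeCost implements Lock {
    private final Lock delegate;
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong acquisitions = new AtomicLong();

    TimeCost(Lock delegate) {
        this.delegate = delegate;
    }

    @Override
    public void lock() {
        long start = System.nanoTime();
        delegate.lock();
        // accumulate the time this acquisition took
        totalNanos.addAndGet(System.nanoTime() - start);
        acquisitions.incrementAndGet();
    }

    @Override
    public void unlock() {
        delegate.unlock();
    }

    // average lock-acquisition time in nanoseconds so far
    long averageNanos() {
        long n = acquisitions.get();
        return n == 0 ? 0 : totalNanos.get() / n;
    }
}
```

Because it implements the same Lock interface, it can wrap any of the locks in this series without changing the test harness.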

With 100 concurrent threads:

The average time for ArrayLock to acquire the lock is 719,550 ns.

The average time for CLHLock to acquire the lock is 488,577 ns.


As you can see, by spinning on multiple distributed variables the queue locks reduce coherence traffic, improving performance over TASLock and TTASLock. CLHLock has better scalability and performance than ArrayLock and is a good spin-lock implementation.

