Exchanger Source Code Analysis (Android Version)

Source: Internet
Author: User
Tags: CAS, modulus

Exchanger is a synchronizer at which two threads can pair up and swap elements. Each thread passes an object to the exchange method, is matched with a partner thread, and receives the partner's object when the method returns. Internally, Exchanger uses a lock-free algorithm, which can greatly improve throughput under multi-threaded contention.
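As a quick illustration of the pairing behaviour described above, here is a minimal sketch that uses the standard java.util.concurrent.Exchanger API. The class name ExchangerDemo and the helper swap are made up for this example.

```java
import java.util.concurrent.Exchanger;

// Hypothetical demo class; only Exchanger itself comes from the JDK.
final class ExchangerDemo {
    /** Runs two threads that swap strings; returns {whatCallerGot, whatWorkerGot}. */
    static String[] swap(String fromCaller, String fromWorker) {
        Exchanger<String> exchanger = new Exchanger<>();
        String[] result = new String[2];
        Thread worker = new Thread(() -> {
            try {
                result[1] = exchanger.exchange(fromWorker);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            result[0] = exchanger.exchange(fromCaller);
            worker.join(); // join() also makes the worker's write to result[1] visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] r = swap("ping", "pong");
        System.out.println(r[0] + " " + r[1]); // prints "pong ping"
    }
}
```

Each side hands over its own string and comes back holding the other's.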

Algorithm Implementation
The basic idea is to maintain a "slot", a reference to a node that holds both the item being offered and a "hole" waiting to be filled. If an arriving "occupying" thread finds the slot empty, it uses CAS (compareAndSet) to install a node in the slot and waits for another thread to call exchange. A second, "fulfilling" thread finds the slot non-empty, CASes the slot back to null, and then exchanges items by CASing the hole, waking the occupying thread if it is blocked. In each case a CAS may fail because the slot looked non-empty at first but was empty by the time of the CAS, or vice versa, so threads must be prepared to retry.
This simple approach works well when only a few threads use the Exchanger, but performance drops sharply when many threads share one, because they all contend on the same slot with CAS. So an "arena" is used instead: essentially a hash table with a dynamically varying number of slots, any of which can be used for an exchange. An incoming thread picks a slot from a hash of its thread ID. If its CAS on the chosen slot fails, it picks another slot. Likewise, if a thread successfully CASes into a slot but no other thread arrives, it tries another slot, heading toward slot 0, which always exists even when the table shrinks. The mechanisms involved are as follows:
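The occupy/fulfil protocol with a single slot can be sketched as follows. This is a simplified illustration rather than the JDK code: the class SingleSlotExchanger and the helper demo are hypothetical, null items are not supported, and waiting is a plain spin with no blocking, cancellation, or timeout.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical single-slot exchanger: no arena, no blocking, no cancellation. */
final class SingleSlotExchanger<V> {
    /** Holds the occupier's item; the node's own reference is the "hole". */
    private static final class Node<T> extends AtomicReference<T> {
        final T item;
        Node(T item) { this.item = item; }
    }

    private final AtomicReference<Node<V>> slot = new AtomicReference<>();

    /** Swaps x (assumed non-null) with the item offered by one other thread. */
    V exchange(V x) {
        Node<V> me = new Node<>(x);
        for (;;) {
            Node<V> you = slot.get();
            if (you != null) {                          // try to fulfil
                if (slot.compareAndSet(you, null)
                        && you.compareAndSet(null, x))  // fill the hole
                    return you.item;
            } else if (slot.compareAndSet(null, me)) {  // try to occupy
                V v;
                while ((v = me.get()) == null)          // spin until the hole is filled
                    Thread.yield();
                return v;
            }                                           // lost a CAS race: retry
        }
    }

    /** Runs one exchange between the caller and a helper thread. */
    static String[] demo(String mine, String theirs) {
        SingleSlotExchanger<String> ex = new SingleSlotExchanger<>();
        String[] got = new String[2];
        Thread helper = new Thread(() -> got[1] = ex.exchange(theirs));
        helper.start();
        got[0] = ex.exchange(mine);
        try {
            helper.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return got;
    }
}
```

With many threads, every call in this sketch hits the same slot reference with CAS, so contention concentrates on that one memory location.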

Waiting: Slot 0 is special in that it is the only slot that exists when there is no contention. A thread occupying slot 0 blocks if no thread fulfils it after a brief spin. In all other slots, an occupying thread eventually gives up and tries another slot. Before either blocking (slot 0) or giving up (other slots) and restarting, a waiting thread spins for a while, slightly less than a typical context-switch time. There is no reason for a thread to block unless other threads are unlikely to be present. Occupying threads mainly want to avoid memory contention, so they poll quietly for a shorter period than it would take to block and then unblock. A wait at a non-zero slot that runs out for lack of other threads wastes roughly one extra context-switch time per attempt, which is still much faster on average than the alternative of blocking and waking.

Sizing: Usually a small number of slots is enough to reduce contention. With few threads in particular, too many slots can perform as badly as too few, and there is little margin for error. The variable max tracks how many slots are actually in use (it is the highest slot index in use). It is increased when a thread sees too many CAS failures (this resembles resizing a regular hash table around a target load factor, except that growth is one slot at a time rather than proportional). Growth requires contention failures on each of three slots tried. Requiring several failures before growing allows for the fact that some CAS failures are not caused by contention at all, but by a simple race between two threads or by preemption between the read and the CAS. Very brief peaks of contention can also be far higher than the sustainable average. When a wait at a non-zero slot times out without being matched, the thread usually tries to lower max. A thread whose wait times out moves closer to slot 0, so it eventually finds an existing (or future) thread even if the table has shrunk through inactivity. The mechanics and thresholds chosen for growing and shrinking are inherently entangled with the indexing and hashing in the exchange code and cannot be cleanly abstracted out.

Hashing: Each thread picks its initial slot from a simple hash code. The sequence is the same every time for a given thread but effectively random across threads. Using an arena runs into the classic cost-versus-quality trade-off of all hash tables. Here a one-step FNV-1a hash of the current thread's Thread.getId() is used, together with a cheap approximation of a modulus operation to select an index. The drawback of optimizing index selection this way is that the code is hard-wired to a maximum table size of 32. That value is nevertheless more than enough for known platforms.

Probing: When contention is detected on the selected slot, the table is probed sequentially, much like linear probing after a collision in a hash table. (Probing moves circularly in reverse order, which fits the growth and shrink rules best: the table grows and shrinks at the tail, while slot 0 at the head stays put.) In addition, to minimize the effects of false alarms and cache thrashing, the first selected slot is tried twice before moving on.

Padding: Even with contention management, slots are heavily contended, so cache padding is used to avoid poor memory performance. For the same reason, slots are constructed lazily, only when first used, to avoid wasting the padded space. Memory placement is not much of a concern early in a program's life, but as time passes and the garbage collector compacts the heap, slots are likely to end up adjacent to one another; without padding, that causes heavy cache-line invalidation across cores.

The implementation is tuned mainly for throughput under heavy contention, so it carries many extra features to avoid various problems and looks complicated at first. It helps to go through the overall flow first, then read the source, and then return to the flow; each pass deepens understanding.

Source code Implementation
Exchanger exists to exchange objects between threads, so exchange is its only public method. It comes in two versions: an untimed one that throws only InterruptedException, and a timed one that throws InterruptedException and TimeoutException. Let's look at the untimed version first:
    public V exchange(V x) throws InterruptedException {
        if (!Thread.interrupted()) {
            Object v = doExchange((x == null) ? NULL_ITEM : x, false, 0);
            if (v == NULL_ITEM)
                return null;
            if (v != CANCEL)
                return (V) v;
            Thread.interrupted(); // Clear interrupt status on IE throw
        }
        throw new InterruptedException();
    }
The method first checks whether the current thread has been interrupted; if so it throws InterruptedException, otherwise it calls doExchange. Because the item x may be null, a null argument is replaced by NULL_ITEM, a predefined sentinel object, before the call. The object returned by doExchange then tells exchange what happened: NULL_ITEM means the partner offered null, so null is returned; CANCEL means the wait was cancelled by an interrupt, so exchange throws InterruptedException; anything else is the partner's item.
    private static final Object CANCEL = new Object();
    private static final Object NULL_ITEM = new Object();
Now let's look at the implementation of doExchange.
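The null-masking idea can be isolated in a few lines. This is an illustrative sketch only: the class NullMasking and the helpers mask and unmask are hypothetical names, not part of Exchanger.

```java
// Hypothetical helper showing how a private sentinel stands in for null.
final class NullMasking {
    private static final Object NULL_ITEM = new Object();

    /** Replaces null with the sentinel so that null can keep meaning "hole not filled yet". */
    static Object mask(Object x) {
        return (x == null) ? NULL_ITEM : x;
    }

    /** Turns the sentinel back into null before handing the value to the caller. */
    static Object unmask(Object v) {
        return (v == NULL_ITEM) ? null : v;
    }
}
```

Because the sentinel is a private object that no caller can obtain, it can never collide with a real item.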
    private Object doExchange(Object item, boolean timed, long nanos) {
        Node me = new Node(item);                 // Create in case occupying
        int index = hashIndex();                  // Index of current slot
        int fails = 0;                            // Number of CAS failures

        for (;;) {
            Object y;                             // Contents of current slot
            Slot slot = arena[index];
            if (slot == null)                     // Lazily initialize slots
                createSlot(index);                // Continue loop to reread
            else if ((y = slot.get()) != null &&  // Try to fulfill
                     slot.compareAndSet(y, null)) {
                Node you = (Node) y;              // Transfer item
                if (you.compareAndSet(null, item)) {
                    LockSupport.unpark(you.waiter);
                    return you.item;
                }                                 // Else cancelled; continue
            }
            else if (y == null &&                 // Try to occupy
                     slot.compareAndSet(null, me)) {
                if (index == 0)                   // Blocking wait for slot 0
                    return timed ?
                        awaitNanos(me, slot, nanos) :
                        await(me, slot);
                Object v = spinWait(me, slot);    // Spin wait for non-0
                if (v != CANCEL)
                    return v;
                me = new Node(item);              // Throw away cancelled node
                int m = max.get();
                if (m > (index >>>= 1))           // Decrease index
                    max.compareAndSet(m, m - 1);  // Maybe shrink table
            }
            else if (++fails > 1) {               // Allow 2 fails on 1st slot
                int m = max.get();
                if (fails > 3 && m < FULL && max.compareAndSet(m, m + 1))
                    index = m + 1;                // Grow on 3rd failed slot
                else if (--index < 0)
                    index = m;                    // Circularly traverse
            }
        }
    }
The method first wraps the item to be exchanged in a Node named me. The Node class is defined as follows:
    private static final class Node extends AtomicReference<Object> {
        public final Object item;
        public volatile Thread waiter;

        public Node(Object item) {
            this.item = item;
        }
    }
The inner class Node extends AtomicReference and has two fields, item and waiter. Suppose thread 1 and thread 2 want to exchange objects. Thread 1 passes its object to the Node constructor, where it is stored in item, and places the node in a slot. When thread 2 finds this node in the slot, it CASes the node's atomic reference from null to its own item and returns the node's item field. Thread 1, on seeing that the atomic reference is no longer null, calls get() and returns that object. The two threads have now swapped objects. The waiter field records thread 1 so that it can be woken if it had to block.
Next, the Slot class and the related fields:
    private static final int CAPACITY = 32;

    private static final class Slot extends AtomicReference<Object> {
        // Improve likelihood of isolation on <= 128 byte cache lines.
        // We used to target 64 byte cache lines, but some x86s (including
        // i7 under some BIOSes) actually use 128 byte cache lines.
        long q0, q1, q2, q3, q4, q5, q6, q7, q8, q9, qa, qb, qc, qd, qe;
    }

    private volatile Slot[] arena = new Slot[CAPACITY];
    private final AtomicInteger max = new AtomicInteger();
The inner class Slot also extends AtomicReference. It declares 15 long fields whose only purpose is cache padding, which reduces the cache effects (false sharing) of heavy CAS traffic on neighbouring slots. arena is an array of length CAPACITY, and max is the highest slot index actually in use. max grows or shrinks with the level of contention, which avoids the performance loss of having every thread CAS on a single slot.

Back in doExchange, the next step is the call to hashIndex, which derives a slot index from the thread ID.
    private final int hashIndex() {
        long id = Thread.currentThread().getId();
        int hash = (((int) (id ^ (id >>> 32))) ^ 0x811c9dc5) * 0x01000193;

        int m = max.get();
        int nbits = (((0xfffffc00  >> m) & 4) | // Compute ceil(log2(m+1))
                     ((0x000001f8 >>> m) & 2) | // The constants hold
                     ((0xffff00f2 >>> m) & 1)); // a lookup table
        int index;
        while ((index = hash & ((1 << nbits) - 1)) > m)       // May retry on
            hash = (hash >>> nbits) | (hash << (33 - nbits)); // non-power-2 m
        return index;
    }
hashIndex computes a one-step FNV-1a hash of the current thread's ID and then uses a fast approximate modulus to bring the value into the range [0, max], where max is the highest slot index in use. The bit manipulation is not covered in detail here; interested readers can work through it themselves.
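For readers who do want to work through it, the sketch below pulls the three steps apart so they can be checked in isolation. The constants are the ones from hashIndex; the class SlotIndex and its method names are hypothetical, and max is passed in as a plain int m.

```java
// Hypothetical helper that separates the three steps of hashIndex.
final class SlotIndex {
    /** One-step FNV-1a over the thread ID folded to 32 bits (offset basis, then prime). */
    static int hash(long id) {
        return (((int) (id ^ (id >>> 32))) ^ 0x811c9dc5) * 0x01000193;
    }

    /** ceil(log2(m + 1)) for 0 <= m <= 31; the three constants act as a lookup table. */
    static int nbits(int m) {
        return ((0xfffffc00 >> m) & 4)
             | ((0x000001f8 >>> m) & 2)
             | ((0xffff00f2 >>> m) & 1);
    }

    /** Masks the hash to nbits bits and, if the result overshoots m, shifts in fresh bits and retries. */
    static int index(int hash, int m) {
        int nbits = nbits(m);
        int index;
        while ((index = hash & ((1 << nbits) - 1)) > m)
            hash = (hash >>> nbits) | (hash << (33 - nbits));
        return index;
    }
}
```

For example, nbits(5) is 3, so a candidate index is taken from the low three bits of the hash; a candidate of 6 or 7 is rejected and the hash is shifted to try the next bits. The retry only happens when m + 1 is not a power of two.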

doExchange then enters a loop that holds the real algorithm. The loop has four branches; whenever a branch does not return, the loop starts over and re-evaluates them. First the selected slot is read from arena; hashIndex guarantees an index no greater than max, so the array access cannot go out of bounds. In the first branch the slot is null because it has not been used yet, so createSlot is called to initialize it.
    private void createSlot(int index) {
        Slot newSlot = new Slot();
        Slot[] a = arena;
        synchronized (a) {
            if (a[index] == null)
                a[index] = newSlot;
        }
    }
createSlot is simple: it stores a new Slot at the given index of the array. Because several threads may try this concurrently, the check and assignment are done inside a synchronized block.
Back in the doExchange loop, the second branch handles a slot that is already initialized. It calls slot.get() to read the node; if the node is not null, some thread has already occupied the slot, so the current thread tries to CAS the slot to null. If that succeeds, the current thread has been matched with the occupying thread. It then CASes the node's atomic reference from null to its own item, wakes the occupying thread recorded in waiter, and returns node.item to complete the exchange. If this last CAS fails, the occupying thread has cancelled and the loop continues.
In the third branch the slot holds null, meaning it is unoccupied, so the thread tries to CAS the slot from null to me, the node built at the start from the item. If the CAS succeeds, what happens next depends on the slot index. For slot 0 the thread must block while waiting; since this is the untimed case, await is called.
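The lazy, lock-guarded initialization can be sketched on its own as below. The class LazyArena and its empty Slot type are hypothetical, and the sketch deliberately uses an AtomicReferenceArray (which the code above does not) so that the unlocked read on the fast path is a volatile read.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical arena that creates each slot at most once, on first use.
final class LazyArena {
    static final class Slot { }

    private final AtomicReferenceArray<Slot> arena = new AtomicReferenceArray<>(32);

    Slot slot(int index) {
        Slot s = arena.get(index);          // fast path: no lock once the slot exists
        if (s != null)
            return s;
        Slot fresh = new Slot();            // allocate outside the lock
        synchronized (arena) {
            if (arena.get(index) == null)   // re-check: another thread may have won
                arena.set(index, fresh);
            return arena.get(index);
        }
    }
}
```

If two threads race to create the same slot, one of the freshly allocated objects is simply discarded, and both threads end up with the same Slot instance.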
    private static final int NCPU = Runtime.getRuntime().availableProcessors();
    private static final int SPINS = (NCPU == 1) ? 0 : 2000;

    private static Object await(Node node, Slot slot) {
        Thread w = Thread.currentThread();
        int spins = SPINS;
        for (;;) {
            Object v = node.get();
            if (v != null)
                return v;
            else if (spins > 0)                 // Spin-wait phase
                --spins;
            else if (node.waiter == null)       // Set up to block next
                node.waiter = w;
            else if (w.isInterrupted())         // Abort on interrupt
                tryCancel(node, slot);
            else                                // Block
                LockSupport.park(node);
        }
    }
First look at SPINS: it is the number of times to poll in a spin loop before blocking or giving up while waiting to be matched. It is 0 on a single CPU and 2000 otherwise. On a multiprocessor, a short spin lets two threads that exchange as fast as they can avoid blocking unless one of them is stalled by GC or preemption. The loop in await makes four checks and then blocks:
The first check calls node.get(). A non-null result means either that a partner has completed the exchange or that the wait was cancelled by an interrupt, so the value v is returned directly.
The second check applies when get() returned null: the thread spins, for at most SPINS iterations.
The third check applies once spinning is over and the thread is about to block: before blocking, node.waiter is set to the current thread so that a later fulfilling thread can wake it.
The fourth check comes just before blocking: if the current thread has been interrupted, tryCancel is called to cancel the wait.
Finally, LockSupport.park is called to block.
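The spin-then-park sequence listed above can be sketched in isolation. The class SpinThenPark and its fulfil method are hypothetical, and interrupt handling and cancellation are left out to keep the sketch short.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// Hypothetical reduction of await: poll, spin, publish the waiter, then park.
final class SpinThenPark {
    static final class Node extends AtomicReference<Object> {
        volatile Thread waiter;
    }

    /** Waits until the node's hole is filled and returns the value. */
    static Object await(Node node, int spins) {
        Thread w = Thread.currentThread();
        for (;;) {
            Object v = node.get();
            if (v != null)
                return v;
            else if (spins > 0)               // spin-wait phase
                --spins;
            else if (node.waiter == null)     // publish the waiter, then re-check the hole
                node.waiter = w;
            else
                LockSupport.park(node);       // may wake spuriously; the loop re-checks
        }
    }

    /** Fills the hole and wakes the waiter, if one has been published. */
    static void fulfil(Node node, Object value) {
        if (node.compareAndSet(null, value))
            LockSupport.unpark(node.waiter);  // unpark(null) has no effect
    }
}
```

The ordering is what prevents a lost wake-up: the waiter publishes itself and reads the hole once more before parking, while the fulfilling thread fills the hole before it reads waiter.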
    private static boolean tryCancel(Node node, Slot slot) {
        if (!node.compareAndSet(null, CANCEL))
            return false;
        if (slot.get() == node) // Pre-check to minimize contention
            slot.compareAndSet(node, null);
        return true;
    }
tryCancel is simple. It first tries to CAS the node's atomic reference from null to CANCEL. If that CAS fails, another thread has already matched this node and filled it in, so the exchange stands and tryCancel returns false. Otherwise it CASes the slot holding the node back to null (after a cheap pre-check) and returns true. The waiting loop then reads CANCEL from the node and returns it to exchange, which throws InterruptedException.
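The point that cancelling and fulfilling compete for the same CAS can be checked with a small single-threaded sketch. The class CancelRace is hypothetical and uses plain AtomicReference objects in place of Node and Slot.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-alone version of tryCancel using plain AtomicReferences.
final class CancelRace {
    static final Object CANCEL = new Object();

    static boolean tryCancel(AtomicReference<Object> node, AtomicReference<Object> slot) {
        if (!node.compareAndSet(null, CANCEL))  // a partner filled the hole first
            return false;
        if (slot.get() == node)                 // pre-check to minimize contention
            slot.compareAndSet(node, null);
        return true;
    }
}
```

Whichever CAS on the hole comes first wins: after a successful cancel, a late partner's compareAndSet(null, item) fails, and after a successful fulfil, tryCancel returns false and the exchanged item stays in place.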

Returning to the third branch of doExchange: for a non-zero slot, spinWait is called to spin.
    private static Object spinWait(Node node, Slot slot) {
        int spins = SPINS;
        for (;;) {
            Object v = node.get();
            if (v != null)
                return v;
            else if (spins > 0)
                --spins;
            else
                tryCancel(node, slot);
        }
    }
spinWait resembles await with one difference: if no partner arrives after SPINS iterations, it calls tryCancel to cancel the current node instead of blocking. Back in doExchange, if the node turns out to have been cancelled, a new Node is created and index is shifted right by one bit (halved). The table is also a candidate for shrinking: if max is greater than the halved index, a CAS tries to decrement max by one.
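To see how quickly a lone thread reaches slot 0, the halving and shrinking steps can be traced with plain integers. This sketch assumes a single thread whose wait times out at every non-zero slot; the class ShrinkTrace and its methods are hypothetical, and max is an ordinary int rather than an AtomicInteger.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical trace of the index >>>= 1 and max - 1 steps after each timed-out spin.
final class ShrinkTrace {
    /** Slots visited by a thread that times out at every non-zero slot. */
    static List<Integer> visited(int startIndex) {
        List<Integer> path = new ArrayList<>();
        int index = startIndex;
        path.add(index);
        while (index != 0) {
            index >>>= 1;          // halve the index after each cancelled wait
            path.add(index);
        }
        return path;
    }

    /** Value of max once the thread has reached slot 0, assuming no other threads. */
    static int maxAfterTimeouts(int startIndex, int max) {
        int index = startIndex;
        int m = max;
        while (index != 0) {
            if (m > (index >>>= 1))
                m--;               // stands in for max.compareAndSet(m, m - 1)
        }
        return m;
    }
}
```

Starting from slot 13, visited(13) gives [13, 6, 3, 1, 0], and with max initially 13 the four timeouts lower it to 9.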

The fourth branch is reached when the first three fail, which means a CAS failed. A failed CAS may be a simple race between two threads or a sign of many threads contending, so fails is incremented to record it. The first failure simply retries the same slot. From the second failure on, heavy contention is more likely: if there have been more than three failures and max is still below FULL (the upper limit for max), a CAS tries to increase max by one and, on success, index is set to m + 1 so that the next slot tried is the newly added one. Otherwise index is decremented, wrapping around to max when it drops below 0, so the table is traversed circularly.
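The probing and growth rule can be traced with a small simulation. It assumes the worst case in which every CAS fails and no other thread changes max; the class ProbeTrace and its parameters are hypothetical, with full standing in for FULL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the fourth branch under constant CAS failure.
final class ProbeTrace {
    /** Slot indices tried; the first entry is the start index, then one entry per failure. */
    static List<Integer> probe(int start, int max, int full, int failures) {
        List<Integer> tried = new ArrayList<>();
        int index = start;
        int fails = 0;
        tried.add(index);
        for (int i = 0; i < failures; i++) {
            if (++fails > 1) {                 // the first failure retries the same slot
                int m = max;
                if (fails > 3 && m < full) {   // grow by one slot and jump to it
                    max = m + 1;
                    index = m + 1;
                } else if (--index < 0) {      // otherwise step backwards, wrapping to max
                    index = m;
                }
            }
            tried.add(index);
        }
        return tried;
    }
}
```

With start = 2, max = 3, full = 5 and six failures, the trace is [2, 2, 1, 0, 4, 5, 4]: slot 2 is tried twice, slots 1 and 0 once each, the table then grows to slots 4 and 5, and once full is reached probing steps backwards again.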

That is the overall logic of doExchange. The timed version of exchange works much the same way: it calls doExchange with the timeout arguments, and the wait on slot 0 is then handled by a different function, awaitNanos.
    private Object awaitNanos(Node node, Slot slot, long nanos) {
        int spins = TIMED_SPINS;
        long lastTime = 0;
        Thread w = null;
        for (;;) {
            Object v = node.get();
            if (v != null)
                return v;
            long now = System.nanoTime();
            if (w == null)
                w = Thread.currentThread();
            else
                nanos -= now - lastTime;
            lastTime = now;
            if (nanos > 0) {
                if (spins > 0)
                    --spins;
                else if (node.waiter == null)
                    node.waiter = w;
                else if (w.isInterrupted())
                    tryCancel(node, slot);
                else
                    LockSupport.parkNanos(node, nanos);
            }
            else if (tryCancel(node, slot) && !w.isInterrupted())
                return scanOnTimeout(node);
        }
    }
awaitNanos follows the same logic as await, with added timeout bookkeeping. The most important addition is that once the timeout expires, it cancels the node and then calls scanOnTimeout.
    private Object scanOnTimeout(Node node) {
        Object y;
        for (int j = arena.length - 1; j >= 0; --j) {
            Slot slot = arena[j];
            if (slot != null) {
                while ((y = slot.get()) != null) {
                    if (slot.compareAndSet(y, null)) {
                        Node you = (Node) y;
                        if (you.compareAndSet(null, node.item)) {
                            LockSupport.unpark(you.waiter);
                            return you.item;
                        }
                    }
                }
            }
        }
        return CANCEL;
    }
scanOnTimeout sweeps the whole slot table once; if it finds another thread waiting in some slot, it exchanges with it by CAS. This reduces the likelihood of a timeout. Note that the CAS passes node.item rather than the value returned by get(), because the node's atomic reference was already set to CANCEL by the CAS in tryCancel.
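From the caller's side, the timed path looks like this. The sketch uses the standard Exchanger.exchange(V, long, TimeUnit) overload; the class TimedExchangeDemo and the method timesOutAlone are hypothetical names.

```java
import java.util.concurrent.Exchanger;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo: a thread with no partner gives up after the timeout.
final class TimedExchangeDemo {
    /** Returns true if a lone exchange attempt ends with a TimeoutException. */
    static boolean timesOutAlone(long millis) {
        Exchanger<String> exchanger = new Exchanger<>();
        try {
            exchanger.exchange("lonely", millis, TimeUnit.MILLISECONDS);
            return false;                        // unreachable without a partner
        } catch (TimeoutException e) {
            return true;                         // no partner arrived in time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt for the caller
            return false;
        }
    }
}
```

Calling timesOutAlone(50) on an otherwise idle Exchanger returns true after roughly 50 milliseconds.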

Summary
Exchanger is a synchronizer built on a lock-free algorithm that lets pairs of threads exchange object references. It is heavily optimized for high contention, and it uses padding to limit cache effects when many CAS operations hit memory at once. The lock-free algorithm and its optimizations are worth studying carefully.