NUMA and SMP
SMP (symmetric multiprocessing) is an architecture in which multiple CPUs in one server work symmetrically, and every CPU takes the same amount of time to access any memory address. Its main feature is sharing: the CPUs share memory, I/O, and other resources. The advantage of SMP is that memory consistency is easy to guarantee. The disadvantage is that the shared resources can become a performance bottleneck: as the number of CPUs grows, they all contend for the same memory, which causes access conflicts and wastes CPU cycles. An ordinary PC is built this way.
NUMA (non-uniform memory access) divides the CPUs into modules (nodes). Each module consists of several CPUs and has its own local memory, I/O slots, and so on, and modules reach one another through an interconnect. Accessing local memory is much faster than accessing remote memory (memory that belongs to another node in the system), which is where the name non-uniform memory access comes from. The advantage of NUMA is that it scales better than the original SMP design. The disadvantage is that the latency of remote memory far exceeds that of local memory, so system performance does not increase linearly as CPUs are added.
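To make the cost of remote access concrete, here is a minimal sketch of a weighted-average latency model. The class name NumaLatencyModel, the method averageLatencyNs, and the 100 ns / 300 ns latencies are all assumptions chosen for illustration; they are not measurements of any real machine.

```java
// Hypothetical model: average memory latency as a weighted mix of
// local and remote accesses. All numbers used with it are illustrative.
final class NumaLatencyModel {
    private NumaLatencyModel() {}

    /**
     * @param localFraction fraction of accesses served from local memory, in [0, 1]
     * @param localNs       assumed latency of a local access, in nanoseconds
     * @param remoteNs      assumed latency of a remote access, in nanoseconds
     */
    static double averageLatencyNs(double localFraction, double localNs, double remoteNs) {
        if (localFraction < 0.0 || localFraction > 1.0) {
            throw new IllegalArgumentException("localFraction must be in [0, 1]");
        }
        return localFraction * localNs + (1.0 - localFraction) * remoteNs;
    }

    public static void main(String[] args) {
        // Assumed latencies: 100 ns local, 300 ns remote.
        System.out.println(averageLatencyNs(1.0, 100.0, 300.0)); // all accesses local: 100.0
        System.out.println(averageLatencyNs(0.5, 100.0, 300.0)); // half remote: 200.0
    }
}
```

Under these assumed numbers, sending half of the accesses to remote memory doubles the average latency, which is why adding CPUs on a NUMA machine does not give linear speedup unless data stays local.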
CLH Algorithm Implementation
Each node (QNode) in the CLH queue contains a locked field. When the field is true, the thread either needs the lock or holds it and has not yet released it; when it is false, the thread has released the lock. The nodes form an implicit linked list. It is implicit because no node has an explicit next pointer; instead, each thread keeps a myPred reference to its predecessor's node, and changes to that node determine how the thread that owns myNode behaves. CLHLock also has a tail pointer that always points to the last node in the queue. The CLHLock class diagram looks like this:
When a thread wants to acquire the lock, it creates a new QNode and sets its locked field to true to show that it needs the lock. The thread then calls getAndSet on the tail field, which makes its own node the tail of the queue and returns a reference to its predecessor's node, myPred. The thread then spins on the predecessor's locked field until the predecessor releases the lock. To release the lock, a thread sets the locked field of its own node to false and reclaims the predecessor's node for reuse. As the following illustration shows, thread A needs the lock, so the locked field of its myNode is true and tail points to thread A's node. Thread B then joins the queue behind thread A, and tail points to thread B's node. Threads A and B each spin on the locked field of their own myPred node; as soon as that field becomes false, the spinning thread acquires the lock. Here the locked field of thread A's myPred is already false, so thread A acquires the lock.
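The walk-through with threads A and B can be traced in a single thread, without starting any real threads. The sketch below only models the queue state; the class name ClhQueueTrace and the method trace are made up for this illustration.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

// Single-threaded trace of the CLH queue state for "thread A" and "thread B".
// No real threads are started; only the node and tail updates are modeled.
final class ClhQueueTrace {
    static final class QNode {
        volatile boolean locked;
    }

    // Returns { A must wait?, B must wait before A releases?, B must wait after A releases? }
    static boolean[] trace() {
        // tail starts at a dummy node whose locked field is false
        AtomicReference<QNode> tail = new AtomicReference<>(new QNode());

        QNode nodeA = new QNode();
        nodeA.locked = true;                    // A wants the lock
        QNode predA = tail.getAndSet(nodeA);    // A becomes the tail; predA is the dummy node

        QNode nodeB = new QNode();
        nodeB.locked = true;                    // B wants the lock
        QNode predB = tail.getAndSet(nodeB);    // B becomes the tail; predB is A's node

        boolean aMustWait = predA.locked;       // false: A acquires the lock at once
        boolean bWaitsBefore = predB.locked;    // true: B would spin on A's node
        nodeA.locked = false;                   // A releases the lock
        boolean bWaitsAfter = predB.locked;     // false: B may now proceed
        return new boolean[] { aMustWait, bWaitsBefore, bWaitsAfter };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(trace())); // prints [false, true, false]
    }
}
```

The trace shows why no next pointer is needed: B learns about A's release only by reading the locked field of the node that getAndSet handed back.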
The complete CLH code is shown below. It uses the ThreadLocal class to bind a QNode to each thread and an AtomicReference for the tail pointer. The tail is modified by calling its getAndSet() method, which is guaranteed to update the object reference atomically.
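import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;

public class CLHLock implements Lock {
    // Queue node. locked is volatile so that a spinning successor sees the release.
    static class QNode {
        volatile boolean locked = false;
    }

    // Tail of the implicit queue; it starts at a dummy node whose locked field is false.
    AtomicReference<QNode> tail = new AtomicReference<QNode>(new QNode());
    ThreadLocal<QNode> myPred;
    ThreadLocal<QNode> myNode;

    public CLHLock() {
        myNode = new ThreadLocal<QNode>() {
            @Override
            protected QNode initialValue() {
                return new QNode();
            }
        };
        myPred = new ThreadLocal<QNode>() {
            @Override
            protected QNode initialValue() {
                return null;
            }
        };
    }

    @Override
    public void lock() {
        QNode qnode = myNode.get();
        qnode.locked = true;                 // announce that this thread wants the lock
        QNode pred = tail.getAndSet(qnode);  // atomically become the tail and get the predecessor
        myPred.set(pred);
        while (pred.locked) {
            // spin until the predecessor releases the lock
        }
    }

    @Override
    public void unlock() {
        QNode qnode = myNode.get();
        qnode.locked = false;                // release the lock; the successor stops spinning
        myNode.set(myPred.get());            // reuse the predecessor's node next time
    }

    // The remaining Lock methods are outside the scope of this example.
    @Override public void lockInterruptibly() { throw new UnsupportedOperationException(); }
    @Override public boolean tryLock() { throw new UnsupportedOperationException(); }
    @Override public boolean tryLock(long time, TimeUnit unit) { throw new UnsupportedOperationException(); }
    @Override public Condition newCondition() { throw new UnsupportedOperationException(); }
}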
As the code shows, the lock method contains a while loop that waits for the locked field of the predecessor node to become false; this is the spin-wait. The unlock method is simple: it sets the thread's own locked field to false and then takes over the predecessor's node as its own node for the next acquisition.
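A lock of this kind could be exercised with a shared counter, as in the sketch below. It carries its own compact copy of the lock so that it runs standalone. The names ClhCounterDemo, SimpleClhLock, and run are hypothetical, and the Thread.yield() call inside the spin loop is an addition for this demo (not part of the listing above) so that it also makes progress on machines with few cores.

```java
import java.util.concurrent.atomic.AtomicReference;

// Compact CLH lock plus a counter demo. Names are illustrative only.
final class ClhCounterDemo {
    static final class QNode {
        volatile boolean locked;
    }

    static final class SimpleClhLock {
        private final AtomicReference<QNode> tail = new AtomicReference<>(new QNode());
        private final ThreadLocal<QNode> myNode = ThreadLocal.withInitial(QNode::new);
        private final ThreadLocal<QNode> myPred = new ThreadLocal<>();

        void lock() {
            QNode node = myNode.get();
            node.locked = true;
            QNode pred = tail.getAndSet(node);
            myPred.set(pred);
            while (pred.locked) {
                Thread.yield(); // demo-only addition: give up the CPU while spinning
            }
        }

        void unlock() {
            QNode node = myNode.get();
            node.locked = false;
            myNode.set(myPred.get()); // reuse the predecessor's node
        }
    }

    private static int counter = 0; // only touched while holding the lock

    static int run(int threads, int incrementsPerThread) {
        counter = 0;
        SimpleClhLock lock = new SimpleClhLock();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000)); // 40000 if mutual exclusion holds
    }
}
```

The counter is a plain int on purpose: the volatile write in unlock and the volatile read in the next thread's spin loop are what make each increment visible to the next lock holder.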
CLH Advantages and Disadvantages
The advantage of the CLH queue lock is its low space complexity: with n threads and L locks, where each thread acquires at most one lock at a time, the required storage is O(L + n), because the n threads have n myNode objects and the L locks have L tail pointers. A variant of CLH is used in the Java concurrency framework (in AbstractQueuedSynchronizer). Its main drawback is poor performance on NUMA architectures. There each processor has its own local memory, and if the predecessor's node lives in remote memory, spinning on the predecessor's locked field becomes much slower. On SMP architectures, however, the method is very effective. One lock that addresses the NUMA problem is the MCS queue lock.
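For contrast, here is a minimal sketch of an MCS queue lock in the usual textbook style. The difference that matters on NUMA is that each thread spins on the locked field of its own node, which it allocated itself, rather than on its predecessor's node. The class name McsLockSketch and the Thread.yield() calls are choices made for this illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of an MCS queue lock: each thread spins on its OWN node.
final class McsLockSketch {
    static final class QNode {
        volatile boolean locked;
        volatile QNode next;     // explicit link to the successor, unlike CLH
    }

    private final AtomicReference<QNode> tail = new AtomicReference<>(null);
    private final ThreadLocal<QNode> myNode = ThreadLocal.withInitial(QNode::new);

    void lock() {
        QNode node = myNode.get();
        QNode pred = tail.getAndSet(node);   // become the tail
        if (pred != null) {                  // someone is ahead of us
            node.locked = true;
            pred.next = node;                // link in behind the predecessor
            while (node.locked) {
                Thread.yield();              // spin on our own node
            }
        }
    }

    void unlock() {
        QNode node = myNode.get();
        if (node.next == null) {
            // No known successor: try to reset the queue to empty.
            if (tail.compareAndSet(node, null)) {
                return;
            }
            // A successor swapped the tail but has not linked itself yet.
            while (node.next == null) {
                Thread.yield();
            }
        }
        node.next.locked = false;            // hand the lock to the successor
        node.next = null;                    // make the node reusable
    }
}
```

The price of local spinning is a slightly more involved unlock, which must handle the window between a successor updating tail and setting pred.next.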