J.U.C Concurrency Framework


Reprint: http://itindex.net/detail/48869-j.u.c-%E6%A1%86%E6%9E%B6

J.U.C Concurrency Framework

Doug Lea
SUNY Oswego
Oswego NY 13126

dl@cs.oswego.edu

Translator: "The scroll is passionate"

In J2SE 1.5, most of the synchronizers (locks, barriers, etc.) in the java.util.concurrent package are built on the AbstractQueuedSynchronizer class. This framework provides common mechanisms for atomically managing synchronization state, blocking and unblocking threads, and maintaining queues. This paper describes the rationale, design, implementation, usage, and performance of the framework.

Keywords: synchronization, Java

1. Introduction

The J2SE 1.5 release introduces the package java.util.concurrent, a collection of mid-level concurrency utilities created through the JCP (Java Community Process) and JSR 166. Among these utilities are the synchronizers: abstract data type (ADT) classes that maintain an internal synchronization state (for example, a value indicating whether a lock is in a locked or unlocked state), provide operations to update and inspect that state, and include at least one method that causes a calling thread to block when the state requires it, resuming when some other thread changes the state to permit it to proceed. Examples include various forms of mutual exclusion locks, read-write locks, semaphores, barriers, futures, event indicators, and handoff queues.

As is well known (see e.g., [2]), nearly any synchronizer can be used to implement nearly any other; for example, one can build a semaphore from a reentrant lock, and in turn a reentrant lock from a semaphore. However, doing so is complex and inconvenient enough to be at best a second-rate engineering option. Further, it is conceptually unattractive: if none of these constructs is intrinsically more primitive than the others, forcing developers to arbitrarily pick one as a basis for building the others is an imposition. JSR 166 therefore established a small framework, centered on the class AbstractQueuedSynchronizer, to provide this common basis for developers.

2.1 Required Functionality

Synchronizers have two kinds of methods [7]: at least one acquire operation, which blocks the calling thread unless or until the synchronization state allows it to proceed, and at least one release operation, which changes the synchronization state in a way that may allow one or more blocked threads to unblock.

The j.u.c package does not define a single unified synchronization API. Some synchronizers are defined via common interfaces (such as Lock), while others exist only as specialized classes, so the acquire and release operations go by a range of names in different classes. For example, Lock.lock, Semaphore.acquire, CountDownLatch.await, and FutureTask.get all map to acquire operations. However, each synchronizer class supports, in some form:

* Nonblocking attempts at acquisition (such as tryLock) as well as blocking versions

* Optional timeouts, so that a caller can give up waiting

* Cancellation via interruption, usually split into one version of acquire that is cancellable and one that is not

Synchronizers may also be divided according to whether they manage exclusive state, in which only one thread at a time may proceed past a possible blocking point, or shared state, in which multiple threads can sometimes proceed at once. Ordinary lock classes of course maintain only exclusive state, but counting semaphores, for example, may allow up to a fixed number of threads to proceed. To be widely useful, the framework must support both modes.

The j.u.c package also defines the Condition interface, which supports monitor-style await/signal operations tied to exclusive Lock classes, much as built-in monitors are tied to built-in locks.
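For illustration, here is how the acquisition variants listed above look on the Lock interface, using ReentrantLock as a stand-in. This is a minimal sketch; the class and method names called on the lock are the real j.u.c APIs, but the surrounding class is hypothetical:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class AcquireVariants {
    final ReentrantLock lock = new ReentrantLock();

    void demo() throws InterruptedException {
        lock.lock();                              // blocking acquire
        try { /* ... */ } finally { lock.unlock(); }

        if (lock.tryLock()) {                     // nonblocking attempt
            try { /* ... */ } finally { lock.unlock(); }
        }

        if (lock.tryLock(1, TimeUnit.SECONDS)) {  // timed attempt: give up after one second
            try { /* ... */ } finally { lock.unlock(); }
        }

        lock.lockInterruptibly();                 // acquire that aborts if the thread is interrupted
        try { /* ... */ } finally { lock.unlock(); }
    }
}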

2.2 Performance Goals

Java's built-in locks (accessed via synchronized methods and blocks) have long been a performance concern, and there is a sizable literature on their construction (such as [1], [3]). However, that work focuses mainly on minimizing space overhead (because any Java object can serve as a lock) and on minimizing time overhead in mostly single-threaded use on uniprocessors. Neither concern is central for synchronizers: programmers construct synchronizers only when they need them, so there is no need to compact space that would otherwise be wasted, and synchronizers are used almost exclusively in multithreaded designs (increasingly on multicore machines) in which at least occasional contention must be expected. So the usual JVM strategy of optimizing locks for the zero-contention case, leaving other cases to unpredictable "slow paths", is not the right approach for typical multithreaded server applications that rely heavily on java.util.concurrent.

Instead, the primary performance goal here is scalability: to predictably maintain efficiency even, or especially, when synchronizers are contended. Ideally, the overhead required to pass a synchronization point should be constant no matter how many threads are trying to do so. A main goal is to minimize the total time during which some thread is permitted to pass a synchronization point but has not yet done so. However, this must be balanced against resource considerations, including total CPU time, memory traffic, and thread scheduling overhead. For example, spinlocks usually offer shorter acquisition times than blocking locks, but they waste cycles and generate memory contention, so they are not often applicable.

These goals apply across two general styles of use. Most applications should maximize aggregate throughput and responsiveness, tolerating at best probabilistic guarantees against thread starvation. But in applications such as resource control, it is far more important to maintain fairness of access across threads, tolerating lower aggregate throughput. No framework can decide between these conflicting goals on behalf of users; instead, different fairness policies must be supported.

No matter how cleverly they are engineered internally, synchronizers will create performance bottlenecks in some applications. Thus, the framework must make it possible to monitor basic internal operations so that users can discover and alleviate bottlenecks. At a minimum, this means providing a way to determine how many threads are blocked.

3. Design and implementation

The basic ideas behind a synchronizer are quite straightforward. An acquire operation proceeds as follows:

while (synchronization state does not allow acquire) {
    enqueue current thread if not already queued;
    possibly block current thread;
}
dequeue current thread if it was queued;

The release operation is as follows:

update synchronization state;
if (state may permit a blocked thread to acquire)
    unblock one or more queued threads;

Support for these operations requires the coordination of three basic components:

1. Atomically managing synchronization state

2. Blocking and unblocking threads

3. Maintaining a FIFO queue of waiting threads

One could create a framework that lets each of these three components vary independently, but that would be neither efficient nor convenient. For example, the information kept in queue nodes must mesh with what is needed to unblock a node, and the signatures of exported methods depend on the nature of the synchronization state.

The central design decision of the synchronizer framework was to choose one concrete implementation for each of these three components, while still leaving a wide range of choice in how they are used. This intentionally limits the range of applicability, but avoids the need to build new synchronizers from scratch and yields an efficient framework.

3.1 Synchronization State

The AbstractQueuedSynchronizer class maintains synchronization state using only a single 32-bit int, and provides three methods, getState, setState, and compareAndSetState, to access and update that state. These methods in turn rely on the volatile read and write semantics provided by JSR 133, and compareAndSetState is implemented via native compare-and-swap or load-linked/store-conditional instructions, atomically setting the state to a given new value only if it currently holds a given expected value.
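The usual pattern for updating such state is a compare-and-set retry loop. The helper below is a minimal sketch of that pattern inside a hypothetical AbstractQueuedSynchronizer subclass; the method addToState is invented for illustration:

import java.util.concurrent.locks.AbstractQueuedSynchronizer;

class StateSketch extends AbstractQueuedSynchronizer {
    // Hypothetical helper: atomically add delta to the synchronization state.
    void addToState(int delta) {
        for (;;) {
            int s = getState();                    // volatile read
            if (compareAndSetState(s, s + delta))  // succeeds only if state is still s
                return;                            // otherwise another thread won; retry
        }
    }
}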

Restricting synchronization state to a 32-bit int was a pragmatic decision. JSR 166 also provides atomic operations on 64-bit long fields, but these must still be emulated using internal locks on enough platforms that the resulting synchronizers would not perform well. In the future, a second base class specialized for 64-bit state (that is, with long control arguments) may well be added, but there is no compelling reason to include one in the package now; 32 bits suffice for most applications. Only one class in the j.u.c package, CyclicBarrier, would seem to need more bits, so it uses locks instead (as do most of the higher-level utilities in the package).

Concrete classes based on AbstractQueuedSynchronizer must define the methods tryAcquire and tryRelease in terms of this exported state in order to control the acquire and release operations. tryAcquire must return true if synchronization was acquired, and tryRelease must return true if the new synchronization state may allow future acquires. Both methods take a single int argument that can be used to communicate desired state; for example, in a reentrant lock, to re-establish the recursion count when re-acquiring the lock after returning from a condition wait. Many synchronizers do not need such an argument and simply ignore it.

3.2 Blocking

Prior to JSR 166, there was no Java API for blocking and unblocking threads in synchronizers not built on built-in monitors. The only candidates, Thread.suspend and Thread.resume, are unusable because they suffer from an unsolvable race: if an unblocking thread calls resume before the blocking thread has executed suspend, the resume operation has no effect.

The java.util.concurrent.locks package includes a LockSupport class whose methods address this problem. LockSupport.park blocks the current thread unless or until a LockSupport.unpark has been issued for it. (Spurious wakeups are also permitted.) Calls to unpark are not counted, so multiple unparks before a park unblock only a single park. Additionally, this applies at the thread level, not the synchronizer level: a thread invoking park on a fresh synchronizer may return immediately because of a leftover unpark from a previous usage. However, in the absence of an unpark, the next park will block. While this leftover state could be explicitly cleared, it is not worth doing so; it is more efficient to invoke park multiple times on the occasions it is necessary.

This simple mechanism resembles, at some level, Solaris-9 thread primitives, "consumable events" under Win32, and the Linux NPTL thread library, and so maps efficiently to each of these on the platforms Java most commonly runs on. (However, the current Sun Hotspot JVM implementation on Solaris and Linux actually uses a pthread condvar in order to fit into the existing runtime design.) The park method also supports optional timeouts, and integrates with JVM Thread.interrupt support: interrupting a thread unparks it.
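The permit semantics described above can be seen in a few lines. This is a minimal sketch using the real LockSupport API; the class and variable names are illustrative:

import java.util.concurrent.locks.LockSupport;

public class ParkDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(() -> {
            // Blocks until some thread issues unpark for this thread.
            // Spurious returns are permitted, so real code re-checks its condition in a loop.
            LockSupport.park();
            System.out.println("unparked");
        });
        waiter.start();
        Thread.sleep(100);           // crude way, for a demo, to let waiter reach park()
        LockSupport.unpark(waiter);  // grants one permit; permits do not accumulate, so
                                     // several unparks before a park release only one park
        waiter.join();
    }
}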

3.3 Queues

The heart of the framework is the maintenance of queues of blocked threads, which are restricted here to FIFO queues. Thus, the framework does not support priority-based synchronization.

These days there is little controversy that the most appropriate data structures for synchronization queues are non-blocking ones, which do not themselves need to be built on lower-level locks. Of these, there are two main candidates: variants of Mellor-Crummey and Scott (MCS) locks [9] and variants of Craig, Landin, and Hagersten (CLH) locks [5][8][10]. Historically, CLH locks had been used only in spinlocks, but they appeared more amenable than MCS locks for use in this framework because they are more easily adapted to handle cancellation and timeouts, so CLH was chosen as the basis. The resulting design is far enough removed from the original CLH structure that the extensions made as demand required are explained below.

A CLH queue is not very queue-like, because its enqueue and dequeue operations are intimately tied to its use as a lock. It is a linked queue accessed via two atomically updatable fields, head and tail, which initially both point to a dummy node.

A new node, node, is enqueued using an atomic operation:

do {
    pred = tail;
} while (!tail.compareAndSet(pred, node));

The release status of each node is kept in its predecessor node, so the "spin" of a spinlock looks like:

while (pred.status != RELEASED); // spin

After this spin, dequeuing simply entails setting the head field to the node that just acquired the lock:

head = node;

Among the advantages of CLH locks are that enqueuing and dequeuing are fast, lock-free, and obstruction-free (even under contention, one thread always wins the insertion race and makes progress); that detecting whether any threads are waiting is also very fast (just check whether head is the same as tail); and that release status is decentralized, avoiding some memory contention.
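Put together, a classic CLH spinlock is only a few lines of Java. The sketch below follows the structure just described; the field and class names are illustrative, and the j.u.c source is organized quite differently:

import java.util.concurrent.atomic.AtomicReference;

class CLHSpinLock {
    static final class Node { volatile boolean released; }

    // The dummy node starts released so the first acquirer proceeds immediately.
    private final AtomicReference<Node> tail = new AtomicReference<>(dummy());
    private final ThreadLocal<Node> myNode = new ThreadLocal<>();

    private static Node dummy() { Node n = new Node(); n.released = true; return n; }

    public void lock() {
        Node node = new Node();            // released == false while we want or hold the lock
        myNode.set(node);
        Node pred = tail.getAndSet(node);  // atomic enqueue (equivalent to the CAS loop above)
        while (!pred.released) { }         // spin on the predecessor's release status
    }

    public void unlock() {
        myNode.get().released = true;      // our successor's spin observes this and stops
    }
}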

In the original design of CLH locks, nodes were not even linked; in a spinlock, pred can be held as a local variable. However, Scott and Scherer showed that by explicitly maintaining predecessor fields within nodes, CLH locks can handle timeouts and other forms of cancellation: if a node's predecessor is cancelled, the node can slide up to use the previous node's status field.

The main change needed to use CLH queues for blocking synchronizers is to give a node an efficient way to locate its successor. In a spinlock, a node need only update its status, which its successor will notice on its next spin, so no links are required. But in a blocking synchronizer, a node must explicitly wake up (unpark) its successor.

An AQS queue node therefore contains a next field pointing to its successor. But because there is no applicable technique for lock-free atomic insertion of a doubly linked list node using compareAndSet, this link is not set atomically as part of insertion; it is simply assigned afterwards:

pred.next = node;

This is reflected in all usages. The next link is treated only as an optimized path: if a node's successor does not appear to exist via its next field (or appears to be cancelled), it is always possible to start at the tail of the list and traverse backwards using pred to accurately determine whether there really is one.
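That fallback traversal can be sketched as follows. This is a self-contained illustration of the idea only; the Node fields and the findSuccessor method are invented for this sketch and are not the actual AbstractQueuedSynchronizer internals:

class QueueSketch {
    static final class Node {
        volatile Node pred, next;
        volatile boolean cancelled;
    }
    volatile Node head, tail;

    Node findSuccessor(Node node) {
        Node s = node.next;
        if (s != null && !s.cancelled)
            return s;                 // fast path via the next link
        s = null;
        // next link missing or stale: walk backwards from tail via pred links
        for (Node t = tail; t != null && t != node; t = t.pred)
            if (!t.cancelled)
                s = t;                // closest non-cancelled successor seen so far
        return s;
    }
}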

The second set of changes is to use the status field kept in each node to control blocking, not spinning. In the synchronizer framework, a queued thread can return from an acquire operation only by passing the tryAcquire method defined by a concrete subclass, so a single "released" bit does not suffice. Control is still needed to ensure that an active thread invokes tryAcquire only when it is at the head of the queue, in which case it may still fail to acquire and then (re)block. This does not require a per-node permission flag, because permission can be determined by checking whether the current node's predecessor is the head. And unlike the situation with spinlocks, there is not enough memory contention when reading head to warrant replication. However, cancellation status must still be kept in the status field.

The queue node status field is also used to avoid needless park and unpark calls. While these methods are relatively fast as blocking primitives go, they cross the boundary between Java and the JVM runtime and/or OS, which carries avoidable overhead. Before invoking park, a thread sets a "signal me" bit and then rechecks the synchronization and node status once more before actually parking. A releasing thread clears this status. This saves threads from blocking needlessly, which matters especially for locks that are only briefly unavailable. It also spares a releasing thread from having to determine its successor unless the successor has set the signal bit.

CLH locks as used here also differ from those in other languages because of garbage collection. Relying on GC to manage node storage avoids complexity and overhead, but GC reliance still entails nulling out link fields, which is done here upon dequeuing.

Further minor optimizations, including the lazy initialization of the dummy node required by CLH queues, are described in the J2SE 1.5 source code documentation.

Ignoring such details, the general form of the resulting acquire operation (exclusive, uninterruptible, untimed case only) is:

if (!tryAcquire(arg)) {
    node = create and enqueue new node;
    pred = node's effective predecessor;
    while (pred is not head node || !tryAcquire(arg)) {
        if (pred's signal bit is set)
            park();
        else
            compareAndSet pred's signal bit to true;
        pred = node's effective predecessor;
    }
    head = node;
}

Release action:

if (tryRelease(arg) && head node's signal bit is set) {
    compareAndSet head's signal bit to false;
    unpark head's successor, if one exists;
}

The number of iterations of the main acquire loop depends, of course, on the nature of tryAcquire. Otherwise, in the absence of cancellation, each component of acquire and release is a constant-time O(1) operation, amortized across threads and disregarding any OS thread scheduling that occurs within park.

Cancellation support mainly entails checking for interrupt or timeout on each return from park inside the acquire loop. A thread cancelled due to timeout or interrupt sets its node status and unparks its successor, so that the successor may reset the links. Determining predecessors and successors and resetting status in the presence of cancellation may require O(n) traversals (where n is the length of the queue). Because a thread never again blocks for a cancelled operation, links and status fields tend to restabilize quickly.

3.4 Condition Queues

The synchronizer framework provides a ConditionObject class for use by synchronizers that maintain exclusive synchronization and conform to the Lock interface. Any number of condition objects may be attached to a lock object, providing the classic monitor-style await, signal, and signalAll operations, including versions with timeouts, along with some inspection and monitoring methods. ConditionObject enables conditions to be integrated efficiently with other synchronization operations, again by fixing some design decisions. The class supports only Java-style monitor access rules, in which condition operations are legal only when the current thread holds the lock that owns the condition. Thus, a ConditionObject attached to a ReentrantLock behaves the same way as a built-in monitor (via Object.wait and so on); the two differ only in method names and extra functionality, and in the practical benefit that users can declare multiple condition objects per lock.

A ConditionObject uses the same internal queue nodes as other synchronizers, but maintains them on a separate condition queue. The signal operation is implemented as a queue transfer from the condition queue to the lock queue, without necessarily waking the signalled thread before it has re-acquired its lock.

The basic wait operation is:

Create and add new node to condition queue;

Release lock;

Block until node is on lock queue;

Re-acquire lock;

The signal operation is:

Transfer the first node from the condition queue to lock queue;

Because these operations are performed only while the lock is held, they can use sequential linked-queue operations (using a nextWaiter field in the nodes) to maintain the condition queue. The transfer operation simply unlinks the first node from the condition queue and then inserts it onto the lock queue using CLH insertion.

The main complication in implementing these operations is dealing with cancellation of condition waits due to timeouts or Thread.interrupt. A cancellation and a signal occurring at about the same time encounter a race whose outcome must conform to the specifications of built-in monitors. As revised in JSR 133, if an interrupt occurs before a signal, the await method must, after re-acquiring the lock, throw InterruptedException. But if the interrupt occurs after a signal, await must return without throwing an exception, setting the thread's interrupt status instead.

To maintain proper ordering, a bit in the queue node status records whether the node has been (or is in the process of being) transferred. Both the signalling code and the cancelling code attempt to compareAndSet this status. If a signal operation loses this race, it instead transfers the next node in the queue, if one exists. If a cancellation loses, it must abort the transfer and then wait to re-acquire the lock. The latter case introduces a potentially unbounded spin: a cancelled wait cannot begin lock re-acquisition until its node has been successfully inserted on the lock queue, so it must spin waiting for the CLH insertion compareAndSet performed by the signalling thread to succeed. The need to spin here is rare, and the spin includes a Thread.yield as a scheduling hint that some other thread, ideally the one doing the signal, should run instead. While it would be possible to implement a helping strategy in which the cancellation inserts the node itself, the case is far too rare to justify the overhead. In all other cases, no spins or yields are used, which maintains reasonable performance on uniprocessors.
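From the user's side, all of this machinery is hidden behind ordinary await/signal usage. The following one-slot buffer is a minimal sketch of typical ConditionObject use via ReentrantLock; the buffer class itself is illustrative, not from the paper:

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class OneSlotBuffer<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final Condition notFull = lock.newCondition();
    private T item;

    public void put(T x) throws InterruptedException {
        lock.lock();
        try {
            while (item != null)
                notFull.await();      // releases the lock, parks, re-acquires on signal
            item = x;
            notEmpty.signal();        // transfers one waiter from condition queue to lock queue
        } finally {
            lock.unlock();
        }
    }

    public T take() throws InterruptedException {
        lock.lock();
        try {
            while (item == null)
                notEmpty.await();
            T x = item;
            item = null;
            notFull.signal();
            return x;
        } finally {
            lock.unlock();
        }
    }
}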

4. Usage

The AQS class ties together the functionality discussed above and serves as a "template method pattern" [6] base class for synchronizers. Subclasses define only the methods that implement the state inspections and updates controlling acquire and release. However, subclasses of AQS are not themselves usable as synchronizer ADTs, because the class necessarily exports the methods needed to internally control acquire and release policies, which should not be visible to users. All synchronizer classes in the j.u.c package instead declare a private inner subclass of AQS and delegate all synchronization methods to it. This also lets the public methods be given names appropriate to each synchronizer. For example, here is a minimal Mutex class, which uses synchronization state zero to mean unlocked and one to mean locked. It has no need for the argument supported by the synchronization methods, so it passes zero and otherwise ignores it:

class Mutex {
    class Sync extends AbstractQueuedSynchronizer {
        public boolean tryAcquire(int ignore) {
            return compareAndSetState(0, 1);
        }
        public boolean tryRelease(int ignore) {
            setState(0);
            return true;
        }
    }

    private final Sync sync = new Sync();
    public void lock() { sync.acquire(0); }
    public void unlock() { sync.release(0); }
}

A fuller version of this example, along with other usage guidance, can be found in the J2SE documentation. Many variants are of course possible. For example, tryAcquire could employ a test-and-test-and-set strategy, checking the state value before attempting the compareAndSet.
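Usage then looks like any other lock; pairing lock and unlock in try/finally is the standard idiom (a brief illustrative snippet):

Mutex mutex = new Mutex();
mutex.lock();
try {
    // critical section
} finally {
    mutex.unlock();   // always release, even if the critical section throws
}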

It may be surprising that a construct as performance-sensitive as a mutual exclusion lock is defined using a combination of delegation and virtual methods. However, these are exactly the kinds of OO constructions that modern dynamic compilers have long focused on, and they tend to be good at optimizing away this overhead, at least in code that invokes synchronizers frequently. The AQS class also supplies a number of methods that assist synchronizer classes in policy control. For example, it includes timeout and interruptible versions of the basic acquire method. And while the discussion so far has focused on exclusive-mode synchronizers such as locks, the AQS class also contains a parallel set of methods (such as acquireShared) whose tryAcquireShared and tryReleaseShared counterparts inform the framework, via their return values, that further acquires may be possible, ultimately causing it to wake up multiple threads through cascading signals. Although it is not usually sensible to serialize (persistently store or transmit) a synchronizer, these classes are often used in turn to build other classes, such as thread-safe collections, that are commonly serialized. The AQS and ConditionObject classes provide methods to serialize the synchronization state, but not the underlying blocked threads or other transient bookkeeping. Even so, most synchronizer classes simply reset the synchronization state to its initial value on deserialization, in keeping with the implicit policy of built-in locks, which always deserialize to an unlocked state. This amounts to a no-op, but must still be explicitly supported to enable deserialization of final fields.
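As an example of shared mode, here is a sketch of a binary latch along the lines of the BooleanLatch example in the AbstractQueuedSynchronizer documentation, reproduced from memory and hedged accordingly: tryAcquireShared returns a negative value to block callers until the latch is signalled, after which all acquires pass.

import java.util.concurrent.locks.AbstractQueuedSynchronizer;

class BooleanLatch {
    private static class Sync extends AbstractQueuedSynchronizer {
        boolean isSignalled() { return getState() != 0; }

        protected int tryAcquireShared(int ignore) {
            // Negative means "block"; nonnegative means "proceed, and others may too".
            return isSignalled() ? 1 : -1;
        }

        protected boolean tryReleaseShared(int ignore) {
            setState(1);   // open the latch
            return true;   // tell the framework that waiters may now proceed
        }
    }

    private final Sync sync = new Sync();
    public boolean isSignalled() { return sync.isSignalled(); }
    public void signal() { sync.releaseShared(1); }
    public void await() throws InterruptedException {
        sync.acquireSharedInterruptibly(1);
    }
}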

4.1 Control of fairness

Even though they are based on FIFO queues, synchronizers are not necessarily fair. Notice that in the basic acquire algorithm (section 3.3), tryAcquire is checked before queuing, so a newly arriving "barging" thread can grab the lock ahead of the thread at the head of the queue. This barging-FIFO strategy generally provides higher aggregate throughput than other techniques. It reduces the time during which a contended lock is available but not taken because the intended next thread is still in the process of unblocking. It also avoids excessive, unproductive contention by allowing only one queued thread (the one at the head) to wake up and attempt to acquire on each release. Developers creating synchronizers may further accentuate barging, in cases where synchronizers are expected to be held only briefly, by having tryAcquire itself retry a few times before handing back control.

Barging-FIFO synchronizers have only probabilistic fairness properties. An unparked thread at the head of the lock queue has an unbiased chance of winning the race against any incoming barging thread, reblocking and retrying if it loses.

However, if incoming threads arrive faster than an unparked thread can unblock, the first thread in the queue will only rarely win the race, so it will almost always reblock, and its successors will remain blocked. With briefly-held synchronizers, it is common on multiprocessors for multiple bargings and releases to occur in the time it takes the first queued thread to unblock. As seen below, the net effect is to maintain a high rate of progress for one or more threads while still at least probabilistically avoiding starvation.

When greater fairness is required, arranging it is relatively simple: have tryAcquire return false if the current thread is not at the head of the queue. The method getFirstQueuedThread, one of a handful of supplied inspection methods, makes this check convenient.

A faster, less strict variant also allows tryAcquire to succeed if the queue is momentarily empty. In this case, multiple threads encountering an empty queue race to be the first to acquire, normally without at least one of them ever enqueuing. This is the strategy adopted by all j.u.c synchronizers that support a "fair" mode.
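A strictly fair tryAcquire for the Mutex example above might look like the following sketch. getFirstQueuedThread is a real AbstractQueuedSynchronizer method, but the body here is illustrative, not the actual j.u.c fair-mode code:

import java.util.concurrent.locks.AbstractQueuedSynchronizer;

class FairMutexSync extends AbstractQueuedSynchronizer {
    public boolean tryAcquire(int ignore) {
        Thread first = getFirstQueuedThread();
        if (first != null && first != Thread.currentThread())
            return false;                 // defer to the longest-waiting thread
        return compareAndSetState(0, 1);  // otherwise contend for the lock normally
    }

    public boolean tryRelease(int ignore) {
        setState(0);
        return true;
    }
}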

While fairness settings tend to be useful in practice, they come with no guarantees, because the Java Language Specification does not provide scheduling guarantees. For example, even with a strictly fair synchronizer, a JVM could decide to run a set of threads purely sequentially if they never otherwise need to block waiting for each other. In practice, on a uniprocessor, such threads are each likely to run for a time quantum before being preemptively context-switched. If such a thread holds an exclusive lock, it will soon be switched back in, only to release the lock and block once it is known that another thread needs it, further increasing the periods during which the synchronizer is available but not acquired. Fairness settings have an even greater impact on multiprocessors, which generate more interleavings and hence more opportunities for one thread to discover that a lock is needed by another.

Despite their poor performance under high contention when protecting briefly-held code bodies, fair locks work well when they protect relatively long code bodies or have relatively long inter-lock intervals; there, barging provides little performance advantage but carries a greater risk of indefinite postponement. The synchronizer framework leaves such engineering decisions to its users.

4.2 Concurrency Classes

The following sketches describe how the j.u.c synchronizer classes use this framework:

The ReentrantLock class uses the synchronization state to hold the (recursive) lock count. When a lock is acquired, it also records the identity of the current thread, both to check recursive acquisitions and to detect the illegal state that arises when another thread tries to release the lock. The class also uses ConditionObject and exports additional monitoring and inspection methods. It internally defines two different AQS subclasses to implement its fair and non-fair modes, and each ReentrantLock instance is constructed with the appropriate one. The ReentrantReadWriteLock class uses 16 bits of the synchronization state to hold the write-lock count and the remaining 16 bits to hold the read-lock count. Its WriteLock is structured like ReentrantLock, while its ReadLock uses the acquireShared methods to allow multiple readers at once.

The Semaphore class (a counting semaphore) holds the current count in the synchronization state. It defines acquireShared to decrement the count, or to block when the count is not positive, and tryRelease to increment the count, possibly unblocking waiting threads once it is positive.
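A sketch of that shared-mode logic follows the convention that a negative return value from tryAcquireShared means "block". The method bodies are illustrative, not copied from the java.util.concurrent.Semaphore source:

class SemaphoreSync extends java.util.concurrent.locks.AbstractQueuedSynchronizer {
    SemaphoreSync(int permits) { setState(permits); }

    protected int tryAcquireShared(int acquires) {
        for (;;) {
            int available = getState();
            int remaining = available - acquires;
            // A negative result tells the framework to block the caller.
            if (remaining < 0 || compareAndSetState(available, remaining))
                return remaining;
        }
    }

    protected boolean tryReleaseShared(int releases) {
        for (;;) {
            int current = getState();
            if (compareAndSetState(current, current + releases))
                return true;  // may allow blocked acquirers to proceed
        }
    }
}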

The CountDownLatch class holds the current count in the synchronization state. All acquires pass once the count reaches zero.

The FutureTask class uses the synchronization state to represent the run state of a future (initial, running, cancelled, done). Setting or cancelling a future invokes release, unblocking threads that are waiting for its computed value via acquire.

The SynchronousQueue class (a handoff channel) uses internal wait nodes that match up producers and consumers, and uses the synchronization state to allow a producer to proceed when a consumer takes the item, and vice versa.

Users of the j.u.c package may of course define their own synchronizers when none of the provided classes fits. Among designs that were considered but not adopted for the package are classes providing the semantics of various flavors of Win32 events, binary latches, centrally managed locks, and tree-based barriers.

5. Performance

While the synchronizer framework supports many styles of synchronization beyond mutual exclusion locks, lock performance is the simplest to measure and compare. Even so, there are many different ways to make such comparisons; the experiments here are designed to measure overhead and throughput.

In each test, every thread repeatedly updates a pseudo-random number computed with the function nextRandom(int seed):

int t = (seed % 127773) * 16807 - (seed / 127773) * 2836;
return (t > 0) ? t : t + 0x7fffffff;

On each iteration, a thread updates, with a given probability, a shared generator under a mutual exclusion lock, and otherwise updates only its own local generator, without a lock. This results in very short locked regions, minimizing extraneous effects when a thread is preempted while holding a lock. Whether to lock (the shared variable) or not (the local variable) is driven by the random number itself, which both decides the lock path and keeps the loop body from being trivially optimized away.
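A minimal sketch of that measurement loop, assuming the shared/local split described above; the field names and the probability mechanism are illustrative:

import java.util.concurrent.locks.ReentrantLock;

class Worker implements Runnable {
    static final ReentrantLock lock = new ReentrantLock();
    static int sharedSeed = 1;   // guarded by lock

    final int s;                 // contention control: roughly a 1-in-s chance of locking
    final int iterations;
    int localSeed = 2;

    Worker(int s, int iterations) { this.s = s; this.iterations = iterations; }

    static int nextRandom(int seed) {
        int t = (seed % 127773) * 16807 - (seed / 127773) * 2836;
        return (t > 0) ? t : t + 0x7fffffff;
    }

    public void run() {
        for (int i = 0; i < iterations; i++) {
            localSeed = nextRandom(localSeed);
            if (localSeed % s == 0) {          // the random value itself picks the lock path
                lock.lock();
                try { sharedSeed = nextRandom(sharedSeed); }
                finally { lock.unlock(); }
            }
        }
    }
}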

Four kinds of locks were compared: Builtin, using synchronized blocks; Mutex, using the simple Mutex class shown in section 4; Reentrant, using ReentrantLock; and Fair, using ReentrantLock set to its "fair" mode. All tests used the "server" mode of the Sun J2SE 1.5 JDK (roughly the same as beta2). Test programs performed 20 uncontended warmup runs before collecting measurements. Each test ran ten million iterations per thread, except that Fair-mode tests ran one million.

Tests were run on four x86 machines and four UltraSPARC machines. All x86 machines ran Linux with a RedHat 2.4-based kernel and libraries; all UltraSPARC machines ran Solaris-9. All systems were at most lightly loaded during testing; the nature of the tests did not require them to be otherwise completely idle. Note that a dual hyperthreaded Xeon behaves more like a four-way than a two-way machine. No attempt was made to normalize across the machines' differences. As seen below, the relative cost of synchronization does not bear a simple relationship to the number, type, or speed of processors.

5.1 Overhead

Uncontended overhead was measured by running only one thread and subtracting the per-iteration time of a run that never locks from that of a run that always locks (S=1). Table 2 shows these per-lock overheads over the corresponding unsynchronized code, in nanoseconds. The Mutex class comes closest to measuring the basic cost of the framework. Reentrant locks add the cost of recording the current thread and of error checking, and fair locks additionally check first whether the queue is empty.

Table 2 also shows the cost of tryAcquire versus the "fast path" of a built-in lock. Differences here mostly reflect the different atomic instructions and memory barriers used across locks and machines. On multiprocessors, these instructions tend to completely dominate all the others. The main differences between Builtin and the synchronizer classes are apparently due to Hotspot using a compareAndSet for both locking and unlocking, while these synchronizers use a compareAndSet for acquire and a volatile write for release (that is, a memory barrier on multiprocessors, plus reordering constraints on all processors). The absolute and relative costs of each vary across machines.

At the other extreme, Table 3 shows the per-lock overhead with S=1 and 256 threads running concurrently, creating massive lock contention. Under complete saturation, the barging-FIFO locks have roughly an order of magnitude less overhead than Builtin locks, and often two orders of magnitude less than Fair locks. This demonstrates the effectiveness of the barging-FIFO strategy in maintaining thread progress even under extreme contention.

Table 2: Uncontended per-lock overhead (in nanoseconds)

Table 3: Per-lock overhead under saturated contention (in nanoseconds)

Table 3 also illustrates that even with low internal overhead, context switching time completely determines performance for fair locks; the listed times are roughly proportional to the cost of blocking and unblocking threads on the various platforms. Additionally, a follow-up experiment (on the four-way machine only) shows that with the very briefly held locks used here, fairness settings had only a small impact on overall variance. Differences in thread termination times were recorded as a coarse-grained measure of variability: on the four-way machine, the standard deviation of completion times was 0.7% of the mean for Fair and 6% for Reentrant. As a contrast, to simulate long-held locks, a version of the test was run in which each thread computed 16K random numbers while holding each lock. Here, total run times were nearly identical: 9.79s for Fair versus 9.72s for Reentrant. Fair-mode variability remained small, with a standard deviation averaging 0.1% of the mean, while Reentrant rose to an average of 29.5%.

5.2 Throughput

Most synchronizer usage falls between the extremes of no contention and saturation. This can be examined experimentally along two dimensions: 1) by varying the contention probability of a fixed set of threads, and 2) by adding more threads to a set with a fixed contention probability. To demonstrate these effects, tests were run with different contention probabilities and different numbers of threads, all using Reentrant locks. The results use a slowdown metric:
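A slowdown formula consistent with the Amdahl-style definitions in the next paragraph (a reconstruction, offered here as an assumption, with n denoting the number of threads) is:

slowdown = t / (S * b * n + ((1 - S) * b * n) / p)

That is, the observed time divided by an ideal time in which the shared fraction of the work serializes and the remainder runs in parallel across the processors.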

Here, t is the total observed execution time, b is the baseline time for one thread with no contention or synchronization, n is the number of threads, p is the number of processors, and S is the proportion of shared (locked) accesses. Following Amdahl's law, the value is the ratio of the observed time to the (generally unattainable) ideal execution time for a mix of sequential and parallel tasks. The ideal time models an execution in which, with no synchronization overhead, no thread blocks due to conflicts with any other. Even so, under very low contention a few test results showed times slightly better than this ideal, presumably due to small differences in optimization, pipelining, and so on between the baseline and test runs.

The results use a base-2 log scale. For example, a value of 1.0 means the measured time was twice the ideal time, and a value of 4.0 means it was 16 times slower. Using logs reduces the dependence on an arbitrary base time (here, the time to compute the random numbers), so trends based on different base computations would look similar. The tests used contention probabilities from 1/128 (reported as "0.008") up to 1, doubling per test, and thread counts from 1 to 1024, doubling per test.

On uniprocessors, performance degrades as contention increases, but generally not as the number of threads increases. On multiprocessors, performance degrades far more steeply under contention. The multiprocessor graphs show an early peak at which contention among only a few threads produces the worst performance. This reflects a transitional region in which barging threads and awakened threads are about equally likely to obtain the lock, and so frequently force each other to block. In most cases the transitional region is followed by a smoother one, as the lock is almost never available, causing access to resemble the near-sequential pattern of a uniprocessor. Note also that the full-contention case (labelled "1.000") suffers relatively worse slowdowns on machines with fewer processors.

These results suggest that further tuning of the blocking (park/unpark) support to reduce context switching and related overhead could yield small but noticeable improvements in this framework. In addition, it may pay for synchronizer classes to employ some form of spinning for briefly-held but highly contended locks on multiprocessors. However, spin locks perform poorly across varying context-switch costs, so in such cases users are encouraged to build their own custom lock forms with this framework, targeted at the specific usage profiles their applications encounter; different needs encourage different implementations.

6. Conclusion

As of this writing, the j.u.c synchronizer framework is too new to have been evaluated in practice. It is unlikely to see wide usage until well after the final release of J2SE 1.5, and there will surely be unanticipated consequences of its design, API, implementation, and performance. At this point, however, the framework appears successful in providing an efficient basis for building synchronizers.
