Flaws in Java threading
Allen Holub argues that the threading model may be the weakest part of the Java programming language. It is entirely inadequate for programs of realistic complexity, and it is not object-oriented at all. This article proposes significant changes and additions to the Java language to address these problems.

The threading model is one of the least satisfactory parts of the Java language. It is good that the language supports threads natively, but the syntax and class libraries that support threading offer too little, enough only for very small applications.

Most books on Java threading spend many pages pointing out the flaws in the Java threading model and then provide a band-aid class library to work around them. I call these classes band-aids because the problems they address should have been solved by the syntax of the Java language itself. In the long run, a syntax-based approach would also produce more efficient code than a class library, because the compiler and the Java virtual machine (JVM) can work together to optimize the code, and such optimizations are difficult or impossible for code in a class library.

In my book Taming Java Threads (see Resources) and in this article, I go further and suggest changes to the Java programming language itself that would really solve these threading problems. The main difference between this article and the book is that I have done more thinking since writing the book, so the proposals here improve on the ones there. These suggestions are tentative: they are only my personal thoughts on the issues, and turning them into reality would require a lot of work and peer review. But it is a beginning. I intend to set up a dedicated working group to address these problems; if you are interested, please e-mail [email protected]. Once the effort is really under way, I will send you a notice.
The proposals put forward here are bold. Some people recommend subtle, minimal changes to the Java Language Specification (JLS) (see Resources) to fix JVM behavior that is currently ambiguous, but I want to be more thorough.

On a practical note, many of my suggestions introduce new keywords into the language. The usual requirement not to break existing code is valid, but if the language is not to stagnate and become obsolete, it must be possible to introduce new keywords. To keep the new keywords from conflicting with existing identifiers, I have deliberately used a character ($) that is illegal in an existing identifier (for example, $task rather than task). A compiler command-line switch could enable variants of these keywords that omit the dollar sign.

The task concept

The fundamental problem with the Java threading model is that it is not object-oriented at all. Object-oriented (OO) designers do not think in terms of threads; they think in terms of synchronous and asynchronous messages. A synchronous message is handled immediately, and the message handler does not return until processing is complete. An asynchronous message is handled in the background some time after it is received, and the message handler returns long before processing ends. The Toolkit.getImage() method in the Java programming language is a good example of an asynchronous message: the handler for getImage() returns immediately, without waiting for a background thread to retrieve the entire image.

That is the object-oriented approach. As mentioned earlier, however, the Java threading model is not object-oriented. A Java thread is really just a run() procedure that calls other procedures. Objects, synchronous and asynchronous messages, and related concepts do not enter into it.
One workaround for this problem, discussed in depth in my book, is the active object. An active object is an object that accepts asynchronous requests and processes them in the background some time after they are received. In the Java programming language, a request can be encapsulated in an object. For example, you can pass the active object an instance of a class that implements the Runnable interface, whose run() method encapsulates the work to be done. The Runnable object is placed on the active object's queue, and when its turn comes, the active object executes it on a background thread.

The asynchronous messages that run on an active object are effectively synchronous with respect to one another, because a single service thread removes them from the queue and executes them one at a time. Using an active object in an otherwise procedural model can therefore eliminate most synchronization problems.

In a sense, the entire Swing/AWT subsystem of the Java programming language is an active object. The only safe way to send a message to a Swing queue is to call a method such as SwingUtilities.invokeLater(), which places a Runnable object on the Swing event queue; the Swing event-processing thread executes it when its turn comes.

My first suggestion, then, is to add the concept of a task to the Java programming language, building the active object into the language. (The task concept is borrowed from Intel's RMX operating system and the Ada programming language; most real-time operating systems support something similar.) A task has a built-in active-object dispatcher and automatically manages all the mechanics of handling asynchronous messages.

Defining a task is essentially the same as defining a class, except that you add an asynchronous modifier to the task's methods to tell the active object's dispatcher to handle those methods in the background.
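The active-object idea described above can be sketched in ordinary Java. This is a minimal illustration under stated assumptions, not the Active_object class from the book: the class name ActiveObjectSketch, the create() factory, the close() method, and the sentinel used to stop the worker are all hypothetical, and java.util.concurrent stands in for a hand-written blocking queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: requests are queued and executed one at a time
// by a single background service thread.
final class ActiveObjectSketch {
    // Sentinel request that tells the service thread to stop (an assumption of this sketch).
    private static final Runnable STOP = () -> { };

    private final BlockingQueue<Runnable> requests = new LinkedBlockingQueue<>();
    private final Thread worker = new Thread(this::serviceLoop, "active-object");

    private ActiveObjectSketch() { }

    // The worker is started only after construction has finished.
    static ActiveObjectSketch create() {
        ActiveObjectSketch active = new ActiveObjectSketch();
        active.worker.start();
        return active;
    }

    // Asynchronous message: returns immediately; the work runs later on the worker.
    void dispatch(Runnable request) {
        requests.add(request);
    }

    // Lets already queued requests finish, then waits for the worker to exit.
    void close() {
        requests.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void serviceLoop() {
        try {
            for (Runnable r = requests.take(); r != STOP; r = requests.take()) {
                r.run();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Queues three requests and returns the order in which they were executed.
    static List<String> demo() {
        List<String> log = new ArrayList<>(); // written only by the worker, read after close()
        ActiveObjectSketch active = create();
        for (String s : new String[] {"a", "b", "c"}) {
            active.dispatch(() -> log.add(s));
        }
        active.close();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because only the service thread touches the log while requests are pending, the requests themselves need no locking, which is the property the text attributes to active objects.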
Consider the class-based approach from Chapter 9 of my book, in which a File_io class uses the Active_object class discussed in Taming Java Threads to implement asynchronous writes. Every write request is queued on the active object's input queue with a dispatch() call. Any exception that occurs while the asynchronous messages are processed in the background is handled by an Exception_handler object that is passed to the File_io_task constructor.

The main problem with this class-based approach is that it is too complicated: writing to a file takes far too much clutter for such a simple operation.

With the $task and $asynchronous keywords added to the Java language, the same code could be rewritten much more simply. Note that an asynchronous method does not specify a return value, because its handler returns immediately rather than waiting for the requested operation to finish. No reasonable return value exists at that point.

With respect to the derivation model, the $task keyword works just like class: a $task can implement interfaces and can extend classes and other tasks. Methods marked with the asynchronous keyword are handled by the $task in the background. Other methods run synchronously, just as they do in a class.

The $task keyword can be modified with an optional $error clause, which specifies a default handler for any exception that is not caught by the asynchronous methods themselves. I use $ to represent the exception object being thrown. If no $error clause is specified, a reasonable error message (most likely with a stack trace) is printed.

Note that to be thread safe, the arguments to an asynchronous method must be immutable.
The run-time system should enforce this immutability with whatever semantics are required (a simple copy is often not enough).

All task objects must also support a few pseudo-messages.

In addition to the usual modifiers (public and so on), the $task keyword should accept a $pooled(n) modifier, which causes the task to use a thread pool rather than a single thread to run asynchronous requests. The n specifies the desired pool size; the pool can grow if necessary, but it should shrink back to its original size when the extra threads are no longer needed. The pseudo-field $pool_size returns the original n argument specified in $pooled(n).

In Chapter 8 of Taming Java Threads, I give a server-side socket handler as an example of a thread pool. It is a good example of a task that uses a thread pool. The basic idea is to create an independent object whose job is to monitor a server-side socket. Whenever a client connects to the server, the server-side object takes a pre-created, sleeping thread from the pool and sets that thread to service the client connection. The socket server creates additional client-service threads if necessary, but it destroys these extra threads when their connections are closed.

In the proposed syntax for such a socket server, the Socket_server object uses a single background thread to process the asynchronous listen() request, which encapsulates the socket's "accept" loop. Each time a client connects, listen() asks a Client_handler to service the client by calling handle(). Each handle() request executes on its own thread (because this is a $pooled task). Note that every asynchronous message sent to a $pooled $task is actually handled on its own thread.
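The pool behavior asked of $pooled(n) (n pre-created threads, growth on demand, shrinking back when the extra threads go idle) can be approximated with java.util.concurrent.ThreadPoolExecutor. The sketch below is an assumption-laden illustration, not the proposed language feature: the class name PooledTaskSketch, its handle() and close() methods, and the one-second idle timeout are hypothetical, and each request simply squares a number instead of servicing a socket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a "$pooled(n) $task".
final class PooledTaskSketch {
    private final ThreadPoolExecutor pool;

    PooledTaskSketch(int n) {
        // n core threads; extra threads are created when all are busy and
        // are discarded after one idle second (timeout chosen arbitrarily).
        pool = new ThreadPoolExecutor(n, Integer.MAX_VALUE, 1L, TimeUnit.SECONDS,
                                      new SynchronousQueue<>());
        pool.prestartAllCoreThreads(); // the "pre-created, sleeping" threads
    }

    // Rough analogue of the $pool_size pseudo-field.
    int poolSize() {
        return pool.getCorePoolSize();
    }

    // Asynchronous request: returns at once; the work runs on a pool thread.
    Future<Integer> handle(int request) {
        return pool.submit(() -> request * request);
    }

    // Stops accepting requests; queued and running work still completes.
    void close() {
        pool.shutdown();
    }

    // Sends five requests to a pool of two threads and collects the replies in order.
    static List<Integer> demo() {
        PooledTaskSketch task = new PooledTaskSketch(2);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int i = 1; i <= 5; i++) {
                pending.add(task.handle(i));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : pending) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            task.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Unlike the proposal, this sketch returns a Future so the caller can collect results; the asynchronous methods described in the text return nothing.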
Because a $pooled $task is typically used to implement an autonomous operation, the best way to deal with potential synchronization problems involving state variables is to make this, inside a $asynchronous method, refer to a unique copy of the object. That is, sending an asynchronous request to a $pooled $task performs a clone() operation, and the method's this reference points to the clone. Threads can communicate through synchronized access to static fields.

Improvements to synchronized

Although a $task eliminates the need for synchronization in most cases, not every multithreaded system can be implemented with tasks. The existing threading model therefore needs improvement as well. The synchronized keyword has the following flaws:

You cannot specify a timeout value.
You cannot interrupt a thread that is waiting to acquire a lock.
You cannot safely acquire multiple locks. (Multiple locks must always be acquired in the same order.)

The solution to these problems is to extend the syntax of synchronized so that it supports multiple arguments and accepts a timeout specification (given in brackets below). Here is the syntax I would like:

synchronized (x && y && z): acquires the locks on the x, y, and z objects.
synchronized (x || y || z): acquires the lock on the x, y, or z object.
synchronized ((x && y) || z): an obvious extension of the preceding forms.
synchronized (...)[1000]: acquires the lock with a 1-second timeout.
synchronized[1000] f() {...}: acquires the lock on this on entry to f(), with a 1-second timeout.

TimeoutException is a RuntimeException derivative that is thrown when the wait times out.

Timeouts are necessary, but they are not enough to make code robust. You also need to be able to abort a lock wait from outside.
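The timeout part of this wish list can be approximated today with java.util.concurrent.locks.ReentrantLock, whose tryLock(timeout, unit) call gives up after a deadline and also responds to interruption. The following is only a sketch under assumptions: the class TimedLockSketch and its runLocked() helper are hypothetical names, and a boolean result stands in for the TimeoutException the proposal would throw.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class TimedLockSketch {
    // Runs body while holding lock. Returns false if the lock could not be
    // acquired within timeoutMillis or if the wait was interrupted.
    static boolean runLocked(ReentrantLock lock, long timeoutMillis, Runnable body) {
        try {
            if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // timed out
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // the wait was aborted from outside
            return false;
        }
        try {
            body.run();
            return true;
        } finally {
            lock.unlock();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns {result with a free lock, result while another thread holds the lock}.
    static boolean[] demo() {
        ReentrantLock lock = new ReentrantLock();
        boolean uncontended = runLocked(lock, 100, () -> { });

        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> runLocked(lock, 100, () -> {
            held.countDown();      // signal that the lock is now held
            awaitQuietly(release); // keep holding it until told to let go
        }));
        holder.start();
        awaitQuietly(held);

        boolean contended = runLocked(lock, 50, () -> { }); // times out

        release.countDown();
        try {
            holder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] {uncontended, contended};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1]);
    }
}
```

A library helper like this cannot be checked by the compiler the way a synchronized[1000] block could, which is part of the argument for a syntax-level solution.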
To make a lock wait abortable from outside, calling interrupt() on a thread that is waiting for a lock should throw a SynchronizationException in the waiting thread and end the wait. This exception should be derived from RuntimeException so that it does not have to be handled explicitly.

The main problem with these proposed changes to the synchronized syntax is that they require changes at the byte-code level. The byte code currently implements synchronized with the enter-monitor and exit-monitor instructions. These instructions take no arguments, so the byte-code definition would have to be extended to support the acquisition of multiple locks. This change is no more drastic than the changes made to the JVM for Java 2, however, and it is backward compatible with existing Java code.

Another problem that can be solved is the most common deadlock scenario, in which two threads each wait for the other to do something. Consider a (contrived) example in which method a() acquires lock1 and then lock2, while method b() acquires lock2 and then lock1. Imagine that one thread calls a() but is preempted after acquiring lock1 and before acquiring lock2. A second thread then runs, calls b(), and acquires lock2, but it cannot get lock1 because the first thread holds it, so it waits. The first thread then wakes up and tries to acquire lock2, but cannot, because the second thread holds it. The result is deadlock.

The synchronize-on-multiple-objects syntax solves this problem: when both locks are requested in a single synchronized statement, the compiler (or JVM) can rearrange the order of lock acquisition so that lock1 is always acquired first, which eliminates the deadlock.

This approach does not always work when many threads are involved, however, so some way to break a deadlock automatically is also needed. A simple strategy is to release the lock already acquired from time to time while waiting for the second lock.
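The lock-reordering idea described above, in which lock1 is always taken first no matter how the caller lists the locks, can be imitated by hand. The sketch below is an illustration under assumptions rather than the proposed compiler behavior: RankedLock, its rank field, and withBoth() are hypothetical names, and a global counter supplies the ordering that the compiler or JVM would otherwise choose.

```java
import java.util.concurrent.atomic.AtomicLong;

final class OrderedLocksSketch {
    // A lock object carrying a unique rank that defines one global acquisition order.
    static final class RankedLock {
        private static final AtomicLong NEXT_RANK = new AtomicLong();
        final long rank = NEXT_RANK.getAndIncrement();
    }

    // Runs body while holding both locks. The lower-ranked lock is always taken
    // first, regardless of the order in which the caller passes the arguments.
    static void withBoth(RankedLock x, RankedLock y, Runnable body) {
        RankedLock first = (x.rank <= y.rank) ? x : y;
        RankedLock second = (first == x) ? y : x;
        synchronized (first) {
            synchronized (second) {
                body.run();
            }
        }
    }

    // Two threads request the same pair of locks in opposite orders.
    // Without reordering this pattern can deadlock; here it runs to completion.
    static int demo() {
        RankedLock lock1 = new RankedLock();
        RankedLock lock2 = new RankedLock();
        int[] counter = {0}; // guarded by holding both locks

        Thread a = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                withBoth(lock1, lock2, () -> counter[0]++);
            }
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                withBoth(lock2, lock1, () -> counter[0]++);
            }
        });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The ordering only helps if every piece of code that needs both locks goes through withBoth(); a single nested synchronized block written elsewhere in the opposite order brings the deadlock back, which is why the article argues for enforcing the order in the language.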
To break a deadlock this way, use a timed wait in a retry loop rather than waiting forever. If every piece of code that waits for the lock uses a different timeout value, the deadlock is broken and one of the threads gets to run. I propose a dedicated syntax to replace such hand-written loops: the synchronized statement would still wait forever, but it would give up the locks it has acquired from time to time in order to break a potential deadlock. Ideally, the timeout for each repeated wait would differ from the previous one by a random amount.

Improvements to wait() and notify()

The wait()/notify() system also has problems:

There is no way to detect whether wait() returned normally or because of a timeout.
There is no way to implement a traditional condition variable that remains in a "signaled" state.
Nested-monitor lockout happens too easily.

The timeout-detection problem can be solved by redefining wait() to return a boolean rather than void. A true return value would indicate a normal return, and false would indicate a timeout.

The notion of a state-based condition variable is important. The variable can be set to a false state, in which waiting threads block until it enters a true state; any thread that waits on a condition variable that is already true is released immediately. (In this case, the wait() call does not block.) This feature can be supported by extending the syntax of notify().

The nested-monitor lockout problem is thornier, and I have no simple solution for it. Nested-monitor lockout is a form of deadlock that occurs when a thread holding a lock does not release that lock before suspending itself. Consider an example (again contrived, though real examples abound): a stack implemented on top of a linked list, in which the get() and put() operations each involve two locks, one on the Stack object and one on the LinkedList object. Consider what happens when a thread calls pop() on an empty stack.
The thread acquires both locks and then calls wait(), which releases the lock on the Stack object but not the lock on the list. If a second thread now tries to push an object onto the stack, it blocks forever on the synchronized (list) statement and is never allowed to push the object. Because the first thread is waiting for the stack to become non-empty, deadlock results. That is, the first thread can never return from wait(), because it holds the lock that keeps the second thread from ever reaching the notify() statement.

In this example there are plenty of obvious ways to fix the problem, for example by synchronizing everything at the method level. In the real world, though, the solution is usually not so simple. One possibility is for wait() to release all the locks held by the current thread in reverse order of acquisition and then reacquire them in the original order when the awaited condition is satisfied. I can imagine, however, that code relying on this behavior would be virtually incomprehensible to a human reader, so I do not think it is a really workable approach. If you have a good idea, please send me e-mail.

I would also like to be able to wait for complex conditions to become true, where the condition combines several arbitrary objects a, b, and c.

Modifying the Thread class

The ability to support both preemptive and cooperative threads is a basic requirement in some server applications, especially if you want to get maximum performance out of the system. I think the Java programming language goes too far in simplifying the threading model, and that it should support the Posix/Solaris notions of "green threads" and "lightweight processes" (discussed in Chapter 1 of Taming Java Threads). This means that some JVM implementations (such as those on NT) would have to simulate cooperative threads internally, and other JVMs would have to simulate preemptive threads.
These extensions are easy to add to a JVM, however.

A Java Thread should always be preemptive. That is, a thread in the Java programming language should work like a Solaris lightweight process. The Runnable interface can then be used to define a Solaris-style "green thread" that must explicitly yield control to the other green threads running on the same lightweight process.

For example, the current idiom of passing a Runnable object to the Thread constructor would effectively create a green thread for the Runnable object and bind it to the lightweight process represented by the Thread object. This implementation is transparent to existing code, because its effective behavior is exactly the same as at present.

Once Runnable objects are treated as green threads, the existing syntax of the Java programming language can be extended to support multiple green threads bound to a single lightweight process simply by passing several Runnable objects to the Thread constructor. (The green threads would be cooperative with respect to one another, but they could be preempted by green threads (Runnable objects) running on other lightweight processes (Thread objects).) Code written this way would create a green thread for each Runnable object, and those green threads would share the lightweight process represented by the Thread object.

The existing idiom of overriding Thread and implementing run() should continue to work, but it should map to a single green thread bound to a lightweight process. (The default run() method in the Thread class would effectively create a second Runnable object internally.)

Coordination between threads

The language should include more features to support inter-thread communication. Currently, the PipedInputStream and PipedOutputStream classes can be used for this purpose, but they are too weak for most applications. I recommend the following additions to the Thread class:

Add a wait_for_start() method that blocks until a thread's run() method starts.
(It is fine if the waiting thread is released just before run() is called.) This way, a thread can create one or more worker threads and be sure that they are running before the creating thread continues with its own work.

Add $send(Object o) and Object = $receive() methods (to the Object class) that pass objects between threads through an internal blocking queue. The blocking queue should be created automatically as a side effect of the first $send() call. The $send() call adds the object to the queue. The $receive() call blocks until an object has been queued and then returns that object. Variants of these methods should support timeouts on both the enqueue and dequeue operations: $send(Object o, long timeout) and $receive(long timeout).

Internal support for read-write locks

The notion of a read-write lock should be built into the Java programming language. Read-write locks are discussed in detail in Taming Java Threads (and elsewhere). In summary, a read-write lock allows multiple threads to access an object at the same time, but only one thread at a time may modify the object, and no modification may take place while accesses are in progress. The syntax for a read-write lock can be modeled on the synchronized keyword, using $reading and $writing blocks.

For a given object, multiple threads should be able to enter $reading blocks only when no thread is in a $writing block. A thread that tries to enter a $writing block while reads are in progress blocks until the reading threads exit their $reading blocks. A thread that tries to enter a $reading or $writing block while another thread is in a $writing block blocks until the writing thread exits its $writing block.

If both reading and writing threads are waiting, the reading threads go first by default. You can change this default, however, by applying the $writer_priority attribute to the class definition.
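The $reading and $writing blocks described above behave much like the two halves of java.util.concurrent.locks.ReentrantReadWriteLock, so the rules can be tried out in library form. This is a sketch under assumptions, not the proposed syntax: ReadWriteSketch and its methods are hypothetical names, and the library class does not promise the reader-first default that the proposal describes.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class ReadWriteSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int value; // guarded by lock

    // Counterpart of a "$reading" block: many threads may be in here at once.
    int read() {
        lock.readLock().lock();
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Counterpart of a "$writing" block: exclusive access.
    void write(int newValue) {
        lock.writeLock().lock();
        try {
            value = newValue;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // While the calling thread holds the read lock, a second thread tries to
    // become a reader and then a writer without blocking.
    // Returns {second reader admitted, writer admitted}.
    static boolean[] demo() {
        ReadWriteSketch shared = new ReadWriteSketch();
        boolean[] admitted = new boolean[2];
        shared.lock.readLock().lock(); // this thread is now reading
        try {
            Thread other = new Thread(() -> {
                admitted[0] = shared.lock.readLock().tryLock();
                if (admitted[0]) {
                    shared.lock.readLock().unlock();
                }
                admitted[1] = shared.lock.writeLock().tryLock();
                if (admitted[1]) {
                    shared.lock.writeLock().unlock();
                }
            });
            other.start();
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            shared.lock.readLock().unlock();
        }
        return admitted;
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("second reader admitted: " + r[0] + ", writer admitted: " + r[1]);
    }
}
```

The demo uses tryLock() only so that it can report the outcome without blocking; a real writer would call lock() and wait for the readers to leave.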
Access to partially constructed objects should be illegal

The JLS currently permits access to a partially constructed object. For example, a thread created in a constructor can access the object being constructed, even though that object may not be fully constructed. The behavior of such code is undefined: the thread that sets x to 1 can run at the same time as the thread that sets x to 0, so the value of x is unpredictable.

One solution to this problem is to require that the run() method of a thread started from within a constructor not execute until the constructor returns, even if that thread has a higher priority than the thread that called new. That is, the start() request must be deferred until the constructor returns.

Alternatively, the Java programming language could permit constructors to be synchronized. In other words, a synchronized constructor (which is currently illegal) would work as expected. I think the first approach is cleaner than the second, but it is harder to implement.

The volatile keyword should work as expected

The JLS requires that the ordering of requests for volatile operations be preserved. Most JVMs simply ignore this part of the specification, and they should not. Multiprocessor hosts often have this problem, which the JLS should have solved. If you are interested in this issue, Bill Pugh of the University of Maryland is working on it (see Resources).

Access issues

The lack of good access control makes threaded code much harder to write. In many cases, methods do not have to be thread safe if you can guarantee that they are called only from a synchronized subsystem. I recommend tightening the Java programming language's notion of access privilege, starting with a requirement that the package keyword be used explicitly to specify package access.
I think that the very existence of a default behavior is a flaw in any computer language, and I am mystified that a default access privilege exists at all (and even more mystified that the default is "package" rather than "private"). The Java programming language provides no equivalent default keywords anywhere else. Although requiring an explicit package specifier would break existing code, it would make that code much easier to read and could eliminate whole classes of potential bugs (for example, when the access privilege was omitted by mistake rather than deliberately).

Reintroduce private protected, which would work just as protected does now but would not permit package-level access.

Permit the syntax private private to specify "implementation access": private to all outside objects, even objects of the same class as the current object. The only reference permitted to the left of the (implicit or explicit) dot would be this.

Extend the syntax of public to grant access to specific classes. For example, a method could be declared so that objects of the Fred class may call some_method(), while the method remains private with respect to objects of all other classes. This proposal differs from the "friend" mechanism of C++, which grants one class access to all the private parts of another class. Here, I am recommending tightly controlled access to a limited set of methods. In this way, one class can define an interface for another class that is invisible to the rest of the system.

Require all fields to be private unless they refer to truly immutable objects or are static final primitive types. Directly accessing the fields of a class violates two basic principles of OO design: abstraction and encapsulation. From the threading perspective, allowing direct access to a field makes unsynchronized access to it all too easy.

Add a $property keyword.
Objects tagged with this keyword are accessible to a "bean box" application that uses the introspection APIs defined in the Class class, but otherwise $property works identically to private private. The $property attribute should apply to both fields and methods, so that existing JavaBean getter/setter methods can easily be defined as properties.

Immutability

The notion of immutability (an object whose value cannot change once it has been created) is invaluable in multithreaded situations, because access to immutable objects does not require synchronization. The Java programming language's implementation of immutability is not strict enough, for two reasons:

An immutable object can be accessed before it is fully constructed, and this access can yield incorrect values for some fields.
The definition of immutable (a class all of whose fields are final) is too loose. An object addressed by a final reference can change state, even though the reference itself cannot change.

The first problem can be solved by not allowing threads to start executing inside a constructor (or by not executing the start request until the constructor returns).

The second problem can be solved by requiring that final references point to immutable objects. That is, an object is truly immutable only if all of its fields are final and all of the fields of any objects it references are final as well. So as not to break existing code, the compiler could enforce this definition only when a class is explicitly tagged as immutable with a $immutable modifier. Given the $immutable modifier, the final modifier on the field definitions would be optional.

Finally, when inner classes are used, a bug in the Java compiler makes it impossible to create immutable objects reliably.
When a class has nontrivial inner classes (as my code often does), the compiler often incorrectly reports an error, even though the blank final field is initialized in every constructor. This bug has been in the compiler since inner classes were introduced in version 1.1. In the current release (three years later), it is still there. It is time for it to be fixed.

Instance-level access to class-level fields

In addition to access privileges, there is the problem that both class-level (static) methods and instance (non-static) methods can directly access class-level (static) fields. This access is dangerous because synchronizing an instance method does not acquire the class-level lock, so a synchronized static method and a synchronized instance method can access a field of the class at the same time. The obvious way to correct this problem is to require that instance methods access non-immutable static fields only through static accessor methods. This requirement calls for both compile-time and run-time checks, of course.

Under this rule, code in which a synchronized static method and a synchronized instance method, f() and g(), both modify a static field x would be illegal: because f() and g() can run in parallel, they can change the value of x simultaneously (with undefined results). Remember that there are two locks here: the synchronized static method acquires the lock that belongs to the Class object, and the non-static method acquires the lock that belongs to the instance.

When an instance method accesses a non-immutable static field, the compiler should require either explicit synchronization on the class-level lock or the use of a read-write lock. Alternatively (and this is the ideal solution), the compiler should automatically synchronize access to non-immutable static fields with a read-write lock, so that programmers would not have to worry about the problem.

Abrupt shutdown of daemon threads

Daemon threads are shut down abruptly when all non-daemon threads have terminated.
This is a problem when a daemon thread has created some global resource (such as a database connection or a temporary file) and that resource is not closed or deleted when the daemon thread is terminated. To solve this problem, I recommend a rule that the JVM does not shut down the application while either of the following is true:

Any non-daemon threads are running.
Any daemon threads are executing a synchronized method or a synchronized block of code.

A daemon thread could be terminated as soon as it leaves the synchronized block or synchronized method.

Bringing back stop(), suspend(), and resume()

This may not be feasible for practical reasons, but I would like stop() not to be deprecated (in both Thread and ThreadGroup). I would, however, change the semantics of stop() so that calling it does not break existing code. The problem with stop(), remember, is that it releases all of the thread's locks when the thread terminates, which can leave the objects the thread was working on in an unstable (partially modified) state. These objects can nonetheless be accessed, because the stopped thread has released its locks on them.

This problem can be solved by redefining the behavior of stop() so that the thread terminates immediately only if it holds no locks. If it holds locks, I recommend that the thread be terminated immediately after it releases the last one. This behavior could be implemented with a mechanism similar to throwing an exception. The stopped thread would set a flag, and the flag would be tested as soon as the thread exits all synchronized blocks. If the flag is set, an implicit exception is thrown, but this exception should not be catchable, and it should produce no output when the thread terminates. Note that Microsoft's NT operating system does not handle an abrupt, externally imposed stop well.
(It does not notify dynamic-link libraries of the stop, so system-wide resource leaks can result.) That is why I recommend an exception-like approach that simply causes run() to return.

The practical problem with this exception-like approach is that code to test the "stopped" flag must be inserted after every synchronized block, and this extra code makes the program both slower and larger. Another approach that comes to mind is to have stop() implement a "lazy" stop, in which the thread terminates the next time it calls wait() or yield(). I would also add isStopped() and stopped() methods to Thread (they would work much like isInterrupted() and interrupted(), but would detect the "stop-requested" state). This approach is not as general as the first one, but it is workable and avoids the overhead.

The suspend() and resume() methods should simply be put back into the Java programming language. They are useful, and I do not like being treated like a kindergartner. Removing them merely because they are potentially dangerous (a thread can be holding a lock when it is suspended) makes no sense. Let me decide for myself whether to use them. If the receiving thread is holding locks, Sun could make calling suspend() a run-time exception or, better yet, defer the actual suspension until the thread has released all of its locks.

Blocking I/O

It should be possible to interrupt any blocking operation, not just wait() and sleep(). I discussed this issue in the socket section of Chapter 2 of Taming Java Threads. At present, the only way to interrupt a blocking I/O operation on a socket is to close the socket, and there is no way at all to interrupt a blocking I/O operation on a file. For example, once a read request has been started and has blocked, the thread stays blocked until it actually reads something.
Even closing the file handle does not interrupt the read operation.

Programs should also be able to time out of an I/O operation. All objects on which blocking operations can occur (such as InputStream objects) should support a method for setting a timeout, equivalent to the setSoTimeout(time) method of the Socket class. Similarly, it should be possible to pass a timeout as an argument to the blocking call.

The ThreadGroup class

ThreadGroup should implement all the methods of Thread that can change a thread's state. I particularly want it to implement join(), so that I can wait for all the threads in the group to terminate.

Summing up

That is my list of suggestions. As I said in the title: if I were king... (sigh). I hope that these changes (or something equivalent) eventually make their way into the Java language. I do think that Java is a great programming language, but I also think that its threading model is not well designed, which is a pity. The Java programming language is still evolving, however, so there is room for improvement.

Resources

This article is an updated excerpt from Taming Java Threads. The book explores the pitfalls and problems of multithreaded programming in the Java language and provides a threading-related Java package that addresses them.

Bill Pugh of the University of Maryland is working on changes to the JLS to improve its threading model. Bill's proposals are not as broad as the ones in this article; he focuses on making the existing threading model work in a more reasonable way. More information is available from www.cs.umd.edu/~pugh/java/memoryModel/.

The complete Java Language Specification can be found on the Sun Web site.

For a look at threads from a purely technical perspective, see Doug Lea's Concurrent Programming in Java: Design Principles and Patterns, Second Edition. It is a great book, but its style is quite academic, so it is not necessarily suitable for every reader. Taming Java Threads makes a good companion to it.

Java Threads, by Scott Oaks and Henry Wong, is lighter than Taming Java Threads, but it is more appropriate if you have never written a threaded program. Oaks and Wong also implement helper classes like the ones Holub provides, and it is always useful to look at different solutions to the same problem.

Threads Primer: A Guide to Multithreaded Programming, by Bill Lewis and Daniel J. Berg, is a good introduction to threading (not limited to Java).

Some technical information about Java threads can be found on the Sun Web site.

In "Multiprocessor Safety and Java," Paul Jakubik discusses the SMP problems of multithreaded systems.