Four flavors of Java concurrency: threads, executors, ForkJoin, and actors
This article discusses several approaches to parallel processing in Java applications, from managing Java threads by hand to higher-level solutions: executor services, the ForkJoin framework, and the actor model of computation.
We live in a world where things happen in parallel. Naturally, the programs we write reflect this and can execute concurrently. Python code is the exception (translator's note: the original links here to an explanation of Python's global interpreter lock), but even then you can use Jython to run your program on the JVM and take advantage of multiprocessor machines.
However, the complexity of concurrent programs goes far beyond what the human brain handles comfortably. We are comparatively weak at it: we are not born to think about multithreaded routines, to assess concurrent access to limited resources, or to predict where errors or bottlenecks will occur.
Faced with these difficulties, people have come up with a number of solutions and models for concurrent computation. Each emphasizes a different part of the problem, so when we implement parallel computation we can choose the one that fits the problem at hand.
In this article, I'll implement a solution to the same problem with several different concurrency approaches, then discuss what is good and bad about each and what pitfalls might be waiting for you.
We'll cover the following ways of doing concurrent processing and asynchronous code:
• Bare threads
• Executors and services
• ForkJoin framework and parallel streams
• Actor model
To make things more interesting, I don't just show an unrelated snippet for each approach. I use one common task, so the code in every section is more or less equivalent. Also, the code is for illustration only: initialization code is omitted, and these are not production-grade examples.
One last thing: at the end of the article there is a small survey about which concurrency patterns you or your organization use. For the sake of your fellow engineers, please fill it out!
Task
Implement a method that takes a message and a list of strings as parameters, where each string corresponds to the query page of a search engine. For each string, the method issues an HTTP request to query the message, and it returns the first available result as soon as possible.
If an error occurs, it is acceptable to throw an exception or return null. I just want to avoid waiting forever for a result.
A quick note: this time I won't go into the details of how threads communicate, or deep into the Java memory model. If you are eager to learn about that, see my previous article on testing concurrency with jcstress.
So let's start with the most direct, lowest-level way to do concurrency on the JVM: managing bare threads by hand.
Method 1: Use plain bare threads
Free your code, go back to nature, and use bare threads! Threads are the most basic unit of concurrency. Java threads map directly onto operating system threads: each Thread object corresponds to one underlying native thread.
Naturally, the JVM manages the thread's lifetime, and as long as your threads don't need to communicate with each other, you don't have to think about scheduling either.
Each thread has its own stack space, which takes up a configured portion of the JVM process's memory.
The thread interface is quite concise: you provide a Runnable and call .start() to begin the computation. There is no ready-made API for stopping a thread safely. You have to implement that yourself, typically by signalling through a boolean flag.
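The flag-based stop described above could look roughly like the following. This is a minimal sketch, not code from the original article: the class name StoppableWorker and the helper startThenStop are made up for illustration, and the counter increment stands in for real work.

```java
import java.util.concurrent.atomic.AtomicLong;

class StoppableWorker implements Runnable {
    // volatile: the write in stop() must become visible to the worker thread
    private volatile boolean running = true;
    private final AtomicLong iterations = new AtomicLong();

    @Override
    public void run() {
        while (running) {
            iterations.incrementAndGet(); // stand-in for one unit of real work
        }
    }

    void stop() {
        running = false;
    }

    // Hypothetical helper: starts a worker, lets it run briefly, stops it,
    // and reports whether the thread actually exited.
    static boolean startThenStop(long runMillis) {
        StoppableWorker worker = new StoppableWorker();
        Thread thread = new Thread(worker, "stoppable-worker");
        thread.start();
        try {
            Thread.sleep(runMillis);
            worker.stop();
            thread.join(2000); // wait up to two seconds for the loop to exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            worker.stop(); // make sure the loop ends even if we were interrupted
        }
        return !thread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + startThenStop(50));
    }
}
```

Without volatile, the worker would not be guaranteed to see the updated flag and could keep spinning, which is why the field is declared that way here.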
In the following example, we create one thread per search engine queried. Each thread tries to store its result in an AtomicReference, whose compareAndSet guarantees, without locks or any other mechanism, that only the first write takes effect. Let's go!
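    import java.util.List;
    import java.util.concurrent.atomic.AtomicReference;

    // WS is not defined in this article; it stands for whatever HTTP client performs the request.
    private static String getFirstResult(String question, List<String> engines) {
        AtomicReference<String> result = new AtomicReference<>();
        for (String base : engines) {
            String url = base + question;
            // One thread per engine; only the first compareAndSet succeeds.
            new Thread(() -> {
                result.compareAndSet(null, WS.url(url).get());
            }).start();
        }
        while (result.get() == null); // busy-wait for some result to appear
        return result.get();
    }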
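The loop above spins on a CPU core and never returns if every request fails. One possible variant, sketched here under assumptions, blocks on a CountDownLatch with a timeout instead. The fetch parameter is a hypothetical stand-in for WS.url(url).get(), and the names FirstResultThreads, firstResult, and pause are invented for this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

class FirstResultThreads {

    // 'fetch' is a hypothetical stand-in for WS.url(url).get().
    static String firstResult(String question, List<String> engines,
                              Function<String, String> fetch, long timeoutMillis) {
        AtomicReference<String> result = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        for (String base : engines) {
            String url = base + question;
            Thread t = new Thread(() -> {
                String body = fetch.apply(url);
                // Only the first non-null answer is stored and wakes the caller.
                if (body != null && result.compareAndSet(null, body)) {
                    done.countDown();
                }
            });
            t.setDaemon(true); // slow engines must not keep the JVM alive
            t.start();
        }
        try {
            // Block without spinning until a result arrives or the timeout expires.
            done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result.get(); // null if nothing arrived in time
    }

    // Small helper for the demo: sleep without a checked exception.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        List<String> engines = Arrays.asList("fast://", "slow://");
        String first = firstResult("java", engines, url -> {
            pause(url.startsWith("slow") ? 500 : 10); // simulated latency
            return "result from " + url;
        }, 2000);
        System.out.println(first);
    }
}
```

The method still creates one thread per engine, so it keeps the simplicity of the bare-thread approach while returning null after the timeout rather than waiting forever.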
The main advantage of bare threads is that you stay close to the operating system and hardware model of concurrent computation, and that model is very simple: several threads run and communicate through shared memory. That's it.
The biggest disadvantage of managing threads yourself is that it is easy to spawn too many of them. Threads are expensive objects: creating one costs a noticeable amount of memory and time. This is a dilemma: with too few threads you don't get good concurrency, and with too many you are likely to run into memory problems and more complicated scheduling.
However, if you need a quick and simple solution, you can definitely use this approach without hesitation.
Method 2: Get serious with Executor and CompletionService
Another option is to use an API that manages a group of threads for you. Fortunately, the JDK gives us exactly that in the Executor interface, whose definition is simple:
    public interface Executor {
        void execute(Runnable command);
    }
It hides the details of how the Runnable is handled. It just says: "Developer! You are nothing but a bag of meat. Give me the task and I'll deal with it!"
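To illustrate how much freedom that single method leaves, here is a small sketch with two trivial implementations, similar in spirit to the examples in the Executor Javadoc. The class name ExecutorFlavors and the helper threadUsedBy are made up for this illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class ExecutorFlavors {

    // Runs every task synchronously, in the caller's own thread.
    static final Executor DIRECT = Runnable::run;

    // Starts a fresh thread per task, which is what Method 1 did by hand.
    static final Executor THREAD_PER_TASK = command -> new Thread(command).start();

    // Hypothetical helper: reports the name of the thread a task ran on.
    static String threadUsedBy(Executor executor) {
        AtomicReference<String> name = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        executor.execute(() -> {
            name.set(Thread.currentThread().getName());
            done.countDown();
        });
        try {
            done.await(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return name.get();
    }

    public static void main(String[] args) {
        System.out.println("direct:          " + threadUsedBy(DIRECT));
        System.out.println("thread per task: " + threadUsedBy(THREAD_PER_TASK));
    }
}
```

The calling code is identical in both cases. Only the executor decides where and when the task runs.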
Even better, the Executors class provides a set of factory methods that create well-configured thread pools and executors. We will use newFixedThreadPool(), which creates a predefined number of threads and never lets the thread count exceed that number. If all threads are busy, a submitted command is placed in a queue to wait. The executor manages that queue, of course.
On top of that, ExecutorService manages the executor's life cycle, and CompletionService abstracts away even more detail by acting as a queue of completed tasks. Thanks to this, we don't have to worry about picking out the first result ourselves.
The call to service.take() below returns a single result: the Future of whichever task completes first.
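    import java.util.List;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorCompletionService;
    import java.util.concurrent.Executors;

    private static String getFirstResultExecutors(String question, List<String> engines) {
        // Note: the pool is never shut down in this snippet.
        ExecutorCompletionService<String> service =
                new ExecutorCompletionService<String>(Executors.newFixedThreadPool(4));
        for (String base : engines) {
            String url = base + question;
            service.submit(() -> {
                return WS.url(url).get();
            });
        }
        try {
            // take() blocks until the first task completes, then hands back its Future.
            return service.take().get();
        } catch (InterruptedException | ExecutionException e) {
            return null;
        }
    }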
Executors and executor services are the right choice if you need precise control over how many threads are created and exactly how they behave. For example, one important question to think through is what should happen when all threads are busy. Should the number of threads grow, possibly without limit? Should the task be queued? What if the queue is full? Should the queue grow without bound?
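One way to answer these questions explicitly is to build a ThreadPoolExecutor by hand. The sketch below is only an illustration: the pool size of 2, the queue capacity of 2, and the names BoundedPoolDemo and countRejected are arbitrary choices, not values from the article. It uses AbortPolicy, which throws RejectedExecutionException once both the threads and the queue are full.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPoolDemo {

    // Submits 'tasks' blocking tasks and counts how many the pool refuses.
    static int countRejected(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2,                                  // exactly two threads
                0L, TimeUnit.MILLISECONDS,             // keep-alive for extra threads
                new ArrayBlockingQueue<>(2),           // at most two waiting tasks
                new ThreadPoolExecutor.AbortPolicy()); // reject when saturated
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> awaitQuietly(release)); // task blocks until released
                } catch (RejectedExecutionException e) {
                    rejected++;
                }
            }
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + countRejected(10));
    }
}
```

Replacing AbortPolicy with ThreadPoolExecutor.CallerRunsPolicy would make the submitting thread run the overflow tasks itself, which slows the producer down instead of dropping work.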
Thanks to the JDK, many ready-made configurations answer these questions and carry intuitive names, such as Executors.newFixedThreadPool(4) above.
The life cycle of the threads and services can also be configured, so that resources are shut down at the right time. The only inconvenience is that, for beginners, the configuration options could be simpler and more intuitive. Then again, in concurrent programming you will hardly find anything simpler.
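As a sketch of life-cycle handling, the variant below creates the pool inside the method, uses ExecutorService.invokeAny with a timeout to obtain the first successful result, and always calls shutdownNow() in a finally block. The fetch parameter is a hypothetical stand-in for WS.url(url).get(), and the names FirstResultInvokeAny, firstResult, and pause are invented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

class FirstResultInvokeAny {

    // 'fetch' is a hypothetical stand-in for WS.url(url).get();
    // 'engines' must not be empty, because invokeAny rejects an empty task list.
    static String firstResult(String question, List<String> engines,
                              Function<String, String> fetch, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String base : engines) {
                String url = base + question;
                tasks.add(() -> fetch.apply(url));
            }
            // Returns the result of one task that completed successfully;
            // tasks that have not completed are cancelled on return.
            return pool.invokeAny(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException | TimeoutException e) {
            return null; // every task failed, or nothing finished in time
        } finally {
            pool.shutdownNow(); // release the threads whatever happened
        }
    }

    // Small helper for the demo: sleep without a checked exception.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        String first = firstResult("java", Arrays.asList("slow://", "fast://"), url -> {
            pause(url.startsWith("slow") ? 500 : 10); // simulated latency
            return "result from " + url;
        }, 2000);
        System.out.println(first);
    }
}
```

Creating a pool per call keeps the sketch self-contained. A long-running application would more likely share one pool and shut it down when the application stops.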
In short, for larger systems I personally think executors are the most appropriate choice.
Method 3: Use ForkJoinPool (FJP) through parallel streams
Java 8 added parallel streams, which give us a simple way to process collections in parallel. Together with lambdas, they form a powerful tool for concurrent computation.
If you plan to take this route, there are a few things to keep in mind. First, you have to master some functional programming concepts, which is actually more of a benefit than a cost. Second, it is hard to know whether a parallel stream really uses more than one thread, because that is up to the concrete implementation of the stream. If you don't control the stream's data source, you can't be sure what it does.
In addition, remember that by default parallelism is achieved through ForkJoinPool.commonPool(). This common pool is managed by the JVM and shared by all threads within the JVM process. That keeps configuration to a minimum, so you don't have to worry about it.
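If you want to see which threads a parallel stream actually ran on, a quick probe like the following can help. It is a sketch for experimentation only: CommonPoolProbe and threadsUsed are names made up here, and the set of thread names it prints varies between machines and runs.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

class CommonPoolProbe {

    // Collects the names of all threads that processed at least one element.
    static Set<String> threadsUsed(int elements) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream.range(0, elements)
                .parallel()
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        System.out.println("threads used: " + threadsUsed(10_000));
    }
}
```

On a multi-core machine the output typically lists the calling thread alongside several common-pool worker threads, but nothing guarantees that more than one thread takes part.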
    import java.util.List;
    import java.util.Optional;

    private static String getFirstResult(String question, List<String> engines) {
        // get element as soon as it is available
        Optional<String> result = engines.stream().parallel().map((base) -> {
            String url = base + question;
            return WS.url(url).get();
        }).findAny();
        return result.get();
    }
Looking at the example above, we don't care where the individual tasks run or who runs them. However, this also means your application may contain stalled tasks that you have no way of knowing about. I described this problem in detail in another article on parallel streams. There is a workaround, although it is not the most intuitive solution in the world.
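One commonly cited workaround, which is not necessarily the one the author has in mind, is to start the stream from inside a task submitted to a dedicated ForkJoinPool. In current JDK implementations the stream's subtasks then run in that pool rather than in the common pool, but this relies on implementation behavior and is not a documented guarantee. The sketch below assumes a pool size of 4, and fetch is again a hypothetical stand-in for WS.url(url).get().

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.function.Function;

class DedicatedPoolStream {

    // 'fetch' is a hypothetical stand-in for WS.url(url).get().
    static String firstResult(String question, List<String> engines,
                              Function<String, String> fetch) {
        ForkJoinPool pool = new ForkJoinPool(4); // size chosen arbitrarily
        try {
            // Declared as a Callable so submit() is unambiguous.
            Callable<Optional<String>> search = () -> engines.parallelStream()
                    .map(base -> fetch.apply(base + question))
                    .findAny();
            // The stream is started by a worker of 'pool', so its subtasks
            // are forked into 'pool' as well (implementation behavior).
            return pool.submit(search).get().orElse(null);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException e) {
            return null; // some fetch threw an exception
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> engines = Arrays.asList("a://", "b://", "c://");
        String any = firstResult("java", engines,
                url -> Thread.currentThread().getName() + " fetched " + url);
        System.out.println(any);
    }
}
```

With a dedicated pool, a slow search no longer competes with every other parallel stream in the process for the common pool's workers.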
ForkJoin is a good framework, written and preconfigured by people smarter than me. So when I need to write a small program that involves parallel processing, it is my first choice.
Its biggest drawback is that you have to foresee the complications it may cause. That is hard to do without a thorough understanding of the JVM as a whole, and such understanding only comes with experience.
Method 4: Hire an actor
The actor model is an odd addition to the approaches explored in this article. The JDK has no actor implementation, so you have to pull in a library that provides one.
Briefly, in the actor model you think of everything as an actor. An actor is a computational entity, much like a thread in the first example above, that can receive messages from other actors, because everything is an actor.
In response to a message, an actor can send messages to other actors, create new actors and interact with them, or simply change its own internal state.
It is quite simple, yet a very powerful concept. The framework manages the life cycle and the message passing, and you only specify what the units of computation are. In addition, the actor model emphasizes avoiding global state, which brings a number of benefits: you get supervision strategies such as retries for free, simpler distributed system design, fault tolerance, and more.
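To make the idea concrete without any library, here is a deliberately tiny, hypothetical actor built only on the JDK: a mailbox queue drained by one dedicated thread. It has none of the supervision, routing, or distribution features of a real actor framework, and every name in it (TinyActor, Counter, Ping, GetCount, countPings) is invented for this sketch.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TinyActorDemo {

    // Hypothetical minimal actor: one mailbox, drained by one dedicated thread.
    abstract static class TinyActor {
        private final BlockingQueue<Object> mailbox = new LinkedBlockingQueue<>();
        private final Thread loop = new Thread(this::drain);

        private void drain() {
            try {
                while (true) {
                    onReceive(mailbox.take()); // one message at a time, in FIFO order
                }
            } catch (InterruptedException e) {
                // stop() was called: leave the loop
            }
        }

        void start() { loop.setDaemon(true); loop.start(); }
        void tell(Object message) { mailbox.add(message); }
        void stop() { loop.interrupt(); }

        protected abstract void onReceive(Object message);
    }

    // Messages are plain objects.
    static final class Ping { }
    static final class GetCount {
        final CompletableFuture<Integer> reply = new CompletableFuture<>();
    }

    static final class Counter extends TinyActor {
        private int count; // only ever touched by the actor's own thread, so no locks

        @Override
        protected void onReceive(Object message) {
            if (message instanceof Ping) {
                count++;
            } else if (message instanceof GetCount) {
                ((GetCount) message).reply.complete(count);
            }
        }
    }

    // Several threads send pings concurrently; the actor counts them safely.
    static int countPings(int senders, int pingsPerSender) {
        Counter counter = new Counter();
        counter.start();
        Thread[] threads = new Thread[senders];
        for (int i = 0; i < senders; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < pingsPerSender; j++) {
                    counter.tell(new Ping());
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
            GetCount query = new GetCount();
            counter.tell(query); // queued behind every ping already sent
            return query.reply.get(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } catch (ExecutionException | TimeoutException e) {
            return -1;
        } finally {
            counter.stop();
        }
    }

    public static void main(String[] args) {
        System.out.println("pings counted: " + countPings(4, 1000));
    }
}
```

The counter is an ordinary int, yet the result is exact, because only the actor's own thread ever reads or writes it. That is the payoff of keeping state private and communicating through messages.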
Below is an example that uses Akka actors. Akka has a Java API and is one of the most popular actor libraries on the JVM. It also has a Scala API and is currently the default actor library for Scala, which used to ship its own actor implementation. Several other JVM languages, such as Fantom, have implemented actors as well. This shows that the actor model is widely accepted and is seen as a valuable addition to a language.
    import java.util.List;
    import java.util.concurrent.atomic.AtomicReference;
    import akka.actor.ActorRef;
    import akka.actor.ActorSystem;
    import akka.actor.Props;
    import akka.actor.UntypedActor;
    import akka.actor.UntypedActorFactory;

    static class Message {
        String url;
        Message(String url) { this.url = url; }
    }

    static class Result {
        String html;
        Result(String html) { this.html = html; }
    }

    // Fetches one URL and sends the result back to whoever asked.
    static class UrlFetcher extends UntypedActor {
        @Override
        public void onReceive(Object message) throws Exception {
            if (message instanceof Message) {
                Message work = (Message) message;
                String result = WS.url(work.url).get();
                getSender().tell(new Result(result), getSelf());
            } else {
                unhandled(message);
            }
        }
    }

    // Starts one UrlFetcher per engine and records the first Result that arrives.
    static class Querier extends UntypedActor {
        private String question;
        private List<String> engines;
        private AtomicReference<String> result;

        public Querier(String question, List<String> engines, AtomicReference<String> result) {
            this.question = question;
            this.engines = engines;
            this.result = result;
        }

        @Override
        public void onReceive(Object message) throws Exception {
            if (message instanceof Result) {
                result.compareAndSet(null, ((Result) message).html);
                getContext().stop(self());
            } else {
                // Any other message kicks off the search.
                for (String base : engines) {
                    String url = base + question;
                    ActorRef fetcher = this.getContext().actorOf(
                            Props.create(UrlFetcher.class), "fetcher-" + base.hashCode());
                    Message m = new Message(url);
                    fetcher.tell(m, self());
                }
            }
        }
    }

    private static String getFirstResultActors(String question, List<String> engines) {
        ActorSystem system = ActorSystem.create("Search");
        AtomicReference<String> result = new AtomicReference<>();
        final ActorRef q = system.actorOf(
                Props.create((UntypedActorFactory) () -> new Querier(question, engines, result)),
                "Master");
        q.tell(new Object(), ActorRef.noSender());
        while (result.get() == null); // busy-wait for the first result
        return result.get();
    }
Akka actors use the ForkJoin framework internally to do their work. The code here is lengthy, but don't worry. Most of it defines the message classes Message and Result, followed by two actors: Querier, which coordinates the search across all engines, and UrlFetcher, which fetches the result from a given URL. There are more lines here mainly because I didn't want to cram too much onto one line. The power of the actor model comes from the Props object, through which we can define specific routing patterns for an actor, a custom mailbox, and so on. The resulting system is also configurable and contains only a few moving parts. That is a good sign!
One disadvantage of the actor model is that it requires you to avoid global state, so you must design your application carefully, which can complicate migrating an existing project. On the other hand, it has many advantages, so learning a new paradigm and a new library is well worth it.
Feedback time: what do you use?
Which concurrency approach do you use most often? Do you understand the computational model behind it? Or do you simply use a framework whose Job or background-task objects add asynchronous computation to your code automatically?
To find out whether I should go on to explain some of these concurrency approaches in more depth, for example with an article on how Akka works and the pros and cons of its Java API, I created a simple survey. Dear readers, please fill it out. I really appreciate your participation!
Summary
In this article we discussed several ways to add parallelism to a Java application. Starting with managing Java threads ourselves, we moved on to progressively more advanced solutions: executor services, the ForkJoin framework, and the actor model of computation.
Not sure which to choose when you face a real problem? They all have their pros and cons, and you have to make trade-offs between intuitiveness and ease of use, configurability, and the performance you gain or lose.
Original article: Oleg Shelajev. Translation: Importnew.com - shenggordon
Translation link: http://www.importnew.com/14506.html
[When reprinting, please keep the source, the translator, and the translation link.]