Before thinking about how to reduce the latency of an API, note that these APIs contain several time-consuming operations that are executed serially; running them asynchronously can, in theory, reduce the overall elapsed time.
The implementation is straightforward, for example:
```java
public class ParallelRetrievalExample {
    final CacheRetriever cacheRetriever;
    final DbRetriever dbRetriever;

    ParallelRetrievalExample(CacheRetriever cacheRetriever, DbRetriever dbRetriever) {
        this.cacheRetriever = cacheRetriever;
        this.dbRetriever = dbRetriever;
    }

    public Object retrieveCustomer(final long id) {
        final CompletableFuture<Object> cacheFuture =
            CompletableFuture.supplyAsync(() -> cacheRetriever.getCustomer(id));
        final CompletableFuture<Object> dbFuture =
            CompletableFuture.supplyAsync(() -> dbRetriever.getCustomer(id));
        return CompletableFuture.anyOf(cacheFuture, dbFuture);
    }
}
```
An introduction to Java 8's CompletableFuture can be found elsewhere and will not be repeated here.
This article focuses on the pitfalls encountered with this practice and some of my own takeaways.
Performance testing
Optimized code needs to be compared against the unmodified (baseline) version, and its performance should be considered under different loads.
For API changes, the Apache ab tool is convenient: different loads can be simulated by setting different numbers of concurrent users.
Testing is essential; many changes that intuitively seem high-performance fail to improve anything, or even perform worse than the unoptimized version, for reasons such as resource constraints.
Tasks to handle & thread pool settings

What kind of task do we want to optimize?
Tasks fall into three categories: computation-intensive, IO-intensive, and mixed; a mixed task can be further broken down into the first two.
In typical web development, computation is rarely the bottleneck; the bottleneck is mostly IO.
Time-consuming blocking IO operations (database queries, RPC calls) are usually what makes an interface slow, and they are the target of this optimization.
Strictly speaking, we are not optimizing these blocking operations themselves; we make them asynchronous so that the overall elapsed time shrinks. It also matters where these operations sit in the flow: if one is the very last step of the API's logic, making it asynchronous gains nothing, but if the business logic allows, it can be moved earlier so that other work overlaps with it.
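As a minimal sketch of this idea (the `loadProfile`/`loadOrders` calls are hypothetical stand-ins for a database query and an RPC call), the blocking operations are started at the beginning of the request and only joined at the end, so the total wait is roughly the slower of the two rather than their sum:

```java
import java.util.concurrent.CompletableFuture;

public class EarlyStartExample {
    // Hypothetical blocking calls standing in for a DB query and an RPC call.
    static String loadProfile(long id) { sleep(50); return "profile-" + id; }
    static String loadOrders(long id)  { sleep(50); return "orders-" + id; }

    public static String handle(long id) {
        // Kick off both blocking operations as early as possible...
        CompletableFuture<String> profile = CompletableFuture.supplyAsync(() -> loadProfile(id));
        CompletableFuture<String> orders  = CompletableFuture.supplyAsync(() -> loadOrders(id));
        // ...run any other business logic here, then join at the end:
        // the elapsed time is about max(50ms, 50ms), not 50ms + 50ms.
        return profile.join() + "|" + orders.join();
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```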
What kind of thread pool do we need?
As mentioned above, the tasks to optimize are almost all blocking IO, which means they occupy very little CPU time and spend most of their time waiting; the main cost of adding such threads is memory, and their impact on context switching is small.
Second, the number of threads must be bounded: Java threads are heavyweight, so memory has to be considered in addition to CPU.
Finally, consider what happens when the thread pool is exhausted. The worst acceptable outcome is falling back to the unoptimized behavior, i.e. executing on the caller's thread.
CompletableFuture's runAsync and supplyAsync methods have overloads that take no Executor, so let's first check whether the default thread pool they use is appropriate.
```java
private static final Executor asyncPool = useCommonPool ?
    ForkJoinPool.commonPool() : new ThreadPerTaskExecutor();
```
useCommonPool is decided by ForkJoinPool's parallelism; on a multicore machine it can simply be taken as true (the parallelism can also be set via the java.util.concurrent.ForkJoinPool.common.parallelism system property).
commonPool() does not use many threads (the default parallelism is the number of CPU cores minus one), and more importantly, ForkJoinPool is designed for short computational tasks, not for blocking IO; yet the slow operations we want to optimize are almost all blocking IO.
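A quick way to verify the default parallelism on the current machine is to query it directly:

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolCheck {
    public static void main(String[] args) {
        // Defaults to Runtime.getRuntime().availableProcessors() - 1, with a minimum of 1.
        System.out.println("commonPool parallelism: " + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("available processors:   " + Runtime.getRuntime().availableProcessors());
    }
}
```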
The closest fit among the standard factories is Executors.newFixedThreadPool, but a look at its implementation shows that its queue is unbounded: when all threads are busy, new tasks wait in the queue, and no rejection policy can ever fire.
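The unbounded queueing can be demonstrated with a small sketch: block both threads of a 2-thread fixed pool, then submit more tasks and observe that they are silently queued rather than rejected (class and method names here are illustrative):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class FixedPoolQueueDemo {
    // Returns how many tasks sit in the queue after 'extra' submissions
    // to a 2-thread fixed pool whose threads are both blocked.
    public static int queuedAfter(int extra) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch release = new CountDownLatch(1);
        // Occupy both worker threads with blocking tasks.
        for (int i = 0; i < 2; i++) {
            pool.execute(() -> {
                try { release.await(); } catch (InterruptedException ignored) { }
            });
        }
        // Further submissions land in the unbounded LinkedBlockingQueue;
        // the rejection handler is never invoked.
        for (int i = 0; i < extra; i++) {
            pool.execute(() -> { });
        }
        int queued = ((ThreadPoolExecutor) pool).getQueue().size();
        release.countDown();
        pool.shutdown();
        return queued;
    }
}
```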
The only option is a custom pool. Following the requirements above:
```java
private static final ExecutorService executor = new ThreadPoolExecutor(
    20, 20, 0, TimeUnit.MILLISECONDS,
    new SynchronousQueue<Runnable>(),
    new ThreadPoolExecutor.CallerRunsPolicy());
```
The number of threads is fixed and can be tuned based on test results. SynchronousQueue means tasks are never queued, and the CallerRunsPolicy rejection policy runs a rejected task on the caller's thread, which satisfies the fallback requirement.
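Putting the pieces together, a minimal sketch of wiring such a pool into supplyAsync's Executor overload might look like this (the `findUser`/`findOrders` lookups are hypothetical stand-ins for blocking DB/RPC calls):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class IoPoolExample {
    // Dedicated pool for blocking IO: fixed size, no queue,
    // fall back to the caller's thread when saturated.
    private static final ExecutorService IO_POOL = new ThreadPoolExecutor(
            20, 20, 0, TimeUnit.MILLISECONDS,
            new SynchronousQueue<Runnable>(),
            new ThreadPoolExecutor.CallerRunsPolicy());

    // Hypothetical blocking lookups standing in for DB/RPC calls.
    static String findUser(long id)   { return "user-" + id; }
    static String findOrders(long id) { return "orders-" + id; }

    public static String assemble(long id) {
        CompletableFuture<String> user =
                CompletableFuture.supplyAsync(() -> findUser(id), IO_POOL);
        CompletableFuture<String> orders =
                CompletableFuture.supplyAsync(() -> findOrders(id), IO_POOL);
        // Combine both results once each blocking call finishes.
        return user.thenCombine(orders, (u, o) -> u + "," + o).join();
    }
}
```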
This thread pool is meant exclusively for IO-intensive tasks; do not use it for computation-intensive code.
In practice I hit a case where this approach made measured performance about 5 times worse. Looking at the code, besides fetching data from the database, the tasks also contained a few for loops that modified fields; submitting that CPU-bound work to the pool incurred a large context-switching cost.
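One way to avoid that pitfall is to keep the cheap CPU post-processing out of the IO pool: use thenApply (rather than thenApplyAsync with the pool), so the field-massaging loop runs on the thread that completed the future, with no extra task submission. A sketch under these assumptions, with a hypothetical `fetchRows` standing in for the database query:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class PostProcessExample {
    // Hypothetical blocking fetch standing in for a database query.
    static List<String> fetchRows(long id) {
        return Arrays.asList("a", "b", "c");
    }

    public static List<String> handle(long id) {
        return CompletableFuture
                .supplyAsync(() -> fetchRows(id))   // blocking IO: worth going async
                // Cheap field-massaging loop: thenApply runs on the completing
                // thread, avoiding another submission and context switch.
                .thenApply(rows -> rows.stream()
                        .map(String::toUpperCase)
                        .collect(Collectors.toList()))
                .join();
    }
}
```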
Thinking
In the implementation above, the thread count is limited because threads are expensive (mostly in memory), which is to say the threads used here are too heavyweight. A better implementation would use a green-thread-like technique, mapping many lightweight threads onto a small number of system threads.
In addition, the event-driven approach may be better in this scenario.
The root cause is that synchronous blocking operations still dominate in the Java world, while the main tool for optimizing them is expensive threads; approaches that are easy to implement on other languages and platforms run into trouble in Java.
In addition, Java has no language-level async support, which makes asynchronous programming cumbersome and explicit; C#'s async/await syntax sugar is much sweeter.
Java still has a long way to go here.
Resources
Apache ab
Reactive Design Patterns (the diagram and the ParallelRetrievalExample code above are taken from here)
The cost of multithreading and context switching
Java CompletableFuture in detail
Concurrency pains: thread, goroutine, actor
Using asynchronous tasks to reduce API latency: a practice summary