When customizing ThreadPoolExecutor, follow the KISS principle: the simplest configuration generally provides the best performance.

ForkJoinPool

Java 7 introduced a new thread pool: ForkJoinPool.
Like ThreadPoolExecutor, it implements the Executor and ExecutorService interfaces. It uses an unbounded queue to hold the tasks to be executed, and the number of threads is passed to its constructor; if no thread count is supplied, the number of CPUs available on the current machine is used as the default.
ForkJoinPool is designed for problems solved with divide-and-conquer algorithms, a typical example being quicksort. The key point is that ForkJoinPool uses a relatively small number of threads to process a large number of tasks. For example, to sort 10 million items, the work is split into two 5-million-item sort tasks plus a merge task for the two sorted halves. The 5-million-item tasks are split in the same way, down to a threshold at which splitting stops; for example, when a subarray has fewer than 10 elements, it is sorted directly with insertion sort instead of being split further.
In the end there will be roughly two million tasks. The crux of the problem is that a parent task can complete only after all of its subtasks have completed.
This is why ThreadPoolExecutor has trouble with this kind of decomposition: a thread in ThreadPoolExecutor cannot put a subtask onto the queue, suspend its current task, and pick the subtask back up to execute. A thread in ForkJoinPool can: it creates a new subtask, suspends the current task, and selects a pending subtask from the queue to execute.
For example, if we need to count the number of elements smaller than 0.5 in a double array, we can use ForkJoinPool for implementation as follows:
```java
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinTest {
    private static double[] d;

    private static class ForkJoinTask extends RecursiveTask<Integer> {
        private final int first;
        private final int last;

        public ForkJoinTask(int first, int last) {
            this.first = first;
            this.last = last;
        }

        protected Integer compute() {
            int subCount;
            if (last - first < 10) {
                // Leaf task: count directly once the range is small enough
                subCount = 0;
                for (int i = first; i <= last; i++) {
                    if (d[i] < 0.5)
                        subCount++;
                }
            } else {
                // Split the range in half and fork both subtasks
                int mid = (first + last) >>> 1;
                ForkJoinTask left = new ForkJoinTask(first, mid);
                left.fork();
                ForkJoinTask right = new ForkJoinTask(mid + 1, last);
                right.fork();
                subCount = left.join();
                subCount += right.join();
            }
            return subCount;
        }
    }

    // Helper (assumed, not shown in the original): fills the array with random values
    private static double[] createArrayOfRandomDoubles() {
        return new Random().doubles(10_000_000).toArray();
    }

    public static void main(String[] args) {
        d = createArrayOfRandomDoubles();
        int n = new ForkJoinPool().invoke(new ForkJoinTask(0, 9999999));
        System.out.println("Found " + n + " values");
    }
}
```
The keys in the preceding code are the fork() and join() methods. Each thread used by ForkJoinPool keeps an internal queue of the tasks and subtasks it must execute, which ensures their execution order.
So what is the performance difference when ThreadPoolExecutor or ForkJoinPool is used?
First, ForkJoinPool can complete a huge number of parent-child tasks with a limited number of threads, for example, more than two million tasks with four threads. This is impossible with ThreadPoolExecutor, because its threads cannot choose to execute subtasks first: to complete two million parent-child tasks, it would need two million threads, which is obviously infeasible.
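The failure mode is easy to reproduce. The sketch below (a hypothetical class, not from the original example) submits a parent task to a one-thread fixed pool; the parent blocks on its child's future, the only worker is occupied by the parent, so the child can never start:

```java
import java.util.concurrent.*;

public class StarvationDemo {
    // A single-thread pool makes the problem obvious; any fixed-size pool
    // starves the same way once every worker blocks on a child future.
    static ExecutorService pool = Executors.newFixedThreadPool(1);

    static String run() throws Exception {
        Future<Integer> parent = pool.submit(() -> {
            // The parent blocks on its child, but the only worker thread is
            // busy running the parent, so the child never runs.
            Future<Integer> child = pool.submit(() -> 21);
            return child.get() * 2;
        });
        try {
            return "result=" + parent.get(1, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            return "starved: child task never ran";
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // prints "starved: child task never ran"
    }
}
```

A ForkJoinPool avoids this because join() lets the blocked worker execute queued subtasks instead of idling.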
Of course, in the example above we can also avoid divide-and-conquer entirely. Because the per-element checks are independent, we can split the array into a few regions and solve the problem with ThreadPoolExecutor; this approach creates no flood of subtasks. The code is as follows:
```java
import java.util.Random;
import java.util.concurrent.*;

public class ThreadPoolTest {
    private static double[] d;

    private static class ThreadPoolExecutorTask implements Callable<Integer> {
        private final int first;
        private final int last;

        public ThreadPoolExecutorTask(int first, int last) {
            this.first = first;
            this.last = last;
        }

        public Integer call() {
            int subCount = 0;
            for (int i = first; i <= last; i++) {
                if (d[i] < 0.5) {
                    subCount++;
                }
            }
            return subCount;
        }
    }

    // Helper (assumed, not shown in the original): fills the array with random values
    private static double[] createArrayOfRandomDoubles() {
        return new Random().doubles(10_000_000).toArray();
    }

    public static void main(String[] args) throws Exception {
        d = createArrayOfRandomDoubles();
        ThreadPoolExecutor tpe = new ThreadPoolExecutor(4, 4,
                Long.MAX_VALUE, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
        Future<Integer>[] f = new Future[4];
        int size = d.length / 4;
        for (int i = 0; i < 3; i++) {
            f[i] = tpe.submit(new ThreadPoolExecutorTask(i * size, (i + 1) * size - 1));
        }
        f[3] = tpe.submit(new ThreadPoolExecutorTask(3 * size, d.length - 1));
        int n = 0;
        for (int i = 0; i < 4; i++) {
            n += f[i].get();
        }
        System.out.println("Found " + n + " values");
        tpe.shutdown();
    }
}
```
The times for ForkJoinPool and ThreadPoolExecutor to handle this problem are as follows:
| Number of threads | ForkJoinPool | ThreadPoolExecutor |
|---|---|---|
| 1 | 3.2 s | 0.31 s |
| 4 | 1.9 s | 0.15 s |
GC was also monitored during execution. With ForkJoinPool the total GC time was 1.2 s, while ThreadPoolExecutor triggered no GC at all. The reason is that ForkJoinPool creates a large number of subtask objects, which must be reclaimed once they have executed; ThreadPoolExecutor creates no subtasks and therefore causes no GC.
Another feature of ForkJoinPool is work stealing. Each thread in the pool maintains its own queue of tasks to execute; when a thread's queue is empty, it steals unexecuted tasks from other threads' queues and helps execute them.
You can use the following code to test the Work Stealing feature of ForkJoinPool:
```java
for (int i = first; i <= last; i++) {
    if (d[i] < 0.5) {
        subCount++;
    }
    for (int j = 0; j < d.length - i; j++) {
        for (int k = 0; k < 100; k++) {
            dummy = j * k + i; // dummy is volatile, so multiple writes occur
            d[i] = dummy;
        }
    }
}
```
Because the iteration count of the inner loop (j) depends on the value of i in the outer loop, the execution time of this code depends on i: when i = 0 the task takes longest, and when i = last it takes the least time. In other words, the tasks have unequal workloads: the smaller i is, the larger the workload, and the workload shrinks as i grows. This is a typical unbalanced-load scenario.
In this case ThreadPoolExecutor is not a good choice, because its threads pay no attention to the workload differences between tasks: a thread that finishes the lightest task simply sits idle, waiting for the heaviest task to complete.
ForkJoinPool behaves differently. Even when workloads differ, while one thread executes a heavy task, the other, idle threads steal and complete the remaining subtasks. Thread utilization improves, and so does overall performance.
The execution times of the two thread pools when task workloads are unbalanced:
| Number of threads | ForkJoinPool | ThreadPoolExecutor |
|---|---|---|
| 1 | 54.5 s | 53.3 s |
| 4 | 16.6 s | 24.2 s |
Note that with one thread the two execution times are close, since the total amount of computation is identical; ForkJoinPool is slightly slower because it creates many task objects, which increases the GC workload.

When the number of threads rises to 4, the gap becomes significant: ForkJoinPool is nearly 50% faster than ThreadPoolExecutor. This shows that work stealing keeps resources well utilized when task workloads are unbalanced.
Therefore, when task workloads are balanced, ThreadPoolExecutor is the better choice; otherwise, choose ForkJoinPool.
In addition, one more factor affects ForkJoinPool performance: the threshold at which task splitting stops. In the earlier quicksort example, for instance, subtask creation stops once fewer than 10 elements remain. The following table shows ForkJoinPool performance under different thresholds:
| Threshold | ForkJoinPool |
|---|---|
| 20 | 17.8 s |
| 10 | 16.6 s |
| 5 | 15.6 s |
| 1 | 16.8 s |
As the table shows, the threshold value also affects performance. When using ForkJoinPool, therefore, benchmark several threshold values and use the most suitable one; this too contributes to overall performance.
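To make such benchmarking easy, the threshold can be passed as a constructor parameter. The sketch below (a hypothetical variant of the earlier counting example, with made-up array and threshold sizes) also uses the common fork/compute/join idiom, where the current thread works on one half itself instead of forking both:

```java
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ThresholdTest {
    static double[] d;

    static class CountTask extends RecursiveTask<Integer> {
        final int first, last, threshold;

        CountTask(int first, int last, int threshold) {
            this.first = first;
            this.last = last;
            this.threshold = threshold;
        }

        @Override
        protected Integer compute() {
            if (last - first < threshold) {
                int subCount = 0;            // leaf: count directly
                for (int i = first; i <= last; i++)
                    if (d[i] < 0.5) subCount++;
                return subCount;
            }
            int mid = (first + last) >>> 1;
            CountTask left = new CountTask(first, mid, threshold);
            left.fork();                     // queue one half ...
            int right = new CountTask(mid + 1, last, threshold).compute();
            return left.join() + right;      // ... work on the other directly
        }
    }

    static int countBelowHalf(int threshold) {
        return new ForkJoinPool().invoke(new CountTask(0, d.length - 1, threshold));
    }

    public static void main(String[] args) {
        d = new Random(42).doubles(1_000_000).toArray();
        for (int t : new int[] { 10, 1_000, 100_000 }) {
            long start = System.nanoTime();
            int n = countBelowHalf(t);
            System.out.printf("threshold=%d count=%d time=%.0f ms%n",
                    t, n, (System.nanoTime() - start) / 1e6);
        }
    }
}
```

Every threshold yields the same count; only the task-creation overhead differs, which is exactly what the table above measures.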
Automatic Parallelization

Java 8 introduces the concept of automatic parallelization: certain Java code is executed in parallel automatically, on the condition that a ForkJoinPool is used.
Java 8 adds a common thread pool to ForkJoinPool, used for tasks that are not explicitly submitted to any pool. It is a static element of the ForkJoinPool class, and its default size is derived from the number of processors on the machine it runs on.
Automatic parallelization takes effect when calling the methods newly added to the Arrays class in Java 8, for example the methods that sort an array in parallel or operate on each array element in parallel. It is also used by the new Stream API in Java 8.
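For instance, Arrays.parallelSort() produces the same ordering as the sequential Arrays.sort() but, for large enough arrays, splits the work across the common pool (a minimal sketch; the class name and array size are made up):

```java
import java.util.Arrays;
import java.util.Random;

public class ParallelSortDemo {
    // Returns true when parallel and sequential sorts agree on n random doubles.
    static boolean sameResult(int n) {
        double[] a = new Random(0).doubles(n).toArray();
        double[] b = a.clone();
        Arrays.parallelSort(a); // fork/join merge sort on the common pool
        Arrays.sort(b);         // sequential dual-pivot quicksort
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(sameResult(1_000_000)); // prints "true"
    }
}
```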
For example, the following code is used to traverse the elements in the list and perform the required calculations:
```java
Stream<String> stream = arrayList.parallelStream();
stream.forEach(a -> {
    String symbol = StockPriceUtils.makeSymbol(a);
    StockPriceHistory sph = new StockPriceHistoryImpl(symbol, startDate, endDate, entityManager);
});
```
The computation on the list elements executes in parallel: the forEach method creates a task for each element's computation, and those tasks are handled by the ForkJoinPool common pool described above. The same parallel logic could be implemented with ThreadPoolExecutor, but ForkJoinPool wins on code readability and code size.
For the number of threads in the ForkJoinPool common pool, the default value, based on the number of processors available at runtime, is usually fine. To adjust it, set the system property:

-Djava.util.concurrent.ForkJoinPool.common.parallelism=N
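The effective setting can be inspected at runtime (a small sketch; note that this property must be set before the common pool is first touched, and that on many JVMs the default parallelism is availableProcessors() - 1, with the submitting thread making up the difference):

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolSize {
    public static void main(String[] args) {
        // Effective parallelism of the common pool; on many JVMs this defaults
        // to availableProcessors() - 1, since the submitting thread pitches in.
        System.out.println("parallelism = " + ForkJoinPool.commonPool().getParallelism());
        System.out.println("processors  = " + Runtime.getRuntime().availableProcessors());
    }
}
```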
The following set of data is used to compare the performance when the general thread pool in ThreadPoolExecutor and ForkJoinPool is used to complete the preceding simple computing:
| Number of threads | ThreadPoolExecutor (seconds) | ForkJoinPool Common Pool (seconds) |
|---|---|---|
| 1 | 255.6 | 135.4 |
| 2 | 134.8 | 110.2 |
| 4 | 77.0 | 96.5 |
| 8 | 81.7 | 84.0 |
| 16 | 85.6 | 84.6 |
Note that with 1, 2, and 4 threads the two differ markedly, and that the common pool with one thread performs almost the same as ThreadPoolExecutor with two threads.
The reason lies in a trick used by forEach: it also employs the thread that calls forEach as a worker thread. So even if the common pool's parallelism is set to 1, there are actually two working threads, and a common pool with parallelism 1 behaves like a ThreadPoolExecutor with two threads.
Consequently, when the common pool actually needs four working threads, set its parallelism to 3; four working threads will then be available at runtime.
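The caller's participation is easy to observe by recording which threads execute a parallel pipeline (a hypothetical sketch; thread names and counts vary by JVM and machine):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

public class ForEachThreads {
    // Records the name of every thread that executes part of the pipeline.
    static Set<String> workerNames(int n) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream.range(0, n).parallel()
                 .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        // Typically prints "main" alongside ForkJoinPool.commonPool-worker-N names.
        System.out.println(workerNames(1_000_000));
    }
}
```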
Summary
- Use ForkJoinPool for algorithms that split work recursively (divide and conquer).
- Carefully tune the threshold at which task splitting stops; it has a measurable impact on performance.
- Some Java 8 features use the ForkJoinPool common pool; in some cases you may need to adjust its default number of threads.