Performance comparison of different concurrency implementations in Java


How does the Fork/Join framework behave under different configurations?

Like the upcoming Star Wars, Java 8's parallel streams have been getting mixed reviews. The syntax of parallel streams is as exciting as the new lightsaber in the trailer. There are many ways to do concurrent programming in Java, and we wanted to see what the performance gains and risks really are. After running more than 260 tests and looking at the data, we picked up quite a few new insights, and we'd like to share them with you here.

ExecutorService vs. Fork/Join framework vs. parallel streams

A long time ago, on a distant planet... Well, actually, I just mean that 10 years ago Java concurrency was only possible through third-party libraries. Then Java 5 arrived and introduced the java.util.concurrent package, which bears the unmistakable imprint of Doug Lea. ExecutorService gave us a simple way to manage thread pools. The java.util.concurrent package has kept improving since then: Java 7 introduced the Fork/Join framework, built on top of the ExecutorService thread pools. For many developers the Fork/Join framework is still rather mysterious, so Java 8 streams provide an easier way to use it. Let's look at how these approaches differ.
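To make the differences concrete, here is a minimal sketch of the same toy task (summing a range of numbers) written all three ways. The class and method names are our own illustrations, not the benchmark code from this article:

```java
import java.util.concurrent.*;
import java.util.stream.LongStream;

public class ThreeWays {

    // ExecutorService (Java 5): create a pool and submit tasks by hand.
    static long withExecutor(long n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            Future<Long> low  = pool.submit(() -> sumRange(1, n / 2));
            Future<Long> high = pool.submit(() -> sumRange(n / 2 + 1, n));
            return low.get() + high.get();
        } finally {
            pool.shutdown();
        }
    }

    // Fork/Join (Java 7): recursive divide-and-conquer with work stealing.
    static class SumTask extends RecursiveTask<Long> {
        final long from, to;
        SumTask(long from, long to) { this.from = from; this.to = to; }
        @Override protected Long compute() {
            if (to - from <= 10_000) return sumRange(from, to);
            long mid = (from + to) / 2;
            SumTask left = new SumTask(from, mid);
            left.fork();  // run the left half asynchronously
            return new SumTask(mid + 1, to).compute() + left.join();
        }
    }

    static long withForkJoin(long n) {
        return new ForkJoinPool().invoke(new SumTask(1, n));
    }

    // Parallel stream (Java 8): the same splitting, done for us.
    static long withParallelStream(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    static long sumRange(long from, long to) {
        long s = 0;
        for (long i = from; i <= to; i++) s += i;
        return s;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(withExecutor(1_000_000));        // 500000500000
        System.out.println(withForkJoin(1_000_000));        // 500000500000
        System.out.println(withParallelStream(1_000_000));  // 500000500000
    }
}
```

All three produce the same answer; the difference is how much of the splitting and scheduling you manage yourself.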

We have two tasks to test, one CPU-intensive and one IO-intensive, and we run the same function in 4 different scenarios for each. The number of threads is also a very important factor in each implementation, so that is one of the things we test. The test machine has 8 cores, so we ran with 4, 8, 16, and 32 threads. For each task we also tested a single-threaded version, but it is not shown in the charts because it takes much longer. If you want to know exactly how these test cases work, take a look at the test-setup section at the end. Here we go.

Indexing a 6GB file with 5.8 million lines of text

In this test we generated a very large text file and then built an index over it with each implementation. Let's see the results:

Single-threaded execution time: 176,267 milliseconds, about 3 minutes. Note that the chart's axis starts at 20,000 milliseconds.

1. Too few threads waste the CPU; too many add overhead

The first thing to notice is the shape of the bar charts: even from these 4 data points you can get a rough sense of how each implementation behaves. The results are skewed between 8 and 16 threads because some threads are blocked on file IO, so adding threads lets us make better use of the CPU. At 32 threads, performance starts to degrade again because of the extra overhead.

2. Parallel streams perform best: about 1 second faster than using Fork/Join directly

Parallel streams provide more than just syntactic sugar (and we're not talking about the lambdas here): they also outperformed the Fork/Join framework and ExecutorService, indexing the 6GB file in just 24.33 seconds. Put your trust in Java, and it performs well.

3. But... parallel streams also perform worst: the only implementation to exceed 30 seconds

Here's a lesson in how parallel streams can hurt performance. This can happen on machines that are already running other multithreaded applications. With few threads actually available, using the Fork/Join framework directly beats the parallel stream: the two results differ by 5 seconds, about an 18% performance penalty.

4. Don't use the default thread pool size when IO is involved

The parallel stream test uses the default thread pool size (by default, the number of CPU cores on the machine, here 8), and it is 2 seconds slower than the run with 16 threads. In other words, sticking to the default pool size costs about 7%. The cause is the blocked IO threads: since many threads spend time waiting, introducing more threads makes better use of the CPU, keeping it busy instead of idle while other threads wait.

What happens if we change the size of the default Fork/Join pool used by parallel streams? You can modify the size of the common Fork/Join thread pool through a JVM parameter:
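The parameter in question is the documented JDK system property for the common pool's parallelism; for example (MyApp is a placeholder for your main class):

```shell
java -Djava.util.concurrent.ForkJoinPool.common.parallelism=16 MyApp
```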

(By default, all Fork/Join tasks share a single common thread pool whose thread count equals the number of CPU cores. The advantage is that idle threads can be reclaimed to handle other tasks.)

Alternatively, you can use a little trick to run a parallel stream on a custom Fork/Join pool. It bypasses the default common pool and lets you use a pool you configured yourself. Admittedly, it feels a bit like a hack. In our tests we used the common pool.
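The trick is the well-known idiom of launching the parallel stream from inside a task submitted to a custom ForkJoinPool; it relies on behavior that is not guaranteed by the specification, which is why it feels like a hack. A minimal sketch (names are ours):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

public class CustomPoolTrick {

    // A parallel stream started from inside a ForkJoinPool task runs on
    // that pool instead of the common pool. Undocumented behavior, so
    // treat it as a hack rather than a supported API.
    static long sumInCustomPool(int parallelism, long n) {
        ForkJoinPool customPool = new ForkJoinPool(parallelism);
        try {
            return customPool.submit(
                    () -> LongStream.rangeClosed(1, n).parallel().sum()
            ).join();
        } finally {
            customPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumInCustomPool(16, 1_000_000)); // 500000500000
    }
}
```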

5. Single-threaded performance is 7.25 times slower than the fastest result

Concurrency improved performance by up to 7.25 times, and considering the machine has 8 cores, that comes really close to the theoretical 8x limit! The remaining gap is probably the cost of managing the threads. Not only that: even the worst-performing parallel version in this test, the 4-thread parallel stream (30.23 seconds), was 5.8 times faster than the single-threaded version (176.27 seconds).

What if we take IO out of the picture? For example: checking whether a number is prime

For this test we remove the IO part and measure how long it takes to determine whether a very large integer is prime. How large? 19 digits: 1,530,692,068,127,007,263, or roughly one and a half quintillion. Okay, let me catch my breath. We made no optimizations at all: we check divisors all the way up to the number's square root, and we even test all the even divisors, although the number is obviously not divisible by 2, just to make the operation take longer. Spoiler alert: the number really is prime. Each implementation performs the same number of operations.
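The unoptimized check described above can be sketched as plain trial division (the class and method names are our own; the article states the 19-digit number is indeed prime, which we have not verified independently):

```java
public class PrimeCheck {

    // Trial division up to sqrt(n). Like the benchmark, it deliberately
    // tests even divisors too, rather than skipping them, so that the
    // work takes longer.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // The 19-digit number from the test; this single call loops over
        // about 1.24 billion candidate divisors, so it takes a while.
        System.out.println(isPrime(1_530_692_068_127_007_263L));
    }
}
```

Since the loop runs to the square root of n, the single-threaded cost grows with sqrt(n), which is why this one number is enough to keep 8 cores busy for tens of seconds.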

Here is the result of the test:

Single-threaded execution time: 118,127 milliseconds, about 2 minutes. Note that the chart's axis starts at 20,000 milliseconds.

1. 8 threads vs. 16 threads: not much difference

Unlike the IO test, there are no IO calls here, so the difference between 8 and 16 threads is small, with the exception of the Fork/Join version. Because of its erratic performance, we ran several extra sets of tests to make sure the results were correct, and they came out the same every time. We'd be glad to hear your thoughts on this in the comments section below.

2. The best results for different implementations are very close

We see that the fastest results of the different implementations are about the same, roughly 28 seconds. No matter how it is implemented, the results come out similar. But that doesn't mean they get there the same way. Take a look at the next point.

3. Parallel streams handle thread overhead better than the other implementations

This is the really interesting part. In this test, the 16-thread parallel stream wins again. More than that: in this test the parallel stream performed best regardless of the thread count.

4. The single-threaded version is 4.2 times slower than the fastest result

In addition, the benefit the parallel versions gained on this computation-heavy task was about half of what we saw in the IO test. Since this is a CPU-intensive test, that is quite plausible: unlike the previous test, there is no CPU time spent waiting on IO that extra threads can reclaim for additional gains.

Conclusion

As I suggested before: read the source code, understand when to use parallel streams, and don't jump to conclusions about concurrent programming in Java. The best way to find out is to run a similar test case in a staging environment. The factors that deserve special attention include the hardware you run on (and the hardware you test on) and the total number of threads in your application, including the common Fork/Join pool and the threads created in code written by other developers on your team. Before writing your own concurrency logic, it's a good idea to check all of the above so you have an overall picture of your application.

The Test Setup

We ran these tests on an EC2 c3.2xlarge instance with 8 vCPUs and 15GB of memory. The vCPU count comes from hyper-threading: there are actually only 4 physical cores, but each presents itself as two, so as far as the OS scheduler is concerned we have 8 cores. To be as fair as possible, each implementation was run 10 times, and we took the average runtime of runs 2 through 9. That adds up to 260 test runs in total! Task duration matters too: we chose tasks that run for more than 20 seconds, so the time differences stand out clearly and are less influenced by external factors.
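That measurement scheme can be sketched as a small harness (the helper name is ours): run the task 10 times, then average runs 2 through 9, discarding the first run (JIT warm-up) and the last.

```java
import java.util.Arrays;

public class BenchHarness {

    // Run the task 10 times; average runs 2..9, discarding the first run
    // (warm-up) and the last, as described above.
    static double averageMillis(Runnable task) {
        long[] nanos = new long[10];
        for (int i = 0; i < nanos.length; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        // Arrays.stream(array, 1, 9) covers indices 1..8, i.e. runs 2..9.
        return Arrays.stream(nanos, 1, 9).average().getAsDouble() / 1_000_000.0;
    }

    public static void main(String[] args) {
        double ms = averageMillis(() -> {
            long s = 0;
            for (int i = 0; i < 1_000_000; i++) s += i;
        });
        System.out.printf("average: %.2f ms%n", ms);
    }
}
```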

Finally

The raw test results are available here, and the code is on GitHub. You are welcome to fork it and tell us about your results. If you discover any interesting new insights or phenomena we didn't cover here, please let us know, and we'd be happy to add them to this article.
