Java optimization in multi-core platform


Multi-core CPUs are now mainstream. Multi-core hardware can be exploited to improve throughput, and for Java programs it also enables concurrent garbage collection. However, multi-core brings some problems for Java as well, mainly because threads share memory. The bandwidth between memory and CPU is a major bottleneck; each core has its own share of cache, and hitting that cache improves performance. The JVM implements threads on top of the operating system's "lightweight processes", so whenever a thread operates on shared memory it may miss the cache, and the synchronization involved is a costly system call. Multi-core platforms therefore call for some special optimizations beyond the ordinary ones.

  Code optimization

  Number of threads greater than or equal to the number of cores

With multiple threads, the CPU can only be fully utilized when the number of runnable threads is at least the number of cores; otherwise some cores sit idle. Note, however, that too many threads consume extra memory and cause performance to fall back. The JVM's garbage collector also needs threads, so the count here includes the JVM's own threads.
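As a sketch, the core count can be queried at runtime and used to size a thread pool. The class name and the one-thread-per-core policy are illustrative assumptions, not a universal rule:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    // Illustrative policy: one worker thread per core. The JVM's own
    // threads (GC, JIT compiler) run alongside these workers.
    static int poolSize() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        int cores = poolSize();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        System.out.println("sized pool to " + cores + " threads");
        pool.shutdown();
    }
}
```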

  Minimize writes to shared data

Each thread has its own working memory, and within that area the system can optimize freely; reading shared memory does not degrade performance either. However, once a thread writes to shared memory (for example through a volatile field), the JVM inserts memory-barrier (memory fence) instructions to keep the processor from reordering the access, and such a write is much slower than writing a thread-owned local variable. The remedy is to minimize shared data, which also conforms to the design principle of low data coupling.
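A minimal sketch of the idea, using a hypothetical counter: accumulate in a method-local variable and publish to the volatile field once at the end, rather than paying a memory barrier on every iteration:

```java
public class SharedWriteDemo {
    // Every write to a volatile field implies a memory barrier.
    static volatile long sharedCounter = 0;

    // Accumulate into a thread-owned local (no barriers), then publish
    // with a single volatile write. Names here are illustrative.
    static long sumLocally(int n) {
        long local = 0;                 // thread-owned, freely optimized
        for (int i = 0; i < n; i++) {
            local += i;
        }
        sharedCounter = local;          // one volatile write at the end
        return local;
    }

    public static void main(String[] args) {
        System.out.println(sumLocally(1000)); // 499500
    }
}
```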

  Using the synchronized keyword

In Java 1.5, synchronized performed poorly. It was a heavyweight operation that had to call into the operating system, and a contended lock could consume more system time than the protected operation itself. By contrast, the Lock objects provided by Java offered higher performance. Java 1.6 changed this: synchronized has clear semantics and lends itself to many optimizations, such as adaptive spinning, lock elision, lock coarsening, lightweight locks, and biased locks. As a result, synchronized on Java 1.6 performs no worse than Lock. The JVM team has also stated that it favors synchronized and that there is room for further optimization in future releases.
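A sketch of the two styles side by side; the class and method names are made up for illustration:

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockStyles {
    private int count = 0;
    private final ReentrantLock lock = new ReentrantLock();

    // Since Java 1.6, HotSpot optimizes synchronized blocks (biased
    // locks, lock coarsening, etc.), so this is usually competitive.
    synchronized void incSync() { count++; }

    // ReentrantLock still offers extras such as tryLock and fairness.
    void incLock() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock();          // always release in finally
        }
    }

    int get() { return count; }

    public static void main(String[] args) {
        LockStyles c = new LockStyles();
        c.incSync();
        c.incLock();
        System.out.println(c.get()); // 2
    }
}
```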

  Using optimistic policies

The traditional synchronized concurrency strategy is pessimistic: it assumes that whenever multiple threads operate on an object, two of them may collide, so a lock is always taken. An optimistic strategy assumes access will normally succeed and retries only when a conflict is detected, which is more efficient. Java's AtomicInteger uses this strategy.
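AtomicInteger exposes the optimistic retry loop directly via compareAndSet; a sketch (the addTen helper is hypothetical):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class OptimisticDemo {
    // Optimistic update: read the current value, compute the new one,
    // and retry if another thread changed the value in between (CAS).
    static int addTen(AtomicInteger counter) {
        int prev, next;
        do {
            prev = counter.get();
            next = prev + 10;
        } while (!counter.compareAndSet(prev, next)); // retry on conflict
        return next;
    }

    public static void main(String[] args) {
        AtomicInteger c = new AtomicInteger(5);
        System.out.println(addTen(c)); // 15
    }
}
```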

  Using Thread local variables (ThreadLocal)

ThreadLocal creates a per-thread copy of an object that is not shared with other threads. When a thread terminates, its thread-local values become eligible for garbage collection.
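A small sketch, assuming a hypothetical per-thread buffer: each thread reuses its own StringBuilder without any locking:

```java
public class ThreadLocalDemo {
    // Each thread gets its own StringBuilder; no synchronization needed.
    static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(StringBuilder::new);

    // Illustrative helper: format a message using this thread's buffer.
    static String tag(String msg) {
        StringBuilder sb = BUFFER.get();   // this thread's private copy
        sb.setLength(0);
        sb.append(Thread.currentThread().getName()).append(": ").append(msg);
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> System.out.println(tag("hello")), "worker");
        t.start();
        t.join();
        System.out.println(tag("main"));
    }
}
```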

  Sorting fields in a class

Fields that a class accesses frequently can be grouped together, making it more likely that they are loaded into the cache on the same line, and it is best to put them at the beginning of the class. Do not interleave primitive fields and reference fields.

  Batch processing of arrays

Modern processors can process multiple array elements with a single instruction, for example reading or writing several records of a byte array at once. So prefer bulk interfaces such as System.arraycopy() to manipulating the array yourself.
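A sketch using System.arraycopy for a bulk copy; the copyPrefix helper is illustrative:

```java
import java.util.Arrays;

public class BulkCopy {
    // Copy the first len bytes of src into a fresh array using the
    // intrinsic bulk copy, which HotSpot can vectorize, instead of a
    // hand-written element-by-element loop.
    static byte[] copyPrefix(byte[] src, int len) {
        byte[] dst = new byte[len];
        System.arraycopy(src, 0, dst, 0, len);
        return dst;
    }

    public static void main(String[] args) {
        byte[] src = {1, 2, 3, 4, 5};
        System.out.println(Arrays.toString(copyPrefix(src, 3))); // [1, 2, 3]
    }
}
```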

  JVM Optimizations

  Enable large memory pages

The default page size on most operating systems today is 4KB. With a 4GB heap, that means roughly 1024*1024 pages to map. Larger pages reduce this overhead. Page size is governed by the operating system; the JVM cannot change it on its own. The configuration on Linux is somewhat involved and is not covered here.

-XX:+UseLargePages is turned on by default in Java 1.6, with -XX:LargePageSizeInBytes set to 4MB. Some deployments configure it to 128MB, and official performance tests have used 256MB.
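For example, the flags might be combined like this; the sizes are illustrative, not recommendations, and the operating system must also be configured to provide huge pages:

```shell
# Hypothetical example: enable large pages with a 256MB page size
# for a 4GB heap. Flag spelling must be exact.
java -XX:+UseLargePages -XX:LargePageSizeInBytes=256m -Xmx4g -jar app.jar
```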

  Enable compressed pointers

64-bit Java is slower than 32-bit Java partly because its pointers widen from 32 to 64 bits: the addressing space extends from 4GB to terabytes, but the wider pointers degrade performance and consume more memory. Compressed pointers address this: with compression enabled, heaps up to 32GB are supported and performance can approach that of a 32-bit JVM.

It is turned on by default since JDK 6 update 23; earlier versions can enable it with -XX:+UseCompressedOops.

Published benchmarks of this option show a considerable performance gain.

  Enable NUMA

NUMA is a property of the hardware. Under the SMP architecture the CPU cores are symmetric but share one system bus, so as CPUs are added the bus becomes a bottleneck. Under the NUMA architecture, CPUs are divided into groups; the groups communicate point-to-point and are otherwise independent of each other. Enabling NUMA support can improve performance.

NUMA must be enabled in the hardware, the operating system, and the JVM at the same time. On Linux it can be configured with numactl; the JVM enables it with -XX:+UseNUMA.

  Aggressive optimization Features

In Java 1.6, aggressive optimization (-XX:+AggressiveOpts) is turned on by default. Aggressive optimization enables optimizations that are typically slated for the next release, but it can cause instability. The much-discussed JDK 7 bug from some time ago was only found with this option turned on.

  Escape analysis

If an object created inside a method is passed out of it, it is said to escape the method; if it is passed to another thread, it escapes the thread. When the JVM can prove an object does not escape, it can allocate it on the stack instead of the heap, saving GC time. Such an object can also be broken apart so that its member variables are used directly, which is friendlier to the cache. And if an object does not escape its thread, all synchronization on it can be removed, greatly improving performance.

Escape analysis is difficult, however: if CPU time is spent analyzing an object that turns out to escape, nothing can be optimized and the analysis time is wasted. Complex algorithms are therefore not affordable, and current JVMs do not implement stack allocation. As a result, performance may even decrease with the option enabled.

Escape analysis can be turned on with -XX:+DoEscapeAnalysis.
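A sketch of an object that escapes neither its method nor its thread, making it a candidate for the optimizations above (the class names are illustrative, and whether the JIT actually applies scalar replacement is up to the JVM):

```java
public class EscapeDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // The Point never leaves this method: no method escape, no thread
    // escape. With escape analysis the JIT may scalar-replace it,
    // using the fields directly and skipping the heap allocation.
    static int distSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    public static void main(String[] args) {
        System.out.println(distSquared(3, 4)); // 25
    }
}
```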

  High Throughput GC Configuration

For high throughput, the young generation can use the Parallel Scavenge collector and the old generation can use the Parallel Old collector.

Enable with -XX:+UseParallelOldGC.

-XX:ParallelGCThreads can be adjusted according to the number of CPUs, typically 1/2 to 5/8 of the CPU count.
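Putting the throughput-oriented flags together; the thread count below assumes a hypothetical 16-CPU machine and is only an example:

```shell
# Parallel Scavenge for the young generation, Parallel Old for the
# old generation; 8 GC threads = 1/2 of a hypothetical 16 CPUs.
java -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=8 -jar app.jar
```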

  Low Latency GC Configuration

For low-latency applications, the young generation can use the ParNew collector and the old generation can use the CMS collector.

Enable with -XX:+UseConcMarkSweepGC and -XX:+UseParNewGC.

Here too, -XX:ParallelGCThreads can be adjusted according to the number of CPUs, typically 1/2 to 5/8 of the CPU count.

-XX:MaxTenuringThreshold (the age at which objects are promoted to the old generation) can be adjusted; the default is 15. Keeping objects in the young generation longer reduces the pressure on old-generation GC.

-XX:TargetSurvivorRatio adjusts the target occupancy of the survivor space; the default is 50%. Raising it improves survivor-space utilization.

-XX:SurvivorRatio adjusts the ratio of Eden to a survivor space; the default is 8. The smaller the ratio, the larger the survivor spaces, and the longer objects can stay in the young generation.
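Putting the low-latency flags together; the sizing values below are the defaults mentioned above, and the GC thread count assumes a hypothetical 16-CPU machine:

```shell
# ParNew for the young generation, CMS for the old generation.
# All ratio/threshold values shown are the documented defaults.
java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
     -XX:ParallelGCThreads=8 \
     -XX:MaxTenuringThreshold=15 \
     -XX:TargetSurvivorRatio=50 \
     -XX:SurvivorRatio=8 \
     -jar app.jar
```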
