Java Performance Tuning

JVM tuning (the most critical parameters are -Xms, -Xmx, -Xmn, -XX:SurvivorRatio and -XX:MaxTenuringThreshold)

Generation Size tuning:

Avoid setting the young generation too small or too large, avoid making the survivor spaces too small or too large, and choose a reasonable tenuring threshold (how many minor GCs an object must survive before it is promoted to the old generation).

-Xmn adjusts the young generation size. A larger young generation usually means that more objects are reclaimed in the minor GC phase, but it also shrinks the old generation, which can lead to frequent full GCs or even an OutOfMemoryError.

-XX:SurvivorRatio adjusts the ratio between the Eden and survivor spaces. A larger Eden usually means minor GC occurs less often, but it may leave the survivor spaces too small, so that surviving objects are promoted straight to the old generation after a minor GC and full GC is triggered more frequently.
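
As an illustration only, these sizing flags might be combined on the java command line as follows; the heap sizes and ratios are placeholder values to be tuned for the actual application, and the main class name is hypothetical:

    java -Xms2g -Xmx2g -Xmn768m \
         -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=15 \
         com.example.Main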

GC policy tuning: the CMS GC performs most of its work concurrently with the application, which genuinely reduces the time the GC pauses the application. For web applications that need short GC pauses and whose bottleneck is not the CPU, CMS GC is a good choice while G1 is not yet mature enough.

(If the system is not CPU-intensive and most of the objects promoted from the young generation to the old generation are reclaimable, CMS GC can reclaim old-generation objects before the old generation fills up, reducing the likelihood of a full GC.)
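
A minimal sketch of enabling CMS on JDK 8 or earlier (CMS has since been removed from the JDK); the occupancy threshold is an illustrative value, not a recommendation:

    java -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
         -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly \
         com.example.Main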

After adjusting the memory-management parameters, you should use -XX:+PrintGCDetails, -XX:+PrintGCTimeStamps, -XX:+PrintGCApplicationStoppedTime and jstat or VisualVM to observe the GC behavior after the adjustment.
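
For example, the GC log flags can be added at startup and jstat can sample the running JVM; the PID placeholder and the 1000 ms interval are illustrative:

    java -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime com.example.Main

    jstat -gcutil <pid> 1000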

Tuning parameters beyond memory management: -XX:CompileThreshold, -XX:+UseFastAccessorMethods, -XX:+UseBiasedLocking.

Program Tuning

Solutions for heavy CPU consumption

Solution for high CPU us:

The main reason for high CPU us is that an executing thread never needs to wait for anything and runs continuously, so the CPU gets no chance to schedule other threads.

Tuning approach: add Thread.sleep calls to release the CPU and reduce CPU consumption. This costs some single-thread performance, but because it lowers CPU consumption it improves the overall average performance of a multi-threaded application.

(In similar scenarios in real Java applications, the better approach is to use the wait/notify mechanism instead.)
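
A minimal sketch of replacing a busy polling loop with wait/notify, assuming a simple in-memory work queue; the class and method names are hypothetical:

    import java.util.ArrayDeque;
    import java.util.Queue;

    // Instead of spinning in "while (queue.isEmpty()) {}", the consumer waits
    // and the producer notifies, so the CPU is released while there is no work.
    public class WaitNotifyQueue<T> {
        private final Queue<T> queue = new ArrayDeque<>();

        public synchronized void put(T item) {
            queue.add(item);
            notifyAll();                  // wake up waiting consumers
        }

        public synchronized T take() throws InterruptedException {
            while (queue.isEmpty()) {     // loop guards against spurious wakeups
                wait();                   // releases the lock and the CPU
            }
            return queue.poll();
        }
    }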

For other situations where CPU us is too high because of too many loop iterations, regular expressions, or heavy computation, tuning needs to be combined with the business logic.

For frequent GC, you need to reduce the number of GC runs through JVM tuning or program tuning.

Solution for high CPU sy:

The main reason for high CPU sy is that threads switch run states frequently; in this situation, the common optimization is to reduce the number of threads.

Tuning approach: reduce the number of threads.

This tuning may cause CPU us to be too high, so it is critical to set the number of threads reasonably.
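
A minimal sketch of capping the thread count with a fixed-size pool instead of starting one thread per task; the pool size and the task body are placeholders to be tuned against the real workload:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class BoundedWorkers {
        public static void main(String[] args) {
            // A bounded pool keeps the number of runnable threads small,
            // which reduces context switching and therefore CPU sy.
            ExecutorService pool = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors() * 2);
            for (int i = 0; i < 1000; i++) {
                final int taskId = i;
                pool.submit(() -> doWork(taskId));
            }
            pool.shutdown();
        }

        private static void doWork(int taskId) {
            // placeholder for the real task
        }
    }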

For distributed Java applications, a typical situation is that the application performs many network IO operations and genuinely needs some contended locking (such as a database connection pool). To support higher concurrency, coroutines can be used instead, avoiding the sharp rise in CPU sy consumption, rapidly increasing system load, and falling performance that adding more concurrent threads would cause.

Kilim is a framework that implements coroutines in Java. With Kilim you create Tasks and use the Task's pause mechanism instead of Thread; Kilim takes over thread scheduling and context switching, and a Task is much lighter than a native thread, so the CPU is used more effectively. Kilim raises thread utilization, but it also consumes more memory, because task context information is kept on the JVM heap. (There is also a coroutine implementation based on JDK 7, and JVM-based Scala Actors can be used from Java as well.)

Solution for heavy file IO consumption

From the program's point of view, the main reason for heavy file IO consumption is that multiple threads write large amounts of data to the same file, so the file grows quickly, writing becomes slower and slower, and threads contend for the file lock.

Common tuning methods (a sketch of asynchronous, batched writing follows the list):

Write Files asynchronously

Bulk Read/write

Rate limiting

Limit file Size
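
A minimal sketch combining the first two points above: worker threads only enqueue lines, and a single writer thread drains the queue and writes in batches. The file name, queue capacity, and batch size are illustrative:

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class AsyncBatchWriter {
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);

        public AsyncBatchWriter(String file) {
            Thread writer = new Thread(() -> drainLoop(file), "log-writer");
            writer.setDaemon(true);
            writer.start();
        }

        // Callers never touch the file directly; they only enqueue.
        public void write(String line) throws InterruptedException {
            queue.put(line);
        }

        private void drainLoop(String file) {
            try (BufferedWriter out = Files.newBufferedWriter(Paths.get(file),
                    StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
                List<String> batch = new ArrayList<>();
                while (true) {
                    batch.add(queue.take());      // block until at least one line is available
                    queue.drainTo(batch, 999);    // then grab up to a full batch
                    for (String line : batch) {
                        out.write(line);
                        out.newLine();
                    }
                    out.flush();                  // one flush per batch, not per line
                    batch.clear();
                }
            } catch (IOException | InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }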

Solutions for heavy memory consumption

Release unnecessary references: the code holds references to objects that are no longer needed, so the objects cannot be garbage collected and keep occupying JVM heap memory. (When using ThreadLocal, note that once the thread's work is complete you should call ThreadLocal.set(null) or ThreadLocal.remove() to clear the value and avoid holding an unnecessary object reference.)
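
A minimal sketch of clearing a ThreadLocal after use, which matters especially in thread pools where threads are reused; the context class and methods are hypothetical:

    public class RequestContextHolder {
        private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

        public static void handle(String requestId) {
            CONTEXT.set(requestId);
            try {
                process();
            } finally {
                CONTEXT.remove();   // drop the reference so the value can be collected
            }
        }

        private static void process() {
            // placeholder for work that reads CONTEXT.get()
        }
    }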

Use an object cache pool: creating objects consumes a certain amount of CPU and memory, and using an object cache pool can reduce JVM heap usage to some extent.

Use a reasonable cache eviction algorithm: putting too many objects into the cache pool causes heavy memory consumption, and because the cache keeps references to those objects it also increases full GCs. Control the cache size sensibly and avoid letting it grow without bound. (Classic eviction algorithms for clearing objects from a cache: FIFO, LRU, LFU, etc.)
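
As an illustration, a size-bounded LRU cache can be sketched with LinkedHashMap's access-order mode; the capacity is a placeholder:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // The eldest (least recently accessed) entry is evicted once the capacity
    // is exceeded, so the cache cannot grow without bound.
    public class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        public LruCache(int capacity) {
            super(16, 0.75f, true);   // accessOrder = true gives LRU ordering
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }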

Use SoftReference and WeakReference appropriately: objects referenced only by a SoftReference are reclaimed when memory runs short, while objects referenced only by a WeakReference are reclaimed at the next GC.
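
A minimal sketch of wrapping a cached value in a SoftReference so the collector may reclaim it under memory pressure; the payload and loading method are hypothetical:

    import java.lang.ref.SoftReference;

    public class SoftCachedBlob {
        private SoftReference<byte[]> cached;

        public byte[] get() {
            byte[] data = (cached != null) ? cached.get() : null;
            if (data == null) {
                data = load();                       // reload if the GC cleared it
                cached = new SoftReference<>(data);  // reclaimable when memory is tight
            }
            return data;
        }

        private byte[] load() {
            return new byte[1024];   // placeholder for the real loading logic
        }
    }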

Solutions for cases where resource consumption is low but the program still runs slowly

Reduce lock contention: with more threads, lock contention becomes more visible; threads easily end up waiting for locks, which degrades performance and raises CPU sy.

Use the classes in the java.util.concurrent package: most of them use lock-free and non-blocking algorithms.

Use the Treiber algorithm: based on CAS and AtomicReference (see the sketch after this group of items).

Use the Michael-Scott non-blocking queue algorithm: based on CAS and AtomicReference; ConcurrentLinkedQueue is the typical example.

(Implementing non-blocking structures based on CAS and AtomicReference is a good choice, but note that lock-free algorithms must keep retrying comparisons to guarantee consistency of the resource, so in scenarios with heavy contention they bring higher CPU consumption; CAS-based non-blocking code therefore does not always outperform locking. There are also improvements on non-blocking algorithms such as MCAS and WSTM.)
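
A minimal sketch of a Treiber stack, as an illustration of the CAS-plus-AtomicReference approach described above; it is a teaching sketch, not a production implementation:

    import java.util.concurrent.atomic.AtomicReference;

    // Treiber stack: push and pop retry with compareAndSet instead of taking a lock.
    public class TreiberStack<T> {
        private static final class Node<T> {
            final T value;
            Node<T> next;
            Node(T value) { this.value = value; }
        }

        private final AtomicReference<Node<T>> top = new AtomicReference<>();

        public void push(T value) {
            Node<T> newHead = new Node<>(value);
            Node<T> oldHead;
            do {
                oldHead = top.get();
                newHead.next = oldHead;
            } while (!top.compareAndSet(oldHead, newHead));   // retry on contention
        }

        public T pop() {
            Node<T> oldHead;
            Node<T> newHead;
            do {
                oldHead = top.get();
                if (oldHead == null) {
                    return null;                              // empty stack
                }
                newHead = oldHead.next;
            } while (!top.compareAndSet(oldHead, newHead));
            return oldHead.value;
        }
    }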

Use as few locks as possible: lock only the resources that actually need protection. There is usually no need to lock an entire method; minimize the scope of the lock, lock only the mutually exclusive and atomic operations, and protect the smallest possible granularity of resources, for example by locking only the resource that needs protecting rather than this.
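
As an illustration of minimizing lock scope, the sketch below synchronizes on a private lock object around only the shared state instead of declaring the whole method synchronized; the names are hypothetical:

    public class HitCounter {
        private final Object lock = new Object();   // private lock instead of "this"
        private long hits;

        public void record(String requestId) {
            String normalized = requestId.trim();   // local work needs no lock

            synchronized (lock) {                   // lock only the shared mutable state
                hits++;
            }

            log(normalized);                        // back outside the lock
        }

        private void log(String id) {
            // placeholder for work on non-shared state
        }
    }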

Split locks: split an exclusive lock into several locks (splitting into read and write locks, or striping as ConcurrentHashMap does, which by default splits into 16 locks). This can improve read and write performance considerably, but note that after splitting, operations that are global in nature become more complicated (such as the size operation in ConcurrentHashMap). (Splitting into too many locks can also have side effects, such as a noticeable rise in CPU consumption.)
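
A minimal sketch of the read/write split using ReentrantReadWriteLock, so that readers no longer block each other; the wrapped map is illustrative:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class ReadMostlyRegistry {
        private final Map<String, String> data = new HashMap<>();
        private final ReadWriteLock lock = new ReentrantReadWriteLock();

        public String get(String key) {
            lock.readLock().lock();          // many readers can hold the read lock at once
            try {
                return data.get(key);
            } finally {
                lock.readLock().unlock();
            }
        }

        public void put(String key, String value) {
            lock.writeLock().lock();         // writers are exclusive
            try {
                data.put(key, value);
            } finally {
                lock.writeLock().unlock();
            }
        }
    }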

Remove the mutual exclusion between read and write operations: lock only when modifying, make the modification on a copy of the object, and switch the object reference after the modification is complete, so that reads need no lock. This is the typical copy-on-write implementation; its benefit is a significant improvement in read performance. It suits scenarios with many reads and few writes, but because every write copies the object, it consumes more memory.
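
A minimal sketch of this copy-on-write pattern using a volatile reference swap (CopyOnWriteArrayList in java.util.concurrent follows the same idea); the list contents are illustrative:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class CopyOnWriteConfig {
        // Readers see a consistent, immutable snapshot through this volatile reference.
        private volatile List<String> entries = Collections.emptyList();

        public List<String> read() {
            return entries;                          // no lock on the read path
        }

        public synchronized void add(String entry) { // writers are serialized
            List<String> copy = new ArrayList<>(entries);
            copy.add(entry);
            entries = Collections.unmodifiableList(copy);  // atomic reference switch
        }
    }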


This article is from the "Linux" blog; please keep the source: http://syklinux.blog.51cto.com/9631548/1942316
