I. JVM performance tuning strategy
II. Performance tuning
1. Java thread pool (java.util.concurrent.ThreadPoolExecutor)
Most applications on JVM 6 use the thread pool that ships with the JDK. This mature thread pool is worth describing in some detail because its behavior differs somewhat from what one might expect. It has several important configuration parameters:
- corePoolSize: the number of core threads (the threads kept in the pool even when they are idle)
- maximumPoolSize: the maximum number of threads. Once the pool has reached this number and the queue cannot accept more tasks, new tasks are rejected; the handling can be customized through the RejectedExecutionHandler interface
- keepAliveTime: how long an idle thread beyond the core size stays alive before it is terminated
- workQueue: the work queue, which holds tasks waiting to be executed
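As a minimal sketch, the four parameters above map onto the ThreadPoolExecutor constructor as follows. The class name PoolParameters and all the numbers are assumptions chosen for illustration, not recommended values.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolParameters {
    // All sizes below are illustrative assumptions, not recommendations.
    static ThreadPoolExecutor newPool() {
        return new ThreadPoolExecutor(
                4,                                       // corePoolSize
                16,                                      // maximumPoolSize
                60L, TimeUnit.SECONDS,                   // keepAliveTime and its unit
                new ArrayBlockingQueue<Runnable>(100),   // workQueue
                new ThreadPoolExecutor.AbortPolicy());   // RejectedExecutionHandler (the JDK default policy)
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newPool();
        System.out.println("core=" + pool.getCorePoolSize()
                + " max=" + pool.getMaximumPoolSize()
                + " keepAlive=" + pool.getKeepAliveTime(TimeUnit.SECONDS) + "s");
        pool.shutdown();
    }
}
```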
The thread pool takes a queue parameter (workQueue) that holds tasks awaiting execution, and the pool behaves completely differently depending on which queue is chosen:
- SynchronousQueue: a hand-off queue with no capacity; an insert by one thread must wait for a remove by another. With this queue the pool starts a new thread for each task that no idle thread is ready to take, up to maximumPoolSize
- LinkedBlockingQueue: an unbounded queue (when created without a capacity). With this queue the pool ignores the maximumPoolSize parameter and handles all tasks with at most corePoolSize threads; tasks that cannot run yet wait in the LinkedBlockingQueue
- ArrayBlockingQueue: a bounded queue. With a bounded queue and maximumPoolSize both in play, the program is hard to tune: a large queue with a small maximumPoolSize leads to low CPU load, while a small queue with a large pool means the queue does not do its job.
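The effect of the queue choice can be observed with a small experiment. The sketch below, which assumes Java 8 or later for the lambda syntax, submits six blocked tasks to a pool with corePoolSize 2 and maximumPoolSize 8 and reports how many threads the pool created. QueueChoiceDemo and threadsCreated are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class QueueChoiceDemo {
    // Submits `tasks` blocked tasks to a pool with corePoolSize=2 and
    // maximumPoolSize=8, and returns how many threads the pool created.
    static int threadsCreated(BlockingQueue<Runnable> queue, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(2, 8, 60L, TimeUnit.SECONDS, queue);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await();   // keep the worker busy until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown();   // let all tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("SynchronousQueue:       "
                + threadsCreated(new SynchronousQueue<Runnable>(), 6));
        System.out.println("LinkedBlockingQueue:    "
                + threadsCreated(new LinkedBlockingQueue<Runnable>(), 6));
        System.out.println("ArrayBlockingQueue(2):  "
                + threadsCreated(new ArrayBlockingQueue<Runnable>(2), 6));
    }
}
```

With these settings the pool should create 6 threads for SynchronousQueue (one per task), 2 for LinkedBlockingQueue (maximumPoolSize never comes into play), and 4 for ArrayBlockingQueue(2) (two core threads, two queued tasks, two extra threads).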
Our requirement is actually simple: we want the thread pool to behave like a connection pool, with a configurable minimum and maximum number of threads. When minimum < tasks < maximum, new threads should be allocated; when tasks > maximum, the task should wait for an idle thread to process it.
However, the thread pool is designed differently. Once corePoolSize threads are running, tasks are placed in the queue first; only when the queue cannot hold more does the pool consider starting a new thread; and if the queue is full and no new thread can be created, the task is rejected. The result is "queue first and wait for execution", "start a new thread only when the queue is full", and "reject rather than wait". Therefore, depending on the queue, throughput cannot be raised simply by increasing maximumPoolSize.
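The submission order described above can be traced step by step. The following sketch assumes Java 8 or later and uses deliberately tiny, made-up sizes (corePoolSize 1, maximumPoolSize 2, a queue of capacity 1) so that four blocked tasks exercise every branch. SubmissionOrderDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SubmissionOrderDemo {
    // Submits four blocked tasks and records what the pool did with each one.
    static List<String> trace() {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 60L, TimeUnit.SECONDS, new ArrayBlockingQueue<Runnable>(1));
        List<String> log = new ArrayList<String>();
        try {
            for (int i = 1; i <= 4; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await();   // stay busy until released
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                    log.add("task" + i + ": threads=" + pool.getPoolSize()
                            + " queued=" + pool.getQueue().size());
                } catch (RejectedExecutionException e) {
                    // Default AbortPolicy: queue full and maximumPoolSize reached.
                    log.add("task" + i + ": rejected");
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return log;
    }

    public static void main(String[] args) {
        for (String line : trace()) {
            System.out.println(line);
        }
    }
}
```

The first task starts the core thread, the second waits in the queue, the third forces a second thread because the queue is full, and the fourth is rejected rather than made to wait.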
To get the behavior we want, we have to wrap the thread pool. Fortunately, ThreadPoolExecutor exposes enough extension points to make this possible. Our wrapper does two things. First, it uses SynchronousQueue as the work queue, so that maximumPoolSize takes effect and caps the number of threads, and the throughput of the system can be raised by increasing maximumPoolSize. Second, it installs a custom RejectedExecutionHandler: when the number of threads has reached maximumPoolSize, the handler checks at intervals whether the pool can accept a new task and, if so, puts the rejected task back into the pool. The check interval depends on keepAliveTime.
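One possible shape for such a wrapper is sketched below; it is not the original implementation. It assumes Java 8 or later, and the names WaitingPool and RetryHandler, as well as the choice of one tenth of keepAliveTime as the retry interval, are assumptions. The handler retries with a timed offer on the SynchronousQueue, which succeeds only when an idle worker is ready to take the task, so the submitting thread waits instead of getting an exception.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class WaitingPool {
    // Hypothetical handler: the submitting thread retries at a fixed interval
    // until an idle worker accepts the task or the pool is shut down.
    static class RetryHandler implements RejectedExecutionHandler {
        private final long intervalMillis;

        RetryHandler(long intervalMillis) {
            this.intervalMillis = intervalMillis;
        }

        @Override
        public void rejectedExecution(Runnable task, ThreadPoolExecutor pool) {
            try {
                while (!pool.isShutdown()) {
                    // On a SynchronousQueue a timed offer succeeds only when
                    // an idle worker is waiting to receive a task.
                    if (pool.getQueue().offer(task, intervalMillis, TimeUnit.MILLISECONDS)) {
                        return;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            throw new RejectedExecutionException("pool shut down or submitter interrupted");
        }
    }

    // Assumption: the retry interval is one tenth of keepAliveTime (at least 1 ms).
    static ThreadPoolExecutor create(int min, int max, long keepAliveMillis) {
        return new ThreadPoolExecutor(min, max, keepAliveMillis, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>(),
                new RetryHandler(Math.max(1, keepAliveMillis / 10)));
    }

    // Submits `tasks` short tasks, waits for them, and returns how many completed.
    static int runTasks(ThreadPoolExecutor pool, int tasks) {
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                done.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create(1, 2, 1000);
        System.out.println("completed=" + runTasks(pool, 5)
                + " largestPoolSize=" + pool.getLargestPoolSize());
    }
}
```

A design consequence worth noting: because the handler blocks the caller, submission applies back-pressure to whoever produces the tasks, which may or may not be acceptable for a given application.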
2. JVM Parameters
The JVM startup parameters include settings for memory and garbage collection. With no settings at all the JVM works well by default, but on well-equipped servers and for specific applications careful tuning is needed to get the best performance. Tuning aims at the following goals:
- Each GC takes little enough time
- The number of GCs is low enough
- The interval between full GCs is long enough
The first two goals conflict: short GC times call for a smaller heap, while few GCs call for a larger heap, so we can only strike a balance between them.
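To judge progress against these goals, the number of collections and the time spent in them can be read from inside the running JVM through the standard java.lang.management API. This is a minimal sketch; GcStats is a hypothetical name, and the collector names it prints depend on which collectors the JVM is using.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {
    // Returns {total collections, total collection time in ms} since JVM start.
    static long[] totals() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            count += Math.max(0, gc.getCollectionCount());
            millis += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, millis};
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        long[] t = totals();
        System.out.println("total collections=" + t[0] + " total timeMs=" + t[1]);
    }
}
```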
(1) For the JVM heap, the minimum and maximum sizes are generally set with -Xms and -Xmx. To avoid the extra time the garbage collector spends resizing the heap between the minimum and the maximum, we usually set both to the same value.
(2) The young generation and the old generation share the heap according to a default ratio (1:2). Their sizes can be adjusted either through the ratio between them, -XX:NewRatio, or by setting an absolute size for a specific generation, for example the young generation through -XX:NewSize and -XX:MaxNewSize. Likewise, to prevent the young generation from being resized, we usually set -XX:NewSize and -XX:MaxNewSize to the same value.
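The arithmetic behind the ratio can be made concrete. -XX:NewRatio=N sets old:young to N:1, so the young generation gets heap / (N + 1). The sketch below assumes a hypothetical command line such as java -Xms3g -Xmx3g -XX:NewRatio=2; the class name GenerationSizes is also an assumption, and a real JVM may round the resulting sizes for alignment.

```java
class GenerationSizes {
    // -XX:NewRatio=N means old:young = N:1, so young = heap / (N + 1).
    static long youngMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    static long oldMb(long heapMb, int newRatio) {
        return heapMb - youngMb(heapMb, newRatio);
    }

    public static void main(String[] args) {
        long heapMb = 3072;   // assumed -Xms3g -Xmx3g
        for (int ratio : new int[] {2, 1}) {
            System.out.println("NewRatio=" + ratio
                    + " young=" + youngMb(heapMb, ratio) + "M"
                    + " old=" + oldMb(heapMb, ratio) + "M");
        }
    }
}
```

For a 3072M heap this gives 1024M young and 2048M old at the default 1:2 split, and 1536M each at 1:1.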
(3) How large should the young and old generations be? There is no single answer to this question; otherwise there would be no need for tuning. Consider the effects of changing the sizes:
- A larger young generation necessarily means a smaller old generation. A large young generation lengthens the interval between ordinary GCs but increases the time of each one; a small old generation leads to more frequent full GCs
- A smaller young generation necessarily means a larger old generation. A small young generation leads to frequent ordinary GCs, each of which takes less time; a large old generation reduces the frequency of full GCs
- The right split depends on the life cycle of the application's objects: if the application creates a large number of temporary objects, choose a larger young generation; if it has relatively many long-lived objects, enlarge the old generation accordingly. Many applications have no such obvious profile, however, and the choice should then rest on two points: (A) keep full GCs as rare as possible by letting the old generation hold the commonly used objects, which is also the reason for the JVM's default ratio of 1:2; (B) observe the application for a while to see how much memory the old generation occupies at peak and then, without affecting full GC, enlarge the young generation as the situation allows, for example up to a 1:1 ratio. The old generation should still keep at least 1/3 of its space free for growth.
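The observation step in (B) can be sketched with the standard memory pool MXBeans, which report the peak usage of each pool. The sketch rests on the assumption that the old generation pool has a name containing "Old Gen" or "Tenured"; pool names vary by collector and JVM version. OldGenHeadroom and hasHeadroom are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

class OldGenHeadroom {
    // True when at least one third of the old generation is still free at peak.
    static boolean hasHeadroom(long peakUsedBytes, long maxBytes) {
        return maxBytes > 0 && (maxBytes - peakUsedBytes) * 3 >= maxBytes;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Assumption: old generation pools are named "... Old Gen" or "Tenured Gen".
            boolean oldGen = name.contains("Old Gen") || name.contains("Tenured");
            if (pool.getType() != MemoryType.HEAP || !oldGen) {
                continue;
            }
            MemoryUsage peak = pool.getPeakUsage();
            if (peak == null) {
                continue;   // pool is no longer valid
            }
            // getMax() returns -1 when the maximum is undefined.
            System.out.println(name + ": peakUsed=" + peak.getUsed()
                    + " max=" + peak.getMax()
                    + " oneThirdFree=" + hasHeadroom(peak.getUsed(), peak.getMax()));
        }
    }
}
```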
(4) On a well-equipped machine (for example, multicore with large memory), the parallel collection algorithm can be chosen for the old generation with -XX:+UseParallelOldGC; the default is serial collection.
(5) Thread stack settings: each thread gets a 1M stack by default, which holds stack frames, call parameters, local variables, and so on. For most applications this default is too large; 256K is generally sufficient (the size is set with -Xss). In theory, with a fixed amount of memory, a smaller stack per thread allows more threads to be created, but in practice the number of threads is also limited by the operating system.
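Besides the JVM-wide setting, a stack size can be requested for an individual thread through the four-argument Thread constructor. The sketch below assumes Java 8 or later; StackSizeDemo and runWithStack are hypothetical names, and the stack size argument is only a hint that some platforms may round or ignore. As a rough calculation, 1000 threads reserve about 1000M of stack at 1M each but about 250M at 256K each.

```java
class StackSizeDemo {
    // Simple recursion that uses one stack frame per level.
    static int recurse(int remaining) {
        return remaining == 0 ? 0 : 1 + recurse(remaining - 1);
    }

    // Runs the recursion on a thread that requests the given stack size
    // and returns the depth that was reached.
    static int runWithStack(long stackBytes, int depth) {
        int[] result = new int[1];
        Thread t = new Thread(null, () -> result[0] = recurse(depth), "small-stack", stackBytes);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }

    public static void main(String[] args) {
        long stackBytes = 256 * 1024;   // the 256K suggested in the text
        System.out.println("depth reached: " + runWithStack(stackBytes, 1000));
    }
}
```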