JVM Performance Tuning

Recently, a memory leak in one of our projects led to a round of large-scale JVM performance tuning. This article records the experience.

I. JVM Memory Model and Garbage Collection Algorithms

1. According to the Java Virtual Machine specification, the JVM divides memory into:

  • New (young generation)
  • Tenured (old generation)
  • Perm (permanent generation)

Among them, New and Tenured make up the heap. Heap memory is allocated out of the space set by the JVM startup parameters (e.g. -Xmx3g for the maximum heap size). Perm is not heap memory and is allocated directly by the virtual machine, but its size can be adjusted with parameters such as -XX:PermSize and -XX:MaxPermSize.

  • Young generation (New): stores Java objects just allocated by the JVM.
  • Tenured (old generation): objects in the young generation that survive garbage collection are copied (promoted) into the old generation.
  • Permanent generation (Perm): stores class and method metadata. Its size depends on the project, i.e. the number of classes and methods; a setting of 128 MB is generally enough, and the principle is to reserve about 30% headroom.

The young generation (New) is further divided into several parts:

  • Eden: stores objects just allocated by the JVM.
  • Survivor 1
  • Survivor 2: the two Survivor spaces have the same size. Objects in Eden that survive a garbage collection are copied back and forth between the two Survivor spaces; once a condition is met (for example, the number of times an object has been copied), the object is promoted to Tenured. The Survivor spaces therefore only extend an object's stay in the young generation, increasing the chance that it is collected there.
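These regions show up as memory pools that can be inspected at runtime through the standard java.lang.management API. A small sketch (the class name MemoryPools is ours; pool names are collector-dependent, e.g. "PS Eden Space" under the parallel collector, "G1 Eden Space" under G1):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

public class MemoryPools {
    /** Names of the memory pools the running JVM reports (collector-dependent). */
    public static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Print each pool with its maximum size so the Eden/Survivor/Tenured/Perm
        // layout of the current JVM can be seen directly.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.println(pool.getName() + " max=" + pool.getUsage().getMax());
        }
    }
}
```

Running this under different -XX:+Use...GC flags shows how the pool layout changes with the chosen collector.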

2. Garbage Collection Algorithms

Garbage collection algorithms fall into three types, all based on the mark-sweep (mark-copy) approach:

  • Serial algorithm (single-threaded)
  • Parallel algorithm
  • Concurrent algorithm

The JVM selects a suitable collection algorithm for each generation based on the machine's hardware configuration. For example, on a machine with more than one core it selects the parallel algorithm for the young generation. For details about the selection, refer to the JVM tuning documentation.

A note on terminology: a parallel algorithm uses multiple threads for garbage collection and pauses the application during collection, while a concurrent algorithm also uses multiple threads but does not stop the application while collecting. Concurrent algorithms therefore suit highly interactive programs. In our observation, the concurrent algorithm shrinks the young generation and effectively relies on a large old generation, which in turn gives somewhat lower throughput than the parallel algorithm.

Another question: when is garbage collection performed?

  • When the young generation is full, an ordinary (minor) GC is triggered, which only collects the young generation. "Full" here means Eden is full; a full Survivor space does not trigger GC.
  • When the old generation is full, a full GC is triggered, which collects the young and old generations at the same time.
  • When the permanent generation is full, a full GC is also triggered, which unloads class and method metadata.

Another question is when an OutOfMemoryError is thrown. It is not thrown the moment memory is exhausted, but when:

  • the JVM spends more than 98% of its time in garbage collection, and
  • each collection recovers less than 2% of the heap.

When both conditions are met, an OutOfMemoryError is thrown. This leaves the system a small window for last-resort actions, such as manually printing a heap dump.

II. Memory Leakage and Solutions

1. Symptoms before the system crashed:

  • The time for an ordinary garbage collection grew from 10 ms to 50 ms, and full GC time grew from 0.5 s to 4-5 s.
  • Full GC occurred more and more often, eventually less than one minute apart.
  • The old generation's memory usage grew larger and larger, and each full GC released no memory in the old generation.

The system then became unable to respond to new requests and gradually approached the OutOfMemoryError threshold.
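The growing GC times and counts described above can be tracked in-process via the standard GarbageCollectorMXBean; a minimal sketch (the class name GcStats is ours, and the collector names printed depend on the algorithms the JVM selected):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    public static void main(String[] args) {
        // Cumulative GC counts and times since JVM start; sampling these
        // periodically reveals the "more and more frequent full GC" pattern.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + " time=" + gc.getCollectionTime() + "ms");
        }
    }
}
```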

2. Generate the heap dump file

We used a JMX MBean to dump the current heap: a 3 GB hprof file (the size of the entire heap). If JMX is not enabled, the file can also be generated with the JDK's jmap command.
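Besides jmap, the dump can also be triggered from code through HotSpot's diagnostic MXBean; a sketch (the com.sun.management API is HotSpot-specific, and the class name HeapDumper is our own):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {
    /** Write an hprof snapshot of the current heap to the given path. */
    public static void dump(String path) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            // true = dump only live (reachable) objects, like jmap -dump:live
            bean.dumpHeap(path, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length > 0) {
            dump(args[0]);
        } else {
            System.out.println("usage: java HeapDumper <file.hprof>");
        }
    }
}
```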

3. Analyze the dump file

The next problem was how to open a 3 GB heap dump. Obviously an ordinary Windows machine does not have that much memory; a well-configured Linux box was required. (Of course, X-Window can be used to display the Linux tools on a Windows desktop.) We tried the following tools:

1. Visual VM
2. IBM HeapAnalyzer
3. JDK hprof tool

To ensure loading speed, we recommend giving the tool 6 GB of maximum memory. In practice, none of these tools could show the memory leak intuitively: Visual VM could show object sizes but not call stacks, while HeapAnalyzer could show call stacks but could not open a 3 GB file correctly. We therefore chose the Eclipse-based static memory analysis tool MAT (Memory Analyzer Tool).

4. Analyze the memory leak

With MAT we could clearly see which objects were suspected memory leaks, which objects occupied the most space, and the reference relationships between objects. In our case there were many JbpmContext instances held in ThreadLocals; investigation showed that the jBPM context was not being closed.

In addition, through MAT or JMX we could analyze thread state and observe which objects threads were blocked on, to locate system bottlenecks.

5. Revisiting the questions

Q: Why did garbage collection take longer and longer before the crash?

A: Per the memory model and garbage collection algorithms, collection has two parts: marking and sweeping (copying). The marking part takes roughly constant time as long as the heap size is unchanged. The copying part changes: each collection leaves some memory that cannot be reclaimed, so the amount to copy grows, and the time with it. Garbage collection time can therefore itself serve as evidence of a memory leak.

Q: Why did full GC occur more and more often?

A: The accumulating objects gradually consumed the old generation's memory, leaving no space to allocate for newly promoted objects and causing frequent full GC.

Q: Why did the memory occupied by the old generation keep growing?

A: Because memory in the young generation could not be reclaimed, more and more of it was copied into the old generation.

III. Performance Tuning

Besides the memory leak above, we also found that CPU usage stayed below 3% for long periods and system throughput was insufficient, a serious waste of resources on an 8-core x 16 GB, 64-bit Linux server.

Even while CPU load was low, users sometimes still reported long response times. We realized that both the program and the JVM had to be tuned, in the following areas:

  • Thread pool: solve long user response times
  • Connection pool
  • JVM startup parameters: adjust the memory ratio of each generation and the garbage collection algorithm to improve throughput
  • Program algorithms: improve program logic to improve performance

       

1. Java thread pool (java.util.concurrent.ThreadPoolExecutor)

Most applications on JVM 6 use the JDK's built-in thread pool. We describe this mature thread pool here because its behavior differs a little from what one might expect. It has several important configuration parameters:

  • corePoolSize: the number of core threads (the minimum number of threads kept)
  • maximumPoolSize: the maximum number of threads; tasks beyond this are rejected, and the handling can be customized through the RejectedExecutionHandler interface
  • keepAliveTime: how long an idle thread stays alive
  • workQueue: the work queue holding tasks waiting to be executed

The thread pool takes a queue parameter (workQueue) to hold tasks waiting for execution, and its behavior differs completely depending on the queue chosen:

  • SynchronousQueue: a queue with no capacity; an insert by one thread must wait for a remove by another thread. With this queue, the pool allocates a new thread for each task (up to maximumPoolSize).

  • LinkedBlockingQueue: an unbounded queue. The thread pool ignores the maximumPoolSize parameter and uses only corePoolSize threads to process all tasks; unprocessed tasks pile up in the BlockingQueue.

  • ArrayBlockingQueue: a bounded queue. Under the combined influence of the bounded queue and maximumPoolSize, the pool is difficult to tune: a larger queue with a smaller maximumPoolSize leads to low CPU load, while a small queue with a large pool means the queue never plays its role.

Our requirement is actually very simple: we want the thread pool to work like a connection pool, with a minimum and a maximum number of threads. When minimum < tasks < maximum, new threads should be allocated for processing; when tasks exceed the maximum, tasks should wait for an idle thread.

The thread pool, however, is designed so that tasks go to the queue first; only when the queue cannot accept them are new threads created, and if the queue is full and no more threads can be created, the task is rejected. With the available queues the design gives either "queue and wait" or "reject", but never "wait for an idle thread". Therefore, you cannot raise throughput simply by increasing maximumPoolSize.

To achieve our goal, we had to wrap the thread pool. Fortunately, ThreadPoolExecutor exposes enough customization points. Our wrapper does the following:

  • Use SynchronousQueue as the work queue, so that maximumPoolSize bounds thread allocation; throughput can then be raised by increasing maximumPoolSize.
  • Install a custom RejectedExecutionHandler to handle tasks that arrive when all maximumPoolSize threads are busy: it checks at intervals whether the pool can accept the task again and, if so, resubmits the rejected task; the check interval depends on keepAliveTime.
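A minimal sketch of such a wrapper (the class name ScalingPool is ours, and for brevity the handler blocks handing the task back to an idle worker rather than re-checking at keepAliveTime intervals as described above):

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ScalingPool {
    /**
     * A pool that behaves like a connection pool: grow from core to max
     * threads first, then make callers wait instead of failing when all
     * max threads are busy.
     */
    public static ThreadPoolExecutor create(int core, int max, final long waitSeconds) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 60L, TimeUnit.SECONDS,
                // zero-capacity queue: every submit either goes straight to an
                // idle worker or forces a new thread, up to max
                new SynchronousQueue<Runnable>());
        pool.setRejectedExecutionHandler((r, executor) -> {
            try {
                // all max threads busy: block until an idle worker takes the
                // task off the SynchronousQueue, instead of rejecting it
                if (!executor.getQueue().offer(r, waitSeconds, TimeUnit.SECONDS)) {
                    throw new RejectedExecutionException("pool saturated");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RejectedExecutionException(e);
            }
        });
        return pool;
    }
}
```

With SynchronousQueue the pool grows from corePoolSize toward maximumPoolSize before anything waits, which is the connection-pool-like behavior described above.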

2. Connection pool (org.apache.commons.dbcp.BasicDataSource)

Because we were using org.apache.commons.dbcp.BasicDataSource with its default configuration, under heavy traffic JMX showed many Tomcat threads blocked on the lock of the Apache ObjectPool used by BasicDataSource. The direct cause was that the connection pool's maximum number of connections was too small: the default BasicDataSource configuration allows only 8.

I also observed another problem: when the system went unused for a long period, say two days, MySQL on the DB server would drop the connections, so the connections cached in the pool became unusable. To solve these problems, we studied BasicDataSource thoroughly and found several tuning points:

  • MySQL supports 100 connections by default, so each connection pool's size depends on the number of machines in the cluster; with 2 servers, each pool can be set to 60.
  • initialSize: the number of connections opened up front.
  • minEvictableIdleTimeMillis: the idle time allowed per connection; a connection idle longer than this is closed.
  • timeBetweenEvictionRunsMillis: the run period of the background thread that detects expired connections.
  • maxActive: the maximum number of connections that can be allocated.
  • maxIdle: the maximum number of idle connections; idle connections beyond maxIdle are closed directly. Only connections between initialSize and maxIdle are periodically checked for expiry. This parameter mainly absorbs traffic peaks.
  • How is initialSize maintained? Reading the code shows that BasicDataSource closes all expired connections and then opens connections back up to initialSize. Together with minEvictableIdleTimeMillis and timeBetweenEvictionRunsMillis, this ensures that all initialSize connections are reconnected after they expire, which avoids MySQL's disconnection problem after long idle periods.
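Putting the parameters together, a BasicDataSource configuration fragment might look like this (only the 60-connection maximum comes from the sizing above; the other values are illustrative placeholders, not recommendations):

```properties
# commons-dbcp BasicDataSource settings (DBCP 1.x property names)
initialSize=10
maxActive=60
maxIdle=20
# close connections idle longer than 5 minutes...
minEvictableIdleTimeMillis=300000
# ...checking for them once a minute
timeBetweenEvictionRunsMillis=60000
```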

3. JVM Parameters

JVM startup parameters control memory and garbage collection settings. By default, the JVM works well without any tuning, but on well-configured servers running specific applications, careful tuning is needed for optimal performance. We hoped to achieve these goals:

  • GC pauses short enough
  • GCs infrequent enough
  • the interval between full GCs long enough

The first two goals conflict: shorter GC time requires a smaller heap, while fewer GCs require a larger heap. We can only balance them.

(1) Heap size is set with -Xms (minimum) and -Xmx (maximum). To prevent the garbage collector from spending extra time shrinking and growing the heap between the two values, we usually set them to the same value.

(2) The young and old generations share the heap at a default ratio. The ratio can be adjusted with NewRatio, or the young generation can be given an absolute size with -XX:NewSize and -XX:MaxNewSize. As with the heap, to prevent the young generation from shrinking, we usually set -XX:NewSize and -XX:MaxNewSize to the same value.

(3) How large should the young and old generations be? There is no single answer to this question; otherwise there would be no tuning. Consider the effects of changing their sizes:

  • A larger young generation necessarily means a smaller old generation. A larger young generation lengthens the interval between ordinary GCs but lengthens each GC pause; the smaller old generation leads to more frequent full GC.
  • A smaller young generation necessarily means a larger old generation. A smaller young generation causes frequent ordinary GCs, each shorter; the larger old generation reduces full GC frequency.
  • How to choose depends on the distribution of application object lifetimes: with many temporary objects, choose a larger young generation; with comparatively many long-lived objects, the old generation should grow appropriately. Many applications lack such obvious features, though, so decide by two points: (a) following the principle of minimizing full GC, let the old generation cache long-lived objects as much as possible (the JVM's default ratio reflects this too); (b) by observing the application for a while, find the peak old-generation memory usage, then enlarge the young generation as the actual numbers allow without triggering full GC, while reserving at least one third of growth room for the old generation.

(4) On a well-configured machine (multi-core, large memory), you can select the parallel collection algorithm for the old generation with -XX:+UseParallelOldGC; the default is serial collection.

(5) Thread stack settings: each thread gets a 1 MB stack by default, used to store stack frames, call parameters, and local variables. For most applications this default is too large; 256 KB is generally enough. Theoretically, with total memory unchanged, a smaller per-thread stack allows more threads, but in practice this is also limited by the operating system.

(6) The following parameters control heap dump and GC logging output:

  • -XX:HeapDumpPath
  • -XX:+PrintGCDetails
  • -XX:+PrintGCTimeStamps
  • -Xloggc:/usr/AAA/dump/heap_trace.txt

The following parameter dumps the heap when an OutOfMemoryError occurs:

  • -XX:+HeapDumpOnOutOfMemoryError

Our final Java parameter configuration (server: Linux 64-bit, 8-core x 16 GB):

JAVA_OPTS="$JAVA_OPTS -server -Xms3g -Xmx3g -Xss256k -XX:PermSize=128M -XX:MaxPermSize=128M -XX:+UseParallelOldGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/AAA/dump -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/usr/AAA/dump/heap_trace.txt -XX:NewSize=1G -XX:MaxNewSize=1G"

Observation shows this configuration is very stable: each ordinary GC takes about 10 ms, and full GC essentially never occurs, or only once after a very long time.

By analyzing dump files, we also found that a full GC occurred every hour. After verification from multiple sources: as long as the JMX service is enabled in the JVM, JMX executes a full GC once an hour to clear references. For more information, see the attachment.

4. Program algorithm optimization: not the focus this time.

References:

http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html

Author: chen77716
