JVM Study Note 2: Recommendations for Reducing GC Overhead

Source: Internet
Author: User


I: Conditions for triggering a major GC (garbage collection)

Minor GCs in the JVM occur very frequently, but each one takes very little time, so their impact on the system is small. A major (full) GC, however, is clearly noticeable to the system, so its trigger conditions are worth attention. In general, two conditions trigger a major GC:

1) When the application is idle, that is, when no application threads are running, GC is invoked. GC runs in the thread with the lowest priority, so while the application is busy the GC thread is not scheduled, except under the condition described next.

2) When Java heap memory is insufficient, GC is invoked. If a running application thread creates a new object and there is not enough free memory, the JVM forcibly invokes the GC thread to reclaim memory for the new allocation. If one GC pass still does not free enough memory, the JVM makes up to two further GC attempts; if those also fail to satisfy the request, the JVM reports an OutOfMemoryError and the Java application stops.

The JVM decides whether to perform a major GC based on the current state of the system, and that state is constantly changing, so the timing of a major GC is uncertain and cannot be predicted. What can be said with certainty is that for a long-running application, major GCs will occur repeatedly.
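As a rough illustration of condition 2), the following sketch of my own (not from the source) keeps allocating until the heap is exhausted; running it with a small heap such as -Xmx16m (and -verbose:gc to watch the collections) shows increasingly frequent GCs followed by an OutOfMemoryError:

import java.util.ArrayList;
import java.util.List;

public class AllocUntilOom {
    public static void main(String[] args) {
        List<byte[]> retained = new ArrayList<byte[]>();
        try {
            while (true) {
                // each iteration allocates 1 MB and keeps it reachable,
                // forcing ever more frequent (and eventually futile) GCs
                retained.add(new byte[1024 * 1024]);
            }
        } catch (OutOfMemoryError e) {
            System.out.println("Heap exhausted after " + retained.size() + " MB");
        }
    }
}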

 

II: Five coding tips to reduce GC overhead

As the GC mechanism above shows, how a program runs directly affects the state of the system, and therefore when GC is triggered. If code is designed and written without GC behavior in mind, a series of negative effects such as objects lingering in memory will follow. To avoid these effects, the basic principle is to generate as little garbage as possible and keep GC overhead low. Specific measures include the following:

1. Avoid implicit Strings

Strings are an integral part of almost every data structure we manage. They cannot be modified after they are allocated; for example, the "+" operator allocates a new String linking two strings. Worse, an implicit StringBuilder object is also allocated to do the linking.

For example:

 a = a + b; // a and b are Strings
The compiler generates code equivalent to this behind the scenes:

StringBuilder temp = new StringBuilder(a);
temp.append(b);
a = temp.toString(); // a new String object is allocated
                     // the original String referenced by "a" is now garbage

It gets worse.

Let's take a look at this example:

String result = foo() + arg;
result += boo();
System.out.println("result = " + result);

In this example, three StringBuilder objects are allocated, one for each "+" operation, along with two additional String objects: one holding the result of the second assignment, and one for the String argument passed to the print method. That is five extra objects in what looks like a fairly trivial statement.

Now imagine what happens in real code, for example when generating a web page from XML or from text read from a file. Inside nested loop structures you may find hundreds of objects being allocated implicitly. The VM has mechanisms for dealing with this garbage, but the price is still high, and it may end up being paid by your users.

Solution:

One way to reduce garbage objects is to use StringBuilder explicitly. The following example achieves the same result as the code above but allocates only one StringBuilder object, plus one String object holding the final result.

StringBuilder value = new StringBuilder("result = ");
value.append(foo()).append(arg).append(boo());
System.out.println(value);
By being aware of where Strings and StringBuilders may be allocated implicitly, you can reduce the number of short-lived objects created, especially in heavily executed code.
2. Plan List capacities

Dynamic collections such as ArrayList are the basic structures for holding variable-length data. ArrayList and some other collections (such as HashMap and TreeMap) are implemented on top of an underlying Object[] array. Like Strings (which themselves wrap a char[] array), arrays are fixed in size. The obvious question is: if their size is fixed, how can we keep adding records to the collection? The answer is equally obvious: by allocating new arrays dynamically.

See the following example:

List<Item> items = new ArrayList<Item>();
for (int i = 0; i < len; i++) {
    Item item = readNextItem();
    items.add(item);
}

The value of len determines the final size of items when the loop ends. However, the ArrayList constructor has no way of knowing this value up front, so it allocates an internal Object array of a default size. Whenever that internal array overflows, it is replaced by a new, sufficiently larger array, which turns the previously allocated array into garbage.

If the loop executes thousands of times, new arrays will be allocated, and old arrays collected, many more times. For code running at scale, these allocations and releases should be kept off the CPU cycles as far as possible.

Solution:

Whenever possible, allocate an initial capacity to the List or Map, like this:

List<MyObject> items = new ArrayList<MyObject>(len);

Because the List is initialized with sufficient capacity, this avoids unnecessary allocation and release of internal arrays at runtime. If you do not know the exact size, it is best to estimate the average value and add some buffering to avoid accidental overflow.
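The same idea applies to HashMap, whose default load factor is 0.75: to avoid internal rehashing, choose an initial capacity that covers the expected number of entries divided by the load factor. A minimal sketch, assuming java.util.Map and java.util.HashMap are imported, Item is the type from the example above, and expectedEntries is a hypothetical estimate of your own:

int expectedEntries = 10000; // hypothetical estimate, with some buffer already added
int initialCapacity = (int) (expectedEntries / 0.75f) + 1;
Map<String, Item> itemsById = new HashMap<String, Item>(initialCapacity);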
3. Use efficient primitive collections

Current versions of the Java compiler support arrays and Maps with primitive keys or values only by means of "boxing": the primitive value is wrapped in a standard object, which must be allocated and later reclaimed by GC.

This has some negative implications. Java implements most collections with internal arrays. Each key/value record added to a HashMap causes the allocation of an internal object that holds the key and the value. When dealing with maps this is simply evil: it means an extra allocation and release every time you put a record into the map. On top of that, when the map grows past its capacity, a new, larger internal array has to be reallocated. When dealing with maps holding hundreds or even thousands of records, these internal allocations keep adding to the GC cost.

A very common case is a mapping between a primitive value (such as an id) and an object. Since Java's HashMap is built to hold object types (rather than primitives), every insertion into the map may allocate an additional object just to hold the primitive value ("boxing" it).

The Integer.valueOf method caches the values between -128 and 127, but for every value outside that range a new object is allocated, in addition to the internal key/value entry object. This can potentially more than triple the GC overhead of the map. For a C++ developer this is really disturbing news, since the STL templates solve this kind of problem very efficiently.

Fortunately, this problem is being worked on for future Java versions. Until then, it is handled by libraries that provide primitive trees, maps, and lists. I strongly recommend Trove; I have worked with it for a long time, and it can really reduce GC overhead in performance-critical code.
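To make the boxing cost concrete, here is a small sketch of my own (not from the source) that puts primitive int keys into a standard java.util.HashMap; every key above 127 falls outside the Integer cache and is boxed into a brand-new Integer object, on top of the internal entry object the map allocates. A primitive-keyed map from a library such as Trove avoids both allocations:

import java.util.HashMap;
import java.util.Map;

public class BoxingCost {
    public static void main(String[] args) {
        Map<Integer, String> names = new HashMap<Integer, String>(200000);
        for (int id = 0; id < 100000; id++) {
            // autoboxing: each put() wraps "id" in a new Integer object
            // and also allocates the map's internal entry object
            names.put(id, "user");
        }
        System.out.println("entries: " + names.size());
    }
}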

4. Use streams instead of in-memory buffers

In server applications, most of the data we operate on arrives as files or as a stream of data over the network from another web server or from a DB. In most cases the incoming data is in serialized form and needs to be deserialized into Java objects before we can use it. This stage is very prone to large amounts of implicit allocation.

The simplest approach is to read the data into memory through a ByteArrayInputStream or ByteBuffer and then deserialize it.

This is a bad move: holding the complete data in a new buffer means allocating space for it and then releasing it immediately after the objects are constructed. Moreover, since you do not know the size of the data in advance, you can only guess, and byte[] arrays have to be allocated and released whenever the initial capacity is exceeded, just to keep absorbing the data.

The solution is quite straightforward. Most serialization frameworks, such as Java's native serialization and Google's Protocol Buffers, are built to deserialize data directly from a file or network stream, without ever keeping it all in memory and without allocating new byte arrays to accommodate the growing data. If you can, compare this approach with loading the data into memory first; I believe your GC will thank you.
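As a minimal sketch of the streaming approach with Java's native serialization (the file name items.bin is a placeholder of my own, and the file is assumed to contain a sequence of serialized objects), each object is deserialized straight from a buffered stream instead of first loading the whole file into a byte[]:

import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;

public class StreamingRead {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        int count = 0;
        // deserialize object by object straight from the stream;
        // no intermediate byte[] buffer ever holds the whole file
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream("items.bin")))) {
            while (true) {
                try {
                    Object item = in.readObject(); // one deserialized object at a time
                    count++;                       // real code would process "item" here
                } catch (EOFException endOfStream) {
                    break;
                }
            }
        }
        System.out.println("read " + count + " objects");
    }
}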

5. Aggregating lists

Immutability is a good thing, but at large scale it has serious drawbacks. Consider what happens when List objects are passed between methods.

When a method returns a collection, it is usually wise to create the collection object (for example an ArrayList) inside the method, fill it, and return it as an immutable collection.

In some cases, however, this does not work well. The most noticeable one is when collections returned from multiple method calls are aggregated into one final collection. Because of immutability, huge numbers of temporary collections are allocated when the data volumes are large.

The solution in this case is not to return new collections, but to aggregate the items by passing a single collection into those methods as a parameter.

Example 1 (inefficient):

List<Item> items = new ArrayList<Item>();
for (FileData fileData : fileDatas) {
    // each call creates a temporary list of items backed by its own internal array
    items.addAll(readFileItem(fileData));
}

Example 2 (more efficient):

List<Item> items = new ArrayList<Item>((int) (fileDatas.size() * avgFileDataSize * 1.5));
for (FileData fileData : fileDatas) {
    readFileItem(fileData, items); // records are added inside the method
}

In Example 2, by bending the immutability rule (which you should normally stick to), we save N list allocations, along with any temporary array allocations. This is a big win for your GC.

III: Measures to reduce GC overhead

In addition to the coding tips above, the following general measures also help minimize garbage and reduce GC overhead:

  (1) Do not explicitly call System.gc()

This call only suggests that the JVM perform a major GC. Although it is just a suggestion, in many cases it actually triggers a major GC, which increases the frequency of major GCs and therefore the number of intermittent pauses.

  (2) Minimize the use of temporary objects

Temporary objects become garbage after the function call that created them returns. Using fewer temporary variables means generating less garbage, which postpones the second trigger condition described above and reduces the chances of a major GC.

  (3) Explicitly set objects to null when they are no longer used

Null references are generally treated as pointing to garbage, so explicitly setting references to objects that are no longer needed to null helps the garbage collector identify garbage and improves GC efficiency.
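The classic legitimate case is a class that manages its own object storage, such as a simple stack; in the sketch below (my own example), the popped slot is set to null so the object it referenced can be collected even though the backing array lives on:

public class SimpleStack {
    private Object[] elements = new Object[16];
    private int size = 0;

    public void push(Object e) {
        if (size == elements.length) {
            elements = java.util.Arrays.copyOf(elements, 2 * size); // grow the backing array
        }
        elements[size++] = e;
    }

    public Object pop() {
        Object result = elements[--size];
        elements[size] = null; // drop the stale reference so GC can reclaim the object
        return result;
    }
}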

  (4) Use StringBuffer rather than String to accumulate strings whenever possible

Because a String is a fixed-length, immutable object, appending to it does not extend the existing String; instead a new String object is created. For example, while executing Str5 = Str1 + Str2 + Str3 + Str4, several intermediate garbage objects are produced, because each "+" operation must create a new String object. These intermediate objects have no practical value for the program and only add more garbage. To avoid this, accumulate strings with StringBuffer instead: StringBuffer is variable-length and extends its existing buffer, so no intermediate objects are generated.
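A minimal sketch of the difference, using a loop of my own (parts is a hypothetical collection of strings; StringBuilder is the unsynchronized counterpart of StringBuffer and behaves the same way here):

// inefficient: each iteration allocates an implicit builder
// plus a brand-new String holding the partial result
String csv = "";
for (String part : parts) {
    csv = csv + part + ",";
}

// efficient: one buffer grows in place, one final String at the end
StringBuilder builder = new StringBuilder();
for (String part : parts) {
    builder.append(part).append(',');
}
String csv2 = builder.toString();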

  (5) Use primitive types such as int and long instead of Integer and Long objects where possible

Primitive variables occupy far fewer memory resources than the corresponding wrapper objects. Unless the object form is actually needed, it is best to use the primitive types.
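The classic illustration (my own sketch, not from the source) is an accumulator declared as Long instead of long: every addition unboxes, adds, and boxes the result into a brand-new Long object, producing a million short-lived objects that the primitive version never allocates:

public class SumBoxing {
    public static void main(String[] args) {
        Long boxedSum = 0L;               // wrapper type: boxes on every addition
        for (long i = 0; i < 1000000; i++) {
            boxedSum += i;                // unbox, add, box into a new Long
        }

        long primitiveSum = 0L;           // primitive type: no allocation at all
        for (long i = 0; i < 1000000; i++) {
            primitiveSum += i;
        }

        System.out.println(boxedSum + " " + primitiveSum);
    }
}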

  (6) Use as few static object variables as possible

Static variables belong to the class and act as global variables; objects they reference remain reachable and will not be reclaimed by GC, so they keep occupying memory for as long as the class is loaded.
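A typical pattern to watch for (my own sketch): a static collection that is only ever added to keeps every object it references reachable for the lifetime of the application, which amounts to a memory leak:

import java.util.ArrayList;
import java.util.List;

public class RequestLog {
    // objects added here are never eligible for GC while the class is loaded
    private static final List<String> HISTORY = new ArrayList<String>();

    public static void record(String entry) {
        HISTORY.add(entry); // grows without bound unless explicitly cleared
    }
}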

  (7) Spread out the creation and deletion of objects over time

Creating a large number of new objects in a short period of time, especially large objects, suddenly requires a great deal of memory. In such a situation the JVM has no choice but to perform a major GC to reclaim memory or compact memory fragments, which raises the frequency of major GCs. Deleting objects in a concentrated burst has the same effect: a large amount of garbage appears at once, the free space inevitably shrinks, and the chance of a forced major GC at the next object creation greatly increases.

IV: Features of garbage collection

(1) Unpredictability of garbage collection: because different garbage collection algorithms and collection mechanisms are used, collection may occur on a regular schedule, when CPU resources are idle, or, as with the original collectors, only when memory consumption reaches a limit. This depends on the choice of garbage collector and its specific settings.
(2) Precision of garbage collection: this covers two aspects: (a) the garbage collector can precisely mark live objects; (b) the garbage collector can precisely locate the reference relationships between objects. The former is the precondition for completely reclaiming all discarded objects; otherwise memory leaks may result. The latter is a necessary condition for algorithms such as compaction and copying: all reachable objects can be relocated reliably and their memory reallocated, which makes object copying and memory compaction possible and effectively prevents memory fragmentation.
(3) There are many different kinds of garbage collectors, each with its own algorithms and performance characteristics. Some stop the application while collection is in progress; others allow application threads to keep running while collection takes place; and some run the collection itself across multiple threads.
(4) The implementation of garbage collection is closely tied to the specific JVM and its memory model. Different JVMs may use different garbage collection approaches, and the JVM's memory model determines which kinds of garbage collection it can adopt. The memory system in the HotSpot family of JVMs uses an advanced object-oriented framework design, which allows these JVMs to adopt the most advanced garbage collection techniques.
(5) As the technology has evolved, modern garbage collection offers many selectable collectors, each of which can be tuned with different parameters, making it possible to obtain optimal application performance in different application environments.
Given these features, we should pay attention to the following:
(1) Do not try to assume when garbage collection will occur; it is unknowable. For example, a temporary object inside a method becomes useless once the method returns and its memory can then be released, but exactly when is up to the collector.
(2) Java provides some classes related to garbage collection, as well as a way to request garbage collection explicitly: calling System.gc(). But this, too, is uncertain. Java does not guarantee that every call to this method will start a collection; it merely submits such a request to the JVM, and whether collection actually runs is unknown.
(3) Choose a suitable garbage collector. In general, if the system has no particularly demanding performance requirements, the JVM's default options are fine. Otherwise, consider a targeted collector: for example, incremental collectors suit systems with strong real-time requirements, while on well-provisioned systems with plenty of idle resources you can consider the parallel mark/sweep collector (typical selection flags are sketched after this list).
(4) A key problem is memory leaks. Good programming habits and a rigorous attitude are always the most important thing; do not let a small mistake of your own turn into a large memory leak.
(5) Release references to useless objects as early as possible. When using temporary variables, most programmers automatically set the reference variable to null once it goes out of use, which hints to the garbage collector that the object can be collected. You must also check whether the referenced object has listeners attached; if so, remove the listeners first and then assign null.
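For reference, collectors on HotSpot JVMs of that era are selected with command-line flags along these lines (MyApp stands for your application's main class; flag availability varies by JVM version, so treat this as a sketch rather than a definitive list):

java -XX:+UseSerialGC        MyApp   # single-threaded collector for small heaps
java -XX:+UseParallelGC      MyApp   # throughput-oriented parallel collector
java -XX:+UseConcMarkSweepGC MyApp   # mostly-concurrent low-pause collector (CMS)
java -XX:+UseG1GC            MyApp   # region-based G1 collector (Java 7u4 and later)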

 

Refer:

http://blog.csdn.net/zsuguangh/article/details/6429592

http://blog.csdn.net/tayanxunhua/article/details/21752781

 
