"Go" Java Programming for GC

Last Update:2014-09-25 Source: Internet

Author: User

Tags cas google guava


Java programmers typically do not need to think about memory while coding, and the JVM's highly optimized GC mechanism handles heap cleanup well in most cases. So many Java programmers think: I only need to care about when to create objects; reclaiming them can be left to the GC. Some even say that frequently thinking about memory while programming is a step backward, and that such matters should be left to the compiler and the virtual machine.

There is actually not much wrong with this. In most scenarios, worrying about memory and GC really is a little unfounded. As Knuth said: premature optimization is the root of all evil.

But on the other hand, what is "premature optimization"?

If we can do things right the first time, why not?

In fact, the JVM's memory model (JMM) should be basic knowledge for Java programmers. After handling a few production JVM memory problems, you clearly feel that many system problems are memory problems.

For the memory structure of the JVM, interested readers can look at the article "An Analysis of the Structure and Mechanisms of the Java Virtual Machine"; this article will not repeat it. Nor does this article cover specific GC algorithms; there are plenty of articles on them, readily available.

Also, do not expect GC optimization techniques to multiply application performance, especially for I/O-intensive applications. These optimizations mostly land on young GC, and may only help you reduce its frequency.

But I think the value of a good programmer lies not in dragon-slaying skills but in the details. As said before, if we can do things right the first time, and do them well, pursuing excellence as far as circumstances allow, why not do it?

I. The basic hypothesis of generational GC

Most GC algorithms divide heap memory into generations. But why divide it this way, rather than into partitions or segments? Why use "generations", a notion tied to time and age, to represent different areas of memory?

The basic assumption behind generational GC is:

Most objects have a very short life cycle and survive only briefly.

These short-lived objects are exactly what GC algorithms need to focus on first. So young GC (also known as minor GC) accounts for the vast majority of collections, and an application that is not heavily loaded may run for months without a full GC.

Based on this premise, we should keep object life cycles as short as possible when coding. In the past, allocating objects was a relatively heavy operation, so some programmers minimized the number of objects they created with new, trying to reduce heap allocation overhead and memory fragmentation.

However, creating short-lived objects in the JVM is cheaper than we imagine, so don't be stingy with the new keyword; go ahead and use it boldly.

Of course, the premise is not to create objects pointlessly: the higher the rate of object creation, the sooner GC is triggered.

Conclusion:

Allocating small objects is cheap; do not be stingy about creating them.
The GC favors such small, short-lived objects.
Keep an object's life cycle as short as possible, for example within a method body, so that it can be reclaimed by young GC as early as possible and is not promoted to the old generation.
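
As a minimal sketch of the last point (the class and method names are hypothetical and not from the original article), the temporary StringBuilder below lives only inside the method body. It becomes unreachable as soon as the method returns, so it is a candidate for reclamation in the next young GC:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example: temporaries are kept method-local so they die young.
final class ShortLivedObjects {

    // The StringBuilder is created per call and never escapes the method;
    // only the resulting String is handed back to the caller.
    static String join(List<String> parts, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(parts.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("young", "gc"), "-"));
    }
}
```

Keeping the builder in an instance or static field to "save" an allocation would instead extend its lifetime and make promotion to the old generation more likely.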

II. Optimizing object allocation

This builds on the fact that most objects are small, short-lived, and not subject to multithreaded contention. Allocation of such small objects is served first from the thread-private TLAB; creating an object in a TLAB involves no lock, not even the cost of a CAS.

The space occupied by TLABs is in the Eden generation.

When an object is larger and does not fit in the TLAB's remaining space, and the JVM considers the current thread's remaining TLAB space still large enough to be worth keeping, the object is allocated directly in Eden. There is concurrent contention there, so there is the cost of a CAS, but that is acceptable.

When an object is too large even for Eden, the JVM can only try to allocate it in the old generation. This should be avoided where possible: once an object is allocated in the old generation, it can only be reclaimed by an old-generation GC or a full GC.

III. Benefits of immutable objects

GC algorithms usually scan live objects starting from the root nodes, following all live object references to build an object graph.

Immutable objects help the GC mainly in the old generation.

As you can imagine, if an old-generation object references a young-generation object, this must be taken into account in every young GC.

To improve young GC performance and avoid scanning object references in the old generation on every young GC, the HotSpot JVM uses a card table.

Simply put, when an old-generation object creates a new reference to a young-generation object, or releases one, the corresponding entry in the card table is marked dirty; young GC then only needs to scan these dirty entries.

A mutable object's references to other objects may change frequently, and it may hold more and more references at run time, especially if it is a container. This causes the corresponding card table entries to be marked dirty frequently.

An immutable object's references are very stable, so its entries are not picked up when the card table is scanned.

Note that "immutable object" here does not mean merely that the reference itself is declared final, but that the object is truly immutable.
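
To illustrate that last distinction, here is a minimal sketch (the class ImmutableRoute and its fields are hypothetical). A final field alone only fixes the reference; the defensive copy and unmodifiable wrapper are what keep the set of referenced objects from changing after construction:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example contrasting a final reference with a truly immutable object.
final class ImmutableRoute {
    private final String name;
    private final List<String> stops;

    ImmutableRoute(String name, List<String> stops) {
        this.name = name;
        // Defensive copy wrapped in an unmodifiable view: later changes to the
        // caller's list cannot alter the references held by this object.
        this.stops = Collections.unmodifiableList(new ArrayList<>(stops));
    }

    String name() {
        return name;
    }

    List<String> stops() {
        return stops;
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>();
        source.add("A");
        ImmutableRoute route = new ImmutableRoute("r1", source);
        source.add("B"); // does not affect the route
        System.out.println(route.name() + " has " + route.stops().size() + " stop(s)");
    }
}
```

Had the constructor stored the caller's list directly, the field would still be final, but the object's outgoing references could keep changing for as long as the caller mutated the list.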

IV. The myth of setting references to null

Many early Java materials mentioned that setting a variable to null inside a method body helps GC performance, with code like the following:

List<String> list = new ArrayList<String>();
// some code
list = null;

In fact, this approach does little to help the GC, and sometimes it clutters the code.

I remember that a few years ago @rednaxelafx discussed this issue in detail in the HLL VM group. I could not find the thread, but the conclusion was basically:

In a very large method body, setting a reference to a large object to null can help the GC to some extent.
In most cases, this practice brings no benefit.

So let's give up this "optimization" sooner rather than later.

The GC is smarter than we think.

V. Triggering GC manually

The following two tricks appear in a lot of Java material:

Yield CPU resources to other threads with Thread.yield().
Trigger GC with System.gc().

In fact, the JVM never guarantees either of these. Moreover, if explicit GC is allowed by the JVM startup parameters, System.gc() triggers a full GC, which is almost suicidal for response-time-sensitive applications.

So, let's keep two points in mind:

Never use Thread.yield().
Never use System.gc(), unless you really need to reclaim native memory.

The second point has an exception for native memory, if you are in one of the following scenarios:

Using NIO or an NIO framework (Mina/Netty)
Allocating byte buffers with DirectByteBuffer
Using MappedByteBuffer for memory mapping

Since native memory can only be reclaimed through a full GC (or CMS GC), do not call System.gc() lightly unless you know it is really necessary.
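
One way to reduce how much native memory waits on a GC cycle, sketched here under assumptions, is to allocate a direct buffer once and reuse it instead of allocating a new one per operation. The class name, method, and 4096-byte size are hypothetical; only ByteBuffer.allocateDirect and the standard buffer methods are JDK API:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical example: one direct buffer reused across calls (not thread-safe).
final class ReusableDirectBuffer {
    // The native memory behind a direct buffer is normally released only
    // after the ByteBuffer object itself has been garbage collected.
    private final ByteBuffer buffer = ByteBuffer.allocateDirect(4096);

    // Copies the text through the direct buffer and returns the byte count.
    // Assumes the UTF-8 encoded text fits within the 4096-byte buffer.
    int roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        buffer.clear();  // reset position and limit so the buffer can be reused
        buffer.put(bytes);
        buffer.flip();   // switch from writing to reading
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return out.length;
    }

    public static void main(String[] args) {
        ReusableDirectBuffer b = new ReusableDirectBuffer();
        System.out.println(b.roundTrip("hello"));
        System.out.println(b.roundTrip("gc"));
    }
}
```

This does not remove the dependency on GC for releasing the buffer; it only means fewer direct buffers are created in the first place.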

In addition, to guard against System.gc() calls made inside some frameworks (such as NIO frameworks and Java RMI), it is recommended to add -XX:+DisableExplicitGC to the startup parameters to disable explicit GC.

This parameter has a huge pitfall: if you disable System.gc(), memory in the three scenarios above may not be reclaimed, which can cause OOM. If you use CMS GC, you can use this parameter instead: -XX:+ExplicitGCInvokesConcurrent.

For System.gc(), you can refer to several articles by @bluedavy:

CMS GC will not reclaim Direct ByteBuffer's memory
About a mistake I made with Java startup parameters
java.lang.OutOfMemoryError: Map failed

VI. Specifying the initial size of containers

One feature of Java containers is dynamic expansion, so we usually don't think about setting an initial size; if it's not enough, it will expand automatically anyway.

But expansion is not free, and the cost can even be high.

For example, array-backed data structures such as StringBuilder, StringBuffer, ArrayList, and HashMap need an array copy when they expand. For a structure that keeps growing, several expansions leave behind many useless old arrays, and the pressure of reclaiming them all falls on the GC.

The constructors of these containers usually take a parameter that specifies the size; if the container's size can be estimated, specifying it is recommended.

However, a container does not wait until it is full before expanding; it expands at a certain ratio. For example, HashMap's expansion threshold depends on its load factor (loadFactor).
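
To make the load factor point concrete, here is a minimal JDK-only sketch. The helper names capacityFor and newMapWithExpectedSize are hypothetical, and the calculation assumes HashMap's documented default load factor of 0.75:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: derive a HashMap initial capacity from an expected size.
final class PresizedMap {
    // HashMap's default load factor, as documented in the JDK.
    private static final double DEFAULT_LOAD_FACTOR = 0.75;

    // Smallest capacity whose resize threshold (capacity * load factor)
    // is at least expectedSize.
    static int capacityFor(int expectedSize) {
        return (int) Math.ceil(expectedSize / DEFAULT_LOAD_FACTOR);
    }

    static <K, V> Map<K, V> newMapWithExpectedSize(int expectedSize) {
        return new HashMap<>(capacityFor(expectedSize));
    }

    public static void main(String[] args) {
        Map<String, Integer> m = newMapWithExpectedSize(100);
        for (int i = 0; i < 100; i++) {
            m.put("key" + i, i);
        }
        System.out.println("capacity requested: " + capacityFor(100) + ", size: " + m.size());
    }
}
```

Passing the expected element count directly as the capacity, for example new HashMap<>(100), would still trigger a resize before the hundredth insertion, because the threshold is reached at 75% of the capacity.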

Google's Guava framework provides very handy utilities for setting a container's initial capacity, such as:

Lists.newArrayListWithCapacity(initialArraySize);
Lists.newArrayListWithExpectedSize(estimatedSize);
Sets.newHashSetWithExpectedSize(expectedSize);
Maps.newHashMapWithExpectedSize(expectedSize);

So we just pass in the estimated size and leave the capacity calculation to Guava.

Example: if you create an ArrayList with the default no-argument constructor and keep adding elements until OOM, the process results in:

Repeated array expansion, reallocating ever larger arrays
Repeated array copies
Memory fragmentation
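
The cost of repeated expansion can be estimated with a small model. This is a sketch under stated assumptions: it assumes each expansion grows the backing array by roughly 1.5 times, starting from a capacity of 10, which matches the growth policy of OpenJDK's ArrayList but is not guaranteed by the specification. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of array-backed growth; not the JDK's actual code.
final class GrowthCost {

    // Counts how many times a backing array would be reallocated (and copied)
    // while growing from initialCapacity to hold elementCount elements,
    // assuming each expansion grows the capacity by about 1.5x.
    // Assumes initialCapacity is non-negative.
    static int reallocations(int initialCapacity, int elementCount) {
        int capacity = initialCapacity;
        int count = 0;
        while (capacity < elementCount) {
            // Grow by half; always grow by at least one slot so that
            // very small capacities still make progress.
            capacity = Math.max(capacity + (capacity >> 1), capacity + 1);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println("default-sized: " + reallocations(10, n) + " reallocations");
        System.out.println("presized:      " + reallocations(n, n) + " reallocations");

        // Presizing in real code: pass the estimate to the constructor.
        List<Integer> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        System.out.println("elements added: " + list.size());
    }
}
```

Every reallocation in the model corresponds to one discarded array that the GC must later reclaim, which is the pressure the list above describes.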

VII. Object pools

To reduce object allocation overhead and improve performance, some people use object pooling to cache a collection of objects for reuse.

However, objects in a pool survive for a long time, so they are promoted to the old generation and cannot be reclaimed by young GC.

And usually... it brings no benefit.

For the object itself:

If the object is small, allocation is inherently cheap, and an object pool only adds code complexity.
If the object is large, it puts even more pressure on the GC once it is promoted to the old generation.

From a thread-safety point of view, a pool is usually accessed concurrently, so you need to handle synchronization. This is a big pitfall, and the cost of synchronization is not necessarily lower than that of simply creating a new small object.

The only appropriate scenario for object pooling is when each object in the pool is expensive to create, so that caching and reuse make sense; for example, when each creation requires establishing a connection or depends on an RPC.

For example:

Thread pool
Database Connection Pool
TCP Connection Pool

Even if you really need an object pool, use a mature open-source framework, such as Apache Commons Pool.

Also, use the JDK's ThreadPoolExecutor as your thread pool. Don't reinvent the wheel unless you have read the AQS source and think you can write it better than Doug Lea.
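
A minimal sketch of using the JDK's ThreadPoolExecutor directly follows. The pool sizes, queue length, rejection policy, and the class and method names are illustrative assumptions, not recommendations from the original article:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical example: reuse expensive threads through the JDK's own pool.
final class PooledSum {

    static int sumOfSquares(int n) {
        // 2 core threads, up to 4 in total, idle extra threads retire after 30s.
        // The bounded queue plus CallerRunsPolicy applies back-pressure instead
        // of letting pending tasks pile up without limit.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 30L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(100),
                new ThreadPoolExecutor.CallerRunsPolicy());
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int value = i;
                futures.add(pool.submit(() -> value * value));
            }
            int sum = 0;
            for (Future<Integer> future : futures) {
                sum += future.get();
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // stop accepting tasks and let worker threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10));
    }
}
```

In a real application the pool would normally be created once and shared, rather than built and shut down per call as in this self-contained sketch.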

VIII. Object scope

Minimize the scope of an object, that is, its life cycle.

If you can declare a local variable within a method, do not declare it as an instance variable.
Unless your object is a singleton or immutable, declare as few static variables as possible.

IX. Reference types

java.lang.ref.Reference has several subclasses for handling GC-related references. The JVM has the following reference types:

Strong reference: the most common kind of reference.
Weak reference: the referent is reclaimed by the GC once no strong references to it remain.
Soft reference: the referent is reclaimed by the GC only when the JVM is approaching OOM.
Phantom reference: mainly used to detect when an object is reclaimed by the GC, typically to do some cleanup work.

When you need to implement a cache, consider WeakHashMap in preference to HashMap; of course, a better choice is a framework such as Guava Cache.
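
A minimal sketch of the weak-reference behavior described above (the class and method names are hypothetical). It only demonstrates the part that is deterministic: while the key is strongly referenced, neither the WeakReference nor the WeakHashMap entry can be cleared. What happens after the last strong reference is dropped depends on when the GC runs, so the sketch does not assert it:

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical example of weak references and a weakly keyed cache.
final class ReferenceKinds {

    // Returns true if both the weak reference and the cache entry are still
    // intact while the caller holds a strong reference to the key.
    static boolean presentWhileKeyHeld(Object key) {
        Map<Object, String> cache = new WeakHashMap<>();
        cache.put(key, "cached value");
        WeakReference<Object> weak = new WeakReference<>(key);
        return weak.get() == key && "cached value".equals(cache.get(key));
    }

    public static void main(String[] args) {
        Object key = new Object(); // strong reference keeps the entry alive
        System.out.println(presentWhileKeyHeld(key));
        // Once no strong reference to a key remains, a WeakHashMap entry
        // becomes eligible for removal at some later GC; the timing is
        // not guaranteed.
    }
}
```

Note that WeakHashMap holds only its keys weakly; values are held strongly until the entry is removed, so a value that references its own key keeps the entry alive.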

Finally, to repeat: these practices may not measurably improve the performance of your code, but being familiar with them helps us write better code that works with the GC rather than against it.

　　Original link:http://www.kuqin.com/shuoit/20140507/339716.html

