Summary
Although the Java Virtual Machine (JVM) and its garbage collector (GC) handle most memory-management tasks, memory leaks can still occur in Java programs; in fact, they are a common problem in large projects. The first step in avoiding a memory leak is to understand how one can happen. This article describes some common memory-leak traps in Java code, along with best practices for writing code that does not leak. Because it is very difficult to pinpoint the responsible code once a leak has occurred, the article also introduces a tool for diagnosing leaks and identifying their root causes. The tool's overhead is so small that it can even be used to look for memory leaks in a production system.

The role of the garbage collector
While the garbage collector handles most memory-management issues, making life easier for programmers, programmers can still make mistakes that lead to memory problems. Simply put, the GC traverses all references from the root objects (stack variables, static fields, objects referenced by JNI handles, and so on) and marks every object it can reach as live. The program can manipulate only these objects; all other objects are deleted. Since the program cannot reach a deleted object, deleting it is safe.
While memory management can be said to be automated, this does not free programmers from thinking about memory-management issues. For example, allocating (and freeing) memory always has a cost, even though that cost is not visible to the programmer. A program that creates too many objects will, all other things being equal, be slower than one that achieves the same functionality with fewer objects.
More to the point of this article, if you forget to "free" previously allocated memory, you may cause a memory leak. If your program retains references to objects that will never be used again, those objects consume memory, because the automated garbage collector cannot prove that they will not be used. As noted above, if there is a reference to an object, the object is considered live and cannot be deleted. To ensure that the memory used by an object is reclaimed, the programmer must make the object unreachable. This is usually done by setting an object field to null or by removing the object from a collection. Note, however, that it is not necessary to explicitly set local variables to null when they are no longer in use; those references are cleared automatically when the method exits.
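This reachability point can be illustrated with a small sketch. The class and method names below are invented for illustration; the pattern is a registry that keeps objects reachable until they are explicitly removed:

```java
import java.util.ArrayList;
import java.util.List;

// Invented example: a registry that keeps sessions reachable until they
// are explicitly removed. Forgetting to call close() is exactly the kind
// of "retained but never used again" reference described above.
public class SessionRegistry {
    private static final List<Object> sessions = new ArrayList<>();

    public static Object open() {
        Object session = new Object();
        sessions.add(session);    // the registry now keeps the session live
        return session;
    }

    public static void close(Object session) {
        sessions.remove(session); // without this, the GC can never reclaim it
    }

    public static int liveCount() {
        return sessions.size();
    }
}
```

As long as a session stays in the static list, the GC can reach it from a root and must keep it; removing it from the list is what makes it collectable.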
In a nutshell, this is the main cause of memory leaks in a memory-managed language: object references that are retained but never used again.

Typical leaks
Now that we know memory leaks are a real possibility in Java, let's look at some typical leaks and their causes.

1. Global collections
It is common for large applications to have some kind of global data repository, such as a JNDI tree or a session table. In these cases, care must be taken to manage the size of the repository: there must be some mechanism for removing data that is no longer needed.
This can be done in a number of ways, but the most common one is some sort of cleanup task that runs periodically. The task verifies the data in the repository and removes any data that is no longer needed.
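Such a periodic cleanup task might look like the following sketch, assuming entries are stamped with a creation time and expire after a fixed time-to-live. All names here (Repository, TTL_MILLIS, evictExpired) are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of a global repository with a periodic cleanup task.
public class Repository<K, V> {
    private static final long TTL_MILLIS = 60_000; // illustrative expiry time

    private static final class Entry<T> {
        final T value;
        final long createdAt = System.currentTimeMillis();
        Entry(T value) { this.value = value; }
    }

    private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();

    public void put(K key, V value) { store.put(key, new Entry<>(value)); }

    public V get(K key) {
        Entry<V> e = store.get(key);
        return e == null ? null : e.value;
    }

    public int size() { return store.size(); }

    // Remove entries older than the TTL; without some such mechanism
    // the repository grows without bound.
    public void evictExpired(long now) {
        store.entrySet().removeIf(e -> now - e.getValue().createdAt > TTL_MILLIS);
    }

    // Schedule the cleanup to run periodically on a caller-supplied scheduler.
    public void startCleanup(ScheduledExecutorService scheduler) {
        scheduler.scheduleAtFixedRate(
            () -> evictExpired(System.currentTimeMillis()), 1, 1, TimeUnit.MINUTES);
    }
}
```

The key design point is that eviction is the repository's responsibility; callers that merely `put` data can never leak it past the TTL.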
Another way to manage the repository is reverse-link (referrer) counting: the collection keeps a count of the referrers of each entry, which requires referrers to tell the collection when they are done with an entry. When an entry's referrer count reaches zero, the entry can be removed from the collection.

2. Caching
A cache is a data structure used to quickly look up the result of an operation that has already been performed. If an operation is expensive to execute, its result for commonly used inputs can be cached, and the cached value used the next time the operation is invoked.
Caching is typically implemented dynamically, with new results added to the cache as they are computed. A typical algorithm is:
Check whether the result is in the cache; if so, return it.
If the result is not in the cache, compute it.
Add the computed result to the cache so that it can be used by subsequent calls.
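The naive algorithm above might be sketched in Java as follows (NaiveCache and expensiveOperation are illustrative names, not from the article):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the naive caching algorithm; expensiveOperation() stands in
// for whatever slow computation is being cached.
public class NaiveCache {
    private final Map<Integer, Long> cache = new HashMap<>();

    private long expensiveOperation(int input) {
        return (long) input * input; // placeholder computation
    }

    public long compute(int input) {
        Long cached = cache.get(input);          // step 1: check the cache
        if (cached != null) {
            return cached;
        }
        long result = expensiveOperation(input); // step 2: compute on a miss
        cache.put(input, result);                // step 3: store -- no upper bound
        return result;
    }

    public int size() {
        return cache.size();
    }
}
```

Every distinct input adds an entry that is never removed, so with many distinct inputs the map, and the memory it holds, grows without limit.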
The problem with this algorithm (the potential memory leak) lies in the final step: if the operation is invoked with a large number of distinct inputs, a correspondingly large number of results is stored in the cache. This is clearly not the right approach.
To prevent this potentially damaging design, the program must place an upper limit on the amount of memory used for caching. A better algorithm is therefore:
Check whether the result is in the cache; if so, return it.
If the result is not in the cache, compute it.
If the cache occupies too much space, remove the result that has been cached the longest.
Add the computed result to the cache so that it can be used by subsequent calls.
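The bounded algorithm above can be sketched with a LinkedHashMap, whose removeEldestEntry hook evicts the entry that has been in the map the longest (BoundedCache and its capacity parameter are illustrative names):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A cache with an upper bound on the number of entries; the limit is
// supplied by the caller, since the right value depends on the application.
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // LinkedHashMap calls this after every put; returning true removes
    // the entry that has been in the cache the longest.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

An LRU variant, which evicts the least recently used entry rather than the oldest, can be obtained by constructing the LinkedHashMap with its accessOrder flag set to true.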
By always removing the entry that has been cached the longest, we are in effect assuming that, in the future, recently entered data is more likely to be used than data that has sat in the cache the longest. This is usually a good assumption.
The new algorithm ensures that the cache stays within a predefined memory range. The exact range can be difficult to compute, because the objects in the cache change over time and their references may reach arbitrary amounts of other data. Choosing the right size for a cache is a complex task: you must balance the memory consumed against the speed with which data is retrieved.
Another way to solve this problem is to use the java.lang.ref.SoftReference class to track the objects in the cache. This ensures that the references can be cleared if the virtual machine runs out of memory and needs more heap.

3. ClassLoader
The Java ClassLoader architecture provides many opportunities for memory leaks, and it is the complexity of that architecture that makes ClassLoader leaks so common. What makes ClassLoader special is that it involves not only "ordinary" object references but also meta-object references: fields, methods, and classes. This means that as long as there is a reference to a field, method, class, or ClassLoader object, the ClassLoader stays resident in the JVM. Because the ClassLoader itself can be associated with many classes and their static fields, a great deal of memory can be leaked.

Determining the location of the leak
The first sign of a memory leak is usually an OutOfMemoryError in the application. This tends to happen in the production environment, where you least want it and where debugging is nearly impossible. It may be that the test environment runs the application in a way that does not exactly match the production system, so the leak shows up only in production. In that case you need low-overhead tools to monitor and find the leak, tools that can be attached to a running system without restarting it or modifying its code. Perhaps most importantly, you need to be able to disconnect the tool after profiling and leave the system undisturbed.
While an OutOfMemoryError is often a signal of a memory leak, it is also possible that the application genuinely needs that much memory; in that case you must either increase the heap available to the JVM or change the application so that it uses less memory. In many cases, however, an OutOfMemoryError does signal a leak. One way to find out is to continuously monitor GC activity and determine whether memory usage grows over time. If it does, a memory leak has probably occurred.

1. Verbose output
There are many ways to monitor garbage collector activity. One of the most widely used is to start the JVM with the -Xverbose:gc option and observe the output:
[memory] 10.109-10.235: GC 65536K->16788K (65536K), 126.000 ms
The value after the arrow (16788K in this case) is the amount of heap in use after the garbage collection.

2. The Management Console
Watching a continuous stream of verbose GC statistics would be tedious. Fortunately, there are tools for this. The JRockit Management Console can display a graph of heap usage, and from that graph it is easy to see whether heap usage increases over time.
You can even configure the Management Console to send you e-mail if heap usage grows too large (or on other events), which obviously makes watching for memory leaks easier.

3. Memory leak detection tools
There are also tools that specialize in memory leak detection. The JRockit Memory Leak Detector can be used both to confirm a leak and to identify its source. This powerful tool is tightly integrated into the JRockit JVM, has very low overhead, and has easy access to the virtual machine's heap.
Advantages of professional tools
Once you know that a memory leak is occurring, you need a more specialized tool to find out why, because the JVM itself will not tell you. Such tools obtain memory-system information from the JVM in basically two ways: through JVMTI or through bytecode instrumentation. The Java Virtual Machine Tool Interface (JVMTI) and its predecessor, the Java Virtual Machine Profiler Interface (JVMPI), are standardized interfaces through which external tools communicate with the JVM and gather information from it. Bytecode instrumentation processes the bytecode with probes to obtain the information a tool needs.
For memory leak detection, both techniques have two drawbacks that make them poorly suited to production environments. First, their memory footprint and performance degradation are not negligible: information about heap usage must be exported from the JVM and collected by the tool for processing, which means allocating memory for the tool, and the export itself hurts JVM performance; for example, the garbage collector runs more slowly while information is being collected. Second, the tool must remain connected to the JVM at all times: it is not possible to attach the tool to an already-running JVM, analyze it, disconnect, and leave the JVM running undisturbed.
Because the JRockit Memory Leak Detector is integrated into the JVM, it has neither drawback. First, much of the processing and analysis is done inside the JVM, so no data needs to be converted or re-created. The processing can also piggyback on the garbage collector itself, which makes it faster. Second, as long as the JVM was started with the -Xmanagement option (which allows the JVM to be monitored and managed through a remote JMX interface), the Memory Leak Detector can connect to and disconnect from the running JVM at will. When the tool disconnects, nothing remains in the JVM, which runs the code at full speed just as it did before the tool was connected.
Trend Analysis
Let's take a closer look at the tool and how it is used to track down a memory leak. Once you know that a leak is occurring, the first step is to figure out what data is leaking, that is, which class of objects causes the leak. The JRockit Memory Leak Detector does this by counting, at each garbage collection, the number of live objects of each class. If the object count for a particular class grows over time (a positive "growth rate"), a memory leak is likely.
Because a leak can be as small as a trickle, trend analysis must run for a long time; over short periods, some classes may grow temporarily and then shrink again. But the overhead of trend analysis is very small (the main cost is simply sending a packet of data from JRockit to the Memory Leak Detector at each garbage collection). Overhead should not be a problem for any system, even a production system running at full speed.
At first the numbers jump around, but after a while they stabilize and show which classes are growing.
Find the root cause
Sometimes knowing which class of objects is leaking is enough to pinpoint the problem: the class may be used in only a very limited part of the code, and a quick inspection reveals the issue. Unfortunately, it is quite possible that this information alone is not enough. For example, it is common for objects of class java.lang.String to leak, but because strings are used throughout the program, knowing this does not help much.
What we want to know is which other objects are associated with the leaking objects (the Strings, in this case): why do the leaking objects still exist, and which objects retain references to them? But listing every object that holds a reference to a String would produce far too much data to be useful. To limit the amount of data, we can group it by class, so that we see which classes of objects are associated with the leaking objects. For example, strings are common in Hashtables, so we might see Hashtable entry objects associated with the strings. Working backwards from the Hashtable entries, we can eventually find the Hashtable objects and the strings related to those entries (shown in Figure 3).
Working backwards
Because we are still looking at objects by class rather than individually, we do not know which Hashtable is leaking. If we can determine how large each Hashtable in the system is, we can assume that the largest one is the leaker, because it accumulates the leak over time and grows quite large. So a list of all Hashtable objects, together with how much data each references, would help us identify the exact Hashtable causing the leak.
Counting how much data an object references is very expensive (it requires traversing the reference graph with the object as the root), and doing it for many objects would take a long time. A shortcut is available if you understand how Hashtable is implemented internally: a Hashtable holds an array of Hashtable entries, and the array grows as the number of objects in the Hashtable grows. So, to find the largest Hashtable, we need only find the largest array of Hashtable entries. This is much faster.
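The heuristic of ranking candidate tables by size can be sketched as a small helper. This is hypothetical: the real tool compares the internal entry arrays inside the JVM, while this sketch simply uses Hashtable.size() as a proxy:

```java
import java.util.Hashtable;
import java.util.List;

// Hypothetical helper: given candidate Hashtables, pick the largest one,
// on the article's assumption that the leaking table accumulates entries
// over time and is therefore the biggest.
public class LeakHunt {
    public static Hashtable<?, ?> largest(List<Hashtable<?, ?>> candidates) {
        Hashtable<?, ?> best = null;
        for (Hashtable<?, ?> table : candidates) {
            if (best == null || table.size() > best.size()) {
                best = table;
            }
        }
        return best;
    }
}
```

The same ranking idea applies to any collection type once you know a cheap proxy for its retained size.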
Going further
Once we have found the Hashtable instance in which the leak occurs, we can look at which instances reference that Hashtable and work backwards to see why it is being retained.
For example, the Hashtable might be referenced by an object of type MyServer through a field named activeSessions. This information is usually enough to locate the problem in the source code.
Finding where objects are allocated
When tracking down a memory leak, it is also useful to know where the objects are allocated. Knowing how they relate to other objects (that is, which objects reference them) is not always enough; information about where they were created helps too. Of course, you do not want to rewrite the application so that it prints a stack trace for every allocation, nor do you want to keep a profiler attached in production just in case you need to track a memory leak.
With the JRockit Memory Leak Detector, code can be dynamically inserted into the application to record a stack trace at each allocation. These stack traces can then be accumulated and analyzed in the tool. As long as the feature is not enabled, it costs nothing, which means allocation tracing can be turned on at any time. When allocation tracing is requested, the JRockit compiler dynamically inserts code to monitor allocations, but only for the specific classes requested. Better yet, when the analysis is done, all the inserted code is removed again, leaving no changes behind that could degrade application performance.
Concluding remarks
Memory leaks are hard to find. This article has covered several best practices for avoiding them, including keeping track of what is placed into data structures and closely monitoring memory usage to detect sudden growth.
We have also seen how the JRockit Memory Leak Detector can be used on a production system to track down memory leaks. The tool uses a three-step approach: first, trend analysis identifies which class of objects is leaking; next, it shows which other classes are associated with the objects of the leaking class; finally, individual objects are examined to see how they relate to each other. It can also record dynamic stack traces for object allocations in the system. These features, together with the tool's tight integration into the JVM, make it possible to track down and fix memory leaks in a safe and powerful way.
Resources
JRockit Tools Download
BEA JRockit 5.0 Description Document
New features and new tools in JRockit 5.0
BEA JRockit DevCenter