A major advantage of Java over C++ is that Java manages memory automatically, which greatly reduces the burden on programmers. However, Java does not eliminate all memory problems: memory leaks still occur, though their cause differs from that of C++ leaks, which is why they are less common. Java's garbage collector traverses the reference chains starting from the program's root objects; any object found to be isolated, not reachable through any reference chain, is treated as garbage and reclaimed. For more information, see the article at http://www-900.ibm.com/developerworks/cn/java/j-leaks/index_eng.shtml.

As code-checking tools have become more and more advanced, finding C++ memory leaks has become much easier. Java memory leaks, however, cannot be found directly by such tools because the mechanism is different; you can only rely on auxiliary tools to detect them, and then locate and fix the problem manually.

What follows is the process of debugging a memory leak in a large software system. I have also written down the failed attempts, because they too are of reference value.

Symptom: the program runs normally after startup, but once a particular module is started, memory consumption grows by about 1 MB per minute and keeps growing until java.lang.OutOfMemoryError is thrown; CPU usage also climbs gradually to 100%.

Judgment: since both the CPU and memory misbehaved, it was impossible to tell which was the primary cause, or whether there were multiple causes. My estimate was that the problem was most likely either a multithreading issue or a memory leak.

Auxiliary tools: since the program was not written by me, I could not directly guess which part of the code was at fault, so auxiliary tools were needed. I usually use the Eclipse plug-in ru.nlmk.eclipse.plugins.profiler (free) and Borland OptimizeIt Suite (commercial). Since I develop in Eclipse, the former is more convenient to use.
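Before the case study, the reachability rule described above can be sketched in a few lines. This is my own minimal illustration, not code from the system being debugged: a long-lived static collection keeps everything it references reachable (and therefore uncollectable), which is exactly the shape a typical Java "leak" takes, while an object referenced only by a cleared local variable becomes eligible for collection.

```java
import java.util.ArrayList;
import java.util.List;

public class ReachabilityLeak {
    // A static field is reachable for the life of the program, so it acts
    // as an anchor: everything it references survives garbage collection.
    static final List<byte[]> CACHE = new ArrayList<>();

    static int fill(int blocks) {
        for (int i = 0; i < blocks; i++) {
            CACHE.add(new byte[1024]);  // still referenced -> never collected
        }
        byte[] temp = new byte[1024];   // referenced only by a local variable
        temp = null;                    // now unreachable -> eligible for GC
        return CACHE.size();
    }

    public static void main(String[] args) {
        System.out.println("blocks kept reachable: " + fill(100));
    }
}
```

The collector never reclaims the blocks in CACHE no matter how useless they are to the program; that is why a Java leak is a logic error (holding references too long) rather than a missing delete as in C++.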
My personal feeling is that the Eclipse plug-in uses somewhat fewer system resources and is a bit stronger at CPU checks, but OptimizeIt is much more powerful for memory checks.

Analysis process: because both the CPU and memory were misbehaving, I could not tell which was the main cause, and two sets of data had to be examined. A memory leak only becomes visible after the program has run for a while, so each check took a lot of time. To reduce interference, I shut down the system's unnecessary threads one by one, stopping every thread that had no effect on the memory growth.

I decided to tackle the memory problem first. Java memory leaks are often related to HashMap, so HashMap might be a breakthrough point; with a leak this large, it seemed likely that some HashMap was holding too many useless objects (if that were the cause, switching to WeakHashMap could solve it). I derived subclasses from HashMap and LinkedHashMap that add a size check, and substituted these subclasses everywhere the system constructs a HashMap or LinkedHashMap. After running, however, I found that only a dozen or so HashMaps were constructed in the whole system, and none of them held many objects. After carefully checking the code, it was clear that none of these HashMaps could account for so much leaked memory. My line of reasoning was broken, and I had to turn to tools to find suspects. I also found that after the CPU had stayed at 100% for a while, the program hung and could no longer run, so the memory problem could not even be observed. I therefore decided to check the CPU first, using the Eclipse profiler plug-in.
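The instrumentation described above can be sketched roughly as follows. The threshold, reporting interval, and message wording are my own assumptions; the original subclasses are not shown in the article.

```java
import java.util.HashMap;

// Sketch of the size-checking subclass described above: a HashMap that
// reports when it grows suspiciously large, so a leaking map stands out.
public class CheckedHashMap<K, V> extends HashMap<K, V> {
    private static final int THRESHOLD = 10_000;  // assumed limit

    @Override
    public V put(K key, V value) {
        V old = super.put(key, value);
        if (size() > THRESHOLD && size() % 1_000 == 0) {
            System.err.println("CheckedHashMap unusually large: "
                    + size() + " entries");
        }
        return old;
    }
}
```

Replacing each `new HashMap<>()` in the system with `new CheckedHashMap<>()` makes a runaway map announce itself at runtime. Had such a map been found, a WeakHashMap, whose keys do not keep their entries reachable, would have been the natural fix. (Note that this sketch only intercepts direct calls to `put`; a more thorough version would also cover bulk insertion paths.)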
Figure 1: Threads view of the Eclipse profiler plug-in

Because the problem only appears after a particular module is started, I disabled the statistics function at startup; this makes the results more meaningful (if everything is sampled, the resource consumption of the initialization phase heavily skews the result) and also speeds up program startup. After the problem module started, I enabled statistics. Clicking a thread in the Threads view shows the running time of each method called by that thread in the Thread Methods view.

Figure 2: Clicking a thread in the Threads view displays its details in the Thread Methods view

Sorting by total time reveals which methods consume the most CPU.

Figure 3: Thread Methods view sorted by total time

At this point, however, I made a very serious mistake. Previously I had filtered out the system classes (java.*, sun.*, and so on), and I forgot to cancel the filter, so the results were very confusing: the most CPU-consuming method (over 30%) contained nothing but a single long-value assignment statement! The filter is set in the profiler's properties.

Figure 4: Runtime attribute settings of the Eclipse profiler plug-in

Once I realized this and reset the filter, the detection results became normal, but they offered no valuable clue: the methods that consumed the most CPU were all in the Java system packages (in hindsight, after the problem was found, this was actually a clue). I therefore suspected that when memory is exhausted quickly, the JVM's memory management occupies a great deal of CPU, which drives up the Java program's CPU usage (I still do not know whether this is true). On the other hand, these detection tools themselves consume a large amount of system resources, which was another main contributor to the 100% CPU usage. So I switched to Borland OptimizeIt Suite to examine the memory problem.
Start the Profiler module in OptimizeIt Suite:

Figure 5: Starting the Profiler module

Figure 6: Adding a new setting on the main Profiler interface

Note that when using Borland OptimizeIt Suite with a main class inside a jar, you must enter all external jar packages the program uses in the classpath; otherwise the program cannot start or run properly. The tool is unfriendly here: when startup fails, the output gives no clear reason, so it is best to get this configuration right first.

Figure 7: Setting the main class and classpath in the Profiler settings

A test generally has to be run several times. When you find that certain packages have nothing to do with the problem, filter them out so that the final results are easier to read.

Figure 8: Setting the filter

Start the program under the Profiler. After the problem module has run, click the rightmost button on the toolbar to mark the memory usage of each object at that moment.

Figure 9: Marking the current memory (the black lines on the red columns in the figure)

Click the exclamation mark on the toolbar and choose what to display; in my view, "Show size" is the most important option.

Figure 10: Choosing what to display

After running for a while, sort by size difference ("size diff"). When an entry shows large memory growth, double-click its row to open the detailed information for that class (the bug has since been fixed, so here I simply clicked an arbitrary class as an example). Expanding the whole tree shows in which methods all instances of the class were allocated (if CPU information is being tracked, a resource allocation tree is displayed as well).

Figure 11: Allocation distribution of all objects of a class

Because the largest memory consumers are usually the basic system types, such as char[] and int[], you must judge manually which classes are truly suspicious. I found that several classes ranked roughly 20th to 30th by newly allocated memory had exactly the same object count, and the counts kept growing in lockstep.
My guess was that this was caused by some thread repeatedly constructing the same set of objects, which made these classes more suspicious than others that grew at irregular rates. So I examined their details carefully and found that they all came from the same method. After reading that code closely, the cause of the memory leak emerged: one line of code was wrong and kept constructing new buttons, and a Vector was holding on to every one of them! The idea behind checking HashMap had been correct; unfortunately I had started from my own programming habits and never considered Vector, otherwise the problem would have been found sooner.

One phenomenon detected at the time remains unexplained: even when nobody was operating the machine, CPU and memory usage jumped at a certain moment.

Figure 12: Sudden change in CPU usage

Figure 13: Sudden change in memory usage
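The bug pattern found above can be reconstructed in miniature. The class and method names here are invented for illustration (the article does not show the faulty code), and a plain Object stands in for the actual button class; the point is only the shape of the leak: a method called repeatedly constructs a fresh object each time instead of reusing one, and a long-lived Vector keeps every old instance reachable.

```java
import java.util.Vector;

public class ButtonLeak {
    // Long-lived collection: everything added here stays reachable.
    final Vector<Object> buttons = new Vector<>();

    // Called repeatedly (e.g. on each refresh). The bug: a new button is
    // constructed every time instead of reusing the existing one, and the
    // Vector silently accumulates all of them.
    void refresh() {
        Object button = new Object();  // stands in for a new GUI button
        buttons.add(button);           // Vector keeps every old button alive
    }

    public static void main(String[] args) {
        ButtonLeak leak = new ButtonLeak();
        for (int i = 0; i < 60; i++) {  // a minute's worth of refreshes
            leak.refresh();
        }
        System.out.println("buttons retained: " + leak.buttons.size());
    }
}
```

Each refresh adds one more permanently reachable object, so memory grows linearly with uptime, matching the roughly 1 MB per minute observed in the symptom. In a heap profiler this shows up exactly as described: several classes (the button and its internal objects) whose instance counts are identical and grow in lockstep.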