Android ANR Monitoring and Analysis
Reprinted from: http://www.10tiao.com/html/203/201609/2649752287/1.html
Application Not Responding (ANR): the system detects that the app has not responded for a long time. Although an ANR is not an exception, it seriously degrades the user experience, so ANRs need to be reported and resolved.
ANR Trigger Conditions
The necessary condition for triggering an ANR is that the main thread is blocked.
There are three types:
The main thread does not finish processing an input event within 5 s;
A service is blocked for 20 s;
A foreground broadcast is blocked for 10 s, or a background broadcast is blocked for 60 s.
In practice, ANRs are usually triggered by the first condition.
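For reference in monitoring code, the timeouts listed above can be kept as named constants. This is only a sketch: the enum name AnrTrigger and the method exceededBy are hypothetical, and the values are simply the ones quoted in this article.

```java
import java.time.Duration;

/** Timeouts quoted in the list above, as named constants (names are hypothetical). */
enum AnrTrigger {
    INPUT_EVENT(Duration.ofSeconds(5)),
    SERVICE(Duration.ofSeconds(20)),
    FOREGROUND_BROADCAST(Duration.ofSeconds(10)),
    BACKGROUND_BROADCAST(Duration.ofSeconds(60));

    final Duration timeout;

    AnrTrigger(Duration timeout) {
        this.timeout = timeout;
    }

    /** True if the main thread has been blocked at least as long as this timeout. */
    boolean exceededBy(Duration blocked) {
        return blocked.compareTo(timeout) >= 0;
    }
}
```

For example, AnrTrigger.INPUT_EVENT.exceededBy(Duration.ofSeconds(6)) is true, while the same 6 s block does not reach the service timeout.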
ANR Execution Process
Understanding the ANR execution process helps us to develop ANR monitoring policies and obtain ANR information. The steps for ANR execution are as follows:
The ANR is captured by the system;
The system sends Linux signal 3 (SIGQUIT) to the current process and then to the other running processes in sequence;
Each process receives the signal and writes its process information to /data/anr/traces.txt;
The ANR information is printed to the log;
The process enters the ANR state (at this point, the ANR information of the process can be obtained);
The ANR prompt box is displayed;
The prompt box disappears and the process returns to normal.
Because writing to the /data/anr/traces.txt file takes a long time, the interval between an input ANR being triggered and the ANR prompt box appearing is usually about 10 s (the exact time varies between ROMs).
ANR Monitoring Mechanism
Unlike a Java exception, an ANR does not provide the app with a callback interface. We therefore need to actively monitor for ANRs and obtain the ANR information ourselves.
1. FileObserver Monitoring Mechanism
Monitoring method: use FileObserver to watch for file changes in the /data/anr folder to determine that an ANR has occurred.
Obtain information: main thread stack information + ANR information.
Advantage: it is based on the underlying Linux notification mechanism (inotify), so there is no additional performance cost.
Disadvantage: on most devices running Android 5.0 or later, the folder cannot be monitored.
When an ANR occurs, multiple processes write to the /data/anr/traces.txt file in sequence, so the onEvent method is called back multiple times. The internal logic of the method is therefore executed only on the first callback and skipped on later ones.
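The "handle only the first callback" rule can be sketched with an atomic flag. This is plain Java rather than Android code: the class FirstEventGuard is hypothetical, and its onEvent method stands in for the body of FileObserver.onEvent. The reset method is an added assumption so that the guard can be re-armed after one ANR has been reported.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Runs a handler only for the first of a burst of file-change callbacks (hypothetical helper). */
final class FirstEventGuard {
    private final AtomicBoolean handled = new AtomicBoolean(false);
    private final Runnable onFirstEvent;

    FirstEventGuard(Runnable onFirstEvent) {
        this.onFirstEvent = onFirstEvent;
    }

    /** Call from every callback; returns true only for the call that ran the handler. */
    boolean onEvent() {
        if (!handled.compareAndSet(false, true)) {
            return false; // a previous callback already started handling this ANR
        }
        onFirstEvent.run();
        return true;
    }

    /** Re-arm the guard once the current ANR has been reported. */
    void reset() {
        handled.set(false);
    }
}
```

compareAndSet makes the check and the flag update a single atomic step, so callbacks arriving close together cannot both pass the check.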
FileObserver monitoring process:
When the file change is detected, the process has not yet entered the ANR state, because the process information is written to the /data/anr/traces.txt file first. The ANR information of the process therefore cannot be obtained immediately; the monitor has to poll in a loop until the process enters the ANR state.
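The wait loop described above can be sketched as a bounded poll. This is a plain Java sketch with hypothetical names (AnrStatePoller, pollUntilPresent); the probe is supplied by the caller. On Android, one plausible probe would wrap ActivityManager.getProcessesInErrorState() and look for an entry in the NOT_RESPONDING condition, but that is an assumption, not something this article states.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Polls a probe until it yields a value or the attempts run out (hypothetical helper). */
final class AnrStatePoller {
    static <T> Optional<T> pollUntilPresent(Supplier<Optional<T>> probe,
                                            int maxAttempts,
                                            long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Optional<T> result = probe.get();
            if (result.isPresent()) {
                return result; // the process has entered the ANR state
            }
            try {
                Thread.sleep(sleepMillis); // wait before asking again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return Optional.empty();
            }
        }
        return Optional.empty(); // no ANR state observed: probably not our ANR
    }
}
```

Bounding the number of attempts matters because the change in /data/anr may have been caused by an ANR in another process, in which case our process never enters the ANR state.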
ANR Information Acquisition Process:
Obtain the main thread stack information:
Thread mainThread = Looper.getMainLooper().getThread();
StackTraceElement[] mainStackTrace = mainThread.getStackTrace();
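Once the stack has been captured, it usually has to be turned into text before it can be reported. A minimal sketch in plain Java follows; the class StackFormatter and its output layout are hypothetical, and on Android the thread passed in would be the mainThread obtained above.

```java
/** Renders a thread's current stack as text for a report (hypothetical helper). */
final class StackFormatter {
    static String format(Thread thread) {
        StringBuilder sb = new StringBuilder();
        sb.append('"').append(thread.getName()).append('"')
          .append(" state=").append(thread.getState()).append('\n');
        // Index 0 is the top of the stack, i.e. the method currently executing.
        for (StackTraceElement frame : thread.getStackTrace()) {
            sb.append("    at ").append(frame).append('\n');
        }
        return sb.toString();
    }
}
```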
2. WatchDog Monitoring Mechanism
Monitoring method: a child thread sends a message to the main thread every 5 s to determine whether the main thread is blocked. If it is blocked, an ANR is assumed to have occurred.
Obtain information: main thread stack information + ANR information.
Advantage: the ANR of any model can be monitored and won't become invalid due to some system changes.
Disadvantage: it adds some overhead (with little impact).
Keep a local variable and a global variable. When monitoring starts, the two variables are equal, and the main thread is asked to perform a +1 operation on the global variable. After 5 s, check whether the global variable still equals the local variable; if it does, the main thread has been blocked for 5 s. The main thread is then asked to perform the +1 operation again. Because the main thread was blocked during the previous 5 s, the process may be in an ANR, so the ANR information is polled continuously during the next 5 s. If no ANR information is obtained within those 5 s, the watchdog goes back to checking whether the global variable has been incremented. (The ANR information is polled continuously in order to confirm whether an ANR has actually occurred.)
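The counter comparison described above can be sketched in plain Java. A single-thread ExecutorService stands in for the Android main thread (on a device the +1 task would be posted to the main Looper instead), and the class name WatchdogSketch and its methods are hypothetical. The sketch covers only the blocked/not-blocked check for one interval, not the follow-up polling for ANR information.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicInteger;

/** One watchdog interval: post a +1 task, wait, and see whether it ran (hypothetical sketch). */
final class WatchdogSketch {
    private final AtomicInteger tick = new AtomicInteger(); // the "global variable"
    private final ExecutorService mainThread;                // stand-in for the main thread
    private final long intervalMillis;

    WatchdogSketch(ExecutorService mainThread, long intervalMillis) {
        this.mainThread = mainThread;
        this.intervalMillis = intervalMillis;
    }

    /** Returns true if the +1 task did not run during a whole interval. */
    boolean checkOnce() {
        int before = tick.get();                   // the "local variable"
        mainThread.execute(tick::incrementAndGet); // ask the main thread to do +1
        try {
            Thread.sleep(intervalMillis);          // wait one monitoring interval
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;                          // interrupted: report nothing
        }
        return tick.get() == before;               // unchanged means blocked
    }
}
```

A monitoring thread would call checkOnce in a loop and, whenever it returns true, start trying to obtain the ANR information as described above.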
WatchDog monitoring process:
WatchDog monitoring cycle:
If the monitoring interval is set to 5 s, the monitoring cycle is 5 s to 10 s.
In the example, green indicates that the main thread is idle and red indicates that it is blocked. At 0 s a message is sent to the main thread; the main thread is not blocked at that moment and performs the +1 on the global variable. At 5 s the global variable has been incremented, so WatchDog does not consider the main thread blocked and sends another message to it. At 9 s the main thread becomes idle and performs the +1; at 10 s the +1 has completed, so WatchDog again does not consider the main thread blocked.
Observation: a WatchDog with a 5 s monitoring interval fails to detect that the main thread was blocked continuously for 8 s.
Conclusion: WatchDog detects a block only if the main thread stays blocked for an entire monitoring interval.
With a 5 s monitoring interval, the probability that WatchDog detects an 8 s block of the main thread is (8-5)/5 * 100% = 60%, and the probability of detecting a block of 10 s or longer is 100%.
Because a block of the main thread usually persists until the ANR prompt box disappears, which takes about 10 s, the monitoring interval is set to 5 s.
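The probability figure above can be written as a small function. This sketch generalizes the article's two data points under an assumption the article does not spell out: the block starts at a uniformly random moment relative to the watchdog ticks. Under that assumption a block shorter than one interval is never detected and a block of two intervals or more is always detected. The class and method names are hypothetical.

```java
/** Detection probability of a block for a given watchdog interval (hypothetical helper). */
final class DetectionProbability {
    static double of(double blockSeconds, double intervalSeconds) {
        if (intervalSeconds <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        // (block - interval) / interval, clamped to the range [0, 1].
        double p = (blockSeconds - intervalSeconds) / intervalSeconds;
        return Math.max(0.0, Math.min(1.0, p));
    }
}
```

With a 5 s interval this gives 0.6 for an 8 s block and 1.0 for a 10 s block, matching the figures in the text.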
3. Hybrid Monitoring
Combining the advantages of the two monitoring mechanisms above, we use the following scheme to monitor ANRs on all device models:
FileObserver + WatchDog
On versions earlier than Android 5.0, use FileObserver;
On Android 5.0 and later, first determine whether events in the /data/anr folder can be monitored:
A. If they can, use FileObserver;
B. If they cannot, use WatchDog.
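The selection rule can be sketched as a pure function. The names AnrMonitorSelector, Strategy and choose are hypothetical. Android 5.0 corresponds to API level 21, and this sketch treats 5.0 itself as belonging to the "check first" branch. How to test whether the folder can be observed is left to the caller, since the article does not describe it.

```java
/** Chooses a monitoring strategy from the API level and a capability check (hypothetical helper). */
final class AnrMonitorSelector {
    enum Strategy { FILE_OBSERVER, WATCHDOG }

    /** API level of Android 5.0. */
    static final int ANDROID_5_0_API_LEVEL = 21;

    static Strategy choose(int sdkInt, boolean anrFolderObservable) {
        if (sdkInt < ANDROID_5_0_API_LEVEL) {
            return Strategy.FILE_OBSERVER; // older versions: FileObserver is used directly
        }
        // Android 5.0 and later: fall back to WatchDog when the folder cannot be observed.
        return anrFolderObservable ? Strategy.FILE_OBSERVER : Strategy.WATCHDOG;
    }
}
```

On a device, sdkInt would come from Build.VERSION.SDK_INT.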
Log Analysis
Analysis of ANR information and main thread stack information
1. ANR Information
The process and component in which the ANR occurred;
The ANR type (on Android 5.0 and later, this also includes the wait queue length and the wait queue head age);
CPU information (from which you can determine whether the ANR was caused by a performance problem)
CPU load average (1 minute, 5 minutes, 15 minutes): the average number of processes using or waiting for the CPU over each period;
CPU usage before and after the ANR: CPU time used / total time x 100%;
User: user running time
Kernel: kernel running time
Iowait: Time to wait for I/O operations
Irq: CPU hard interrupt time
Softirq: CPU soft interrupt time
Faults: minor (minor page faults) and major (major page faults)
When the kernel reads data, it first looks in the CPU cache and in physical memory. If the data is not found there, a major page fault (MPF) occurs and the data is loaded from disk into memory; if the data is already in memory when it is read again, only a minor page fault (MnPF) occurs. The number of major faults can therefore be used to infer whether the process was performing disk I/O.
If the average CPU usage is close to 100%, the ANR may be caused by insufficient CPU resources;
If the CPU usage of a single process is very high, that process may be consuming so much CPU that it causes the ANR in our process.
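As an illustration of the CPU check, the sketch below pulls the percentages out of a usage summary line and tests the total against a threshold. The line format used here ("100% TOTAL: 52% user + 45% kernel + ...") is an assumption modeled on the summary line commonly seen in ANR logs and may differ between Android versions. The class name, method names and threshold are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts "<number>% <label>" pairs from a CPU usage summary line (hypothetical helper). */
final class CpuSummaryParser {
    private static final Pattern PART =
            Pattern.compile("([0-9]+(?:\\.[0-9]+)?)%\\s+([A-Za-z]+)");

    /** Returns label -> percentage, with labels lower-cased, in order of appearance. */
    static Map<String, Double> parse(String line) {
        Map<String, Double> usage = new LinkedHashMap<>();
        Matcher m = PART.matcher(line);
        while (m.find()) {
            usage.put(m.group(2).toLowerCase(Locale.ROOT), Double.parseDouble(m.group(1)));
        }
        return usage;
    }

    /** True if the "total" figure is at or above the given threshold (in percent). */
    static boolean looksCpuStarved(Map<String, Double> usage, double thresholdPercent) {
        return usage.getOrDefault("total", 0.0) >= thresholdPercent;
    }
}
```

A high iowait figure in the same map, together with a large number of major faults, would point toward disk I/O rather than pure CPU contention.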
2. Main thread stack information
The bottom of the main thread stack holds the functions called when the process started; ActivityThread.main is the entry point of an Android program. The top of the stack is the function that the main thread is currently executing.
Working from the bottom up through the main thread stack, we can find our own program code and check whether it performs time-consuming operations.
If none of our code appears in the stack, start from the ActivityThread.main method and follow the calls step by step toward the top of the stack. Analyze the Framework layer source code to determine what the main thread is currently doing, then look in our code for what could have caused that state.
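The bottom-up search for our own code can be sketched as a filter over the captured frames. The class AppFrameFinder and the package prefix passed in are hypothetical. Note that getStackTrace() returns the top of the stack at index 0, so walking bottom-up means iterating from the end of the array.

```java
import java.util.ArrayList;
import java.util.List;

/** Picks out the frames that belong to the app's own packages (hypothetical helper). */
final class AppFrameFinder {
    static List<StackTraceElement> appFrames(StackTraceElement[] stack, String appPackagePrefix) {
        List<StackTraceElement> frames = new ArrayList<>();
        // Walk from the bottom of the stack (last index) toward the top (index 0).
        for (int i = stack.length - 1; i >= 0; i--) {
            if (stack[i].getClassName().startsWith(appPackagePrefix)) {
                frames.add(stack[i]);
            }
        }
        return frames;
    }
}
```

An empty result corresponds to the second case above, where only Framework frames are present and the Framework source has to be read instead.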
How to Avoid ANR
1. Main thread
Avoid time-consuming operations on the main thread, such as network access, database access, and heavy computation;
The main thread should not use wait() or Thread.sleep() to wait for a worker thread. Use AsyncTask, Handler, or the Bolts framework instead, and notify the main thread when the worker thread finishes;
When IntentService is used, its work is executed on a worker thread;
Lower the priority of worker threads so that the main thread has more opportunities to use CPU resources.
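The "do the work elsewhere and notify the main thread" pattern can be sketched in plain Java with CompletableFuture. This swaps in a standard-library mechanism for the AsyncTask, Handler and Bolts options named above. The class OffMainThread and its run method are hypothetical, and the mainThread executor is a stand-in that on Android would typically be backed by a Handler on the main Looper.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Runs slow work on a worker executor and delivers the result on another executor (hypothetical helper). */
final class OffMainThread {
    static <T> CompletableFuture<Void> run(Supplier<T> slowWork,
                                           Executor worker,
                                           Executor mainThread,
                                           Consumer<T> onDone) {
        return CompletableFuture
                .supplyAsync(slowWork, worker)         // slow work never touches the main thread
                .thenAcceptAsync(onDone, mainThread);  // result is handed back when it is ready
    }
}
```

The caller does not block: it supplies a callback and returns immediately, which is what keeps the main thread free to process input events.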
2. Lag
The time that people perceive is 200 ms to ms.
If the application is performing a time-consuming task in the background, you can use ProgressBar or ProgressDialog to show the user the progress of the work;
If the initialization process of an application is time-consuming, you can display a Splash Activity during initialization, or you can display the main interface before asynchronously loading the initialization data.
QACR Platform
The QACR platform supports monitoring of Java exceptions, native exceptions, ANRs, and lag in Android clients, and reports them to the QACR exception statistics platform in a timely manner. The platform manages exceptions and provides multidimensional statistics. QACR provides two environments, release and beta, so that developers can manage and collect statistics on exceptions from online packages and from development and test packages.