Android bug (3): a widely criticized problem, frequent restarts

Source: Internet
Author: User

Anyone who has used Android, especially on knock-off (shanzhai) tablets, will remember how often it restarts. Because its design is complex, Android can slip into an abnormal state without warning. Android therefore includes a watchdog mechanism that restarts the system automatically once it detects such a problem.

Let me describe the problem I ran into. When I first started working with Android, the restart problem was very serious: after a few operations the device would freeze, and about a minute later it would restart. The log looked roughly like this:

W/Watchdog(  813): *** WATCHDOG KILLING SYSTEM PROCESS: com.android.server.am.ActivityManagerService
W/AudioFlinger(  745): power manager service died !!!
I/ServiceManager(  737): service 'input_method' died
I/ServiceManager(  737): service 'textservices' died
I/ServiceManager(  737): service 'uimode' died
I/ServiceManager(  737): service 'vibrator' died
I/ServiceManager(  737): service 'battery' died
I/ServiceManager(  737): service 'permission' died
I/ServiceManager(  737): service 'cpuinfo' died

According to this log, the problem lies in ActivityManagerService. But what exactly is wrong? Following the Watchdog's reboot mechanism shows that it works by checking whether the locks of the various system services can still be acquired. (I will not go into the details here; for more information, see Deng Fanping's "In-Depth Understanding of Android: Volume 1", which is quite a good book.) When a deadlock occurs, the Watchdog kills the system_server process so that the Android framework restarts and the system can keep working.
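The lock-probing idea can be illustrated with a minimal sketch. This is not the Android Watchdog source: the class LockWatchdog and its methods isResponsive and holdLock are hypothetical names invented for this example. A probe thread tries to enter the monitor, and if it has not got in when the timeout expires, the lock is treated as stuck.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified illustration of a lock-probing watchdog (not Android's code). */
final class LockWatchdog {

    /**
     * Returns true if a probe thread acquired the monitor of {@code lock}
     * within {@code timeoutMs} milliseconds, false otherwise.
     */
    static boolean isResponsive(Object lock, long timeoutMs) {
        CountDownLatch acquired = new CountDownLatch(1);
        Thread probe = new Thread(() -> {
            synchronized (lock) {        // blocks while another thread holds the monitor
                acquired.countDown();
            }
        }, "watchdog-probe");
        probe.setDaemon(true);           // a stuck probe must not keep the JVM alive
        probe.start();
        try {
            return acquired.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Demo helper: holds {@code lock} on a daemon thread for {@code holdMs} ms; returns once it is held. */
    static void holdLock(Object lock, long holdMs) {
        CountDownLatch held = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                held.countDown();
                try {
                    Thread.sleep(holdMs);
                } catch (InterruptedException ignored) {
                    // demo thread: nothing to clean up
                }
            }
        }, "lock-holder");
        holder.setDaemon(true);
        holder.start();
        try {
            held.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Object free = new Object();
        Object busy = new Object();
        holdLock(busy, 5_000);
        System.out.println("free lock responsive: " + isResponsive(free, 500));
        System.out.println("busy lock responsive: " + isResponsive(busy, 500));
        // A real watchdog would kill the process here so that the framework can restart.
    }
}
```

The probe only reports that a lock could not be taken in time. It cannot tell a true deadlock from a lock that is merely held too long, which is why the thread dump is still needed to find the cause.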

This problem left me confused for a long time at first. Android's complicated architecture and massive source code made it quite painful to debug. Fortunately, the debugging methods and tools that Android provides are fairly complete. The log showed that an ANR trace was generated before the Watchdog killed the process, so let's start the analysis there.

When I first got the ANR trace, I still had no clue; I took it for nothing more than a dump of call stacks. On closer inspection, I found useful information hidden in the stack frames. Take the following stack frame, for example:

----- pid 861 at 2012-02-11 14:57:50 -----
Cmd line: system_server

DALVIK THREADS:
(mutexes: tll=0 tsl=0 tscl=0 ghl=0)
"main" prio=5 tid=1 MONITOR
  | group="main" sCount=1 dsCount=0 obj=0x2ba9c460 self=0x8e820
  | sysTid=861 nice=0 sched=0/0 cgrp=[fopen-error:2] handle=716342112
  | schedstat=( 0 0 0 ) utm=464 stm=65 core=0
  at com.android.server.am.ActivityManagerService.isUserAMonkey(ActivityManagerService.java:~6546)
  - waiting to lock <0x2c1141c8> (a com.android.server.am.ActivityManagerService) held by tid=59 (Binder Thread #6)
  at android.app.ActivityManagerNative.onTransact(ActivityManagerNative.java:1273)
  at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:1545)
  at android.os.Binder.execTransact(Binder.java:338)
  at com.android.server.SystemServer.init1(Native Method)
  at com.android.server.SystemServer.main(SystemServer.java:808)
  at java.lang.reflect.Method.invokeNative(Native Method)
  at java.lang.reflect.Method.invoke(Method.java:511)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:784)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:551)
  at dalvik.system.NativeStart.main(Native Method)

What does this mean? Look at the "waiting to lock" line above. It shows that the main thread is waiting to lock object 0x2c1141c8 (usually through a synchronized block; the object is an instance of com.android.server.am.ActivityManagerService), but the lock is held by tid=59. Let's look at the stack frame of tid=59:

"Binder Thread #6" prio=5 tid=59 MONITOR
  | group="main" sCount=1 dsCount=0 obj=0x2c3bd838 self=0x34c5d8
  | sysTid=1120 nice=0 sched=0/0 cgrp=[fopen-error:2] handle=3460688
  | schedstat=( 0 0 0 ) utm=168 stm=48 core=0
  at com.android.server.am.BatteryStatsService.noteStopWakelock(BatteryStatsService.java:~114)
  - waiting to lock <0x2c117d50> (a com.android.internal.os.BatteryStatsImpl) held by tid=13 (ProcessStats)
  at com.android.server.PowerManagerService.noteStopWakeLocked(PowerManagerService.java:798)
  at com.android.server.PowerManagerService.releaseWakeLockLocked(PowerManagerService.java:1015)
  at com.android.server.PowerManagerService.releaseWakeLock(PowerManagerService.java:967)
  at android.os.PowerManager$WakeLock.release(PowerManager.java:319)
  at android.os.PowerManager$WakeLock.release(PowerManager.java:300)
  at com.android.server.am.ActivityStack.activityIdleInternal(ActivityStack.java:3254)
  at com.android.server.am.ActivityManagerService.activityIdle(ActivityManagerService.java:3953)
  at android.app.ActivityManagerNative.onTransact(ActivityManagerNative.java:362)
  at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:1545)
  at android.os.Binder.execTransact(Binder.java:338)
  at dalvik.system.NativeStart.run(Native Method)

Why has tid=59 not released lock object 0x2c1141c8? Because it is itself waiting for lock object 0x2c117d50 (an instance of com.android.internal.os.BatteryStatsImpl)! If you have plenty of bug-hunting experience, the pattern will be familiar: a thread that requests a second lock while still holding the first is a likely sign of deadlock.

Now let's look at tid=13, which holds the requested lock:

"ProcessStats" prio=5 tid=13 MONITOR
  | group="main" sCount=1 dsCount=0 obj=0x2c146f58 self=0x2954f0
  | sysTid=877 nice=0 sched=0/0 cgrp=[fopen-error:2] handle=2709096
  | schedstat=( 0 0 0 ) utm=6 stm=4 core=0
  at com.android.server.am.ActivityManagerService.broadcastIntent(ActivityManagerService.java:~12430)
  - waiting to lock <0x2c1141c8> (a com.android.server.am.ActivityManagerService) held by tid=59 (Binder Thread #6)
  at android.app.ContextImpl.sendBroadcast(ContextImpl.java:909)
  at com.android.server.DropBoxManagerService.add(DropBoxManagerService.java:236)
  at android.os.DropBoxManager.addText(DropBoxManager.java:272)
  at com.android.server.am.ActivityManagerService$11.run(ActivityManagerService.java:7630)
  at com.android.server.am.ActivityManagerService.addErrorToDropBox(ActivityManagerService.java:7635)
  at com.android.server.am.ActivityManagerService.handleApplicationWtf(ActivityManagerService.java:7448)
  at com.android.internal.os.RuntimeInit.wtf(RuntimeInit.java:345)
  at android.util.Log$1.onTerribleFailure(Log.java:103)
  at android.util.Log.wtf(Log.java:278)
  at com.android.internal.os.BatteryStatsImpl.getNetworkStatsDetailGroupedByUid(BatteryStatsImpl.java:5738)
  at com.android.internal.os.BatteryStatsImpl.access$100(BatteryStatsImpl.java:76)
  at com.android.internal.os.BatteryStatsImpl$Uid.computeCurrentTcpBytesReceived(BatteryStatsImpl.java:2457)
  at com.android.internal.os.BatteryStatsImpl$Uid.getTcpBytesReceived(BatteryStatsImpl.java:2446)
  at com.android.internal.os.BatteryStatsImpl.writeSummaryToParcel(BatteryStatsImpl.java:5437)
  at com.android.internal.os.BatteryStatsImpl.writeLocked(BatteryStatsImpl.java:4836)
  at com.android.internal.os.BatteryStatsImpl.writeAsyncLocked(BatteryStatsImpl.java:4818)
  at com.android.server.am.ActivityManagerService.updateCpuStatsNow(ActivityManagerService.java:1649)
  at com.android.server.am.ActivityManagerService$3.run(ActivityManagerService.java:1531)

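OK. Here again a thread holds one lock while requesting another: tid=13 holds the BatteryStatsImpl lock (0x2c117d50) and requests the ActivityManagerService lock (0x2c1141c8), which is held by tid=59. So tid=59 and tid=13 are deadlocked, and the main thread is stuck behind them.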
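The manual walk through the three stack frames can also be automated. The following is a minimal sketch, assuming the dump uses exactly the line format shown in this article; the class TraceDeadlockFinder and its methods are hypothetical helpers, not part of any Android tool. It reads each "waiting to lock ... held by tid=N" line, records which thread waits for which, and follows those links until a thread repeats.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds a lock-wait cycle in a Dalvik-style thread dump. */
final class TraceDeadlockFinder {
    // Thread header line, e.g.: "Binder Thread #6" prio=5 tid=59 MONITOR
    private static final Pattern THREAD =
            Pattern.compile("^\"[^\"]*\"\\s+prio=\\d+\\s+tid=(\\d+)");
    // Blocked line, e.g.: - waiting to lock <0x2c117d50> (a ...) held by tid=13 (ProcessStats)
    private static final Pattern WAIT =
            Pattern.compile("waiting to lock <0x[0-9a-fA-F]+>.*?held by tid=(\\d+)");

    /** Abbreviated excerpt of the dump discussed in the article. */
    static final String SAMPLE =
            "\"main\" prio=5 tid=1 MONITOR\n"
            + "  - waiting to lock <0x2c1141c8> (a com.android.server.am.ActivityManagerService) held by tid=59 (Binder Thread #6)\n"
            + "\"Binder Thread #6\" prio=5 tid=59 MONITOR\n"
            + "  - waiting to lock <0x2c117d50> (a com.android.internal.os.BatteryStatsImpl) held by tid=13 (ProcessStats)\n"
            + "\"ProcessStats\" prio=5 tid=13 MONITOR\n"
            + "  - waiting to lock <0x2c1141c8> (a com.android.server.am.ActivityManagerService) held by tid=59 (Binder Thread #6)\n";

    /** Maps each blocked thread's tid to the tid of the thread holding the lock it wants. */
    static Map<Integer, Integer> waitsFor(String dump) {
        Map<Integer, Integer> edges = new TreeMap<>();
        Integer current = null;                       // tid of the thread section being read
        for (String line : dump.split("\n")) {
            Matcher header = THREAD.matcher(line.trim());
            if (header.find()) {
                current = Integer.valueOf(header.group(1));
            }
            Matcher wait = WAIT.matcher(line);
            if (wait.find() && current != null) {
                edges.put(current, Integer.valueOf(wait.group(1)));
            }
        }
        return edges;
    }

    /** Follows the wait-for links; returns the tids forming a cycle, or an empty list. */
    static List<Integer> findCycle(Map<Integer, Integer> edges) {
        for (Integer start : edges.keySet()) {
            List<Integer> path = new ArrayList<>();
            Integer cur = start;
            while (cur != null && !path.contains(cur)) {
                path.add(cur);
                cur = edges.get(cur);
            }
            if (cur != null) {                        // came back to a thread already on the path
                return new ArrayList<>(path.subList(path.indexOf(cur), path.size()));
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        Map<Integer, Integer> edges = waitsFor(SAMPLE);
        System.out.println("deadlock cycle (tids): " + findCycle(edges));  // prints [59, 13]
    }
}
```

The main thread (tid=1) does not appear in the cycle: it is only a victim waiting on a lock inside the cycle, which matches the reading of the trace above.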

The problem has been found. How can it be solved? The cause is actually not complicated. A careful look at the failing stack reveals the pattern: the deadlock arises when the system calls Log.wtf() to record an error. WTF is short for "What a Terrible Failure", meaning that the system has hit a serious error. In this case, the Log.wtf() call (made from BatteryStatsImpl.getNetworkStatsDetailGroupedByUid in the trace above) was traced to a kernel that was too old and did not support netfilter.

By itself this does not prove an Android bug, but take a closer look: Log.wtf() eventually calls com.android.server.am.ActivityManagerService.broadcastIntent, which needs the lock of com.android.server.am.ActivityManagerService. If any code (whether Android's own or written by later developers) carelessly catches an exception somewhere and calls Log.wtf() while holding another lock, the system can restart, and the intended add-to-DropBox function does not work either. It seems that Android did not take this function seriously when designing and testing it.
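The lock-order inversion behind this can be reproduced in plain Java. The sketch below is only an illustration under stated assumptions: amsLock and statsLock are stand-ins for the two monitors in the trace, the class LockOrderDemo is hypothetical, and the detection uses the standard JVM ThreadMXBean rather than anything from Android. One thread takes amsLock then statsLock, the other takes statsLock then amsLock, and a latch ensures that each holds its first lock before asking for the second.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

/** Hypothetical demo of the lock-order inversion seen in the trace (plain JVM, not Android). */
final class LockOrderDemo {

    /**
     * Starts two daemon threads that take two locks in opposite order and waits up to
     * {@code waitMs} ms for the JVM to report them as deadlocked.
     * Returns the number of monitor-deadlocked threads found, or 0 if none was seen in time.
     */
    static int reproduce(long waitMs) {
        final Object amsLock = new Object();     // stand-in for the ActivityManagerService monitor
        final Object statsLock = new Object();   // stand-in for the BatteryStatsImpl monitor
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);

        Thread binderLike = new Thread(() -> {
            synchronized (amsLock) {
                awaitBoth(bothHoldFirst);
                synchronized (statsLock) {
                    // never reached: statsLock is held by the other thread
                }
            }
        }, "binder-like");
        Thread statsLike = new Thread(() -> {
            synchronized (statsLock) {
                awaitBoth(bothHoldFirst);
                synchronized (amsLock) {
                    // never reached: amsLock is held by the other thread
                }
            }
        }, "stats-like");
        binderLike.setDaemon(true);              // daemon threads let the JVM exit despite the deadlock
        statsLike.setDaemon(true);
        binderLike.start();
        statsLike.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + waitMs;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = threads.findMonitorDeadlockedThreads();  // null when no deadlock exists
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    /** Signals that the caller holds its first lock, then waits until the other thread does too. */
    private static void awaitBoth(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + reproduce(5_000));
    }
}
```

Either thread is harmless on its own. The hang appears only when both orders are in flight at once, which is why a rarely executed error path such as Log.wtf() can hide this kind of bug for a long time.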

Solution: either carefully remove the Log.wtf() calls, or simply comment out the addErrorToDropBox call in the handleApplicationWtf method of ActivityManagerService.java. That function does not do its job well anyway, and it only generates debugging information, which matters little for a product.

Note that this is only one cause of the restart problem. Because Android is open source, everyone changes or adds code, and a little carelessness with locking is enough to make the system restart. In my experience of debugging serious problems, many of them are caused by this kind of multi-thread synchronization or state-machine issue.

To summarize, one of the main causes of Android restarts is that a lock monitored by the system is deadlocked or held for too long. This article covers only one issue in stock Android. In actual development, customization and hardware problems (such as the GPU) cause many more bugs involving deadlocks or locks held for too long; the method shown here can be used to find and fix them and so stabilize the system. Even now, Android's restart problem still draws complaints.

 

 
