First, problem symptoms
1. What the user sees first is a system ANR.
2. After the ANR, the system restarts.
Test method:
On the recording screen, slide the volume progress bar back and forth continuously while calling the test device from a landline phone, without answering the call. The interface freezes, the ANR dialog pops up, and then the system restarts.
Platform: mt6732
Android version: 4.4.4 KK
Build type: user
System software version: SWA3A+UMA0
System RAM: 1 GB
Problem probability: ≈2%
Reference machine behavior:
1. Low-probability problem; no reference machine behavior.
Second, the solution
Through the preliminary and in-depth analysis (the specific analysis process, key code, and logs are attached below), we know exactly why the problem occurred:
1. The AudioService setIndex, getIndex, and related methods are called from more than one thread.
2. These methods are declared synchronized, so each call locks the instance it is invoked on.
3. Declaring an entire method with the synchronized keyword is a lazy approach that makes the lock granularity too coarse. A critical section that is not fine-grained reduces throughput when many threads run concurrently, and it increases the chance of deadlock when methods that depend on each other are called in an interleaved order.
With the current code there is a small probability (depending on the exact timing of thread scheduling) of the following sequence. Thread1 is scheduled first and enters a synchronized method of object A. Thread2 is then scheduled and enters a synchronized method of object B. Inside object B's synchronized method, Thread2 tries to call a synchronized method of object A and blocks. Thread1 is scheduled again, continues executing the code in object A's synchronized method, and then calls a synchronized method of object B. Because Thread2 already holds object B's lock, Thread1 blocks as well. Thread1 and Thread2 are now each waiting for the other to release its lock, and they wait indefinitely. None of the code paths that depend on them can run: this is a deadlock.
Given the complex logic in AudioService, we need to fix this problem with a minimal-risk change, so the scheme given here does not change much. It is also fairly clear that the synchronized keywords on this problematic AOSP code leave some room for optimization.
Ultimately, for the root cause described above, we offer the following solution:
1. Replace the type of synchronization lock
In the critical code sections that require synchronization, the class's global lock is used instead of each instance's own lock, which ensures that multiple threads do not deadlock when their calls are interleaved.
2. The specific code and backtrace related to the solution
The above is the backtrace of the call stacks waiting on the locks when the deadlock occurs, together with the corresponding code. The parts circled in red show the key call relationships and the state when the problem occurs.
3. The final code modification
Third, preliminary analysis of the problem
Taking a typical backtrace and log from Alto4.5tmo when the problem occurred, we found that the main thread of SystemServer was blocked in a function inside AudioService, causing the ANR and the SWT restart. The backtrace is as follows:
Why does it block? Looking at the corresponding code, we find that this method is synchronized and that, when a certain condition is met, it traverses and invokes the synchronized method of other instances of the same type. For the block to occur, one condition must therefore hold: the synchronized method of a different instance of the same type cannot be entered, that is, another thread is already inside a synchronized method of that instance.
Following this clue, we looked at the call stacks of the AudioService-related threads in SystemServer and found the thread binder_2. Its backtrace is as follows:
From the backtrace and the corresponding code we find that the binder_2 thread is also inside a synchronized function of AudioService. In that same function, the synchronized method of different instances of the same type is called when certain conditions are met.
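The pattern described in this section can be sketched in plain Java. This is only an illustration under assumptions: the class name StreamStateSketch, the link method, and the mutual linking of the two instances are hypothetical and are not taken from the AOSP source. The point it shows is the lock order: a call on one instance locks that instance first and then, still holding it, locks the linked instance.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-stream state object; not the AOSP class.
class StreamStateSketch {
    private final List<StreamStateSketch> linked = new ArrayList<>();
    private int index;

    // Registers another instance whose index should follow this one.
    synchronized void link(StreamStateSketch other) {
        linked.add(other);
    }

    // Locks "this", then calls the synchronized setIndex of every linked
    // instance while still holding the lock on "this".
    synchronized void setIndex(int value) {
        if (index == value) {
            return; // nothing to do; also stops the mutual propagation
        }
        index = value;
        for (StreamStateSketch other : linked) {
            other.setIndex(value); // nested acquisition of other's monitor
        }
    }

    synchronized int getIndex() {
        return index;
    }
}
```

With two mutually linked instances a and b, a.setIndex takes the locks in the order a then b, while b.setIndex takes them in the order b then a. On a single thread this works, but two threads making those calls at the same time can each end up holding one lock and waiting for the other.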
Fourth, in-depth analysis of the problem
The preliminary analysis identified the first problem point: two different threads are blocked in synchronized methods of the same type. This raises one further question, which we analyze below to find the answer and the root cause.
1. Why would two threads block at the same time?
Further analysis of the code shows that both threads are executing synchronized methods. If scheduling and execution order make them depend on each other, both will block and deadlock. A backtrace shows only the call relationships; it cannot tell us the state of the individual object instances at runtime. We therefore wrote code that simulates the state of the two SystemServer threads seen in the backtrace, and the result exactly matches the observed problem. The simulation code is as follows:
First, define a custom thread class that receives two instances of the Testsync class, calls the synchronized method of instance 1 in run, and passes instance 2 to it.
Then define the Testsync class with two synchronized member functions, each of which sleeps for 10 ms at the start so that the scheduler has a chance to switch threads.
Finally, run the test from the Activity's onResume method. The result is that the test Activity hits an ANR. Why the ANR?
The principle is the same as in the SystemServer ANR and SWT restart above: here the Activity's UI main thread and the new Ct1 thread have deadlocked.
The execution of the above code is roughly as follows:
1. Create two instances of the Testsync class, T1 and T2, and an instance of the Cthread class, Ct1, and pass T1 and T2 to it.
2. Start the Ct1 thread.
3. Whether Ct1 is scheduled first or the UI main thread continues first, one of them will enter a synchronized method of T1 or T2.
4. Assume that Ct1 is scheduled immediately after start and enters T1's synchronized method. It then sleeps for 10 ms, and the scheduler switches threads.
5. The UI main thread is scheduled again, enters T2's synchronized method, and sleeps for 10 ms. The scheduler switches to another thread.
6. When Ct1's 10 ms sleep ends, Ct1 is scheduled and tries to call T2's synchronized method. It blocks, because the UI main thread is already inside T2's synchronized method, that is, T2's own instance lock is already held. The scheduler switches to another thread.
7. When the UI main thread's 10 ms sleep ends, it is scheduled and tries to call T1's synchronized method. It also blocks, because Ct1 is already inside T1's synchronized method, that is, T1's own instance lock is already held. The scheduler switches to another thread.
8. At this point Ct1 and the UI main thread depend on each other and are deadlocked.
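The interleaving in the steps above can be reproduced outside Android with a sketch like the following. It is not the original test code: the method names first and second, the DeadlockDemo helper, and the use of a second worker thread in place of the UI main thread are assumptions made so that the demo can report the deadlock instead of hanging. The threads are daemon threads so that the JVM can still exit.

```java
// Two synchronized methods; each sleeps 10 ms so the other thread can run.
class Testsync {
    synchronized void first(Testsync other) {
        pause();
        other.second(); // needs other's instance lock while holding ours
    }

    synchronized void second() {
        pause();
    }

    private static void pause() {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

// Receives two Testsync instances and calls instance 1, passing instance 2.
class Cthread extends Thread {
    private final Testsync one;
    private final Testsync two;

    Cthread(Testsync one, Testsync two) {
        this.one = one;
        this.two = two;
        setDaemon(true); // lets the JVM exit even if this thread is stuck
    }

    @Override
    public void run() {
        one.first(two);
    }
}

class DeadlockDemo {
    // Returns true if both threads are still alive after the timeout,
    // which here means each is waiting for the lock the other holds.
    static boolean deadlocks(long timeoutMs) {
        Testsync t1 = new Testsync();
        Testsync t2 = new Testsync();
        Cthread ct1 = new Cthread(t1, t2);
        Cthread ct2 = new Cthread(t2, t1); // plays the role of the UI main thread
        ct1.start();
        ct2.start();
        try {
            ct1.join(timeoutMs);
            ct2.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return ct1.isAlive() && ct2.isAlive();
    }
}
```

Calling DeadlockDemo.deadlocks(500) normally returns true. The outcome still depends on scheduling, so a run in which one thread finishes before the other starts is possible, just unlikely with the 10 ms sleeps.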
Changing the synchronized keyword used in the above code to the class-level (global) lock of the synchronized class resolves the problem, and the Activity no longer hits an ANR. The specific change is as follows:
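A minimal sketch of such a change, in plain Java rather than the original Activity test, might look like this. The names TestsyncFixed, first, second, and FixedDemo are hypothetical. Every critical section locks the single TestsyncFixed.class object, so there is only one lock and no ordering between two locks that could form a cycle. Java monitors are reentrant, so the nested call from first into second on the same thread does not block.

```java
class TestsyncFixed {
    void first(TestsyncFixed other) {
        // One lock shared by all instances instead of each instance's own lock.
        synchronized (TestsyncFixed.class) {
            pause();
            other.second(); // re-enters the same class-level lock
        }
    }

    void second() {
        synchronized (TestsyncFixed.class) {
            pause();
        }
    }

    private static void pause() {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class FixedDemo {
    // Returns true if both threads finish within the timeout.
    static boolean completes(long timeoutMs) {
        final TestsyncFixed t1 = new TestsyncFixed();
        final TestsyncFixed t2 = new TestsyncFixed();
        Thread a = new Thread(() -> t1.first(t2));
        Thread b = new Thread(() -> t2.first(t1)); // opposite order on purpose
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();
        try {
            a.join(timeoutMs);
            b.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !a.isAlive() && !b.isAlive();
    }
}
```

The cost is that the two threads now run their critical sections one after the other, even when they touch different instances.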
Fifth, potential impact of the solution
Because a class-level lock is used and the critical sections are not subdivided, the throughput of code execution may drop slightly under high concurrency. For the setIndex, getIndex, and similar methods of AudioService in SystemServer, however, this effect can be ignored, because these methods are very lightweight and the level of concurrency never gets very high.
Analyzed by Vincent.song from SWD2 Framework team.
201506241646
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
Android System ANR caused SWT restart issue