SWT Restart Issue During an Android Stability Test

First, Problem Symptoms

1. The system first hits an ANR.

2. After the ANR, the system restarts.

Test method:

Stability test.

Platform: MT6732

Android version: 4.4.4 (KK)

Build type: user

System software version: D17+ZX

System RAM: 1 GB

Problem probability: no statistics; observed only once so far

Reference machine behavior:

1. Low-probability problem; no reference machine behavior is available.

Second, Solution

Through preliminary and in-depth analysis (the specific analysis process, key code, and logs are attached below), we know exactly why the problem occurred:

1. The main thread of the system_server process is blocked in a cross-process call to a media audio interface.

2. That media audio interface call ends up in the mediaserver process.

3. A deadlock occurs inside the mediaserver process.

4. As a result, system_server first hits an ANR and then an SWT restart.

With the current code there is a very small probability that a method guarded by a Mutex Autolock is called recursively and deadlocks. The mutex is of the default NORMAL type, and the recursion happens only when the sp pointer obtained in that method becomes the last strong reference to its object.

Because the logic in mediaserver is complex, the fix should carry minimal risk, so the solution given here changes very little.

Finally, for the root cause above, we give the following solution:

1. Extend the life cycle of the sp pointer so that it outlives the Mutex Autolock

In a function that takes a Mutex Autolock and may be called recursively, a Vector container is used to extend the life cycle of the sp pointer. The sp pointer is then released only after the Autolock's life cycle has ended and the mutex has been unlocked, so a recursive call at that point no longer deadlocks.

2. The relevant code and backtrace

The above is the backtrace of the thread holding the lock when the deadlock occurs, together with the corresponding code. The parts circled in red show the critical call relationship and state at the time of the problem.

3. The final code modification
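The idea behind this change can be illustrated with a minimal sketch that uses standard C++ stand-ins rather than the Android classes. It is not the actual patch. Here std::shared_ptr and std::weak_ptr play the role of sp and wp, and ToyMutex, ToyAutolock, Roster, Looper, otherOwner, keepAliveFix and runScenario are hypothetical names invented for illustration. The toy mutex counts a re-lock by its owner instead of hanging, so the sketch can finish; on a real NORMAL mutex each counted attempt would be a deadlock.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a NORMAL (non-recursive) mutex. A re-lock by the
// owner is counted instead of hanging, so the example can finish.
struct ToyMutex {
    bool held = false;
    int relockAttempts = 0;  // each attempt would be a real deadlock

    bool lock() {
        if (held) { ++relockAttempts; return false; }
        held = true;
        return true;
    }
    void unlock() {
        assert(held);  // unlocking a mutex that is not held is a bug
        held = false;
    }
};

// Locks in the constructor, unlocks in the destructor.
class ToyAutolock {
public:
    explicit ToyAutolock(ToyMutex& m) : mMutex(m), mOwns(m.lock()) {}
    ~ToyAutolock() { if (mOwns) mMutex.unlock(); }
    bool owns() const { return mOwns; }
private:
    ToyMutex& mMutex;
    bool mOwns;
};

class Roster;

struct Looper {
    explicit Looper(Roster& r) : roster(r) {}
    ~Looper();  // calls back into the roster when the object is destroyed
    Roster& roster;
};

class Roster {
public:
    ToyMutex mLock;
    std::vector<std::weak_ptr<Looper>> handlers;
    std::shared_ptr<Looper>* otherOwner = nullptr;  // reference held "by another thread"
    bool keepAliveFix = false;

    void unregisterStaleHandlers() {
        // Declared before the lock, so it is destroyed after the lock is released.
        std::vector<std::shared_ptr<Looper>> activeLoopers;
        ToyAutolock autoLock(mLock);
        if (!autoLock.owns()) return;  // a real thread would hang here

        for (std::size_t i = handlers.size(); i-- > 0;) {
            std::shared_ptr<Looper> looper = handlers[i].lock();  // promote
            if (!looper) {
                handlers.erase(handlers.begin() + static_cast<std::ptrdiff_t>(i));
                continue;
            }
            if (otherOwner) otherOwner->reset();  // the other owner lets go now
            if (keepAliveFix) activeLoopers.push_back(looper);
            // Without the fix, 'looper' is now the last strong reference, so
            // ~Looper runs at the end of this iteration while mLock is held.
        }
    }
};

Looper::~Looper() { roster.unregisterStaleHandlers(); }

// Returns how many times the lock was re-entered by its owner.
int runScenario(bool keepAliveFix) {
    Roster roster;
    roster.keepAliveFix = keepAliveFix;
    std::shared_ptr<Looper> owner = std::make_shared<Looper>(roster);
    roster.handlers.push_back(owner);
    roster.otherOwner = &owner;
    roster.unregisterStaleHandlers();
    return roster.mLock.relockAttempts;
}
```

In this sketch runScenario(false) reports one re-lock attempt, which corresponds to the deadlock, while runScenario(true) reports none: the container is declared before the scoped lock, so the last strong reference is dropped only after the lock has been released.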

Third, Preliminary Analysis of the Problem

Taking a typical backtrace and log from ALTO4.5TF as an example of the problem, we found that the main thread of system_server was blocked in a native function inside AudioSystem, which caused the ANR and the SWT restart. The backtrace is as follows:

Why does it block? Looking at the corresponding code, we find that the setParameter method calls the setParameter method of AudioFlinger in mediaserver across processes through the AudioFlinger proxy object. So the next step is to check whether the mediaserver process is working properly.

Continuing from this thread to the call stacks of each thread in mediaserver, we found that almost all of the Binder threads were blocked, which explains why the system_server call was never handled and why the ANR and SWT restart occurred. The backtrace of the vast majority of the blocked Binder threads is as follows:

Since mediaserver is C/C++ native code, the addr2line tool is needed to map the PC addresses to source lines using the symbols files. The process is not repeated here; dedicated articles cover it. After this step, the corresponding source code is as follows:

From the code above we can see that the first thing getPlayerType does is sLock.lock(); if the lock cannot be acquired, the thread waits. So we next look for the thread that holds this lock. The backtraces of the other threads show that the Binder_5 thread holds sLock but is itself blocked on another mutex. The backtrace and code are as follows:

Following this trail through the other threads in the mediaserver process to see who holds ALooperRoster's mLock, we eventually found that the RTSP thread takes mLock when it calls ALooperRoster's unregisterStaleHandlers function, and then indirectly calls unregisterStaleHandlers again, finally blocking on this same mLock. The backtrace and code are as follows:
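The chain of blocked threads traced above can be written down as a small wait-for graph and followed mechanically. The sketch below is only an illustration of that reasoning: LockState, findRootBlocker and stateFromArticle are hypothetical names, and Binder_1 stands in for any one of the blocked Binder threads, whose real names are not given here.

```cpp
#include <map>
#include <set>
#include <string>

struct LockState {
    std::map<std::string, std::string> waitsFor;  // thread -> lock it is blocked on
    std::map<std::string, std::string> heldBy;    // lock -> thread that owns it
};

// Follows thread -> lock -> owner until it reaches a thread that is not
// blocked, a lock with no known owner, or a thread it has already visited.
std::string findRootBlocker(const LockState& state, std::string thread) {
    std::set<std::string> seen;
    while (seen.insert(thread).second) {
        auto wait = state.waitsFor.find(thread);
        if (wait == state.waitsFor.end()) return thread;   // running, not blocked
        auto owner = state.heldBy.find(wait->second);
        if (owner == state.heldBy.end()) return thread;    // lock owner unknown
        thread = owner->second;
    }
    return thread;  // visited twice: this thread is part of a wait cycle
}

// The situation described in the text: Binder threads wait on sLock, which
// Binder_5 holds; Binder_5 waits on mLock, which the RTSP thread holds; the
// RTSP thread waits on mLock as well.
LockState stateFromArticle() {
    LockState state;
    state.waitsFor = {{"Binder_1", "sLock"}, {"Binder_5", "mLock"}, {"RTSP", "mLock"}};
    state.heldBy = {{"sLock", "Binder_5"}, {"mLock", "RTSP"}};
    return state;
}
```

Starting from any blocked Binder thread, findRootBlocker ends at the RTSP thread, which is waiting for a lock that it holds itself.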

Fourth, In-Depth Analysis of the Problem

The preliminary analysis located the first point of the problem: the RTSP thread is blocked on an mLock that it has already locked. This raises two questions, which we analyze further to find the root cause.

1. Why is ALooperRoster::unregisterStaleHandlers called with indirect recursion?

2. Why can the RTSP thread block on mLock?

Further analysis of the code shows that the RTSP thread indirectly recurses into ALooperRoster::unregisterStaleHandlers because of the statement sp<ALooper> looper = info.mLooper.promote();. In the failing case, when this strong pointer goes out of the scope of the for loop it is the last sp holding the ALooper object, so its destruction releases the ALooper object. The ALooper destructor then calls ALooperRoster::unregisterStaleHandlers again, which produces the indirect recursive call.

A brief introduction to the C++ smart pointers sp and wp in Android's native layer: the constructors and destructors of sp and wp objects increment and decrement the reference counts of the object they point to, and that object must inherit from the RefBase class. Depending on the object's life-cycle flag, the object is released when either its strong reference count or its weak reference count drops to 0:

The life cycle defaults to strong, meaning the object is freed when its strong reference count reaches 0. The key code is as follows:
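The effect of a promoted pointer becoming the last strong reference can be sketched with the standard library, assuming std::shared_ptr and std::weak_ptr as rough analogues of sp and wp (weak_ptr::lock() playing the role of promote()). RefBase keeps its counts inside the object itself, which std::shared_ptr does not, so this shows only the lifetime behaviour. Tracked and promotedPointerCanBeLastOwner are hypothetical names.

```cpp
#include <memory>

// Records its own destruction in a flag owned by the caller.
struct Tracked {
    explicit Tracked(bool& flag) : destroyed(flag) {}
    ~Tracked() { destroyed = true; }
    bool& destroyed;
};

// Returns true if the object stayed alive while only the promoted pointer
// owned it, and was destroyed as soon as that pointer left its scope.
bool promotedPointerCanBeLastOwner() {
    bool destroyed = false;
    std::shared_ptr<Tracked> owner = std::make_shared<Tracked>(destroyed);
    std::weak_ptr<Tracked> weak = owner;  // does not keep the object alive
    bool aliveInsideScope = false;
    {
        std::shared_ptr<Tracked> promoted = weak.lock();  // weak -> strong
        owner.reset();                   // the original owner lets go
        aliveInsideScope = !destroyed;   // 'promoted' alone keeps it alive
    }                                    // last strong reference gone: destructor runs
    return aliveInsideScope && destroyed && weak.expired();
}
```

The destructor runs at the closing brace of the inner scope, which is the same point at which the looper object is released at the end of a loop iteration in the case described above.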

With the first question answered, we turn to the second. First we need to briefly describe the Autolock class, which is defined inside the Mutex class in system/core/include/utils/Mutex.h. The code is as follows:

The principle of this class is simple: it locks in the constructor and unlocks in the destructor, taking advantage of C++ object lifetimes. Entering the scope of an Autolock object locks the mutex, and leaving the scope unlocks it, which makes the locking automatic. The Mutex class is a wrapper around pthread_mutex_t, and by default the pthread_mutex_t is of the NORMAL type, which is not recursive (not re-entrant):

ALooperRoster::unregisterStaleHandlers is synchronized with Mutex::Autolock autoLock(mLock). The recursive call happens before the Autolock has gone out of scope, while mLock is still locked. Because the mutex is non-recursive by default, this produces the deadlock, which also answers why the RTSP thread blocks itself on mLock.
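The scoped-lock principle can be sketched as follows. This is a simplified illustration and not the code from Mutex.h: SimpleMutex and heldOnlyInsideScope are hypothetical names, and a std::atomic<bool> with a spin loop stands in for the pthread_mutex_t so that the state of the lock can be checked without blocking.

```cpp
#include <atomic>

// Hypothetical non-recursive lock; it does not track which thread owns it.
class SimpleMutex {
public:
    bool tryLock() { return !mHeld.exchange(true, std::memory_order_acquire); }
    void lock() { while (!tryLock()) {} }  // waits forever if the caller already holds it
    void unlock() { mHeld.store(false, std::memory_order_release); }

    // Scoped helper: lock on construction, unlock on destruction.
    class Autolock {
    public:
        explicit Autolock(SimpleMutex& m) : mLock(m) { mLock.lock(); }
        ~Autolock() { mLock.unlock(); }
        Autolock(const Autolock&) = delete;
        Autolock& operator=(const Autolock&) = delete;
    private:
        SimpleMutex& mLock;
    };

private:
    std::atomic<bool> mHeld{false};
};

// Returns true if the mutex is held exactly while the Autolock is in scope.
bool heldOnlyInsideScope() {
    SimpleMutex mutex;
    bool busyInside = false;
    {
        SimpleMutex::Autolock guard(mutex);
        busyInside = !mutex.tryLock();  // already held; a blocking lock() here would never return
    }                                   // guard destroyed: mutex unlocked
    bool freeAfter = mutex.tryLock();
    if (freeAfter) mutex.unlock();
    return busyInside && freeAfter;
}
```

The check inside the scope uses tryLock() on purpose: calling the blocking lock() at that point from the same thread is the situation that produces the deadlock with a non-recursive mutex.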

Fifth, Potential Impact of the Solution

Because the life cycle of the sp pointer is only slightly extended and the pointer is released as soon as the function scope ends, the current solution has no other known effects. Finally, here is the modified code again, with notes:


Analyzed by Vincent.song from SWD2 Framework team.

201506242052


Copyright notice: This article is the blogger's original work and may not be reproduced without the blogger's permission.
