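Analysis: Binder.transact() fails to reach Binder.onTransact() in the Android native layer
Q: Why does Binder.transact() fail to reach Binder.onTransact() in the Android native layer?
While building a screen-capture application in the Android native layer on top of the API in Camera.h, I found that the callback meant to receive Camera preview data, registered on the Camera's mListener via gCamera->setListener(new ScreenCaptureListener()), was never called, so the capture failed.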
Note:
Summary of the Camera classes and the libraries they live in:
libcamera_client.so
    Camera
    ICamera
    ICameraClient
    ICameraService
    CameraBase
    CameraHardwareInterface
libcameraservice.so
    CameraService
    CameraClient
    Camera2Client
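A: Root-cause analysis
The complete, correct flow for Camera preview should be as follows:
//TODO: Figure 1. Camera call hierarchy from top to bottom (setup path).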
Native demo -> Camera -> CameraService -> CameraClient -> CameraHardwareInterface -> CameraHal_Module -> XCDipHardware_einstein -> PipManager
//TODO: Figure 2. Camera callback hierarchy from bottom to top.
SN -> XCDipHardware_einstein -> CameraClient -> ICameraClient -> Camera -> SCREENSHOT_MAIN
The log for this flow looks like this:
01-01 23:13:40.855 D/XCDipHardware_einstein( 1154): call processLoop.
01-01 23:13:40.865 D/XCDipHardware_einstein( 1154): [XCDipHardware] handlePreviewData call datacb.
01-01 23:13:40.865 V/CameraClient( 1154): __data_cb
01-01 23:13:40.865 D/CameraClient( 1154): dataCallback(16, 0x10 )
01-01 23:13:40.865 D/CameraClient( 1154): CameraClient::handlepreviewData.
01-01 23:13:40.865 V/ICameraClient( 1154): dataCallback
01-01 23:13:40.865 D/ICameraClient( 1154): Bp tid: 4090111104, pid: 1154.
01-01 23:13:40.865 V/ICameraClient( 2879): DATA_CALLBACK
01-01 23:13:40.865 D/ICameraClient( 2879): Bn tid: 2915554048, pid: 2879.
01-01 23:13:40.865 D/Camera ( 2879): Camera::dataCallback
01-01 23:13:40.865 D/Camera ( 2879): Callback tid: 2915554048, pid: 2879.
01-01 23:13:40.865 I/SCREENSHOT_MAIN( 2879): ScreenCaptureListener::postData. offset = 0, size = 1228800,
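However, while the captured image data was being passed back, BpCameraClient in ICameraClient called remote()->transact(DATA_CALLBACK, data, &reply, IBinder::FLAG_ONEWAY), but BnCameraClient::onTransact was never entered for DATA_CALLBACK; the log went straight to "BpCameraClient dataCallback call transact finished."
From this, it looks as if Android's Binder call itself has gone wrong!
For IBinder and its transact() function, I found the following description online: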
IBinder
android.os.IBinder
Class Overview
Base interface for a remotable object, the core part of a lightweight remote procedure call mechanism designed for high performance when performing in-process and cross-process calls. This interface describes the abstract protocol for interacting with a remotable object. Do not implement this interface directly, instead extend from Binder.
The key IBinder API is transact() matched by Binder.onTransact(). These methods allow you to send a call to an IBinder object and receive a call coming in to a Binder object, respectively. This transaction API is synchronous, such that a call to transact() does not return until the target has returned from Binder.onTransact(); this is the expected behavior when calling an object that exists in the local process, and the underlying inter-process communication (IPC) mechanism ensures that these same semantics apply when going across processes.
The system maintains a pool of transaction threads in each process that it runs in. These threads are used to dispatch all IPCs coming in from other processes. For example, when an IPC is made from process A to process B, the calling thread in A blocks in transact() as it sends the transaction to process B. The next available pool thread in B receives the incoming transaction, calls Binder.onTransact() on the target object, and replies with the result Parcel. Upon receiving its result, the thread in process A returns to allow its execution to continue. In effect, other processes appear to use as additional threads that you did not create executing in your own process.
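In Binder communication, transact() and Binder.onTransact() are called in two different processes. The log I captured shows that transact() runs in the system's mediaserver process, while Binder.onTransact() should run in my demo process. These APIs are supposed to be synchronous: once the calling thread in mediaserver has sent the transaction to the demo process, it should block in transact(); an idle thread in the demo process then receives the incoming transaction, calls Binder.onTransact() on the target object, and replies with the result Parcel; on receiving the reply, the mediaserver thread resumes from where it was blocked.
In my demo program, however, two things go wrong:
1. transact() returns immediately after being called instead of blocking.
2. After transact() is called, my demo process never executes Binder.onTransact(), which means that either the demo process (or the relevant thread in it) does not exist at that moment, or it is blocked.
Why does transact() not block after it is called?
Look at the BpCameraClient::dataCallback() function in ICameraClient.cpp: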
Figure 3: the BpCameraClient::dataCallback() function
Checking the return value err: the error log was not printed, so the return value is correct. Next, check the IBinder::FLAG_ONEWAY argument passed to Binder.transact(); the official documentation explains it as follows:
Figure 4: the Binder FLAG_ONEWAY flag
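Passing this flag to Binder.transact() turns the call into an asynchronous one that returns immediately. As a test, I removed the flag and ran the demo again: the last log line in Figure 3 was indeed no longer printed and the BpBinder calling thread blocked, but the demo still produced no image output. So the flag is not the reason the Binder call fails.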
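The two observations above (a one-way call returns at once, a synchronous call blocks, and neither delivers anything when nobody is listening) can be reproduced off-device with a small model. This is only a sketch in standard C++: FakeRemote, handleOne() and the timeout parameter are hypothetical names invented for illustration and are not part of libbinder; only the value of FLAG_ONEWAY (1) matches IBinder::FLAG_ONEWAY.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>

// Hypothetical host-side model of a remote endpoint; NOT the libbinder API.
class FakeRemote {
public:
    enum : uint32_t { FLAG_ONEWAY = 0x00000001 };  // same value as IBinder::FLAG_ONEWAY

    // Queue a transaction. One-way calls return at once; synchronous calls
    // wait (up to `timeout`) until a receiver has handled them.
    bool transact(int code, uint32_t flags, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mMutex);
        mPending.push_back(code);
        const uint64_t seq = ++mSubmitted;
        if (flags & FLAG_ONEWAY) return true;  // fire and forget
        return mCv.wait_for(lock, timeout, [&] { return mHandled >= seq; });
    }

    // What a receiver-side looper thread would do: take one queued
    // transaction and "run onTransact" for it. Returns false if none is queued.
    bool handleOne() {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mPending.empty()) return false;
        mPending.pop_front();
        ++mHandled;
        mCv.notify_all();
        return true;
    }

    uint64_t handled() const {
        std::lock_guard<std::mutex> lock(mMutex);
        return mHandled;
    }

private:
    mutable std::mutex mMutex;
    std::condition_variable mCv;
    std::deque<int> mPending;
    uint64_t mSubmitted = 0;
    uint64_t mHandled = 0;
};
```

With no thread ever calling handleOne(), a one-way transact() still reports success while handled() stays at 0, and a synchronous transact() simply times out. That matches the symptom: dropping the flag changes whether the caller blocks, not whether the receiver runs.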
Are the relevant threads in the demo process blocked, or missing?
While the demo was hung, I ran the following on the device
shell@pitaya:/data/capture # debuggerd64 `ps | grep screenshot | busybox awk '{print $2}'`
Sending request to dump task 2994.
Tombstone written to: /data/tombstones/tombstone_00
to dump the current state of the demo process, which showed that it had only one thread, the main thread:
pid: 2994, tid: 2994, name: screenshot >>> /system/bin/screenshot <<<
The stack of this thread is as follows:
backtrace:
#00 pc 0000000000019a5c /system/lib64/libc.so (syscall+28)
#01 pc 00000000000202b0 /system/lib64/libc.so (pthread_mutex_lock+252)
#02 pc 000000000001efb4 /system/lib64/libc.so (__pthread_cond_timedwait_relative(pthread_cond_t*, pthread_mutex_t*, timespec const*)+116)
#03 pc 000000000001f028 /system/lib64/libc.so (__pthread_cond_timedwait(pthread_cond_t*, pthread_mutex_t*, timespec const*, int)+68)
#04 pc 00000000000031b4 /system/bin/screenshot
As the trace shows, the thread is currently blocked in gAvailableCV.waitRelative(gAvailableLock, 1000*1000000). I then used pthread_create to start a separate thread to run Camera::connect(), but the result was the same.
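For readers unfamiliar with this wait pattern, here is a minimal standard C++ sketch of what the main thread is doing: it sleeps on a condition variable for up to one second (1000*1000000 ns) until some other thread reports that a frame is available. FrameSignal, post() and waitFor() are hypothetical names; it is an assumption that the demo's listener signals gAvailableCV from its postData callback.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the demo's gAvailableLock / gAvailableCV pair.
struct FrameSignal {
    std::mutex lock;
    std::condition_variable cv;
    bool available = false;

    // Called from the callback side (assumed: the listener's postData).
    void post() {
        {
            std::lock_guard<std::mutex> guard(lock);
            available = true;
        }
        cv.notify_one();
    }

    // Called from the main thread; plays the role of
    // waitRelative(lock, 1000*1000000) with a relative timeout.
    // Returns true if a frame arrived before the timeout expired.
    bool waitFor(std::chrono::nanoseconds timeout) {
        std::unique_lock<std::mutex> guard(lock);
        return cv.wait_for(guard, timeout, [this] { return available; });
    }
};
```

If nothing ever calls post(), every waitFor() just times out, which is exactly the state the backtrace above shows: the waiting side is healthy, but no thread exists to deliver the callback.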
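This shows that the Bn-side Binder messages are handled neither on the main thread nor on a thread we create ourselves with pthread_create. In other words, our process has no thread at all that handles Binder messages for the Bp and Bn sides. So how do we create these threads?
In the Binder mechanism, a service that wants to use Binder must do two things:
1. Open the binder device;
2. Create a looper loop and then wait for requests.
Neither the ICameraClient class, nor its derived class Camera, nor our demo that calls the Camera interface contains code that does either of these things.
Look at the code that relates Camera to ICameraClient:
class Camera : public CameraBase<Camera>, public BnCameraClient {}
class BnCameraClient : public BnInterface<ICameraClient> {}
class ICameraClient : public IInterface {}
At first glance, BnInterface looks like the place where the Binder device is opened:
template<typename INTERFACE>
class BnInterface : public INTERFACE, public BBinder
{
public:
    virtual sp<IInterface> queryLocalInterface(const String16& _descriptor);
    virtual const String16& getInterfaceDescriptor() const;
protected:
    virtual IBinder* onAsBinder();
};
After instantiation this becomes
class BnInterface : public ICameraClient, public BBinder
BBinder and BpBinder: do they correspond to BnXXX and BpXXX? Here is where BBinder is defined:
BBinder::BBinder()
    : mExtras(NULL)
{
    // the Binder device is not opened here either
}
So the Binder classes do not open the Binder device on their own during initialization!
Back in the Android source tree, let's see how main_mediaserver uses Binder:
Figure 5: the main() function of main_mediaserver
The first function called is ProcessState::self(), whose result is assigned to the variable proc. When the program finishes, proc automatically deletes what it holds, so the resources allocated earlier are released automatically.
ProcessState is located in frameworks/native/libs/binder/ProcessState.cpp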
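sp<ProcessState> ProcessState::self()
{
    if (gProcess != NULL) return gProcess;  // never taken on the first call
    AutoMutex _l(gProcessMutex);            // lock protection
    if (gProcess == NULL) gProcess = new ProcessState;  // create the ProcessState object
    return gProcess;  // a raw pointer is returned, but the return type is sp<ProcessState>, so it is fine to think of sp<XXX> as XXX*
}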
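The check, lock, check-again shape of self() is what guarantees that the constructor (and therefore the device open) runs only once per process. The following is a minimal standard C++ sketch of that pattern, not AOSP code: ProcessStateModel and constructions() are hypothetical names, and std::atomic is used here in place of the plain global pointer so that the unlocked first check is well defined in standard C++.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical model of the ProcessState::self() pattern; not AOSP code.
class ProcessStateModel {
public:
    static ProcessStateModel* self() {
        ProcessStateModel* p = sInstance.load(std::memory_order_acquire);
        if (p != nullptr) return p;  // fast path: already created
        std::lock_guard<std::mutex> lock(sMutex);
        p = sInstance.load(std::memory_order_relaxed);
        if (p == nullptr) {          // re-check under the lock
            p = new ProcessStateModel();  // kept for the life of the process
            sInstance.store(p, std::memory_order_release);
        }
        return p;
    }

    // How many times the constructor has run (for illustration only).
    static int constructions() { return sConstructions.load(); }

private:
    // The real constructor is where open_driver() would be called.
    ProcessStateModel() { ++sConstructions; }

    static std::atomic<ProcessStateModel*> sInstance;
    static std::mutex sMutex;
    static std::atomic<int> sConstructions;
};

std::atomic<ProcessStateModel*> ProcessStateModel::sInstance{nullptr};
std::mutex ProcessStateModel::sMutex;
std::atomic<int> ProcessStateModel::sConstructions{0};
```

However many times self() is called, constructions() stays at 1, which is why a process only needs one ProcessState::self() call to have the device opened.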
Now look at the ProcessState constructor:
ProcessState::ProcessState()
    : mDriverFD(open_driver())  // a lot of Android code is written this way; it is easy to miss that a very important function is called here
    , mVMStart(MAP_FAILED)      // start address of the mapped memory
    , mManagesContexts(false)
    , mBinderContextCheckFunc(NULL)
    , mBinderContextUserData(NULL)
    , mThreadPoolStarted(false)
    , mThreadPoolSeq(1)
{
    if (mDriverFD >= 0) {
        // BINDER_VM_SIZE is defined as (1*1024*1024) - (4096*2), i.e. 1M - 8K
        mVMStart = mmap(0, BINDER_VM_SIZE, PROT_READ, MAP_PRIVATE | MAP_NORESERVE, mDriverFD, 0);
        // See man mmap for the details; roughly, it maps the fd into memory,
        // so that memory operations such as memcpy act like write/read(fd)
    }
    ...
}
open_driver opens the /dev/binder device, a virtual device that Android added to the kernel specifically for inter-process communication; in other words, it is a mechanism provided by the kernel.
static int open_driver()
{
    int fd = open("/dev/binder", O_RDWR);  // open /dev/binder
    if (fd >= 0) {
        ....
        size_t maxThreads = 15;
        // tell the kernel via ioctl that this fd supports at most 15 threads
        result = ioctl(fd, BINDER_SET_MAX_THREADS, &maxThreads);
    }
    return fd;
}
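The open-then-mmap sequence can be tried on any POSIX host by substituting an ordinary file for /dev/binder. The sketch below is only an analogy under that assumption: the scratch file, kVmSize and mapScratchFileAndPeek() are hypothetical, MAP_NORESERVE is left out, and a regular file does not behave like the Binder driver. It does show the same size arithmetic and the same read-only private mapping call.

```cpp
#include <cstddef>
#include <cstdlib>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Same arithmetic as BINDER_VM_SIZE: 1 MiB minus two 4 KiB pages.
constexpr size_t kVmSize = (1 * 1024 * 1024) - (4096 * 2);
static_assert(kVmSize == 1040384, "1 MiB - 8 KiB");

// Creates a scratch file (standing in for /dev/binder), maps it read-only,
// and returns the first mapped byte, or -1 on any failure.
int mapScratchFileAndPeek() {
    char path[] = "/tmp/binder_model_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);  // the file disappears once fd is closed

    const char marker = 'B';
    if (write(fd, &marker, 1) != 1 ||
        ftruncate(fd, static_cast<off_t>(kVmSize)) != 0) {
        close(fd);
        return -1;
    }

    void* start = mmap(nullptr, kVmSize, PROT_READ, MAP_PRIVATE, fd, 0);
    if (start == MAP_FAILED) {
        close(fd);
        return -1;
    }

    // Reading mapped memory returns the bytes that were written to the fd.
    int first = static_cast<const unsigned char*>(start)[0];
    munmap(start, kVmSize);
    close(fd);
    return first;
}
```

The mapping is PROT_READ only, as in the constructor above, so user space can read through it but cannot write to it.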
So sp<ProcessState> proc(ProcessState::self()) should be the operation that opens the Binder device.
Opening the binder device is a per-process matter; one open per process is enough.
Then where does the looper-style message loop happen?
sp<ProcessState> proc(ProcessState::self());
Since this is what opens the Binder device,
ProcessState::self()->startThreadPool();
IPCThreadState::self()->joinThreadPool();
should be what runs the looper-style message loop!
Look at startThreadPool:
void ProcessState::startThreadPool()
{
    ...
    spawnPooledThread(true);
}

void ProcessState::spawnPooledThread(bool isMain)
{
    sp<Thread> t = new PoolThread(isMain);  // isMain is true
    // create the pool thread and run it, much like Thread in Java
    t->run(buf);
}
PoolThread derives from the Thread class. Is a thread created at this point? Look at the constructors of PoolThread and Thread:
PoolThread::PoolThread(bool isMain)
    : mIsMain(isMain)
{
}

Thread::Thread(bool canCallJava)  // canCallJava defaults to true
    : mCanCallJava(canCallJava),
      mThread(thread_id_t(-1)),
      mLock("Thread::mLock"),
      mStatus(NO_ERROR),
      mExitPending(false),
      mRunning(false)
{
}
No thread has been created yet at this point. PoolThread::run is called next, which actually calls the base-class run:
status_t Thread::run(const char* name, int32_t priority, size_t stack)
{
    bool res;
    if (mCanCallJava) {
        res = createThreadEtc(_threadLoop,  // the thread function is _threadLoop
                this, name, priority, stack, &mThread);
    }
    ...
}
Finally, the thread is created inside run(). From here on, the main thread executes
IPCThreadState::self()->joinThreadPool();
and the newly created thread executes _threadLoop:
int Thread::_threadLoop(void* user)
{
    Thread* const self = static_cast<Thread*>(user);
    sp<Thread> strong(self->mHoldSelf);
    wp<Thread> weak(strong);
    self->mHoldSelf.clear();
    do {
        ...
        if (result && !self->mExitPending) {
            result = self->threadLoop();  // calls the object's own threadLoop()
        }
        ...
    }
    ...
Our object is a PoolThread, so PoolThread's threadLoop function is called:
virtual bool PoolThread::threadLoop()
{
    // mIsMain is true.
    // Note that this is a new thread, so a new IPCThreadState object
    // is bound to be created (remember thread-local storage, TLS?)
    IPCThreadState::self()->joinThreadPool(mIsMain);
    return false;
}
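The remark about thread-local storage is easy to check in isolation. The sketch below uses the C++11 thread_local keyword as a stand-in for whatever per-thread storage IPCThreadState uses internally (an assumption; the real class is not shown here). ThreadStateModel and newThreadGetsOwnState() are hypothetical names.

```cpp
#include <thread>

// Hypothetical stand-in for IPCThreadState: one instance per thread.
struct ThreadStateModel {
    static ThreadStateModel* self() {
        thread_local ThreadStateModel state;  // created on first use in each thread
        return &state;
    }
};

// Returns true if a newly started thread sees a different instance
// than the calling thread does.
bool newThreadGetsOwnState() {
    ThreadStateModel* mine = ThreadStateModel::self();
    ThreadStateModel* theirs = nullptr;
    std::thread worker([&] { theirs = ThreadStateModel::self(); });
    worker.join();
    return theirs != nullptr && theirs != mine;
}
```

Within one thread, repeated self() calls return the same object; across threads they differ, which is why the new pool thread ends up with its own per-thread state when it calls joinThreadPool().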
Both the main thread and the worker thread call joinThreadPool. Let's see what it does:
void IPCThreadState::joinThreadPool(bool isMain)
{
    mOut.writeInt32(isMain ? BC_ENTER_LOOPER : BC_REGISTER_LOOPER);
    status_t result;
    do {
        int32_t cmd;
        result = talkWithDriver();
        result = executeCommand(cmd);
    } while (result != -ECONNREFUSED && result != -EBADF);
    mOut.writeInt32(BC_EXIT_LOOPER);
    talkWithDriver(false);
}
There is the loop, but two threads both seem to be running it, so there are two message loops here.
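Two threads running the same loop is the normal shape of a thread pool: each one repeatedly claims the next pending item. Here is a minimal host-side sketch of that idea in standard C++, assuming a simple atomic counter in place of the Binder driver; runSharedLoop() is a hypothetical helper, and unlike the real loop it exits when the work runs out instead of blocking for more.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical model: `threads` threads (the caller plus threads-1 spawned
// ones) all run the same loop over a shared count of pending transactions.
// Returns how many transactions were handled in total.
int runSharedLoop(int threads, int transactions) {
    std::atomic<int> remaining(transactions);
    std::atomic<int> handled(0);

    auto loop = [&] {
        // fetch_sub returns the previous value; a positive value means this
        // thread has claimed one pending transaction for itself.
        while (remaining.fetch_sub(1) > 0) {
            ++handled;  // stands in for executeCommand()
        }
    };

    std::vector<std::thread> pool;
    for (int i = 1; i < threads; ++i) pool.emplace_back(loop);  // spawned pool threads
    loop();                                                      // the caller joins the loop too
    for (auto& t : pool) t.join();
    return handled.load();
}
```

Each transaction is handled exactly once no matter which thread picks it up, so having two loops does not duplicate work; it only adds capacity.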
Next, look at executeCommand:
status_t IPCThreadState::executeCommand(int32_t cmd)
{
    BBinder* obj;
    RefBase::weakref_type* refs;
    status_t result = NO_ERROR;
    switch (cmd) {
    ...
    case BR_TRANSACTION:
        {
            binder_transaction_data tr;
            result = mIn.read(&tr, sizeof(tr));
            // a command arrived and was parsed as BR_TRANSACTION; now read the data that follows it
            ...
            Parcel reply;
            if (tr.target.ptr) {
                // a BBinder is used here
                sp<BBinder> b((BBinder*)tr.cookie);
                const status_t error = b->transact(tr.code, buffer, &reply, 0);
            }
        }
    ...
    }
}
Let's see what BBinder's transact function does:
status_t BBinder::transact(
    uint32_t code, const Parcel& data, Parcel* reply, uint32_t flags)
{
    // calls its own onTransact function
    status_t err = onTransact(code, data, reply, flags);
    return err;
}
BnCameraClient derives from BBinder, so its onTransact function is the one that gets called.
BnCameraClient's onTransact then receives the command and dispatches it to the corresponding function of the derived class Camera, which does the actual work.
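The dispatch chain just described (transact() forwards to the virtual onTransact(), which decodes the code and calls a method that the most-derived class implements) can be sketched with three small classes. MiniBBinder, MiniBnCameraClient and MiniCamera are hypothetical miniatures, the string payload stands in for a Parcel, and the numeric value of DATA_CALLBACK is illustrative rather than taken from ICameraClient.cpp.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical miniature of BBinder: transact() forwards to onTransact().
struct MiniBBinder {
    virtual ~MiniBBinder() = default;
    int transact(uint32_t code, const std::string& data) {
        return onTransact(code, data);
    }
protected:
    virtual int onTransact(uint32_t code, const std::string& data) = 0;
};

// Hypothetical miniature of BnCameraClient: decodes the transaction code
// and calls the matching interface method.
struct MiniBnCameraClient : MiniBBinder {
    enum : uint32_t { DATA_CALLBACK = 1 };  // illustrative value only
    virtual void dataCallback(const std::string& frame) = 0;
protected:
    int onTransact(uint32_t code, const std::string& data) override {
        if (code == DATA_CALLBACK) {
            dataCallback(data);
            return 0;
        }
        return -1;  // unknown transaction code
    }
};

// Hypothetical miniature of Camera: does the actual work.
struct MiniCamera : MiniBnCameraClient {
    std::vector<std::string> frames;
    void dataCallback(const std::string& frame) override {
        frames.push_back(frame);
    }
};
```

Calling transact(MiniBnCameraClient::DATA_CALLBACK, ...) through a MiniBBinder reference lands in MiniCamera::dataCallback(). The chain itself needs nothing special; what the demo lacked was a thread to start it from the receiving side.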
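From the analysis above, my demo process needs to call
sp<ProcessState> proc(ProcessState::self());
to open the Binder device, and also needs to call
ProcessState::self()->startThreadPool();
IPCThreadState::self()->joinThreadPool();
to create the Binder message-handling thread, which is a thread that runs a processing loop.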
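Put together, the change to the demo might look like the sketch below. It relies on several assumptions: USE_REAL_LIBBINDER is a hypothetical macro to be defined only when building inside the Android source tree, the stub classes in the #else branch are host-only stand-ins that merely count calls so the snippet compiles off-device, and the helper names startBinderThreads() and serveBinderOnThisThread() are invented for illustration. The three libbinder calls themselves are the ones named in the text.

```cpp
#ifdef USE_REAL_LIBBINDER  // hypothetical macro: define it when building in the Android tree
#include <binder/IPCThreadState.h>
#include <binder/ProcessState.h>
#else
// Host-only stand-ins so the sketch compiles off-device; they only count calls.
namespace android {
template <typename T> using sp = T*;
struct ProcessState {
    int startCalls = 0;
    static sp<ProcessState> self() { static ProcessState state; return &state; }
    void startThreadPool() { ++startCalls; }
};
struct IPCThreadState {
    int joinCalls = 0;
    static IPCThreadState* self() { static IPCThreadState state; return &state; }
    void joinThreadPool(bool isMain = true) { (void)isMain; ++joinCalls; }
};
}  // namespace android
#endif

using namespace android;

// Intended to be called early in main(), before the Camera is connected,
// so that incoming callbacks have a Binder thread to run on.
void startBinderThreads() {
    sp<ProcessState> proc(ProcessState::self());  // opens /dev/binder on first use
    proc->startThreadPool();                      // spawns the first pool thread
}

// Optionally park the calling thread in the Binder loop as well.
// With the real library this call does not return while the loop is running.
void serveBinderOnThisThread() {
    IPCThreadState::self()->joinThreadPool();
}
```

Splitting the calls this way is a design choice, not something the original demo is known to do: startThreadPool() alone already gives the process a Binder thread, while joinThreadPool() additionally dedicates the calling thread to the loop, so it belongs at the point where that thread has nothing else left to do.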
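After adding these three statements to the demo process, BpCameraClient could indeed call through to BnCameraClient. On the serial console, I ran
busybox ps -T to check the state of the threads in the demo process: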
3151 0 0:00 /system/bin/screenshot
3152 0 0:00 {Binder_1} /system/bin/screenshot
3154 0 0:00 {Binder_2} /system/bin/screenshot
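This shows that the screenshot process now has three threads, two of which are Binder threads. Dumping the state of the demo process again confirms that it now contains three threads: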
5 pid: 2994, tid: 2994, name: screenshot >>> /system/bin/screenshot <<<
687 pid: 2994, tid: 2995, name: Binder_1 >>> /system/bin/screenshot <<<
2019 pid: 2994, tid: 2997, name: Binder_2 >>> /system/bin/screenshot <<<
screenshot is the demo's main thread. The stack of Binder_1 is as follows:
backtrace:
#00 pc 000000000006104c /system/lib64/libc.so (nanosleep+4)
#01 pc 0000000000037e00 /system/lib64/libc.so (sleep+40)
#02 pc 0000000000002c64 /system/bin/screenshot
#03 pc 0000000000028af0 /system/lib64/libcamera_client.so (android::Camera::dataCallback(int, android::s
#04 pc 000000000002e8dc /system/lib64/libcamera_client.so (android::BnCameraClient::onTransact(unsigned
#05 pc 0000000000021bac /system/lib64/libbinder.so (android::BBinder::transact(unsigned int, android::Pa
#06 pc 000000000002a04c /system/lib64/libbinder.so (android::IPCThreadState::executeCommand(int)+876)
#07 pc 000000000002a22c /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+92)
#08 pc 000000000002a2a0 /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+76)
#09 pc 0000000000031bd0 /system/lib64/libbinder.so
#10 pc 00000000000169c0 /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+208)
#11 pc 000000000001620c /system/lib64/libutils.so
#12 pc 000000000001f168 /system/lib64/libc.so (__pthread_start(void*)+52)
#13 pc 000000000001b370 /system/lib64/libc.so (__start_thread+16)
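Binder_1 is the thread that handles BBinder messages; here it is in the middle of delivering Camera::dataCallback.
The stack of Binder_2 is as follows: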
backtrace:
#00 pc 000000000006173c /system/lib64/libc.so (__ioctl+4)
#01 pc 0000000000088a48 /system/lib64/libc.so (ioctl+100)
#02 pc 00000000000299a4 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+164)
#03 pc 000000000002a1e8 /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+24
#04 pc 000000000002a2a0 /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+76)
#05 pc 0000000000031bd0 /system/lib64/libbinder.so
#06 pc 00000000000169c0 /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+208)
#07 pc 000000000001620c /system/lib64/libutils.so
#08 pc 000000000001f168 /system/lib64/libc.so (__pthread_start(void*)+52)
#09 pc 000000000001b370 /system/lib64/libc.so (__start_thread+16)
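Binder_2 is blocked in ioctl() inside talkWithDriver(), called from the same joinThreadPool() loop. Judging from the stack, it is a second Binder pool thread that is idle and waiting on the Binder device for the next transaction, rather than a dedicated thread for BpBinder to send messages.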