6. Binder memory mapping and receive buffer management
Before looking at Binder, consider how data gets from sender to receiver in traditional IPC. Typically, the sender places the prepared data in a buffer and makes a system call into the kernel. The kernel service allocates memory in kernel space and copies the data from the sender's buffer into this kernel buffer. To read the data, the receiver must also supply a buffer; the kernel copies the data from the kernel buffer into the receiver's buffer and wakes up the receiving thread, completing the delivery.
This store-and-forward mechanism has two drawbacks. First, it is inefficient: two copies are needed, user space -> kernel space -> user space. Linux uses copy_from_user() and copy_to_user() for these cross-space copies, and if high memory is involved, temporary page mappings must be created and torn down, costing performance. Second, the receive buffer must be supplied by the receiver, which does not know in advance how big it should be. It can either allocate as much space as it can spare, or first call an API to receive the message header, learn the body size, and then allocate a fitting buffer for the body. Either way something is wasted: space or time.
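To make the two copies concrete, here is an illustrative kernel-side sketch of a conventional store-and-forward IPC driver; the buffer names and queueing steps are invented for illustration, not taken from any real driver:

/* Store-and-forward in a conventional IPC driver (illustrative pseudocode). */
/* In the sender's system call: */
kbuf = kmalloc(len, GFP_KERNEL);           /* staging area in the kernel    */
copy_from_user(kbuf, sender_buf, len);     /* copy #1: sender -> kernel     */
/* ... queue kbuf and wake the receiver (details omitted) ... */
/* In the receiver's system call, after being woken: */
copy_to_user(receiver_buf, kbuf, len);     /* copy #2: kernel -> receiver   */
kfree(kbuf);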
Binder takes a new approach: the Binder driver itself manages the data receive buffer. Notice that the Binder driver implements the mmap() system call, which is unusual for a character device: mmap() is normally used on file systems backed by physical storage, while a character device like Binder, used purely for communication, has no such medium and would not ordinarily need it. In the Binder driver, mmap() is not used to map physical media into user space but to create the buffer space for receiving data. Let's first see how mmap() is called:
fd = open("/dev/binder", O_RDWR);
mmap(NULL, map_size, PROT_READ, MAP_PRIVATE, fd, 0);
The Binder receiver now has a receive buffer of map_size bytes. mmap() returns the address at which this memory is mapped into user space. The space is managed by the driver, however, and the receiver has no need (and, since the mapping type is PROT_READ, read-only, no ability) to write to it directly.
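For completeness, here is what a receiver's setup might look like as a self-contained user-space sketch; the 1 MB pool size is an assumption for illustration (Android's libbinder uses a similar figure), and error handling is condensed:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>

#define MAP_SIZE (1024 * 1024)   /* assumed pool size for this sketch */

int main(void)
{
    int fd = open("/dev/binder", O_RDWR);
    if (fd < 0) {
        perror("open /dev/binder");
        return 1;
    }
    /* Read-only, private mapping: the driver owns and manages this pool. */
    void *pool = mmap(NULL, MAP_SIZE, PROT_READ, MAP_PRIVATE, fd, 0);
    if (pool == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    /* ... enter the read loop; received payloads will point into this pool ... */
    return 0;
}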
Once mapped, this buffer serves as a pool for receiving and storing data. As mentioned earlier, a received packet is described by a binder_transaction_data structure, but that is only the message header; the actual payload lives in the memory pointed to by data.buffer. That memory is not supplied by the receiver: it is allocated from the pool mapped by mmap(). When data is copied from sender to receiver, the driver finds a suitably sized slot in the pool with a best-fit algorithm, based on the size of the packet being sent, and copies the data there from the sender's buffer. Note that the memory for the binder_transaction_data structure itself, and for the other messages listed in Table 4, must still be provided by the receiver; but these are fixed in size and few in number, so this is no burden on the receiver. The mapped pool, on the other hand, must be large enough, because the receiver's thread pool may be handling several concurrent interactions, each of which claims a destination slot from the pool; if the pool is ever exhausted, unpredictable consequences follow.
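As a sketch of the receive path (struct and command names are from the kernel's binder UAPI header, though the header location and details vary across kernel versions): the thread blocks in the BINDER_WRITE_READ ioctl, and the payload pointer it gets back points into the mmap'ed pool rather than into any caller-supplied buffer.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/android/binder.h>  /* binder_write_read, binder_transaction_data */

static void read_one(int fd)
{
    char readbuf[256];               /* holds commands and headers, never payload */
    struct binder_write_read bwr = {0};
    bwr.read_buffer = (binder_uintptr_t)readbuf;
    bwr.read_size   = sizeof(readbuf);

    if (ioctl(fd, BINDER_WRITE_READ, &bwr) < 0)   /* blocks until data arrives */
        return;

    /* Real code iterates over the returned commands (the driver may prepend
     * BR_NOOP); this sketch assumes a single BR_TRANSACTION for brevity. */
    uint32_t cmd = *(uint32_t *)readbuf;
    if (cmd == BR_TRANSACTION) {
        struct binder_transaction_data *tr =
            (struct binder_transaction_data *)(readbuf + sizeof(uint32_t));
        /* tr->data.ptr.buffer lies inside the mmap'ed pool, in the slot the
         * driver allocated with its best-fit algorithm. */
    }
}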
Allocation implies release. When the receiver finishes processing a packet, it notifies the driver to free the memory pointed to by data.buffer. As mentioned in the discussion of the Binder protocol, this is done with the BC_FREE_BUFFER command.
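Freeing is symmetric: the receiver queues a BC_FREE_BUFFER command carrying the payload pointer through the same BINDER_WRITE_READ ioctl (a sketch under the same header assumptions as above):

static void free_payload(int fd, binder_uintptr_t buffer)
{
    struct {
        uint32_t cmd;
        binder_uintptr_t ptr;
    } __attribute__((packed)) wr = { BC_FREE_BUFFER, buffer };

    struct binder_write_read bwr = {0};
    bwr.write_buffer = (binder_uintptr_t)&wr;
    bwr.write_size   = sizeof(wr);
    ioctl(fd, BINDER_WRITE_READ, &bwr);  /* driver returns the slot to the pool */
}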
As the above shows, the driver takes the most tedious job off the receiver's hands: allocating and freeing payload buffers whose sizes are hard to predict. The receiver only needs to supply buffers for the fixed-size, predictable message headers. Efficiency-wise, because the memory that mmap() allocates is mapped into the receiver's user space, the overall effect is a single direct copy from the sender's user space to the receiver's user space, skipping the stopover in the kernel and roughly doubling performance. Incidentally, the Linux kernel has no function that copies directly from one user space to another; data must first be brought into kernel space with copy_from_user() and then pushed out to the other user space with copy_to_user(). To achieve a user-space-to-user-space copy, the memory allocated by mmap() is mapped not only into the receiver's process but also into the kernel. So when the driver calls copy_from_user() to copy the data into kernel space, it is simultaneously copying it into the receiver's user space. This is the 'secret' of Binder needing only one copy.
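In driver terms, the single copy looks roughly like the following conceptual sketch (identifiers are illustrative, not the actual driver code): the destination of copy_from_user() is a kernel address whose physical pages are simultaneously mapped into the receiver's user space, so no copy_to_user() is needed for the payload.

/* Conceptual driver-side sketch; identifiers are illustrative. */
struct binder_buffer *buf =
    binder_alloc_buf(target_proc, tr->data_size);  /* best-fit slot from the
                                                      receiver's mmap'ed pool */
/* buf->data is a kernel virtual address, but the same physical pages also
 * appear, read-only, at a fixed offset in the receiver's user space. */
copy_from_user(buf->data,
               (const void __user *)tr->data.ptr.buffer,  /* sender's data */
               tr->data_size);
/* Done: the receiver can already see the payload through its own mapping. */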
7. Binder receiving thread management
Binder communication is really communication between threads in different processes. Suppose process S is a server providing a Binder entity, and thread T1 in client process C1 sends a request to S through a reference to that Binder. To handle the request, S starts thread T2, while T1 waits to receive the reply. After T2 finishes, it returns the result to T1, and T1 wakes up with the result in hand. Throughout, T2 acts as T1's agent in process S, carrying out a remote task on T1's behalf; to T1 it feels as if execution crossed into S, ran some code, and came back to C1. To make this 'crossing' more convincing, the driver gives T2 some of T1's attributes, in particular T1's nice priority, so that T2 completes the task in roughly the time T1 would have. Many writeups describe this as 'thread migration', which is misleading: a thread cannot possibly hop between processes. Apart from the priority, T1 and T2 have nothing in common, including identity, open files, stack size, signal handling, and private data.
A server process S may face requests from many clients at once, and to improve efficiency it usually processes them concurrently with a thread pool. How the pool is used depends on the IPC mechanism. With sockets, for example, the server socket is put in listening mode, and a dedicated thread listens on it for client connection requests, blocking in accept(). This socket is like a hen laying eggs: whenever a client request arrives, an egg is laid, that is, a new socket is created and returned from accept(). The listening thread then takes a worker thread from the pool and hands it the egg, and that thread carries out all subsequent business interaction with the client.
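For comparison, the socket version of this hen-and-eggs pattern is the familiar accept loop; in this minimal sketch, hand_to_worker() is a hypothetical stand-in for whatever thread-pool dispatch the server uses:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

void hand_to_worker(int conn);   /* hypothetical thread-pool dispatch */

void serve(unsigned short port)
{
    int ls = socket(AF_INET, SOCK_STREAM, 0);          /* the "hen" */
    struct sockaddr_in a = { .sin_family = AF_INET,
                             .sin_port = htons(port),
                             .sin_addr.s_addr = INADDR_ANY };
    bind(ls, (struct sockaddr *)&a, sizeof(a));
    listen(ls, 128);
    for (;;) {
        int conn = accept(ls, NULL, NULL);             /* each "egg" */
        hand_to_worker(conn);  /* a pool thread serves this client from now on */
    }
}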
Binder, however, has no listening mode and lays no eggs. How should its thread pool be managed? One simple way is to create a batch of threads up front, each of which reads from the Binder with the BINDER_WRITE_READ command. These threads block on the driver's wait queue for that Binder, and whenever a packet arrives from a client, the driver wakes one of them to handle it. This is simple and intuitive and saves a separate pool manager, but creating a pile of threads at the start wastes resources. The Binder protocol therefore defines commands and messages to help the application manage the thread pool, including:
· BINDER_SET_MAX_THREADS
· BC_REGISTER_LOOPER
· BC_ENTER_LOOPER
· BC_EXIT_LOOPER
· BR_SPAWN_LOOPER
To manage the pool, the driver must first know how big it may grow; the application tells it with BINDER_SET_MAX_THREADS, setting the maximum number of threads it will create. Thereafter, each thread must inform the driver with BC_REGISTER_LOOPER, BC_ENTER_LOOPER, and BC_EXIT_LOOPER as it is created, enters its main loop, and exits its main loop, so that the driver keeps an accurate record of the pool's state. Whenever the driver returns a packet to a thread reading the Binder, it checks whether any idle threads remain. If none do, and the total thread count has not yet reached the maximum, it appends a BR_SPAWN_LOOPER message to the packet being read, telling the application that threads are about to run short and asking it to start some more, lest the next request go unanswered in time. Once a new thread starts, it again reports through the BC_xxx_LOOPER commands so the driver can update its records. As long as threads have not run out, there is always an idle thread waiting in the queue, ready to handle requests promptly.
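Putting these together, a server thread's main loop might look like the following sketch (reusing the headers from the earlier read-loop sketch; spawn_binder_thread() is a hypothetical helper that starts a new thread running this same loop but announcing itself with BC_REGISTER_LOOPER):

#include <string.h>

void spawn_binder_thread(int fd);   /* hypothetical */

static void binder_loop(int fd, uint32_t looper_cmd)
{   /* looper_cmd: BC_ENTER_LOOPER for the first thread,
       BC_REGISTER_LOOPER for driver-requested ones */
    uint32_t max_threads = 15;                         /* assumed pool limit */
    ioctl(fd, BINDER_SET_MAX_THREADS, &max_threads);   /* once per process */

    struct binder_write_read bwr = {0};
    bwr.write_buffer = (binder_uintptr_t)&looper_cmd;
    bwr.write_size   = sizeof(looper_cmd);
    ioctl(fd, BINDER_WRITE_READ, &bwr);       /* tell the driver: in main loop */

    char readbuf[256];
    for (;;) {
        memset(&bwr, 0, sizeof(bwr));
        bwr.read_buffer = (binder_uintptr_t)readbuf;
        bwr.read_size   = sizeof(readbuf);
        ioctl(fd, BINDER_WRITE_READ, &bwr);   /* block waiting for work */

        uint32_t cmd = *(uint32_t *)readbuf;
        if (cmd == BR_SPAWN_LOOPER)
            spawn_binder_thread(fd);          /* threads running short */
        /* ... otherwise dispatch BR_TRANSACTION etc., then BC_FREE_BUFFER ... */
    }
}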
The Binder driver also makes a small optimization when starting work on a request. When thread T1 of process P1 sends a request to process P2, the driver first checks whether T1 itself is in the middle of handling a request from some thread of P2 that it has not yet answered (no reply sent). This typically happens when both processes hold Binder entities and send requests to each other. If the driver finds such a thread in P2, say T2, it hands the new request directly to T2. Since T2 has sent a request to T1 and not yet received the reply, T2 must be (or will soon be) blocked reading the reply packet; letting it do something useful in the meantime beats having it sit idle. Moreover, if T2 is not a thread from P2's pool, it shares some of the pool's workload, reducing pool consumption.
8. Packet receiving queue and (thread) waiting queue management
Typically, the receiving end of a data transfer maintains two queues to smooth out mismatches between supply and demand: a packet receiving queue and a (thread) waiting queue. It is like a supermarket: when goods (packets) arrive faster than they sell, they pile up in the warehouse; when shoppers (threads) outnumber the goods, they line up at the checkout. In the driver, each process has a global receiving queue, also called the to-do queue, holding packets not addressed to any particular thread; correspondingly there is a global waiting queue, where all threads waiting for data from the global receiving queue sleep. Each thread additionally has its own private to-do queue, holding packets addressed to that thread, and its own private waiting queue, where the thread waits for data from its own to-do queue. Although called a queue, a thread's private waiting queue can only ever contain one thread: itself.
Since packets carry no special marking when sent, how does the driver decide which should go to the global to-do queue and which to a particular thread's to-do queue? Two rules apply.
Rule 1: all request packets sent from a client to a server are submitted to the server process's global to-do queue. There is one exception: the worker-thread startup optimization described in the previous section. In that case, the request from T1 is delivered not to P2's global to-do queue but to T2's private to-do queue.
Rule 2: the reply packets to synchronous requests (packets sent with BC_REPLY) all go to the private to-do queue of the thread that initiated the request. In the example above, if thread T1 of process P1 sends a synchronous request to thread T2 of process P2, T2's reply is delivered to T1's private to-do queue, not to P1's global to-do queue.
These rules for which to-do queue a packet enters in turn dictate which waiting queue a thread should sleep in: a thread that is not expecting a reply packet should wait for new tasks in the global waiting queue; otherwise it should wait for the server's reply in its own private waiting queue. In the example, after T1 sends its synchronous request to T2, it must wait in its private waiting queue rather than line up in P1's global waiting queue, or it would never receive T2's reply.
These hidden rules are constraints the driver imposes on both sides of a Binder communication, and they surface in applications as a thread-consistency requirement for synchronous interactions. 1) On the client side, the thread that waits for the reply must be the thread that sent the request; one thread may not send the request packet while another waits for the reply, or the reply will never be received. 2) On the server side, the thread that sends the reply must be the thread that received the request. This is because the destination of a reply packet is not specified by the sender; the driver records it in the thread that received the request, so if a different thread sends the reply, the driver has no way of knowing where it should go.
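The two rules, together with the optimization from section 7, can be condensed into a few lines of driver-side pseudocode (purely illustrative; the real transaction handling is far more involved, and all names here are invented):

/* Illustrative pseudocode for choosing the target queue. */
if (packet_is_reply(pkt))                       /* sent with BC_REPLY */
    enqueue(requesting_thread->todo, pkt);      /* rule 2: requester's queue */
else if (target_thread_awaits_our_reply(pkt))   /* section 7 optimization */
    enqueue(target_thread->todo, pkt);          /* that thread's private queue */
else
    enqueue(target_proc->todo, pkt);            /* rule 1: global to-do queue */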
Next, consider how the Binder driver delivers synchronous versus asynchronous interactions. Recall the difference: in a synchronous interaction, the requesting (client) side must wait for the reply packet after sending the request; in an asynchronous interaction, the sender is done as soon as the request packet is sent. For both kinds of request packet, the driver could treat them alike and toss everything into the receiving end's to-do queues, but it does not. Instead, the driver throttles asynchronous interactions to make way for synchronous ones, as follows. As long as one asynchronous interaction with a given Binder entity is unfinished, for example, being processed by a thread or queued in any to-do queue, further asynchronous packets for that entity are not delivered to a to-do queue but are parked in an asynchronous receiving queue opened for that entity (the async_todo field of the Binder node). Synchronous interactions, meanwhile, remain unrestricted during this time and go straight to the to-do queues for processing. Only when the pending asynchronous interaction completes may the next one leave the asynchronous queue and enter a to-do queue. The rationale: the requester of a synchronous interaction is waiting for the reply and must be served promptly, while an asynchronous interaction is 'fire and forget', and a slight delay blocks no thread. The dedicated queue thus holds excess asynchronous interactions, preventing a flood of them from monopolizing the server's processing capacity or exhausting the thread pool and thereby blocking synchronous interactions.
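The throttling itself can be stated as a few more lines of the same kind of pseudocode around the entity's async_todo queue (again illustrative only; the flag name is invented, though the real driver tracks similar state on the Binder node):

/* Illustrative pseudocode for asynchronous throttling. */
if (packet_is_async(pkt)) {
    if (node->has_async_in_flight)         /* one async already outstanding */
        enqueue(node->async_todo, pkt);    /* park it; wake nobody */
    else {
        node->has_async_in_flight = true;
        enqueue(target_proc->todo, pkt);
    }
}
/* When the async payload is freed with BC_FREE_BUFFER: move the next packet,
 * if any, from node->async_todo to a to-do queue, else clear the flag. */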
9. Summary
Binder's client-server communication model is secure, simple, and efficient. Combined with its object-oriented design philosophy and its distinctive receive-buffer and thread-pool management, it has become the backbone of inter-process communication on Android.
Author: universus