High-Performance Network Programming 3: Receiving TCP Messages

Source: Internet
Author: User
Tags socket
This article tries to show how an application receives the TCP message stream sent over the network. Because of space constraints, it leaves aside for now how ACKs are sent in reply and how the receive window slides. To grasp the ideas quickly, read with the following questions in mind:
1. When an application calls read, recv, or a similar method, the socket may be blocking or non-blocking. How does each mode work?
2. If the socket is a default blocking socket, does the len parameter passed to recv mean that recv returns only after a timeout (SO_RCVTIMEO) or after len bytes have been received? In addition, a socket has an attribute called SO_RCVLOWAT. How does it combine with len to decide when recv and the other receive methods return?
3. The application receiving TCP messages and the NIC of the machine receiving TCP messages from the network are two independent processes. How do they interact? For example, what happens when the kernel is handling a message that arrived on the connection through the NIC while the application is in the middle of receiving? And if the application never calls read or recv, what does the kernel do with messages that arrive on the connection?
4. Receive methods such as recv accept various flags, such as MSG_WAITALL, MSG_PEEK, and MSG_TRUNC. How do they work?
5. One socket may be used by more than one process. How does the kernel handle concurrent access?
6. Among the Linux sysctl parameters there is a switch called tcp_low_latency. How does leaving it at the default 0, or setting it to 1, affect TCP message processing?
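Questions 2 mentions two socket attributes, SO_RCVTIMEO and SO_RCVLOWAT. As a minimal sketch of how an application might set them, assuming a Linux system and the standard setsockopt/getsockopt calls, the helpers below configure both on an existing socket. The function names set_recv_limits and get_recv_lowat are made up for this illustration.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: set the receive low-water mark (in bytes) and the
 * receive timeout (in milliseconds) on an already created socket.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int set_recv_limits(int fd, int lowat_bytes, long timeout_ms)
{
    struct timeval tv;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT,
                   &lowat_bytes, sizeof(lowat_bytes)) != 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) != 0)
        return -1;
    return 0;
}

/* Hypothetical helper: read back the current low-water mark,
 * or return -1 if getsockopt fails. */
int get_recv_lowat(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &val, &len) != 0)
        return -1;
    return val;
}
```

A caller would typically invoke set_recv_limits(fd, 4, 500) right after the connection is established; a blocking recv on that socket then has a 500 ms timeout and a low-water mark of 4 bytes.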

This article continues from the previous one. It uses three figures to describe three typical TCP receive scenarios and explains the four queues the kernel uses to implement TCP message reception. Understanding the kernel implementation is not an end in itself; the ultimate goal is to learn how to use the socket interface and how to configure the operating system kernel parameters so that TCP transfers messages more efficiently.
Many readers do not want to be distracted by kernel code, so how should they read this article? I first walk through the steps of Figure 1 and then use the code to explain the main methods such as tcp_v4_rcv. The effect of the flags parameter and of a non-blocking socket is covered in that code walkthrough. After that I introduce the steps of Figure 2 and Figure 3, interspersed with a few small pieces of code not covered earlier. Readers who do not care for kernel code can read the steps of Figure 1 and then skip straight to Figure 2 and Figure 3. I believe the three figures cover the main TCP receive scenarios and will help you sort out the process.
Receiving a message involves considerably more complex system methods than sending a TCP message, which the previous article covered. Receiving a TCP message has two parts. First, the network card of the host receives a message from the network cable; the kernel's soft interrupt parses it as a TCP message, and the TCP module decides how to handle it. Second, the user process calls read, recv, or a similar method, which moves the message stream that the kernel has already received from the network card into the memory of the user process.
The first figure depicts the following scenario. The next sequence number the TCP connection expects is S1 (every TCP message carries a sequence number; see TCP/IP Illustrated for details). The operating system kernel then receives the messages S1-S2, S3-S4, and S2-S3, in that order; note that the last two arrive out of order. The user process then allocates len bytes of memory to receive TCP messages, with len greater than S4-S1. The user process has never set SO_RCVLOWAT on the socket, so the receive threshold SO_RCVLOWAT takes its default value of 1. The system parameter tcp_low_latency is set to 0, which means that, for the overall efficiency of the operating system, the prequeue queue is used to increase throughput. In this figure, however, the prequeue queue is always empty, because no new messages arrive while the user process is receiving. Figure 1 is as follows:
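The two halves described above can be observed from user space: data that has already arrived sits in the kernel until the process asks for it. The sketch below is an illustration under assumptions, not kernel behavior specific to TCP. It uses the standard recv flags MSG_DONTWAIT and MSG_PEEK, and the helper name try_recv is invented here. A connected AF_UNIX stream socket pair can stand in for a TCP connection when trying it out.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: one receive attempt that never blocks.
 * extra_flags is OR-ed into the recv flags, for example 0 or MSG_PEEK.
 * Returns the number of bytes read (> 0), 0 if the peer closed the
 * connection, -1 if nothing is queued yet (EAGAIN or EWOULDBLOCK),
 * and -2 on any other error. */
ssize_t try_recv(int fd, void *buf, size_t len, int extra_flags)
{
    ssize_t n = recv(fd, buf, len, MSG_DONTWAIT | extra_flags);

    if (n >= 0)
        return n;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return -1;
    return -2;
}
```

Calling try_recv before the peer has sent anything reports that nothing is queued. After the peer sends, a call with MSG_PEEK returns the bytes but leaves them queued, and a following call with extra_flags set to 0 returns the same bytes and consumes them.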
Figure 1 has 13 steps. The application process uses a blocking socket, passes 0 as the flags argument when it calls recv, and never goes to sleep while it reads the socket. When processing received TCP messages, the kernel uses four queues (think of them as linked lists): the receive, out_of_order, prequeue, and backlog queues. This article explains why each of them exists. The 13 steps are described in detail below.
1. When the NIC receives a message that is identified as TCP, the kernel's tcp_v4_rcv method is called. At this point the next sequence number the TCP connection expects is exactly S1, and the message that has just arrived is S1-S2, so tcp_v4_rcv inserts it directly into the receive queue. Note: the receive queue is the queue that the user process reads from directly. It holds TCP messages that have already been received, with the TCP headers removed and arranged in sequence order, so the user process can read them sequentially. The message goes into the receive queue because the socket is not in a process context (that is, no process is reading the socket) and because the message starts at the expected sequence number S1.
2. Next, the S3-S4 message arrives. After step 1 the expected sequence number is S2, but this message starts at S3. What happens? It goes into the out_of_order queue. As the name suggests, all out-of-order messages are held here temporarily.
3. Still no process is reading the socket, but the expected S2-S3 message now arrives. As in step 1, it goes directly into the receive queue. The difference is that the out_of_order queue is no longer empty as it was in step 1, which triggers step 4.
4. Every time a message is inserted into the receive queue, the out_of_order queue is checked. After the S2-S3 message is received, the expected sequence number becomes S3, so the only message in the out_of_order queue, S3-S4, is moved out of it and inserted into the receive queue (the tcp_ofo_queue method does this).
5. Finally, the user process starts to read the socket. Anyone who has done application-side programming knows the routine: allocate a block of memory in the process, then call read or recv, passing the address and length of that memory together with the connected socket. Attributes can also be configured on the socket. Here we assume that none is set and the defaults apply, so the socket is blocking and its SO_RCVLOWAT is the default value of 1. recv also takes a flags parameter, which can be set to MSG_WAITALL, MSG_PEEK, MSG_TRUNC, and so on; here we assume the most common value, 0. The process calls recv.
6. Whatever the interface, and however many layers the C library and the kernel wrap around it, receiving TCP messages eventually reaches the tcp_recvmsg method. It is the focus of the code walkthrough later.
7. tcp_recvmsg first locks the socket. Why? Because the socket can be used by several processes at the same time, and kernel interrupts operate on it as well. The code that follows is core, stateful code that manipulates data and cannot be re-entered. Once the lock is held, any other user process that comes in without the lock sleeps here. Kernel interrupts are also handled differently when they see that the socket is locked; see Figure 2 and Figure 3.
8. By now, steps 1-4 have placed three messages in the receive queue. The one at the head is S1-S2, and it is copied to user-space memory. Because the flags argument in step 5 does not carry a flag such as MSG_PEEK, the S1-S2 message is then removed from the head of the receive queue and freed from kernel memory. Conversely, with the MSG_PEEK flag set, the message is not removed from the receive queue. MSG_PEEK is therefore mainly used when several processes read the same socket.
9. As in step 8, the S2-S3 message is copied to user-space memory. Before each copy, the kernel checks whether the remaining user-space memory is large enough to hold the current message; if it is not, the call returns the number of bytes already copied.
10. Same as above, for the S3-S4 message.
11. The receive queue is now empty, so the SO_RCVLOWAT threshold is checked first. If the number of bytes copied so far is below it, the process may go to sleep and wait for more data to copy. As step 5 explained, the socket uses the default SO_RCVLOWAT of 1, which means that reading any data at all is enough to return. After this check, the backlog queue is checked. The backlog queue holds the messages that the NIC receives while the process is copying data. If the backlog queue has data at this point, it is processed along the way. Figure 3 covers this scenario.
12. In the scenario of this figure, the backlog queue is empty and the number of bytes copied is S4-S1, which is greater than 1, so the lock taken in step 7 is released and the call prepares to return to user space.
13. The user process code resumes execution. recv returns S4-S1, the number of bytes copied from the kernel.
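The queue movements in steps 1-4 and the copy in steps 8-10 can be condensed into a small user-space model. This is only a sketch under simplifying assumptions: it is not kernel code, all names (rx_model, model_deliver, model_recv) are invented for illustration, sequence-number wraparound and overlapping segments are ignored, and the user buffer is assumed to be larger than everything queued, as in Figure 1.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SEGS 8

struct seg { unsigned start, end; };   /* covers sequence numbers [start, end) */

struct seg_queue {
    struct seg segs[MAX_SEGS];
    size_t len;
};

struct rx_model {
    unsigned rcv_nxt;           /* next sequence number expected */
    struct seg_queue receive;   /* in-order data, readable by the user */
    struct seg_queue ofo;       /* stands in for the out_of_order queue */
};

static int queue_push(struct seg_queue *q, struct seg s)
{
    if (q->len == MAX_SEGS)
        return -1;
    q->segs[q->len++] = s;
    return 0;
}

/* Step 4: move every out-of-order segment that now starts at rcv_nxt
 * into the receive queue, rescanning after each move. */
static void drain_ofo(struct rx_model *m)
{
    size_t i = 0;

    while (i < m->ofo.len) {
        if (m->ofo.segs[i].start == m->rcv_nxt && m->receive.len < MAX_SEGS) {
            struct seg s = m->ofo.segs[i];

            m->ofo.segs[i] = m->ofo.segs[--m->ofo.len];  /* unordered removal */
            queue_push(&m->receive, s);
            m->rcv_nxt = s.end;
            i = 0;
        } else {
            i++;
        }
    }
}

/* Steps 1-3: an arriving segment goes to the receive queue if it starts at
 * the expected sequence number, otherwise to the out-of-order queue.
 * Returns 0 if queued, -1 if dropped (old data or queue full). */
int model_deliver(struct rx_model *m, unsigned start, unsigned end)
{
    struct seg s = { start, end };

    if (start == m->rcv_nxt) {
        if (queue_push(&m->receive, s) != 0)
            return -1;
        m->rcv_nxt = end;
        drain_ofo(m);
        return 0;
    }
    if (start > m->rcv_nxt)
        return queue_push(&m->ofo, s);
    return -1;
}

/* Steps 8-10: count the bytes of every queued segment as copied. With a
 * non-zero peek argument (modelled on MSG_PEEK) the segments stay queued. */
unsigned model_recv(struct rx_model *m, int peek)
{
    unsigned copied = 0;

    for (size_t i = 0; i < m->receive.len; i++)
        copied += m->receive.segs[i].end - m->receive.segs[i].start;
    if (!peek)
        m->receive.len = 0;
    return copied;
}
```

With rcv_nxt initialised to S1, delivering S1-S2, S3-S4, and S2-S3 in that order leaves three segments in the receive queue and none in the out-of-order queue, and model_recv then returns S4-S1, matching step 13.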

The scenario depicted in Figure 1 is the simplest one. Let's look at how the steps above are implemented in kernel code (the code below is from the 2.6.18 kernel).

We know that Linux splits interrupt handling into a top half and a bottom half for the sake of overall system efficiency. Everything introduced here runs in the bottom half, in the network soft interrupt, such as this tcp_v4_rcv method. Steps 1-4 of Figure 1 are completed in this method.

    int tcp_v4_rcv(struct sk_buff *skb)
    {
        ... ...
        // Whether a process is using this socket affects how the message is processed.
        // At the code level: once tcp_recvmsg has called lock_sock, only the else
        // branch can be taken; after release_sock, the if branch is taken.
        if (!sock_owned_by_user(sk)) {
            {
                // tcp_prequeue returning 0 means it did not handle the message.
                // If the message is placed in the prequeue queue, processing is
                // deferred so that the soft interrupt does not run for too long.
                if (!tcp_prequeue(sk, skb))
                    // If prequeue is not used, or no user process is reading the
                    // socket (Figure 3 goes to this branch), process the message now.
                    ret = tcp_v4_do_rcv(sk, skb);
            }
        } else
            // If a process is operating on the socket, insert the TCP message that
            // skb points to into the backlog queue (Figure 3 covers this branch).
            sk_add_backlog(sk, skb);
        ...
    }
In step 1 of Figure 1, we receive the message with sequence numbers S1-S2 from the network. At this point no user process is reading the socket, so sock_owned_by_user(sk) returns 0 and the tcp_prequeue method is executed. A brief look at it:

    static inline int tcp_prequeue(struct sock *sk, struct sk_buff *skb)
    {
        struct tcp_sock *tp = tcp_sk(sk);

        // Check tcp_low_latency. The default is 0, which means the prequeue queue
        // is used. A non-zero tp->ucopy.task means a process has started copying
        // TCP messages.
        if (!sysctl_tcp_low_latency && tp->ucopy.task) {
            // Reaching here usually means the user process was reading data and
            // went to sleep before it had read the requested amount. Insert the
            // message at the tail of the prequeue queue for deferred processing.
            __skb_queue_tail(&tp->ucopy.prequeue, skb);
            tp->ucopy.memory += skb->truesize;
            // Although processing is usually deferred, if the TCP receive buffer
            // is not large enough, all messages in the prequeue queue are
            // processed immediately.
            if (tp->ucopy.memory > sk->sk_rcvbuf) {
            ...
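Taken together, the two listings decide which of three paths an arriving message takes. The sketch below models only that decision, as read from the parts of the listings shown in this article. It is a user-space illustration, not kernel code, and the type and function names (sock_state, rx_path, choose_rx_path) are hypothetical. The three booleans stand in for sock_owned_by_user(sk), tp->ucopy.task, and sysctl_tcp_low_latency.

```c
#include <assert.h>
#include <stdbool.h>

enum rx_path {
    RX_BACKLOG,      /* sk_add_backlog: a process holds the socket */
    RX_PREQUEUE,     /* tcp_prequeue queued the message for later */
    RX_PROCESS_NOW   /* tcp_v4_do_rcv runs in the soft interrupt */
};

struct sock_state {
    bool owned_by_user;    /* stands in for sock_owned_by_user(sk) */
    bool reader_waiting;   /* stands in for tp->ucopy.task being non-zero */
    bool tcp_low_latency;  /* stands in for sysctl_tcp_low_latency */
};

/* Mirror the branch order of the listings: the socket lock is checked first,
 * then the prequeue condition, and everything else is processed at once. */
enum rx_path choose_rx_path(const struct sock_state *s)
{
    if (s->owned_by_user)
        return RX_BACKLOG;
    if (!s->tcp_low_latency && s->reader_waiting)
        return RX_PREQUEUE;
    return RX_PROCESS_NOW;
}
```

For step 1 of Figure 1, where no process is reading the socket, all three fields are false and the function returns RX_PROCESS_NOW, which corresponds to the call to tcp_v4_do_rcv.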
