TCP's reliability generally rests on three mechanisms: 1. acknowledgment and retransmission, 2. flow control, and 3. congestion avoidance. The sliding window used for flow control matches the sender's rate to the receiver's, which supports reliable transmission. This article walks through the approximate code path of the sliding window in Linux. The fundamentals of the sliding window are already covered by many excellent articles, and the TCP/IP Illustrated volumes are available for reference, so they are not repeated here. The code is based on Linux 2.6.32.
1. Background
TCP has an acknowledgment mechanism: the receiver explicitly acknowledges every byte the sender transmits (a cumulative acknowledgment in effect acknowledges each byte up to the acknowledged sequence number). So how should the acknowledgment process be handled? Suppose the sender transmits a certain amount of data, stops to wait for the receiver's acknowledgment, sends again, waits again, and so on. The time spent waiting makes this inefficient.
A second problem: if the sender transmits quickly while the receiver consumes data slowly, and the two rates are not coordinated, the receiver drops packets it cannot take in time. The drops trigger retransmissions, the retransmissions cause more drops, and the result can be a network avalanche.
The sliding window addresses both problems: it improves transmission efficiency and provides flow control.
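To make the efficiency argument concrete, here is a minimal sketch (not kernel code) that counts the round trips needed to deliver a number of segments when the sender may keep at most `window` segments unacknowledged. The function name `round_trips_needed` is hypothetical, and the model assumes no loss and that a full window is acknowledged once per round trip.

```c
#include <assert.h>

/* Hypothetical helper, for illustration only.
 * Round trips needed to deliver `segments` segments when at most
 * `window` segments may be in flight at once. Assumes no loss and
 * that every full window is acknowledged after one round trip. */
static unsigned int round_trips_needed(unsigned int segments,
                                       unsigned int window)
{
    assert(window > 0);  /* a zero window can never make progress */
    return (segments + window - 1) / window;  /* ceiling division */
}
```

With a window of 1 (send, wait, send, wait) 100 segments cost 100 round trips under these assumptions, while a window of 10 cuts that to 10.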
2. A simple overview of flow control
TCP flow control coordinates the sending and receiving rates. Each end of a connection maintains both a send window and a receive window. Before going into the specific operations, here are a few related concepts: send queue, receive queue, retransmission queue, sliding window, send window, congestion window, and advertised window.
2.1 Concepts
- Send queue: when the protocol stack sends a message, the message is first placed on the send queue. Each open socket maintains a receive queue and a send queue.
- Retransmission queue: when a message is sent from the send queue, a copy is placed on the retransmission queue, which is used to retransmit after the retransmission timer expires.
- Sliding window: a range of sequence numbers that can slide forward. On the sending side it refers to the send window, and on the receiving side to the receive window. The sliding window is a mechanism for improving transmission efficiency, so it is present in both the sending and the receiving process.
- Congestion window: TCP's congestion avoidance mechanism introduces a congestion window, which also limits the sending rate. When congestion occurs, it is used to adjust the send window size. The send window is usually the smaller of the congestion window and the advertised window.
- Advertised window: the window value the receiver passes to the sender, indicating how much free space the receiver has for incoming data. The sender adjusts its send window according to the advertised window so that the sending and receiving rates match.
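The relationship between these windows can be sketched in a few lines of C. This is a simplified user-space illustration, not the kernel's code: the struct `send_state` and the function `usable_window` are hypothetical names, and the congestion window is expressed in bytes here for simplicity (the kernel keeps it in packets).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified sender state for illustration. */
struct send_state {
    uint32_t snd_una;  /* oldest unacknowledged sequence number */
    uint32_t snd_nxt;  /* next sequence number to be sent */
    uint32_t snd_wnd;  /* window advertised by the receiver, in bytes */
    uint32_t cwnd;     /* congestion window, in bytes (an assumption) */
};

/* Bytes the sender may still transmit right now:
 * min(advertised window, congestion window) minus data in flight. */
static uint32_t usable_window(const struct send_state *s)
{
    uint32_t wnd, in_flight;

    assert(s != NULL);
    wnd = s->snd_wnd < s->cwnd ? s->snd_wnd : s->cwnd;
    in_flight = s->snd_nxt - s->snd_una;  /* unsigned math survives wraparound */
    return in_flight >= wnd ? 0 : wnd - in_flight;
}
```

For example, with 2000 bytes in flight, an advertised window of 8000 and a congestion window of 4000, the sender may transmit another 2000 bytes.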
2.2 Sending and receiving
When there is data to send, it is placed on the socket's send queue. In Linux this queue is implemented as a doubly linked list:
    struct sk_buff_head {
        /* These two members must be first. */
        struct sk_buff *next;
        struct sk_buff *prev;
        __u32 qlen;
        spinlock_t lock;
    };
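The queue discipline behind this structure can be sketched in user space as a circular doubly linked list whose head acts as a sentinel. Everything below (`struct link`, `struct packet`, `struct queue`, `queue_init`, `queue_tail`, `queue_dequeue`) is a hypothetical stand-in for illustration; it omits the spinlock and is not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: `link` plays the role of the next/prev pair,
 * `packet` of struct sk_buff, `queue` of struct sk_buff_head. */
struct link   { struct link *next, *prev; };
struct packet { struct link link; int id; };  /* link must be first */
struct queue  { struct link head; unsigned int qlen; };

static void queue_init(struct queue *q)
{
    q->head.next = q->head.prev = &q->head;  /* empty: head points to itself */
    q->qlen = 0;
}

/* Append a packet at the tail, as a send queue would. */
static void queue_tail(struct queue *q, struct packet *p)
{
    struct link *last;

    assert(q != NULL && p != NULL);
    last = q->head.prev;
    p->link.next = &q->head;
    p->link.prev = last;
    last->next = &p->link;
    q->head.prev = &p->link;
    q->qlen++;
}

/* Remove and return the packet at the head, or NULL if empty. */
static struct packet *queue_dequeue(struct queue *q)
{
    struct link *first = q->head.next;

    if (first == &q->head)
        return NULL;
    q->head.next = first->next;
    first->next->prev = &q->head;
    first->next = first->prev = NULL;
    q->qlen--;
    return (struct packet *)first;  /* valid because link is the first member */
}
```

Keeping the links as the first member is what lets the list head and the elements be traversed uniformly, which is the reason for the "must be first" comment in the kernel structure.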
The function that receives TCP messages is tcp_v4_do_rcv(). Once the connection is established, receive processing is done by tcp_rcv_established(), which distinguishes between a fast path and a slow path. Header prediction decides which path a message takes, and it exists to speed up TCP processing. Most messages on a network usually take the fast path. The fast-path handling of a message that carries no data (a pure ACK) looks like this:
    if (len == tcp_header_len) {
        /* Predicted packet is in window by definition.
         * seq == rcv_nxt and rcv_wup <= rcv_nxt.
         * Hence, check seq<=rcv_wup reduces to:
         */
        if (tcp_header_len ==
            (sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) &&
            tp->rcv_nxt == tp->rcv_wup)
            tcp_store_ts_recent(tp);
        /* We know that such packets are checksummed
         * on entry.
         */
        tcp_ack(sk, skb, 0);
        __kfree_skb(skb);
        tcp_data_snd_check(sk);
        return 0;
    }
Next comes ACK processing in tcp_ack(). This function updates the send window so that the window moves to the right, which allows new messages to be sent. tcp_data_snd_check(sk) then sends those messages out.
Finally tcp_write_xmit() is called. The comment above it in the kernel source describes it as follows:
    This routine writes packets to the network. It advances the
    send_head. This happens as incoming acks open up the remote
    window for us.
As can be seen, this function is what actually sends messages once the window has opened. It also shows that, under the window mechanism, TCP transmission is driven by the ACKs received.
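The ACK-driven behavior can be illustrated with a small user-space model. This is a sketch under simplifying assumptions, not the logic of tcp_ack() or tcp_write_xmit(): the struct `sender` and the functions `send_pending` and `on_ack` are hypothetical, there is no congestion window, no loss, and data is counted in bytes rather than segments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified sender for illustration. */
struct sender {
    uint32_t snd_una;    /* first unacknowledged byte */
    uint32_t snd_nxt;    /* next byte to send */
    uint32_t snd_wnd;    /* window advertised by the receiver */
    uint32_t write_seq;  /* end of the data queued by the application */
};

/* Send as much queued data as the window allows.
 * Returns the number of bytes "transmitted". */
static uint32_t send_pending(struct sender *s)
{
    uint32_t in_flight, room, queued, n;

    assert(s != NULL);
    in_flight = s->snd_nxt - s->snd_una;
    room = in_flight < s->snd_wnd ? s->snd_wnd - in_flight : 0;
    queued = s->write_seq - s->snd_nxt;
    n = queued < room ? queued : room;
    s->snd_nxt += n;  /* the send head advances */
    return n;
}

/* Handle an incoming ACK: slide the left edge of the window, adopt the
 * newly advertised window, then send whatever has become eligible. */
static uint32_t on_ack(struct sender *s, uint32_t ack_seq, uint32_t wnd)
{
    assert(s != NULL);
    /* Ignore ACKs older than snd_una or beyond snd_nxt. */
    if ((int32_t)(ack_seq - s->snd_una) < 0 ||
        (int32_t)(s->snd_nxt - ack_seq) < 0)
        return 0;
    s->snd_una = ack_seq;
    s->snd_wnd = wnd;
    return send_pending(s);
}
```

In this model, once the window is full send_pending() returns 0, and only a call to on_ack() makes further data eligible, which mirrors the point that incoming ACKs open the window for the sender.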
When there is data to receive, look again at tcp_rcv_established(). It first checks whether the fast path applies, that is, whether the data arrived in order:
    if (tp->copied_seq == tp->rcv_nxt &&
        len - tcp_header_len <= tp->ucopy.len) {
    #ifdef CONFIG_NET_DMA
        if (tcp_dma_try_early_copy(sk, skb, tcp_header_len)) {
            copied_early = 1;
            eaten = 1;
        }
    #endif
        if (tp->ucopy.task == current &&
            sock_owned_by_user(sk) && !copied_early) {
            __set_current_state(TASK_RUNNING);
            if (!tcp_copy_to_iovec(sk, skb, tcp_header_len))
                eaten = 1;
        }
The message is then copied from kernel space to user space. If the copy succeeds, the flag eaten is set to 1. The next step measures the RTT and updates the receive sequence number:
    if (eaten) {
        /* Predicted packet is in window by definition.
         * seq == rcv_nxt and rcv_wup <= rcv_nxt.
         * Hence, check seq<=rcv_wup reduces to:
         */
        if (tcp_header_len ==
            (sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) &&
            tp->rcv_nxt == tp->rcv_wup)
            tcp_store_ts_recent(tp);
        tcp_rcv_rtt_measure_ts(sk, skb);
        __skb_pull(skb, tcp_header_len);
        tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
        NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPHPHITSTOUSER);
    }
If the copy does not succeed, the message is placed on the receive queue and the RTT is likewise updated. The copy can fail, for example, when the message data is longer than the space left in the user-space buffer:
    if (!eaten) {
        if (tcp_checksum_complete_user(sk, skb))
            goto csum_error;
        /* Predicted packet is in window by definition.
         * seq == rcv_nxt and rcv_wup <= rcv_nxt.
         * Hence, check seq<=rcv_wup reduces to:
         */
        if (tcp_header_len ==
            (sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) &&
            tp->rcv_nxt == tp->rcv_wup)
            tcp_store_ts_recent(tp);
        tcp_rcv_rtt_measure_ts(sk, skb);
        if ((int)skb->truesize > sk->sk_forward_alloc)
            goto step5;
        NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPHPHITS);
        /* Bulk data transfer: receiver */
        __skb_pull(skb, tcp_header_len);
        __skb_queue_tail(&sk->sk_receive_queue, skb);
        skb_set_owner_r(skb, sk);
        tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
    }
Finally, check whether an ACK or SACK needs to be sent:
    if (!copied_early || tp->rcv_nxt != tp->rcv_wup)
        __tcp_ack_snd_check(sk, 0);
The other case is out-of-order arrival: when the conditions for the fast path are not met, the message takes the slow path, where among other things it may be placed on the out-of-order queue. The details are not covered here.
    tcp_data_queue(sk, skb);
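The receive-side decisions described above, and their effect on the window advertised back to the sender, can be modeled roughly as follows. This is a user-space sketch with hypothetical names (`struct receiver`, `receive_segment`, `advertised_window`, `app_read`); it treats the receive buffer as a simple byte counter and does not reproduce the kernel's memory accounting or its out-of-order queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified receiver for illustration. */
struct receiver {
    uint32_t rcv_nxt;   /* next expected sequence number */
    uint32_t buf_size;  /* total receive buffer, in bytes */
    uint32_t buf_used;  /* bytes queued but not yet read by the application */
};

enum rx_result { RX_IN_ORDER, RX_OUT_OF_ORDER, RX_NO_ROOM };

/* Classify an arriving segment and queue it if it is the expected one. */
static enum rx_result receive_segment(struct receiver *r,
                                      uint32_t seq, uint32_t len)
{
    assert(r != NULL);
    if (seq != r->rcv_nxt)
        return RX_OUT_OF_ORDER;  /* would go to the slow path */
    if (len > r->buf_size - r->buf_used)
        return RX_NO_ROOM;       /* does not fit in the free space */
    r->buf_used += len;          /* queued for the application */
    r->rcv_nxt += len;           /* the receive sequence number advances */
    return RX_IN_ORDER;
}

/* Free space the receiver can advertise to the sender. */
static uint32_t advertised_window(const struct receiver *r)
{
    return r->buf_size - r->buf_used;
}

/* The application reads n bytes, which frees buffer space. */
static void app_read(struct receiver *r, uint32_t n)
{
    if (n > r->buf_used)
        n = r->buf_used;
    r->buf_used -= n;
}
```

In this model the advertised window shrinks as in-order data is queued and grows again only when the application reads, which is how a slow reader ends up throttling the sender.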
After that, the user process reads messages from the receive queue with a recv call. The process can be seen in tcp_recvmsg():
    skb_queue_walk(&sk->sk_receive_queue, skb) {
        /* Now that we have two receive queues this
         * shouldn't happen.
         */
        if (WARN(before(*seq, TCP_SKB_CB(skb)->seq),
                 KERN_INFO "recvmsg bug: copied %x "
                 "seq %x rcvnxt %x fl %x\n", *seq,
                 TCP_SKB_CB(skb)->seq, tp->rcv_nxt, flags))
            break;
        offset = *seq - TCP_SKB_CB(skb)->seq;
        if (tcp_hdr(skb)->syn)
            offset--;
        if (offset < skb->len)
            goto found_ok_skb;
        if (tcp_hdr(skb)->fin)
            goto found_fin_ok;
        WARN(!(flags & MSG_PEEK), KERN_INFO "recvmsg bug 2: "
             "copied %x seq %x rcvnxt %x fl %x\n",
             *seq, TCP_SKB_CB(skb)->seq, tp->rcv_nxt, flags);
    }
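The loop relies on two small ideas: sequence numbers are compared modulo 2^32, and the read position is turned into an offset inside a queued segment. The sketch below shows both in user space. `seq_before` follows the same signed-difference idea as the kernel's before(), while `struct segment` and `find_offset` are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True if seq1 comes before seq2 in 32-bit sequence space.
 * The signed difference keeps the comparison correct across wraparound. */
static int seq_before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

/* Hypothetical, simplified view of a queued segment. */
struct segment {
    uint32_t seq;  /* sequence number of the first byte */
    uint32_t len;  /* payload length in bytes */
    int syn;       /* nonzero if the segment carries SYN */
};

/* Locate the next unread byte (copied_seq) inside a segment.
 * Returns 1 and stores the offset if the byte is in this segment,
 * 0 if the segment is already fully consumed, and -1 if copied_seq
 * lies before the segment (the case the kernel warns about). */
static int find_offset(const struct segment *seg, uint32_t copied_seq,
                       uint32_t *offset)
{
    uint32_t off;

    assert(seg != NULL && offset != NULL);
    if (seq_before(copied_seq, seg->seq))
        return -1;
    off = copied_seq - seg->seq;
    if (seg->syn)
        off--;  /* SYN occupies one sequence number but carries no data */
    if (off < seg->len) {
        *offset = off;
        return 1;
    }
    return 0;
}
```

For a segment starting at sequence number 1000 with 500 bytes of payload, a read position of 1200 maps to offset 200, while a read position of 1500 means the segment has been consumed and the walk moves on to the next one.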
3. Summary
TCP's sliding-window flow control works by coordinating the sending and receiving rates. Specifically, the sender's window is driven by the ACKs returned by the receiver: to keep sending, the sender must keep receiving ACKs. On the other side, the receiver responds to incoming messages with ACKs and, as the application reads data and frees buffer space, keeps receiving in a loop. Keeping the window sliding in this way achieves flow control and improves transmission efficiency.