TCP Sk_backlog (fallback queue analysis)

Source: Internet
Author: User

Three receive queues

The TCP protocol stack buffers incoming data in three receive queues: the prequeue (tp->ucopy.prequeue), the receive queue (sk_receive_queue), and the fallback queue (sk_backlog).

The reason three receive queues are needed is as follows:

When the TCP stack receives a packet, the struct sock *sk may currently be held either by a process or by the softirq (interrupt) context:

1. If a process context holds the socket, sk_lock.owned == 1. Because of this, the softirq cannot touch the socket's protected state, so the data can only be parked temporarily in the fallback queue (sk_backlog). When the process context finishes its own work, it calls tcp_v4_do_rcv on each queued skb as compensation; see the implementation of release_sock for the details.

2. If no process holds the socket (sk_lock.owned == 0), the softirq may place the data in the prequeue or in sk_receive_queue. The prequeue is tried first; if the prequeue cannot take the packet, it goes to sk_receive_queue. In theory one queue would suffice, so why did the TCP stack's designers use two? The point is to end softirq processing as quickly as possible: a softirq handler runs with process preemption and other softirqs disabled, so long handlers are expensive. If the data is placed in the prequeue, the softirq finishes almost immediately; if it goes to sk_receive_queue, the softirq must run the full, fairly complex receive logic itself. In other words, sk_receive_queue is filled by the softirq, while the prequeue is drained later in process context. The overall purpose is to improve the efficiency of the TCP protocol stack.

Processing logic for the fallback queue

1. When to use the fallback queue

The TCP stack protects struct sock *sk with two locks. The first is sk_lock.slock, the second is sk_lock.owned. sk_lock.slock grants permission to modify the members of the struct sock *sk object, while sk_lock.owned records whether the socket is currently owned by a process: process context sets sk_lock.owned to 1, and it is 0 when only softirq context is involved.

To modify the sk, code first takes the lock sk_lock.slock and then checks whether a process currently owns the socket. If a process owns it, the received skb can only be placed in the fallback queue sk_backlog. If we are in softirq context with no process owner, the skb can go to the prequeue or to sk_receive_queue.

The code snippet is as follows:

```c
	bh_lock_sock_nested(sk);
	ret = 0;
	if (!sock_owned_by_user(sk)) {
#ifdef CONFIG_NET_DMA
		struct tcp_sock *tp = tcp_sk(sk);
		if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
			tp->ucopy.dma_chan = dma_find_channel(DMA_MEMCPY);
		if (tp->ucopy.dma_chan)
			ret = tcp_v4_do_rcv(sk, skb);
		else
#endif
		{
			if (!tcp_prequeue(sk, skb))
				ret = tcp_v4_do_rcv(sk, skb);
		}
	} else if (unlikely(sk_add_backlog(sk, skb,
					   sk->sk_rcvbuf + sk->sk_sndbuf))) {
		bh_unlock_sock(sk);
		NET_INC_STATS_BH(net, LINUX_MIB_TCPBACKLOGDROP);
		goto discard_and_relse;
	}
	bh_unlock_sock(sk);
```

```c
	bh_lock_sock_nested(sk);
```

Take the first lock (the spinlock).

```c
	if (!sock_owned_by_user(sk))
```

Check the second "lock": does a process currently own the socket, or are we free to process in softirq context?

```c
		if (!tcp_prequeue(sk, skb))
			ret = tcp_v4_do_rcv(sk, skb);
```

If no process owns the socket, the packet is placed in the prequeue by preference; if the prequeue cannot take it, it is processed immediately and lands in sk_receive_queue.

```c
	} else if (unlikely(sk_add_backlog(sk, skb,
					   sk->sk_rcvbuf + sk->sk_sndbuf))) {
```

If a process owns the socket, the skb is placed directly into the fallback queue (sk_backlog).

2. How an skb is added to sk_backlog

The sk_add_backlog function adds the skb to sk_backlog, so let's analyze this helper function below.

```c
/* The per-socket spinlock must be held here. */
static inline __must_check int sk_add_backlog(struct sock *sk, struct sk_buff *skb,
					      unsigned int limit)
{
	if (sk_rcvqueues_full(sk, skb, limit))
		return -ENOBUFS;

	__sk_add_backlog(sk, skb);
	sk_extended(sk)->sk_backlog.len += skb->truesize;
	return 0;
}
```

```c
	if (sk_rcvqueues_full(sk, skb, limit))
		return -ENOBUFS;
```

Check whether the receive budget has been exhausted; clearly the bytes held in sk_backlog also count toward the socket's total receive budget (the limit passed in above is sk_rcvbuf + sk_sndbuf).

```c
	__sk_add_backlog(sk, skb);
```

Append the skb to the sk_backlog queue.

```c
	sk_extended(sk)->sk_backlog.len += skb->truesize;
```

Update the amount of data currently held in sk_backlog.

```c
/* OOB backlog add */
static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
{
	if (!sk->sk_backlog.tail) {
		sk->sk_backlog.head = sk->sk_backlog.tail = skb;
	} else {
		sk->sk_backlog.tail->next = skb;
		sk->sk_backlog.tail = skb;
	}
	skb->next = NULL;
}
```

```c
	if (!sk->sk_backlog.tail) {
		sk->sk_backlog.head = sk->sk_backlog.tail = skb;
```

If sk_backlog is currently empty, both head and tail point to the skb.

```c
	} else {
		sk->sk_backlog.tail->next = skb;
		sk->sk_backlog.tail = skb;
```

This branch means there is already data in sk_backlog; the skb is linked directly after the current tail, and the tail pointer then moves to the skb.

```c
	skb->next = NULL;
```

This matters for sk_backlog processing: the drain loop relies on the NULL terminator to decide when every skb has been processed.

3. How skbs in sk_backlog are processed

It is obvious that sk_backlog processing is bound to process context. For data reception, the entry point into process context is tcp_recvmsg, so sk_backlog must be processed somewhere in tcp_recvmsg.

The code fragment of tcp_recvmsg that leads to sk_backlog processing is as follows:

```c
	tcp_cleanup_rbuf(sk, copied);
	tcp_check_timer(sk);
	release_sock(sk);
```

It is release_sock(sk) that triggers sk_backlog processing.

```c
void release_sock(struct sock *sk)
{
	/*
	 * The sk_lock has mutex_unlock() semantics:
	 */
	mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);

	spin_lock_bh(&sk->sk_lock.slock);
	if (sk->sk_backlog.tail)
		__release_sock(sk);
	if (proto_has_rhel_ext(sk->sk_prot, RHEL_PROTO_HAS_RELEASE_CB) &&
	    sk->sk_prot->release_cb)
		sk->sk_prot->release_cb(sk);
	sk->sk_lock.owned = 0;
	if (waitqueue_active(&sk->sk_lock.wq))
		wake_up(&sk->sk_lock.wq);
	spin_unlock_bh(&sk->sk_lock.slock);
}
```

```c
	spin_lock_bh(&sk->sk_lock.slock);
```

Take the first lock.

```c
	if (sk->sk_backlog.tail)
		__release_sock(sk);
```

If the fallback queue is not empty, start processing it.

```c
	sk->sk_lock.owned = 0;
```

All queued skbs have now been processed in process context, so release the second "lock".

```c
	spin_unlock_bh(&sk->sk_lock.slock);
```

Release the first lock.

__release_sock(sk) is the real handler for the fallback queue.

```c
static void __release_sock(struct sock *sk)
{
	struct sk_buff *skb = sk->sk_backlog.head;

	do {
		sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
		bh_unlock_sock(sk);

		do {
			struct sk_buff *next = skb->next;

			skb->next = NULL;
			sk_backlog_rcv(sk, skb);

			/*
			 * We are in process context here with softirqs
			 * disabled, use cond_resched_softirq() to preempt.
			 * This is safe to do because we've taken the backlog
			 * queue private:
			 */
			cond_resched_softirq();

			skb = next;
		} while (skb != NULL);

		bh_lock_sock(sk);
	} while ((skb = sk->sk_backlog.head) != NULL);

	/*
	 * Doing the zeroing here guarantees we can not loop forever
	 * while a wild producer attempts to flood us.
	 */
	sk_extended(sk)->sk_backlog.len = 0;
}
```

```c
		sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
		bh_unlock_sock(sk);
```

Reset sk->sk_backlog.head and sk->sk_backlog.tail to NULL. sk_backlog is a singly linked list: head points to the first skb and tail to the last. Head and tail can be cleared because struct sk_buff *skb = sk->sk_backlog.head already captured the head of the list beforehand, and the loop then follows skb->next to reach each subsequent skb. The termination condition is skb->next == NULL, which was set in __sk_add_backlog; in other words, the drain loop no longer needs the head and tail pointers at all.

Why clear sk->sk_backlog.head and sk->sk_backlog.tail? The first thought is that they may need to be reused, but under what circumstances? Note that we are in process context and the spinlock has just been dropped by bh_unlock_sock, so a softirq can interrupt us. If a softirq arrives with new data, the earlier analysis tells us it must go into the fallback queue, because sk_lock.owned is still 1. Clearing head and tail therefore lets the softirq build a fresh sk_backlog list while the old one is drained privately. One might ask why the softirq does not simply append to the tail of the original list; presumably because that would be hard to synchronize with the drain loop, which is walking the old list without holding the spinlock.

```c
			sk_backlog_rcv(sk, skb);
```

The skb handler function; for TCP this in fact calls tcp_v4_do_rcv.

```c
		} while (skb != NULL);
```

When skb == NULL, the previously captured sk_backlog list has been fully processed.

```c
	} while ((skb = sk->sk_backlog.head) != NULL);
```

While the previous list was being drained, a softirq may have interrupted and built a new sk_backlog list; the outer loop re-reads head so that the newly created list is processed as well.

4. Where is the skb finally handled?

Clearly the received data is ultimately delivered to the application layer, and before that happens, the data spread across the three receive queues must be put in order. How do these three queues ensure the byte stream reaches the application layer in order? A careful analysis of the tcp_v4_do_rcv function shows that it is this function that keeps the data ordered: whether we are processing sk_backlog or the prequeue, tcp_v4_do_rcv is eventually called to insert the data in sequence into sk_receive_queue, from which the application layer finally takes it.

