The listen backlog in Linux, explained in detail

Source: Internet
Author: User

When the backlog is mentioned, most people think of the backlog parameter of listen in socket programming. How is that backlog actually handled inside the Linux kernel?

int listen(int sockfd, int backlog);

Running

man listen

shows the following explanation of the parameter:

The backlog argument defines the maximum length to which the queue of pending connections for sockfd may grow. If a connection request arrives when the queue is full, the client may receive an error with an indication of ECONNREFUSED or, if the underlying protocol supports retransmission, the request may be ignored so that a later reattempt at connection succeeds.

In fact, since Linux kernel 2.2 the backlog parameter controls the size of the accept queue, that is, the queue of connections whose handshake has already completed.

In Linux we can simply assume that two queues are involved in the handshake. Let us first look at the two structures that Linux defines:

struct request_sock_queue {
    /* Points to the request_sock accept queue. After the three-way
     * handshake the request_sock is moved from syn_table to here. */
    struct request_sock *rskq_accept_head;
    struct request_sock *rskq_accept_tail;
    rwlock_t            syn_wait_lock;
    u8                  rskq_defer_accept;
    /* 3 bytes hole, try to pack */
    struct listen_sock  *listen_opt;
};

struct listen_sock {
    /* 2^max_qlen_log is the length limit of the SYN queue;
     * the max of max_qlen_log is 10 (2^10 = 1024) */
    u8                  max_qlen_log;
    /* 3 bytes hole, try to use */
    int                 qlen;        /* current length of the SYN queue */
    int                 qlen_young;
    int                 clock_hand;
    u32                 hash_rnd;
    /* nr_table_entries is the number of entries in syn_table, max is 512 */
    u32                 nr_table_entries;
    struct request_sock *syn_table[0];
};

In these structs we see syn_table and rskq_accept_head. These are the two queues just mentioned: one holds requests whose handshake is still in progress (the SYN queue), and the other holds connections whose handshake has completed (the accept queue).

The server receives the client's SYN -> puts the request into syn_table -> replies with SYN-ACK -> receives the client's ACK -> moves the request into the accept queue

We divide the whole process into these 5 steps. Placing the request into syn_table and moving it into the accept queue both depend on the backlog, which we elaborate on below.
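The five steps can be modelled with two counters. The following is a minimal sketch and not kernel code: struct listener and the helpers on_syn, on_ack and on_accept are hypothetical names invented for illustration, and the queue limits are reduced to plain integer comparisons.

```c
#include <stdbool.h>

/* Toy model of a listening socket: two queue lengths and their limits. */
struct listener {
    int syn_len,    syn_max;     /* handshake in progress (syn_table) */
    int accept_len, accept_max;  /* handshake done, waiting for accept() */
};

/* Steps 1-3: a SYN arrives, is stored in the SYN queue, SYN-ACK is sent. */
bool on_syn(struct listener *l)
{
    if (l->syn_len >= l->syn_max)
        return false;            /* SYN queue full: request dropped */
    l->syn_len++;
    return true;
}

/* Steps 4-5: the client's ACK moves the request to the accept queue. */
bool on_ack(struct listener *l)
{
    if (l->syn_len == 0 || l->accept_len >= l->accept_max)
        return false;            /* nothing pending, or accept queue full */
    l->syn_len--;
    l->accept_len++;
    return true;
}

/* The application calls accept() and removes one finished connection. */
bool on_accept(struct listener *l)
{
    if (l->accept_len == 0)
        return false;
    l->accept_len--;
    return true;
}
```

With syn_max = 2 and accept_max = 1, a third SYN is refused, and a second ACK is refused until on_accept has drained the accept queue.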

Let us briefly look at the table of TCP operation functions. The following one is for the IPv4 protocol:

const struct inet_connection_sock_af_ops ipv4_specific = {
    .queue_xmit        = ip_queue_xmit,
    .send_check        = tcp_v4_send_check,
    .rebuild_header    = inet_sk_rebuild_header,
    .conn_request      = tcp_v4_conn_request,
    .syn_recv_sock     = tcp_v4_syn_recv_sock,
    .remember_stamp    = tcp_v4_remember_stamp,
    .net_header_len    = sizeof(struct iphdr),
    .setsockopt        = ip_setsockopt,
    .getsockopt        = ip_getsockopt,
    .addr2sockaddr     = inet_csk_addr2sockaddr,
    .sockaddr_len      = sizeof(struct sockaddr_in),
    .bind_conflict     = inet_csk_bind_conflict,
#ifdef CONFIG_COMPAT
    .compat_setsockopt = compat_ip_setsockopt,
    .compat_getsockopt = compat_ip_getsockopt,
#endif
};

The two queueing steps just mentioned correspond to conn_request and syn_recv_sock in this struct; the functions behind them are tcp_v4_conn_request and tcp_v4_syn_recv_sock.
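How such an operations table is used can be shown with a miniature version. This is only a sketch of the function-pointer dispatch pattern: struct af_ops, enum segment, the v4_ handlers and their return values are all hypothetical stand-ins, not the kernel's types.

```c
enum segment { SEG_SYN, SEG_ACK };

/* Hypothetical miniature of an address-family operations table. */
struct af_ops {
    int (*conn_request)(void);   /* called for an incoming SYN */
    int (*syn_recv_sock)(void);  /* called for the final ACK */
};

static int v4_conn_request(void)  { return 1; } /* would queue the request */
static int v4_syn_recv_sock(void) { return 2; } /* would create the child socket */

static const struct af_ops v4_ops = {
    .conn_request  = v4_conn_request,
    .syn_recv_sock = v4_syn_recv_sock,
};

/* Generic code picks the handler through the table, not by name. */
int handle_segment(const struct af_ops *ops, enum segment seg)
{
    return seg == SEG_SYN ? ops->conn_request() : ops->syn_recv_sock();
}

/* Convenience wrapper that uses the IPv4 table defined above. */
int handle_v4_segment(enum segment seg)
{
    return handle_segment(&v4_ops, seg);
}
```

The point of the indirection is that the generic TCP code stays the same while another address family supplies a different table.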

Our focus is mainly on the drop logic in these functions.

tcp_v4_conn_request function

int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
{
    /* Never answer to SYNs send to broadcast or multicast */
    if (skb_rtable(skb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
        goto drop;

    /* TW buckets are converted to open requests without
     * limitations, they conserve resources and peer is
     * evidently real one.
     */
    if (inet_csk_reqsk_queue_is_full(sk) && !isn) {
#ifdef CONFIG_SYN_COOKIES
        if (sysctl_tcp_syncookies) {
            want_cookie = 1;
        } else
#endif
        goto drop;
    }

    /* Accept backlog is full. If we have already queued enough
     * of warm entries in syn queue, drop request. It is better than
     * clogging syn queue with openreqs with exponentially increasing
     * timeout.
     */
    if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
        goto drop;

    /* ... */
1. inet_csk_reqsk_queue_is_full(sk)

The check is queue->listen_opt->qlen >> queue->listen_opt->max_qlen_log; a non-zero result means the SYN queue is full.

Here qlen is the current length of listen_opt's syn_table. What, then, is max_qlen_log?
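Before answering, it helps to see what the shift expression means. A small user-space sketch follows; the function name syn_queue_is_full is hypothetical, and only the shift itself comes from the kernel expression above.

```c
#include <stdbool.h>

/* qlen >> max_qlen_log is non-zero exactly when qlen >= 2^max_qlen_log. */
bool syn_queue_is_full(unsigned int qlen, unsigned int max_qlen_log)
{
    return (qlen >> max_qlen_log) != 0;
}
```

With max_qlen_log = 3, the queue is reported full as soon as it holds 8 requests.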

nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
nr_table_entries = max_t(u32, nr_table_entries, 8);
nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);

for (lopt->max_qlen_log = 3;
     (1 << lopt->max_qlen_log) < nr_table_entries;
     lopt->max_qlen_log++);

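That is, the SYN queue size is the smaller of the backlog passed to listen and sysctl_max_syn_backlog, it is never less than 8, and it is then rounded up to a power of two; max_qlen_log is the base-2 logarithm of the result. sysctl_max_syn_backlog is the familiar setting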

/proc/sys/net/ipv4/tcp_max_syn_backlog
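The sizing logic can be replayed in user space to see which values come out. This is a sketch under assumptions: compute_max_qlen_log is a hypothetical name, roundup_pow_of_two is re-implemented here with a simple loop rather than taken from the kernel, and inputs are assumed to be small enough (below 2^31) for the loop to terminate.

```c
#include <stdint.h>

/* Smallest power of two >= n, for 1 <= n <= 2^31 (simple loop version). */
static uint32_t roundup_pow_of_two(uint32_t n)
{
    uint32_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* User-space re-creation of the sizing steps quoted above. */
unsigned int compute_max_qlen_log(uint32_t backlog, uint32_t max_syn_backlog)
{
    uint32_t n = backlog < max_syn_backlog ? backlog : max_syn_backlog;
    unsigned int log;

    if (n < 8)
        n = 8;                        /* lower bound from max_t(..., 8) */
    n = roundup_pow_of_two(n + 1);
    for (log = 3; (1u << log) < n; log++)
        ;
    return log;
}
```

For example, a backlog of 128 with tcp_max_syn_backlog at 1024 gives 128, then 256 after rounding up 129, so max_qlen_log is 8. A backlog of 1 is raised to 8, rounded to 16, and yields 4.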

Let us take a look at how the listen system call is implemented in the kernel:

SYSCALL_DEFINE2(listen, int, fd, int, backlog)
{
    struct socket *sock;
    int err, fput_needed;
    int somaxconn;

    sock = sockfd_lookup_light(fd, &err, &fput_needed);
    if (sock) {
        somaxconn = sock_net(sock->sk)->core.sysctl_somaxconn;
        if ((unsigned)backlog > somaxconn)
            backlog = somaxconn;

        err = security_socket_listen(sock, backlog);
        if (!err)
            err = sock->ops->listen(sock, backlog);

        fput_light(sock->file, fput_needed);
    }
    return err;
}

We can clearly see that the effective backlog is not necessarily the value you pass to listen; it is the smaller of that value and somaxconn.

The value of somaxconn is set in

/proc/sys/net/core/somaxconn
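The clamp is easy to try out in isolation. A minimal sketch, where effective_backlog is a hypothetical helper that repeats only the comparison from the system call above:

```c
/* Mirrors the clamp in the listen system call shown above. */
int effective_backlog(int backlog, int somaxconn)
{
    /* The unsigned cast turns a negative backlog into a huge value,
     * so a negative backlog is clamped to somaxconn as well. */
    if ((unsigned int)backlog > (unsigned int)somaxconn)
        backlog = somaxconn;
    return backlog;
}
```

With somaxconn at 128, a requested backlog of 1000 becomes 128, 50 stays 50, and -1 also becomes 128 because of the unsigned comparison.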

2. sk_acceptq_is_full

static inline int sk_acceptq_is_full(struct sock *sk)
{
    return sk->sk_ack_backlog > sk->sk_max_ack_backlog;
}

int inet_listen(struct socket *sock, int backlog)
{
    /* ... */
    sk->sk_max_ack_backlog = backlog;
    /* ... */
}

The backlog assigned here is the value computed in the listen system call that we saw in the previous section.
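One consequence of the strict > comparison is worth spelling out: the queue is only reported full once it holds more than sk_max_ack_backlog entries. The sketch below counts this on plain integers; acceptq_is_full and accept_queue_capacity are hypothetical names, and only the comparison is taken from the kernel function above.

```c
/* Same comparison as sk_acceptq_is_full, on plain integers. */
static int acceptq_is_full(unsigned int ack_backlog, unsigned int max_ack_backlog)
{
    return ack_backlog > max_ack_backlog;
}

/* Count how many connections fit before the queue reports "full". */
unsigned int accept_queue_capacity(unsigned int max_ack_backlog)
{
    unsigned int queued = 0;

    while (!acceptq_is_full(queued, max_ack_backlog))
        queued++;
    return queued;
}
```

So with this check a backlog of 128 admits 129 queued connections, and a backlog of 0 still admits one.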


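3. inet_csk_reqsk_queue_young

When sk_acceptq_is_full is true, the code additionally checks inet_csk_reqsk_queue_young(sk) > 1, which reads qlen_young from struct listen_sock.

qlen_young is a counter over syn_table: it is incremented when a SYN enters syn_table and decremented when the request leaves syn_table for the accept queue. It is also decremented once a request's SYN-ACK has had to be retransmitted, so it counts only the "young" requests.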
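Putting the three checks together, the decision taken for an incoming SYN can be sketched as a pure function. This is a simplification under assumptions: classify_syn and enum syn_verdict are hypothetical names, the queue states are passed in as plain flags, and the isn condition from the excerpt is ignored (treated as zero).

```c
#include <stdbool.h>

enum syn_verdict { SYN_QUEUE, SYN_COOKIE, SYN_DROP };

/* Inputs are plain flags here; in the kernel they come from
 * inet_csk_reqsk_queue_is_full, sysctl_tcp_syncookies,
 * sk_acceptq_is_full and inet_csk_reqsk_queue_young. */
enum syn_verdict classify_syn(bool syn_queue_full, bool syncookies_enabled,
                              bool accept_queue_full, int qlen_young)
{
    bool want_cookie = false;

    if (syn_queue_full) {
        if (!syncookies_enabled)
            return SYN_DROP;         /* SYN queue full, no cookies */
        want_cookie = true;
    }
    if (accept_queue_full && qlen_young > 1)
        return SYN_DROP;             /* accept queue full and young entries pending */
    return want_cookie ? SYN_COOKIE : SYN_QUEUE;
}
```

Note that a full accept queue alone does not cause a drop: with qlen_young at 0 or 1 the SYN is still queued.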

Some readers may have a question here.

If the accept queue is full, won't qlen_young keep growing, so that new clients hit the condition if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1) and have their SYN dropped, which would show up on the client as a connect timeout? Yet when you test this on Linux, you will find that it does not happen.

In fact, for a listening socket, tcp_keepalive_timer calls tcp_synack_timer, which in turn calls inet_csk_reqsk_queue_prune:

if (sk->sk_state == TCP_LISTEN) {
    tcp_synack_timer(sk);
    goto out;
}

static void tcp_synack_timer(struct sock *sk)
{
    inet_csk_reqsk_queue_prune(sk, TCP_SYNQ_INTERVAL,
                               TCP_TIMEOUT_INIT, TCP_RTO_MAX);
}

inet_csk_reqsk_queue_prune walks the SYN table, removes the requests that have expired and used up their retries, and retransmits the SYN-ACK for the others.

struct request_sock {
    struct request_sock           *dl_next;     /* Must be first member! */
    u16                           mss;
    u8                            retrans;
    u8                            cookie_ts;    /* syncookie: encode tcpopts in timestamp */
    /* The following two fields can be easily recomputed I think -AK */
    u32                           window_clamp; /* window clamp at creation time */
    u32                           rcv_wnd;      /* rcv_wnd offered first time */
    u32                           ts_recent;
    unsigned long                 expires;
    const struct request_sock_ops *rsk_ops;
    struct sock                   *sk;
    u32                           secid;
    u32                           peer_secid;
};

To make inet_csk_reqsk_queue_prune efficient, request_sock carries an expires field. Its initial value is hard-coded to 3*HZ (TCP_TIMEOUT_INIT). inet_csk_reqsk_queue_prune cycles through syn_table looking for requests that have already expired. If such a request has not yet reached its retry limit, the SYN-ACK is retransmitted and the expiry time is pushed back, and this repeats until the retries are used up. The new interval grows with the number of retries, starting from 3*HZ and never exceeding 120*HZ (TCP_RTO_MAX).
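The resulting schedule can be tabulated with a few lines of C. This is a sketch resting on an explicit assumption: the interval is taken to double on every retry, in line with the "exponentially increasing timeout" mentioned in the comment inside tcp_v4_conn_request; the exact growth rule should be checked against the kernel version in use. synack_timeout is a hypothetical helper, and HZ is set to 1 so that values read as seconds.

```c
#define HZ               1          /* work in whole seconds for readability */
#define TCP_TIMEOUT_INIT (3 * HZ)   /* first SYN-ACK timeout, per the text */
#define TCP_RTO_MAX      (120 * HZ) /* upper bound, per the text */

/* Timeout armed after `retrans` SYN-ACK retransmissions have been sent.
 * ASSUMPTION: the interval doubles on every retry. */
unsigned long synack_timeout(unsigned int retrans)
{
    unsigned long t = TCP_TIMEOUT_INIT;

    while (retrans-- > 0 && t < TCP_RTO_MAX)
        t *= 2;
    return t < TCP_RTO_MAX ? t : TCP_RTO_MAX;
}
```

Under that assumption the intervals are 3, 6, 12, 24, 48 and 96 seconds, after which the 120 second cap applies.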

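The number of SYN-ACK retries can be tuned through

/proc/sys/net/ipv4/tcp_synack_retries

(tcp_syn_retries, by contrast, controls how often a client retries its own SYN.) Of course, you can also set

/proc/sys/net/ipv4/tcp_abort_on_overflow

to 1, in which case a connection that overflows the accept queue is reset instead of being retried with further SYN-ACKs.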

Because inet_csk_reqsk_queue_prune keeps cleaning syn_table, the condition inet_csk_reqsk_queue_young > 1 essentially does not hold under concurrent load, so SYNs are not dropped and the client does not see connect timeouts. Here the Linux implementation is very different from the Mac one.

The analysis of tcp_v4_conn_request shows that Linux is designed to let new connections in whenever it possibly can.

We may then ask: the server has just sent its SYN-ACK, the client replies with its ACK, and at that moment the accept queue is full. How is this handled?

Let us go back to the steps listed earlier and look at the function that processes the client's ACK.

tcp_v4_syn_recv_sock function

struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
                                  struct request_sock *req,
                                  struct dst_entry *dst)
{
    struct inet_request_sock *ireq;
    struct inet_sock *newinet;
    struct tcp_sock *newtp;
    struct sock *newsk;
#ifdef CONFIG_TCP_MD5SIG
    struct tcp_md5sig_key *key;
#endif

    if (sk_acceptq_is_full(sk))
        goto exit_overflow;

    if (!dst && (dst = inet_csk_route_req(sk, req)) == NULL)
        goto exit;

    newsk = tcp_create_openreq_child(sk, req, skb);
    if (!newsk)
        goto exit;

    newsk->sk_gso_type = SKB_GSO_TCPV4;
    sk_setup_caps(newsk, dst);

    newtp                   = tcp_sk(newsk);
    newinet                 = inet_sk(newsk);
    ireq                    = inet_rsk(req);
    newinet->inet_daddr     = ireq->rmt_addr;
    newinet->inet_rcv_saddr = ireq->loc_addr;
    newinet->inet_saddr     = ireq->loc_addr;
    newinet->opt            = ireq->opt;
    ireq->opt               = NULL;
    newinet->mc_index       = inet_iif(skb);
    newinet->mc_ttl         = ip_hdr(skb)->ttl;
    inet_csk(newsk)->icsk_ext_hdr_len = 0;
    if (newinet->opt)
        inet_csk(newsk)->icsk_ext_hdr_len = newinet->opt->optlen;
    newinet->inet_id = newtp->write_seq ^ jiffies;

    tcp_mtup_init(newsk);
    tcp_sync_mss(newsk, dst_mtu(dst));
    newtp->advmss = dst_metric(dst, RTAX_ADVMSS);
    if (tcp_sk(sk)->rx_opt.user_mss &&
        tcp_sk(sk)->rx_opt.user_mss < newtp->advmss)
        newtp->advmss = tcp_sk(sk)->rx_opt.user_mss;

    tcp_initialize_rcv_mss(newsk);

#ifdef CONFIG_TCP_MD5SIG
    /* Copy over the MD5 key from the original socket */
    key = tcp_v4_md5_do_lookup(sk, newinet->inet_daddr);
    if (key != NULL) {
        /*
         * We're using one, so create a matching key
         * on the newsk structure. If we fail to get
         * memory, then we end up not copying the key
         * across. Shucks.
         */
        char *newkey = kmemdup(key->key, key->keylen, GFP_ATOMIC);
        if (newkey != NULL)
            tcp_v4_md5_do_add(newsk, newinet->inet_daddr,
                              newkey, key->keylen);
        newsk->sk_route_caps &= ~NETIF_F_GSO_MASK;
    }
#endif

    __inet_hash_nolisten(newsk, NULL);
    __inet_inherit_port(sk, newsk);

    return newsk;

exit_overflow:
    NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
exit:
    NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
    dst_release(dst);
    return NULL;
}

We see the familiar function sk_acceptq_is_full again, but this time without the inet_csk_reqsk_queue_young > 1 condition to soften it. That is, if the accept queue is full at this point, the ACK is simply discarded and only the counters LINUX_MIB_LISTENOVERFLOWS and LINUX_MIB_LISTENDROPS are incremented. The values of these counters can be read with

netstat -s
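The same counters are exposed as ListenOverflows and ListenDrops in /proc/net/netstat, where a "TcpExt:" line of names is followed by a "TcpExt:" line of values in the same order. The sketch below looks up one counter given those two lines as strings; lookup_counter and next_token are hypothetical helpers, and reading the two lines from the file is left to the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Skip spaces; return the start of the next token and store its length. */
static const char *next_token(const char *p, size_t *len)
{
    while (*p == ' ')
        p++;
    *len = strcspn(p, " \n");
    return p;
}

/* Walk the name line and the value line in parallel.
 * Returns 0 and stores the value if `key` is found, -1 otherwise. */
int lookup_counter(const char *names, const char *values,
                   const char *key, long long *out)
{
    size_t nlen, vlen, klen = strlen(key);

    for (;;) {
        names = next_token(names, &nlen);
        values = next_token(values, &vlen);
        if (nlen == 0 || vlen == 0)
            return -1;               /* ran out of columns */
        if (nlen == klen && strncmp(names, key, klen) == 0) {
            *out = strtoll(values, NULL, 10);
            return 0;
        }
        names += nlen;
        values += vlen;
    }
}
```

Watching ListenOverflows grow while a server is under load is a quick sign that the accept queue is too small or that the application calls accept too slowly.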

In tcp_v4_syn_recv_sock we see tcp_create_openreq_child, and only at this point is a new socket cloned. In other words, Linux creates a new socket only after the three-way handshake has completed. During the handshake the socket involved is actually the server's listening socket, and that socket has only one state, TCP_LISTEN.

The many TCP states in Linux are set on a socket in tcp_rcv_state_process, and these socket states are what we normally see with netstat:

case TCP_SYN_RECV:
    if (acceptable) {
        tp->copied_seq = tp->rcv_nxt;
        smp_mb();
        tcp_set_state(sk, TCP_ESTABLISHED);

We see that the state goes from SYN_RECV directly to ESTABLISHED. That is, when the server receives the client's ACK, the new socket is in TCP_SYN_RECV and immediately enters tcp_rcv_state_process, where it is set to ESTABLISHED, so there is basically no period in which a socket stays in TCP_SYN_RECV. Yet netstat still shows some sockets in SYN_RECV. These are usually requests in syn_table. To show the status of connections that have not yet completed the three-way handshake, while the request is still in the SYN table and has no socket object of its own, Linux prints this information through a seq_file, as in the function below. In the TCP_SEQ_STATE_OPENREQ case (that is, anywhere in the SYN, SYN-ACK, ACK exchange) all three stages are displayed as TCP_SYN_RECV:
static void get_openreq4(struct sock *sk, struct request_sock *req,
                         struct seq_file *f, int i, int uid, int *len)
{
    const struct inet_request_sock *ireq = inet_rsk(req);
    int ttd = req->expires - jiffies;

    seq_printf(f, "%4d: %08X:%04X %08X:%04X"
        " %02X %08X:%08X %02X:%08lX %08X %5d %8d %u %d %p%n",
        i,
        ireq->loc_addr,
        ntohs(inet_sk(sk)->inet_sport),
        ireq->rmt_addr,
        ntohs(ireq->rmt_port),
        TCP_SYN_RECV,
        0, 0, /* could print option size, but that is af dependent. */
        1,    /* timers active (only the expire timer) */
        jiffies_to_clock_t(ttd),
        req->retrans,
        uid,
        0,    /* non standard timer */
        0,    /* open_requests have no inode */
        atomic_read(&sk->sk_refcnt),
        req,
        len);
}
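Lines in this layout are what /proc/net/tcp contains, and the state is the hexadecimal field after the two address:port pairs (03 is TCP_SYN_RECV, 01 is TCP_ESTABLISHED, 0A is TCP_LISTEN). A minimal sketch for pulling the state out of one such line follows; tcp_line_state is a hypothetical helper and assumes the line layout shown in the format string above.

```c
#include <stdio.h>

/* Returns the state field of one /proc/net/tcp style line, or -1. */
int tcp_line_state(const char *line)
{
    unsigned int state;

    /* Skip "sl: local:port remote:port", then read the hex state. */
    if (sscanf(line, " %*d: %*x:%*x %*x:%*x %x", &state) != 1)
        return -1;
    return (int)state;
}
```

Counting the lines for which this returns 3 gives the number of half-open requests that netstat would list as SYN_RECV; the header line of the file does not match and yields -1.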


Reaching the ESTABLISHED state does not require the server to call accept. A connection is already ESTABLISHED as soon as it sits in the accept queue.
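This is easy to observe from user space. The following sketch opens a listener on the loopback interface, never calls accept, and checks that a blocking connect from a client socket still succeeds. connect_without_accept is a hypothetical helper name; the loopback address, the kernel-chosen port and the backlog of 1 are arbitrary choices for the demonstration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if a client connect() succeeds although the server never
 * calls accept(), -1 on any error. */
int connect_without_accept(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int srv, cli, rc = -1;

    srv = socket(AF_INET, SOCK_STREAM, 0);
    if (srv < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel pick a port */

    if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        listen(srv, 1) == 0 &&
        getsockname(srv, (struct sockaddr *)&addr, &len) == 0) {
        cli = socket(AF_INET, SOCK_STREAM, 0);
        if (cli >= 0) {
            /* Blocking connect returns once the handshake completes. */
            rc = connect(cli, (struct sockaddr *)&addr, sizeof(addr));
            close(cli);
        }
    }
    close(srv);
    return rc == 0 ? 0 : -1;
}
```

The handshake is completed entirely by the kernel on the listening side, which is why the connection can sit in the accept queue as ESTABLISHED before the application ever sees it.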

