(Repost) About TCP and UDP buffers

Source: Internet
Author: User

(i) Basic knowledge

    • The maximum size of an IPv4 datagram is 65535 bytes (the total-length field is 16 bits), including the IPv4 header.
    • The maximum size of an IPv6 datagram is 65575 bytes: 65535 bytes of payload plus the 40-byte IPv6 header.
    • The MTU is dictated by the link-layer hardware; for Ethernet it is 1500 bytes. IPv4 requires a minimum MTU of 68 bytes; IPv6 requires a minimum link MTU of 1280 bytes.
    • Path MTU: the smallest MTU on the path between two hosts.
    • Fragmentation: when an IP datagram is larger than the MTU of the outgoing link, IPv4 and IPv6 split it into fragments, which are reassembled at the destination host. (In IPv4 both hosts and routers may fragment; in IPv6 only the sending host does.)
    • The DF (Don't Fragment) bit in the IPv4 header controls whether the datagram may be fragmented.
    • MSS (maximum segment size): announced to the TCP peer, it is the maximum amount of TCP payload the announcer will accept in each segment. Its purpose is to tell the peer the actual size of its reassembly buffer, and thereby avoid fragmentation.


(ii) TCP and UDP output

Each TCP socket has a send buffer, whose size can be changed with the SO_SNDBUF socket option. When an application calls write on the socket, the kernel copies the data from the application's buffer into the socket's send buffer. If the send buffer cannot hold all of the application's data — because the application buffer is larger than the send buffer, or because the send buffer already holds other data — the application process is put to sleep, and the kernel does not return from write until every byte of the application buffer has been copied into the send buffer. So a successful return from write on a TCP socket only means that we may reuse the application buffer; it does not tell us that the peer has received the data. When TCP sends data, the peer must acknowledge it on receipt, and only after that ACK arrives does TCP remove the data from its send buffer.

Because UDP is unreliable, it does not need to keep a copy of the application's data. As the data travels down the protocol stack it is copied, in some form, into kernel buffers, and that copy is discarded once the link layer has transmitted it; UDP therefore needs no true send buffer. A successful return from write on a UDP socket means the application's datagram (or its fragments) has entered the link layer's output queue; if the output queue has no room for the data, the error ENOBUFS is returned.

(iii) TCP socket send and receive buffers

An application sends data to the network through a TCP socket by calling send (or write, sendmsg, etc.); the TCP/IP stack then packs the application data into TCP segments, each held in a struct sk_buff, and transmits them through a network device interface. Because the rate at which the application calls send differs from the speed of the network medium, data that has been packed into TCP segments is cached in the socket's send buffer queue and waits until the network is free to transmit it. In addition, the TCP protocol requires the peer to acknowledge the sequence number of each segment it receives; only after the ACK for a segment arrives can that segment (in the form of its struct sk_buff) be removed from the socket's send buffer queue.
A TCP socket's send buffer is in fact a queue of struct sk_buff, which we can call the send buffer queue, represented by the member sk_write_queue of struct sock. sk_write_queue has type struct sk_buff_head, a doubly linked list of struct sk_buff, defined as follows:


struct sk_buff_head {
    struct sk_buff *next;   /* next element */
    struct sk_buff *prev;   /* previous element */
    __u32 qlen;             /* queue length (number of struct sk_buff in the list) */
    spinlock_t lock;        /* list lock */
};


(1)

In the kernel, a struct sk_buff large enough to hold the data is allocated, the application data is copied into it, and the sk_buff is placed on this queue.
The member sk_wmem_queued of struct sock records the number of bytes allocated in the send buffer queue. In general, each struct sk_buff is allocated to hold one TCP segment, so its size is the MSS plus the maximum protocol header length. In my test environment the MSS is 1448 and the maximum header length MAX_TCP_HEADER is 224; after alignment, the truesize of the resulting struct sk_buff is 1956. That is, each time a struct sk_buff is allocated for the queue, sk_wmem_queued grows by 1956.
The member sk_forward_alloc of struct sock is the pre-allocated length. When the first struct sk_buff is allocated for the send buffer queue, the kernel does not account for exactly the requested size; instead it pre-allocates accounting space in whole memory pages.
The function that allocates a struct sk_buff for TCP is sk_stream_alloc_pskb. It first allocates a struct sk_buff of the size given by its parameter; on success, sk_forward_alloc is charged that size rounded up to an integer multiple of the page size (4096 bytes), and the same amount is added, through the member sk_prot of struct sock, to the member memory_allocated of the struct proto describing TCP (here mytcp_prot). That member points to the variable tcp_memory_allocated, which records the memory currently allocated by the entire TCP protocol to buffers (including receive buffer queues).
When the newly allocated struct sk_buff is placed into the send buffer queue sk_write_queue, its truesize is subtracted from sk_forward_alloc. When the second struct sk_buff is allocated, its truesize is again subtracted from sk_forward_alloc; if sk_forward_alloc is smaller than the required truesize, it is topped up by an integer multiple of the page size, and that amount is added to tcp_memory_allocated.
In other words, via sk_forward_alloc the global variable tcp_memory_allocated tracks the total buffer memory currently allocated by the TCP protocol, with the size aligned to page boundaries.

(2)

As mentioned earlier, the member sk_forward_alloc of struct sock is the pre-allocated memory size, and it is used to accumulate into the global variable mytcp_memory_allocated the total buffer memory currently allocated by the entire TCP protocol. The reason for accumulating this value is to limit the total buffer memory available to TCP. The struct proto describing TCP, mytcp_prot, also has several members related to buffering.
mysysctl_tcp_mem is a three-element array, pointed to by the member sysctl_mem of mytcp_prot, that limits the total buffer size. mysysctl_tcp_mem[0] is the lower bound: as long as the total allocated buffer memory is below this value, there is no problem and the allocation succeeds. mysysctl_tcp_mem[2] is the hard upper bound: once the total allocated buffer memory exceeds this value, the default send buffer size sk_sndbuf of the TCP socket is cut to half of the memory already allocated to its send buffer queue (but not below SOCK_MIN_SNDBUF, 2 K), while the current allocation is still allowed to succeed. mysysctl_tcp_mem[1] lies between the other two and is a warning threshold: once it is exceeded, TCP enters a warning state, in which an allocation succeeds or fails depending on the parameters of the call.
These three values are derived from the amount of system memory and fixed at initialization. In my test environment, with 256 MB of RAM, they are 96 K, 128 K and 192 K. They can be changed through the /proc file system at /proc/sys/net/ipv4/tcp_mem, though there is normally no need to alter the defaults.
mysysctl_tcp_wmem is another three-element array of the same form, limiting the size of the send buffer; it is pointed to by the member sysctl_wmem of mytcp_prot and its default values are 4 K, 16 K and 128 K. It can be changed through the /proc file system at /proc/sys/net/ipv4/tcp_wmem. The member sk_sndbuf of struct sock is the actual default size of the send buffer queue; its initial value is the middle element, 16 K. While TCP segments are being sent, once sk_wmem_queued exceeds sk_sndbuf, sending stops and waits for send buffer space to become available. Because data already sent may be acknowledged at any moment, at which point the queued data can all be transmitted and the buffer queue drained, this wait succeeds after a while as long as the network is not so poor that ACKs never arrive.
The global variable mytcp_memory_pressure is a flag: it is set to 1 when TCP buffer allocation enters the warning state, and 0 otherwise.

(3)

mytcp_sockets_allocated is the number of sockets the TCP protocol has created so far; it is pointed to by the member sockets_allocated of mytcp_prot and can be viewed in the file /proc/net/sockstat. It is purely statistical and imposes no practical limit.
mytcp_orphan_count is the number of orphaned sockets in the TCP protocol (sockets awaiting destruction, no longer attached to any process); it is pointed to by the member orphan_count of mytcp_prot and can likewise be viewed in /proc/net/sockstat.
mysysctl_tcp_rmem is a three-element array of the same form as mysysctl_tcp_wmem, limiting the size of the receive buffer; it is pointed to by the member sysctl_rmem of mytcp_prot, with default values of 4096, 87380 and 174760 bytes. They can be changed through the /proc file system at /proc/sys/net/ipv4/tcp_rmem. The member sk_rcvbuf of struct sock is the size of the receive buffer queue; its initial value is mysysctl_tcp_rmem[1]. The member sk_receive_queue is the receive buffer queue itself, with the same structure as sk_write_queue.
The sizes of a TCP socket's send and receive buffer queues can be changed either through the /proc file system or through socket option calls. The socket-level option SO_RCVBUF gets and sets the size of the receive buffer queue (that is, the value of struct sock's sk_rcvbuf); for example, the following code obtains the current receive buffer size:


int rcvbuf_len;
socklen_t len = sizeof(rcvbuf_len);
if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, (void *)&rcvbuf_len, &len) < 0) {
    perror("getsockopt");
    return -1;
}
printf("The receive buf len: %d\n", rcvbuf_len);


The socket-level option SO_SNDBUF gets and sets the size of the send buffer queue (that is, the value of struct sock's sk_sndbuf); the code is the same as above with SO_RCVBUF replaced by SO_SNDBUF.
Getting the send and receive buffer sizes is straightforward; setting them involves slightly more work in the kernel, and the interface also differs: the buffer size passed to setsockopt is half of the size actually installed. That is, to set a send buffer of 20 K you call setsockopt like this:


int sndbuf_len = 10 * 1024;   /* half of the buffer size actually installed */
if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, (void *)&sndbuf_len, sizeof(sndbuf_len)) < 0) {
    perror("setsockopt");
    return -1;
}


In the kernel, the new value is first checked against an upper limit and clamped to it if exceeded. The upper limits for the send and receive buffers are twice sysctl_wmem_max and twice sysctl_rmem_max respectively. Those two global variables have the same value, (sizeof(struct sk_buff) + 256) * 256, roughly 64 K of payload; because of the struct sk_buff overhead, the actual send and receive buffers can be set to about 210 K at most. The lower limit is 2 K, meaning the buffers cannot be made smaller than that.
In addition, SO_SNDBUF and SO_RCVBUF have privileged variants, SO_SNDBUFFORCE and SO_RCVBUFFORCE, which are not constrained by the maximum send and receive buffer sizes and can set any buffer size no smaller than 2 K.
