http://blog.csdn.net/cyblueboy83/article/details/1791713
(1) An application sends data over a TCP socket by calling send (or write, sendmsg, and so on). The TCP/IP protocol stack organizes the application data into TCP segments, each held in a struct sk_buff, and sends them to the network through the network device interface. Because an application does not produce data at the same speed at which the network medium can carry it, some of the data that has been organized into TCP segments is cached in the socket's send queue until the network can take it. In addition, the peer must acknowledge (ACK) the sequence number of each TCP segment it receives. A TCP segment (in the form of a struct sk_buff) can be removed from the socket's send buffer queue only after its ACK has been received.
The send buffer of a TCP socket is actually a queue of struct sk_buff. We can call it the send buffer queue; it is represented by the sk_write_queue member of struct sock. sk_write_queue is of type struct sk_buff_head, which is the head of a doubly linked list of struct sk_buff. It is defined as follows:
struct sk_buff_head {
    struct sk_buff *next;  // pointer to the next sk_buff
    struct sk_buff *prev;  // pointer to the previous sk_buff
    __u32 qlen;            // queue length (number of struct sk_buff entries)
    spinlock_t lock;       // lock protecting the list
};
The kernel code first creates a struct sk_buff large enough to hold the data, adds it to this queue, and then copies the application data into it.
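To make the queue idea concrete, here is a minimal user-space sketch of a send queue. It is not kernel code: toy_skb, toy_queue, and the three functions are hypothetical names invented for illustration, and the list is NULL-terminated rather than circular as in the kernel. It only shows the two operations described so far: appending a buffer that holds application data, and dropping the oldest buffer once its data has been acknowledged.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-in for struct sk_buff. */
struct toy_skb {
    struct toy_skb *next;
    struct toy_skb *prev;
    size_t len;             /* payload bytes stored in data */
    unsigned char *data;
};

/* Hypothetical stand-in for struct sk_buff_head (no lock in this sketch). */
struct toy_queue {
    struct toy_skb *head;   /* oldest buffer, NULL when empty */
    struct toy_skb *tail;   /* newest buffer, NULL when empty */
    unsigned int qlen;      /* number of buffers in the queue */
};

void toy_queue_init(struct toy_queue *q)
{
    q->head = q->tail = NULL;
    q->qlen = 0;
}

/* Allocate a buffer big enough for len bytes, copy the application data
 * into it, and append it to the tail. Returns 0 on success, -1 on failure. */
int toy_queue_append(struct toy_queue *q, const void *buf, size_t len)
{
    struct toy_skb *skb = malloc(sizeof(*skb));
    if (!skb)
        return -1;
    skb->data = malloc(len ? len : 1);
    if (!skb->data) {
        free(skb);
        return -1;
    }
    if (len)
        memcpy(skb->data, buf, len);
    skb->len = len;
    skb->next = NULL;
    skb->prev = q->tail;
    if (q->tail)
        q->tail->next = skb;
    else
        q->head = skb;
    q->tail = skb;
    q->qlen++;
    return 0;
}

/* Remove the oldest buffer, as would happen after its ACK arrives.
 * Returns the payload length that was freed, or 0 if the queue is empty. */
size_t toy_queue_drop_head(struct toy_queue *q)
{
    struct toy_skb *skb = q->head;
    size_t len;

    if (!skb)
        return 0;
    q->head = skb->next;
    if (q->head)
        q->head->prev = NULL;
    else
        q->tail = NULL;
    q->qlen--;
    len = skb->len;
    free(skb->data);
    free(skb);
    return len;
}
```

A caller would run toy_queue_init once, call toy_queue_append for every chunk of data handed over by the application, and call toy_queue_drop_head each time the oldest chunk is acknowledged.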
sk_wmem_queued, a member of struct sock, is the number of bytes allocated in the send buffer queue. A struct sk_buff is normally allocated to hold one TCP segment, so the number of bytes allocated should be the MSS plus the length of the protocol headers. In my experiment environment the MSS is 1448, and the maximum protocol header length, MAX_TCP_HEADER, is 224. After data alignment, the truesize of the struct sk_buff is 1956. That is, each struct sk_buff allocated to the queue increases sk_wmem_queued by 1956.
The sk_forward_alloc member of struct sock is the pre-allocated length. When a struct sk_buff is allocated for the send buffer queue for the first time, the socket is not charged for exactly the required size; instead, memory is pre-allocated in whole pages.
The function TCP uses to allocate a struct sk_buff is sk_stream_alloc_pskb. It first allocates a struct sk_buff of the size given by its input parameter. If that succeeds, sk_forward_alloc is set to that size rounded up to a whole number of pages (4096 bytes each), and the same amount is added to memory_allocated, a member of the TCP protocol structure mytcp_prot that is reached through the sk_prot member of struct sock. This member is a pointer to the variable tcp_memory_allocated, which records the memory currently allocated for buffers by the entire TCP protocol (including the receive buffer queues).
After the newly allocated struct sk_buff is placed in the send buffer queue sk_write_queue, its truesize is subtracted from sk_forward_alloc. When a second struct sk_buff is allocated, the truesize of the new sk_buff only needs to be subtracted from sk_forward_alloc. If sk_forward_alloc is smaller than that truesize, it is first increased by another whole number of pages, and the same amount is added to tcp_memory_allocated.
In other words, sk_forward_alloc is the mechanism through which the global variable tcp_memory_allocated keeps track of the total buffer memory currently allocated by the TCP protocol, and that total is always aligned to a page boundary.
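The accounting described above can be modelled in a few lines of user-space C. This is only a sketch of the arithmetic, not the kernel's implementation: toy_sock, toy_tcp_memory_allocated, and toy_charge_skb are hypothetical names, and the model counts bytes as the description above does, whereas the kernel's own counter may use different units.

```c
#define TOY_PAGE_SIZE 4096L

/* Hypothetical stand-in for the two struct sock members discussed above. */
struct toy_sock {
    long sk_wmem_queued;    /* bytes charged to the send buffer queue */
    long sk_forward_alloc;  /* pre-allocated bytes not yet consumed */
};

/* Hypothetical stand-in for the protocol-wide counter (bytes in this model). */
static long toy_tcp_memory_allocated;

/* Charge one buffer of the given truesize to the socket.
 * If the pre-allocated amount is too small, top it up by whole pages and
 * add the same amount to the protocol-wide total. */
void toy_charge_skb(struct toy_sock *sk, long truesize)
{
    if (sk->sk_forward_alloc < truesize) {
        long pages = (truesize + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;

        sk->sk_forward_alloc += pages * TOY_PAGE_SIZE;
        toy_tcp_memory_allocated += pages * TOY_PAGE_SIZE;
    }
    sk->sk_forward_alloc -= truesize;
    sk->sk_wmem_queued += truesize;
}
```

With the truesize of 1956 from the experiment, the first call leaves sk_forward_alloc at 2140 and the global total at 4096. The second call needs no new page and leaves sk_forward_alloc at 184. The third call tops up by one more page, so the global total becomes 8192, sk_forward_alloc becomes 2324, and sk_wmem_queued reaches 5868.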
(2) sk_forward_alloc, a member of struct sock, is the pre-allocated memory size. It is used to accumulate the buffer size allocated across the entire TCP protocol into the global variable mytcp_memory_allocated. The reason for accumulating this value is to limit the total buffer memory available to the TCP protocol. The structure mytcp_prot, which represents the TCP protocol, has several other members related to buffers.
mysysctl_tcp_mem is an array pointed to by sysctl_mem, a member of mytcp_prot. The array has three elements. mysysctl_tcp_mem[0] is the lower threshold: if the total size of the currently allocated buffers is below this value, there is no problem and the allocation succeeds. mysysctl_tcp_mem[2] is the hard upper limit on the available buffer size. Once the total allocated buffer size exceeds this value, the preset size of the socket's send buffer, sk_sndbuf, has to be reduced to half the size of the already allocated buffer queue, though not below SOCK_MIN_SNDBUF (2 K); this allocation is nevertheless allowed to succeed. mysysctl_tcp_mem[1] lies between the other two values and is a warning threshold. Once it is exceeded, TCP enters the warning state, in which whether an allocation succeeds depends on the call parameters.
These three values are computed during initialization from the amount of system memory. In my experiment environment, with 256 MB of memory, they are 96 K, 128 K, and 192 K. They can be modified through the /proc file system, in /proc/sys/net/ipv4/tcp_mem. Unless necessary, there is no need to change the defaults.
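The three thresholds can be read as a simple classification of the protocol-wide total. The sketch below is a simplified illustration, not the kernel's logic: toy_mem_state, toy_mem_classify, toy_shrink_sndbuf, and TOY_SOCK_MIN_SNDBUF are hypothetical names, and it assumes the threshold array and the total are expressed in the same unit.

```c
/* Hypothetical states matching the three cases described in the text. */
enum toy_mem_state {
    TOY_MEM_OK,          /* at or below the warning threshold */
    TOY_MEM_PRESSURE,    /* above mem[1]: warning state */
    TOY_MEM_OVER_LIMIT   /* above mem[2]: hard limit exceeded */
};

/* Classify the protocol-wide total against {low, warning, hard limit}. */
enum toy_mem_state toy_mem_classify(long allocated, const long mem[3])
{
    if (allocated > mem[2])
        return TOY_MEM_OVER_LIMIT;
    if (allocated > mem[1])
        return TOY_MEM_PRESSURE;
    return TOY_MEM_OK;
}

/* Assumed value of the 2 K lower bound mentioned in the text. */
#define TOY_SOCK_MIN_SNDBUF 2048L

/* New sk_sndbuf once the hard limit is exceeded: half of what is already
 * queued, but never below the minimum. */
long toy_shrink_sndbuf(long wmem_queued)
{
    long half = wmem_queued / 2;

    return half < TOY_SOCK_MIN_SNDBUF ? TOY_SOCK_MIN_SNDBUF : half;
}
```

Using the thresholds from the experiment (96 K, 128 K, 192 K), a total of 150 K would classify as the warning state and a total of 200 K as over the hard limit. A socket with 19560 bytes queued would then have its send buffer shrunk to 9780, while one with only 1956 bytes queued would be held at the 2048 minimum.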
mysysctl_tcp_wmem is an array with the same structure. It gives the size limits of the send buffer and is pointed to by the mytcp_prot member sysctl_wmem. Its default values are 4 K, 16 K, and 128 K, and they can be modified through the /proc file system, in /proc/sys/net/ipv4/tcp_wmem. The sk_sndbuf member of struct sock is the preset size of the actual send buffer queue; its initial value is the middle one, 16 K. While TCP segments are being sent, once sk_wmem_queued exceeds sk_sndbuf, sending stops and the sender waits for send buffer space to become available. Segments that were already sent may still be waiting for their ACKs, and the data remaining in the buffer queue can still be transmitted, so the queue drains over time. As long as the network is not so poor that ACKs almost never arrive, the wait succeeds after a while.
The global variable mytcp_memory_pressure is a flag. When the TCP buffer size enters the warning state, it is set to 1; otherwise, it is set to 0.
(3) mytcp_sockets_allocated is the number of sockets created by the TCP protocol so far; it is pointed to by the sockets_allocated member of mytcp_prot. It can be viewed in the /proc/net/sockstat file and is used only for statistics.
mytcp_orphan_count is the number of sockets in the TCP protocol that are waiting to be destroyed (sockets that are no longer of use). It is pointed to by the orphan_count member of mytcp_prot and can also be viewed in the /proc/net/sockstat file.
mysysctl_tcp_rmem is an array with the same structure as mysysctl_tcp_wmem. It gives the size limits of the receive buffer and is pointed to by the mytcp_prot member sysctl_rmem. The first two default values are 4096 bytes and 87380 bytes. They can be modified through the /proc file system, in /proc/sys/net/ipv4/tcp_rmem. The sk_rcvbuf member of struct sock is the size of the receive buffer queue; its initial value is mysysctl_tcp_rmem[1]. The member sk_receive_queue is the receive buffer queue itself, with the same structure as sk_write_queue.
The sizes of a TCP socket's send and receive buffer queues can be changed either through the /proc file system or through socket options. The socket-level option SO_RCVBUF gets and sets the size of the receive buffer queue (that is, the value of struct sock->sk_rcvbuf). For example, the following code obtains the receive buffer queue size of a socket:
int rcvbuf_len;
socklen_t len = sizeof(rcvbuf_len);  // getsockopt expects a socklen_t length
if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, (void *)&rcvbuf_len, &len) < 0) {
    perror("getsockopt");
    return -1;
}
printf("The receive buffer length: %d\n", rcvbuf_len);
The socket-level option SO_SNDBUF gets and sets the size of the send buffer queue (that is, the value of struct sock->sk_sndbuf). The code is the same as above, with SO_RCVBUF replaced by SO_SNDBUF.
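As a fuller illustration of the getsockopt call described above, the following sketch creates a fresh TCP socket and reads both buffer sizes. The helper names get_buf_size and print_tcp_buf_sizes are hypothetical, chosen only for this example; the socket is never connected, so no network traffic is involved.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read an int-valued SOL_SOCKET option such as SO_RCVBUF or SO_SNDBUF.
 * Returns the value, or -1 on error. */
int get_buf_size(int fd, int optname)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, SOL_SOCKET, optname, &value, &len) < 0) {
        perror("getsockopt");
        return -1;
    }
    return value;
}

/* Create an unconnected TCP socket and print its default buffer sizes.
 * Returns 0 on success, -1 on error. */
int print_tcp_buf_sizes(void)
{
    int rcv, snd;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        perror("socket");
        return -1;
    }
    rcv = get_buf_size(fd, SO_RCVBUF);
    snd = get_buf_size(fd, SO_SNDBUF);
    close(fd);
    if (rcv < 0 || snd < 0)
        return -1;
    printf("receive buffer: %d bytes, send buffer: %d bytes\n", rcv, snd);
    return 0;
}
```

Calling print_tcp_buf_sizes() from main prints the defaults of the running system, which depend on its tcp_rmem and tcp_wmem settings.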
Getting the send and receive buffer sizes is relatively simple; setting them is slightly more involved inside the kernel. The interface also differs: the value passed to setsockopt is half of the resulting buffer size, because the kernel doubles it. So to set the send buffer size to 20 K, call setsockopt as follows:
int sndbuf_len = 10 * 1024;  // half the actual buffer size; the kernel doubles it
socklen_t len = sizeof(sndbuf_len);
if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, (void *)&sndbuf_len, len) < 0) {
    perror("setsockopt");
    return -1;
}
Inside the kernel, the new value is first checked against the upper limit; if it exceeds the limit, the limit is used instead. The upper limits of the send and receive buffer sizes are twice sysctl_wmem_max and sysctl_rmem_max, respectively. These two global variables have the same value, (sizeof(struct sk_buff) + 256) * 256, which corresponds to about 64 K of payload data. Because of the struct sk_buff overhead, the actual maximum size of the send and receive buffers is larger than that. The lower limit is 2 K; that is, a buffer cannot be smaller than 2 K.
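One way to observe the doubling and the clamping is to set a size and immediately read it back. The sketch below does this for SO_SNDBUF; set_and_read_sndbuf is a hypothetical helper name, and the value reported depends on the limits configured on the running system.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Request a send buffer size, then return the size the kernel reports
 * afterwards. Returns -1 on error. */
int set_and_read_sndbuf(int fd, int requested)
{
    int effective = 0;
    socklen_t len = sizeof(effective);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
                   &requested, sizeof(requested)) < 0) {
        perror("setsockopt");
        return -1;
    }
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &effective, &len) < 0) {
        perror("getsockopt");
        return -1;
    }
    return effective;
}
```

For a request of 10 * 1024 on Linux, the reported value should be about twice the request as long as the request is within the configured maximum; a request above the maximum would be clamped before doubling.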
In addition, SO_SNDBUF and SO_RCVBUF have special variants, SO_SNDBUFFORCE and SO_RCVBUFFORCE. These are not bound by the maximum send and receive buffer sizes and can set any buffer size of at least 2 K, but they require the CAP_NET_ADMIN capability. (End)
Additional content:
What does TCP do if the number of bytes passed to write is larger than the socket send buffer?
In non-blocking mode, the return value is the number of bytes that could be sent within the allowed sending time.
In practice it works as follows:
In non-blocking mode, setsockopt is generally used to set a send timeout, and data is then sent by calling send(). When the timeout expires, send returns the number of bytes it has sent so far.
Note, however, that some of that data may still be in the buffer and not yet on the network.
When the application calls send again while the buffer is still full, a classic error is reported:
Resource temporarily unavailable
In blocking mode, send waits until all of the application's data has been accepted into the send buffer and only then returns.
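The "Resource temporarily unavailable" case can be reproduced without a network. The sketch below is an illustration under stated assumptions: it uses a local AF_UNIX stream socket pair instead of a real TCP connection, sets O_NONBLOCK with fcntl, and never reads from the peer, so the send buffer eventually fills. The helper names set_nonblocking and fill_send_buffer are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Keep calling send() until the kernel refuses more data.
 * Returns the total number of bytes accepted, with errno left at
 * EAGAIN/EWOULDBLOCK, or -1 on any other failure. */
long fill_send_buffer(int fd)
{
    char chunk[4096];
    long total = 0;

    memset(chunk, 'x', sizeof(chunk));
    for (;;) {
        ssize_t n = send(fd, chunk, sizeof(chunk), 0);

        if (n > 0) {
            total += n;  /* may be a partial send; keep going */
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;  /* buffer full: "Resource temporarily unavailable" */
        return -1;
    }
}
```

A caller would create the pair with socketpair(AF_UNIX, SOCK_STREAM, 0, sv), call set_nonblocking(sv[0]), and then fill_send_buffer(sv[0]). In a real program the usual reaction to EAGAIN is to wait with poll or select until the socket is writable again and then resume from the first unsent byte.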
In addition, if the sender uses UDP, blocking versus non-blocking does not matter; a datagram that is too large is rejected with the error:
Message too long
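The UDP case can be sketched as follows. The example tries to send a single datagram that is larger than the 65507-byte payload limit of UDP over IPv4 and returns the resulting errno. The helper name try_udp_send is hypothetical, and the destination 127.0.0.1 port 9 is an arbitrary choice for illustration; nothing needs to listen there for the size check to fail.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to send one UDP datagram of len bytes to 127.0.0.1:9.
 * Returns 0 on success, or the errno value of the failing call. */
int try_udp_send(size_t len)
{
    struct sockaddr_in dst;
    char *payload;
    int err = 0;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return errno;

    payload = calloc(1, len ? len : 1);
    if (!payload) {
        close(fd);
        return ENOMEM;
    }

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9);                      /* arbitrary port */
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* 127.0.0.1 */

    if (sendto(fd, payload, len, 0,
               (struct sockaddr *)&dst, sizeof(dst)) < 0)
        err = errno;  /* EMSGSIZE prints as "Message too long" */

    free(payload);
    close(fd);
    return err;
}
```

Calling try_udp_send(70000) is expected to return EMSGSIZE, which strerror renders as "Message too long". Unlike TCP, UDP does not split the data across several packets on the application's behalf, so the whole datagram is rejected.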