setsockopt parameter description (repost)

Source: Internet
Author: User
Tags sendfile socket error keep alive

int setsockopt(
    SOCKET s,
    int level,
    int optname,
    const char *optval,
    int optlen
);

s (socket): descriptor identifying an open socket.
level: the protocol level at which the option is defined.
    SOL_SOCKET: generic socket options
    IPPROTO_IP: IPv4 options
    IPPROTO_IPV6: IPv6 options
    IPPROTO_TCP: TCP options
optname (option name): the option to set.
optval (option value): pointer to a variable holding the option value. Depending on the option, this is an int, a socket address structure, or another structure such as struct linger or struct timeval.
optlen (option length): size in bytes of the variable that optval points to.

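Return value: 0 on success. On failure the call returns -1 (SOCKET_ERROR on Winsock) and the error code is available in errno (WSAGetLastError() on Winsock). Many options are binary flags: a nonzero value enables the feature and zero disables it.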
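The parameters above can be illustrated with a minimal sketch. It assumes a POSIX system (an int descriptor instead of the Winsock SOCKET type), and the helper names set_bool_option and get_int_option are made up for this example:

```c
#include <sys/socket.h>

/* Hypothetical helper: turn a boolean SOL_SOCKET option on or off.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int set_bool_option(int fd, int optname, int enabled)
{
    int value = enabled ? 1 : 0;   /* optval points at this int */
    return setsockopt(fd, SOL_SOCKET, optname, &value, sizeof(value));
}

/* Hypothetical helper: read an int-valued SOL_SOCKET option back.
 * Returns the option value, or -1 if getsockopt fails. */
int get_int_option(int fd, int optname)
{
    int value = 0;
    socklen_t len = sizeof(value); /* in: buffer size, out: bytes written */
    if (getsockopt(fd, SOL_SOCKET, optname, &value, &len) < 0)
        return -1;
    return value;
}
```

For instance, set_bool_option(fd, SO_REUSEADDR, 1) would enable address reuse, and get_int_option(fd, SO_REUSEADDR) should then report a nonzero value.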

 

========================================================== ======================================
SOL_SOCKET
------------------------------------------------------------------------
SO_BROADCAST: allow sending of broadcast data (int)
Applies to UDP sockets. It permits a UDP socket to send broadcast messages to the network.

SO_DEBUG: enable debugging (int)

SO_DONTROUTE: bypass the routing table for outgoing packets (int)

SO_ERROR: get the pending socket error (int)

SO_KEEPALIVE
Detects whether the peer host has crashed, so that a server does not block forever waiting for input on a TCP connection. When this option is set and no data has been exchanged in either direction on the socket for two hours, TCP automatically sends a keepalive probe to the peer. The probe is a TCP segment that the peer must respond to. One of three things happens:
1. The peer is up and reachable: it responds with the expected ACK. TCP sends another probe after a further two hours of inactivity.
2. The peer has crashed and rebooted: it responds with an RST. The socket's pending error is set to ECONNRESET and the socket is closed.
3. The peer does not respond: Berkeley-derived TCP sends eight additional probes, 75 seconds apart, trying to elicit a response. If there is still no response 11 minutes and 15 seconds after the first probe, TCP gives up. The socket's pending error is set to ETIMEDOUT and the socket is closed. If an ICMP error is received in response to a probe, the corresponding error is returned instead. For example, "host unreachable" indicates that the peer has not necessarily crashed but cannot be reached, and the pending error is set to EHOSTUNREACH.
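As a rough illustration of the timing described above, the following sketch enables keepalive and tunes the probe schedule per socket. It assumes Linux, where TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are available; other systems may use different names or offer only system-wide settings. The function name enable_keepalive is hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (Linux assumed): enable keepalive on a TCP socket.
 * idle_secs     - idle time before the first probe
 * interval_secs - time between unanswered probes
 * probe_count   - unanswered probes before the connection is dropped
 * Returns 0 on success, -1 on the first failing setsockopt call. */
int enable_keepalive(int fd, int idle_secs, int interval_secs, int probe_count)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &interval_secs, sizeof(interval_secs)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probe_count, sizeof(probe_count)) < 0)
        return -1;
    return 0;
}
```

Calling enable_keepalive(fd, 7200, 75, 9) would approximate the classic schedule: two hours idle, then up to nine probes 75 seconds apart.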

SO_DONTLINGER: if true, the SO_LINGER option is disabled
SO_LINGER: linger on close (struct linger)
These two options affect the behavior of closesocket():
Option         Interval     Type of close   Wait for close?
SO_DONTLINGER  don't care   graceful        no
SO_LINGER      zero         hard            no
SO_LINGER      nonzero      graceful        yes
If SO_LINGER is set (that is, the l_onoff field of the linger structure is nonzero; see sections 2.4, 4.1.7 and 4.1.21) with a zero timeout interval, closesocket() returns immediately without blocking, whether or not queued data has been sent or acknowledged. This is called a "hard" or "abortive" close, because the socket's virtual circuit is reset immediately and any unsent data is lost. A recv() call on the remote side fails with WSAECONNRESET.
If SO_LINGER is set with a nonzero timeout interval, closesocket() blocks until all data has been sent or the timeout expires. This is called a "graceful" close. Note that if the socket is nonblocking and SO_LINGER is set with a nonzero timeout, closesocket() returns with a WSAEWOULDBLOCK error.
If SO_DONTLINGER is set on a stream socket (that is, the l_onoff field of the linger structure is zero; see sections 2.4, 4.1.7 and 4.1.21), closesocket() returns immediately. Where possible, however, queued data is still sent before the socket is closed. Note that in this case the Windows Sockets implementation retains the socket and its other resources for an indeterminate period, which may affect applications that expect to reuse the socket.
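A minimal sketch of setting the linger structure follows. It assumes a POSIX system, where the socket is closed with close() rather than closesocket() and where SO_DONTLINGER does not exist (clearing l_onoff has the same effect). The name set_linger is made up for this example.

```c
#include <sys/socket.h>

/* Hypothetical helper: configure SO_LINGER.
 * enabled == 0                -> default close, returns immediately
 * enabled != 0, timeout == 0  -> hard (abortive) close
 * enabled != 0, timeout > 0   -> graceful close, blocks up to timeout_secs
 * Returns 0 on success, -1 on failure. */
int set_linger(int fd, int enabled, int timeout_secs)
{
    struct linger lg;
    lg.l_onoff = enabled ? 1 : 0;
    lg.l_linger = timeout_secs;
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

For example, set_linger(fd, 1, 0) requests the hard close described above, while set_linger(fd, 1, 5) lets the close wait up to five seconds for queued data.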

SO_OOBINLINE: leave received out-of-band data inline in the normal data stream (int)

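SO_RCVBUF: receive buffer size (int)
Sets the size of the buffer reserved for receiving.
It is unrelated to SO_MAX_MSG_SIZE. It is also distinct from the TCP sliding window, although for TCP the free space in the receive buffer bounds the window advertised to the peer. Consider adjusting it when packets are sent frequently.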

SO_SNDBUF: send buffer size (int)
Sets the size of the send buffer.
It is unrelated to SO_MAX_MSG_SIZE and is distinct from the TCP sliding window. Consider adjusting it when packets are sent frequently.
Every socket has a send buffer and a receive buffer. The receive buffer is used by TCP and UDP to hold received data until the application reads it.
TCP: the free space in the receive buffer is the window that TCP advertises to the other end. A TCP socket's receive buffer cannot overflow, because the peer is not allowed to send data beyond the advertised window. This is TCP's flow control. If the peer ignores the advertised window and sends data beyond it, the receiving TCP discards that data.
UDP: when a datagram arrives that does not fit in the socket receive buffer, it is discarded. UDP has no flow control: a fast sender can easily overwhelm a slow receiver, causing the receiver's UDP to discard datagrams.
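The sketch below requests a receive buffer size and reads back what the kernel actually granted; the same pattern applies to SO_SNDBUF. It assumes a POSIX system, and request_rcvbuf is a hypothetical name. The value read back need not equal the request: Linux, for instance, doubles the requested size to allow for bookkeeping overhead and caps it at a system-wide limit.

```c
#include <sys/socket.h>

/* Hypothetical helper: ask for a receive buffer of `bytes` bytes and
 * return the size the kernel reports afterwards, or -1 on failure. */
int request_rcvbuf(int fd, int bytes)
{
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;

    int actual = 0;
    socklen_t len = sizeof(actual);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        return -1;
    return actual;  /* may differ from the requested size */
}
```

Because the advertised TCP window derives from this buffer, it is usually best to set the size before connecting or listening.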

SO_RCVLOWAT: receive low-water mark (int)
SO_SNDLOWAT: send low-water mark (int)
Every socket has a receive low-water mark and a send low-water mark. They are used by the select function. The receive low-water mark is the amount of data that must be in the socket receive buffer for select to return "readable"; it defaults to 1 for TCP and UDP sockets. The send low-water mark is the amount of free space that must be available in the socket send buffer for select to return "writable"; it normally defaults to 2048 for TCP sockets. UDP also uses the send low-water mark, but because the number of bytes of available space in a UDP socket's send buffer never changes, the socket is always writable as long as its send buffer size is greater than its low-water mark. UDP has no actual send buffer, only a send buffer size.

SO_RCVTIMEO: receive timeout (struct timeval)
SO_SNDTIMEO: send timeout (struct timeval)
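As a small sketch of how the struct timeval value is passed, assuming a POSIX system (the helper name set_recv_timeout is hypothetical):

```c
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: make blocking receive calls on fd give up after
 * `millis` milliseconds. Returns 0 on success, -1 on failure. */
int set_recv_timeout(int fd, long millis)
{
    struct timeval tv;
    tv.tv_sec = millis / 1000;            /* whole seconds */
    tv.tv_usec = (millis % 1000) * 1000;  /* remainder in microseconds */
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}
```

Once the timeout is set, a recv() call that receives nothing within the interval returns -1 with errno set to EAGAIN or EWOULDBLOCK. SO_SNDTIMEO is set the same way.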
SO_REUSEADDR: allow reuse of the local address and port (int)
Allows bind to succeed on an address (or port number) that is already in use. For details, see the bind man page.

SO_EXCLUSIVEADDRUSE
Uses a port in exclusive mode, so that it cannot be shared with other programs that bind it using SO_REUSEADDR.
When several sockets are bound to the same port, an incoming packet is delivered to the socket with the most specific binding, and no privilege check is involved. A program running with low privileges can therefore rebind a port that is already bound by a higher-privileged program, such as a port a service is listening on. This is a serious security risk.
If you do not want your program's port to be taken over in this way, use this option.

SO_TYPE: get the socket type (int)
SO_BSDCOMPAT: BSD compatibility mode (int)

========================================================== ========================================
IPPROTO_IP
--------------------------------------------------------------------------
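IP_HDRINCL: the application includes the IP header in the data it sends (int)
Because the application builds the IP header itself, this option is often abused by attackers to forge the source IP address.

IP_OPTIONS: IP header options (buffer of option bytes)
IP_TOS: type of service (int)
IP_TTL: time to live (int)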

The following IPv4 options are used for multicast:
Option              Data type        Description
IP_ADD_MEMBERSHIP   struct ip_mreq   join a multicast group
IP_DROP_MEMBERSHIP  struct ip_mreq   leave a multicast group
IP_MULTICAST_IF     struct in_addr   specify the interface for outgoing multicast packets
IP_MULTICAST_TTL    u_char           specify the TTL of outgoing multicast packets
IP_MULTICAST_LOOP   u_char           enable or disable multicast loopback
On Linux, IP_MULTICAST_IF also accepts a struct ip_mreqn or struct ip_mreq.
The ip_mreq structure is defined in the header file <netinet/in.h>:
struct ip_mreq {
    struct in_addr imr_multiaddr; /* IP multicast address of group */
    struct in_addr imr_interface; /* local IP address of interface */
};
IP_ADD_MEMBERSHIP
To join a multicast group, a process passes this option to setsockopt() on the socket. The option value is an ip_mreq structure: its first field, imr_multiaddr, specifies the address of the multicast group, and its second field, imr_interface, specifies the IPv4 address of the local interface.
IP_DROP_MEMBERSHIP
Leaves a multicast group. The ip_mreq structure is used in the same way as above.
IP_MULTICAST_IF
Changes the network interface used for outgoing multicast packets. The new interface is identified by its local IPv4 address.
IP_MULTICAST_TTL
Sets the time to live (TTL) of outgoing multicast packets. The default value is 1, which means that packets are delivered only within the local subnet.
IP_MULTICAST_LOOP
A member of a multicast group also receives the packets it sends to that group. This option enables or disables that loopback behavior.

IPPROTO_TCP
--------------------------------------------------------------------------
TCP_MAXSEG: maximum TCP segment size (int)
Gets or sets the maximum segment size (MSS) for a TCP connection. The value returned is the maximum amount of data that our TCP will send to the other end in one segment. It is often the MSS announced by the other end in its SYN, unless our TCP chooses a smaller value than the peer's announced MSS. If the value is fetched before the socket is connected, it is the default that is used when no MSS option is received from the other end. A value smaller than the returned one may actually be used for the connection if, for example, the timestamp option is in use, because that option occupies 12 bytes of TCP option space in each segment. The maximum amount of data TCP sends per segment can also change during the life of the connection if TCP supports path MTU discovery: if the route to the peer changes, the value can go up or down.
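Reading the current value could look like the sketch below, which assumes a POSIX system; get_mss is a hypothetical name. No particular number should be relied on: before the socket is connected the result is only the stack's default.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: return the MSS currently associated with a TCP
 * socket, or -1 if getsockopt fails. */
int get_mss(int fd)
{
    int mss = 0;
    socklen_t len = sizeof(mss);
    if (getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) < 0)
        return -1;
    return mss;
}
```

Calling get_mss() once before connect() and once after it is a simple way to see the difference between the default and the negotiated value.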
TCP_NODELAY: disable the Nagle algorithm (int)

TCP_KEEPALIVE (named TCP_KEEPIDLE on Linux): idle time before keepalive probes (int)
Specifies how long, in seconds, the connection must be idle before TCP starts sending keepalive probes. The default value must be at least 7200 seconds, that is, 2 hours. This option takes effect only when the SO_KEEPALIVE socket option is enabled.

TCP_NODELAY and TCP_CORK
Both options play an important role in how data is sent over a network connection. Many UNIX systems implement TCP_NODELAY, whereas TCP_CORK is specific to Linux and relatively new: it first appeared in kernel version 2.4. Other UNIX variants offer similar functionality. Notably, the TCP_NOPUSH option on BSD-derived systems in effect provides part of what TCP_CORK does.
TCP_NODELAY and TCP_CORK both control how packets are "Nagled", that is, whether the Nagle algorithm is used to coalesce small packets into larger frames. The algorithm is named after its inventor, John Nagle, who first used it in 1984 to solve network congestion problems at Ford Aerospace (see IETF RFC 896). The problem he addressed is the small-packet problem, closely related to what is known as silly window syndrome: each keystroke in a terminal application generates a packet that typically carries 1 byte of payload and a 40-byte header, an overhead of 4000% that can easily congest a network. Nagle's algorithm became a standard and was quickly implemented across the Internet. It is now enabled by default, but there are situations in which it is desirable to turn it off.
Now suppose an application issues a request to send a small chunk of data. We can either send the data immediately or wait for more data to accumulate and send it all at once. Interactive and client/server applications benefit greatly from sending immediately. For example, when we send a short request and wait for a large response, the overhead is small relative to the total amount of data transferred, and the response arrives sooner if the request goes out at once. This is achieved by setting the TCP_NODELAY option on the socket, which disables the Nagle algorithm.
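Disabling the Nagle algorithm on a socket could be sketched as follows, assuming a POSIX system (the helper name set_nodelay is hypothetical):

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enabled != 0 disables the Nagle algorithm so that
 * small writes are sent immediately; enabled == 0 restores the default.
 * Returns 0 on success, -1 on failure. */
int set_nodelay(int fd, int enabled)
{
    int flag = enabled ? 1 : 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
}
```

An interactive client would typically call set_nodelay(fd, 1) once, right after creating or connecting the socket.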
In the other case, we want to wait until the amount of data reaches the maximum that fits in a packet before sending it over the network in one go. This benefits bulk data transfer; a typical application is a file server. Applying the Nagle algorithm causes problems in this case. If you are sending a large amount of data, you can set the TCP_CORK option instead, which is done in exactly the same way as for TCP_NODELAY. (TCP_CORK and TCP_NODELAY were originally mutually exclusive; Linux has allowed combining them since kernel 2.5.71.) Let's take a closer look at how it works.
Suppose an application uses sendfile() to transfer a large amount of data. Application protocols usually require some information to be sent ahead of the data to describe it, that is, a header. Typically the header is small and TCP_NODELAY is set on the socket, so the packet containing the header is transmitted immediately. In some cases (depending on an internal packet counter), the sender then waits for the peer to acknowledge this packet before continuing. The bulk data transfer is delayed as a result, and unnecessary traffic is exchanged on the network.
If instead we set the TCP_CORK option on the socket (think of it as putting a cork in the pipe), the packet with the header is padded out with bulk data, and all data is sent automatically in full-sized packets. When the transfer is complete, it is best to clear TCP_CORK, "pulling the cork" so that any remaining partial frame is sent. This is just as important as corking the connection in the first place.
In short, if you know you will send several pieces of data together (such as an HTTP response header and body), we recommend setting TCP_CORK so that no delay is introduced between them. This can greatly benefit the performance of WWW, FTP, and file servers, and it also simplifies your work. The sample code is as follows:

int fd, on = 1;
...
/* Create the socket and perform other setup, omitted for brevity */
...
setsockopt(fd, SOL_TCP, TCP_CORK, &on, sizeof(on)); /* cork */
write(fd, ...);
dprintf(fd, ...); /* formatted output written directly to the descriptor */
sendfile(fd, ...);
write(fd, ...);
sendfile(fd, ...);
...
on = 0;
setsockopt(fd, SOL_TCP, TCP_CORK, &on, sizeof(on)); /* pull the cork */

Unfortunately, many widely used programs do not take these issues into account. For example, sendmail, written by Eric Allman, does not set any options on its sockets.

Apache HTTPD, the most popular web server on the Internet, sets TCP_NODELAY on all of its sockets, and most users are happy with its performance. Why? The answer lies in differences between implementations. TCP/IP stacks derived from BSD (notably FreeBSD) behave differently in this situation. When many small chunks of data are submitted for transmission in TCP_NODELAY mode, a large amount of information is sent for each write() call. However, because the counter responsible for delivery acknowledgment is byte-oriented rather than packet-oriented (as it is on Linux), delays are far less likely to be introduced; the outcome depends only on the total size of the data. Linux requires an acknowledgment after the first packet arrives, whereas FreeBSD waits for several hundred packets before doing so.

On Linux, the effect of TCP_NODELAY differs considerably from what developers accustomed to BSD TCP/IP stacks expect, and Apache performs worse on Linux as a result. Other applications that make heavy use of TCP_NODELAY have the same problem on Linux.

TCP_DEFER_ACCEPT

The first option to consider is TCP_DEFER_ACCEPT (this is its name on Linux; some other operating systems offer the same option under a different name). To understand the idea behind it, it helps to outline a typical HTTP client/server interaction. Recall how TCP establishes a connection with the destination of a data transfer. On the network, information travels in discrete units called IP packets (or IP datagrams). A packet always has a header carrying service information used for internal protocol handling, and it may also carry a data payload. A typical example of service information is the set of so-called flags, which mark a packet as having special meaning to the TCP/IP stack, such as acknowledging successful receipt of a packet. It is often entirely possible to carry a payload in a flagged packet, but sometimes internal logic forces the TCP/IP stack to send an IP packet that consists of a header only. Such packets often introduce annoying delays, increase system load, and reduce overall network performance.
Now the server creates a socket and waits for connections. TCP/IP connection establishment is called the "three-way handshake". First, the client sends a TCP packet with the SYN flag set and no payload (a SYN packet). The server replies with a packet that has the SYN and ACK flags set (a SYN/ACK packet) to acknowledge the packet it just received. The client then sends an ACK packet to acknowledge the second packet, which completes connection establishment. When the server receives this ACK packet from the client, it wakes up a receiver process that then waits for data to arrive. Once the three-way handshake is complete, the client starts sending "useful" data to the server. An HTTP request is usually small and fits entirely in one packet. In the scenario above, however, at least four packets are exchanged in both directions, which adds latency. Note also that the receiver is already waiting for information before any "useful" data has been sent.
To mitigate these problems, Linux (and some other operating systems) includes the TCP_DEFER_ACCEPT option in its TCP implementation. It is set on the server's listening socket. With this option, the kernel does not wake the listening process when the final ACK packet arrives; it waits until the first packet with real data has arrived. After sending the SYN/ACK packet, the server waits for the client to send an IP packet containing data. Only three packets now need to travel over the network, and the delay in connection establishment is noticeably reduced, which matters especially for HTTP.
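On Linux, setting the option on a listening socket might look like the sketch below. The option value is an integer number of seconds that bounds how long the kernel waits for data; the helper name set_defer_accept and the idea of choosing a short timeout such as 5 seconds are assumptions of this example, and the option is not portable.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (Linux assumed): defer waking the accepting
 * process until data has arrived on a new connection, waiting at most
 * roughly timeout_secs seconds. Returns 0 on success, -1 on failure. */
int set_defer_accept(int listen_fd, int timeout_secs)
{
    return setsockopt(listen_fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
                      &timeout_secs, sizeof(timeout_secs));
}
```

A server would call set_defer_accept(listen_fd, 5) after creating the listening socket and before entering its accept() loop.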
This option has an equivalent in many operating systems. For example, on FreeBSD, the same behavior can be implemented using the following code:

/* Irrelevant code omitted for clarity */
struct accept_filter_arg af = {"dataready", ""};
setsockopt(s, SOL_SOCKET, SO_ACCEPTFILTER, &af, sizeof(af));
This feature is called an "accept filter" on FreeBSD and can be used in several ways. In almost all cases, however, the effect is the same as that of TCP_DEFER_ACCEPT: the server waits not for the final ACK packet but for the packet carrying a data payload. For more on this option and its significance for high-performance web servers, see the Apache documentation.
For HTTP client/server interaction, it may also be worth changing the behavior of the client. Why does the client send this "useless" ACK packet at all? Because the TCP stack has no way of knowing the status of the ACK packet. If FTP were used instead of HTTP, the client would not send any data until it received the FTP server's greeting packet. In that case a delayed ACK would delay the client/server interaction. To decide whether the ACK is necessary, the client must know the application protocol and its current state. Hence the client's behavior needs to be modified.
For Linux clients there is another option, which is also called TCP_DEFER_ACCEPT. Sockets come in two kinds, listening sockets and connected sockets, and each kind has its own set of TCP options, so two options that are often used together can share the same name. When this option is set on a connected socket, the client no longer sends the ACK packet after receiving the SYN/ACK packet but waits for the user program's next request to send data. As a result, fewer packets are sent to the server.

TCP_QUICKACK

Another way to avoid delays caused by sending useless packets is the TCP_QUICKACK option. Unlike TCP_DEFER_ACCEPT, it can be used not only during connection establishment but also during normal data transfer, and it can be set on either side of a client/server connection. Delaying the ACK is useful when you know that user data is about to be sent soon: the ACK flag is then best set on that data packet, which minimizes network load. When the sender is sure that data will be sent immediately (in multiple packets), TCP_QUICKACK can be set to 0. For sockets in the "connected" state the default value of this option is 1, and the kernel resets it to 1 right after the first use (it is a one-shot option).
In other situations it is useful to send ACK packets right away. The ACK confirms receipt of a block of data, and no delay is introduced when the next block is processed. This mode of data transfer is typical of interactive processes, where the timing of user input cannot be predicted. On Linux this is the default socket behavior.
In the scenario above, the client is sending an HTTP request to the server and knows in advance that the request packet is short, so it should be sent immediately after the connection is established. This is how HTTP typically works. Since there is no need to send a pure ACK packet, TCP_QUICKACK can be set to 0 to improve performance. On the server side, both options need to be set only once, on the listening socket. All sockets created indirectly by the accept() call inherit all options of the original socket.
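A sketch of toggling the option on Linux follows; set_quickack is a hypothetical name. Because the setting is not permanent and the kernel may switch modes again on its own, code that depends on it usually has to reapply it, for example after each receive.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (Linux assumed): enabled == 0 allows ACKs to be
 * delayed so they can ride on outgoing data; enabled != 0 asks for ACKs
 * to be sent immediately. Returns 0 on success, -1 on failure. */
int set_quickack(int fd, int enabled)
{
    int flag = enabled ? 1 : 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &flag, sizeof(flag));
}
```

An HTTP client that is about to write its request could call set_quickack(fd, 0) right after connecting.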
By combining the TCP_CORK, TCP_DEFER_ACCEPT, and TCP_QUICKACK options, the number of packets involved in each HTTP interaction can be reduced to the minimum acceptable level (as dictated by TCP protocol requirements and security considerations). The result is not only faster data transfer and request processing but also minimal round-trip latency between client and server.
