One. The semantics of read/write: why do they block?
Let me start with write:
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t count);
First, write returns successfully as soon as the data in buf has been copied into the kernel's TCP send buffer. As for when the data is actually sent out on the network, when it is received by the remote host, or when it is read by the remote process, the system call gives no guarantee and no notification.
Under what circumstances does write block? When the kernel socket's send buffer is full. Every socket has its own send buffer and receive buffer. Starting with Linux 2.6, the sizes of both buffers are automatically tuned by the kernel (autotuning), generally floating between the default and the max.
# Get the send/receive buffer sizes for a socket (the values below are from my test on Linux 2.6.38 x86_64):
sysctl net.core.wmem_default   # 126976
sysctl net.core.wmem_max       # 131071
sysctl net.core.rmem_default   # 126976
sysctl net.core.rmem_max       # 131071
Data that has already been sent onto the network must still be kept in the send buffer; only after an ACK arrives from the peer does the kernel remove that data from the buffer and free up space for subsequent transmission. The receiving side stores incoming data temporarily in its receive buffer and acknowledges it automatically. However, if the process owning that socket does not take data out of the receive buffer in time, the receive buffer eventually fills up, and TCP's sliding window and congestion control make the receiver stop the sender from sending more data. These controls happen inside the TCP/IP stack, transparently to the application; the application keeps writing, which eventually fills the send buffer and makes the write call block.
In short, when the receiving process does not read data out of the socket as fast as the sending process writes it, the sender's send buffer eventually fills up and its write call blocks.
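To make the per-socket buffers concrete, here is a minimal sketch (my addition, not from the original post) that asks the kernel for a socket's send and receive buffer sizes via getsockopt. Note that Linux typically reports twice the configured value, because the kernel doubles SO_SNDBUF/SO_RCVBUF internally for bookkeeping overhead.

#include <stdio.h>
#include <sys/socket.h>

/* Print a fresh TCP socket's send/receive buffer sizes. */
int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int sndbuf = 0, rcvbuf = 0;
    socklen_t len = sizeof(int);

    if (fd == -1) {
        perror("socket");
        return 1;
    }
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len) == 0)
        printf("SO_SNDBUF = %d bytes\n", sndbuf);
    len = sizeof(int);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len) == 0)
        printf("SO_RCVBUF = %d bytes\n", rcvbuf);
    return 0;
}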
The behavior of read is easier to understand: it copies data from the socket's receive buffer into the application's buffer. When a read call blocks, it is usually because data from the sending end has not yet arrived.
Two. Differences between blocking (the default) and nonblocking mode in read/write behavior:
Setting a socket fd to nonblocking mode is common practice in server programming: the pattern of blocking I/O with one thread per client is costly and does not scale (heavy context-switch overhead); far more common is a thread pool + nonblocking I/O + multiplexing (select/poll, or the Linux-specific epoll).
Setting a file descriptor to nonblocking:
#include <fcntl.h>

int set_nonblocking(int fd)
{
    int flags;
    if ((flags = fcntl(fd, F_GETFL, 0)) == -1)
        flags = 0;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
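As a minimal sketch of the nonblocking I/O + multiplexing pattern mentioned above (my addition; listen_fd is assumed to be a bound, listening socket already made nonblocking with the helper above):

#include <stdio.h>
#include <sys/epoll.h>

/* Skeleton event loop: wait for readiness events with epoll.
 * Error handling and client bookkeeping are omitted for brevity. */
void event_loop(int listen_fd)
{
    struct epoll_event ev, events[64];
    int epfd = epoll_create1(0);

    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    for (;;) {
        int n = epoll_wait(epfd, events, 64, -1); /* block until some fd is ready */
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == listen_fd) {
                /* accept() the new connection, set it nonblocking,
                 * and register it with epoll_ctl(EPOLL_CTL_ADD) */
            } else {
                /* a client fd is readable: read() until EAGAIN */
            }
        }
    }
}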
A few important conclusions:
1. read returns as soon as there is any data in the receive buffer; it does not wait for the given application buffer to fill (a readn sketch follows this list).
In blocking mode, read waits only when the receive buffer is empty; in nonblocking mode it returns -1 immediately (errno = EAGAIN or EWOULDBLOCK).
2. A blocking write returns only when the send buffer has room for the entire application buffer (note: not symmetric with blocking read).
A nonblocking write returns the number of bytes it could fit, and returns -1 (errno = EAGAIN or EWOULDBLOCK) once nothing fits (a write_some sketch follows this list).
A special case of blocking write: while write is blocked waiting and the socket is closed on the other side, write immediately fills the remaining buffer space and returns the number of bytes written; the next write call then fails (connection reset by peer). This is what the next section covers.
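Before moving on, two minimal sketches of the conclusions above (my additions, not from the original post; readn and write_some are hypothetical helper names).

First, because read may return fewer bytes than requested, code that needs exactly count bytes has to loop (blocking socket assumed):

#include <errno.h>
#include <unistd.h>

/* Read exactly count bytes from a blocking fd.
 * Returns the number of bytes read (less than count only if EOF
 * arrived first), or -1 on error. */
ssize_t readn(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t left = count;
    while (left > 0) {
        ssize_t n = read(fd, p, left);
        if (n == -1) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;          /* EOF: the peer sent a FIN */
        p += n;
        left -= n;
    }
    return (ssize_t)(count - left);
}

Second, a nonblocking write must be prepared for short writes; the caller keeps the unwritten tail and retries when the fd becomes writable again (e.g. when epoll reports EPOLLOUT):

#include <errno.h>
#include <unistd.h>

/* Write as much of buf as the send buffer will take.
 * Returns the number of bytes written (possibly 0 if the buffer
 * was already full), or -1 on a real error. */
ssize_t write_some(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t done = 0;
    while (done < count) {
        ssize_t n = write(fd, p + done, count - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;      /* send buffer full: stop and retry later */
            return -1;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}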
Three. How read/write report connection exceptions:
From the application's perspective, TCP communication with another process is a fully asynchronous affair:
1. I don't know when, or even whether, the peer will receive my data.
2. I don't know when I will receive data from the peer.
3. I don't know when the communication will end (the peer exits normally or abnormally, the machine fails, the network fails, etc.).
For 1 and 2, an application built on a write() -> read() -> write() -> read() ... sequence, using blocking read or nonblocking read plus polling, can guarantee a correct processing flow.
For 3, the kernel returns "notifications" of these events to the application layer through the results of read/write.
Suppose process a on machine A is communicating with process b on machine B, and at some moment a is blocked in a read call on the socket (or the socket is nonblocking and being polled).
When process b terminates, whether or not its application explicitly closed the socket (the OS closes all file descriptors when a process ends, and for a socket that means sending a FIN packet to the peer):
"Sync Notifications": Process A calls the read to the socket that has received the fin, and returns eof:0 if the remaining bytes of receive buffer have been read.
"Asynchronous notification": if process a is blocked in a read call (the receive buffer must then be empty, since read returns as soon as the buffer has content), the read call immediately returns EOF and process a is woken up.
When the socket has received a FIN, read returns EOF, but process a can still call write: per the TCP protocol, receiving a FIN only means the peer will send no more data. In a normal two-sided shutdown, the end that received the FIN sends its remaining data to the peer (through one or more writes) and then closes the socket.
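In code, this EOF notification shows up as a zero return from read. A minimal sketch of the usual three-way handling (my addition):

#include <stdio.h>
#include <unistd.h>

/* Typical handling of read()'s three possible outcomes. */
void handle_readable(int fd)
{
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n > 0) {
        /* process n bytes of application data */
    } else if (n == 0) {
        /* EOF: the peer sent a FIN; we may still write any
         * remaining data, then close(fd) to finish the shutdown */
    } else {
        perror("read");     /* a real error, e.g. ECONNRESET */
        close(fd);
    }
}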
But things are rarely that simple. Closing a TCP connection gracefully requires not only that both applications follow the protocol, but also that nothing goes wrong in between.
If process b terminates abnormally, it is the OS that sends the FIN packet; process b no longer exists. When machine B later receives more data on that socket, it responds with an RST (because no process owns the socket any more). If process a then calls write on the socket that has received the RST, the OS sends SIGPIPE to process a, whose default action is to terminate the process. Now you know why your process can die without any warning:
From "Unix network programming, Vol1" 3rd Edition:
"It is okay to write to" a socket that has received a FIN, but it are an error to write to a socket that has received a RST ."
With the behavior described above, the kernel notifies the application layer of exceptions on either side of the connection through the socket's read/write results; it is quite intuitive, and seems sufficient.
Here is a digression:
I wonder if other readers have had the same feeling I did: when writing TCP/IP communication code, you hardly seem to think about connection termination or errors at all; you just close the socket when read/write returns an error, and the program appears to run fine, yet under certain conditions strange problems keep cropping up. You want to handle every kind of error perfectly, but cannot work out how.
One reason is that the socket (or rather the TCP/IP stack itself) has limited ability to report errors.
Consider this error scenario:
Unlike process b exiting (where the OS sends a FIN for every open socket), when machine B's OS crashes (note: this differs from an operator shutdown, where the OS can still shut down all processes cleanly), or the host loses power, or the network becomes unreachable, process a receives no FIN packet as a hint that the connection has terminated.
If process a is blocked in read, it can only wait forever.
If process a first writes and then blocks in read: receiving no ACK from machine B's TCP/IP stack, TCP retransmits 12 times (a time span of roughly 9 minutes), after which the blocked read call returns an error: ETIMEDOUT / EHOSTUNREACH / ENETUNREACH.
If machine B regains connectivity to machine A at some point and receives a retransmitted packet it does not recognize, it responds with an RST, and the blocked read call in process a then returns the error ECONNRESET.
So the socket does retain some ability to report these errors, but the precondition is that you make a write call while the peer is unreachable, not merely poll or block in read; it is the retransmission cycle that eventually detects the error. Without that write call, the application layer will never be notified of the connection error.
It is a little odd, though, that a write error is ultimately reported to the application layer through read.
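One partial mitigation, if a process must block in read, is to put an upper bound on the wait with a receive timeout (my addition, not from the original post; this only bounds the silence, it cannot distinguish a dead peer from a quiet one):

#include <sys/socket.h>
#include <sys/time.h>

/* After this, a blocking read() waits at most `seconds` and then
 * fails with errno == EAGAIN/EWOULDBLOCK instead of waiting forever. */
int set_recv_timeout(int fd, int seconds)
{
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}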
Four. What else do I need to do?
At this point we know that detecting connection anomalies through read/write alone is unreliable; extra work is needed:
1. Use TCP's keepalive feature.
cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75
cat /proc/sys/net/ipv4/tcp_keepalive_probes
9
Roughly, these parameters mean: the keepalive routine runs every 2 hours (7200 seconds) and sends the first probe packet; if no reply arrives within 75 seconds, the probe is resent; after 9 consecutive unanswered probes, the connection is considered broken. (At that point the read call should return an error; this remains to be tested.)
But keepalive's reputation, in my experience, is not great: the default intervals are too long, and these are global parameters of the TCP/IP stack, so changing them affects every process. Linux does appear to allow tuning the keepalive parameters per socket (readers with experience are welcome to point this out), but those options are not portable.
/* peakflys addition */
int keepalive = 1;      /* enable the keepalive option */
int keepidle = 60;      /* probe if the connection has seen no traffic for 60 seconds */
int keepinterval = 5;   /* 5 seconds between probe packets */
int keepcount = 3;      /* number of probe attempts; if the first probe is answered, no further probes are sent */
setsockopt(rs, SOL_SOCKET, SO_KEEPALIVE, (void *)&keepalive, sizeof(keepalive));
setsockopt(rs, SOL_TCP, TCP_KEEPIDLE, (void *)&keepidle, sizeof(keepidle));
setsockopt(rs, SOL_TCP, TCP_KEEPINTVL, (void *)&keepinterval, sizeof(keepinterval));
setsockopt(rs, SOL_TCP, TCP_KEEPCNT, (void *)&keepcount, sizeof(keepcount));
/* end peakflys addition (PS: it seems this method does not work in every case) */
2. Application-layer heartbeats
In rigorous network programs, an application-layer heartbeat protocol is essential. It is considerably more trouble than TCP's own keepalive (how to implement an application-layer heartbeat correctly may deserve a dedicated article), but it has one decisive advantage: it is controllable.
Of course, you can also do something simpler: put a timeout on each connection and close "idle" connections that have not communicated for some period (a sketch follows).
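A sketch of that simpler idle-timeout approach (my addition; last_active is assumed to be refreshed on every successful read/write on the connection):

#include <time.h>
#include <unistd.h>

#define IDLE_LIMIT 300   /* seconds of silence before giving up */

/* Close a connection that has been idle too long.
 * Returns 1 if the fd was closed, 0 otherwise. */
int reap_if_idle(int fd, time_t last_active)
{
    if (time(NULL) - last_active > IDLE_LIMIT) {
        close(fd);
        return 1;
    }
    return 0;
}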
Reposted from: TCP/IP network programming: socket behavior - peakflys - C++ blog: http://www.cppblog.com/peakflys/articles/186021.html