I think that mastering TCP/IP network programming under Linux requires being familiar with at least three levels:
1. The TCP/IP protocol itself (e.g. connection establishment and termination, retransmission and acknowledgement, sliding window and congestion control, etc.)
2. The socket I/O system calls (mainly read/write), i.e. how the TCP/IP protocol's behavior shows up at the application level.
3. Writing performant, scalable server programs, including multithreading, I/O multiplexing, non-blocking I/O, asynchronous I/O and other techniques.
For the TCP/IP protocol, refer to W. Richard Stevens's "TCP/IP Illustrated, Volume 1".
For the second level, W. Richard Stevens's "Unix Network Programming, Volume 1" is still the book to read; these two are widely regarded as the bible of UNIX network programming.
As for the third level, UNP touches on it, there is the famous C10K problem, and the industry has all kinds of frameworks and solutions; my knowledge there is shallow, so this article will not cover it.
The focus of this article is the second level: it summarizes the behavior of the read/write system calls in TCP/IP network programming under Linux. The material comes from my own limited network programming experience and from the relevant chapters of Unix Network Programming, Volume 1. Since I have not been doing network programming under Linux for long, errors and omissions are unavoidable; corrections from readers are welcome.
I. Read/write semantics: why do they block?
Let's start with write:
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t count);
First, a successful return from write only means that the data in buf has been copied into the kernel's TCP send buffer. As for when the data is sent onto the network, when the peer host receives it, or when the peer process reads it, the system call gives no guarantee and no notification.
Under what circumstances does write block? When the kernel send buffer for that socket is full. Each socket has its own send buffer and receive buffer. Starting with Linux 2.6, the two buffer sizes are automatically adjusted by the system (autotuning), but they typically float between a default and a max value.
# Get the default/max sizes of a socket's send/receive buffers (the values below are what I measured on Linux 2.6.38 x86_64):
sysctl net.core.wmem_default    # 126976
sysctl net.core.wmem_max        # 131071
sysctl net.core.rmem_default    # 126976
sysctl net.core.rmem_max        # 131071
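The same information can also be read per socket from inside the program with getsockopt. A minimal sketch (error handling omitted; note that on Linux the kernel reports roughly double the value set with setsockopt, and explicitly setting SO_SNDBUF/SO_RCVBUF disables autotuning for that socket):

#include <stdio.h>
#include <sys/socket.h>

/* Print the kernel send/receive buffer sizes of one socket. */
void print_socket_buffers(int fd)
{
    int sndbuf = 0, rcvbuf = 0;
    socklen_t len = sizeof(sndbuf);

    getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len);
    len = sizeof(rcvbuf);
    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len);

    printf("send buffer: %d bytes, receive buffer: %d bytes\n", sndbuf, rcvbuf);
}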
Data that has already been sent onto the network still has to stay in the send buffer; only after receiving the peer's ACK does the kernel remove it from the buffer and free space for subsequent data. The receiving end stores incoming data in its receive buffer and acknowledges it automatically. However, if the process owning the socket does not remove data from the receive buffer in time, the receive buffer fills up, and because of TCP's sliding window and congestion control the receiver stops the sender from sending more data. These controls happen inside the TCP/IP stack and are transparent to the application; the application keeps writing data, the send buffer eventually fills up, and the write call blocks.
In general, it is because the receiving process reads data from the socket more slowly than the sending process writes it that the sender's write call eventually blocks.
The behavior of read is easier to understand: it copies data from the socket's receive buffer into the application's buffer. If read blocks, it is usually because the sender's data has not arrived yet.
II. Differences in read/write behavior between blocking (the default) and nonblocking mode:
Setting a socket fd to nonblocking is common practice in server programming. The model of blocking I/O with one thread per client is expensive and scales poorly (the context-switching overhead is significant); the more common approach is thread pool + nonblocking I/O + multiplexing (select/poll, or Linux's own epoll).
#include <fcntl.h>

// Set a file descriptor to non-blocking mode
int set_nonblocking(int fd)
{
    int flags;
    if ((flags = fcntl(fd, F_GETFL, 0)) == -1)
        flags = 0;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
A few important conclusions:
1. read always returns as soon as the receive buffer has any data, rather than waiting until the given user buffer is full.
In blocking mode, read waits only when the receive buffer is empty; in nonblocking mode it returns -1 immediately (errno = EAGAIN or EWOULDBLOCK).
2. A blocking write returns only when the send buffer has room for the entire user buffer (unlike blocking read).
A nonblocking write returns the number of bytes it was able to fit into the send buffer; once the buffer is full it returns -1 (errno = EAGAIN or EWOULDBLOCK).
There is one special case for blocking write: if the peer closes the connection while write is blocked waiting, write immediately fills whatever send-buffer space remains and returns the number of bytes written; the next write then fails (connection reset by peer). This is what the next section is about.
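To make the two conclusions above concrete, here is a minimal sketch of how nonblocking read/write results are typically handled (assuming fd has already been passed through set_nonblocking above; the return-value conventions are my own, not from the original text):

#include <errno.h>
#include <unistd.h>

/* Read whatever is currently available on a nonblocking socket.
 * Returns: >0 bytes read, 0 peer sent FIN (EOF),
 *          -1 would block (receive buffer empty), -2 real error. */
ssize_t nb_read(int fd, void *buf, size_t len)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;                              /* data or EOF */
        if (errno == EINTR)
            continue;                              /* interrupted, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return -1;                             /* nothing to read yet */
        return -2;                                 /* e.g. ECONNRESET */
    }
}

/* Write as much as the kernel send buffer will take right now.
 * Returns bytes accepted (possibly 0); the caller must keep the rest
 * and retry when poll/epoll reports the fd writable again. */
ssize_t nb_write(int fd, const void *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = write(fd, (const char *)buf + sent, len - sent);
        if (n > 0) { sent += n; continue; }        /* partial write is normal */
        if (n == -1 && errno == EINTR) continue;
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) break;
        return -1;                                 /* e.g. EPIPE / ECONNRESET */
    }
    return (ssize_t)sent;
}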
III. Read/write feedback on connection exceptions:
For an application, TCP communication with another process is essentially a completely asynchronous affair:
1. I do not know when, or whether, the other side will receive my data.
2. I do not know when I will receive data from the other side.
3. I do not know when the communication will end (normal exit, abnormal exit, machine failure, network failure, etc.).
For 1 and 2, using a write() -> read() -> write() -> read() -> ... sequence, together with blocking read or nonblocking read plus polling, the application can guarantee a correct processing flow.
For 3, the kernel delivers the "notification" of these events to the application layer through the results of read/write.
Suppose process a on machine A is communicating with process b on machine B, and at some moment a is blocked on a read call on the socket (or is polling the socket in nonblocking mode).
When process b terminates, regardless of whether the application explicitly closed the socket (the OS closes all file descriptors when a process exits), a FIN packet is sent to the peer's socket.
"Synchronous" notification: process a calls read on the socket that has received the FIN; once the remaining bytes in the receive buffer have been read, read returns EOF (0).
"Asynchronous" notification: if process a is blocked on read (as mentioned above, the receive buffer must be empty at that point, since read returns whenever the receive buffer has data), the read call returns EOF immediately and process a is woken up.
After the socket has received the FIN, read returns EOF, but process a can still call write, because in TCP receiving the peer's FIN only means the peer will not send any more data. In a normal bidirectional shutdown, the side that received the FIN can still send its remaining data to the peer (with one or more writes) and then close the socket.
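A minimal sketch of this pattern on a blocking socket: drain the socket until read returns 0 (the peer's FIN), send our remaining data, then close. The "goodbye" payload is purely illustrative:

#include <unistd.h>

void handle_peer_fin(int fd)
{
    char buf[4096];
    ssize_t n;

    /* consume whatever the peer sent before its FIN */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        ;

    if (n == 0) {
        /* EOF: the peer will send nothing more, but we may still write */
        const char bye[] = "goodbye\n";
        write(fd, bye, sizeof(bye) - 1);
    }
    close(fd);   /* our side now sends its own FIN */
}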
But things are far from that simple. Closing a TCP connection gracefully requires not only that the applications on both sides follow the convention, but also that nothing goes wrong in between.
If process b terminated abnormally, the FIN packet was sent by the OS; process b no longer exists, so when machine B receives more data on that socket it responds with an RST (because the process that owned the socket is gone). When process a calls write again on a socket that has received an RST, the OS sends it SIGPIPE, whose default action is to terminate the process. Now you know why your process can die without any warning :)
From "Unix Network programming, Vol1" 3rd Edition:
"It is okay-to-write to a sockets that have received a FIN, but it's an error to write to a socket, which has received an RST ."
Through the behavior described above, the kernel uses the socket's read/write results to notify the application layer of connection problems between the two sides. It is not very intuitive, but it seems to be enough.
Here is a digression:
I wonder whether other readers have had the same feeling I did: when writing TCP/IP communication code, it seems you never really have to think about connection termination or errors, you just close the socket when read/write returns an error, and the program appears to work, yet in some situations strange problems keep turning up. You want to handle every kind of error perfectly, but you cannot figure out how.
One reason is that the socket (or the TCP/IP stack itself) has only a limited ability to report errors.
Consider the following error condition:
Unlike the case where process b exits (the OS then sends FIN packets for all of its open sockets), when machine B's OS crashes (note this is different from a shutdown, because during a shutdown every process is terminated cleanly and its exit actions are guaranteed), when the host loses power, or when the network becomes unreachable, process a receives no FIN packet as a hint that the connection has ended.
If process a is blocked on read, the only possible outcome is to wait forever.
If process a writes first and then blocks on read: because machine B's TCP/IP stack never sends an ACK, TCP keeps retransmitting, about 12 times over a span of roughly 9 minutes, and then returns an error on the blocked read call: ETIMEDOUT / EHOSTUNREACH / ENETUNREACH.
If machine B happens to regain a route to machine A at some point and receives one of a's retransmitted packets, it responds with an RST because it no longer recognizes the connection; the blocked read call in process a then returns the error ECONNRESET.
So the socket does give some feedback on these errors, provided that you still make write calls while the peer is unreachable, instead of only polling or blocking on read; the error is then detected within the retransmission cycle. If no write call is made, the application layer never receives any notification of the connection error.
A write error that is ultimately reported to the application layer through read, a bit of a quirk, isn't it?
IV. What else needs to be done?
At this point we know that detecting connection anomalies through read/write alone is not reliable; some extra work is needed:
1. Use TCP's keepalive feature?
cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75
cat /proc/sys/net/ipv4/tcp_keepalive_probes
9
Roughly, the parameters above mean: the keepalive routine kicks in after 2 hours (7200 seconds) of idleness and sends the first probe packet; if no reply arrives within 75 seconds it resends the probe; after 9 consecutive unanswered probes the connection is considered broken. (At that point a blocked read call should return an error; I have not tested this.)
But in my experience keepalive is not very useful: the default intervals are far too long, and these are global parameters of the whole TCP/IP stack, so changing them affects other processes. Linux does seem to allow the keepalive parameters to be changed per socket (readers with hands-on experience are welcome to comment), but such methods are not portable.
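For reference, Linux does expose per-socket keepalive knobs through setsockopt (SO_KEEPALIVE plus the Linux-specific TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT options). A sketch with example values; the numbers are arbitrary and, as noted, this is not portable:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalive on one socket: first probe after 60s of idleness,
 * re-probe every 10s, give up after 5 unanswered probes. */
int enable_keepalive(int fd)
{
    int on = 1, idle = 60, intvl = 10, cnt = 5;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) == -1)
        return -1;
    return 0;
}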
2. Application-layer heartbeats
In a serious network program an application-layer heartbeat protocol is essential. It is more cumbersome than TCP keepalive (how to implement an application-layer heartbeat correctly may deserve an article of its own), but it has one great advantage: it is under your control.
Of course there is a simpler approach: set a timeout on the connection and close "idle" connections that have had no traffic for a certain period. For that, refer to this article:
muduo network programming example 8: using a timing wheel to kick out idle connections, by Chen Shuo
References:
"TCP/IP Illustrated, Volume 1" by W. Richard Stevens
"Unix Network Programming, Volume 1" (3rd Edition) by W. Richard Stevens
Linux TCP Tuning
Using TCP keepalive under Linux