"UNIX Network Programming" reading notes

Source: Internet
Author: User
Tags ack terminates unix domain socket keep alive

UDP and TCP

UDP (User Datagram Protocol) is a connectionless protocol. It does not guarantee that a datagram reaches its final destination, that datagrams arrive in the order in which they were sent, or that each datagram arrives only once.

UDP provides a connectionless service because there is no long-term relationship between a UDP client and a server. A UDP client can send datagrams to multiple servers using a single socket, and a UDP server can receive datagrams from different clients on the same socket.

Each UDP datagram has a length, and that length is passed to the receiving process along with the data. TCP, by contrast, is a byte-stream protocol with no record boundaries.

TCP (Transmission Control Protocol) is a connection-oriented protocol that provides a reliable full-duplex byte stream to user processes.

TCP provides a connection between the client and the server. The TCP client first establishes a connection to a given server, then exchanges data across that connection with that server, and then terminates the connection.

TCP provides reliability. When TCP sends data to the other end, it requires an acknowledgment in return. If no acknowledgment is received, TCP automatically retransmits the data and waits longer each time, and it gives up only after several retransmissions have failed. TCP contains algorithms that dynamically estimate the round-trip time (RTT) between client and server, so that it knows how long to wait for an acknowledgment. TCP sequences the data it sends by associating a sequence number with every byte. The receiving end reorders the received segments by sequence number and uses the same numbers to detect and discard duplicate data.

TCP provides flow control. TCP always tells its peer how many bytes of data it is currently willing to accept from that peer. This is called the advertised window. It reflects the amount of space currently available in the receive buffer, which ensures that the sender cannot overflow the receive buffer.

TCP connections are full-duplex, which means that an application can send and receive data in both directions on a given connection at any time.

TCP connection Setup and termination

Establishing a TCP connection requires three segments, known as the TCP three-way handshake, while terminating a connection requires four segments. The packet exchange of a complete TCP connection covers three phases (connection establishment, data transfer, and connection termination), and each endpoint passes through a series of TCP states along the way.

Each SYN can carry several TCP options. The following are the common ones.

    • MSS option. The TCP that sends the SYN uses this option to advertise its maximum segment size (MSS), which is the maximum amount of data it is willing to accept in each TCP segment on this connection.
    • Window scale option. The largest window that TCP can advertise is 65535 because the corresponding field in the TCP header is 16 bits wide. This option lets the two ends scale the advertised window beyond that limit.
    • Timestamp option. It prevents the data corruption that could be caused by a lost packet that later reappears.

TCP involves connection establishment and connection termination operations that can be illustrated with a state transition diagram.

There are two reasons for the existence of the TIME_WAIT state:

    1. To implement the termination of TCP's full-duplex connection reliably. If the final ACK is lost, the server resends its final FIN, so the client must keep the state information that allows it to resend the final ACK.
    2. To allow old duplicate segments to expire in the network. Suppose a connection is closed and, some time later, another connection is established between the same IP addresses and ports. The later connection is called an incarnation of the earlier one. TCP must prevent old duplicate packets from the earlier connection from reappearing after it has terminated and being misinterpreted as belonging to the new incarnation. TCP therefore does not initiate a new incarnation of a connection that is in the TIME_WAIT state. The TIME_WAIT state lasts 2MSL (MSL is the maximum segment lifetime), which is long enough for a packet in one direction to be discarded within one MSL and for its reply in the other direction to be discarded within another MSL.

A TCP socket pair is a four-tuple that defines the two endpoints of a connection: the local IP address, the local TCP port number, the foreign IP address, and the foreign TCP port number. A socket pair uniquely identifies every TCP connection on a network.

Buffer size and limits

Many networks have an MTU (maximum transmission unit) that is dictated by the hardware. For example, the Ethernet MTU is 1500 bytes. The smallest MTU on the path between two hosts is called the path MTU. The path MTU need not be the same in the two directions between two hosts, because Internet routing is often asymmetric. When an IP datagram is sent out an interface and its size exceeds the MTU of the corresponding link, IP fragments it, and the fragments are not reassembled until they reach the final destination.

Both IPv4 and IPv6 define a minimum reassembly buffer size, which is the minimum datagram size that every IPv4 or IPv6 implementation is guaranteed to support. For IPv4 this value is 576 bytes.

The MSS of TCP advertises to the peer TCP the maximum amount of TCP data that the peer may send in each segment. The purpose of the MSS is to tell the peer the actual size of the reassembly buffer and to try to avoid fragmentation. The MSS is usually set to the MTU minus the fixed lengths of the IP and TCP headers (20 bytes each for IPv4). On Ethernet with IPv4 this gives an MSS of 1460 (1500 - 20 - 20).
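
The MSS arithmetic above can be written out as a small helper. This is only a sketch: the function name ipv4_mss_for_mtu and the constant names are made up for illustration, and it assumes IPv4 and TCP headers that carry no options.

```c
/* Fixed header lengths in bytes, assuming no IP or TCP options. */
enum { IPV4_HEADER_LEN = 20, TCP_HEADER_LEN = 20 };

/* Largest TCP payload that fits in one unfragmented IPv4 datagram
 * on a link with the given MTU. */
int ipv4_mss_for_mtu(int mtu)
{
    return mtu - IPV4_HEADER_LEN - TCP_HEADER_LEN;
}
```

For the Ethernet MTU of 1500 bytes, ipv4_mss_for_mtu(1500) gives 1460, which matches the value in the text.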

Each TCP socket has a send buffer. When an application process calls write, the kernel copies all the data from the application buffer into the socket send buffer. For a blocking socket, if the send buffer cannot hold all of the application's data, the process is put to sleep until every byte of the application buffer has been copied into the socket send buffer. A successful return from write on a TCP socket therefore means only that the application buffer can be reused. It does not mean that the peer TCP or the peer application has received the data. TCP takes data from the socket send buffer and sends it to the peer TCP, and it can discard data from the send buffer only after the peer's ACK for that data arrives. TCP passes its data through IP to the data link. Each data link has an output queue. If the queue is full, the new packet is discarded and an error is returned up the protocol stack to TCP. TCP notes the error and retransmits the corresponding segment later, and the whole process is transparent to the application.

Because UDP is unreliable, it does not need to keep a copy of the application's data, so a UDP socket has no actual send buffer. It does have a send buffer size, but this is only an upper limit on the size of a UDP datagram that can be written to the socket. If an application writes a datagram larger than the socket send buffer size, the kernel returns an EMSGSIZE error to the process. Because UDP has nothing like TCP's MSS, a UDP application that sends large datagrams is more likely than TCP to have them fragmented. A successful return from write on a UDP socket means that the datagram, or all of its fragments, has been added to the data-link output queue. If the queue does not have room for the datagram or one of its fragments, the kernel usually returns ENOBUFS to the application.

Socket address Structure

The IPv4 socket address structure is sockaddr_in, defined in the header <netinet/in.h>.

To support the socket address structures of any protocol family, the socket functions take a pointer to a generic socket address structure as an argument. The header <sys/socket.h> defines this generic structure, sockaddr.

The inet_aton, inet_addr, and inet_ntoa functions convert IPv4 addresses between dotted-decimal strings and binary values in network byte order.

inet_addr returns INADDR_NONE when an error occurs (typically a value of 32 one-bits), which means that the function cannot handle the address 255.255.255.255. inet_addr is now deprecated, and new code should use inet_aton instead.

The string pointed to by the return value of inet_ntoa resides in static memory, which means that the function is not reentrant.
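
As a minimal sketch of how these pieces fit together, the helper below fills in a sockaddr_in from a dotted-decimal string. The function name make_ipv4_addr is hypothetical, and the _DEFAULT_SOURCE define is an assumption about glibc, where it keeps inet_aton visible under strict compiler modes.

```c
#define _DEFAULT_SOURCE  /* assumption: glibc, keeps inet_aton declared */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fills *sa with an IPv4 address and port.
 * Returns 0 on success, -1 if the string is not a valid address. */
int make_ipv4_addr(struct sockaddr_in *sa, const char *dotted,
                   unsigned short port)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);           /* network byte order */
    if (inet_aton(dotted, &sa->sin_addr) == 0)
        return -1;                        /* 0 means invalid string */
    return 0;
}
```

Unlike inet_addr, inet_aton reports failure through its return value, so 255.255.255.255 is converted like any other address. A pointer to the filled structure is cast to struct sockaddr * when it is passed to functions such as bind or connect.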

TCP Socket Programming

A basic TCP client/server program uses the socket functions described below.

Socket function

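To perform network I/O, the first thing a process must do is call the socket function.

The family parameter specifies the protocol family, for example AF_INET for IPv4, AF_INET6 for IPv6, or AF_LOCAL for the UNIX domain protocols.

The type parameter specifies the socket type, for example SOCK_STREAM for a byte-stream socket or SOCK_DGRAM for a datagram socket.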

Connect function

A TCP client uses the connect function to establish a connection with a TCP server.

For a TCP socket, calling connect initiates TCP's three-way handshake. The function returns only when the connection is established or an error occurs. The possible errors are as follows.

    1. If the client TCP receives no response to its SYN segment, ETIMEDOUT is returned after a number of retransmissions.
    2. If the response to the client's SYN is an RST, no process on the server host is waiting for connections on the port we specified, and connect returns ECONNREFUSED as soon as the client receives the RST.
    3. If the client's SYN elicits a "destination unreachable" ICMP error from an intermediate router, the client keeps retransmitting the SYN, and EHOSTUNREACH or ENETUNREACH is returned after a number of retries.
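
The second case is easy to reproduce on one machine. The sketch below binds a TCP socket to a loopback port without calling listen and then connects to that port. The function name connect_to_unlistened_port is hypothetical, and the expected result of ECONNREFUSED assumes a typical Linux loopback setup.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connects to a loopback port that is bound but not listening and
 * returns the errno value reported by connect (0 if it succeeded). */
int connect_to_unlistened_port(void)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);
    int holder, client, err = 0;

    /* Reserve a port by binding without ever calling listen. */
    holder = socket(AF_INET, SOCK_STREAM, 0);
    if (holder < 0)
        return errno;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = htons(0);               /* kernel picks the port */
    if (bind(holder, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        getsockname(holder, (struct sockaddr *)&sa, &len) < 0) {
        err = errno;
        close(holder);
        return err;
    }

    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0) {
        err = errno;
        close(holder);
        return err;
    }
    /* The SYN is answered with an RST because nobody is listening. */
    if (connect(client, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        err = errno;
    close(client);
    close(holder);
    return err;
}
```

The other two errors need an unresponsive or unreachable host, so they are harder to trigger in a self-contained example.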

Bind function

The bind function assigns a local protocol address to a socket.

With bind, a process can specify a port number, an IP address, both, or neither.

If a TCP client or server does not call bind to bind a port, the kernel chooses an ephemeral port for the socket when connect or listen is called.

When a TCP server binds an IP address to its socket, the socket accepts only client connections destined for that address. A TCP client normally does not bind an IP address. When the socket is connected, the kernel chooses the source IP address based on the outgoing network interface. If a TCP server does not bind an IP address, the kernel uses the destination IP address of the client's SYN as the server's source IP address.

A common error returned by bind is EADDRINUSE ("Address already in use").

Listen function

The listen function converts an unconnected socket into a passive socket, indicating that the kernel should accept incoming connection requests directed to that socket.

The backlog parameter specifies the maximum number of connections that the kernel should queue for the socket. The kernel maintains two queues for any given listening socket:

    1. The incomplete connection queue, which contains an entry for each SYN that has arrived from a client and for which the server is still waiting for the TCP three-way handshake to complete.
    2. The completed connection queue, which contains an entry for each client with which the TCP three-way handshake has completed.

If the queues are full when a client SYN arrives, TCP ignores the segment. It does not send an RST.

Accept function

The accept function returns the next completed connection from the front of the completed connection queue.

If accept succeeds, it returns the descriptor of the connected socket, which is a new socket distinct from the listening socket. The first argument to accept is called the listening socket descriptor, and its return value is called the connected socket descriptor.

If we are not interested in the client's protocol address, we can set both cliaddr and addrlen to null pointers.
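
The following sketch strings socket, bind, listen, connect, and accept together in a single process over the loopback interface. It is an illustration rather than a real server: the function name loopback_roundtrip and the backlog of 5 are arbitrary choices. It relies on the kernel completing the handshake and queuing the connection before accept is called, as described above.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends msg from a client socket to an accepted server socket over
 * loopback. Returns the number of bytes the server side read, or -1. */
int loopback_roundtrip(const char *msg, char *buf, size_t buflen)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);
    int lfd = -1, cfd = -1, sfd = -1;
    ssize_t n = -1;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = htons(0);                   /* ephemeral port */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 ||
        bind(lfd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(lfd, 5) < 0 ||
        getsockname(lfd, (struct sockaddr *)&sa, &len) < 0)
        goto done;

    /* connect returns once the three-way handshake is complete;
     * the connection then waits in the completed connection queue. */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        goto done;

    /* NULL, NULL: the client's address is not needed here. */
    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0)
        goto done;

    if (write(cfd, msg, strlen(msg)) == (ssize_t)strlen(msg))
        n = read(sfd, buf, buflen);

done:
    if (sfd >= 0) close(sfd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return (int)n;
}
```

A real server would call accept in a loop and handle each connected socket separately. The single read is enough here only because the message is tiny.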

Close function

The close function closes a socket and terminates a TCP connection.

The default action of close on a TCP socket is to mark the socket as closed and return to the calling process immediately. TCP still tries to send any data already queued for the peer, and the normal TCP connection termination sequence takes place once that data has been sent.

Closing a connected socket only decrements the reference count of its descriptor. If the reference count is still greater than 0, the close call does not initiate TCP's four-packet connection termination sequence. To be sure of sending a FIN on a TCP connection, use the shutdown function instead of close.
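
The half-close performed by shutdown can be sketched as follows. To keep the example short it uses a connected pair of UNIX domain stream sockets as a stand-in for a TCP connection, which is an assumption of this sketch. On TCP, shutdown with SHUT_WR is what causes the FIN to be sent. The function name half_close_demo is hypothetical.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if the half-close behaves as described, -1 otherwise. */
int half_close_demo(void)
{
    int sv[2];
    char buf[8];
    ssize_t queued = -1, eof = -1, reply = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Close only the sending direction of sv[0]. The descriptor
     * itself stays open and can still be read from. */
    if (write(sv[0], "bye", 3) == 3 && shutdown(sv[0], SHUT_WR) == 0) {
        queued = read(sv[1], buf, sizeof(buf)); /* data sent earlier */
        eof = read(sv[1], buf, sizeof(buf));    /* then end-of-file */
        /* The opposite direction is still usable. */
        if (write(sv[1], "ok", 2) == 2)
            reply = read(sv[0], buf, sizeof(buf));
    }
    close(sv[0]);
    close(sv[1]);
    return (queued == 3 && eof == 0 && reply == 2) ? 0 : -1;
}
```

Unlike close, shutdown acts on the connection regardless of how many descriptors still refer to the socket.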

Recv and send functions

Getsockname and getpeername functions

These functions return either the local protocol address associated with a socket (getsockname) or the foreign protocol address associated with a socket (getpeername).
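
A short sketch of both calls follows. It connects a UDP socket to a loopback port, which sends nothing on the wire, and then asks the kernel for both ends of the association. The function name local_and_peer is hypothetical, and the sketch assumes that the kernel assigns the local address and ephemeral port during connect, as Linux does.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connects a UDP socket to 127.0.0.1:peer_port and reports the local
 * and foreign addresses of the socket. Returns 0 on success, -1 on error. */
int local_and_peer(unsigned short peer_port,
                   struct sockaddr_in *local, struct sockaddr_in *peer)
{
    struct sockaddr_in dst;
    socklen_t len;
    int fd, rc = -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    dst.sin_port = htons(peer_port);

    /* On a UDP socket connect only records the peer address. */
    if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) == 0) {
        len = sizeof(*local);
        if (getsockname(fd, (struct sockaddr *)local, &len) == 0) {
            len = sizeof(*peer);
            if (getpeername(fd, (struct sockaddr *)peer, &len) == 0)
                rc = 0;
        }
    }
    close(fd);
    return rc;
}
```

getsockname is the usual way to learn which ephemeral port the kernel chose for a socket that was bound with port 0 or never bound explicitly.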

TCP exception conditions

Server process terminated

When the server process terminates, all of its open descriptors are closed, which causes a FIN to be sent to the client. The client TCP responds with an ACK.

If the client then continues to send data to the server, the server TCP responds to the data with an RST, because the process that had the socket open has terminated.

The client process does not see this RST, however: its call to read returns 0 immediately because of the FIN received earlier. If the process ignores this condition and keeps writing to the server, the write fails with EPIPE.

When a process writes to a socket that has received an RST, the kernel sends the process a SIGPIPE signal. The default action of this signal is to terminate the process. If the process catches or ignores the signal, the write operation returns EPIPE.
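
The EPIPE case can be sketched with SIGPIPE ignored. For brevity the example uses a UNIX domain stream socket pair instead of TCP, which is an assumption of this sketch: there the very first write after the peer closes fails, whereas on TCP the failure appears only after the RST has been received. The function name write_after_peer_close is hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Writes to a stream socket whose peer has been closed and returns
 * the errno value of the write (0 if it unexpectedly succeeded). */
int write_after_peer_close(void)
{
    int sv[2], err = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return errno;
    /* Without this, the default action of SIGPIPE would terminate
     * the process before write could return an error. */
    signal(SIGPIPE, SIG_IGN);
    close(sv[1]);                         /* the peer goes away */
    if (write(sv[0], "x", 1) < 0)
        err = errno;
    close(sv[0]);
    return err;
}
```

Ignoring SIGPIPE for the whole process and checking for EPIPE at each write is a common way to handle this condition in servers.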

Server Host Crashes

When the server host crashes, the client TCP keeps retransmitting its data segment, trying to obtain an ACK from the server. When the client TCP finally gives up, an error is returned to the client process. If the server host has crashed and there is no response at all to the client's data segments, the error is ETIMEDOUT. If an intermediate router determines that the server host is unreachable and responds with a "destination unreachable" ICMP message, the error is EHOSTUNREACH or ENETUNREACH.

Server host crashes and reboots

When the server host crashes and reboots, its TCP loses all information about the connections that existed before the crash, so the server TCP responds to a data segment received from the client with an RST.

When the client TCP receives the RST, the client's read call returns ECONNRESET.

Server host shuts down

When a system is shut down, the init process normally sends the SIGTERM signal to all processes and then sends the SIGKILL signal to any processes that are still running. When the server process terminates, all of its open descriptors are closed.

I/O model

    • Blocking I/O. By default, all sockets are blocking.
    • Non-blocking I/O. Setting a socket to non-blocking tells the kernel: when a requested I/O operation cannot be completed without putting the process to sleep, do not put the process to sleep, but return an error instead.
    • I/O multiplexing. The process calls select or poll and blocks in one of these two system calls instead of blocking in the actual I/O system call.
    • Signal-driven I/O. The kernel notifies the process with the SIGIO signal when the descriptor is ready.
    • Asynchronous I/O. The process tells the kernel to start an operation and to notify it when the entire operation is complete.
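
As an illustration of the I/O multiplexing model, the sketch below blocks in select, with a timeout, instead of blocking in read. The helper names wait_readable and select_demo and the 50 ms timeout are arbitrary choices, and the demo uses a UNIX domain socket pair so that it runs in one process.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Waits up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set rset;
    struct timeval tv;

    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    /* First argument: highest descriptor in any set, plus one. */
    return select(fd + 1, &rset, NULL, NULL, &tv);
}

/* Returns 0 if select reports "not ready" before data is written
 * and "ready" afterwards, -1 otherwise. */
int select_demo(void)
{
    int sv[2], before, after = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    before = wait_readable(sv[0], 50);    /* nothing to read yet */
    if (write(sv[1], "x", 1) == 1)
        after = wait_readable(sv[0], 50); /* one byte is waiting */
    close(sv[0]);
    close(sv[1]);
    return (before == 0 && after == 1) ? 0 : -1;
}
```

The benefit of select shows when several descriptors are placed in the set, since one call then waits for any of them.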

Socket options

There are several ways to get and set the options that affect a socket:

    • getsockopt and setsockopt functions
    • fcntl function
    • ioctl function

Getsockopt and setsockopt functions

getsockopt and setsockopt apply only to sockets.

The commonly used socket options are described below.

SO_KEEPALIVE

When the keep-alive option is set for a TCP socket and no data has been exchanged in either direction on the socket for 2 hours, TCP automatically sends a keep-alive probe to the peer.

SO_LINGER

This option specifies how the close function operates for a connection-oriented protocol. By default, close returns immediately, but if any data remains in the socket send buffer, the system tries to deliver it to the peer. The option passes a linger structure, defined in the header <sys/socket.h>, between the user process and the kernel. Its members are l_onoff and l_linger.

The option behaves in one of three ways:

    1. If l_onoff is 0, the option is turned off and the TCP default applies, that is, close returns immediately.
    2. If l_onoff is nonzero and l_linger is 0, TCP aborts the connection when it is closed. That is, TCP discards any data remaining in the socket send buffer and sends an RST to the peer instead of the normal four-packet termination sequence, which avoids TCP's TIME_WAIT state.
    3. If l_onoff is nonzero and l_linger is nonzero, the kernel lingers when the socket is closed. That is, if any data remains in the socket send buffer, the process is put to sleep until either all the data has been sent and acknowledged by the peer or the linger time expires.
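
Setting the option might look like the sketch below. The wrapper name set_linger is hypothetical; the structure members and the option name are the ones described above.

```c
#include <sys/socket.h>

/* Sets SO_LINGER on fd. onoff = 0 restores the default behavior,
 * onoff != 0 with seconds = 0 requests an abortive close (RST),
 * onoff != 0 with seconds > 0 makes close linger up to that time.
 * Returns 0 on success, -1 on error. */
int set_linger(int fd, int onoff, int seconds)
{
    struct linger lg;

    lg.l_onoff = onoff;
    lg.l_linger = seconds;
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

For example, set_linger(fd, 1, 0) selects the second case in the list, so a later close discards unsent data and resets the connection.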

SO_RCVBUF and SO_SNDBUF

For TCP, the space available in the socket receive buffer limits the window that TCP can advertise to its peer. For UDP, a datagram that does not fit in the socket receive buffer is discarded.

The TCP window scale option is exchanged with the peer in the SYN segments when the connection is established. For a client, the SO_RCVBUF option must therefore be set before calling connect. For a server, the option must be set on the listening socket before calling listen.

SO_RCVTIMEO and SO_SNDTIMEO

These two options let us place a timeout on socket receives and sends.
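
A receive timeout could be applied as in the sketch below. The function name recv_with_timeout and the return value of -2 for a timeout are conventions invented for this example. It assumes the common behavior that an expired timer makes recv fail with EAGAIN or EWOULDBLOCK.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Receives with a timeout in milliseconds. Returns the number of
 * bytes received, -2 if the timeout expired, or -1 on another error. */
long recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct timeval tv;
    ssize_t n;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        return -1;

    n = recv(fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;                        /* timer expired, no data */
    return (long)n;
}
```

The option stays in effect for later receives on the same socket, so a real program would normally set it once rather than on every call.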

SO_REUSEADDR

This option serves four different purposes:

    1. It allows a listening server to start and bind its port even if previously established connections that use the port as their local port still exist.
    2. It allows multiple instances of the same server to be started on the same port, as long as each instance binds a different local IP address.
    3. It allows a single process to bind the same port to multiple sockets, as long as each bind specifies a different local IP address.
    4. It allows completely duplicate bindings, in which an IP address and port that are already bound to one socket are bound to another. This feature is normally supported only for UDP sockets.
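
The first purpose is the reason most TCP servers set this option. A minimal sketch, with the hypothetical name reuseaddr_listener and an arbitrary backlog of 5, might look like this:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a TCP socket listening on the given port on all local
 * addresses, with SO_REUSEADDR set. Returns the descriptor or -1. */
int reuseaddr_listener(unsigned short port)
{
    struct sockaddr_in sa;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    /* The option must be set before bind to affect the bind. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        goto fail;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);
    sa.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(fd, 5) < 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

With the option set, a restarted server can bind its port again while connections from the previous run still exist.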

TCP_NODELAY

Setting this option disables TCP's Nagle algorithm, which is enabled by default. The purpose of the Nagle algorithm is to reduce the number of small packets on a WAN. The algorithm states that if a given connection has outstanding data (data that has been sent but not yet acknowledged), no small packets are sent on that connection in response to a user write until the outstanding data is acknowledged.
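
Disabling the Nagle algorithm is a single setsockopt call at the IPPROTO_TCP level. The wrapper name disable_nagle below is hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Turns on TCP_NODELAY for a TCP socket, which disables the Nagle
 * algorithm. Returns 0 on success, -1 on error. */
int disable_nagle(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}
```

This is mostly of interest to applications that send many small writes and cannot tolerate the added delay, at the cost of more small packets on the network.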

Fcntl function

The fcntl function performs various descriptor control operations.

For network programming, fcntl is mainly used to enable non-blocking I/O: a socket is made non-blocking by setting the O_NONBLOCK file status flag with the F_SETFL command.
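
The original listing is not part of these notes, so the following is a reconstruction of the usual idiom rather than the author's code. The names set_nonblocking and empty_read_errno are hypothetical. The important detail is to fetch the current flags with F_GETFL first, so that setting O_NONBLOCK does not clear the other file status flags.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Adds O_NONBLOCK to the file status flags of fd.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Reads from an empty non-blocking socket and returns the errno
 * value of the failed read (0 if nothing failed). */
int empty_read_errno(void)
{
    int sv[2], err = 0;
    char c;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return errno;
    if (set_nonblocking(sv[0]) < 0 || read(sv[0], &c, 1) < 0)
        err = errno;                      /* read would have blocked */
    close(sv[0]);
    close(sv[1]);
    return err;
}
```

On a non-blocking socket, a read that would otherwise sleep fails with EAGAIN or EWOULDBLOCK, which is the error described in the I/O model section.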

UDP socket Programming

A UDP client/server program uses the socket functions described below.

Recvfrom and sendto functions

The flags argument is always set to 0 here.

Writing a datagram of length 0 is acceptable. It results in an IP datagram that contains an IP header and a UDP header but no data. A return value of 0 from recvfrom is likewise acceptable: unlike a return value of 0 from read on a TCP socket, it does not mean that the peer has closed the connection.
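
The zero-length case can be tried over the loopback interface. The sketch below, with the hypothetical name zero_length_datagram, sends an empty datagram from one UDP socket to another in the same process and returns what recvfrom reports.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends a zero-length UDP datagram over loopback. Returns the value
 * returned by recvfrom, or -1 if any step failed. */
int zero_length_datagram(void)
{
    struct sockaddr_in sa, from;
    socklen_t len = sizeof(sa), fromlen = sizeof(from);
    char buf[16];
    int rx, tx;
    ssize_t n = -1;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = htons(0);               /* ephemeral port */

    rx = socket(AF_INET, SOCK_DGRAM, 0);
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0 ||
        bind(rx, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        getsockname(rx, (struct sockaddr *)&sa, &len) < 0)
        goto done;

    /* Length 0 is valid: only the IP and UDP headers are sent. */
    if (sendto(tx, "", 0, 0, (struct sockaddr *)&sa, sizeof(sa)) != 0)
        goto done;

    /* The empty datagram is delivered like any other one. */
    n = recvfrom(rx, buf, sizeof(buf), 0,
                 (struct sockaddr *)&from, &fromlen);

done:
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return (int)n;
}
```

After the call, the from structure holds the sender's address, including the ephemeral port that the kernel assigned to the sending socket on its first sendto.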

If the from argument to recvfrom is a null pointer, the corresponding addrlen argument must also be a null pointer. This indicates that we do not care about the protocol address of the sender.

The client's ephemeral port is chosen on the first call to sendto and cannot change afterwards, but the client's IP address can change with each UDP datagram that the client sends.

The server process is not running

If the server process is not running when the client sends a datagram, the server host responds with a "port unreachable" ICMP message, but this ICMP error is not returned to the client process. We call this ICMP error an asynchronous error: it is caused by sendto, yet sendto itself returns successfully. The basic rule is that an asynchronous error is not returned for a UDP socket unless the socket has been connected.

The connect function for UDP

We can call connect on a UDP socket. The kernel just checks for any immediate errors (such as an obviously unreachable destination), records the IP address and port number of the peer, and returns immediately to the calling process.

A UDP client or server process should call connect only if it uses its UDP socket to communicate with exactly one peer.

A connected UDP socket differs from the default unconnected UDP socket in the following ways:

    1. We can no longer specify the destination IP address and port number for an output operation. That is, we use write or send instead of sendto.
    2. We do not need recvfrom to learn the sender of a datagram, and use read, recv, or recvmsg instead. The only datagrams that the kernel returns for an input operation on a connected UDP socket are those that arrive from the protocol address specified in connect.
    3. Asynchronous errors are returned to the process for a connected UDP socket, whereas an unconnected UDP socket does not receive any asynchronous errors.
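
The third difference can be observed with the sketch below, which connects a UDP socket to a loopback port that nothing is bound to. The function name connected_udp_error and the one-second receive timeout are choices made for this example. The timeout is there because delivery of the ICMP error depends on the system, so the sketch does not assume that the error always arrives. On a typical Linux host the value returned is ECONNREFUSED.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Sends one byte on a connected UDP socket to a port with no
 * receiver and returns the errno value of the following recv. */
int connected_udp_error(void)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);
    struct timeval tv = { 1, 0 };         /* give up after 1 second */
    char c;
    int probe, fd, err = 0;

    /* Find a currently unused UDP port: bind, note it, close. */
    probe = socket(AF_INET, SOCK_DGRAM, 0);
    if (probe < 0)
        return errno;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = htons(0);
    if (bind(probe, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        getsockname(probe, (struct sockaddr *)&sa, &len) < 0) {
        err = errno;
        close(probe);
        return err;
    }
    close(probe);

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return errno;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
        connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        send(fd, "x", 1, 0) < 0 ||        /* normally succeeds */
        recv(fd, &c, 1, 0) < 0)           /* the error shows up here */
        err = errno;
    close(fd);
    return err;
}
```

With an unconnected socket and sendto, the same datagram would be sent successfully and a later recvfrom would simply wait, because the ICMP error is not passed to the process.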

For a TCP socket, connect can be called only once. For a connected UDP socket, connect can be called again for one of two purposes: to specify a new IP address and port number, or to disconnect the socket.

UNIX Domain protocol

The UNIX domain protocols are not an actual protocol family, but rather a way of performing client/server communication on a single host. The UNIX domain provides two types of sockets: stream sockets (similar to TCP) and datagram sockets (similar to UDP). UNIX domain sockets are used for the following reasons:

    1. On some systems, UNIX domain sockets are faster than TCP sockets when both peers are on the same host.
    2. UNIX domain sockets can be used to pass descriptors between different processes on the same host.
    3. UNIX domain sockets can provide the client's credentials (user ID and group ID) to the server, which allows additional security checks.

Address structure

The UNIX domain socket address structure, sockaddr_un, is defined in the header <sys/un.h>.

Socketpair function

The socketpair function creates two sockets that are connected to each other.

The family parameter must be AF_LOCAL and the protocol parameter must be 0. The type parameter can be either SOCK_STREAM or SOCK_DGRAM. The two newly created socket descriptors are returned as sockfd[0] and sockfd[1].

Socket functions

The pathname specified in a call to connect must be a pathname that is currently bound to an open UNIX domain socket of the same type.

UNIX domain stream sockets are similar to TCP sockets: both provide a byte-stream interface to the process with no record boundaries. UNIX domain datagram sockets are similar to UDP sockets: both provide an unreliable datagram service that preserves record boundaries.
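
The difference in record boundaries can be seen with socketpair. The sketch below writes two 2-byte records and reports how many bytes the first read returns. The function name first_read_size is hypothetical, and AF_UNIX is used as the POSIX spelling of AF_LOCAL.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes "ab" and then "cd" on one end of a socket pair of the given
 * type (SOCK_STREAM or SOCK_DGRAM) and returns the number of bytes
 * that the first read on the other end delivers, or -1 on error. */
int first_read_size(int type)
{
    int sv[2];
    char buf[16];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, type, 0, sv) < 0)
        return -1;
    if (write(sv[0], "ab", 2) == 2 && write(sv[0], "cd", 2) == 2)
        n = read(sv[1], buf, sizeof(buf));
    close(sv[0]);
    close(sv[1]);
    return (int)n;
}
```

With SOCK_DGRAM the first read returns exactly one 2-byte record. With SOCK_STREAM the two writes merge into one byte stream, so on Linux the first read returns all 4 bytes, although a stream socket is in general free to return fewer.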

Sending a datagram on an unbound UNIX domain socket does not automatically bind a pathname to the socket, which differs from UDP sockets.