The TCP/IP programming interface provides a variety of system calls to help you use the protocol effectively. Tracing the complete call sequence through the TCP stack code, deep into the kernel, can help you understand how the TCP stack works. In this article, you will review the TCP call sequence, with references to FreeBSD, and the important function calls that occur in the TCP stack after a system call is made at the user level.
Introduction
A typical TCP client and server application performs its work by issuing a sequence of TCP system calls, as shown in Figure 1. These system calls include socket(), bind(), listen(), accept(), send(), and receive(). This article describes what happens at a lower level when an application issues a TCP system call.
Figure 1. Common call sequences for TCP applications
Figure 2 shows the layers that a TCP system call passes through before data is sent on the physical link.
Figure 2. Each layer of a TCP system call
The socket layer receives every TCP system call that is made. It validates the parameters passed by the TCP application. This layer is protocol-independent, because the protocol has not yet been tied to the call.
Below the socket layer is the protocol layer, which contains the actual implementation of the protocol (TCP in this case). When the socket layer calls into the protocol layer, it ensures exclusive access to the data structures that are shared between the two layers. This is done to avoid any data structure corruption.
Various network device drivers run at the interface layer, which receives data from the physical link and transmits data to the physical link.
Each socket has a socket queue, and each interface has an interface queue for data communication. However, the entire protocol layer has only one protocol queue, called the IP input queue. The interface layer passes incoming data to the protocol layer through this IP input queue. The protocol layer outputs data to an interface using the appropriate interface queue.
In this article, you will learn about the following system calls: socket, bind, listen, accept, connect, shutdown, close, send, and receive.
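The queues described above can be pictured as bounded FIFO lists. The following is a minimal user-space sketch, not kernel code: the `toy_pkt` and `toy_queue` types and the `toy_*` functions are hypothetical names invented for illustration, and the drop counter is an assumption about how a full queue might be handled.

```c
#include <stddef.h>

/* Toy packet and bounded FIFO queue. All names are hypothetical. */
struct toy_pkt {
    struct toy_pkt *next;
    int id;
};

struct toy_queue {
    struct toy_pkt *head;
    struct toy_pkt *tail;
    int len;     /* packets currently queued */
    int maxlen;  /* queue limit */
    int drops;   /* packets rejected because the queue was full */
};

void toy_queue_init(struct toy_queue *q, int maxlen)
{
    q->head = q->tail = NULL;
    q->len = 0;
    q->maxlen = maxlen;
    q->drops = 0;
}

/* Append a packet; returns 0 on success, -1 if the queue is full. */
int toy_enqueue(struct toy_queue *q, struct toy_pkt *p)
{
    if (q->len >= q->maxlen) {
        q->drops++;
        return -1;
    }
    p->next = NULL;
    if (q->tail != NULL)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    q->len++;
    return 0;
}

/* Remove and return the oldest packet, or NULL if the queue is empty. */
struct toy_pkt *toy_dequeue(struct toy_queue *q)
{
    struct toy_pkt *p = q->head;

    if (p == NULL)
        return NULL;
    q->head = p->next;
    if (q->head == NULL)
        q->tail = NULL;
    p->next = NULL;
    q->len--;
    return p;
}
```

In this picture, an interface driver would enqueue onto the single IP input queue and the protocol layer would dequeue from it; a real kernel also has to protect the queue against concurrent access, which the sketch leaves out.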
Socket
socket(struct proc *p, struct socket_args *uap, int *retval)
struct socket_args
{
    int domain;
    int type;
    int protocol;
};
In the socket system call: p is a pointer to the proc structure of the process that makes the socket call. uap is a pointer to a socket_args structure that contains the parameters passed by the process in the socket system call. retval is the return value of the system call.
The socket system call creates a new socket by allocating a new descriptor, and returns that descriptor to the calling process. Any subsequent system calls use the descriptor to identify the socket. The socket system call also assigns a protocol to the created socket.
The domain, type, and protocol parameter values specify the family, type, and protocol to assign to the created socket. Figure 3 shows the call sequence.
Figure 3. Call sequence for the socket system call
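From user space, the three parameters appear as the arguments to socket(). The sketch below is only an illustration of the user-level side, assuming a POSIX system; `make_tcp_socket` and `socket_type_of` are hypothetical helper names.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Create a TCP/IPv4 socket; returns the new descriptor, or -1 on error. */
int make_tcp_socket(void)
{
    /* domain = AF_INET, type = SOCK_STREAM, protocol = IPPROTO_TCP */
    return socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
}

/* Read the socket type back through SO_TYPE; returns -1 on error. */
int socket_type_of(int fd)
{
    int type = 0;
    socklen_t len = sizeof(type);

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;
    return type;
}
```

The descriptor returned by `make_tcp_socket` is what every later call (bind, listen, connect, and so on) takes as its first argument.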
After retrieving the parameters from the process, the socket function calls the socreate function. The socreate function finds the pointer to the protocol switch structure (protosw) based on the parameters specified by the process, and then allocates a new socket structure. It then makes the protocol-specific call pr_usrreq, which switches to the handler for the request associated with the socket. The prototype of the pr_usrreq function is:
int pr_usrreq(struct socket *so, int req, struct mbuf *m0, struct mbuf *m1, struct mbuf *m2);
In the pr_usrreq function: so is a pointer to the socket structure. req identifies the request, which in this example is PRU_ATTACH. m0, m1, and m2 are pointers to mbuf structures; their values vary depending on the request.
The pr_usrreq function provides services for approximately 16 requests.
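The dispatch described above, where one protocol entry point is reached through the protocol switch and selects its action from a request code, can be sketched in user space as follows. This is a toy model under stated assumptions: the `toy_*` names and `TOY_*` request codes are hypothetical, and only three requests are modeled rather than the full set.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical request codes; the kernel defines its own PRU_* values. */
enum toy_req { TOY_ATTACH, TOY_BIND, TOY_LISTEN };

struct toy_socket {
    int attached;
    int bound;
    int listening;
};

/* Protocol switch entry holding the per-protocol request handler. */
struct toy_protosw {
    int (*pr_usrreq)(struct toy_socket *so, int req);
};

/* Single entry point: the request code selects the action. */
int toy_tcp_usrreq(struct toy_socket *so, int req)
{
    if (so == NULL)
        return -1;
    switch (req) {
    case TOY_ATTACH:
        so->attached = 1;
        return 0;
    case TOY_BIND:
        if (!so->attached)
            return -1;
        so->bound = 1;
        return 0;
    case TOY_LISTEN:
        if (!so->attached)
            return -1;
        so->bound = 1;      /* stands in for an implicit bind */
        so->listening = 1;
        return 0;
    default:
        return -1;          /* request not modeled in this sketch */
    }
}

/* Initialize a socket and issue the attach request through the switch. */
int toy_socreate(const struct toy_protosw *pr, struct toy_socket *so)
{
    memset(so, 0, sizeof(*so));
    return pr->pr_usrreq(so, TOY_ATTACH);
}
```

Calling through the function pointer is what lets the socket layer stay protocol-independent: it only needs the switch entry, not the name of the TCP handler.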
The tcp_usrreq() function calls tcp_attach() to handle the PRU_ATTACH request. To allocate an Internet protocol control block, tcp_attach() calls in_pcballoc(). Inside in_pcballoc(), the kernel's memory allocator is called to allocate memory for the Internet control block. After all the necessary pointers in the Internet control block are initialized, control returns to tcp_attach().
A new TCP control block is then allocated and initialized in tcp_newtcpcb(), which also initializes all the TCP timer variables before control returns to tcp_attach(). The socket state is now initialized to CLOSED. On return to the tcp_usrreq function, the socket descriptor points to the TCP control block of the socket.
The Internet control blocks are kept on a doubly linked circular list. Each Internet control block has a pointer to its socket structure, while the so_pcb member of the socket structure points back to the Internet control block. The Internet control block also has a pointer to the TCP control block. For more detailed information about Internet control blocks and the structure of the TCP control block, see the Resources section.
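The pointer relationships among the socket, the Internet control block, and the TCP control block can be sketched with simplified structures. This is an illustrative user-space model, not the kernel's definitions: the `demo_*` types are hypothetical, most real fields are omitted, and the value 0 stands in for the CLOSED state.

```c
#include <stdlib.h>

struct demo_inpcb;

struct demo_socket {
    struct demo_inpcb *so_pcb;       /* socket -> Internet control block */
};

struct demo_tcpcb {
    struct demo_inpcb *t_inpcb;      /* TCP control block -> Internet control block */
    int t_state;                     /* 0 stands for CLOSED in this sketch */
};

struct demo_inpcb {
    struct demo_inpcb *inp_next;     /* doubly linked circular list */
    struct demo_inpcb *inp_prev;
    struct demo_socket *inp_socket;  /* back pointer to the socket */
    struct demo_tcpcb *inp_ppcb;     /* pointer to the TCP control block */
};

/* An empty circular list is a head that points to itself. */
void demo_pcb_list_init(struct demo_inpcb *head)
{
    head->inp_next = head->inp_prev = head;
    head->inp_socket = NULL;
    head->inp_ppcb = NULL;
}

/* Allocate both control blocks and wire up all the pointers. */
int demo_attach(struct demo_inpcb *head, struct demo_socket *so)
{
    struct demo_inpcb *inp = calloc(1, sizeof(*inp));
    struct demo_tcpcb *tp = calloc(1, sizeof(*tp));

    if (inp == NULL || tp == NULL) {
        free(inp);
        free(tp);
        return -1;
    }
    /* Insert the new block right after the list head. */
    inp->inp_next = head->inp_next;
    inp->inp_prev = head;
    head->inp_next->inp_prev = inp;
    head->inp_next = inp;

    inp->inp_socket = so;
    so->so_pcb = inp;
    inp->inp_ppcb = tp;
    tp->t_inpcb = inp;
    tp->t_state = 0;
    return 0;
}

/* Unlink the control blocks from the list and free them. */
void demo_detach(struct demo_socket *so)
{
    struct demo_inpcb *inp = so->so_pcb;

    if (inp == NULL)
        return;
    inp->inp_prev->inp_next = inp->inp_next;
    inp->inp_next->inp_prev = inp->inp_prev;
    free(inp->inp_ppcb);
    free(inp);
    so->so_pcb = NULL;
}
```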
Bind
bind(struct proc *p, struct bind_args *uap, int *retval)
struct bind_args
{
    int s;
    caddr_t name;
    int namelen;
};
In the bind system call: s is the socket descriptor. name is a pointer to a buffer that contains the network transport address. namelen is the size of the buffer.
The bind system call associates a local network transport address with a socket. A client process is not required to issue a bind call. When the client process issues the connect system call, the kernel takes care of the implicit binding. A server process usually needs to issue an explicit bind request before it accepts connections or starts communicating with clients.
The bind call copies the local address specified by the process into an mbuf and invokes sobind, which in turn invokes tcp_usrreq() with PRU_BIND as the request. The switch case in tcp_usrreq() calls in_pcbbind(), which binds the local address and port number to the socket. The in_pcbbind function first performs some sanity checks to ensure that the socket is not bound twice and that at least one interface has been assigned an IP address. in_pcbbind handles both implicit and explicit binding.
If the second argument in the call to in_pcbbind() (which points to a sockaddr_in structure) is non-null, an explicit binding takes place. Otherwise, an implicit binding takes place. For explicit binding, checks are performed on the IP address being bound, and the socket options are set accordingly.
Figure 4. Call sequence for the BIND system call
If the specified local port is nonzero, superuser privilege is checked when the binding is to a reserved port (for example, port number < 1024 according to the Berkeley convention). in_pcblookup() is then called to search for a control block with the given local IP address and local port number, which verifies that the local address and port pair is not already in use. If the second argument to in_pcbbind() is NULL, or if the local port is zero, control falls through to the selection of an ephemeral port (for example, 1024 < port number < 5000 according to the Berkeley convention). in_pcblookup() is then called to verify that the chosen port is not in use.
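The effect of a zero local port can be observed from user space. The sketch below, assuming a POSIX system with a loopback interface, binds to 127.0.0.1 and reads the result back with getsockname(); `bind_loopback` is a hypothetical helper name, and the exact ephemeral range chosen depends on the operating system.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Bind fd to 127.0.0.1 on the given port (host byte order).
 * A port of 0 asks the kernel to choose an ephemeral port.
 * Returns the port actually bound (host byte order), or -1 on error.
 */
int bind_loopback(int fd, unsigned short port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;
    /* Ask the kernel which local address and port it assigned. */
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}
```

Calling `bind_loopback` a second time on the same descriptor is expected to fail, which corresponds to the check that a socket is not bound twice.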
Listen
listen(struct proc *p, struct listen_args *uap, int *retval)
struct listen_args
{
    int s;
    int backlog;
};
In the listen system call: s is the socket descriptor. backlog is the limit on the number of connections queued on the socket.
The listen call tells the protocol that the server process is prepared to accept new incoming connections on the socket. There is a limit to the number of connections that can be queued; once that limit is reached, any further connection requests are ignored.
The listen system call calls solisten with the socket descriptor and the backlog value specified in the listen call. solisten simply calls the tcp_usrreq function with PRU_LISTEN as the request. In the switch statement of the tcp_usrreq() function, the PRU_LISTEN case checks whether the socket is bound to a port. If the port is zero, in_pcbbind() is called to bind the socket to a port implicitly (as described in the Bind section).
If the listening socket already has a port, the state of the socket is changed to LISTEN. Typically, server processes listen on well-known port numbers, so in_pcbbind is rarely invoked to perform an implicit binding for a server process. Figure 5 shows the call sequence for listen.
Figure 5. Call sequence for listen system calls
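The implicit binding performed by listen can be checked from user space by listening on a socket that was never bound and then asking for its local port. This is a small sketch that assumes the operating system behaves as described above; `listen_without_bind` is a hypothetical helper name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Call listen() on a fresh, unbound TCP socket and report the local
 * port chosen by the kernel (host byte order), or -1 on error.
 */
int listen_without_bind(int backlog)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int port = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    if (listen(fd, backlog) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
        port = ntohs(addr.sin_port);
    close(fd);
    return port;
}
```

A server would rarely rely on this, since clients need to know the port in advance; the sketch only makes the implicit path visible.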
Accept
accept(struct proc *p, struct accept_args *uap, int *retval);
struct accept_args
{
    int s;
    caddr_t name;
    int *anamelen;
};
In the accept system call: s is the socket descriptor. name is a buffer (an out parameter) that receives the network transport address of the foreign host. anamelen points to the size of the name buffer.
The accept system call is a blocking call that waits for incoming connections. When a connection request has been processed, accept returns a new socket descriptor. This new socket is connected to the client, while the original socket s remains in the LISTEN state to accept further connections.
Figure 6. Call sequence for accept system calls
The accept call first validates the parameters and then waits for a connection request to arrive. Until then, the function blocks in a while loop. The protocol layer wakes up the server process when a new connection arrives. accept then checks for any socket error that occurred while the function was blocked. If there is a socket error, the function returns; otherwise it goes on to pick up the new connection from the queue and invokes soaccept. soaccept() calls the tcp_usrreq() function with PRU_ACCEPT as the request. The switch in the tcp_usrreq function calls in_setpeeraddr(), which copies the foreign IP address and foreign port number from the protocol control block and returns them to the server process.
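The address that accept hands back can be compared with the client's own view of the connection. The following user-space sketch, assuming a POSIX system with a loopback interface, runs a listener and a client in one process; `accept_peer_demo` is a hypothetical name, and the blocking connect is expected to complete because the kernel finishes the handshake for a listening socket before accept is called.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 1 if the foreign address reported by accept() matches the
 * client's local address, 0 if it does not, and -1 on a socket error.
 */
int accept_peer_demo(void)
{
    struct sockaddr_in srv, peer, local;
    socklen_t len;
    int lfd = -1, cfd = -1, afd = -1, result = -1;

    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    srv.sin_port = 0;                       /* let the kernel pick a port */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        goto out;
    if (bind(lfd, (struct sockaddr *)&srv, sizeof(srv)) < 0)
        goto out;
    if (listen(lfd, 1) < 0)
        goto out;
    len = sizeof(srv);
    if (getsockname(lfd, (struct sockaddr *)&srv, &len) < 0)
        goto out;                           /* srv now holds the real port */

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0)
        goto out;
    if (connect(cfd, (struct sockaddr *)&srv, sizeof(srv)) < 0)
        goto out;

    /* accept() returns a new descriptor and fills in the foreign address. */
    memset(&peer, 0, sizeof(peer));
    len = sizeof(peer);
    afd = accept(lfd, (struct sockaddr *)&peer, &len);
    if (afd < 0)
        goto out;

    memset(&local, 0, sizeof(local));
    len = sizeof(local);
    if (getsockname(cfd, (struct sockaddr *)&local, &len) < 0)
        goto out;

    result = (peer.sin_port == local.sin_port &&
              peer.sin_addr.s_addr == local.sin_addr.s_addr);
out:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return result;
}
```

Note that the listening descriptor stays open and usable after accept returns; only the new descriptor is tied to this particular client.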
Connect
connect(struct proc *p, struct connect_args *uap, int *retval);
struct connect_args
{
    int s;
    caddr_t name;
    int namelen;
};
In the connect system call: s is the socket descriptor. name is a pointer to a buffer that holds the foreign IP address and port pair. namelen is the length of the buffer.
A client process typically invokes the connect system call to connect to a server process. If the client process does not explicitly issue the bind system call before initiating the connection, the stack takes care of the implicit binding on the local socket.
The connect system call copies the foreign address (the address to which the connection request must be sent) from the process to the kernel and invokes soconnect(). On return from soconnect(), the connect() function goes to sleep until the protocol layer wakes it up to indicate that the connection is established or that there is an error on the socket. The soconnect() function checks that the socket is in a valid state and invokes pr_usrreq() with PRU_CONNECT as the request.
The switch case in the tcp_usrreq() function checks whether the socket is bound to a local port. If the socket is not bound, in_pcbbind() is invoked to perform the implicit binding. in_pcbconnect() is then called to obtain the route to the destination, determine the interface on which the socket must send its output, and verify that the foreign socket pair (IP address and port number) specified by connect() is unique. It then updates the Internet control block with the foreign IP address and port number, and returns to the PRU_CONNECT case statement.
tcp_usrreq() now calls soisconnecting(), which sets the state of the socket on the client host to SYN_SENT. It then calls tcp_output() to send the SYN packet out to the network. Control now returns to the connect() function, which sleeps until the protocol layer wakes it up to indicate that the connection is now established or that there is an error on the socket.
Figure 7. Call sequence for connect system calls
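The implicit binding performed on behalf of a client can be observed by reading the local port before and after connect. This is a user-space sketch that assumes a POSIX system with a loopback interface; `local_port` and `connect_implicit_bind` are hypothetical names, and the listener exists only to give the client something to connect to.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Local port of fd in host byte order, or -1 on error. */
int local_port(int fd)
{
    struct sockaddr_in a;
    socklen_t len = sizeof(a);

    memset(&a, 0, sizeof(a));
    if (getsockname(fd, (struct sockaddr *)&a, &len) < 0)
        return -1;
    return ntohs(a.sin_port);
}

/*
 * Connect an unbound client socket to a loopback listener and record
 * the client's local port before and after connect().
 * Returns 0 on success, -1 on a socket error.
 */
int connect_implicit_bind(int *port_before, int *port_after)
{
    struct sockaddr_in srv;
    socklen_t len = sizeof(srv);
    int lfd, cfd = -1, rc = -1;

    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    srv.sin_port = 0;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    if (bind(lfd, (struct sockaddr *)&srv, sizeof(srv)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&srv, &len) < 0)
        goto out;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0)
        goto out;
    *port_before = local_port(cfd);   /* no bind() has been issued yet */
    if (connect(cfd, (struct sockaddr *)&srv, sizeof(srv)) < 0)
        goto out;
    *port_after = local_port(cfd);    /* assigned implicitly by the stack */
    rc = 0;
out:
    if (cfd >= 0)
        close(cfd);
    close(lfd);
    return rc;
}
```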
TCP three-way handshake
Figure 8, Figure 9, and Figure 10 show the call sequences when the client issues connect and the server issues accept to initiate and establish a TCP connection.
Figure 8. Flow sequence for SYN packets
When the client issues connect, the tcp_output() function is invoked at the protocol layer to output the SYN packet to the interface. As shown in Figure 9, soconnect now returns to the connect() function, which goes to sleep. The socket state on the client is now SYN_SENT. The interface layer calls if_output() (which is actually an interface-specific output function) and sends the packet to the network.
The interface on the destination (server) receives the incoming SYN packet, puts it in the ipintrq queue, and raises a software interrupt. The packet is then picked up by ipintr(), which invokes the tcp_input routine. tcp_input() runs at software interrupt level, takes the SYN packet from ipintrq, processes it, and places the partially completed socket connection into the incomplete socket queue. The server-side socket state is now SYN_RCVD. After processing each packet, the tcp_input() routine calls tcp_output() in case a response packet needs to be sent to the other end.
Figure 9. Flow sequence for SYN ACK packets
After the SYN is processed, the server sends a SYN ACK packet using the tcp_output(), ip_output(), and if_output() sequence. The network interface on the client receives the packet, places it in ipintrq, and raises a software interrupt. As before, ipintr() takes the packet from ipintrq and passes it to the tcp_input() routine of the client TCP stack. The packet is processed, and soisconnected() is called, which wakes up the connect call. The socket state on the client is now ESTABLISHED.
Figure 10. Flow sequence for ACK packets
The tcp_input() routine on the client processes the SYN ACK packet and calls tcp_output() to send the ACK packet back to the server. tcp_input() on the server side handles this ACK packet and calls soisconnected(). This function removes the socket from the incomplete socket queue and puts it into the completed socket queue, and then calls wakeup() to wake up the accept call. The server-side socket state is now ESTABLISHED.
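The state changes during the handshake can be summarized as a small transition function. The sketch below is a toy model, not the kernel's tcp_input() logic: the `hs_*` and `EV_*` names are hypothetical, and it tracks only the state of one connection endpoint (in a real stack the listening socket itself stays in LISTEN while a new socket moves through SYN_RCVD).

```c
/* Connection states that appear during the three-way handshake. */
enum hs_state { HS_CLOSED, HS_LISTEN, HS_SYN_SENT, HS_SYN_RCVD, HS_ESTABLISHED };

/* Events: two user requests and three kinds of received packet. */
enum hs_event {
    EV_ACTIVE_OPEN,   /* client issues connect; a SYN is sent */
    EV_PASSIVE_OPEN,  /* server issues listen */
    EV_RECV_SYN,      /* SYN arrives; a SYN ACK is sent in reply */
    EV_RECV_SYN_ACK,  /* SYN ACK arrives; an ACK is sent in reply */
    EV_RECV_ACK       /* final ACK arrives */
};

/* Return the next state; events that do not apply leave the state unchanged. */
enum hs_state hs_next(enum hs_state s, enum hs_event e)
{
    switch (s) {
    case HS_CLOSED:
        if (e == EV_ACTIVE_OPEN)
            return HS_SYN_SENT;
        if (e == EV_PASSIVE_OPEN)
            return HS_LISTEN;
        break;
    case HS_LISTEN:
        if (e == EV_RECV_SYN)
            return HS_SYN_RCVD;
        break;
    case HS_SYN_SENT:
        if (e == EV_RECV_SYN_ACK)
            return HS_ESTABLISHED;
        break;
    case HS_SYN_RCVD:
        if (e == EV_RECV_ACK)
            return HS_ESTABLISHED;
        break;
    case HS_ESTABLISHED:
        break;
    }
    return s;
}
```

Feeding the client the events active open and SYN ACK, and the server the events passive open, SYN, and ACK, walks both ends to the established state in the order shown in Figures 8 through 10.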
Shutdown
shutdown(struct proc *p, struct shutdown_args *uap, int *retval);
struct shutdown_args
{
    int s;
    int how;
};
In the shutdown system call: s is the socket descriptor. how specifies which part of the connection is to be closed. The values 0, 1, and 2 for how close the read half of the connection, the write half, and both halves at the same time, respectively.
The shutdown system call closes either one or both halves of the connection. When the read half is shut down, any data in the receive buffer is discarded and that half of the connection is closed. For the write half, TCP sends any remaining data and then terminates the write side of the connection.
Figure 11. Call sequence for shutdown system calls
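The behavior of shutting down the write half, where data already sent is still delivered and the peer then sees end-of-file, can be demonstrated in user space. As a simplifying assumption, this sketch uses a connected AF_UNIX stream socket pair as a stand-in for a TCP connection, so no network setup is needed; `shutdown_write_demo` is a hypothetical name.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send data, shut down the write half, and check what the peer sees.
 * Returns 1 if the peer reads the queued data followed by end-of-file,
 * 0 if not, and -1 if the socket pair cannot be created.
 */
int shutdown_write_demo(void)
{
    int sv[2];
    char buf[16];
    ssize_t n;
    int ok = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (write(sv[0], "bye", 3) == 3 && shutdown(sv[0], SHUT_WR) == 0) {
        n = read(sv[1], buf, sizeof(buf));       /* queued data still arrives */
        if (n == 3) {
            n = read(sv[1], buf, sizeof(buf));   /* then end-of-file */
            ok = (n == 0);
        }
    }
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

SHUT_RD, SHUT_WR, and SHUT_RDWR are the symbolic names for the how values 0, 1, and 2.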