Five pitfalls of Linux sockets programming
Developing reliable network applications in heterogeneous environments
M. Tim Jones, Senior software engineer, Emulex
Introduction: The Sockets API is the de facto standard API for developing networked applications. Although the API is simple, novice developers tend to run into a few common problems. This article identifies the most common pitfalls and shows you how to avoid them.
Release date: October 08, 2005
Level: Intermediate
First introduced in the 4.2 BSD UNIX® operating system, the Sockets API is now a standard feature of virtually every operating system. In fact, it is hard to find a modern language that does not support the Sockets API. The API is fairly straightforward, but new developers still run into some common pitfalls.
This article identifies those pitfalls and shows you how to avoid them.
Pitfall 1. Ignoring return status
The first pitfall is obvious, but it is the easiest mistake for a novice developer to make. If you ignore the return status of functions, you may not notice when they fail or only partially succeed. This, in turn, can propagate the error and make it difficult to locate the source of the problem.
Capture and check every return status rather than ignoring it. Consider the example shown in Listing 1, a socket send function.
Listing 1. Ignoring API function return status
    status = send(sock, buffer, buflen, MSG_DONTWAIT);

    if (status == -1) {
        /* send failed */
        printf("send failed: %s\n", strerror(errno));
    } else {
        /* send succeeded -- or did it? */
    }
Listing 1 shows a code fragment that performs a socket send operation (sending data through a socket). The error status of the function is captured and tested, but the example ignores a feature of send in nonblocking mode (enabled by the MSG_DONTWAIT flag).
The send API function has three classes of possible return values:
- If all of the data was queued for transmission, it returns the number of bytes requested.
- If the call failed, it returns -1 (use the errno variable to find the cause of the failure). With MSG_DONTWAIT, errno is EAGAIN or EWOULDBLOCK when nothing could be queued without blocking.
- If only part of the data could be queued at the time of the call, it returns the number of bytes actually queued, which is smaller than the number requested.
Because of the nonblocking nature of send with the MSG_DONTWAIT flag, the call returns after all of the data, some of the data, or none of the data has been sent. Ignoring the return status here leads to incomplete sends and subsequent data loss.
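The partial-send case described above can be handled with a loop that advances past whatever send accepted and waits when nothing could be queued. The following is a minimal sketch, not a definitive implementation; the helper name send_all and the choice to wait with poll are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: keep calling send() until all len bytes are queued.
 * Returns 0 on success, -1 on error (errno is left as set by the failing call). */
int send_all(int sock, const void *buffer, size_t len)
{
    const char *p = buffer;

    assert(buffer != NULL || len == 0);

    while (len > 0) {
        ssize_t n = send(sock, p, len, MSG_DONTWAIT);

        if (n >= 0) {
            /* Complete or partial send: skip past what was queued. */
            p += n;
            len -= (size_t)n;
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Nothing could be queued right now: wait until writable. */
            struct pollfd pfd = { .fd = sock, .events = POLLOUT };

            if (poll(&pfd, 1, -1) == -1 && errno != EINTR)
                return -1;
        } else if (errno != EINTR) {
            /* A real error: report it to the caller. */
            return -1;
        }
    }
    return 0;
}
```

A caller would replace the single send in Listing 1 with send_all(sock, buffer, buflen) and still check the result for -1.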
Pitfall 2. Peer socket closure
An interesting aspect of UNIX is that you can treat almost everything as a file. Files themselves, directories, pipes, devices, and sockets are all treated as files. This abstraction means that a single set of APIs can be used across a wide range of device types.
Consider the read API function, which reads some number of bytes from a file. The read function returns the number of bytes read (up to the maximum that you specify), -1 to indicate an error, or 0 if the end of the file has been reached.
If you perform a read on a socket and get a return value of 0, the peer at the remote end of the socket has called the close API function. The indication is the same as for a file read: no more data can be read through the descriptor (see Listing 2).
Listing 2. Properly handling the return value of the read API function
    status = read(sock, buffer, buflen);

    if (status > 0) {
        /* Data read from the socket */
    } else if (status == -1) {
        /* Error: check errno, take action ... */
    } else if (status == 0) {
        /* Peer closed the socket, finish the close */
        close(sock);
        /* Further processing ... */
    }
Similarly, you can use the write API function to detect the closure of a peer socket. In this case, the process receives the SIGPIPE signal or, if that signal is blocked, the write function returns -1 and errno is set to EPIPE.
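As a rough illustration of the write case, the sketch below ignores SIGPIPE so that a closed peer is reported through errno rather than by terminating the process. The helper names ignore_sigpipe and peer_closed_on_write are hypothetical, and ignoring the signal process-wide is only one of several possible choices.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: call once at startup so that writing to a closed
 * peer yields EPIPE instead of a fatal SIGPIPE (process-wide setting). */
void ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* Hypothetical helper: returns 1 if the write shows that the peer has closed,
 * 0 if the write was accepted (possibly partially), -1 on any other error. */
int peer_closed_on_write(int sock, const void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);

    n = write(sock, buf, len);
    if (n >= 0)
        return 0;
    return (errno == EPIPE) ? 1 : -1;
}
```

On a TCP connection, the first write after the peer closes may still succeed; EPIPE is typically reported by a later write, once the peer's reset has arrived.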
Pitfall 3. Address in use error (EADDRINUSE)
You can use the bind API function to bind an address (an interface and a port) to a socket endpoint. In a server, you can use this function to restrict the interfaces on which connections may arrive. In a client, you can use it to restrict the interface that outgoing connections should use. The most common use of bind is to associate a port number with a server and use the wildcard address (INADDR_ANY), which allows any interface to be used for incoming connections.
A common problem with bind is attempting to bind a port that is already in use. The trap is that no active socket may exist, yet binding to the port is still refused (bind returns EADDRINUSE). This is caused by the TCP socket state TIME_WAIT, which is retained for approximately 2 to 4 minutes after the socket is closed. Once the TIME_WAIT state has exited, the socket is deleted and the address can be rebound without a problem.
Waiting for TIME_WAIT to finish can be annoying, especially if you are developing a socket server and need to stop the server, make some changes, and then restart it. Fortunately, there is a way around the TIME_WAIT state. You can apply the SO_REUSEADDR socket option to the socket so that the port can be reused immediately.
Consider the example in Listing 3. Before binding the address, I call setsockopt with the SO_REUSEADDR option. To enable address reuse, I set the integer argument (on) to 1 (otherwise, I could set it to 0 to disable address reuse).
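Listing 3. Using the SO_REUSEADDR socket option to avoid address in use errors
    /* Create a new stream (TCP) socket */
    sock = socket(AF_INET, SOCK_STREAM, 0);

    /* Enable address reuse */
    on = 1;
    ret = setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    /* Accept connections on any available interface;
       port holds the server's port number */
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(port);

    /* Bind to the address (interface/port) */
    ret = bind(sock, (struct sockaddr *)&servaddr, sizeof(servaddr));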
After the SO_REUSEADDR option has been applied, the bind API function allows immediate reuse of the address.
Pitfall 4. Sending structured data
Sockets are a perfect vehicle for sending unstructured binary byte streams or ASCII traffic (such as Web pages over HTTP, or e-mail messages over SMTP). But if you try to send structured binary data over a socket, things get more complicated.
Take sending an integer, for example: can you be sure that the receiver will interpret the integer in the same way? Applications running on the same architecture can rely on their common platform to interpret that type of data identically. But what happens if a client running on a big-endian IBM PowerPC sends a 32-bit integer to a little-endian Intel x86? The difference in byte order causes the value to be misinterpreted.
Byte swap or not?
Endianness refers to the order in which bytes are arranged in memory. Big endian orders bytes with the most significant byte first, whereas little endian orders them with the least significant byte first.
Big-endian architectures (such as PowerPC®) have an advantage over little-endian architectures (such as the Intel® Pentium® family) because network byte order is big endian. This means that for big-endian machines, control data within TCP/IP is already in its natural order. Little-endian architectures require byte swapping, a slight performance penalty for network applications.
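To make the byte-swapping point concrete, here is a small sketch that writes a 32-bit integer into a buffer in network byte order and reads it back. The helper names put_u32 and get_u32 are made up for this example; htonl and ntohl are the standard conversion functions.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a host-order value in network byte order. */
void put_u32(unsigned char *out, uint32_t host_value)
{
    uint32_t wire = htonl(host_value);  /* a no-op on big-endian hosts */

    assert(out != NULL);
    memcpy(out, &wire, sizeof wire);
}

/* Hypothetical helper: load a network-order value back into host order. */
uint32_t get_u32(const unsigned char *in)
{
    uint32_t wire;

    assert(in != NULL);
    memcpy(&wire, in, sizeof wire);
    return ntohl(wire);
}
```

Whatever the host architecture, the buffer always holds the most significant byte first, so both peers agree on the value.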
What happens when you send a C structure through a socket? Here, too, you can run into trouble, because not all compilers lay out the elements of a structure in the same way. The structure may also be packed to minimize wasted space, which can further misalign its elements.
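One way to sidestep compiler-dependent layout is to encode each field explicitly at a fixed offset instead of sending the structure's memory image. The sketch below assumes a made-up message type (struct reading) and a made-up 7-byte wire format; it only illustrates the idea.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message type; the compiler is free to pad it in memory. */
struct reading {
    uint8_t  sensor_id;
    uint32_t value;
    uint16_t flags;
};

/* The wire size is fixed by the (made-up) format, not by sizeof. */
enum { READING_WIRE_SIZE = 7 };

/* Encode field by field in network byte order; returns bytes written. */
size_t reading_pack(const struct reading *r, unsigned char *out)
{
    uint32_t value;
    uint16_t flags;

    assert(r != NULL && out != NULL);
    value = htonl(r->value);
    flags = htons(r->flags);
    out[0] = r->sensor_id;
    memcpy(out + 1, &value, sizeof value);
    memcpy(out + 5, &flags, sizeof flags);
    return READING_WIRE_SIZE;
}

/* Decode the same 7-byte layout back into a host structure. */
void reading_unpack(const unsigned char *in, struct reading *r)
{
    uint32_t value;
    uint16_t flags;

    assert(in != NULL && r != NULL);
    memcpy(&value, in + 1, sizeof value);
    memcpy(&flags, in + 5, sizeof flags);
    r->sensor_id = in[0];
    r->value = ntohl(value);
    r->flags = ntohs(flags);
}
```

On common platforms sizeof(struct reading) is typically larger than 7 because of padding, which is exactly why the raw structure should not be written to the socket.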
Fortunately, there are solutions to this problem that ensure consistent interpretation of the data at both ends. In the past, the Remote Procedure Call (RPC) tool suite provided the so-called External Data Representation (XDR). XDR defines a standard representation of data to support communication between heterogeneous network applications.
Now, there are two newer protocols that provide similar functionality. Extensible Markup Language/Remote Procedure Call (XML-RPC) marshals procedure calls over HTTP in XML format. Data and metadata are encoded in XML and transmitted as strings, which separates the values from their physical representation on the host architecture. SOAP followed XML-RPC, extending its ideas with better features and functionality. See the Resources section for more information on each protocol.
Pitfall 5. Framing assumptions in TCP
TCP does not provide framing, which makes it perfect for byte-stream-oriented protocols. This is an important difference between TCP and the User Datagram Protocol (UDP). UDP is a message-oriented protocol that preserves message boundaries between sender and receiver. TCP is a stream-oriented protocol that assumes the data being communicated is unstructured, as shown in Figure 1.
Figure 1. Framing capabilities of UDP and the lack of framing in TCP
The upper part of Figure 1 illustrates a UDP client and server. The peer on the left performs two socket writes of 100 bytes each. The UDP layer of the protocol stack keeps track of the size of each write and ensures that when the receiver on the right reads from the socket, the data arrives in units of the same number of bytes. In other words, the message boundaries provided by the writer are preserved for the reader.
Now, look at the bottom of Figure 1. It shows the same writes performed through the TCP layer. Two separate writes (100 bytes each) are made to the stream socket. But in this case, the reader of the stream socket gets 200 bytes. The TCP layer of the protocol stack has aggregated the two writes. This aggregation can occur in either the sender's or the receiver's TCP/IP protocol stack. It is important to note that aggregation may not occur at all: TCP guarantees only that the data is delivered in order.
For most developers, this trap causes confusion. You want the reliability of TCP together with the framing of UDP. Unless another transport protocol, such as the Stream Control Transmission Protocol (SCTP), is used instead, the application layer developer must implement the buffering and segmentation functions.
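A common way to add framing on top of a stream socket is to prefix each message with its length and have the reader loop until the whole message has arrived. The sketch below assumes a 4-byte length prefix in network byte order; the function names send_msg and recv_msg are hypothetical, and other framing schemes (delimiters, fixed-size records) would work as well.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes. Returns 1 on success, 0 if the peer closed
 * the connection first, -1 on error. */
static int read_full(int sock, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(sock, p, len);

        if (n == 0)
            return 0;
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 1;
}

/* Write exactly len bytes. Returns 0 on success, -1 on error. */
static int write_full(int sock, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(sock, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send one message: 4-byte length (network byte order), then the payload. */
int send_msg(int sock, const void *payload, uint32_t len)
{
    uint32_t hdr = htonl(len);

    assert(payload != NULL || len == 0);
    if (write_full(sock, &hdr, sizeof hdr) == -1)
        return -1;
    return write_full(sock, payload, len);
}

/* Receive one message into buf (capacity cap). Returns the payload length,
 * or -1 on error, peer closure, or a message larger than cap. */
long recv_msg(int sock, void *buf, uint32_t cap)
{
    uint32_t hdr, len;

    if (read_full(sock, &hdr, sizeof hdr) != 1)
        return -1;
    len = ntohl(hdr);
    if (len > cap)
        return -1;
    if (len > 0 && read_full(sock, buf, len) != 1)
        return -1;
    return (long)len;
}
```

With this scheme, two back-to-back calls to send_msg are recovered as two separate messages by recv_msg, even if the stack delivers their bytes in a single read.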
Tools for debugging sockets applications
GNU/Linux provides several tools that can help you uncover problems in your sockets application. Using these tools is also instructive and can help explain the behavior of both the application and the TCP/IP protocol stack. Here you will find an overview of several tools. See Resources below for more information.
View details of the network subsystem
The netstat utility provides a view into the GNU/Linux networking subsystem. With netstat, you can view currently active connections (on a per-protocol basis), view connections in a particular state (such as server sockets in the listening state), and much more. Listing 4 shows some of the options that netstat provides and the features they enable.
Listing 4. Usage patterns for the netstat utility
    View all TCP sockets currently active
    $ netstat --tcp

    View all UDP sockets
    $ netstat --udp

    View all TCP sockets in the listening state
    $ netstat --listening

    View the multicast group membership information
    $ netstat --groups

    Display the list of masqueraded connections
    $ netstat --masquerade

    View statistics for each protocol
    $ netstat --statistics
Although many other utilities exist, netstat is comprehensive and covers the functionality of route, ifconfig, and other standard GNU/Linux tools.
Monitor traffic
You can use several GNU/Linux tools to inspect low-level traffic on a network. The tcpdump tool is an older tool that "sniffs" packets from the network and prints them to stdout or logs them to a file. This lets you view the traffic generated by your application as well as the low-level flow-control mechanisms generated by TCP. A newer tool called tcpflow complements tcpdump by providing protocol flow analysis and a way to properly reconstruct data streams regardless of packet order or retransmission. Listing 5 shows several usage patterns for tcpdump.
Listing 5. Usage patterns for the tcpdump tool
    Display all traffic on the eth0 interface for the local host
    $ tcpdump -l -i eth0

    Show all traffic on the network coming from or going to host plato
    $ tcpdump host plato

    Show all HTTP traffic for host camus
    $ tcpdump 'host camus and (port http)'

    View traffic coming from or going to TCP port 45000 on the local host
    $ tcpdump tcp port 45000
The tcpdump and tcpflow tools have a large number of options, including the ability to create complex filter expressions. See Resources below for more information on these tools.
Both tcpdump and tcpflow are text-based command-line tools. If you prefer a graphical user interface (GUI), the open source tool Ethereal might suit your needs. Ethereal is a professional protocol analyzer that can help debug application layer protocols. Its plug-in architecture can dissect protocols such as HTTP and almost any other protocol you can think of (637 protocols at the time of this writing).
Summary
Sockets programming is easy and fun, but if you want to avoid introducing bugs, or at least make them easier to find, consider the five common pitfalls described in this article and apply standard defensive programming practices. GNU/Linux tools and utilities can also help you track down subtle problems in your programs. Remember: when you read a utility's man page, follow the related or "see also" tools. You may find a new tool that proves essential.
Resources
Learn
- The original English version of this article is available on the developerWorks global site.
- There are 11 states in the TCP state machine. See the diagram in W. Richard Stevens's TCP/IP Illustrated, Volume 1.
- Explore the history and implications of Endianness on Wikipedia.
- Learn more about IBM's open, scalable, and customizable Power Architecture.
- Read the introduction to RPC/XDR from the Programming in C courseware.
- For more information on XML-RPC and how to use it in Java™ applications, read "XML-RPC in Java Programming" (developerWorks, January 2004).
- SOAP builds on the features of XML-RPC. Find specifications, tools, tutorials, and articles at soapware.org.
- SCTP combines features of both TCP and UDP, and adds availability and reliability.
- The tutorial "Linux Socket Programming, Part 1" (developerWorks, October 2003) explains how to get started with sockets programming and how to build an echo server and client that connect over TCP/IP. "Linux Socket Programming, Part 2" (developerWorks, January 2004) focuses on UDP and explains how to write UDP sockets applications in C and Python (although the code translates to other languages).
- The netstat man page provides details on the various ways to use netstat.
- BSD Sockets Programming from a Multi-Language Perspective (by M. Tim Jones) covers sockets programming techniques in six different languages.
- Find more resources for Linux developers in the developerWorks Linux zone.
Get products and technologies
- The tcpdump and tcpflow utilities can be used to monitor network traffic.
- The Ethereal network protocol analyzer provides tcpdump functionality with a graphical user interface and an extensible plug-in architecture.
- Order the free SEK for Linux, a two-DVD set containing the latest IBM trial software for Linux from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
- Build your next development project on Linux with IBM trial software, available for download directly from developerWorks.
Discuss
- Join the developerWorks community by participating in developerWorks blogs.