Socket Programming Principles
1. Introduction to the problem
1) The ordinary I/O operation process:
The UNIX I/O commands evolved from those of Multics and other early systems, and follow the open-read-write-close pattern. When a user process performs I/O, it first calls "open" to obtain the right to use the specified file or device. The call returns an integer, the file descriptor, which identifies the I/O that the process performs on the opened file or device. The user process then calls "read/write" as many times as needed to transfer data. When all transfers are complete, the user process calls "close" to notify the operating system that it has finished using the object.
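The open-read-write-close pattern described above can be sketched in C on a POSIX system. This is a minimal illustration, not code from the original text: the function name roundtrip_file and its parameters are hypothetical, and the file is rewound with lseek only so that the data just written can be read back through the same descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `msg` to the file at `path`, then read it back
 * into `out`. Returns the number of bytes read back, or -1 on failure. */
ssize_t roundtrip_file(const char *path, const char *msg, char *out, size_t outsz)
{
    assert(path != NULL && msg != NULL && out != NULL && outsz > 0);

    /* open: obtain a file descriptor for the object. */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* write: transfer data through the descriptor. */
    size_t len = strlen(msg);
    if (write(fd, msg, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* Rewind so that the read below starts at the beginning of the file. */
    if (lseek(fd, 0, SEEK_SET) < 0) {
        close(fd);
        return -1;
    }

    /* read: transfer data back, leaving room for a terminating NUL. */
    ssize_t n = read(fd, out, outsz - 1);
    if (n >= 0)
        out[n] = '\0';

    /* close: tell the operating system we are done with the object. */
    close(fd);
    return n;
}
```

A socket descriptor is used in much the same way, which is why the socket interface fits naturally into this I/O model.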
2) The TCP/IP protocol is integrated into the UNIX kernel
Integrating TCP/IP into the UNIX kernel amounts to introducing a new type of I/O operation into the UNIX system. The interaction between a UNIX user process and a network protocol is much more complex than that between a user process and a traditional I/O device. First, the two processes that perform a network operation run on different machines: how is the connection between them established? Second, there are many network protocols: how can a general mechanism be built that supports all of them? These are the problems that a network application programming interface has to solve.
3) A general network programming interface is needed, one that is independent of any specific protocol
UNIX systems have two network application programming interfaces: the sockets of UNIX BSD and the TLI of UNIX System V. Because Sun adopted UNIX BSD, which supports TCP/IP, and thereby pushed TCP/IP applications further, its network application programming interface, the socket, came into wide use in network software and has since been brought to the microcomputer operating systems DOS and Windows. The socket is a powerful tool for developing network applications, and this chapter discusses it in detail.
2. Basic concepts of socket programming
Before you start programming with sockets, you must first be familiar with the following concepts.
2.1 Inter-network process communication
The concept of process communication originated in stand-alone systems. Because each process runs within its own address range, the operating system provides facilities for process communication so that two communicating processes can cooperate without interfering with each other.
UNIX BSD has pipes (pipe), named pipes (named pipe), and soft interrupt signals (signal);
UNIX System V has messages (message), shared memory, and semaphores (semaphore).
These facilities are limited to communication between processes on the local machine. Inter-network process communication addresses communication between processes on different hosts (communication between processes on the same machine can be regarded as a special case). The first problem to solve is how to identify a process across the network. On a single host, different processes can be uniquely identified by their process number (process ID). In a network environment, however, the process numbers that each host allocates independently cannot uniquely identify a process. For example, host A may assign process number 5 to a process while host B also has a process 5, so the phrase "process 5" is meaningless. Second, the operating system supports many network protocols, which work in different ways and use different address formats. Inter-network process communication therefore also has to solve the problem of identifying multiple protocols. To solve these problems, the TCP/IP protocol introduces the following concepts.
1) Port
A port is a communication endpoint in the network that can be named and addressed. It is a resource that the operating system can allocate.
In the OSI seven-layer model, the biggest difference between the transport layer and the network layer is that the transport layer provides process communication. In this sense, the final address of a network communication is not just a host address but also includes some identifier that describes the process. For this reason, the TCP/IP protocol introduces the concept of a protocol port (abbreviated as port) to identify the communicating process.
A port is an abstract software structure (including some data structures and I/O buffers). Once an application (that is, a process) has attached itself to a port through a system call (binding), the data that the transport layer delivers to that port is received by the process, and the data that the process sends to the transport layer goes out through that port. In TCP/IP implementations, port operations resemble ordinary I/O operations: obtaining a port is equivalent to obtaining a locally unique I/O file that can be accessed with the ordinary read and write primitives. Like a file descriptor, each port has an integer identifier, the port number, which distinguishes it from other ports.
Because TCP and UDP, the two transport-layer protocols of TCP/IP, are two completely independent software modules, their port numbers are also independent of each other. For example, TCP can have a port 255 and UDP can also have a port 255, and the two do not conflict.
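The independence of the TCP and UDP port spaces can be checked with a short experiment. The sketch below assumes a POSIX system with the BSD socket headers. The helper names bind_port and tcp_udp_share_port_number are hypothetical. It binds a TCP socket to a port chosen by the system (port 0 in the bind request) and then binds a UDP socket to the same number.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a socket of the given type (SOCK_STREAM or
 * SOCK_DGRAM) to `port` on all local interfaces. Port 0 asks the system to
 * pick a free port. On success, returns the descriptor and stores the port
 * actually bound in *bound; on failure, returns -1. */
int bind_port(int type, unsigned short port, unsigned short *bound)
{
    assert(bound != NULL);

    int fd = socket(AF_INET, type, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    socklen_t len = sizeof addr;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *bound = ntohs(addr.sin_port);
    return fd;
}

/* Returns 1 if a UDP socket could be bound to the same port number that a
 * TCP socket currently holds, 0 if the UDP bind failed (for example because
 * another process already uses that UDP port), -1 if setup failed. */
int tcp_udp_share_port_number(void)
{
    unsigned short tcp_port = 0, udp_port = 0;

    int tcp_fd = bind_port(SOCK_STREAM, 0, &tcp_port);
    if (tcp_fd < 0)
        return -1;

    int udp_fd = bind_port(SOCK_DGRAM, tcp_port, &udp_port);
    int ok = (udp_fd >= 0 && udp_port == tcp_port);

    if (udp_fd >= 0)
        close(udp_fd);
    close(tcp_fd);
    return ok;
}
```

On a typical system the function returns 1, because the TCP socket and the UDP socket hold the same number in two separate port spaces.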
The allocation of port numbers is an important issue. There are two basic methods. The first is global allocation, a centralized method in which a recognized central authority assigns port numbers according to users' needs and publishes the results. The second is local allocation, also called dynamic binding: when a process needs the transport layer service, it asks the local operating system, which returns a locally unique port number, and the process then attaches itself to that port number through the appropriate system call (binding). TCP/IP combines the two methods. It divides the port numbers into two parts. A small number are reserved ports, allocated globally to service processes. Each standard server therefore has a globally recognized port (the well-known port), whose number is the same even on different machines. The remaining ports are free ports and are allocated locally. TCP and UDP originally specified that port numbers below 256 are reserved ports; the current IANA well-known range is 0 to 1023.
2) Address
The two communicating processes in a network communication are on different machines. In an internet, the two machines may be located on different networks, which are connected by network interconnection devices (gateways, bridges, routers, and so on). Three levels of addressing are therefore required:
1. A host may be connected to multiple networks, so a specific network address must be specified;
2. Each host on a network must have a unique address;
3. Each process on a host must have an identifier that is unique on that host.
Typically, a host address consists of a network ID and a host ID, which the TCP/IP protocol represents as a 32-bit integer. TCP and UDP both use a 16-bit port number to identify the user process.
3) Network byte order
Different computers store multibyte values in different orders: some keep the low-order byte at the starting address (little-endian) and some keep the high-order byte there (big-endian). To keep data correct, a network protocol must specify a network byte order. The TCP/IP protocol uses the high-order-byte-first (big-endian) format for the 16-bit and 32-bit integers that appear in its protocol headers. For details, see http://blog.csdn.net/hguisu/article/details/7449955#t1
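The conversion between host and network byte order can be illustrated with the standard htons and htonl functions. The sketch below assumes a POSIX system. The helper names port_to_wire, u32_to_wire, and host_is_little_endian are hypothetical and exist only to make the byte layout visible.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Store the network-order (big-endian) bytes of a 16-bit value in out[0..1]. */
void port_to_wire(uint16_t host_port, unsigned char out[2])
{
    assert(out != NULL);
    uint16_t net = htons(host_port);   /* host order -> network order */
    memcpy(out, &net, sizeof net);     /* the bytes as they appear on the wire */
}

/* Store the network-order (big-endian) bytes of a 32-bit value in out[0..3]. */
void u32_to_wire(uint32_t host_value, unsigned char out[4])
{
    assert(out != NULL);
    uint32_t net = htonl(host_value);
    memcpy(out, &net, sizeof net);
}

/* Returns 1 if this host keeps the low-order byte at the starting address. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

Whatever the host's own order, port_to_wire(0x1234, w) leaves 0x12 in w[0] and 0x34 in w[1], which is why port numbers and addresses are passed through htons and htonl before they are stored in a socket address structure.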
4) Connection
A communication link between two processes is called a connection. Internally, a connection takes the form of buffers and a set of protocol mechanisms; externally, it offers higher reliability than connectionless communication.
5) Half-association
To sum up, a process can be identified globally and uniquely in the network by a triple:
(protocol, local address, local port number)
Such a triple is called a half-association. It specifies one half of a connection.
6) Full association
A complete inter-network process communication involves two processes, and the two must use the same high-level protocol. In other words, it is impossible to communicate with TCP at one end and UDP at the other. A complete network communication therefore requires a five-tuple to identify it:
(protocol, local address, local port number, remote address, remote port number)
Such a five-tuple is called an association. Two half-associations with the same protocol can be combined into a valid association, which fully specifies a connection.
2.2 Service Mode
In the layered network architecture, dependencies between layers are strictly one-way, and the division of labor and cooperation among the layers is concentrated in the interfaces between adjacent layers. A "service" is an abstract concept that describes the relationship between adjacent layers: the set of operations that each layer provides to the layer immediately above it. The lower layer is the service provider, and the upper layer is the user that requests the service. A service takes the form of primitives (primitive), such as system calls or library functions. System calls are the service primitives that the operating system kernel provides to network applications or high-level protocols. Layer n always provides a more complete service to layer n+1 than layer n-1 does; otherwise layer n would have no reason to exist. In OSI terminology, the network layer and the layers below it are called the communication subnet. They provide only point-to-point communication and have no notion of programs or processes. The transport layer implements "end-to-end" communication and introduces the concept of inter-network process communication. It must also handle error control, flow control, data sequencing (message ordering), connection management, and related issues, and to that end it offers different kinds of service:
1) Connection-oriented (virtual circuit) or connectionless
Connection-oriented service (TCP protocol): an abstraction of the telephone system's service model, in which every complete data transfer goes through establishing a connection, using the connection, and terminating the connection. During the transfer, packets do not carry the destination address but use a connection number (connect ID) instead. In essence, a connection is a pipe: data is received in the same order, and with the same content, as it was sent. The TCP protocol provides connection-oriented virtual circuits.
Connectionless service (UDP protocol): an abstraction of the postal system's service, in which every packet carries the complete destination address and travels through the system independently. Connectionless service cannot guarantee the order of packets, does not recover from or retransmit packets in error, and does not guarantee reliable delivery. The UDP protocol provides connectionless datagram service.
The types of these services and examples of applications are given below:
2) Order
In network transmission, two consecutive messages may travel along different paths in an end-to-end communication, so they may arrive at the destination in a different order from the one in which they were sent. "Order" means that data is received in the same order in which it was sent. The TCP protocol provides this service.
3) Error control
A mechanism that ensures the data received by the application is error-free. Errors are usually detected with a checksum, and error-free transmission is ensured by having the two sides exchange acknowledgments. The TCP protocol provides this service.
4) Flow control
A mechanism that controls the data transmission rate during a transfer so that data is not lost. The TCP protocol provides this service.
5) Byte stream
A byte stream treats the transmitted message purely as a sequence of bytes and provides no boundaries within the data stream. The TCP protocol provides byte stream service.
6) Message
The receiver preserves the sender's message boundaries. The UDP protocol provides message service.
7) Full-duplex/half-duplex
End-to-end data is transmitted in both directions at once (full-duplex) or in one direction at a time (half-duplex).
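The difference between byte stream service and message service can be observed without a network. The sketch below assumes a POSIX system and uses local AF_UNIX socket pairs as a stand-in for TCP and UDP, which is a simplification chosen for illustration. The function name first_recv_size is hypothetical. It sends "ab" and "cd" as two separate messages and reports how many bytes the first receive call delivers.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send "ab" and then "cd" over a local socket pair of the given type
 * (SOCK_STREAM or SOCK_DGRAM) and return the number of bytes delivered by
 * the first recv() on the other end, or -1 on failure. */
ssize_t first_recv_size(int type)
{
    assert(type == SOCK_STREAM || type == SOCK_DGRAM);

    int sv[2];
    if (socketpair(AF_UNIX, type, 0, sv) < 0)
        return -1;

    ssize_t n = -1;
    if (send(sv[0], "ab", 2, 0) == 2 && send(sv[0], "cd", 2, 0) == 2) {
        char buf[16];
        /* The buffer is large enough to hold both messages at once. */
        n = recv(sv[1], buf, sizeof buf, 0);
    }

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With SOCK_DGRAM the first receive returns exactly the 2 bytes of the first message, because message boundaries are preserved. With SOCK_STREAM the receiver may get both messages merged into one read (typically 4 bytes here), so an application that needs boundaries on a stream has to add its own framing.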
8) Buffering/out-of-band data
In a byte stream service, a user process can read or write any number of bytes at a time because there are no message boundaries. Buffering is required to ensure correct transmission or when a flow-controlled protocol is used. Some applications with special requirements, such as interactive applications, need this buffering turned off.
During a data transfer, there is a class of information that should reach the user outside the normal transmission path so that it can be handled promptly, such as the interrupt key (Delete or Control-c) of the UNIX system or the terminal flow control characters (Control-s and Control-q). This is called out-of-band data. Logically, it is as if the user process used a separate channel to transmit this data, a channel associated with each pair of connected stream sockets. Because the implementation of out-of-band data in the Berkeley Software Distribution is inconsistent with the host requirements specified in RFC 1122, application writers are advised, in order to minimize interoperability problems, not to use out-of-band data unless it is required to interoperate with an existing service.
2.3 Client/server mode
In TCP/IP network applications, the main mode of interaction between two communicating processes is the client/server model: the client sends a service request to the server, and the server receives the request and provides the corresponding service. The client/server model rests on two points. First, networks exist because hardware and software resources, computing power, and information are unevenly distributed and need to be shared. Hosts with many resources provide services and clients with fewer resources request them, so the two roles are not symmetric. Second, inter-network process communication is completely asynchronous: the communicating processes have no parent-child relationship and share no memory buffers, so a mechanism is needed to establish contact between processes that wish to communicate and to synchronize their data exchange. This is what the client/server model of TCP/IP provides. During operation, the client/server model works by active request:
Server side:
The server must start first and provide the appropriate service on request:
1. Open a communication channel and tell the local host that it is willing to accept client requests on a well-known port (such as port 21 for FTP);
2. Wait for a client request to arrive at that port;
3. When an iterative service request arrives, process the request and send the reply. When a concurrent service request arrives, start a new process to handle the client request (for example, with fork and exec on UNIX systems). The new process handles only this client's request and does not respond to other requests. When the service is complete, it closes its communication link with the client and terminates.
4. Return to step 2 and wait for another client request.
5. Shut down the server.
Client side:
1. Open a communication channel and connect to the specific port on the server's host;
2. Send a service request message to the server, wait for and receive the reply, and continue to make requests as needed;
3. When the requests are finished, close the communication channel and terminate.
The process described above shows that:
1. The roles of the client and server processes are asymmetric, so they are coded differently.
2. The service process is typically started before any client request arrives. As long as the system is running, the service process persists until it terminates normally or is forced to terminate.
2.4 Socket Types
TCP/IP sockets come in the following three types.
Stream sockets (SOCK_STREAM):
Provide a connection-oriented, reliable data transfer service: data arrives without errors, without duplication, and in the order in which it was sent. Built-in flow control keeps the data stream from overrunning the receiver. Data is treated as a byte stream with no length limit. The File Transfer Protocol (FTP) uses stream sockets.
Datagram sockets (SOCK_DGRAM):
Provide a connectionless service (UDP). Datagrams are sent as independent packets with no guarantee of error-free delivery: data may be lost or duplicated and may arrive out of order. The Network File System (NFS) uses datagram sockets.
Raw sockets (SOCK_RAW):
This interface allows direct access to lower-layer protocols such as IP and ICMP. It is often used to test new protocol implementations or to access new devices configured in existing services.
2.4 Typical socket call process examples
As mentioned above, TCP/IP applications generally use the client/server model, so a real application has both a client process and a server process, and the server starts first. The sequence of system calls is shown in the diagrams below. The socket system calls for a connection-oriented protocol (such as TCP) are shown in Figure 2.1:
The server must start first. Only when it is listening and waiting in the accept() call can it receive client requests. If the client starts before the server is listening, connect() returns an error code and the connection fails.
The socket calls for a connectionless protocol (UDP) are shown in Figure 2.2:
A connectionless server must also start first; otherwise client requests cannot reach the service process. A connectionless client does not call connect(). Before any data is sent, therefore, no full association exists between the client and the server; each side has only established a half-association through socket() and bind(). When sending data, the sender specifies the receiver's socket address in addition to its own local socket address, so the full association is established dynamically as data is sent and received.
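The connectionless call sequence (socket, bind, then sendto and recvfrom with an explicit peer address) can be sketched as follows. This is a minimal illustration on a POSIX system, not the program from the figures. Both "client" and "server" sockets live in one process and talk over the loopback address 127.0.0.1, ports are chosen by the system, a 2-second receive timeout is added so the sketch cannot hang, and the names udp_socket_on_loopback and udp_send_and_receive are hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* socket() + bind(): create one half-association (UDP, 127.0.0.1, port).
 * The port is chosen by the system and reported back through *addr. */
static int udp_socket_on_loopback(struct sockaddr_in *addr)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = htons(0);

    socklen_t len = sizeof *addr;
    if (bind(fd, (struct sockaddr *)addr, sizeof *addr) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Send `msg` from a client socket to a server socket without connect().
 * Returns the number of bytes the server received, or -1 on failure. */
ssize_t udp_send_and_receive(const char *msg, char *out, size_t outsz)
{
    assert(msg != NULL && out != NULL && outsz > 0);

    struct sockaddr_in server_addr, client_addr, from;
    int server = udp_socket_on_loopback(&server_addr);
    int client = udp_socket_on_loopback(&client_addr);
    ssize_t n = -1;

    if (server >= 0 && client >= 0) {
        struct timeval tv = { 2, 0 };   /* give up after 2 seconds */
        setsockopt(server, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

        /* The destination address travels with every sendto() call. */
        if (sendto(client, msg, strlen(msg), 0,
                   (struct sockaddr *)&server_addr, sizeof server_addr) >= 0) {
            socklen_t from_len = sizeof from;
            /* recvfrom() reports the sender's address, which completes
             * the association for this one datagram. */
            n = recvfrom(server, out, outsz - 1, 0,
                         (struct sockaddr *)&from, &from_len);
            if (n >= 0) {
                out[n] = '\0';
                if (from.sin_port != client_addr.sin_port)
                    n = -1;   /* datagram came from an unexpected sender */
            }
        }
    }

    if (server >= 0)
        close(server);
    if (client >= 0)
        close(client);
    return n;
}
```

In a real application the two sockets would belong to different processes, usually on different hosts, and the server would reply with sendto() using the address that recvfrom() returned.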
Example
This example uses the client/server model with a connection-oriented protocol. Its flow is shown in Figure 2.3:
Server-side program:
/* File name: streams.c */
#include <winsock.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define TRUE 1

/* This program creates a socket and then enters an infinite loop. Each time
   through the loop it accepts a connection and prints the messages received
   on it. When the connection is broken or a termination message arrives,
   the connection ends and the program accepts a new connection.
   The format of the command line is: streams */
int main(void)
{
    SOCKET sock, msgsock;
    int length;
    struct sockaddr_in server;
    struct sockaddr tcpaddr;
    char buf[1024];
    int rval, len;
    WSADATA wsadata;

    /* Winsock must be initialized before any other socket call */
    if (WSAStartup(MAKEWORD(1, 1), &wsadata) != 0) {
        fprintf(stderr, "WSAStartup failed\n");
        exit(1);
    }
    /* Create the socket */
    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock == INVALID_SOCKET) {
        perror("opening stream socket");
        exit(1);
    }
    /* Name the socket using any local address and any port */
    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_addr.s_addr = INADDR_ANY;
    server.sin_port = 0;
    if (bind(sock, (struct sockaddr *)&server, sizeof(server)) < 0) {
        perror("binding stream socket");
        exit(1);
    }
    /* Find out the assigned port number and print it */
    length = sizeof(server);
    if (getsockname(sock, (struct sockaddr *)&server, &length) < 0) {
        perror("getting socket name");
        exit(1);
    }
    printf("Socket port #%d\n", ntohs(server.sin_port));
    /* Start accepting connections */
    listen(sock, 5);
    do {
        len = sizeof(tcpaddr);
        msgsock = accept(sock, &tcpaddr, &len);
        if (msgsock == INVALID_SOCKET) {
            perror("accept");
        } else {
            do {
                memset(buf, 0, sizeof(buf));
                /* Leave room for a terminating NUL so buf can be printed with %s */
                if ((rval = recv(msgsock, buf, sizeof(buf) - 1, 0)) < 0)
                    perror("reading stream message");
                else if (rval == 0)
                    printf("Ending connection\n");
                else
                    printf("-->%s\n", buf);
            } while (rval > 0);
            closesocket(msgsock);
        }
    } while (TRUE);
    /* Because this program has an infinite loop, the socket "sock" is never
       explicitly closed. However, all sockets are closed automatically when
       a process is killed or terminates normally. */
    exit(0);
}
Client program:
/* File name: streamc.c */
#include <winsock.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define DATA "Half a league, half a league ..."

/* This program creates a socket, connects it to the socket named on the
   command line, sends one message over the connection, and then closes
   the socket.
   The format of the command line is: streamc hostname portnumber
   The port number must be the same as the one printed by the server program. */
int main(int argc, char *argv[])
{
    SOCKET sock;
    struct sockaddr_in server;
    struct hostent *hp;
    WSADATA wsadata;

    if (argc < 3) {
        fprintf(stderr, "usage: streamc hostname portnumber\n");
        exit(1);
    }
    /* Winsock must be initialized before any other socket call */
    if (WSAStartup(MAKEWORD(1, 1), &wsadata) != 0) {
        fprintf(stderr, "WSAStartup failed\n");
        exit(1);
    }
    /* Create the socket */
    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock == INVALID_SOCKET) {
        perror("opening stream socket");
        exit(1);
    }
    /* Connect the socket using the name given on the command line */
    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    hp = gethostbyname(argv[1]);
    if (hp == 0) {
        fprintf(stderr, "%s: unknown host\n", argv[1]);
        exit(2);
    }
    memcpy((char *)&server.sin_addr, (char *)hp->h_addr, hp->h_length);
    server.sin_port = htons((unsigned short)atoi(argv[2]));
    if (connect(sock, (struct sockaddr *)&server, sizeof(server)) < 0) {
        perror("connecting stream socket");
        exit(3);
    }
    if (send(sock, DATA, sizeof(DATA), 0) < 0)
        perror("sending on stream socket");
    closesocket(sock);
    exit(0);
}
2.5 A generic example program
The previous section introduced a simple socket programming example. As that example shows, socket programs follow an almost fixed pattern: nearly all of them call the same functions in the same order. We can therefore design an intermediate layer that offers a few simple functions to the layer above it. A program only has to call these functions to carry out ordinary network data transfer, and the programmer does not need to be concerned with the details of socket programming. This section introduces such a general network programming interface. It offers a few simple functions to the upper layer, and with them programmers can accomplish most network data transfer tasks. These functions isolate socket programming from the upper layer. They use connection-oriented stream sockets and a non-blocking mechanism: the program simply calls these functions to query for network messages and respond accordingly. The functions are:
- InitSocketsStruct: Initializes the socket structure and obtains the service port number. Used by client programs.
- InitPassiveSock: Initializes the socket structure, obtains the service port number, and creates the main socket. Used by server programs.
- CloseMainSock: Closes the main socket. Used by server programs.
- CreateConnection: Establishes a connection. Used by client programs.
- AcceptConnection: Accepts a connection. Used by server programs.
- CloseConnection: Closes a connection.
- QuerySocketsMsg: Queries socket messages.