"Everything is a socket!"
The claim is a slight exaggeration, but the fact is that almost all network programming today uses sockets.
-- Thoughts from practical programming and the study of open source projects.
We all know the value of exchanging information, but how do processes communicate across a network? When we open a browser to view a web page, how does the browser process communicate with the web server? When you chat on QQ, how does your QQ process communicate with the server, or with the QQ process on your friend's machine? All of this relies on sockets. So what is a socket? What types of sockets are there? And what are the basic socket functions? These are the topics of this article. The main contents are as follows:
- 1. How do processes communicate over a network?
- 2. What is a socket?
- 3. Basic socket operations
- 3.1. The socket() function
- 3.2. The bind() function
- 3.3. The listen() and connect() functions
- 3.4. The accept() function
- 3.5. read(), write() and other functions
- 3.6. The close() function
- 4. The TCP three-way handshake in socket calls, in detail
- 5. The TCP four-way handshake in socket calls, in detail
- 6. An example (practice)
- 7. Hands-on exercise: a question for readers, replies welcome!
1. How do processes communicate over a network?
Local interprocess communication (IPC) takes many forms, but they fall into the following four categories:
- Message passing (pipes, FIFOs, message queues)
- Synchronization (mutexes, condition variables, read-write locks, file and record locks, semaphores)
- Shared memory (anonymous and named)
- Remote procedure calls (Solaris doors and Sun RPC)
But these are not the subject of this article! What we want to discuss is how processes communicate across a network. The first problem is how to uniquely identify a process; without that, communication is impossible. Locally, a process can be uniquely identified by its PID, but that does not work across a network. Fortunately, the TCP/IP protocol family has solved this problem for us: the network-layer "IP address" uniquely identifies a host on the network, and the transport-layer "protocol + port" uniquely identifies an application (process) on that host. The triple (IP address, protocol, port) therefore identifies a process on the network, and network processes can use this identifier to interact with one another.
Applications that use TCP/IP typically rely on one of two application programming interfaces to communicate between network processes: UNIX BSD sockets or UNIX System V TLI (now obsolete). Today almost all applications use sockets, and in the network era, where communication between processes is everywhere, that is why I say "everything is a socket."
2. What is a socket?
We now know that processes on a network communicate through sockets, but what is a socket? Sockets originated in UNIX, and one of the basic philosophies of Unix/Linux is that "everything is a file": everything can be operated on with the pattern "open -> read/write -> close". My understanding is that a socket is an implementation of this pattern: a socket is a special kind of file, and the socket functions are the operations on it (read/write I/O, open, close). These functions are introduced later.
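To illustrate the "open -> read/write -> close" pattern, here is a minimal sketch in which two connected sockets are driven with the plain file calls write(), read() and close(). The helper name echo_through_socketpair is made up for this illustration, and socketpair() with AF_UNIX is used only as a convenient way to get two connected descriptors without a network.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends msg through one end of a connected socket pair
 * and reads it back from the other end, using only file-style calls.
 * Returns the number of bytes read into out, or -1 on error. */
ssize_t echo_through_socketpair(const char *msg, char *out, size_t cap)
{
    int fds[2];

    /* "open": create two connected stream sockets (local, no network). */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return -1;

    /* "write" and "read": the same calls used for ordinary files. */
    ssize_t n = write(fds[0], msg, strlen(msg));
    if (n > 0)
        n = read(fds[1], out, cap);

    /* "close": release both descriptors. */
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Calling echo_through_socketpair("hello", buf, sizeof buf) with a 16-byte buffer should copy the five bytes of "hello" into buf, exactly as it would with a pipe or a regular file.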
The origin of the word socket
The first use of the term in networking is found in IETF RFC 33, published on February 12, 1970, by Stephen Carr, Steve Crocker, and Vint Cerf. According to the Computer History Museum, Crocker wrote: "The elements of a name space are called sockets. A socket forms one end of a connection, and a connection is fully specified by a pair of sockets." The Computer History Museum adds: "This is about 12 years earlier than the BSD socket interface definition."
3. Basic socket operations
Since a socket is an implementation of the "open -> read/write -> close" pattern, the socket API provides functions for each of these operations. The following sections use TCP as an example to introduce the basic socket functions.
3.1. The socket() function
int socket(int domain, int type, int protocol);
The socket function corresponds to the open operation on an ordinary file. Opening an ordinary file returns a file descriptor; socket() creates a socket descriptor, which uniquely identifies a socket. A socket descriptor is used just like a file descriptor: all subsequent operations take it as a parameter to perform reads, writes and so on.
Just as you can pass fopen different arguments to open different files, you can pass socket() different arguments to create different kinds of sockets. The three parameters of socket() are:
- domain: the protocol domain, also called the protocol family. Common protocol families are AF_INET, AF_INET6, AF_LOCAL (also called AF_UNIX, the Unix domain socket) and AF_ROUTE. The protocol family determines the address type of the socket, and the matching address type must be used in communication: AF_INET uses a combination of an IPv4 address (32 bits) and a port number (16 bits), while AF_UNIX uses an absolute pathname as the address.
- type: the socket type. Common socket types are SOCK_STREAM, SOCK_DGRAM, SOCK_RAW, SOCK_PACKET and SOCK_SEQPACKET (what are the types of sockets?).
- protocol: as the name suggests, the protocol to use. Common values are IPPROTO_TCP, IPPROTO_UDP, IPPROTO_SCTP and IPPROTO_TIPC, which correspond to the TCP, UDP, SCTP and TIPC transport protocols respectively (TIPC will be discussed separately).
Note: type and protocol cannot be combined arbitrarily; for example, SOCK_STREAM cannot be combined with IPPROTO_UDP. When protocol is 0, the default protocol for the given type is selected automatically.
When we call socket() to create a socket, the returned descriptor exists in a protocol family (address family, AF_XXX) namespace but has no specific address. To assign an address, you must call bind(); otherwise the system automatically assigns a port when you call connect() or listen(). (Does the client also need to call bind()? Usually not; section 3.2 explains why.)
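The note about valid type/protocol combinations can be checked with a small probe. This is only a sketch: try_socket is a hypothetical helper name, and the exact errno reported for an invalid combination is platform-dependent (on Linux it is typically EPROTONOSUPPORT).

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: tries to create an IPv4 socket with the given type
 * and protocol, then closes it again.
 * Returns 0 if socket() succeeded, otherwise the errno value it set. */
int try_socket(int type, int protocol)
{
    int fd = socket(AF_INET, type, protocol);
    if (fd == -1)
        return errno;
    close(fd);
    return 0;
}
```

try_socket(SOCK_STREAM, 0) and try_socket(SOCK_DGRAM, IPPROTO_UDP) should return 0, while try_socket(SOCK_STREAM, IPPROTO_UDP) should return a nonzero error code because a stream socket cannot use UDP.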
3.2. The bind() function
As mentioned above, bind() assigns a specific address from the address family to the socket. For AF_INET and AF_INET6, for example, it assigns a combination of an IPv4 or IPv6 address and a port number to the socket.
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
The three parameters of the function are:
- sockfd: the socket descriptor, created by socket(), which uniquely identifies a socket. bind() binds a name to this descriptor.
- addr: a const struct sockaddr * pointer to the protocol address to bind to sockfd. The address structure depends on the address family used when the socket was created. For IPv4 it is:
    struct sockaddr_in {
        sa_family_t    sin_family; /* address family: AF_INET */
        in_port_t      sin_port;   /* port in network byte order */
        struct in_addr sin_addr;   /* internet address */
    };

    /* Internet address. */
    struct in_addr {
        uint32_t       s_addr;     /* address in network byte order */
    };
IPv6 corresponds to:
    struct sockaddr_in6 {
        sa_family_t     sin6_family;   /* AF_INET6 */
        in_port_t       sin6_port;     /* port number */
        uint32_t        sin6_flowinfo; /* IPv6 flow information */
        struct in6_addr sin6_addr;     /* IPv6 address */
        uint32_t        sin6_scope_id; /* Scope ID (new in 2.4) */
    };

    struct in6_addr {
        unsigned char   s6_addr[16];   /* IPv6 address */
    };
The Unix domain corresponds to:
    #define UNIX_PATH_MAX 108

    struct sockaddr_un {
        sa_family_t sun_family;               /* AF_UNIX */
        char        sun_path[UNIX_PATH_MAX];  /* pathname */
    };
- addrlen: the length of the address.
Usually a server binds a well-known address (such as an IP address + port number) when it starts, so that clients can use it to connect to the server. The client does not need to specify one; the system automatically assigns a port number and combines it with the client's own IP address. This is why the server usually calls bind() before listen(), while the client does not call it and instead lets the system pick a port at connect().
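As a rough sketch of the bind() call described above, the following helper fills in a struct sockaddr_in and binds a TCP socket to the loopback address. The name bind_loopback and the choice of 127.0.0.1 are assumptions made for the example; passing port 0 asks the kernel to choose a free port, and getsockname() then reports which one was assigned.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: creates a TCP socket and binds it to 127.0.0.1:port.
 * If port is 0 the kernel chooses a free port.
 * On success returns the descriptor and stores the bound port (host byte
 * order) in *bound_port; on failure returns -1. */
int bind_loopback(uint16_t port, uint16_t *bound_port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                   /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* 127.0.0.1 */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        close(fd);
        return -1;
    }

    /* Ask the kernel which address the socket actually received. */
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) == -1) {
        close(fd);
        return -1;
    }
    *bound_port = ntohs(addr.sin_port);
    return fd;
}
```

A server would normally pass its well-known port here (for instance 6666, as in the example later in this article) and then call listen() on the returned descriptor.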
Network byte order and host byte order
Host byte order is what we usually call big-endian and little-endian mode: different CPUs store integers in memory in different byte orders, and this is called the host byte order. The standard definitions of big-endian and little-endian are as follows:
a) Little-endian: the low-order byte is stored at the lower memory address and the high-order byte at the higher memory address.
b) Big-endian: the high-order byte is stored at the lower memory address and the low-order byte at the higher memory address.
Network byte order: a 4-byte (32-bit) value is transmitted in the following order: first bits 0~7, then bits 8~15, then bits 16~23, and finally bits 24~31, where bit 0 is the most significant bit. This transmission order is called big-endian byte order. Because all binary integers in TCP/IP headers must be transmitted over the network in this order, it is also called network byte order. Byte order, as the name suggests, is the order in which data larger than one byte is stored in memory; single-byte data has no ordering issue.
So, when binding an address to a socket, first convert from host byte order to network byte order rather than assuming that the host byte order is big-endian like the network byte order. This mistake has caused real disasters! It once appeared in my company's project code and led to many baffling problems. So make no assumptions about the host byte order; always convert values to network byte order before assigning them to the socket address.
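The conversion can be seen in a few lines. This is a minimal sketch with made-up helper names; htons() and ntohs() are the standard conversion functions for 16-bit values, and htonl()/ntohl() do the same for 32-bit values.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the first byte in memory of a 16-bit value
 * after it has been converted to network byte order. */
unsigned char first_byte_in_network_order(uint16_t host_value)
{
    uint16_t net = htons(host_value);
    unsigned char bytes[sizeof net];
    memcpy(bytes, &net, sizeof net);
    return bytes[0];   /* the high-order byte, whatever the host order is */
}

/* Hypothetical helper: returns 1 if this host is little-endian, else 0. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char low;
    memcpy(&low, &probe, 1);
    return low == 1;
}
```

Port 6666 is 0x1A0A in hexadecimal, so first_byte_in_network_order(6666) yields 0x1A on every host, while the same value stored without htons() would start with 0x0A on a little-endian machine.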
3.3. The listen() and connect() functions
A server calls listen() after socket() and bind() to listen on the socket. If a client then calls connect() to issue a connection request, the server receives that request.
int listen(int sockfd, int backlog);
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
The first parameter of listen() is the socket descriptor to listen on, and the second is the maximum number of connections that can be queued for that socket. A socket created by socket() is an active socket by default; listen() turns it into a passive socket that waits for connection requests from clients.
The first parameter of connect() is the client's socket descriptor, the second is the server's socket address, and the third is the length of that address. The client establishes a connection to the TCP server by calling connect().
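The difference between an active and a passive socket can be sketched on the loopback interface. The helper names below (loopback_addr, bind_ephemeral, connect_loopback) are invented for this example, and the behaviour noted afterwards is what Linux does; other systems may differ in detail.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fills addr with 127.0.0.1:port (port given in host byte order). */
static void loopback_addr(struct sockaddr_in *addr, uint16_t port)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}

/* Hypothetical helper: binds a TCP socket to a kernel-chosen loopback port
 * and stores that port in *port. The socket is NOT yet listening.
 * Returns the descriptor, or -1 on failure. */
int bind_ephemeral(uint16_t *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    loopback_addr(&addr, 0);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) == -1) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Hypothetical helper: active open to 127.0.0.1:port.
 * Returns the connected descriptor, or -1 if connect() fails. */
int connect_loopback(uint16_t port)
{
    struct sockaddr_in srv;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    loopback_addr(&srv, port);
    if (connect(fd, (struct sockaddr *)&srv, sizeof srv) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With a socket from bind_ephemeral(), connect_loopback() is refused as long as the server socket has not been passed to listen(). Once listen(fd, 5) has been called, the same connect_loopback() call succeeds, because the kernel queues the connection for the passive socket.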
3.4. The accept() function
After the TCP server has called socket(), bind() and listen() in turn, it listens on the specified socket address. The TCP client calls socket() and connect() in turn to send a connection request to the TCP server. When the TCP server detects this request, it calls accept() to accept it, and the connection is established. Network I/O can then begin, much like read/write I/O on ordinary files.
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
The first parameter of accept() is the server's socket descriptor; the second is a struct sockaddr * pointer through which the client's protocol address is returned; the third is a pointer to the length of the protocol address. If accept() succeeds, its return value is a brand-new descriptor generated automatically by the kernel, which represents the TCP connection to the client.
Note: the first parameter of accept() is the server's socket descriptor, created when the server called socket(); it is called the listening socket descriptor. The descriptor returned by accept() is the connected socket descriptor. A server typically creates only one listening socket descriptor, which exists for the lifetime of the server. The kernel creates one connected socket descriptor for each client connection the server process accepts, and when the server has finished serving a client, the corresponding connected socket descriptor is closed.
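The two kinds of descriptor can be seen in a short sketch. accept_one is a hypothetical helper name; it assumes listen_fd is an IPv4 TCP socket that has already been through bind() and listen().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical helper: accepts one connection on the listening descriptor
 * and stores the client's port (host byte order) in *peer_port.
 * Returns the new connected descriptor, or -1 on error. */
int accept_one(int listen_fd, uint16_t *peer_port)
{
    struct sockaddr_in peer;
    socklen_t len = sizeof peer;   /* in: buffer size, out: address size */

    int conn_fd = accept(listen_fd, (struct sockaddr *)&peer, &len);
    if (conn_fd == -1)
        return -1;

    *peer_port = ntohs(peer.sin_port);
    return conn_fd;                /* distinct from listen_fd */
}
```

The returned descriptor is used for I/O with that one client and closed afterwards, while listen_fd stays open and can be passed to accept_one() again for the next client.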
3.5. read(), write() and other functions
Everything is now in place: the server and the client have established a connection, and network I/O can be used for reading and writing, which is how different processes communicate over the network. Network I/O functions come in the following groups:
- read()/write()
- recv()/send()
- readv()/writev()
- recvmsg()/sendmsg()
- recvfrom()/sendto()
I recommend using recvmsg()/sendmsg(), which are the most general I/O functions and can in fact replace all the others above. Their declarations are as follows:
    #include <unistd.h>

    ssize_t read(int fd, void *buf, size_t count);
    ssize_t write(int fd, const void *buf, size_t count);

    #include <sys/types.h>
    #include <sys/socket.h>

    ssize_t send(int sockfd, const void *buf, size_t len, int flags);
    ssize_t recv(int sockfd, void *buf, size_t len, int flags);

    ssize_t sendto(int sockfd, const void *buf, size_t len, int flags,
                   const struct sockaddr *dest_addr, socklen_t addrlen);
    ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags,
                     struct sockaddr *src_addr, socklen_t *addrlen);

    ssize_t sendmsg(int sockfd, const struct msghdr *msg, int flags);
    ssize_t recvmsg(int sockfd, struct msghdr *msg, int flags);
The read function reads content from fd. On success, read returns the number of bytes actually read; a return value of 0 means the end of the file has been reached, and a value less than 0 means an error occurred. If the error is EINTR, the read was interrupted by a signal; if it is ECONNRESET, there is a problem with the network connection.
The write function writes count bytes from buf to the file descriptor fd. On success it returns the number of bytes written; on failure it returns -1 and sets the errno variable. In a network program there are two possibilities when we write to a socket descriptor. 1) The return value of write is greater than 0, meaning that some or all of the data was written. 2) The return value is less than 0, meaning an error occurred, and we must handle it according to the error type. If the error is EINTR, the write was interrupted by a signal. If it is EPIPE, there is a problem with the network connection (the other side has closed the connection).
I will not cover the other I/O functions here; see the man pages or search Baidu or Google. The example below uses send()/recv().
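The return-value handling described for read() and write() is usually wrapped in small loops. The following is one possible sketch; write_all and read_retry are hypothetical names, not functions from any library.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes all len bytes of buf to fd, retrying after
 * EINTR and continuing after partial writes.
 * Returns 0 on success, -1 on error (errno is left as set by write()). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) {
            if (n < 0 && errno == EINTR)
                continue;          /* interrupted by a signal: try again */
            return -1;             /* real error, e.g. EPIPE */
        }
        p += n;                    /* partial write: advance and go on */
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: reads up to cap bytes, retrying after EINTR.
 * Returns the number of bytes read, 0 at end of file (the peer closed the
 * connection), or -1 on error. */
ssize_t read_retry(int fd, void *buf, size_t cap)
{
    for (;;) {
        ssize_t n = read(fd, buf, cap);
        if (n < 0 && errno == EINTR)
            continue;
        return n;
    }
}
```

Both helpers work on any descriptor, so they can be tried on a pipe as well as on a connected socket.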
3.6. The close() function
Once the server and client have established a connection and finished their reads and writes, the corresponding socket descriptor should be closed, just as fclose is called to close a file after working with it.
    #include <unistd.h>

    int close(int fd);
By default, close() on a TCP socket marks the socket as closed and returns to the calling process immediately. The descriptor can no longer be used by the calling process; that is, it can no longer be passed as the first argument to read or write.
Note: close() only decrements the reference count of the socket descriptor by 1. Only when the reference count reaches 0 does it trigger the TCP client to send a connection-termination request to the server.
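The reference counting can be observed with dup(), which creates a second descriptor for the same socket. This sketch uses a Unix domain socket pair rather than TCP, so no FIN is actually sent, but the descriptor counting behaves the same way. The function names are made up, and MSG_DONTWAIT is assumed to be available (it is on Linux and the BSDs).

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the peer of fd has closed (end-of-file is pending),
 * 0 if the connection is still open, -1 on an unexpected error.
 * MSG_PEEK leaves any data in place; MSG_DONTWAIT avoids blocking. */
static int peer_has_closed(int fd)
{
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n == 0)
        return 1;
    if (n > 0 || errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;
    return -1;
}

/* Hypothetical demo: returns 1 if the peer saw end-of-file only after the
 * LAST copy of the descriptor was closed, 0 otherwise, -1 on setup failure. */
int close_is_reference_counted(void)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return -1;

    int copy = dup(fds[0]);     /* two descriptors now refer to one socket */
    if (copy == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    close(fds[0]);                                   /* count 2 -> 1 */
    int closed_after_first = peer_has_closed(fds[1]);

    close(copy);                                     /* count 1 -> 0 */
    int closed_after_last = peer_has_closed(fds[1]);

    close(fds[1]);
    return closed_after_first == 0 && closed_after_last == 1;
}
```

The same effect appears after fork(): parent and child each hold a reference to the connected socket, so the connection is only torn down once both have closed it.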
4. The TCP three-way handshake in socket calls, in detail
We know that TCP establishes a connection with a "three-way handshake", that is, an exchange of three segments. The process is roughly as follows:
- The client sends a SYN J to the server
- The server responds to the client with a SYN K and acknowledges SYN J with ACK J+1
- The client then sends an acknowledgment, ACK K+1, to the server
That is all there is to the three-way handshake, but in which socket functions does it take place? See the figure below:
Figure 1. The TCP three-way handshake in socket calls
As the figure shows, when the client calls connect(), the connection request is triggered and a SYN J packet is sent to the server, after which connect() blocks. The listening server receives the SYN J packet and replies to the client with SYN K, ACK J+1, while its accept() call remains blocked. When the client receives the server's SYN K, ACK J+1, connect() returns and the client acknowledges SYN K. When the server receives ACK K+1, accept() returns; the three-way handshake is complete and the connection is established.
Summary: the client's connect() returns on the second step of the three-way handshake, while the server's accept() returns on the third.
5. The TCP four-way handshake in socket calls, in detail
The above describes how the TCP three-way handshake proceeds in socket calls and which socket functions are involved. Now let us look at the four-way handshake that releases a connection; see the figure below:
Figure 2. The TCP four-way handshake in socket calls
The process is as follows:
- An application process first calls close to actively close the connection, whereupon its TCP sends a FIN M;
- When the other end receives FIN M, it performs a passive close and acknowledges the FIN. The receipt of the FIN is also passed to the application process as an end-of-file, because receiving a FIN means the application process will receive no more data on that connection;
- Some time later, the application process that received the end-of-file calls close to close its socket. This causes its TCP to send a FIN N as well;
- The TCP on the original sending side (the end that performed the active close) receives this FIN and acknowledges it.
So there is a FIN and an ACK in each direction.
6. An example (practice)
Enough talk; time for some hands-on practice. Let us write a simple server and client (using TCP): the server listens on local port 6666 and, when it gets a connection request, accepts it and receives the message sent by the client; the client establishes a connection with the server and sends a message.
Server-side code:
Server-Side
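The original server listing is not reproduced here. As a stand-in, the following is a minimal sketch of what such a server could look like; it is not the author's code. The function names, the MAXLINE buffer size and the backlog of 10 are all assumptions, and run_server(6666) would correspond to the port mentioned above.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define MAXLINE 4096   /* assumed receive buffer size */

/* Hypothetical helper: socket() + bind() + listen() on every local address
 * at the given port. Returns the listening descriptor, or -1 on failure. */
int make_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(fd, 10) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: accepts one client, receives one message into buf
 * (NUL-terminated, cap must be at least 1) and closes the connection.
 * Returns the number of bytes received, or -1 on error. */
ssize_t serve_one(int listen_fd, char *buf, size_t cap)
{
    int conn_fd = accept(listen_fd, NULL, NULL);
    if (conn_fd == -1)
        return -1;

    ssize_t n = recv(conn_fd, buf, cap - 1, 0);
    if (n >= 0)
        buf[n] = '\0';
    close(conn_fd);
    return n;
}

/* Iterative server loop: handles one client at a time, forever. */
void run_server(uint16_t port)
{
    char buf[MAXLINE];
    int listen_fd = make_listener(port);
    if (listen_fd == -1) {
        perror("make_listener");
        return;
    }
    for (;;) {
        if (serve_one(listen_fd, buf, sizeof buf) >= 0)
            printf("recv msg from client: %s\n", buf);
    }
}
```

Calling run_server(6666) from main() gives a server that prints each message it receives and then waits for the next client.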
Client code:
Client
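The original client listing is not reproduced here either. The following is a minimal sketch of a matching client, again not the author's code; send_message is a hypothetical helper name, and it can talk to any TCP server listening on the given address and port.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: connects to ip:port over TCP, sends msg and closes.
 * ip is a dotted-decimal IPv4 string such as "127.0.0.1".
 * Returns 0 on success, -1 on any failure. */
int send_message(const char *ip, uint16_t port, const char *msg)
{
    struct sockaddr_in srv;
    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &srv.sin_addr) != 1)
        return -1;                      /* not a valid IPv4 address */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    int rc = -1;
    /* No bind(): the kernel picks the local port during connect(). */
    if (connect(fd, (struct sockaddr *)&srv, sizeof srv) == 0) {
        size_t len = strlen(msg);
        if (send(fd, msg, len, 0) == (ssize_t)len)
            rc = 0;
    }
    close(fd);
    return rc;
}
```

With a server listening on local port 6666, send_message("127.0.0.1", 6666, "hello") would deliver one message and disconnect.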
Of course, the code above is very simple and has many shortcomings; it only demonstrates the use of the basic socket functions. But no matter how complex a network program is, it uses these same basic functions. The server above works iteratively: it handles the next client request only after it has finished handling the current one. Such a server has very weak processing capacity; a real server needs to handle requests concurrently! To do so, the server needs to fork() a new process or create a new thread to handle each request.
7. Hands-on exercise
Here is a question for readers; replies are welcome! Are you familiar with network programming under Linux? If so, write programs that do the following:
Server side:
Receive messages from the client at address 192.168.100.2; when a message such as "Client Query" arrives, print "Receive Query"
Client:
Send the messages "Client query test", "Client query" and "Client query quit", in that order, to the server at address 192.168.100.168, and then exit.
The IP addresses in the exercise can be adjusted to your actual environment.
-- This article is only a brief introduction to basic socket programming.
More complex topics require going deeper.
(Unix domain socket) Sending messages >= 128 KB using UDP reports an ENOBUFS error (a problem encountered in real socket programming; I hope it helps)
Linux Socket Programming (reposted from Wu Qin (Tyler))