Linux socket programming (byte processing)

Last Update:2018-12-05 Source: Internet

Author: User

1. Introduction
The rise of Linux is a miracle created by the Internet. Linux is free software whose source code is fully open. It is a multi-user, multitasking operating system with a sophisticated kernel, and it is compatible with various UNIX standards (such as POSIX, UNIX System V, and BSD UNIX). In China, as the Internet has spread, a community of Linux enthusiasts made up mainly of college students and ISP technicians has grown rapidly, and more and more programmers have come to love this excellent free software. This article introduces the basic concepts of sockets in Linux and the related function calls.

2. What is a socket?
A socket is a way to communicate with other programs through standard UNIX file descriptors. A socket is described by a half-association: {protocol, local address, local port}. A complete connection is described by a full association: {protocol, local address, local port, remote address, remote port}. Each socket has a unique local socket number (its descriptor) allocated by the operating system.

3. Three types of sockets
(1) Stream socket (SOCK_STREAM)
Stream sockets provide a reliable, connection-oriented communication stream. They use TCP, which ensures that data arrives correctly and in order.
(2) Datagram socket (SOCK_DGRAM)
A datagram socket provides a connectionless service. Data is transmitted in independent packets that may arrive out of order, and delivery is neither reliable nor guaranteed to be error-free. It uses the User Datagram Protocol (UDP).
(3) Raw socket (SOCK_RAW)
A raw socket allows direct access to underlying protocols such as IP or ICMP. It is powerful but inconvenient to use, and is mainly used for developing protocols.

4. Use a socket to send data
1. For a stream socket, call send() to send data.
2. For a datagram socket, package each message yourself (adding any header your application needs), and then call sendto(), which also takes the destination address, to send it.
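The two calls above can be sketched in C as follows. This is only an illustration, not code from the original article: to stay self-contained it uses socketpair() for the stream case and two UDP sockets on the loopback address for the datagram case, and the helper names send_stream_demo and send_dgram_demo are made up for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Stream case: send() on a connected socket needs no destination address.
   Returns the number of bytes the peer received, or -1 on error. */
int send_stream_demo(const char *msg)
{
    int sv[2];
    char buf[64];
    ssize_t n = -1;

    /* socketpair() yields two stream sockets that are already connected. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if (send(sv[0], msg, strlen(msg), 0) != -1)
        n = recv(sv[1], buf, sizeof buf, 0);
    close(sv[0]);
    close(sv[1]);
    return (int)n;
}

/* Datagram case: every sendto() call names the destination address.
   Returns the number of bytes received, or -1 on error. */
int send_dgram_demo(const char *msg)
{
    struct sockaddr_in dst;
    socklen_t len = sizeof dst;
    char buf[64];
    ssize_t n = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);   /* receiver */
    int tx = socket(AF_INET, SOCK_DGRAM, 0);   /* sender */

    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    dst.sin_port = 0;                          /* kernel picks a free port */

    if (rx != -1 && tx != -1 &&
        bind(rx, (struct sockaddr *)&dst, sizeof dst) == 0 &&
        getsockname(rx, (struct sockaddr *)&dst, &len) == 0 &&
        sendto(tx, msg, strlen(msg), 0,
               (struct sockaddr *)&dst, sizeof dst) != -1)
        n = recvfrom(rx, buf, sizeof buf, 0, NULL, NULL);

    if (rx != -1) close(rx);
    if (tx != -1) close(tx);
    return (int)n;
}
```

Note that the datagram sender never calls connect(); the destination travels with each sendto() call.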

5. Socket data structures in Linux
(1) struct sockaddr { // generic socket address
unsigned short sa_family; // address family, AF_xxx
char sa_data[14]; // 14 bytes of protocol address
};
(2) struct sockaddr_in { // "in" stands for Internet
short int sin_family; // address family (AF_INET)
unsigned short int sin_port; // port number, must be in network byte order
struct in_addr sin_addr; // Internet address, must be in network byte order
unsigned char sin_zero[8]; // zero padding, so the structure is the same size as struct sockaddr
};
(3) struct in_addr {
uint32_t s_addr; // 32-bit IPv4 address, in network byte order
};
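As a small sketch of how these structures are used together, the C code below fills in a struct sockaddr_in and shows the cast to the generic struct sockaddr that the socket calls expect. The helper names fill_ipv4_addr and as_generic are hypothetical and exist only for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fill *out for a dotted-decimal address and a host-order port.
   Returns 0 on success, -1 if the address string is invalid. */
int fill_ipv4_addr(struct sockaddr_in *out, const char *ip, uint16_t port)
{
    memset(out, 0, sizeof *out);             /* also clears sin_zero */
    out->sin_family = AF_INET;
    out->sin_port = htons(port);             /* stored in network byte order */
    if (inet_aton(ip, &out->sin_addr) == 0)  /* stored in network byte order */
        return -1;
    return 0;
}

/* Socket calls take the generic type, so the address is passed by cast. */
const struct sockaddr *as_generic(const struct sockaddr_in *in)
{
    return (const struct sockaddr *)in;
}
```

The cast is safe because both structures start with the address family and have the same size.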

6. Network byte order and its conversion functions
(1) Network byte order
Different machines store the bytes of a multi-byte value in different orders, but data transmitted over the network must use one agreed order: network byte order, which is big-endian. Machines whose internal byte order differs from network byte order must therefore convert the data. For portability, call the conversion functions before transmitting data even if the local host byte order is the same as network byte order, so that the program still runs correctly after being ported to another machine. Whether any bytes are actually swapped is decided by the functions themselves.
(2) Conversion functions
* uint16_t htons(uint16_t hostshort):
Converts a 16-bit (2-byte) unsigned integer from host byte order to network byte order.
* uint32_t htonl(uint32_t hostlong):
Converts a 32-bit (4-byte) unsigned integer from host byte order to network byte order.
* uint16_t ntohs(uint16_t netshort):
Converts a 16-bit (2-byte) unsigned integer from network byte order to host byte order.
* uint32_t ntohl(uint32_t netlong):
Converts a 32-bit (4-byte) unsigned integer from network byte order to host byte order.
Note: These functions are declared in netinet/in.h (and in arpa/inet.h).
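To make the effect of the conversion functions concrete, here is a short C sketch that inspects which byte comes first in memory after htons() and htonl(). The helper names are invented for this example.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if this host stores the least significant byte first. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Returns the byte that goes on the wire first for a 16-bit value. */
unsigned char first_wire_byte16(uint16_t host_value)
{
    uint16_t net = htons(host_value);
    unsigned char bytes[2];

    memcpy(bytes, &net, sizeof bytes);
    return bytes[0];
}

/* Same idea for a 32-bit value and htonl(). */
unsigned char first_wire_byte32(uint32_t host_value)
{
    uint32_t net = htonl(host_value);
    unsigned char bytes[4];

    memcpy(bytes, &net, sizeof bytes);
    return bytes[0];
}
```

Whatever the host byte order is, the most significant byte comes first after conversion, which is why the same source code works on both kinds of machine.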

7. IP address conversion
There are three functions for converting between IP addresses written as dotted-decimal strings and 32-bit binary IP addresses in network byte order.
(1) in_addr_t inet_addr(const char *cp): converts an IP address string in dotted-decimal notation into a 32-bit unsigned integer in network byte order. For example:
struct sockaddr_in ina;
ina.sin_addr.s_addr = inet_addr("202.206.17.101");
If the call succeeds, the converted address is returned. If it fails, the constant INADDR_NONE is returned. INADDR_NONE equals -1, and the unsigned value -1 has all bits set, which is the same as 255.255.255.255, the broadcast address. A failure therefore cannot be told apart from a successful conversion of the broadcast address, so a program that calls inet_addr() must handle this case itself. Because inet_addr() cannot handle the broadcast address, programs should use inet_aton() instead.
(2) int inet_aton(const char *cp, struct in_addr *inp): converts an IP address string into a binary IP address. It returns non-zero on success and 0 on failure. The converted address is stored in the structure pointed to by inp.
(3) char *inet_ntoa(struct in_addr in): converts a 32-bit binary IP address into a dotted-decimal string. The return value is a pointer to that string, which is held in a static buffer and overwritten by the next call.
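The following C sketch exercises the three functions and shows the broadcast-address ambiguity of inet_addr() described above. The wrapper names parse_ipv4, format_ipv4, and inet_addr_is_ambiguous are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Parse a dotted-decimal string. Returns 1 on success, 0 on failure. */
int parse_ipv4(const char *text, struct in_addr *out)
{
    return inet_aton(text, out) != 0;
}

/* Convert back to text. inet_ntoa() returns a static buffer, so the
   result is copied into caller-owned storage right away. */
void format_ipv4(struct in_addr addr, char *dst, size_t dst_len)
{
    snprintf(dst, dst_len, "%s", inet_ntoa(addr));
}

/* Returns 1 if the text is a valid address that inet_addr() nevertheless
   reports as INADDR_NONE, i.e. looks like a failure. */
int inet_addr_is_ambiguous(const char *text)
{
    struct in_addr tmp;

    return inet_addr(text) == INADDR_NONE && parse_ipv4(text, &tmp);
}
```

Only 255.255.255.255 triggers the ambiguity, since it is the one valid address whose binary form equals INADDR_NONE.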

8. Byte processing functions
A socket address is multi-byte data that is not terminated by a null character, unlike a C string. Linux provides two groups of functions for handling such data. One group starts with b (byte) and comes from BSD. The other group starts with mem (memory) and is provided by ANSI C.
Functions starting with b include:
(1) void bzero(void *s, size_t n): sets the first n bytes of the memory pointed to by s to 0. It is usually used to clear a socket address structure.
(2) void bcopy(const void *src, void *dest, size_t n): copies n bytes from the memory area pointed to by src to the memory area pointed to by dest.
(3) int bcmp(const void *s1, const void *s2, size_t n): compares the first n bytes of the memory areas pointed to by s1 and s2. It returns 0 if they are the same and non-zero otherwise.
Note: The prototypes of these functions are declared in strings.h.
Functions starting with mem include:
(1) void *memset(void *s, int c, size_t n): sets the first n bytes of the memory area pointed to by s to the value of c.
(2) void *memcpy(void *dest, const void *src, size_t n): does the same job as bcopy(), with two differences. The source and destination arguments are in the opposite order, and bcopy() copies correctly when the src and dest regions overlap, whereas memcpy() does not (use memmove() for overlapping regions).
(3) int memcmp(const void *s1, const void *s2, size_t n): compares the first n bytes of the memory areas pointed to by s1 and s2. It returns 0 if they are the same and non-zero otherwise.
Note: The prototypes of these functions are declared in string.h.
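A brief C sketch of both families follows. It clears a socket address with bzero() and with memset() and compares the results, and it uses memmove() for an overlapping copy, which is the case memcpy() does not handle. The function names are made up for illustration.

```c
#include <netinet/in.h>
#include <string.h>
#include <strings.h>

/* Clear a socket address both ways and check that the results agree.
   Returns 1 if bzero() and memset() left identical bytes. */
int zeroing_is_equivalent(void)
{
    struct sockaddr_in a, b;

    memset(&a, 0xFF, sizeof a);          /* start from non-zero bytes */
    memset(&b, 0xFF, sizeof b);
    bzero(&a, sizeof a);                 /* BSD style */
    memset(&b, 0, sizeof b);             /* ANSI C style */
    return memcmp(&a, &b, sizeof a) == 0 && bcmp(&a, &b, sizeof a) == 0;
}

/* Drop the first n bytes of buf (which holds len bytes) by sliding the
   rest to the front. Source and destination overlap, so memmove() is
   used instead of memcpy(). Returns the number of bytes that remain. */
size_t drop_prefix(unsigned char *buf, size_t len, size_t n)
{
    if (n >= len)
        return 0;
    memmove(buf, buf + n, len - n);
    return len - n;
}
```

New code usually prefers the mem functions, since the b functions are kept mainly for compatibility with older BSD programs.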

9. Basic socket functions
(1) Function socket()
#include <sys/types.h>
#include <sys/socket.h>
int socket(int domain, int type, int protocol);
The domain parameter specifies the protocol family of the socket to be created. It can take the following values:
AF_UNIX // UNIX domain protocol family, used for communication between processes on the same machine
AF_INET // Internet protocol family (TCP/IP)
AF_ISO // ISO protocol family
The type parameter specifies the socket type. It can take the following values:
SOCK_STREAM // stream socket, connection-oriented and reliable
SOCK_DGRAM // datagram socket, connectionless and unreliable
SOCK_RAW // raw socket, valid only for the Internet protocol family, gives direct access to the IP protocol
The protocol parameter is usually set to 0, which selects the default protocol. For example, in the Internet protocol family a stream socket uses TCP and a datagram socket uses UDP. When the socket type is SOCK_RAW, you must specify the protocol parameter, because raw sockets can be used with several protocols, such as ICMP and IGMP.
In Linux, creating a socket mainly means creating a socket data structure in the kernel and returning a socket descriptor that identifies it. The socket data structure holds information about the connection, such as the peer address, the TCP state, and the send and receive buffers. TCP manages the connection based on the contents of this data structure.
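As a minimal sketch of the call, the C function below creates an AF_INET socket with protocol 0 and then asks the kernel, through the SO_TYPE socket option, which type it actually created. The function name created_socket_type is an assumption of this example.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an AF_INET socket of the requested type with the default
   protocol, query its type, then close it.
   Returns the type reported by the kernel, or -1 on error. */
int created_socket_type(int type)
{
    int actual = -1;
    socklen_t len = sizeof actual;
    int fd = socket(AF_INET, type, 0);   /* 0 selects TCP or UDP */

    if (fd == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &actual, &len) == -1)
        actual = -1;
    close(fd);                           /* descriptors are a limited resource */
    return actual;
}
```

Calling it with SOCK_RAW usually fails for unprivileged processes, because raw sockets require extra privileges on Linux.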
(2) Function connect()
#include <sys/types.h>
#include <sys/socket.h>
int connect(int sockfd, const struct sockaddr *servaddr, socklen_t addrlen);
The sockfd parameter is the socket descriptor returned by socket(). The servaddr parameter specifies the socket address of the remote server, including the server's IP address and port number. The addrlen parameter specifies the length of the socket address. On success, 0 is returned. On failure, -1 is returned and the global variable errno is set to an error type such as ETIMEDOUT, ECONNREFUSED, EHOSTUNREACH, or ENETUNREACH.
Before calling connect(), the client must fill in the socket address of the server process. The client generally does not need to specify its own socket address (IP address and port number). The system automatically selects an unused port number from the ephemeral port range (1024 to 5000 on traditional BSD systems; on Linux the range is configured in /proc/sys/net/ipv4/ip_local_port_range) and fills in the socket address with that port number and the local IP address.
The client calls connect() to establish a connection. The call starts TCP's three-way handshake and returns once the connection is established or an error occurs. Possible connection errors include:
(1) If the client's TCP receives no acknowledgment of its SYN segment, connect() returns an error of type ETIMEDOUT. Generally, when a SYN segment goes unacknowledged, TCP retransmits it several times, and connect() returns the error only after all of these attempts have failed.
Note: SYN (synchronize) bit: requests a connection. TCP uses a segment with this bit set to ask the peer TCP to establish a connection. In this segment, TCP tells the peer the initial sequence number it has chosen and negotiates the maximum segment size. The sequence number of the SYN segment is the initial sequence number, and the SYN segment can be acknowledged. When TCP receives the acknowledgment of this segment, the TCP connection is established.
(2) If the remote TCP returns an RST segment, connect() immediately returns an error of type ECONNREFUSED. When no server process is waiting for connections on the destination port named in the SYN segment, the remote machine's TCP sends an RST segment to report this to the client. After receiving the RST segment, the client's TCP stops sending SYN segments and connect() returns the error immediately.
Note: RST (reset) bit: requests that the connection be reset. When TCP receives a segment that it cannot process, it sends a segment with this bit set to the peer TCP, indicating that the connection identified by the segment is in error and asking the peer to tear it down. TCP may send an RST segment in three cases: (1) no process is waiting on the destination port named in a SYN segment; (2) TCP wants to abort an existing connection; (3) TCP receives a segment for a connection that does not exist. The TCP that receives an RST segment aborts the connection immediately and reports an error to the application.
(3) If the client's SYN segment causes a router to generate an ICMP "destination unreachable" message, connect() returns an error of type EHOSTUNREACH or ENETUNREACH. Generally, TCP records the ICMP message when it arrives and keeps retransmitting the SYN segment. Only after all the attempts have failed does TCP check the recorded ICMP message and make connect() return the error.
Note: ICMP: Internet Control Message Protocol. The operation of the Internet is controlled mainly by routers, which send and receive IP packets. If an error occurs while forwarding a packet, the router reports it with ICMP. ICMP messages are carried in the data section of IP packets in the following format:
Bits 0-7: Type
Bits 8-15: Code
Bits 16-31: Checksum
Remaining bytes: Data
Type: identifies the ICMP message type.
Code: gives further information about the ICMP message.
Checksum: a checksum over the entire ICMP message.
ICMP messages fall into the following types:
(1) Destination unreachable: a. the destination host is not running; b. the destination address does not exist; c. the routing table has no entry for the destination address, so the router cannot find a route to the destination host.
(2) Time exceeded: a router subtracts 1 from the TTL field of each IP packet it receives. If the field reaches 0, the router discards the packet and sends this ICMP message.
(3) Parameter problem: sent when an IP packet contains an invalid field.
(4) Redirect: notifies the host of a new route.
(5) Echo request and echo reply: these two messages test whether a destination host is reachable. The requester sends an echo request to the destination host, which returns an echo reply when it receives the request.
(6) Timestamp request and timestamp reply: ICMP uses these two messages to obtain the current clock time from another machine.
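To illustrate the Type, Code, and Checksum fields, the C sketch below builds the 8-byte header of an echo request and computes its checksum with the Internet checksum algorithm from RFC 1071. The identifier and sequence fields come from the ICMP specification rather than from this article, and the function names inet_checksum and build_echo_request are hypothetical. The sketch only fills a buffer and sends nothing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Internet checksum (RFC 1071): one's-complement sum of 16-bit
   big-endian words, folded to 16 bits and inverted.
   Intended for packet-sized inputs (well under 64 KB). */
uint16_t inet_checksum(const unsigned char *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (i < len)                         /* odd length: pad with a zero byte */
        sum += (uint32_t)(data[i] << 8);
    while (sum >> 16)                    /* fold the carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Build an 8-byte ICMP echo request header (type 8, code 0) into out.
   id and seq are host-order values chosen by the caller. */
void build_echo_request(unsigned char out[8], uint16_t id, uint16_t seq)
{
    uint16_t sum;

    memset(out, 0, 8);
    out[0] = 8;                          /* Type: echo request */
    out[1] = 0;                          /* Code */
    out[4] = (unsigned char)(id >> 8);   /* identifier, big-endian */
    out[5] = (unsigned char)(id & 0xFF);
    out[6] = (unsigned char)(seq >> 8);  /* sequence number, big-endian */
    out[7] = (unsigned char)(seq & 0xFF);
    sum = inet_checksum(out, 8);         /* computed while the field is 0 */
    out[2] = (unsigned char)(sum >> 8);
    out[3] = (unsigned char)(sum & 0xFF);
}
```

A receiver verifies a message by running the same checksum over all of it, including the checksum field; a result of 0 means the message is intact.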

When the client's TCP sends the SYN segment, its TCP state changes from CLOSED to SYN_SENT. After it receives the acknowledgment of the SYN segment, the state changes to ESTABLISHED and connect() returns successfully. If connect() fails, close the socket descriptor; it cannot be used to call connect() again.
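The behaviour described above can be sketched in C as follows: the helper connects to a server, and on failure it closes the descriptor and preserves errno so that the caller can tell the error cases apart. The names tcp_connect and connect_error_text, and the wording of the messages, are assumptions of this example.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to connect a new TCP socket to ip:port (port in host order).
   Returns the connected descriptor, or -1 with errno set. */
int tcp_connect(const char *ip, uint16_t port)
{
    struct sockaddr_in srv;
    int fd, saved;

    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_port = htons(port);
    if (inet_aton(ip, &srv.sin_addr) == 0) {
        errno = EINVAL;
        return -1;
    }
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    if (connect(fd, (struct sockaddr *)&srv, sizeof srv) == -1) {
        saved = errno;       /* close() may overwrite errno */
        close(fd);           /* a socket whose connect() failed is not reused */
        errno = saved;
        return -1;
    }
    return fd;
}

/* Map the errno values discussed above to short descriptions. */
const char *connect_error_text(int err)
{
    switch (err) {
    case ETIMEDOUT:    return "no reply to SYN";
    case ECONNREFUSED: return "RST received, no listener on that port";
    case EHOSTUNREACH: return "ICMP: host unreachable";
    case ENETUNREACH:  return "ICMP: network unreachable";
    default:           return strerror(err);
    }
}
```

A caller that wants to retry after a failure simply calls tcp_connect() again, which creates a fresh socket each time.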

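Note: TCP state transitions, written as a list; the state names follow the standard TCP state diagram. Each entry gives the event, with the action taken in parentheses:
CLOSED -> LISTEN: passive open (create TCB)
CLOSED -> SYN_SENT: active open (create TCB, send SYN)
LISTEN -> CLOSED: close (delete TCB)
LISTEN -> SYN_RCVD: receive SYN (send SYN, ACK)
LISTEN -> SYN_SENT: send (send SYN)
SYN_SENT -> SYN_RCVD: receive SYN (send ACK)
SYN_SENT -> ESTABLISHED: receive SYN, ACK (send ACK)
SYN_RCVD -> ESTABLISHED: receive ACK of SYN (no action)
SYN_RCVD -> FIN_WAIT_1: close (send FIN)
ESTABLISHED -> FIN_WAIT_1: close (send FIN)
ESTABLISHED -> CLOSE_WAIT: receive FIN (send ACK)
FIN_WAIT_1 -> FIN_WAIT_2: receive ACK of FIN (no action)
FIN_WAIT_1 -> CLOSING: receive FIN (send ACK)
CLOSE_WAIT -> LAST_ACK: close (send FIN)
FIN_WAIT_2 -> TIME_WAIT: receive FIN (send ACK)
CLOSING -> TIME_WAIT: receive ACK of FIN (no action)
LAST_ACK -> CLOSED: receive ACK of FIN (no action)
TIME_WAIT -> CLOSED: 2MSL timeout (delete TCB)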
(3) Function bind()
The bind function binds a local address to a socket. It is defined as follows:
#include <sys/types.h>
#include <sys/socket.h>
int bind(int sockfd, const struct sockaddr *myaddr, socklen_t addrlen);
The sockfd parameter is the socket descriptor returned by socket(), the myaddr parameter is the local address, and the addrlen parameter is the length of the socket address structure. On success, 0 is returned. On failure, -1 is returned and the global variable errno is set to an error type such as EADDRINUSE.
Both servers and clients can call bind() to bind a socket address, but in general it is the server that calls bind() to bind its own well-known port number. The IP address and port number can be combined in the following ways when binding:
Table 1
Program type | IP address       | Port number    | Description
Server       | INADDR_ANY       | Non-zero value | Specifies the server's well-known port number
Server       | Local IP address | Non-zero value | Specifies the server's IP address and well-known port number
Client       | INADDR_ANY       | Non-zero value | Specifies the client's port number
Client       | Local IP address | Non-zero value | Specifies the client's IP address and port number
Client       | Local IP address | Zero           | Specifies the client's IP address
They are described as follows:
(1) The server specifies the well-known port number but not the IP address: when calling bind(), the server sets the socket's IP address to the special value INADDR_ANY, indicating that it is willing to accept client connections arriving on any network interface. This is the most common way for a server to bind.
(2) The server specifies both the well-known port number and the IP address: if the server sets the socket's IP address to a local IP address when calling bind(), the machine accepts client connections only on the network interface that has this IP address. On a server with multiple network cards, this can be used to restrict where the server accepts connections from.
(3) The client specifies its port number: normally the client does not specify the port number of its socket address when calling connect(). The system automatically selects an unused port number and fills in the socket address with it and a local IP address. Sometimes, however, the client needs a specific port number (such as a reserved port), and the system never assigns a reserved port automatically. The client must then call bind() to bind an unused reserved port.
(4) The client specifies both its IP address and port number: the client communicates through the specified network interface and port number.
(5) The client specifies only its IP address: the client communicates through the specified network interface, and the system automatically selects an unused port number for it. This is generally used only when the host has multiple network interfaces.
We generally do not use a fixed port number on the client unless it is required. A fixed client port has the following disadvantages:
(1) The server closes the connection actively: the server ends up in the TIME_WAIT state. When the client connects to the server again it uses the same client port, so the new connection has exactly the same socket pair as the previous one. Because the previous connection is still in the TIME_WAIT state, the connection request is rejected, and connect() returns an error of type ECONNREFUSED.
(2) The client closes the connection actively: the client ends up in the TIME_WAIT state. If the client program is run again immediately, it tries to bind the same fixed port number, but the previous connection is still in the TIME_WAIT state and has not disappeared. The system finds that the port number is still in use, so the bind fails: bind() returns an error of type EADDRINUSE.
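The EADDRINUSE failure is easy to reproduce without waiting for TIME_WAIT. The C sketch below binds one socket to a port chosen by the kernel and then tries to bind a second socket to the same address and port. It uses the loopback address so that it can run anywhere, and the helper names bind_loopback and bound_port are made up for this example.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a new TCP socket to the loopback address and the given port
   (host order; 0 lets the kernel choose an unused port).
   Returns the descriptor, or -1 with errno set. */
int bind_loopback(uint16_t port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int saved;

    if (fd == -1)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        saved = errno;       /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Port that a bound socket actually got, in host order; 0 on error. */
uint16_t bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    if (getsockname(fd, (struct sockaddr *)&addr, &len) == -1)
        return 0;
    return ntohs(addr.sin_port);
}
```

For example, after first = bind_loopback(0), a call to bind_loopback(bound_port(first)) fails with errno set to EADDRINUSE until first is closed. Replacing INADDR_LOOPBACK with INADDR_ANY gives the usual server-style binding from case (1).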
(4) Function listen()
The listen function converts a socket into a listening socket. It is defined as follows:
#include <sys/socket.h>
int listen(int sockfd, int backlog);
The sockfd parameter specifies the socket descriptor to be converted. The backlog parameter sets the maximum length of the request queue. On success, 0 is returned. On failure, -1 is returned. The listen function does two things:
(1) It converts an unconnected active socket (a socket created by socket() that can initiate connections but cannot accept connection requests) into a passive socket. After listen() has run, the server's TCP state changes from CLOSED to LISTEN.
(2) It sets up the queue in which TCP holds arriving connection requests. The second parameter of listen() specifies the maximum length of this queue.
Note: The backlog parameter works as follows:
TCP maintains two queues for each listening socket:
(1) Incomplete connection queue: each TCP connection that has not yet completed the three-way handshake occupies one entry. After receiving a client's SYN segment, TCP creates a new entry in this queue, acknowledges the client's SYN and sends its own SYN (a SYN+ACK segment), and waits for the client to acknowledge that SYN. The socket is then in the SYN_RCVD state. The entry stays in this queue until the client's acknowledgment arrives or the connection times out.
(2) Completed connection queue: each TCP connection that has completed the three-way handshake but has not yet been accepted by the application (by calling accept()) occupies one entry. When a connection in the incomplete connection queue receives the acknowledgment of its SYN segment, the three-way handshake is complete and TCP moves the entry from the incomplete connection queue to the completed connection queue. The socket is then in the ESTABLISHED state. The entry stays in this queue until the application calls accept() to take it.
The backlog parameter specifies the maximum length of the completed connection queue of a listening socket, that is, the maximum number of established connections that can wait to be accepted. If the completed connection queue is full when a client's SYN segment arrives, TCP ignores the SYN segment. TCP does not send an RST segment for a SYN segment that it cannot accept, because the client's TCP will retransmit the SYN.
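As a minimal sketch of the socket(), bind(), listen() sequence, the C code below creates a listening socket bound to INADDR_ANY and checks its state through the SO_ACCEPTCONN socket option. The names make_listener and is_listening are hypothetical.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket, bind it to INADDR_ANY:port and start listening.
   port is in host order; 0 lets the kernel choose one.
   Returns the listening descriptor, or -1 on error. */
int make_listener(uint16_t port, int backlog)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* any local interface */
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(fd, backlog) == -1) {            /* CLOSED -> LISTEN */
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 1 if fd is a listening socket, 0 if not, -1 on error. */
int is_listening(int fd)
{
    int flag = 0;
    socklen_t len = sizeof flag;

    if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &flag, &len) == -1)
        return -1;
    return flag != 0;
}
```

A socket that has only been created with socket() reports 0 here; only after listen() does the kernel treat it as a passive socket.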
(5) Function accept()
The accept function takes an established TCP connection from the completed connection queue of a listening socket. If the queue is empty, the process sleeps until a connection arrives.
#include <sys/socket.h>
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
The sockfd parameter specifies the listening socket descriptor. The addr parameter is a pointer to an Internet socket address structure, and the addrlen parameter is a pointer to an integer variable. On success, three results are returned: the return value of the function is a new socket descriptor that identifies the accepted connection; the client's address is stored in the structure pointed to by addr; and the length of the client's address is stored in the integer variable pointed to by addrlen. On failure, -1 is returned.
The listening socket exists to receive client connection requests and complete the three-way handshake, so TCP cannot use the listening socket descriptor to identify an individual connection. Instead, TCP creates a new socket to identify each accepted connection and returns its descriptor to the application. There are now two sockets: the listening socket that was passed to accept(), and the connected socket that accept() returned. A server usually creates only one listening socket, uses it for the whole lifetime of the server process to receive connection requests from all clients, and closes it before the process ends. For each accepted connection, TCP creates a new connected socket; the server uses it to communicate with the client and closes it when the client's request has been served.
If the process catches a signal while accept() is blocked waiting for an established connection, accept() returns an error of type EINTR. In that case, call accept() again to receive the connection.
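The EINTR rule and the split between the listening socket and the connected socket can be sketched in C like this. The demo runs entirely on the loopback interface inside one process; the names accept_retry and accept_demo are assumptions of this example.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* accept() that is called again when a signal interrupts it (EINTR). */
int accept_retry(int listen_fd, struct sockaddr_in *peer)
{
    socklen_t len;
    int fd;

    do {
        len = sizeof *peer;              /* value-result argument */
        fd = accept(listen_fd, (struct sockaddr *)peer, &len);
    } while (fd == -1 && errno == EINTR);
    return fd;
}

/* Loopback demo: one listener, one client, one accepted connection.
   Returns 1 if the accepted peer address is 127.0.0.1, 0 otherwise. */
int accept_demo(void)
{
    struct sockaddr_in addr, peer;
    socklen_t len = sizeof addr;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);   /* listening socket */
    int cfd = socket(AF_INET, SOCK_STREAM, 0);   /* client socket */
    int afd = -1, ok = 0;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                   /* kernel picks a free port */

    if (lfd != -1 && cfd != -1 &&
        bind(lfd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        listen(lfd, 5) == 0 &&
        getsockname(lfd, (struct sockaddr *)&addr, &len) == 0 &&
        connect(cfd, (struct sockaddr *)&addr, sizeof addr) == 0)
        afd = accept_retry(lfd, &peer);  /* new connected socket */

    if (afd != -1)
        ok = peer.sin_addr.s_addr == htonl(INADDR_LOOPBACK);

    if (afd != -1) close(afd);           /* done with this client */
    if (cfd != -1) close(cfd);
    if (lfd != -1) close(lfd);           /* listener closed last */
    return ok;
}
```

The client's connect() can return before accept() is called because the kernel completes the handshake and parks the connection in the completed connection queue.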
(6) Function close()
The close function closes a socket descriptor. It is defined as follows:
#include <unistd.h>
int close(int sockfd);
On success, 0 is returned. On failure, -1 is returned. As with file descriptors, close() decrements the reference count of the socket descriptor by 1. If the count is still greater than 0, other processes still reference the socket and close() simply returns. If the count reaches 0, the socket teardown begins and close() still returns immediately.
After close() is called, the process can no longer access the socket, but TCP continues to use it: TCP delivers any unsent data to the peer, then sends the FIN segment and carries out the close sequence. The socket is not deleted until the TCP connection has been completely closed.
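The reference counting can be observed inside a single process as well, because dup() creates a second descriptor for the same socket. The C sketch below is only an illustration under that assumption (the text above talks about several processes); it uses a UNIX domain socketpair() so that it needs no network, and the function name is made up.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if a socket stays usable through a duplicate descriptor
   after the original descriptor is closed, and the peer sees end of
   file only after the last descriptor is closed. Returns 0 otherwise. */
int socket_survives_one_close(void)
{
    int sv[2], copy;
    char c = 0;
    int ok = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return 0;
    copy = dup(sv[0]);                   /* two references to one socket */
    if (copy != -1) {
        close(sv[0]);                    /* one reference left: still open */
        ok = write(copy, "x", 1) == 1 &&
             read(sv[1], &c, 1) == 1 && c == 'x';
        close(copy);                     /* last reference: socket closes */
        ok = ok && read(sv[1], &c, 1) == 0;   /* peer now sees end of file */
    } else {
        close(sv[0]);
    }
    close(sv[1]);
    return ok;
}
```

The same effect appears after fork(): parent and child each hold a reference, and the connection is closed only when both have called close().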
(7) Functions read() and write()
These functions read data from and write data to a socket. They are defined as follows:
ssize_t read(int fd, void *buf, size_t len);
ssize_t write(int fd, const void *buf, size_t len);
On success, the number of bytes read or written is returned. On failure, -1 is returned.
Each TCP socket has two buffers, the socket send buffer and the socket receive buffer, which handle sending and receiving respectively. Reading from and writing to the network is done by TCP in the kernel. TCP stores data received from the network in the receive buffer of the corresponding socket, where it waits for the user to call a function that copies it to the user buffer. Data to be sent is copied by the user into the send buffer of the corresponding socket, and TCP then transmits it according to its own algorithms.
The read and write functions can be used on sockets just as on files. read() copies data from the socket receive buffer to the user buffer. When the receive buffer contains data: (1) if the amount of readable data is greater than the amount requested, read() returns the amount given by the len parameter; (2) if the amount of readable data is smaller than the amount requested, read() returns immediately with the amount actually available, without waiting for all the requested data to arrive. When no data is readable, read() blocks until data arrives.
When TCP receives a FIN segment, this acts as an end-of-file marker for read operations. read() then returns 0, and all subsequent reads on the socket also return 0, just as at the end of an ordinary file.
When TCP receives an RST segment, the connection has encountered an error. read() returns an error of type ECONNRESET, and all subsequent reads on the socket also return an error (a value less than 0).
write() copies data from the user buffer to the socket send buffer. When the send buffer has enough free space for all the user data, write() copies the data into it and returns the number of bytes copied. If the free space is smaller than the size given by the len parameter, write() blocks until the buffer has enough space.
An RST segment arrives, for example, when the peer has already closed the connection and the local side keeps sending data on the socket. If the process writes to the socket after TCP has received the RST segment, the process is sent the SIGPIPE signal, and if that signal is caught or ignored, write() returns an error of type EPIPE. Later writes on this socket fail in the same way.
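Because read() may return fewer bytes than requested and returns 0 at end of file, programs usually wrap both calls in loops. The C sketch below shows one common way to do this; the helper names write_all, read_full, and read_write_demo are hypothetical, and the demo uses socketpair() so that it needs no network.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes unless an error occurs.
   Returns len on success, -1 on error. */
ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n == -1) {
            if (errno == EINTR)
                continue;                /* interrupted: try again */
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)len;
}

/* Read until len bytes have arrived, the peer closes (FIN), or an
   error occurs. Returns the number of bytes read (0 if the peer had
   already closed), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n == 0)
            break;                       /* end of file */
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}

/* Demo on a connected pair: returns 1 if 5 bytes make the round trip
   and a read after the writer has closed returns 0. */
int read_write_demo(void)
{
    int sv[2];
    char buf[8];
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return 0;
    ok = write_all(sv[0], "hello", 5) == 5 &&
         read_full(sv[1], buf, 5) == 5 &&
         buf[0] == 'h' && buf[4] == 'o';
    close(sv[0]);                        /* reader now sees end of file */
    ok = ok && read_full(sv[1], buf, sizeof buf) == 0;
    close(sv[1]);
    return ok;
}
```

Programs that write to sockets often also ignore SIGPIPE, so that a write to a reset connection is reported through the return value instead of terminating the process.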
(8) functions getsockname () and getpeername ()
The getsockname function returns the local address of the socket. The getpeername function returns the remote address of the socket.
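As a sketch of how the two calls relate, the C code below sets up a loopback connection and checks that each side's local address, as reported by getsockname(), is the other side's remote address, as reported by getpeername(). The function names are invented for this example.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if two IPv4 socket addresses have the same IP and port. */
static int same_endpoint(const struct sockaddr_in *a,
                         const struct sockaddr_in *b)
{
    return a->sin_addr.s_addr == b->sin_addr.s_addr &&
           a->sin_port == b->sin_port;
}

/* Loopback demo: returns 1 if the client's local address equals the
   address the server sees as its peer, and vice versa. */
int endpoint_names_match(void)
{
    struct sockaddr_in srv, c_local, c_peer, a_local, a_peer;
    socklen_t len = sizeof srv;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    int afd = -1, ok = 0;

    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks */

    if (lfd != -1 && cfd != -1 &&
        bind(lfd, (struct sockaddr *)&srv, sizeof srv) == 0 &&
        listen(lfd, 1) == 0 &&
        getsockname(lfd, (struct sockaddr *)&srv, &len) == 0 &&
        connect(cfd, (struct sockaddr *)&srv, sizeof srv) == 0)
        afd = accept(lfd, NULL, NULL);

    if (afd != -1) {
        socklen_t l1 = sizeof c_local, l2 = sizeof c_peer;
        socklen_t l3 = sizeof a_local, l4 = sizeof a_peer;

        ok = getsockname(cfd, (struct sockaddr *)&c_local, &l1) == 0 &&
             getpeername(cfd, (struct sockaddr *)&c_peer, &l2) == 0 &&
             getsockname(afd, (struct sockaddr *)&a_local, &l3) == 0 &&
             getpeername(afd, (struct sockaddr *)&a_peer, &l4) == 0 &&
             same_endpoint(&c_local, &a_peer) &&
             same_endpoint(&c_peer, &a_local);
        close(afd);
    }
    if (cfd != -1) close(cfd);
    if (lfd != -1) close(lfd);
    return ok;
}
```

getsockname() is also the usual way for a client to learn which port the system chose for it after connect().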

10. Conclusion
Network programming relies on sockets to send and receive information. This article has described the basic concepts of sockets in Linux, the sockets API, and the TCP knowledge related to sockets.

This article comes from the CSDN blog. Please credit the source when reproducing it: http://blog.csdn.net/hairetz/archive/2009/05/29/4223222.aspx
