Linux Programming Design-sockets

Last Update:2016-10-22 Source: Internet

Author: User

Sockets

Sockets are another way for processes to communicate. The IPC mechanisms covered so far can share resources only within a single computer system. The socket interface allows a process on one machine to communicate with a process on another machine.

What is a socket

Sockets are a communication mechanism that lets a client/server system run either on one local machine or across a network.
Like pipes, sockets are read and written through file descriptors. The difference is that a socket explicitly separates the client from the server, and the socket mechanism lets multiple clients connect to a single server.

Socket connections

First, the server application calls socket to create a socket. This is a resource, similar to a file descriptor, that the system assigns to the server process; it cannot be shared with other processes. PS: This applies to separate processes; threads within the same process share its descriptors.
Next, the server process gives the socket a name. A local socket's name is a file name in the Linux file system. A network socket's name is a service identifier (a port number or access point) relevant to the particular network that clients connect to. This identifier lets Linux route incoming connections for a given port number to the correct server process. For example, a web server typically creates a socket on port 80, an identifier reserved for that purpose. A web browser knows to use port 80 for the HTTP connection to the website the user wants to visit.
The bind system call names the socket (for a local socket, it associates the socket with a file). The server process then waits for a client to connect to the named socket. The listen system call creates a queue that holds incoming client connection requests, and the server accepts a client's connection with the accept system call. When the server calls accept, a new socket is created that is distinct from the original named socket. This new socket is used only to communicate with that particular client, while the named socket remains available to handle connections from other clients.
The client side of a socket-based system is simpler. The client first calls socket to create an unnamed socket, then calls connect with the server's named socket as the address to establish a connection.
Once the connection is established, the socket can be used like a low-level file descriptor for bidirectional data communication.

Socket properties

A socket is characterized by three attributes: domain, type, and protocol.

Socket domain
The domain specifies the network medium used for socket communication.
- The most common socket domain is AF_INET, which refers to Internet networking. Many Linux LANs use this kind of network.
- Another domain is the UNIX file system domain, AF_UNIX, which can be used even by sockets on a computer that is not networked. The underlying protocol for this domain is file input/output, and its addresses are file names. When you run a program that uses it, you can see the address (the socket file) in the current directory.
- A server computer may run more than one service. Clients can select a specific service on a machine through an IP port. Inside the system, a port is identified by a unique 16-bit integer; outside the system, a service is identified by the combination of IP address and port number. The socket acts as the endpoint of the communication and must be bound to a port before communication starts. Well-known services usually have assigned ports, such as FTP (21) and httpd (80). Do not choose ports arbitrarily, or you may collide with a port that is already in use. In general, port numbers below 1024 are reserved for system services.
Socket type
A socket domain may offer several different ways of communicating, each with its own characteristics. AF_UNIX domain sockets do not raise this issue, because they provide a reliable two-way communication path. In network domains, however, we need to be aware of the characteristics of the underlying network and how they affect the different communication mechanisms.
Internet protocols provide two communication mechanisms with different levels of service: streams and datagrams.
- Stream sockets
  A stream socket (in some respects similar to a standard input/output stream) provides an ordered, reliable, bidirectional byte-stream connection. PS: TCP uses sequence numbers to ensure that large messages are fragmented, transmitted, and reassembled. This is much like a file stream, which accepts large amounts of data and writes it to the underlying disk in small chunks.
  Stream sockets are specified by the type SOCK_STREAM and are implemented in the AF_INET domain by TCP/IP connections.
  
  TCP/IP stands for Transmission Control Protocol/Internet Protocol. IP is the low-level protocol for packets; it provides routing from one computer to another across the network. TCP provides sequencing, flow control, and retransmission, so that large data transfers either arrive complete at the destination or an appropriate error condition is reported.
- Datagram sockets
  In contrast to a stream socket, a datagram socket, specified by the type SOCK_DGRAM, does not establish and maintain a connection. There is a limit on the size of a datagram that can be sent. Each datagram is transmitted as a single network message that may be lost, duplicated, or arrive out of order.
  Datagram sockets are implemented in the AF_INET domain by UDP/IP and provide an unordered, unreliable service. They are relatively inexpensive in terms of resources, however, because no network connection needs to be maintained, and they are fast because no time is spent setting up a connection.
  Datagrams suit "single-shot" queries to information services, such as providing regular status information or performing low-priority logging. A server crash causes the client little inconvenience and does not require the client to restart.
Socket protocol
For now, we simply use the default protocol.

Creating sockets

The socket system call creates a socket and returns a descriptor that can be used to access the socket.

int socket(int domain, int type, int protocol);

The socket created is one endpoint of a communication channel. The domain parameter specifies the protocol family, type specifies the type of communication for this socket, and protocol specifies the protocol to use.
- domain: values include AF_UNIX and AF_INET. The former is used for local sockets implemented through the UNIX and Linux file system; the latter is used for UNIX network sockets, which communicate over TCP/IP networks, including the Internet.
- type: values include SOCK_STREAM and SOCK_DGRAM.
- protocol: usually no choice is needed; setting this parameter to 0 selects the default protocol.
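
As a minimal sketch of the call described above, the two helpers below create a socket with the default protocol and then ask the kernel which type it has. The helper names `open_socket` and `socket_type_of` are hypothetical, chosen only for this illustration; `socket` and `getsockopt` with `SO_TYPE` are the standard calls.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a socket with the default protocol (0).
 * Returns the descriptor, or -1 with errno set, exactly as socket() does. */
int open_socket(int domain, int type)
{
    int fd = socket(domain, type, 0);
    if (fd == -1)
        perror("socket");
    return fd;
}

/* Hypothetical helper: ask the kernel which type a socket has
 * (SOCK_STREAM, SOCK_DGRAM, ...). Returns -1 if the query fails. */
int socket_type_of(int fd)
{
    int type = 0;
    socklen_t len = sizeof(type);

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == -1)
        return -1;
    return type;
}
```

For example, `open_socket(AF_UNIX, SOCK_STREAM)` should yield a descriptor for which `socket_type_of` reports `SOCK_STREAM`. Remember to `close` the descriptor when it is no longer needed.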

Socket address

Each socket domain has its own address format.

In the AF_UNIX domain, the address of the socket is described by the structure sockaddr_un, which is defined in the header file sys/un.h.
```
struct sockaddr_un {
    sa_family_t sun_family;   /* AF_UNIX */
    char        sun_path[];   /* pathname */
};
```
In the AF_UNIX domain, the address of the socket is the file name held in sun_path.

In the AF_INET domain, the address of the socket is described by the structure sockaddr_in. It is defined in netinet/in.h and includes at least the following members:

struct sockaddr_in {
    short int          sin_family;   /* AF_INET */
    unsigned short int sin_port;     /* Port number */
    struct in_addr     sin_addr;     /* Internet address */
};
struct in_addr {
    unsigned long int  s_addr;
};
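
The two address formats can be filled in as sketched below. This is an illustration only: the helper names `fill_unix_address` and `fill_inet_address` are hypothetical, and the code assumes IPv4 dotted-quad text as input. The port is stored with `htons`, because `sin_port` is kept in network byte order.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: fill an AF_UNIX address.
 * Returns -1 if the path does not fit in sun_path, 0 otherwise. */
int fill_unix_address(struct sockaddr_un *addr, const char *path)
{
    if (strlen(path) >= sizeof(addr->sun_path))
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);
    return 0;
}

/* Hypothetical helper: fill an AF_INET address from dotted-quad text and
 * a port given in host byte order.
 * Returns -1 if the text is not a valid IPv4 address, 0 otherwise. */
int fill_inet_address(struct sockaddr_in *addr, const char *ip,
                      unsigned short port)
{
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);          /* stored in network byte order */
    if (inet_pton(AF_INET, ip, &addr->sin_addr) != 1)
        return -1;
    return 0;
}
```

Either structure is later passed to calls such as bind or connect through a cast to `struct sockaddr *`.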

Named sockets

For a socket created by the socket call to be usable by other processes, the server program must give the socket a name. An AF_UNIX socket is associated with a file system pathname; an AF_INET socket is associated with an IP port number.

int bind(int socket, const struct sockaddr *address, socklen_t address_len);

The bind system call assigns the address given in the address parameter to the unnamed socket associated with the file descriptor socket. The length of the address structure is passed in address_len. A pointer to the specific address structure (struct sockaddr_in * or struct sockaddr_un *) must be cast to the generic address type (struct sockaddr *) when it is passed in. bind returns 0 on success; on failure it returns -1 and sets errno.

Create a socket queue

To accept incoming connections from multiple clients, the server program must create a queue to hold pending requests. It does this with the listen system call.

int listen(int socket, int backlog);

The first argument is the named socket. The second is the maximum length of the queue, that is, the number of pending connections allowed to wait; connection requests beyond this limit fail. Like bind, listen returns 0 on success; on failure it returns -1 and sets errno.
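
The naming and queueing steps can be combined as in the following sketch for a local (AF_UNIX) server socket. The helper name `make_unix_listener` is hypothetical, and removing a stale socket file with `unlink` before binding is a convenience assumed here, not something the text above requires.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create, name and start listening on an AF_UNIX
 * stream socket. Returns the listening descriptor, or -1 on failure. */
int make_unix_listener(const char *path, int backlog)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    unlink(path);                      /* remove a stale socket file, if any */
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    /* bind() takes the generic struct sockaddr *, hence the cast. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(fd, backlog) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A call such as `make_unix_listener("/tmp/demo.sock", 5)` (the path is just an example) leaves a socket file at that path; the server should `unlink` it when it shuts down.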

Accept Connection

The accept system call waits for a client to establish a connection to the socket.

int accept(int socket, struct sockaddr *address, socklen_t *address_len);

The accept system call returns only when a client attempts to connect to the socket specified by the socket parameter. The client in question is the first pending connection in that socket's queue.
The socket must first have been named by a bind call and given a connection queue by a listen call. The address of the connecting client is placed in the sockaddr structure pointed to by the address parameter. If you do not care about the client's address, pass a null pointer.
The address_len parameter specifies the length of the client address structure. If the client address is longer than this value, it is truncated.
If there are no pending connections in the socket's queue, accept blocks until a client connects. We can change this behavior by setting the O_NONBLOCK flag on the socket descriptor with the fcntl function:

int flags = fcntl(socket, F_GETFL, 0);
fcntl(socket, F_SETFL, O_NONBLOCK | flags);

When a client connection is pending, accept returns a new socket descriptor. On error, it returns -1 and sets errno.
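
A small sketch of the non-blocking variant follows. The helper names `set_nonblocking` and `try_accept` and the return value -2 for "no client waiting" are conventions invented for this example; the underlying behavior (accept failing with EAGAIN or EWOULDBLOCK on a non-blocking socket with an empty queue) is standard.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to the descriptor's existing flags.
 * Returns 0 on success, -1 on failure. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: try to accept one client without blocking.
 * Returns a new descriptor, -2 if no client is waiting, or -1 on error. */
int try_accept(int listener)
{
    int client = accept(listener, NULL, NULL);  /* client address not needed */

    if (client == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return client;
}
```

The caller would typically do other work and call `try_accept` again later, or use select to learn when the listening socket becomes readable.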

Request Connection

The client program connects to the server by establishing a connection between an unnamed socket and the server's listening socket, using the connect call:

int connect(int socket, const struct sockaddr *address, socklen_t address_len);

The socket specified by the socket parameter is connected to the server socket specified by the address parameter; the length of the structure that address points to is given by address_len. The socket parameter must be a valid file descriptor obtained from a call to socket. connect returns 0 on success; on failure it returns -1 and sets errno.
If the connection cannot be established immediately, connect blocks for an unspecified timeout period. Once the timeout expires, the connection attempt is abandoned and connect fails. If connect is interrupted by a signal that is handled, the call fails (with errno set to EINTR), but the connection attempt is not abandoned; the connection is set up asynchronously.
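
The client side might look like the sketch below for a local (AF_UNIX) server. The helper name `connect_unix` is hypothetical; the code simply creates an unnamed socket and connects it to the server's socket file, preserving the errno value from connect if the attempt fails.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: connect an unnamed client socket to the AF_UNIX
 * server socket at `path`. Returns the connected descriptor, or -1. */
int connect_unix(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        int saved = errno;             /* keep connect()'s error code */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Once `connect_unix` returns a descriptor, the client can use read and write on it just as it would on any other file descriptor.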

Close socket

Use the close function to terminate a socket connection at the server and the client, just as you would for a low-level file descriptor. You should always close the socket at both ends of the connection. On the server, close the socket when the read call returns 0, which means the client has closed its end and there is no more data to read. Note, however, that if the socket is connection-oriented and the SO_LINGER option is set, the close call may block while the socket still has untransmitted data. This option is set through the socket options.
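
Setting the linger option mentioned above could look like this. It is a minimal sketch: the helper name `set_linger` is hypothetical, while `setsockopt`, `SO_LINGER` and `struct linger` are the standard interface.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: enable SO_LINGER so that close() may wait up to
 * `seconds` for unsent data to be transmitted.
 * Returns 0 on success, -1 on failure (as setsockopt does). */
int set_linger(int fd, int seconds)
{
    struct linger lg;

    lg.l_onoff = 1;          /* turn lingering on */
    lg.l_linger = seconds;   /* how long close() may block */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

With the option enabled, a later `close(fd)` on a connected stream socket may take up to the given number of seconds to return.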

Socket communication

The disadvantage of file system sockets is that, unless the programmer uses an absolute pathname, the socket is created in the server's current working directory. To make it generally usable, you need to create it in a globally accessible directory (such as /tmp) agreed upon by the server and its clients. For network sockets, you only need to choose an unused port number. Port numbers and the services they provide are usually listed in the system file /etc/services; when choosing a port number, take care not to pick one listed in that file. PS: In practice you can usually skip the lookup; collisions are rare, and a port number of 3000 or above should be fine.
For demonstration purposes, we will use the loopback network, which consists of a single computer communicating with itself. The loopback network is useful for debugging network applications because it eliminates any external network problems. It contains only one computer, conventionally called localhost, with the standard address 127.0.0.1.
Every network that a computer communicates on has a hardware interface associated with it. A computer may have a different network name on each network, and therefore several different IP addresses.

Host byte order and network byte order

After running a new version of the server and client on a Linux machine, we can use the netstat command to view the network connections.

$ netstat -A inet
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 localhost:9741          localhost:51246         TIME_WAIT

The output shows the port numbers assigned to the server and client connection. The Local Address column shows the server's address and port number, and the Foreign Address column shows the remote client's.
However, the port chosen in the program was 3366, yet the output shows 9741. Why the difference? The answer is that port numbers and addresses are passed through the socket interface as binary numbers, and different computers use different byte orders for integers. For example, Intel processors use little-endian byte order, while network transmission uses big-endian byte order. Big-endian means the most significant byte is stored at the lowest memory address; little-endian stores the least significant byte first. Port 3366 is 0x0D26; with its two bytes swapped it becomes 0x260D, which is 9741.
So that computers of different types agree on the values of multibyte integers transmitted over the network, a network byte order must be defined. Client and server programs must convert their internal integer representations to network byte order before transmitting them. The following functions do this:

unsigned long int htonl(unsigned long int hostlong);
unsigned short int htons(unsigned short int hostshort);
unsigned long int ntohl(unsigned long int netlong);
unsigned short int ntohs(unsigned short int netshort);

These functions convert 16-bit and 32-bit integers between host byte order and the standard network byte order. Each function name abbreviates the conversion it performs; for example, htonl means "host to network, long". On a computer whose host byte order is the same as network byte order, these functions are no-ops.
The netstat output after adding the conversion:

$ netstat -A inet
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 localhost:3366          localhost:34411         TIME_WAIT

For clients and servers on different architectures to interoperate correctly, network programs should always use these conversion functions.
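
The 3366 versus 9741 discrepancy can be reproduced with a few lines of C. The sketch below is illustrative only: `swap16`, `host_is_little_endian` and `show_port_conversion` are hypothetical helper names, while `htons` and `ntohs` are the standard conversion functions.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

/* Swap the two bytes of a 16-bit value. This is the value netstat shows
 * when a little-endian host stores a port without calling htons(). */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Returns 1 if this host stores the least significant byte first. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    return *(unsigned char *)&probe == 1;
}

/* Print how port 3366 looks with and without conversion. */
void show_port_conversion(void)
{
    uint16_t port = 3366;                                   /* 0x0D26 */

    printf("host order:         %u\n", (unsigned)port);
    printf("bytes swapped:      %u\n", (unsigned)swap16(port)); /* 0x260D */
    printf("ntohs(htons(port)): %u\n", (unsigned)ntohs(htons(port)));
}
```

On a little-endian machine `htons(3366)` has the same bit pattern as `swap16(3366)`; on a big-endian machine `htons` leaves the value unchanged. In both cases `ntohs(htons(port))` gives back the original port.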

Network Information

So far, our client and server programs have had addresses and port numbers compiled into them. For more general-purpose server and client programs, we can use the network information functions to determine which addresses and ports to use.
If you have sufficient permissions, you can add your own service to the list of known services in the /etc/services file, assigning a name to a port number so that users can refer to the service by its symbolic name instead of its number.
Similarly, given a computer's name, you can determine its IP address by calling the host database functions, which resolve the name for you. These functions consult network configuration files such as /etc/hosts, or network information services. Commonly used network information services are NIS (Network Information Service, formerly known as Yellow Pages) and DNS (Domain Name Service).
The host database functions are declared in the interface header file netdb.h.

struct hostent *gethostbyaddr(const void *addr, size_t len, int type);
struct hostent *gethostbyname(const char *name);

The data structures returned by these functions will contain at least the following members:

struct hostent {
    char  *h_name;            /* official name of host */
    char **h_aliases;         /* alias list */
    int    h_addrtype;        /* host address type */
    int    h_length;          /* length of address */
    char **h_addr_list;       /* list of addresses */
};

If there is no database entry for the host or address we query, these information functions return a null pointer.
Similarly, information about services and their associated port numbers can be obtained through the service information functions.

struct servent *getservbyname(const char *name, const char *proto);
struct servent *getservbyport(int port, const char *proto);

The proto parameter specifies the protocol used to connect to the service: either "tcp" for SOCK_STREAM TCP connections or "udp" for SOCK_DGRAM UDP datagrams.
The data structure that is returned contains at least the following members:

struct servent {
    char  *s_name;      /* name of the service */
    char **s_aliases;   /* list of aliases */
    int    s_port;      /* The IP port number */
    char  *s_proto;     /* The service type, usually "tcp" or "udp" */
};
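
A service lookup can be wrapped as in the sketch below. The helper name `port_for_service` and its `fallback` parameter are assumptions made for this example; the point it illustrates is that `s_port` is stored in network byte order and needs `ntohs` before it is printed or compared.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>

/* Hypothetical helper: look up a service's port in the services database
 * (for example /etc/services). Returns the port in host byte order, or
 * `fallback` if the service is unknown. */
int port_for_service(const char *name, const char *proto, int fallback)
{
    struct servent *sp = getservbyname(name, proto);

    if (sp == NULL)
        return fallback;
    /* s_port is kept in network byte order. */
    return ntohs((unsigned short)sp->s_port);
}
```

When filling a `sockaddr_in` for connect, `sp->s_port` can be copied into `sin_port` directly, since both are in network byte order.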

To obtain the host database information for a computer, call the gethostbyname function and print the results. Note that the returned address list must be cast to the appropriate address type; use the function inet_ntoa to convert each address from network byte order into a printable string.

char *inet_ntoa(struct in_addr in);

This function converts an Internet host address into a string in dotted-quad format. POSIX does not define any error conditions for it.
One last function, gethostname, gets the name of the current host:

int gethostname(char *name, int length);

This function writes the name of the current host into the string that name points to. The host name is null-terminated. The length parameter gives the size of the name buffer; if the host name is too long to fit, it is truncated. gethostname returns 0 on success; on failure it returns -1 and sets errno.

To write a program that connects to a standard service: use gethostbyname to get the host's IP address, use getservbyname to get the service's port, and finally use connect to request the service.
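
The host lookup step can be sketched as follows. The helper names `first_ipv4_text` and `print_local_host` are hypothetical, and the code assumes the host resolves to an IPv4 (AF_INET) address; whether the local host name resolves at all depends on the machine's configuration.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: copy the first IPv4 address of a hostent into
 * `out` as dotted-quad text. Returns 0 on success, -1 if the entry is
 * missing or holds no usable IPv4 address. */
int first_ipv4_text(const struct hostent *h, char *out, size_t outlen)
{
    struct in_addr addr;

    if (h == NULL || h->h_addrtype != AF_INET ||
        h->h_length != (int)sizeof(addr) || h->h_addr_list[0] == NULL)
        return -1;

    memcpy(&addr, h->h_addr_list[0], sizeof(addr));
    snprintf(out, outlen, "%s", inet_ntoa(addr));
    return 0;
}

/* Print the local host name and, if it resolves, its first address. */
void print_local_host(void)
{
    char name[256];
    char ip[INET_ADDRSTRLEN];

    if (gethostname(name, sizeof(name)) == -1)
        return;
    name[sizeof(name) - 1] = '\0';     /* guard against truncation */

    printf("host: %s\n", name);
    if (first_ipv4_text(gethostbyname(name), ip, sizeof(ip)) == 0)
        printf("address: %s\n", ip);
}
```

The same `first_ipv4_text` helper works on the result of `gethostbyname` for any other host name.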

Multi-Client

Goal: let a single server process handle multiple clients without blocking while it waits for a client request to arrive.

Select System Call

When writing Linux applications, we often need to check several inputs to decide what to do next. On a single-user system, running a "busy wait" loop might be acceptable: it repeatedly scans the input devices and reads any data that has arrived. However, this approach wastes CPU time.
The select system call lets a program wait for input to arrive (or output to complete) on several low-level file descriptors at once. The program can therefore block until there is something to do. In the same way, a server can handle multiple clients by waiting for requests on many open sockets at the same time.
The select function operates on the data structure fd_set, which is a set of open file descriptors. A set of macros is defined to manipulate these sets:

void FD_ZERO(fd_set *fdset);
void FD_CLR(int fd, fd_set *fdset);
void FD_SET(int fd, fd_set *fdset);
int FD_ISSET(int fd, fd_set *fdset);

As the names suggest, FD_ZERO initializes an fd_set to the empty set, and FD_SET and FD_CLR add and remove the file descriptor fd. FD_ISSET returns a nonzero value if the file descriptor fd is a member of the fd_set pointed to by fdset. The maximum number of file descriptors that an fd_set can hold is given by the constant FD_SETSIZE.
The select function can take a timeout value to prevent it from blocking indefinitely. The timeout is given by a timeval structure, defined in the header file sys/time.h, with the following members:

struct timeval {
    time_t  tv_sec;     /* seconds */
    long    tv_usec;    /* microseconds */
};

The type time_t is defined as an integer type in the header file sys/types.h.
The prototype for the select system call is:

int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

The select call tests whether any file descriptor in the given sets is ready for reading, ready for writing, or has an exceptional condition pending; if none is, it blocks until one of them enters such a state.
The nfds parameter specifies the number of file descriptors to test: descriptors 0 through nfds-1 are considered. Any of the three descriptor sets can be a null pointer, in which case the corresponding test is not performed.
The select function returns when a descriptor in readfds is ready for reading, a descriptor in writefds is ready for writing, or a descriptor in exceptfds has an exceptional condition. If none of these occurs, select returns after the period specified by timeout. If the timeout parameter is a null pointer and there is no activity on the sockets, the call blocks indefinitely.
When select returns, the descriptor sets have been modified to indicate which descriptors are readable, writable, or in an exceptional state. Use FD_ISSET to test the descriptors and find those that need attention. If select returns because of a timeout, all the descriptor sets are empty.
On success, select returns the total number of descriptors that are ready; on error, it returns -1 and sets errno.
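
As a minimal sketch of select with a timeout, the helper below waits for a single descriptor to become readable. The name `wait_readable` and the millisecond timeout parameter are assumptions made for this example.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait until `fd` is readable or `timeout_ms`
 * milliseconds elapse.
 * Returns 1 if readable, 0 on timeout, -1 on error (errno set). */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int n;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* nfds is the highest descriptor number plus one. */
    n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n <= 0)
        return n;                      /* 0: timed out, -1: error */
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}
```

Because select modifies both the descriptor set and (on Linux) the timeout, the helper rebuilds them on every call rather than reusing them.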

Multi-Client

The server can use select on both the listening socket and the clients' connection sockets at the same time. Once select indicates activity, use FD_ISSET to loop over all the possible file descriptors and find out which one the activity is on.
If the listening socket is ready for input, a client is attempting to connect, and the server can call accept without risk of blocking. If a client's socket descriptor is ready, a client request is waiting to be read and processed. If the read returns 0, the client process has ended, and you can close the socket and remove it from the descriptor set.
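
One round of such a server loop might be sketched as follows. This is an illustration under stated assumptions, not a complete server: the function name `serve_round` is hypothetical, the "service" is a simple echo of whatever the client sent, and short writes are not retried.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* One round of a select()-based server loop. `active` holds the listener
 * plus every connected client; `*maxfd` is the highest descriptor in it.
 * New clients are accepted, data from existing clients is echoed back,
 * and a read of 0 means the client has gone away.
 * Returns the number of ready descriptors handled, or -1 on error. */
int serve_round(int listener, fd_set *active, int *maxfd)
{
    fd_set ready = *active;     /* select() overwrites the set it is given */
    int top = *maxfd;
    int handled = 0;
    int fd;

    if (select(top + 1, &ready, NULL, NULL, NULL) == -1)
        return -1;

    for (fd = 0; fd <= top; fd++) {
        if (!FD_ISSET(fd, &ready))
            continue;
        handled++;

        if (fd == listener) {
            int client = accept(listener, NULL, NULL);
            if (client == -1)
                continue;
            if (client >= FD_SETSIZE) {    /* cannot be tracked in an fd_set */
                close(client);
                continue;
            }
            FD_SET(client, active);
            if (client > *maxfd)
                *maxfd = client;
        } else {
            char buf[256];
            ssize_t n = read(fd, buf, sizeof(buf));
            if (n <= 0 || write(fd, buf, (size_t)n) == -1) {
                close(fd);                 /* client finished or failed */
                FD_CLR(fd, active);
            }
        }
    }
    return handled;
}
```

A server would initialize `active` with FD_ZERO and FD_SET for the listening socket, set `maxfd` to the listener, and then call `serve_round` in a loop. Copying the set before each select call matters, because select reports its results by modifying the set passed in.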

Datagrams

In some cases, it is not worth spending time in a program to establish and maintain a socket connection.
When a client needs to make a short query and expects a single short response, we generally use a service provided over UDP. The daytime service on a host is one example.
Because UDP is not a reliable service, datagrams or their responses may be lost. If the data matters to you, the UDP program must be written carefully to check for errors and retransmit when necessary.
With UDP datagrams, you use the sendto and recvfrom system calls instead of read and write on the socket.

The sendto system call sends a datagram from a buffer on the specified socket to the destination given by a socket address.

int sendto(int sockfd, void *buffer, size_t len, int flags, struct sockaddr *to, socklen_t tolen);

In normal use, the flags parameter is set to 0.
The recvfrom system call waits on the socket for a datagram, places it in the buffer, and records the sender's address in from.

int recvfrom(int sockfd, void *buffer, size_t len, int flags, struct sockaddr *from, socklen_t *fromlen);

In normal use, the flags parameter is set to 0.
On success, both calls return the number of bytes transferred; on error, they return -1 and set errno.

With these functions you can build a UDP server and client: one side uses sendto to send data and the other uses recvfrom to receive it. In addition, a timeout can be set on the socket descriptor with setsockopt; a timeout can also be implemented with signals, using sigaction.
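
The pieces above can be sketched for the loopback network as follows. The helper names `udp_bind_loopback`, `set_receive_timeout` and `udp_send_loopback` are hypothetical, and using the `SO_RCVTIMEO` socket option is one assumed way of setting the timeout with setsockopt.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket bound to 127.0.0.1 on a port
 * chosen by the kernel. The port (host byte order) is stored in *port.
 * Returns the descriptor, or -1 on failure. */
int udp_bind_loopback(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);          /* 0 lets the kernel pick a free port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) == -1) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Hypothetical helper: give recvfrom() a time limit so that a lost
 * datagram cannot block the caller forever. */
int set_receive_timeout(int fd, int seconds)
{
    struct timeval tv;

    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Hypothetical helper: send one datagram to 127.0.0.1:port.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t udp_send_loopback(int fd, unsigned short port,
                          const void *buf, size_t len)
{
    struct sockaddr_in to;

    memset(&to, 0, sizeof(to));
    to.sin_family = AF_INET;
    to.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    to.sin_port = htons(port);
    return sendto(fd, buf, len, 0, (struct sockaddr *)&to, sizeof(to));
}
```

The receiving side then calls recvfrom on the bound socket; with the timeout set, recvfrom returns -1 once the time limit passes without a datagram arriving, and the caller can decide whether to retransmit the query.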
