Socket Principles in Detail

Source: Internet
Author: User
Tags: bind, data structures, semaphore, port number

1. How processes communicate across the network

The concept of process communication originated in stand-alone systems. Because each process runs within its own address space, the operating system provides facilities for inter-process communication so that two processes can communicate without interfering with each other and can coordinate their work, such as:

UNIX BSD: pipe, named pipe, and soft interrupt signal (signal).

UNIX System V: message, shared memory, and semaphore.

These facilities are limited to communication between processes on the same machine. Inter-network process communication solves the problem of communication between processes on different hosts (communication on the same machine can be regarded as a special case). To do this, the first problem to solve is identifying a process across the network. On a single host, different processes can be uniquely identified by their process ID. In a network environment, however, the process number that each host assigns on its own cannot uniquely identify a process. For example, host A may assign process number 5 to some process, and a process number 5 can also exist on host B, so the phrase "process 5" is meaningless. Second, the operating system supports many network protocols, and different protocols work in different ways and use different address formats. Therefore, inter-network process communication must also solve the problem of identifying multiple protocols.

In fact, the TCP/IP protocol family has already solved this problem for us: the network layer "IP address" uniquely identifies a host in the network, and the transport layer "protocol + port" uniquely identifies an application (process) on that host. The triple (IP address, protocol, port) therefore identifies a process in the network, and processes in the network can use this identifier to interact with one another.

Applications that use TCP/IP typically rely on an application programming interface, either UNIX BSD sockets or UNIX System V TLI (now obsolete), to communicate between network processes. Today almost all applications use sockets. In the network era, inter-process communication is everywhere, which is why I say "everything is a socket."

2. What are TCP/IP and UDP

TCP/IP (Transmission Control Protocol/Internet Protocol) is an industry-standard protocol suite designed for wide area networks (WANs).

The TCP/IP protocol stack lives in the operating system, and network services are provided through the operating system. The OS adds system calls that support TCP/IP, the Berkeley sockets interface, such as socket, connect, send, and recv.

UDP (User Datagram Protocol) is the transport-layer counterpart of TCP. It is part of the TCP/IP protocol family. As shown in the figure:


The TCP/IP protocol family includes the transport layer, the network layer, and the link layer. As the figure shows, the socket sits between the application layer and the TCP/IP protocol family as an intermediate software abstraction layer.


3. What is a socket

3.1. Socket

Sockets originated in UNIX, and one of the basic philosophies of Unix/Linux is that "everything is a file": anything can be operated on with the pattern "open -> read/write -> close". The socket is an implementation of this pattern. A socket is a special kind of file, and some of the socket functions are simply these operations applied to it (read/write I/O, open, close).

The socket is the intermediate software abstraction layer through which the application layer communicates with the TCP/IP protocol family. It is a set of interfaces. In design-pattern terms, the socket is a facade: it hides the complex TCP/IP protocol family behind the socket interface. The user sees only a simple set of interfaces and lets the socket organize the data to conform to the specified protocol.

Note: Strictly speaking, the socket is not a layer of the protocol stack. It is an application of the facade design pattern, a software abstraction layer that makes programming easier. Much of what we do in network programming goes through sockets.

3.2. Socket descriptor

A socket descriptor is actually just an integer. The descriptors we are most familiar with are 0, 1, and 2: 0 is standard input, 1 is standard output, and 2 is standard error. 0, 1, and 2 are the integer representations; the corresponding FILE * structures are stdin, stdout, and stderr.

The socket API was originally developed as part of the UNIX operating system, so it is integrated with the system's other I/O facilities. In particular, when an application creates a socket for Internet communication, the operating system returns a small integer as a descriptor to identify the socket. The application then passes the descriptor as a parameter to other functions to get work done (such as sending data over the network or receiving incoming data).

In many operating systems, socket descriptors and other I/O descriptors are integrated, so an application can use the same read/write operations for socket I/O as for file I/O.

When an application creates a socket, the operating system returns a small integer as a descriptor, and the application uses this descriptor to refer to the socket. The same happens when an application that needs file I/O asks the operating system to open a file: the operating system creates a file descriptor and hands it to the application for accessing the file. From the application's perspective, a file descriptor is an integer that it can use to read and write the file. The following illustration shows how the operating system implements file descriptors as an array of pointers to internal data structures.

Each program has its own table. To be precise, the system maintains a separate file descriptor table for each running process. When a process opens a file, the system writes a pointer to the file's internal data structure into the file descriptor table and returns the index of that entry to the caller. The application only needs to remember this descriptor and use it later when operating on the file. The operating system uses the descriptor as an index into the process's descriptor table and follows the pointer to the data structure that holds all the information about the file.

System data structures for sockets:

1) The socket API has a function, socket, that is used to create a socket. The general idea of the socket design is that a single system call can create any kind of socket, because the socket is fairly generic. Once the socket is created, the application must call other functions to specify the details. For example, calling socket creates a new descriptor entry:

2) Although the internal data structure of a socket contains many fields, most of them are not filled in when the system creates the socket. After creating a socket, the application must call other procedures to populate these fields before the socket can be used.

3.3. The difference between a file descriptor and a file pointer

File descriptor: opening a file in a Linux system yields a file descriptor, which is a small non-negative integer. Each process keeps a file descriptor table in its PCB (Process Control Block). The file descriptor is an index into this table, and each table entry holds a pointer to an open file.

File pointer: the C language uses a file pointer as its I/O handle. The file pointer points to a data structure in the process's user space called the FILE structure. The FILE structure includes a buffer and a file descriptor. Since the file descriptor is itself an index into the file descriptor table, in a sense the file pointer is a handle to a handle (on Windows systems, the file descriptor is referred to as a file handle). For more information, see Linux file system: HTTP://BLOG.CSDN.NET/HGUISU/ARTICLE/DETAILS/6122513#T7


4. Basic socket interface functions

In everyday life, when A wants to call B, A dials the number, B hears the phone ring and picks up, and a connection between A and B is established so that they can talk. When the conversation is over, they hang up and the call ends. The phone call is a simple illustration of how this works: the "open - write/read - close" pattern.



The server initializes a socket, binds it to a port (bind), listens on the port (listen), and then calls accept, which blocks while waiting for a client to connect. If a client now initializes a socket and connects to the server (connect), the client-server connection is established once the connect succeeds. The client sends a request, the server receives and processes it and sends the response back to the client, the client reads the data, and finally the connection is closed and the interaction ends.

These interfaces are implemented by the kernel. To see how, you can read the Linux kernel source.

4.1. socket() function

        int socket(int protofamily, int type, int protocol); /* returns sockfd */

sockfd is a descriptor.

The socket function corresponds to the open operation on an ordinary file. Opening an ordinary file returns a file descriptor, and socket() creates a socket descriptor, which uniquely identifies a socket. Like a file descriptor, the socket descriptor is used in all subsequent operations, where it is passed as a parameter to perform reads and writes.

Just as you can pass different parameter values to fopen to open different files, you can specify different parameters when creating a socket to create different socket descriptors. The three parameters of the socket function are:

protofamily: The protocol domain, also known as the protocol family. Common protocol families are AF_INET (IPv4), AF_INET6 (IPv6), AF_LOCAL (or AF_UNIX, UNIX domain sockets), AF_ROUTE, and so on. The protocol family determines the socket's address type, and the matching address must be used in communication. For example, AF_INET means a combination of an IPv4 address (32 bits) and a port number (16 bits), while AF_UNIX means an absolute path name is used as the address.

type: Specifies the socket type. Common socket types are SOCK_STREAM, SOCK_DGRAM, SOCK_RAW, SOCK_PACKET, SOCK_SEQPACKET, and so on.

protocol: As the name suggests, this specifies the protocol. Commonly used protocols are IPPROTO_TCP, IPPROTO_UDP, IPPROTO_SCTP, and IPPROTO_TIPC, which correspond to the TCP, UDP, SCTP, and TIPC transport protocols respectively (TIPC will be discussed separately).

Note: The type and protocol values above cannot be combined arbitrarily. For example, SOCK_STREAM cannot be combined with IPPROTO_UDP. When protocol is 0, the default protocol for the given type is selected automatically.

When we call socket() to create a socket, the returned socket descriptor exists in a protocol family (address family, AF_XXX) space but has no specific address yet. To assign an address to it, you must call the bind() function; otherwise the system automatically assigns a random port when you call connect() or listen().

4.2. bind() function

As mentioned above, the bind() function assigns a specific address in the address family to the socket. For example, for AF_INET and AF_INET6 it assigns a combination of an IPv4 or IPv6 address and a port number to the socket.

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

The three parameters of the function are:

sockfd: The socket descriptor, which is created by the socket() function and uniquely identifies a socket. The bind() function binds a name to this descriptor.

addr: A const struct sockaddr * pointer to the protocol address to bind to sockfd. The address structure depends on the address family used when the socket was created. IPv4 corresponds to:

struct sockaddr_in {
    sa_family_t    sin_family; /* address family: AF_INET */
    in_port_t      sin_port;   /* port in network byte order */
    struct in_addr sin_addr;   /* Internet address */
};

/* Internet address. */
struct in_addr {
    uint32_t       s_addr;     /* address in network byte order */
};

IPv6 corresponds to:

struct sockaddr_in6 {
    sa_family_t     sin6_family;   /* AF_INET6 */
    in_port_t       sin6_port;     /* port number */
    uint32_t        sin6_flowinfo; /* IPv6 flow information */
    struct in6_addr sin6_addr;     /* IPv6 address */
    uint32_t        sin6_scope_id; /* Scope ID (new in 2.4) */
};

struct in6_addr {
    unsigned char   s6_addr[16];   /* IPv6 address */
};

The UNIX domain corresponds to:

#define UNIX_PATH_MAX    108

struct sockaddr_un {
    sa_family_t sun_family;               /* AF_UNIX */
    char        sun_path[UNIX_PATH_MAX];  /* pathname */
};

addrlen: The length of the address.

A server usually binds a well-known address (such as an IP address plus port number) at startup in order to provide its service, and clients connect to the server through that address. A client does not need to specify its own address; the system automatically assigns a port number and combines it with the host's own IP address. This is why the server usually calls bind() before listen(), while the client does not call it and instead lets the system generate an address at connect().

Network byte order and host byte order

Host byte order is what we normally call big-endian and little-endian mode. Different CPUs use different byte orders, that is, different orders in which the bytes of an integer are stored in memory, and this is called the host byte order. The standard definitions of big-endian and little-endian are as follows:

a) Little-endian: the low-order byte is stored at the lower memory address and the high-order byte at the higher memory address.

b) Big-endian: the high-order byte is stored at the lower memory address and the low-order byte at the higher memory address.

Network byte order: a 4-byte value is transmitted in the following order: first bits 0~7, then bits 8~15, then bits 16~23, and finally bits 24~31. This transmission order is called big-endian byte order. Because all binary integers in TCP/IP headers must be in this order when transmitted over the network, it is also called network byte order. Byte order, as the name implies, is the order in which data larger than one byte is stored in memory; a single byte of data has no ordering problem.

So when binding an address to a socket, first convert from host byte order to network byte order, and never assume that the host byte order is big-endian like the network byte order. This mistake has caused real damage: in one company project it led to many puzzling bugs. Make no assumptions about the host byte order, and always convert values to network byte order before assigning them to the socket address.

4.3. listen() and connect() functions

A server calls listen() after socket() and bind() to listen on the socket. If a client then calls connect() to issue a connection request, the server receives that request.

int listen(int sockfd, int backlog);
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

The first parameter of the listen function is the socket descriptor to listen on, and the second parameter is the maximum number of connections that can be queued for that socket. A socket created by socket() is an active socket by default; the listen function turns it into a passive socket that waits for connection requests from clients.

The first parameter of the connect function is the client's socket descriptor, the second parameter is the server's socket address, and the third parameter is the length of the socket address. The client establishes a connection to the TCP server by calling the connect function.

4.4. accept() function

T
