Introduction to socket programming principles

Source: Internet
Author: User

1. How to Implement socket programming

The concept of process communication originated on standalone systems. Because each process runs within its own address space, the operating system provides facilities that let two processes communicate without interfering with each other while still coordinating their work. Examples are pipes, named pipes, and signals in BSD UNIX, and messages, shared memory, and semaphores in UNIX System V. All of these are limited to communication between processes on the same machine. Inter-network process communication addresses communication between processes on different hosts (communication between processes on the same machine can be treated as a special case). The first problem to solve is how to identify a process across the network. On a single host, a process ID (PID) uniquely identifies each process. In a network environment, however, the process IDs that each host assigns independently cannot uniquely identify a process. For example, host A may assign process ID 5 to one of its processes while a process with ID 5 also exists on host B, so "process 5" on its own is ambiguous. In addition, operating systems support many network protocols, which work in different ways and use different address formats, so inter-network process communication must also handle identification across multiple protocols. To solve these problems, TCP/IP introduces the following concepts.

Port
A port is a communication endpoint that can be named and addressed in the network, and it is a resource that the operating system can allocate. In the OSI seven-layer model, the biggest difference between the transport layer and the network layer is that the transport layer provides process-to-process communication. In this sense, the final address in network communication is not just a host address but also includes an identifier for the process. TCP/IP therefore introduces the protocol port (port for short) to identify the communicating process.
A port is an abstract software structure that includes some data structures and I/O buffers. After a process binds to a port through a system call, data that the transport layer delivers to that port is received by the process, and data that the process sends to the transport layer goes out through that port. In TCP/IP implementations, port operations resemble ordinary I/O operations: obtaining a port is like obtaining a locally unique I/O file, which can then be accessed with the usual read/write primitives.
Like a file descriptor, each port has an integer identifier, the port number, that distinguishes it from other ports. TCP and UDP in the TCP/IP transport layer are two completely independent software modules, so their port numbers are independent as well. For example, TCP can have a port 255 and UDP can also have a port 255 without conflict.
Port number allocation is an important issue. There are two basic methods. The first is global allocation, a centralized scheme in which a recognized central authority assigns ports according to user needs and publishes the assignments. The second is local allocation, also called dynamic binding: when a process needs transport-layer service, it asks the local operating system, which returns a locally unique port number, and the process then attaches itself to that port through a suitable system call (binding). TCP/IP combines both methods. It divides port numbers into two parts. A small number are reserved ports, assigned globally to service processes, so each standard server has a globally recognized port, called a well-known port, whose number is the same on every machine. The remaining ports are free ports, allocated locally. TCP and UDP originally reserved ports below 256; current IANA practice treats ports 0 through 1023 as well-known ports.
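
Local (dynamic) allocation can be observed from C. The following is a minimal sketch, assuming a POSIX system with a loopback interface; the helper names bind_loopback and local_port are made up for this illustration. Binding to port 0 asks the operating system to choose a free port, and getsockname() reports which one it picked.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a socket of the given type (SOCK_STREAM or
 * SOCK_DGRAM) and bind it to 127.0.0.1 on `port`. Passing port 0 asks the
 * operating system to pick a free local port (local allocation).
 * Returns the socket descriptor, or -1 on error. */
int bind_loopback(int type, uint16_t port)
{
    int fd = socket(AF_INET, type, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: return the local port bound to `fd` in host byte
 * order, or 0 on error. */
uint16_t local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}
```

Calling bind_loopback(SOCK_STREAM, 0) and then local_port() shows the port the system chose. Binding a SOCK_DGRAM socket to that same number normally succeeds as well, because TCP and UDP keep separate port spaces.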

Address
The two processes in network communication may be on different machines, and in an internetwork those machines may be on different networks connected through interconnection devices (gateways, bridges, routers, and so on). Three levels of addressing are therefore required:
1. A host may be connected to multiple networks, so a specific network address must be specified;
2. Each host on a network must have a unique address;
3. Each process on a host must have an identifier that is unique on that host.
The host address is an IP address. The unique identifier of a process on a host is a 16-bit integer port number.

Network byte order
Different computers store the bytes of multi-byte values in different orders: some store the low-order byte at the starting address (little-endian), while others store the high-order byte first (big-endian). To keep data correct, a network protocol must specify a network byte order. TCP/IP uses big-endian order (most significant byte first) for the 16-bit and 32-bit integers carried in its protocol headers.
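
As a small illustration, the standard htons(), htonl(), and ntohs() functions convert between host and network byte order. The sketch below assumes a POSIX system; the wrapper names put_u16_network, put_u32_network, and get_u16_network are invented for this example.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Write `value` into `out` in network byte order (most significant byte
 * first), regardless of the byte order of the host. */
void put_u16_network(uint16_t value, unsigned char out[2])
{
    uint16_t wire = htons(value);   /* host order -> network order */
    memcpy(out, &wire, sizeof wire);
}

/* Same as above for a 32-bit value, such as an IPv4 address. */
void put_u32_network(uint32_t value, unsigned char out[4])
{
    uint32_t wire = htonl(value);
    memcpy(out, &wire, sizeof wire);
}

/* Read a 16-bit value that is stored in network byte order and return it
 * in host byte order. */
uint16_t get_u16_network(const unsigned char in[2])
{
    uint16_t wire;
    memcpy(&wire, in, sizeof wire);
    return ntohs(wire);             /* network order -> host order */
}
```

For example, put_u16_network(0x1234, buf) leaves the bytes 0x12, 0x34 in buf on both little-endian and big-endian hosts, which is why port numbers are passed through htons() before being stored in a socket address.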

Connection
The communication link between two processes is called a connection. Internally, a connection is represented by buffers and a set of protocol mechanisms; externally, it offers higher reliability than connectionless communication.

Semi-correlation
To sum up, a triple (protocol, local address, local port number) globally identifies a process in the network. Such a triple is called a semi-correlation (half association); it specifies one half of a connection.

Full correlation
A complete inter-network process communication involves two processes, and they must use the same high-level protocol; a TCP endpoint cannot talk to a UDP endpoint. A complete inter-network process communication is therefore identified by a quintuple:
(protocol, local address, local port number, remote address, remote port number)
Such a quintuple is called a full correlation (full association).

In TCP/IP network applications, the main mode of interaction between two processes is the client/server model: the client sends a request to the server, and the server provides the corresponding service after receiving it. The client/server model rests on two observations. First, networks exist because hardware and software resources, computing power, and information are unevenly distributed and need to be shared, so hosts with many resources provide services and clients with few resources request them. The two sides are not equals. Second, inter-network process communication is completely asynchronous: the communicating processes have no parent-child relationship and share no memory buffers, so a mechanism is needed to establish contact between processes that want to communicate and to synchronize their data exchange. This is the client/server interaction on which TCP/IP applications are based.
In operation, the client/server model uses active requests: the client initiates each exchange.
The server must start first and provide service on request:
1. Open a communication channel and inform the local host that it is willing to accept client requests on a recognized address and port (a well-known port; for example, HTTP uses port 80).
2. Wait for a client request to arrive at this port.
3. For an iterative server, receive the request, process it, and send a response. For a concurrent server, start a new process to handle the client request. The new process handles only this client and does not respond to other requests. When the service is complete, it closes its communication link with the client and terminates.
4. Return to step 2 and wait for another client request.
5. Shut down the server.
The client:
1. Open a communication channel and connect to the specific port on the host where the server runs.
2. Send a service request message to the server, wait for and receive the response, and continue to send requests as needed.
3. Close the communication channel and terminate after the requests are finished.
From the process described above, we can see that:
1. The roles of the client and server processes are asymmetric, so their code differs.
2. A server process is generally started before any client request arrives. As long as the system is running, the server process persists until it is terminated, normally or forcibly.
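
The steps above can be sketched in C for the iterative case. This is only an illustration under stated assumptions: a POSIX system, a loopback-only server on a system-chosen port, and a trivial "service" that echoes the request back. The function names server_open, client_open, and server_handle_one are invented for this sketch.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Server steps 1-2: open a listening TCP socket on 127.0.0.1. Port 0 lets
 * the system choose; the chosen port is stored in *port_out (host order). */
int server_open(uint16_t *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 5) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

/* Client step 1: open a channel and connect to the server's port. */
int client_open(uint16_t port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Server step 3 (iterative form): accept one client, read its request,
 * send the same bytes back as the response, and close the connection.
 * Returns the number of bytes echoed, or -1 on error. */
ssize_t server_handle_one(int listen_fd)
{
    char buf[256];
    int conn = accept(listen_fd, NULL, NULL);
    if (conn < 0)
        return -1;
    ssize_t n = recv(conn, buf, sizeof buf, 0);
    if (n > 0)
        n = send(conn, buf, (size_t)n, 0);
    close(conn);
    return n;
}
```

A server would call server_handle_one() in a loop (step 4). A concurrent server would instead hand the descriptor returned by accept() to a new process and go straight back to waiting.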

In the UNIX world, there are two network application programming interfaces: BSD sockets and System V TLI. Since Sun adopted BSD UNIX, which supports TCP/IP, TCP/IP applications have developed rapidly, and the socket interface has become a standard for network programming that has also reached the Microsoft world.
TCP/IP sockets come in three types:
1. Stream socket (SOCK_STREAM)
It provides a connection-oriented, reliable data transmission service: data arrives without errors, without duplication, and in order. Built-in flow control keeps the sender from overrunning the receiver. Data is treated as a byte stream with no length limit. FTP uses stream sockets.
2. Datagram socket (SOCK_DGRAM)
It provides a connectionless service. Data is sent as independent packets with no delivery guarantee: packets may be lost or duplicated and may arrive out of order. The Network File System (NFS) originally used datagram sockets.
3. Raw socket (SOCK_RAW)
This interface allows direct access to lower-level protocols such as IP and ICMP. It is often used to test new protocol implementations or to access new devices configured in existing services.
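
As a quick illustration of the socket types, the sketch below (POSIX assumed; socket_type_roundtrip is a made-up helper name) creates a socket of a requested type and asks the kernel to report the type back through the SO_TYPE option. SOCK_RAW is left out of the usage because creating raw sockets usually requires elevated privileges.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create an Internet-domain socket of the requested type and ask the kernel
 * which type it actually is. Returns the type reported by SO_TYPE, or -1
 * if the socket could not be created or queried. */
int socket_type_roundtrip(int type)
{
    int fd = socket(AF_INET, type, 0);   /* 0: default protocol for the type */
    if (fd < 0)
        return -1;

    int reported = -1;
    socklen_t len = sizeof reported;
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &reported, &len) < 0)
        reported = -1;
    close(fd);
    return reported;
}
```

Calling it with SOCK_STREAM or SOCK_DGRAM should return the same constant that was passed in.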

Basic socket calls

Create a socket -- socket();
Bind a local address and port -- bind();
Establish a connection -- connect(), accept();
Listen on a port -- listen();
Transfer data -- send(), recv();
Multiplex input/output -- select();
Close a socket -- close() on UNIX, closesocket() in Winsock;

Regardless of the language, the calls that deal with sockets differ only slightly in format. I have only used C and Perl, and I don't want to show anything unrelated to Perl here, so I will mainly discuss Perl socket programming below:

Create a socket:

socket(soc_variable, domain_flag, connecttype, num)  # In C: sockid = socket(af, type, protocol)

   
   
   

The parameters are as follows:
soc_variable is the socket handle to create, equivalent to sockid in C. domain_flag is the domain, equivalent to af (address family) in C; address family and domain are the same concept. UNIX supports the following domains:
AF_UNIX; UNIX internal address
AF_INET; TCP/IP address
AF_NS; Xerox NS address
AF_APPLETALK; Apple's AppleTalk address
Early DOS/Windows implementations supported only AF_INET, so most socket programs use only this domain.
connecttype (type in C) is one of the three socket types described above. num is equivalent to protocol in C: the protocol number that the socket should use. This parameter is often unnecessary, and in C it can usually be set to zero so that the system picks the default protocol for the given type.
A complete Perl socket creation therefore looks like this:
socket(THESCK, AF_INET, SOCK_STREAM, getprotobyname('tcp'));
# In C: int sockid;
# sockid = socket(AF_INET, SOCK_STREAM, 0);

Step 2: bind() -- bind to the local address.
In the first step, the socket() call specifies only the protocol element of the quintuple. The other four elements require additional calls. Creating a socket can be thought of as creating an entry in a namespace (the address family) that has not yet been named. bind() attaches a socket address to the socket handle (socket number in C) created on the local machine; that is, it gives the socket a name and so specifies the local semi-correlation. In the standard socket interface (in the UNIX world, the "standard interface" is effectively the C programming interface), a socket address is a data structure. For TCP/IP (AF_INET) it is defined as follows:
struct sockaddr_in {
    short sin_family;          // AF_INET
    u_short sin_port;          // 16-bit port number, network byte order
    struct in_addr sin_addr;   // 32-bit IP address, network byte order
    char sin_zero[8];          // reserved
};
Other structures, such as sockaddr_ns and sockaddr_un, are used for other protocol addresses; we will rarely need them. A standard bind is therefore:

bind(SOCKET sockid, struct sockaddr *localaddr_name, int addrlen);
// sockid is the socket number of an unnamed socket.
// localaddr_name is a pointer to the sockaddr_in structure used to name sockid.
// addrlen is the length in bytes of the structure that localaddr_name points to.

In Perl, before calling bind() you first obtain the packed IP address with inet_aton('localhost') or with the INADDR_ANY constant, then build the socket address:
$localaddr_port = sockaddr_in($port, inet_aton('localhost')); # $port is the port number
or
$localaddr_port = sockaddr_in($port, INADDR_ANY);
and finally call bind(SERVER, $localaddr_port);

That completes the binding. Note that you do not need to pass the byte length of $localaddr_port, which is one of the conveniences of Perl.
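
For comparison, here is a minimal C sketch of the same two steps: filling a sockaddr_in and passing it to bind(). It assumes a POSIX system; make_addr and bind_socket are hypothetical helper names, and inet_pton() stands in for Perl's inet_aton() with a dotted-decimal address.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fill `out` with an AF_INET address. `ip` is dotted-decimal text such as
 * "127.0.0.1"; passing NULL selects INADDR_ANY (all local interfaces).
 * Returns 0 on success, -1 if `ip` cannot be parsed. */
int make_addr(const char *ip, uint16_t port, struct sockaddr_in *out)
{
    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;
    out->sin_port = htons(port);          /* port in network byte order */
    if (ip == NULL) {
        out->sin_addr.s_addr = htonl(INADDR_ANY);
        return 0;
    }
    /* inet_pton() returns 1 on success and stores the address already in
     * network byte order. */
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1 ? 0 : -1;
}

/* Name an unnamed socket: bind `sockid` to the given local address.
 * Unlike Perl, C needs the structure length passed explicitly. */
int bind_socket(int sockid, const struct sockaddr_in *local)
{
    return bind(sockid, (const struct sockaddr *)local, sizeof *local);
}
```

A typical use is make_addr(NULL, port, &addr) followed by bind_socket(sockid, &addr) on a server that should accept connections on any local interface.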

The next two system calls complete the full correlation. connect() establishes a connection, and accept() makes the server wait for an actual connection from a client process. The C call format of connect() is:
connect(SOCKET sockid, struct sockaddr *destaddr, int addrlen);
// sockid is the local socket number on which to establish the connection
// destaddr is a pointer to the socket address structure of the peer (the destination address)
// addrlen is the length of the peer's socket address

The connect() call format in Perl is:
connect(soc_variable, name_variable)
A concrete call sequence looks like this:
$remoteaddr_port = sockaddr_in($port, inet_aton('abc.efg.com'));
connect(CLIENT, $remoteaddr_port); # semi-correlation triple (protocol, remote address, remote port number)
As you can see, connect() and bind() are called in almost the same way, with the server replaced by the client and local replaced by remote. They follow the same principle and complement each other: together they establish the semi-correlations on the server and client sides. accept() is then needed to combine them into the full correlation of a complete inter-network process communication. (Standard connect() can also be called on a connectionless socket, but this usage is uncommon and tends to confuse people, so I won't discuss it here.)
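
For readers curious about the parenthetical remark, the following C sketch shows what connect() does on a connectionless (UDP) socket, assuming a POSIX system. The helper names udp_connect_loopback and peer_port are made up for this example. No packet is sent; the call only records the remote half of the association, which getpeername() can then read back.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a UDP socket and record 127.0.0.1:`port` as its default peer.
 * No handshake takes place and no server has to be running; connect()
 * only stores the remote address and port in the socket. */
int udp_connect_loopback(uint16_t port)
{
    struct sockaddr_in peer;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    peer.sin_port = htons(port);
    if (connect(fd, (struct sockaddr *)&peer, sizeof peer) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Return the remote port recorded for `fd` (host byte order), or 0 on error. */
uint16_t peer_port(int fd)
{
    struct sockaddr_in peer;
    socklen_t len = sizeof peer;
    if (getpeername(fd, (struct sockaddr *)&peer, &len) < 0)
        return 0;
    return ntohs(peer.sin_port);
}
```

After such a call, plain send() and recv() can be used on the datagram socket instead of sendto() and recvfrom(), but delivery is still not guaranteed.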
Standard accept() call:
SOCKET newsock = accept(SOCKET sockid, struct sockaddr *clientaddr, int *addrlen);
// sockid is the server's local socket number
// clientaddr is a pointer to a socket address structure that receives the client's address; it may be NULL if the address is not needed
// addrlen is a pointer to the byte length of that structure; on input it holds the buffer size, on return the actual address length
// newsock, the return value of accept(), is a new socket number that the server can use to
// handle concurrent requests: the server forks a child server process, which uses this socket number to
// answer the client request received by accept()
As you can see, accept() is a call for connection-oriented servers. It stores the client's socket address and its byte length in clientaddr and addrlen. Perl's accept() does not take these address arguments. Here is accept() in Perl:

accept(new_soc_variable, current_soc_variable);

accept() takes a pending client connection from the current (listening) socket handle and attaches it to the new socket handle. The return value is the client's address (the peer address). In fact, once a connection is established, the server does not need to know the peer address; it only needs to write the byte stream into the new socket. The benefit is that the protocol becomes more transparent to the application and easier to understand.
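
In C, the client address comes back through the clientaddr and addrlen arguments rather than as a return value. The sketch below assumes a POSIX system and an already listening socket; accept_with_peer is a hypothetical wrapper written for this illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Accept one pending connection on `listen_fd` and report who connected.
 * `ip` must have room for INET_ADDRSTRLEN bytes. Returns the new socket
 * descriptor for this client, or -1 on error. */
int accept_with_peer(int listen_fd, char *ip, uint16_t *port)
{
    struct sockaddr_in client;
    socklen_t addrlen = sizeof client;  /* in: buffer size, out: address size */

    memset(&client, 0, sizeof client);
    int newsock = accept(listen_fd, (struct sockaddr *)&client, &addrlen);
    if (newsock < 0)
        return -1;

    /* Convert the binary address to dotted-decimal text. */
    if (inet_ntop(AF_INET, &client.sin_addr, ip, INET_ADDRSTRLEN) == NULL)
        ip[0] = '\0';
    *port = ntohs(client.sin_port);     /* network order -> host order */
    return newsock;
}
```

The listening socket stays open for further clients, while the returned descriptor is used only for the exchange with this one client.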

Before calling accept(), you must call listen(). listen() puts the socket into the listening state so that it can receive connections. Without listen(), accept() cannot take a client connection from the current socket. Standard listen():

listen(sockid, quelen);
// sockid is the socket number on which the server is willing to receive requests
// quelen is the length of the request queue; listen() uses it to limit the number of queued requests

Perl listen():

listen(soc_variable, num); # similar to the C version

soc_variable is the socket handle, and num is the length of the request queue.
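
The effect of listen() can be checked from C with the SO_ACCEPTCONN socket option, which reports whether a socket is in the listening state. This is a minimal sketch assuming a POSIX system; is_listening and listen_loopback are helper names invented for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Report whether `fd` is in the listening state: 1 if it is, 0 if it is
 * not, -1 if the option cannot be read. */
int is_listening(int fd)
{
    int flag = 0;
    socklen_t len = sizeof flag;
    if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &flag, &len) < 0)
        return -1;
    return flag != 0;
}

/* Bind a TCP socket to a system-chosen loopback port and start listening
 * with a queue of at most `quelen` pending connections. */
int listen_loopback(int quelen)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, quelen) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A freshly created socket reports 0, and the same kind of socket reports 1 after listen(). The queue length is best treated as a hint, since systems may cap or adjust the value they are given.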
So far, all five elements of a connection's quintuple are in place.
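
Once a connection exists, its quintuple can be read back from the socket. The C sketch below assumes a POSIX system and a connected AF_INET socket; struct quintuple and get_quintuple are hypothetical names, and the socket type reported by SO_TYPE is used as a stand-in for the protocol element.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/* The quintuple of a connected AF_INET socket. Addresses and ports are
 * kept in host byte order. `type` stands in for the protocol element:
 * SOCK_STREAM means TCP and SOCK_DGRAM means UDP for AF_INET sockets. */
struct quintuple {
    int      type;
    uint32_t local_addr;
    uint16_t local_port;
    uint32_t remote_addr;
    uint16_t remote_port;
};

/* Fill `q` from a connected socket. Returns 0 on success, -1 on error. */
int get_quintuple(int fd, struct quintuple *q)
{
    struct sockaddr_in local, remote;
    socklen_t len = sizeof q->type;

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &q->type, &len) < 0)
        return -1;
    len = sizeof local;
    if (getsockname(fd, (struct sockaddr *)&local, &len) < 0)   /* local half */
        return -1;
    len = sizeof remote;
    if (getpeername(fd, (struct sockaddr *)&remote, &len) < 0)  /* remote half */
        return -1;

    q->local_addr  = ntohl(local.sin_addr.s_addr);
    q->local_port  = ntohs(local.sin_port);
    q->remote_addr = ntohl(remote.sin_addr.s_addr);
    q->remote_port = ntohs(remote.sin_port);
    return 0;
}
```

On the two ends of one connection the halves mirror each other: the client's local address and port are the server's remote address and port, and vice versa.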
