Socket Programming in Practice

Source: Internet
Author: User
Tags: ack, unix domain socket
In English, "socket" means a hollow or groove that connects two things, as in "eye socket", and it also means an electrical socket. In computer science, a socket usually refers to one of the two endpoints of a connection. The connection can be within the same machine, as with a UNIX domain socket, or between different machines, as with a network socket.

This article focuses on the most commonly used kind, network sockets: their place in the network model, the API programming paradigm, and common pitfalls. It then uses Python's socket API to implement a few practical examples. The usual Chinese translation of "socket" is, I have to say, a baffling one, and I have not come up with a translation that is faithful, expressive, and elegant, so this article simply uses the English term. All of the code in this article can be found in the socket.py repository.

Overview

Sockets, as a generic technical specification, were first provided by the University of California, Berkeley in 1983 for 4.2BSD Unix, and later evolved into a POSIX standard. The socket API is a programming interface provided by the operating system that lets applications use socket technology. Following the Unix philosophy that everything is a file, the socket API closely resembles the file API: sockets can be read, written, opened, closed, and so on.

Modern networking is layered. In theory there is the OSI model; in industry there is the TCP/IP protocol suite. The comparison is as follows:

Each layer has its own protocols. The socket API is not part of the TCP/IP protocol suite; it is a network programming interface provided by the operating system that sits between the application layer and the transport layer:

HTTP, which we use to browse websites, and SMTP and IMAP, which send and receive mail, are all built on top of the socket API.

A socket has two essential components:

Address, consisting of an IP and a port, like 192.168.0.1:80.

Protocol, the transport protocol the socket uses. There are currently three kinds: TCP, UDP, and raw IP.

Address and protocol together identify a socket; on a single machine, only one socket with a given address and protocol may exist. The sockets on TCP port 53 and UDP port 53 are two different sockets.
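The point that address plus protocol identifies a socket can be checked with a small sketch. It assumes a loopback interface at 127.0.0.1 and that the ephemeral port picked for TCP happens to be free for UDP as well; `same_port_demo` is a hypothetical helper name, not something from the article's repository.

```python
import socket


def same_port_demo(host="127.0.0.1"):
    """Bind TCP and UDP to the same port, then try a second TCP bind."""
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    dup = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        tcp.bind((host, 0))              # port 0: let the OS pick a free port
        port = tcp.getsockname()[1]
        udp.bind((host, port))           # same address, different protocol
        try:
            dup.bind((host, port))       # same address AND same protocol
            second_tcp_bind_ok = True
        except OSError:
            second_tcp_bind_ok = False
        return port, udp.getsockname()[1], second_tcp_bind_ok
    finally:
        for s in (tcp, udp, dup):
            s.close()


tcp_port, udp_port, second_tcp_bind_ok = same_port_demo()
print(tcp_port == udp_port, second_tcp_bind_ok)
```

The UDP bind succeeds on the port TCP already holds, while the second TCP bind to the same address is rejected by the operating system.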

Depending on how a socket transmits data (that is, which protocol it uses), sockets fall into three types:

Stream sockets, also called "connection-oriented" sockets, use TCP. A connection must be established before any communication. The transmitted data has no inherent structure, so the higher-level protocol must define its own message delimiters; the advantage is that delivery is reliable.

Datagram sockets, also called "connectionless" sockets, use UDP. No connection is needed before communication. The advantage is that UDP packets are self-delimiting, that is, each packet marks the beginning and end of its data; the disadvantage is that delivery is unreliable.

Raw sockets, typically used in routers and other network devices, bypass the transport layer of the TCP/IP protocol suite and pass data directly from the internet layer to the application layer, so the packets carry no TCP or UDP header.

Python Socket API

Python uses an (ip, port) tuple to represent a socket's address, and the AF_* constants to represent the address family.
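As a minimal sketch of that representation, the snippet below creates an IPv4 TCP socket, binds it to the loopback address, and reads the address back as a tuple. The helper name `describe_tcp_socket` is made up for illustration.

```python
import socket


def describe_tcp_socket():
    """Create an IPv4 TCP socket and report its family, type and address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", 0))      # address is an (ip, port) tuple
        ip, port = sock.getsockname()    # and comes back as one
        return {"family": sock.family, "type": sock.type,
                "ip": ip, "port": port}
    finally:
        sock.close()


info = describe_tcp_socket()
print(info["family"], info["type"], info["ip"], info["port"])
```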

There are two sets of verbs to choose from for data communication: send/recv or read/write. The read/write approach is the one Java uses and is not covered in depth here, but note that:

read/write operates on a buffered "file" object, so after writing you must call flush to actually send the data; otherwise the data stays in the buffer.
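The buffering behaviour can be observed with a short sketch. It assumes `socket.socketpair()` is available on the platform and wraps one end with `makefile()`, which is how Python exposes the read/write style; `buffered_round_trip` is a hypothetical name used only here.

```python
import socket


def buffered_round_trip(message=b"hello\n"):
    """Write through a buffered file object and show that flush() sends it."""
    left, right = socket.socketpair()
    writer = left.makefile("wb")         # buffered writer on top of the socket
    try:
        writer.write(message)            # stays in the user-space buffer
        right.settimeout(0.2)
        try:
            right.recv(1024)
            arrived_before_flush = True
        except socket.timeout:
            arrived_before_flush = False # nothing was sent yet
        right.settimeout(None)           # back to blocking mode for makefile()
        writer.flush()                   # now the bytes go to the socket
        reader = right.makefile("rb")
        line = reader.readline()
        reader.close()
        return arrived_before_flush, line
    finally:
        writer.close()
        left.close()
        right.close()


print(buffered_round_trip())
```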

TCP sockets

TCP sockets must establish a connection before communicating, so their usage pattern is more complex than that of UDP sockets. The flow is as follows:

The meaning of each API call is not covered in detail here; consult the manual. Below is an echo server implemented in Python.

# echo_server.py
# coding=utf8
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# With SO_REUSEADDR set, an address still held by a socket in the
# TIME_WAIT state can be reused immediately
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('', 5500))
sock.listen(5)


def handler(client_sock, addr):
    print('New client from %s:%s' % addr)
    msg = client_sock.recv(1024)
    client_sock.send(msg)
    client_sock.close()
    print('client[%s:%s] socket closed' % addr)


if __name__ == '__main__':
    while 1:
        client_sock, addr = sock.accept()
        handler(client_sock, addr)

# echo_client.py
# coding=utf8
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# The server is assumed to run on the same machine
sock.connect(('127.0.0.1', 5500))
sock.send(b'Hello Socket world')
print(sock.recv(1024))

One thing to note in the simple echo server above is that the server socket sets SO_REUSEADDR to 1, so that an address held by a socket in the TIME_WAIT state can be reused immediately. So what does TIME_WAIT mean? That is explained in detail later, together with the TCP state transition diagram.
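To try the bind, listen, accept and connect sequence without two terminals, the following sketch runs one server round in a background thread of the same process. It is only an illustration under assumptions: it uses an OS-chosen port instead of 5500, and `run_echo_once` is a hypothetical helper.

```python
import socket
import threading


def run_echo_once(message):
    """Start a one-shot TCP echo server in a thread and talk to it."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", 0))        # port 0: any free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        client_sock, _addr = server.accept()
        with client_sock:
            # A short message on loopback normally arrives in one recv
            client_sock.sendall(client_sock.recv(1024))

    worker = threading.Thread(target=serve, daemon=True)
    worker.start()
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(message)
        reply = client.recv(1024)
    worker.join(timeout=5)
    server.close()
    return reply


print(run_echo_once(b"Hello Socket world"))
```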

UDP sockets

On the server side, a UDP socket does not need to call listen after bind.

# udp_echo_server.py
# coding=utf8
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# With SO_REUSEADDR set, the address can be reused immediately
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('', 5500))
# listen is not called

if __name__ == '__main__':
    while 1:
        data, addr = sock.recvfrom(1024)
        print('New client from %s:%s' % addr)
        sock.sendto(data, addr)

# udp_echo_client.py
# coding=utf8
import socket

# The server is assumed to run on the same machine
udp_server_addr = ('127.0.0.1', 5500)

if __name__ == '__main__':
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    data_to_sent = b'Hello udp socket'
    try:
        sent = sock.sendto(data_to_sent, udp_server_addr)
        data, server = sock.recvfrom(1024)
        print('Receive data:[%s] from %s:%s' % ((data,) + server))
    finally:
        sock.close()
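The same exchange can be sketched inside one process, which makes it easy to see that no listen, accept or connect call is involved. This assumes loopback delivery does not drop the datagrams (UDP gives no such guarantee in general); `udp_round_trip` is a hypothetical helper.

```python
import socket


def udp_round_trip(messages):
    """Echo each datagram through a UDP 'server' socket in the same process."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.settimeout(5)
    client.settimeout(5)
    replies = []
    try:
        server.bind(("127.0.0.1", 0))            # bind only, no listen()
        server_addr = server.getsockname()
        for message in messages:
            client.sendto(message, server_addr)  # no connect() needed
            data, addr = server.recvfrom(1024)   # one call, one datagram
            server.sendto(data, addr)
            reply, _ = client.recvfrom(1024)
            replies.append(reply)
    finally:
        server.close()
        client.close()
    return replies


print(udp_round_trip([b"Hello", b"udp socket"]))
```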

Common pitfalls

Ignoring return values

The echo server examples in this article also ignore return values, for reasons of space. Network communication is complex, and the network state of the two peers usually cannot be guaranteed, so sending or receiving data may fail or partially fail. It is therefore necessary to check the return values of the send and receive functions. When the TCP echo client in this article sends data, the correct form is as follows:

total_send = 0
content_length = len(data_to_sent)
while total_send < content_length:
    # send returns the number of bytes actually accepted
    sent = sock.send(data_to_sent[total_send:])
    if sent == 0:
        raise RuntimeError("Socket connection broken")
    total_send += sent

send and recv operate on the network buffers; they do not necessarily handle all of the data passed to them.

In general, send returns once the network buffer has been filled, and recv returns once the network buffer has been emptied. Each then reports how many bytes it handled.

When recv returns 0 bytes, the peer has closed the connection.

You can set the buffer size as follows:

s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_size)
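Putting these points together, here is a rough sketch of a send loop that checks return values and a receive loop that stops when recv returns an empty result. It assumes `socket.socketpair()` is available and runs the sender in a thread so a large payload cannot stall on a full buffer; the function names are hypothetical.

```python
import socket
import threading


def send_all(sock, data):
    """Keep calling send until every byte has been accepted."""
    total_sent = 0
    while total_sent < len(data):
        sent = sock.send(data[total_sent:])
        if sent == 0:
            raise RuntimeError("socket connection broken")
        total_sent += sent
    return total_sent


def recv_until_closed(sock, chunk_size=4096):
    """Read until recv returns b'', which means the peer closed."""
    chunks = []
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def transfer(payload):
    left, right = socket.socketpair()

    def sender():
        send_all(left, payload)
        left.close()                     # closing signals end of data

    worker = threading.Thread(target=sender)
    worker.start()
    received = recv_until_closed(right)
    worker.join()
    right.close()
    return received


payload = b"x" * 256 * 1024
print(len(transfer(payload)) == len(payload))
```

In everyday code `sock.sendall(data)` performs the same loop as `send_all`; the explicit version is shown only to make the return-value check visible.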

Assuming TCP provides framing

TCP does not provide framing, which makes it well suited to transmitting data streams. This is one of the important differences between TCP and UDP. UDP is a message-oriented protocol that preserves message boundaries between the sender and the receiver.

Code example: framing_assumptions
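One common way for an application to add its own framing on top of TCP is a length prefix. The sketch below is not taken from the framing_assumptions example; it is a minimal illustration that assumes a 4-byte big-endian length header, with hypothetical function names.

```python
import socket
import struct


def send_msg(sock, payload):
    """Send one message as a 4-byte length followed by the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before the message was complete")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock):
    """Read the length header, then exactly that many payload bytes."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def framing_demo():
    left, right = socket.socketpair()    # a connected stream socket pair
    try:
        # Both messages are written back to back into the same byte stream
        send_msg(left, b"hello")
        send_msg(left, b"socket world")
        return [recv_msg(right), recv_msg(right)]
    finally:
        left.close()
        right.close()


print(framing_demo())
```

Without the length header, a single recv on the receiving side could return both messages glued together, or only part of the first one.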

The TCP state machine

The earlier echo server example mentioned the TIME_WAIT state. To introduce the concept properly, you need to understand the TCP state machine from connection establishment to termination. (Image source)

This state transition diagram is critical, but also fairly complex. To make it easier to remember, I broke the diagram apart. Careful analysis shows that a connection can be opened or closed either passively or actively. An active close involves the most state transitions, including FIN_WAIT_1, FIN_WAIT_2, CLOSING, and TIME_WAIT.

In addition, because TCP is a reliable transport protocol, every packet sent must be acknowledged (ACK). With these two points in mind, look at the following diagram:

When the socket that actively closes the connection calls close, a FIN is sent to the passive closer.

When the peer receives the FIN, it sends an ACK back to the active closer, and the passive closer enters the CLOSE_WAIT state.

When the passive closer calls close, it sends a FIN to the active closer; on receiving that FIN, the active closer enters the TIME_WAIT state.

The active closer does not move to the CLOSED state immediately. Instead it waits for 2MSL (MSL is the maximum segment lifetime, the longest a packet can survive in the network) to make sure that the passive closer receives the final ACK. If the passive closer does not receive the final ACK, it resends the FIN, and the active closer in TIME_WAIT sends the ACK again; one trip for the FIN and one for the ACK is exactly two MSL. If the wait were shorter than 2MSL, a new socket could receive data from the previous connection.

The earlier echo server example also shows that an address in TIME_WAIT is not necessarily unusable: setting the socket's SO_REUSEADDR option lets you reuse it without waiting for 2MSL. Of course, this is appropriate only for test environments; under normal circumstances, do not modify this option.

Practical examples

HTTP UA

HTTP is the cornerstone of today's World Wide Web, and with the socket API you can simply simulate how a browser (UA) handles HTTP protocol data.

# coding=utf8
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
baidu_ip = socket.gethostbyname('baidu.com')
# 80 is the standard HTTP port
sock.connect((baidu_ip, 80))
print('connected to %s' % baidu_ip)

req_msg = [
    b'GET / HTTP/1.1',
    b'User-Agent: curl/7.37.1',
    b'Host: baidu.com',
    b'Accept: */*',
]
delimiter = b'\r\n'

# Header lines are joined by CRLF; an empty line ends the request
sock.send(delimiter.join(req_msg))
sock.send(delimiter)
sock.send(delimiter)

print('%sreceived%s' % ('-' * 20, '-' * 20))
http_response = sock.recv(4096)
print(http_response.decode('utf-8', errors='replace'))

Run the above code to get the following output

--------------------received--------------------
HTTP/1.1
Date: Tue, 12:16:53 GMT
Server: Apache
Last-Modified: Tue, 13:48:00 GMT
ETag: "51-47cf7e6ee8400"
Accept-Ranges: bytes
Content-Length: 81
Cache-Control: max-age=86400
Expires: Wed, 12:16:53 GMT
Connection: Keep-Alive
Content-Type: text/html

http_response is obtained with a single call to recv(4096). What if the actual response is larger than that? As noted earlier, TCP is stream-oriented: it does not care about the content of messages, and it needs the application to define message boundaries. For HTTP at the application layer there are several cases. The simplest is to parse the Content-Length header to learn the size of the body. HTTP/1.1 also supports Transfer-Encoding: chunked; that format is not explained here. You only need to know that TCP itself cannot tell where a message ends. If you are interested in this area, see the CPython standard library module http.client.
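The Content-Length case can be sketched as follows. This is a simplified illustration, not how http.client does it: it assumes the whole header section is already in the buffer, ignores chunked encoding and repeated headers, and uses made-up sample bytes and hypothetical function names.

```python
def split_http_response(raw):
    """Split raw response bytes into status line, headers dict and body."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("header section is incomplete")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


def missing_body_bytes(raw):
    """How many body bytes still have to be read with further recv calls."""
    _status, headers, body = split_http_response(raw)
    return int(headers["content-length"]) - len(body)


# Made-up sample: the header promises 81 bytes, only 30 have arrived
sample = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Length: 81\r\n"
          b"Content-Type: text/html\r\n"
          b"\r\n" + b"x" * 30)
print(missing_body_bytes(sample))
```

A client would keep calling recv and appending to the buffer until `missing_body_bytes` reaches zero.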

Unix domain socket

A Unix domain socket (UDS) is a mechanism for communication between different processes on the same machine. Its API is similar to that of network sockets, except that the connection address is a local file.

Code example: uds_server.py, uds_client.py
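As a rough sketch of what that looks like (not the contents of uds_server.py or uds_client.py), the following echoes one message over a Unix domain socket inside a single process. It assumes a Unix-like system where `socket.AF_UNIX` is available; the socket file name and `uds_echo_once` are made up for illustration.

```python
import os
import socket
import tempfile
import threading


def uds_echo_once(message):
    """Echo one message over a Unix domain socket bound to a temp file."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "echo.sock")   # the address is a file path
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
        server.listen(1)

        def serve():
            conn, _ = server.accept()
            with conn:
                conn.sendall(conn.recv(1024))

        worker = threading.Thread(target=serve, daemon=True)
        worker.start()
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            client.connect(path)                   # connect to the same path
            client.sendall(message)
            reply = client.recv(1024)
        finally:
            client.close()
            worker.join(timeout=5)
            server.close()
        return reply


print(uds_echo_once(b"hello uds"))
```

Apart from `AF_UNIX` and the path used as the address, the calls are the same as in the TCP echo example.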

Ping

The ping command is the most commonly used tool for checking network connectivity. The protocol it uses is neither TCP nor UDP but ICMP; using raw sockets, we can implement its function in pure Python.

Code example: ping.py
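The core of such an implementation is building an ICMP echo request with a correct checksum. The sketch below is not the article's ping.py; it only builds the packet (type 8, code 0) and does not send it, since opening a raw socket usually requires elevated privileges. The function names are hypothetical.

```python
import struct


def internet_checksum(data):
    """16-bit one's-complement checksum used by ICMP (as in RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                  # pad to a whole number of words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:                   # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier, sequence, payload=b""):
    """Build an ICMP echo request: type 8, code 0, checksum, id, sequence."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


packet = build_echo_request(0x1234, 1, b"ping")
# A packet carrying a correct checksum sums to zero when checked again
print(len(packet), internet_checksum(packet))
```

Sending the packet would then use something like `socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)` followed by `sendto`, which is left out here because it cannot run without the necessary permissions.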

netstat vs ss

netstat and ss are commands for viewing socket information on Unix-like systems. netstat is the older command; the options I use most often are:

-t, show only TCP connections

-u, show only UDP connections

-n, do not resolve hostnames; show hosts as IP addresses, which speeds up execution

-p, show the process that owns each connection

-l, show only listening sockets

ss is a newer command with options similar to netstat's; the main difference is its ability to filter (via the state and exclude keywords).

$ ss -o state time-wait -n | head
Recv-Q Send-Q    Local Address:Port          Peer Address:Port
0      0         10.200.181.220:2222         10.200.180.28:12865        timer:(timewait,33sec,0)
0      0         127.0.0.1:45977             127.0.0.1:3306             timer:(timewait,46sec,0)
0      0         127.0.0.1:45945             127.0.0.1:3306             timer:(timewait,6.621ms,0)
0      0         10.200.181.220:2222         10.200.180.28:12280        timer:(timewait,12sec,0)
0      0         10.200.181.220:2222         10.200.180.28:35045        timer:(timewait,43sec,0)
0      0         10.200.181.220:2222         10.200.180.28:42675        timer:(timewait,46sec,0)
0      0         127.0.0.1:45949             127.0.0.1:3306             timer:(timewait,11sec,0)
0      0         127.0.0.1:45954             127.0.0.1:3306             timer:(timewait,21sec,0)
0      0         ::ffff:127.0.0.1:3306       ::ffff:127.0.0.1:45964     timer:(timewait,31sec,0)

For more on these two commands, see:

SS Utility: Quick Intro

10 basic examples of Linux netstat command

Summary

Our lives have become inseparable from the network, and everyday development is full of complex network applications, from basic databases to all kinds of distributed systems. However complex the application layer is, the underlying protocol suite that carries the data is the same. We seldom deal with sockets directly, but when our systems run into problems, the cause is often a lack of understanding of the underlying protocols. I hope this article is useful to anyone writing network programs.
