What You Must Know About Sockets and the TCP Connection Process



Contents:
1. Background
2. Detailed analysis of the connection process
2.1 socket() function
2.2 bind() function
2.3 listen() and connect() functions
2.3.1 In-depth analysis of listen()
2.3.2 Impact of SYN flood
2.4 accept() function
2.5 send() and recv() functions
2.6 close() and shutdown() functions
3. Address/port reuse

This article describes the socket operations at each stage of a TCP connection, in the hope of helping readers without a background in network programming understand what a socket is and what role it plays. If you find any errors, please point them out.

1. Background

1. The complete form of a socket is {protocol, src_addr, src_port, dest_addr, dest_port}.

This is often called the socket five-tuple. protocol specifies whether the connection is TCP or UDP, and the other four fields are the source address, source port, destination address and destination port. But where does each of these fields come from?

2. The TCP protocol stack maintains two socket buffers: the send buffer and the recv buffer.

Data to be sent over a TCP connection is first copied into the send buffer, either from the app buffer of a user-space process or from a kernel buffer. This copy is done by send(); because write() can also be used, the step is also called writing data, and the send buffer is accordingly also known as the write buffer. The two calls differ only in that send() takes a flags argument for socket-specific options; with flags set to 0, send() is equivalent to write().

The data ultimately leaves through the NIC, so the contents of the send buffer have to be copied to the NIC. Since one end is memory and the other is a NIC device, the copy can be done by DMA without involving the CPU. In other words, data in the send buffer is copied to the NIC by DMA and then travels across the network to the other end of the TCP connection: the receiver.

When data is received over a TCP connection, it first arrives at the NIC, is copied into the recv buffer by DMA in the same way, and is then copied by recv() from the recv buffer into the app buffer of the user-space process.

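The overall data path is therefore: sender's app buffer → send buffer → (DMA) → NIC → network → NIC → (DMA) → recv buffer → receiver's app buffer.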
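To see that the kernel really keeps a separate send buffer and recv buffer for each socket, you can query their sizes with the SO_SNDBUF and SO_RCVBUF socket options. The sketch below uses Python's socket module, a thin wrapper over the same system calls; the function name socket_buffer_sizes is hypothetical. Note that Linux doubles the value passed to setsockopt() (to leave room for bookkeeping overhead) and caps it at net.core.wmem_max / net.core.rmem_max, so the reported size may differ from the requested one.

```python
import socket


def socket_buffer_sizes(sock):
    """Return (send buffer size, recv buffer size) in bytes, as reported by the kernel."""
    sndbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    rcvbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sndbuf, rcvbuf


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("default (send, recv):", socket_buffer_sizes(s))
        # Ask for a 256 KiB send buffer; the kernel may double or cap this value.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 256 * 1024)
        print("after setsockopt:", socket_buffer_sizes(s))
```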

3. Two kinds of sockets: the listening socket and the connected socket.

When a service process starts, it reads its configuration file to get the address and port to listen on, creates the listening socket with socket(), and binds it to that address and port with bind(). The process/thread can then call listen() to listen on this port (strictly speaking, on this listening socket).

A connected socket is the socket that accept() returns after a TCP connection request has been received and the three-way handshake has completed. The process/thread then uses this connected socket to communicate with the client over TCP.

To distinguish the two socket descriptors returned by socket() and accept(), some people call the listening socket listenfd and the connected socket connfd; these names are also used occasionally below.

The following sections describe the role of each function. Walking through them in order also traces how a connection is established and torn down.

2. Detailed analysis of the connection process


2.1 socket() function

The socket() function creates a socket file descriptor, sockfd ("socket() creates an endpoint for communication and returns a descriptor", as its man page puts it). This descriptor is later passed to bind().

2.2 bind() function

The service program parses the address and port to listen on from its configuration file, then uses bind() to bind the socket sockfd created by socket() to that "addr:port" combination. A socket bound to an address and port can then be passed to listen().

Once bound, the socket has a source address and a source port (from the server's own point of view). Together with the protocol type (TCP or UDP, chosen when the socket is created), this fills in three elements of the five-tuple:

{protocol, src_addr, src_port}
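A minimal sketch of this step in Python, whose socket module wraps the same socket()/bind() calls. Binding to port 0 is an assumption made only so the example runs anywhere: it asks the kernel to pick a free port, whereas a real server would bind the port read from its configuration file. The helper name bound_three_tuple is hypothetical.

```python
import socket


def bound_three_tuple(host="127.0.0.1", port=0):
    """Create and bind a TCP socket; return it with its {protocol, src_addr, src_port}."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # protocol: TCP
    sock.bind((host, port))                                   # fixes src_addr and src_port
    src_addr, src_port = sock.getsockname()
    return sock, ("tcp", src_addr, src_port)


if __name__ == "__main__":
    s, triple = bound_three_tuple()
    print(triple)  # the port number is whatever free port the kernel chose
    s.close()
```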

Many service programs can also be configured to listen on several addresses and ports, for example to run multiple instances. This is done with one socket() + bind() pair per address/port, which yields several bound sockets.

2.3 listen() and connect() functions

As the name suggests, listen() listens on a socket that has already been bound to addr:port with bind(). After listen() is called, the socket moves from the CLOSED state to the LISTEN state and can accept incoming TCP connections.

The connect() function sends a connection request to a listening socket, i.e. it starts the TCP three-way handshake, so it is the connection initiator (such as a client) that calls connect(). Before calling connect(), the initiator also has to create a sockfd. It usually does not call bind(); in that case the kernel binds the socket to a random (ephemeral) port. Because connect() targets a specific socket, it must be given the destination, namely the destination address and destination port, which are the address and port bound to the server's listening socket. The request also carries the initiator's own address and port, which from the server's point of view are the source address and source port of the request. At this point the sockets at both ends of the TCP connection have the complete five-tuple.
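The sketch below (Python, loopback only; the helper name five_tuples is hypothetical) shows both ends of a connection ending up with complete five-tuples. The client never calls bind(); the kernel assigns its ephemeral source port during connect(). The client's source address and port equal the server's destination address and port, and vice versa.

```python
import socket


def five_tuples():
    """Connect a client to a local listener; return (client 5-tuple, server 5-tuple)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen()
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            # No bind(): connect() makes the kernel pick an ephemeral source port.
            client.connect(listener.getsockname())
            conn, _ = listener.accept()
            with conn:
                # (protocol, src_addr, src_port, dest_addr, dest_port) seen from each side
                client_side = ("tcp", *client.getsockname(), *client.getpeername())
                server_side = ("tcp", *conn.getsockname(), *conn.getpeername())
    return client_side, server_side


if __name__ == "__main__":
    client_side, server_side = five_tuples()
    print("client:", client_side)
    print("server:", server_side)
```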

2.3.1 In-depth analysis of listen()

Now let's look at listen() more closely. If a service listens on multiple addresses and ports, i.e. on multiple sockets, the process/thread responsible for listening typically monitors them with select() or poll() (or epoll). The same approach is used even when only one socket is being listened on; select() or poll() then simply watches a single descriptor.
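A minimal Python sketch of one listener watching several listening sockets with select(). The helper names open_listeners and wait_and_accept are hypothetical, and the loopback addresses with port 0 are placeholders for configured addresses. On Linux, a listening socket is reported readable once its completed connection queue holds at least one connection, so accept() will not block at that point.

```python
import select
import socket


def open_listeners(addrs):
    """One socket() + bind() + listen() per (addr, port) pair."""
    listeners = []
    for addr in addrs:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind(addr)
        s.listen()
        listeners.append(s)
    return listeners


def wait_and_accept(listeners, timeout=5.0):
    """Block in select() until some listening socket is readable, then accept() from it.

    Returns a list of (local port, connected socket, peer address).
    """
    readable, _, _ = select.select(listeners, [], [], timeout)
    accepted = []
    for ls in readable:
        conn, peer = ls.accept()
        accepted.append((ls.getsockname()[1], conn, peer))
    return accepted


if __name__ == "__main__":
    socks = open_listeners([("127.0.0.1", 0), ("127.0.0.1", 0)])
    # Simulate a client that connects only to the second listening socket.
    client = socket.create_connection(socks[1].getsockname())
    for port, conn, peer in wait_and_accept(socks):
        print(f"accepted {peer} on port {port}")
        conn.close()
    client.close()
    for s in socks:
        s.close()
```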

Whichever mechanism is used (select(), poll() or epoll), the listening process/thread (the listener) blocks in it until the listening socket becomes readable. The three-way handshake itself is carried out by the kernel's TCP stack, not by the listener in user space. When a SYN arrives, the kernel replies with SYN+ACK and creates an entry for the connection in the incomplete connection queue, in the SYN_RECV state. When the client's ACK arrives, the kernel moves the matching entry to the completed connection queue and sets it to the ESTABLISHED state. Only then does the listening socket become readable: the listener is woken up and can take the connection with accept(). Any further SYN that arrives in the meantime is a new connection request and is added to the incomplete queue in the same way. This loop is how a listening socket handles incoming TCP connections.

In other words, a listening socket has two queues: the incomplete connection queue and the completed connection queue. When a SYN from a client is received and answered with SYN+ACK, an entry for that client is appended to the incomplete queue with the status SYN_RECV. This entry must record the client's address and port (possibly in hashed form; I am not sure). When the server receives the client's ACK, this information is used to find the matching entry in the incomplete queue, which is then moved to the completed queue with the status ESTABLISHED.

When the incomplete connection queue is full, new SYNs are dropped (unless SYN cookies are enabled); the listener itself is not blocked. When the completed connection queue is full, new SYNs are dropped as well, and a connection that finishes the handshake cannot be moved into the queue, so the client's final ACK is ignored (or answered with RST if tcp_abort_on_overflow is enabled) until accept() frees up room. Before Linux 2.2, the backlog argument of listen() specified the maximum number of incomplete connection requests. Since Linux 2.2 it specifies the maximum length of the completed queue, while /proc/sys/net/ipv4/tcp_max_syn_backlog sets the maximum length of the incomplete queue. /proc/sys/net/core/somaxconn is a hard upper limit on the completed queue: if backlog is larger than somaxconn, it is silently truncated to somaxconn. The default value of somaxconn was 128 in older kernels and is 4096 since Linux 5.4.
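The truncation rule can be expressed as a small, simplified model. The sketch reads the two Linux tunables mentioned above from /proc and returns None where they are unavailable (for example, on a non-Linux system). The function names are hypothetical, and the kernel's actual queue-full checks are more involved than a plain min().

```python
from pathlib import Path

SOMAXCONN = Path("/proc/sys/net/core/somaxconn")
TCP_MAX_SYN_BACKLOG = Path("/proc/sys/net/ipv4/tcp_max_syn_backlog")


def read_sysctl_int(path):
    """Return the integer stored in a /proc/sys file, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def effective_backlog(requested, somaxconn):
    """Simplified model: listen()'s backlog is silently capped at somaxconn."""
    return min(requested, somaxconn)


if __name__ == "__main__":
    somaxconn = read_sysctl_int(SOMAXCONN)
    print("somaxconn:", somaxconn)
    print("tcp_max_syn_backlog:", read_sysctl_int(TCP_MAX_SYN_BACKLOG))
    if somaxconn is not None:
        print("listen(fd, 1024) gives an accept queue limit of",
              effective_backlog(1024, somaxconn))
```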

In the ss output below, for sockets in the LISTEN state the Send-Q column shows the backlog, i.e. the maximum length of the completed connection queue, and Recv-Q shows how many connections are currently in that completed queue waiting for accept(). See man netstat.

[root@xuexi ~]# ss -tnl
State      Recv-Q Send-Q        Local Address:Port        Peer Address:Port
LISTEN     0      128                       *:80                     *:*
LISTEN     0      128                       *:22                     *:*
LISTEN     0      100               127.0.0.1:25                     *:*
LISTEN     0      128                      :::22                    :::*
LISTEN     0      100                     ::1:25                    :::*

2.3.2 Impact of SYN flood

In addition, if the server sends SYN+ACK but does not receive the client's ACK, the SYN+ACK is retransmitted after a timeout, in case it was lost in the network. This retransmission creates a problem. If the client forges its source address when sending the SYN, the SYN+ACK replies go to a host that never asked for a connection, so the server never receives an ACK and keeps retransmitting SYN+ACK until the retry limit is reached. Each such half-open connection occupies an entry in the incomplete connection queue, and every retransmission costs CPU time (copying the segment to the NIC is done by DMA, but building and scheduling it is not) as well as bandwidth. If the client is an attacker that keeps sending thousands or tens of thousands of forged SYN packets, the incomplete queue fills up, legitimate clients can no longer connect, and the network link may become badly congested. This is the so-called SYN flood attack.

There are several ways to mitigate a SYN flood, for example: reducing the maximum length of the two queues maintained for listen(), reducing the number of SYN+ACK retransmissions (tcp_synack_retries), increasing the retransmission interval, shortening the timeout for waiting for the ACK, and enabling SYN cookies (tcp_syncookies). However, every method that tunes TCP parameters directly trades protection against performance for legitimate clients. It is therefore very important to filter malicious packets before they ever reach the listening socket, for example with a firewall.

2.4 accept() function

The accept() function takes the first entry from the completed connection queue (removing it from the queue) and creates a new socket descriptor for that connection, called connfd here. With this new connected socket, the worker process/thread (the worker) can exchange data with the client, while the listener keeps listening on the listening socket (sockfd).
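A small loopback sketch in Python (the function name accept_and_read is hypothetical) showing that accept() hands back a new connected socket with its own descriptor, that data is exchanged through that new socket, and that the listening socket stays open.

```python
import socket


def accept_and_read():
    """Return (listenfd number, connfd number, bytes read via connfd)."""
    listenfd = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listenfd.bind(("127.0.0.1", 0))
    listenfd.listen()
    client = socket.create_connection(listenfd.getsockname())

    connfd, _ = listenfd.accept()   # removes the first entry of the completed queue
    client.sendall(b"hello")
    data = connfd.recv(1024)        # data flows through connfd, not listenfd

    result = (listenfd.fileno(), connfd.fileno(), data)
    # listenfd is still in the LISTEN state here and could accept further clients.
    for s in (client, connfd, listenfd):
        s.close()
    return result


if __name__ == "__main__":
    print(accept_and_read())
```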

For example, in httpd's prefork mode each child process is both a listener and a worker. When a client sends a connection request, the child process that currently holds the listening role takes the request and releases the listening socket so that other child processes can listen on it. It then obtains a new connected socket through accept() and uses it to interact with the client. Along the way it may block or sleep many times waiting for various I/O operations. This is quite inefficient: just consider the stretch from the moment the child is woken for a new connection until it finally holds the connected socket, during which it blocks again and again. The listening socket can of course be set to non-blocking I/O mode, but then the process has to keep checking its status instead.

Now consider the worker/event models. Each child process has one dedicated listening thread and N worker threads. The listening thread listens and creates new connected socket descriptors, which it puts into Apache's socket queue. This separates the listener from the workers, so the workers can keep working freely while the listener listens. From the listening point of view, the worker/event models therefore perform far better than prefork.
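The listener/worker split can be sketched with a dedicated listening thread that only calls accept() and a pool of worker threads fed through a queue. This is a simplified Python illustration of the pattern, not httpd's actual implementation; the names listener, worker, handle and run are hypothetical, and the "work" is just upper-casing one message per connection.

```python
import queue
import socket
import threading


def handle(conn):
    """Worker job: read one message, send it back upper-cased, close the socket."""
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def worker(jobs):
    """Worker thread: take connected sockets from the queue until a None sentinel."""
    while True:
        conn = jobs.get()
        if conn is None:
            break
        handle(conn)


def listener(listenfd, jobs, n_conns):
    """Dedicated listening thread: accept() connections and hand each connfd off."""
    for _ in range(n_conns):
        connfd, _ = listenfd.accept()
        jobs.put(connfd)


def run(n_workers=3, n_clients=5):
    """Start one listener and n_workers workers, then serve n_clients loopback clients."""
    listenfd = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listenfd.bind(("127.0.0.1", 0))
    listenfd.listen()
    jobs = queue.Queue()
    workers = [threading.Thread(target=worker, args=(jobs,)) for _ in range(n_workers)]
    for t in workers:
        t.start()
    lt = threading.Thread(target=listener, args=(listenfd, jobs, n_clients))
    lt.start()

    replies = []
    for i in range(n_clients):
        with socket.create_connection(listenfd.getsockname()) as c:
            c.sendall(f"msg{i}".encode())
            replies.append(c.recv(1024))

    lt.join()
    for _ in workers:
        jobs.put(None)            # one sentinel per worker
    for t in workers:
        t.join()
    listenfd.close()
    return replies


if __name__ == "__main__":
    print(run())
```

The queue decouples the two roles: the listening thread goes straight back to accept() after each hand-off, regardless of how long a worker spends on its connection.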

When the listener calls accept() and the completed connection queue is empty, the listener blocks. Alternatively, the socket can be set to non-blocking mode, in which case accept() fails immediately with EWOULDBLOCK or EAGAIN when no connection is available. You can also use select(), poll() or epoll to wait until the completed connection queue becomes readable. Another option is signal-driven I/O: the kernel notifies the listener with a signal when a new connection is added to the completed queue, and the listener then calls accept() to take it.
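A minimal Python sketch of the non-blocking variant combined with select() (the helper names try_accept and accept_when_ready are hypothetical). Python reports both EAGAIN and EWOULDBLOCK from a non-blocking accept() as BlockingIOError.

```python
import select
import socket


def try_accept(listenfd):
    """Non-blocking accept(): return (conn, addr), or None if the completed queue is empty."""
    try:
        return listenfd.accept()
    except BlockingIOError:
        # accept() failed with EAGAIN/EWOULDBLOCK; Python raises BlockingIOError for both.
        return None


def accept_when_ready(listenfd, timeout):
    """Wait until select() reports the listening socket readable, then accept()."""
    readable, _, _ = select.select([listenfd], [], [], timeout)
    return try_accept(listenfd) if readable else None


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as ls:
        ls.bind(("127.0.0.1", 0))
        ls.listen()
        ls.setblocking(False)              # accept() now fails instead of blocking
        print(try_accept(ls))              # None: no completed connections yet
        with socket.create_connection(ls.getsockname()):
            conn, addr = accept_when_ready(ls, timeout=5.0)
            print("accepted", addr)
            conn.close()
```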

People often talk about synchronous and asynchronous connections; how do they differ? With a synchronous connection, once the listener picks up a connection request from a client, it takes no requests from other clients until the connected socket has been created, the data exchange with that client has finished, and the connection has been closed. In the synchronous model the listener and the worker are usually the same process, as in httpd's prefork model. With asynchronous connections, other connection requests can be received and processed at any stage of connection setup and data exchange. Asynchronous connections are generally used when the listener and the worker are not the same process/thread, as in httpd's event model. Note that httpd's worker model separates listeners and workers but still uses synchronous connections: as soon as the listener has created a connected socket for a request, it hands it to a worker thread, which then serves only that client until the connection is closed. Even the event model is asynchronous only for special connections: a worker thread can hand a connection such as an idle keep-alive connection back to the listener thread to watch, but for normal connections it still behaves like a synchronous connection. In plain terms, with synchronous connections one process/thread handles one connection, while with asynchronous connections one process/thread handles many connections.
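The "one thread, many connections" idea can be sketched as a single-threaded event loop with Python's selectors module, whose DefaultSelector uses epoll on Linux. This is a simplified illustration under stated assumptions, not httpd's event MPM: the function name serve is hypothetical, each connection just gets its message echoed back upper-cased, and replies are assumed small enough to fit in the send buffer.

```python
import selectors


def serve(listenfd, max_conns, idle_timeout=5.0):
    """Single-threaded event loop: one thread serves many connections at once.

    Ends after max_conns clients have closed, or after idle_timeout seconds
    without any event. Returns the number of connections that were closed.
    """
    sel = selectors.DefaultSelector()            # epoll on Linux
    listenfd.setblocking(False)
    sel.register(listenfd, selectors.EVENT_READ, data="listen")
    closed = 0
    while closed < max_conns:
        events = sel.select(timeout=idle_timeout)
        if not events:
            break
        for key, _ in events:
            if key.data == "listen":             # completed queue is non-empty
                conn, _ = listenfd.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data="conn")
            else:                                # a connection's recv buffer has data
                conn = key.fileobj
                data = conn.recv(1024)
                if data:
                    # Small replies fit in the send buffer; a full server would
                    # register EVENT_WRITE and keep unsent data instead.
                    conn.sendall(data.upper())
                else:                            # b"": the peer closed its side
                    sel.unregister(conn)
                    conn.close()
                    closed += 1
    sel.unregister(listenfd)
    sel.close()
    return closed
```

To try it, bind and listen on a socket, run serve(listenfd, n) in a background thread, and open n client connections at the same time; all of them are served by the one thread.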

2.5 send() and recv() functions

The send() function copies data from the app buffer into the send buffer (the data may also be copied directly from a kernel buffer), and recv() copies data from the recv buffer into the app buffer. write() and read() can be used instead without any problem; send()/recv() are simply more specific to sockets, since they take a flags argument for socket options (with flags set to 0 they behave like write()/read()).

Both functions involve the socket buffers, so when calling send() or recv() one has to consider whether the source buffer has any data to copy, and whether the destination buffer is full and therefore not writable. If either condition is not met and the socket uses the blocking I/O model, the process/thread blocks in send()/recv(). If the socket uses the non-blocking I/O model, calling send()/recv() when the buffer is not ready returns immediately with the error EWOULDBLOCK or EAGAIN. To find out whether a buffer has data or free space, you can monitor the socket descriptor with select(), poll() or epoll (monitoring the descriptor effectively monitors its socket buffers) and call send()/recv() once the condition is met. The socket can also be switched to signal-driven I/O or asynchronous I/O, so that the process is notified when the data is ready (signal-driven I/O) or has already been copied (asynchronous I/O) instead of having to call send()/recv() and wait.
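Both non-blocking cases can be observed on a loopback connection. In this Python sketch (the helpers tcp_pair, recv_when_ready and fill_send_buffer are hypothetical), recv() on an empty recv buffer raises BlockingIOError and select() then waits for data, and send() to a peer that never reads eventually raises BlockingIOError once the buffers are full. The exact number of bytes queued depends on the kernel's buffer sizes.

```python
import select
import socket
import threading


def tcp_pair():
    """Return two connected TCP sockets over loopback."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as ls:
        ls.bind(("127.0.0.1", 0))
        ls.listen()
        client = socket.create_connection(ls.getsockname())
        server, _ = ls.accept()
    return client, server


def recv_when_ready(sock, bufsize=1024, timeout=5.0):
    """Non-blocking recv(); on EAGAIN, wait with select() until the recv buffer has data."""
    sock.setblocking(False)
    try:
        return sock.recv(bufsize)
    except BlockingIOError:                  # recv buffer empty: EAGAIN/EWOULDBLOCK
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            raise TimeoutError("no data arrived within the timeout")
        return sock.recv(bufsize)


def fill_send_buffer(sock, chunk=b"x" * 65536):
    """Non-blocking send() while the peer reads nothing, until the buffers are full."""
    sock.setblocking(False)
    total = 0
    while True:
        try:
            total += sock.send(chunk)
        except BlockingIOError:              # send buffer full: EAGAIN/EWOULDBLOCK
            return total


if __name__ == "__main__":
    client, server = tcp_pair()
    # Data arrives 0.1 s later, so the first recv() attempt finds an empty buffer.
    timer = threading.Timer(0.1, client.sendall, args=(b"late data",))
    timer.start()
    print(recv_when_ready(server))
    timer.join()
    print("bytes queued before EAGAIN:", fill_send_buffer(client))
    client.close()
    server.close()
```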

2.6 close() and shutdown() functions

The general-purpose close() function closes a file descriptor, including the descriptor of a connected network socket. When close() is called, the data remaining in the send buffer is still sent. However, close() only decrements the socket's reference count by one, much as rm only removes one hard link to a file. Only when the reference count drops to zero is the socket actually closed and the four-way teardown started. In concurrent servers where parent and child processes share a socket, calling close() in the child does not really close the socket, because the parent still has it open; if the parent never calls close(), the socket stays open.
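In the Python sketch below, socket.dup() stands in for the descriptor that a parent and child share after fork(): closing one copy leaves the connection open, and only closing the last copy makes the kernel send FIN, which the peer sees as EOF (recv() returning b""). Loopback only; the function name is hypothetical.

```python
import socket


def close_with_shared_descriptor():
    """Return what the client observes after each of two close() calls on the server side."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as ls:
        ls.bind(("127.0.0.1", 0))
        ls.listen()
        client = socket.create_connection(ls.getsockname())
        server, _ = ls.accept()

    server_copy = server.dup()      # second descriptor for the same socket (as after fork())
    server.close()                  # reference count 2 -> 1: no FIN is sent yet

    client.settimeout(0.5)
    try:
        client.recv(1)
        after_first = "connection closed"
    except socket.timeout:          # nothing arrived: the connection is still open
        after_first = "still open"

    server_copy.close()             # reference count 1 -> 0: the kernel sends FIN
    client.settimeout(5.0)
    after_second = client.recv(1)   # b"" means EOF, i.e. the FIN arrived
    client.close()
    return after_first, after_second


if __name__ == "__main__":
    print(close_with_shared_descriptor())   # ('still open', b'')
```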

The shutdown() function shuts down a network connection. Unlike close(), which only decrements the reference count, it acts directly on the connection no matter how many descriptors refer to the socket, and this is what triggers the four-way teardown. Its how argument selects one of three modes:

1. Disable writing (SHUT_WR). No more data can be written to the send buffer; data already in the send buffer is still sent until complete.
2. Disable reading (SHUT_RD). No more data can be read from the recv buffer; data already in the recv buffer is discarded.
3. Disable reading and writing (SHUT_RDWR). Data can be neither read nor written; data already in the send buffer is still sent until complete, but data in the recv buffer is discarded.
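The most common use is write shutdown (SHUT_WR), a half-close: the side that calls it sends FIN so the peer reads EOF, but it can still receive the peer's reply. A minimal Python sketch on loopback (the function name is hypothetical):

```python
import socket


def half_close_exchange():
    """Client sends a request, half-closes with SHUT_WR, then still reads the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as ls:
        ls.bind(("127.0.0.1", 0))
        ls.listen()
        client = socket.create_connection(ls.getsockname())
        server, _ = ls.accept()

    client.sendall(b"request")
    client.shutdown(socket.SHUT_WR)        # sends FIN; the client's read side stays open

    received = b""
    while chunk := server.recv(1024):      # recv() returns b"" once the FIN is read
        received += chunk
    server.sendall(b"reply to " + received)
    server.close()

    reply = b""
    while chunk := client.recv(1024):
        reply += chunk
    client.close()
    return received, reply


if __name__ == "__main__":
    print(half_close_exchange())   # (b'request', b'reply to request')
```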

Whether through shutdown() or close(), a FIN is sent at the point where the four-way teardown actually begins.

3. Address/port reuse

Normally, an addr:port can be bound to only one socket. In other words, an addr:port cannot be reused, and different sockets must be bound to different addr:port combinations. For example, to run two sshd instances, you must not configure the same addr:port in both instances' configuration files. Likewise, two web virtual hosts cannot be configured with the same addr:port unless they are name-based virtual hosts. Name-based virtual hosts can share an addr:port because the HTTP request message carries the host name. In fact, all such requests still arrive through the same listening socket; only after the connection has been accepted does the httpd worker process/thread dispatch it to the matching virtual host.

That is the normal case. The exception is address reuse and port reuse, which together make socket reuse possible. Current Linux kernels provide the socket option SO_REUSEADDR for address reuse and SO_REUSEPORT for port reuse. Once SO_REUSEPORT is set on each socket, several sockets can be bound to the same addr:port without an error. For example, if an instance binds two sockets to the same addr:port (two is just an example; more are possible), two listening processes/threads can listen on them at the same time, and the kernel distributes incoming client connections among them (Linux picks the socket from a hash of the connection's addresses and ports rather than by strict round-robin).
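A sketch of SO_REUSEPORT in Python (the option has existed on Linux since 3.9; the helper names are hypothetical). Every socket sets the option before bind(), and then all of them can bind the same addr:port. For contrast, without the option a second bind() to a port that is already listening fails with EADDRINUSE.

```python
import errno
import socket


def reuseport_listeners(n=2, host="127.0.0.1"):
    """Bind n listening sockets to the same addr:port with SO_REUSEPORT."""
    socks = []
    port = 0                                   # the first bind picks a free port
    for _ in range(n):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)   # must precede bind()
        s.bind((host, port))
        port = s.getsockname()[1]              # later sockets reuse the same port
        s.listen()
        socks.append(s)
    return socks


def bind_twice_without_reuse(host="127.0.0.1"):
    """Try to bind a second socket to an addr:port that is already listening."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))
        first.listen()
        second.bind(first.getsockname())
        return "bound"
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print(bind_twice_without_reuse())                 # EADDRINUSE
    listeners = reuseport_listeners(3)
    print({s.getsockname() for s in listeners})       # one shared addr:port
    for s in listeners:
        s.close()
```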

From the point of view of the listening processes/threads, each reused socket is called a listen bucket; that is, each listening socket is one bucket.

Take httpd's worker or event model as an example, and assume there are currently three child processes, each with one listening thread and N worker threads.

Without address/port reuse, the listening threads compete for the right to listen: at any moment only one listening thread can listen on the listening socket (it obtains this right by acquiring a mutex). When that listening thread receives a request, it gives up the right, and the other listening threads compete for it; again, only one of them can win.

With address and port reuse, several sockets can be bound to the same addr:port. For example, with one extra bucket there are two sockets, so two listening threads can listen at the same time. When one of them receives a request, it gives up its right to listen and lets the remaining listening threads compete for it.

If a third socket is bound as well, each of the three listening threads has a socket of its own, so they no longer need to pass the right to listen around and can all listen continuously.

This looks like a clear win: it reduces contention for the right to listen (the mutex), avoids the starvation problem, allows more efficient listening, and balances the load across listening threads, which reduces the pressure on each of them. In practice, however, every listening thread consumes CPU while it listens. On a single-core CPU, reuse brings no benefit at all, and switching between listening threads actually lowers performance. So before using port reuse, consider whether the listening processes/threads are isolated on their own CPUs. Whether to reuse a port, and how many times, depends on the number of CPU cores and on whether processes are bound to CPUs.
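If you do combine port reuse with CPU binding, one way on Linux is to pin each listening process to its own CPU with os.sched_setaffinity(). A minimal sketch; the slot-based policy and the function names are hypothetical. In a prefork-style server, each child would call pin_to_cpu(i) after fork() and before opening its SO_REUSEPORT listener, and the number of listeners could follow suggested_listener_count().

```python
import os


def pin_to_cpu(cpu_slot):
    """Pin the calling process to one of its currently allowed CPUs (Linux only).

    cpu_slot is a per-listener index (0, 1, 2, ...); it wraps around when
    there are more listeners than CPUs.
    """
    allowed = sorted(os.sched_getaffinity(0))
    cpu = allowed[cpu_slot % len(allowed)]
    os.sched_setaffinity(0, {cpu})
    return cpu


def suggested_listener_count():
    """Heuristic: one SO_REUSEPORT listener per CPU this process may run on."""
    return len(os.sched_getaffinity(0))


if __name__ == "__main__":
    print("usable CPUs:", suggested_listener_count())
```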

That's all for now.
