TCP Speed and bandwidth
Some people assume TCP can send at line rate — that bandwidth minus TCP/IP header overhead is the maximum throughput a TCP transfer can achieve. In theory this is correct, but in practice TCP often fails to reach the bandwidth.
Because of the congestion-avoidance algorithm, TCP does not always transmit at the full link bandwidth, especially when the bandwidth is shared with other flows; in that case the nominal bandwidth says little about what a single connection will get.
TCP is also constrained by system resources: send and receive buffers have configured sizes, and if the upper-layer application does not drain the receive buffer in time, the buffer fills and TCP cannot go faster no matter how fast the link is.
With a large number of short connections, much of the bandwidth is spent setting up and tearing down TCP connections rather than transmitting data.
In many cases a TCP connection cannot even be established, because of limits on memory, the listen backlog, the number of available ports, and so on. So although TCP is intuitively the tool for efficiently moving large amounts of data, it must be recognized that in many cases TCP throughput is not the bandwidth.
UDP Kernel code path
System-call (top-down) path:
sys_read fs/read_write.c
sock_read net/socket.c
sock_recvmsg net/socket.c
inet_recvmsg net/ipv4/af_inet.c
udp_recvmsg net/ipv4/udp.c
skb_recv_datagram net/core/datagram.c
Interrupt (bottom-up) path:
sock_queue_rcv_skb include/net/sock.h
udp_queue_rcv_skb net/ipv4/udp.c
udp_rcv net/ipv4/udp.c
ip_local_deliver_finish net/ipv4/ip_input.c
ip_local_deliver net/ipv4/ip_input.c
ip_rcv net/ipv4/ip_input.c
net_rx_action net/core/dev.c
UDP write process
System-call (top-down) path:
sys_write fs/read_write.c
sock_writev net/socket.c
sock_sendmsg net/socket.c
inet_sendmsg net/ipv4/af_inet.c
udp_sendmsg net/ipv4/udp.c
ip_build_xmit net/ipv4/ip_output.c
output_maybe_reroute net/ipv4/ip_output.c
ip_output net/ipv4/ip_output.c
ip_finish_output net/ipv4/ip_output.c
dev_queue_xmit net/core/dev.c
Reference: http://www.cnblogs.com/better-zyy/archive/2012/03/16/2400811.html
Full utilization of current TCP design
What can you tell from these two paths? In Linux, receiving is driven both from the bottom up (interrupts) and from the top down (system calls), while sending is driven only from the top down. This is easy to understand, but also easy to overlook. People using Linux often find network speed unsatisfactory even when bandwidth is plentiful. The kernel can usually make full use of the bandwidth, so the problem is generally in the user program: for example, the application reads too rarely, or holds received data for a long time before replying. These are idle windows in the data flow at the business level.
The receiving side is where coordination is most difficult, because it requires precise cooperation among three parties: the softirq handler, the kernel socket code, and the user-space code. The softirq is asynchronous, while the kernel socket code runs only when the user program calls into it. Since the user program cannot be fully synchronized with the softirq, blocking or polling is almost the only option for it. But while a process is busy with one accepted connection, it is not accepting new ones, so the channel becomes unavailable again.
The modern solution, with Nginx as its representative, is for multiple processes to listen in non-blocking mode at the same time: at any moment there is more than one listener and more than one connection being processed. The benefit of non-blocking listening is that a single thread can handle existing connections while still listening. Next to Nginx's model, in which every worker efficiently plays both acceptor and handler, the traditional designs — forking a child per connection, or dedicating a service thread to each client — are almost obsolete.
So you can see that user-space programmers are the more diligent side: they have produced the most effective solutions to this coordination problem. Is the kernel side incapable? Not exactly — just slower to move. It would be better to port this high-concurrency idea into the kernel. Then again, ATM was technically better than IP, and we know how that turned out...
But Nginx's approach has a problem: the thundering-herd effect. With multiple processes listening at once the waste per wakeup is small, but on a multicore machine with a dozen or more processes woken simultaneously the loss adds up. To solve this, Nginx takes a lock around listening, ensuring only one process is in accept() at a time. That creates another problem: a process with no work at hand sits idle waiting for the lock. Still, in the theoretical best case — all processes kept busy — it is a very good solution.
In addition, another efficient concurrency model is found in user-space programming: Traffic Server's single listening/management process (traffic_manager) with multiple worker processes for dispatch and processing (traffic_server). Compared with the traditional thread-pool model, Traffic Server uses the idea of coroutines: the context of serving a connection is encapsulated as a coroutine and handed to worker processes, so a limited number of workers can process an unlimited number of connections. This also echoes the kernel's softirq mechanism. Most importantly, this model lets Traffic Server scale into the cloud, because each serving context carries its complete execution state.
More efficient TCP
Combining the two ideas above, the best way to use the kernel's TCP infrastructure is a single listener — or multiple processes taking turns to listen — that asynchronously dispatches connection contexts to worker processes. Concurrency plus asynchrony is the primary choice on the user side.
However, all the current solutions for steadily improving QPS revolve around how best to use, from user space, the infrastructure the current kernel provides — until Sina open-sourced its Fastsocket. Fastsocket found that the key waste in the kernel network stack is lock contention in the networking code, so it gives each CPU its own data structures to enable lock-free operation. According to the published results, Nginx throughput can improve by more than 100%. Also, since multiple accept queues can exist at once (originally there was only one, and every access had to take a lock), multiple user-space worker processes can accept() simultaneously without conflict. This removes the need for Nginx's accept lock and largely eliminates the thundering-herd problem on heavily loaded servers.
A better solution is TCPCP, a connection-migration technology. High-volume services generally form a cluster, and whatever kind of cluster it is, a single machine is usually needed as the unified external entry point. When QPS grows large enough, that one machine saturates, so how can the processing capacity of that single machine be fully exploited? Connection migration is a kernel technology that lets the front server handle only TCP connection establishment, not the business logic — established connections are migrated elsewhere — thereby greatly improving the QPS of a single entry machine.
Security issues with TCP
The TCP protocol itself has many security issues. For example, the specification says that a listener receiving a SYN must reply with SYN/ACK; this lets an attacker send a SYN to probe whether a port is open. The specification also says that if someone sends an ACK directly, without a preceding SYN, a server in LISTEN should return an RST — another probing method. Further, the protocol requires the server's SYN/ACK to carry a sequence number, and the client's next segment must acknowledge that sequence number + 1. This gives the server a way to validate the source address, which can effectively prevent DoS attacks that spoof the source address.
Then there are implementation problems. When Linux receives a SYN it immediately allocates memory and other resources — and exhausting the resources available to the server is the core idea of SYN flooding. Linux implements SYN cookies in the kernel: the sequence number is computed from the connection parameters so that it doubles as source-address verification, although some resources are still allocated on the server side (schemes such as TCPCP can avoid allocating anything for the SYN, but the CPU burden is heavy). The sequence-number computation itself loads the CPU, and while solving the SYN flood it introduces another attack mode: constantly sending ACKs so that the server stays busy computing and validating sequence numbers. (The server does not store the sequence number it generated for each connection — that would itself be a resource cost — so it must derive from the client's ACK number whether the sequence number is correct, and this derivation is the performance bottleneck.)
A more serious problem is FIN and RST: a man-in-the-middle can break connections by forging these two packets. Such forced disconnects can leave the server with many sockets in CLOSE_WAIT and TIME_WAIT — states that are awkward to clean up — until the pool of available sockets is exhausted. Both attacks target established connections, and the difficulty is that the forged sequence number must fall within the receive window. But as Internet speeds grow, windows get larger and easier to hit, so increasing link speed actually increases the feasibility of this attack. This is a problem of the TCP protocol itself: security was not much considered in its design. Yet TCP will go on being used and patched for a long time, because its ubiquity has effectively taken the world's engineers hostage, and the cost of replacing it is too high.
Summary
From the above analysis, user space makes almost the fullest possible use of the mechanisms the kernel provides, but the kernel-side code genuinely cannot keep pace with user space. This is undoubtedly related to how hard kernel code is to change, which greatly hinders the speed of kernel development. But stepping back: the kernel is infrastructure, and advanced ideas should not be experimented with inside it. Instead, an idea should first fully prove its effectiveness in user space and demonstrate an urgent need; only then should the kernel implement it. Seen this way, the kernel's high bar for change also makes sense.
Introduction to the Linux kernel--Network: TCP efficiency model and security issues