Comparison of TCP stack implementation in Linux and FreeBSD

Source: Internet
Author: User

As the two best-known open-source operating systems, Linux and FreeBSD are the first choice of many network administrators. Linux is known for its openness and broad driver support. FreeBSD carries on the UNIX tradition and has a reputation as the most stable operating system. So how should we choose between the two? Fortunately, the source code is available, so we can look for an answer in the implementation of the protocol stack.

The TCP/IP protocol stack is the de facto standard for network communication. The traditional TCP implementation descends from 4.4BSD-Lite, and Linux naturally had to support the protocol as it rose to prominence. The Linux implementation, however, was written from scratch and is only compatible with the traditional one. Below we analyze the similarities and differences between the two based on their source code. For systems as good as Linux and FreeBSD, the differences are not a matter of one being better or worse; they simply reflect different implementation strategies and priorities.

From a process's point of view, data can be sent by calling send, sendto, or sendmsg, or by using the file system calls write and writev. Likewise, data can be received with recv, recvfrom, or recvmsg, or with the file system calls read and readv. Reception is asynchronous, that is, interrupt-driven, which is worth keeping in mind in the analysis that follows. For simplicity, we trace the whole TCP input and output path and compare the Linux and FreeBSD implementations.
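As a small illustration of the point that the socket calls and the file system calls reach the same byte stream, here is a hedged Python sketch. It assumes a Unix-like system, where a socket descriptor can be passed to os.write and os.read, and it uses socket.socketpair() as a stand-in for a real TCP connection so that it runs without a network.

```python
import os
import socket

# A connected pair of stream sockets stands in for a TCP connection
# (assumption: Unix-like system; the call pattern is the same for TCP).
a, b = socket.socketpair()

# Send once through the socket API and once through the file-descriptor API.
a.send(b"via send;")
os.write(a.fileno(), b"via write;")
a.close()  # closing the sender lets the reader see end-of-stream

# Receive once through each API; both read from the same byte stream.
received = b.recv(4)
while True:
    chunk = os.read(b.fileno(), 4096)
    if not chunk:
        break
    received += chunk
b.close()

print(received)
```

Both sending calls feed one ordered stream, and both receiving calls drain it, which is why the kernel can route all of them to a single send or receive routine.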

First, let's look at the implementation in the FreeBSD protocol stack, which is also the most orthodox one. The complete input/output path is shown below.

First, let's look at the output path on the left. Whichever output function the application calls, sosend ultimately does the work. sosend copies the data from user space into mbufs managed by the kernel; the mbuf is the data buffer structure used by FreeBSD's TCP implementation. Once the copy is complete, sosend calls the TCP output function, tcp_output. tcp_output allocates a new mbuf to hold the TCP header and calculates the checksum. In the next step, ip_output also computes a checksum and selects a route. Finally, ether_output calls the hardware driver through if_start to transmit the data. In the driver for a particular NIC, ex_start copies the data from the kernel mbufs into the hardware's own buffer to complete the transmission. Along the way the data is copied twice and traversed twice for checksumming, which is the main factor affecting efficiency.
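To make the cost concrete, here is a minimal Python sketch of the two-pass pattern described above: one pass to copy the data into a kernel-side buffer and a second pass to compute the checksum. It is only an illustration, not FreeBSD code. The function names are hypothetical, a bytearray stands in for the kernel buffer, and the checksum is the 16-bit one's-complement Internet checksum in the style of RFC 1071.

```python
def internet_checksum(data):
    """16-bit one's-complement checksum in the style of RFC 1071."""
    total = 0
    n = len(data)
    # Add up the data as big-endian 16-bit words.
    for i in range(0, n - 1, 2):
        total += (data[i] << 8) | data[i + 1]
    if n % 2:
        total += data[n - 1] << 8  # odd trailing byte, padded with zero
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def copy_then_checksum(user_data):
    """Hypothetical two-pass send path: copy first, checksum afterwards."""
    kernel_buf = bytearray(len(user_data))
    kernel_buf[:] = user_data                 # pass 1: copy the data
    checksum = internet_checksum(kernel_buf)  # pass 2: read it all again
    return kernel_buf, checksum


buf, csum = copy_then_checksum(bytes.fromhex("0001f203f4f5f6f7"))
print(hex(csum))  # 0x220d
```

Every byte is touched once by the copy and once more by the checksum loop, which is the extra traversal the text refers to.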

Next, the input path on the right. When the NIC receives data, the interrupt handler ex_intr is called. The driver uses ex_rx_intr to copy the data from the hardware buffer into an mbuf and calls ether_input for further processing. ether_input passes the frame to ether_demux, which dispatches it by protocol. An IP packet is handed, through a soft interrupt, to ip_fastforward, which verifies it and decides whether it can be forwarded directly. If fast forwarding does not apply, ip_input takes over. ip_input also has to decide whether the packet should be forwarded; if not, it calls tcp_input. In tcp_input, after the checksum has been verified, an optimization known as header prediction speeds up processing of the common case. Once all processing is done, if the segment carries user data, the user process is woken up to handle it. As with sending, several functions can be used to receive data, and soreceive is responsible for copying the data from the mbufs into the user process's buffer.
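The fast-path optimization mentioned above is commonly known as header prediction. The idea can be sketched in a few lines of Python. This is a heavily simplified illustration with hypothetical class and field names, not the logic of tcp_input itself: it checks only that the connection is established, that no flag other than ACK (and optionally PSH) is set, that the segment is the next one expected, and that the advertised window has not changed.

```python
from dataclasses import dataclass


@dataclass
class Connection:          # hypothetical per-connection state
    established: bool
    rcv_nxt: int           # next sequence number expected from the peer
    snd_wnd: int           # window last advertised by the peer


@dataclass
class Segment:             # hypothetical parsed TCP segment
    seq: int
    window: int
    flags: frozenset       # e.g. frozenset({"ACK"})
    payload: bytes = b""


def takes_fast_path(conn, seg):
    """True when the segment is the 'expected next' one in the common case."""
    return (
        conn.established
        and seg.flags - {"PSH"} == {"ACK"}   # no SYN, FIN, RST or URG
        and seg.seq == conn.rcv_nxt          # in sequence
        and seg.window == conn.snd_wnd       # window unchanged
    )


def tcp_input_sketch(conn, seg):
    if takes_fast_path(conn, seg):
        conn.rcv_nxt += len(seg.payload)     # accept the data directly
        return "fast"
    return "slow"                            # fall back to full processing


conn = Connection(established=True, rcv_nxt=1000, snd_wnd=65535)
first = tcp_input_sketch(conn, Segment(1000, 65535, frozenset({"ACK"}), b"hello"))
second = tcp_input_sketch(conn, Segment(1005, 65535, frozenset({"ACK", "FIN"})))
print(first, second)
```

Real implementations test more conditions than this, but the structure is the same: a handful of cheap comparisons decide whether the general, slower state-machine code can be skipped.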

As we can see, sending and receiving in FreeBSD work in much the same way. Both involve copying the data and traversing it, and these operations have the biggest impact on efficiency. The two copies seem unavoidable. Let's see how Linux handles this.

The Linux implementation is somewhat more complicated. Let's start with sending. In Linux, sockets are implemented as a file system, so data can be sent through the VFS write call or directly with send. Either way, the call ends up in sock_sendmsg. sock_sendmsg uses its internal counterpart __sock_sendmsg to call tcp_sendmsg directly. Unlike FreeBSD, tcp_sendmsg copies the data and computes the checksum at the same time, which saves one traversal. Linux manages data buffers with the skb structure, which plays a role similar to FreeBSD's mbuf. After the data has been copied, tcp_push moves it on to the next step. tcp_push uses __tcp_push_pending_frames to call tcp_write_xmit, which places the data in the TCP send buffer; this only adds a pointer reference and does not copy anything. Next, tcp_transmit_skb hands the data to the IP layer's send queue. ip_queue_xmit fills in the IP header, computes its checksum, and calls ip_output. If no fragmentation is needed, ip_finish_output continues the transmission: it fills in the Ethernet header and calls dev_queue_xmit. dev_queue_xmit places the data on a queue in the network core layer and calls the driver's transmit function, here cp_start_xmit, to send it. Like its FreeBSD counterpart, cp_start_xmit checks the data and copies it into the hardware buffer.
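The saving from copying and checksumming at the same time can be shown with a short Python sketch. It is not the kernel's code: the function name is hypothetical, a bytearray stands in for the skb data area, and the checksum is the 16-bit one's-complement Internet checksum in the style of RFC 1071. The point is only that a single loop both stores each byte and adds it to the running sum.

```python
def copy_and_checksum(user_data):
    """Hypothetical single-pass send path: copy and checksum in one loop."""
    n = len(user_data)
    kernel_buf = bytearray(n)
    total = 0
    for i in range(0, n - 1, 2):
        hi, lo = user_data[i], user_data[i + 1]
        kernel_buf[i] = hi               # copy ...
        kernel_buf[i + 1] = lo
        total += (hi << 8) | lo          # ... and sum the same 16-bit word
    if n % 2:                            # odd trailing byte, padded with zero
        kernel_buf[n - 1] = user_data[n - 1]
        total += user_data[n - 1] << 8
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return kernel_buf, ~total & 0xFFFF


buf, csum = copy_and_checksum(bytes.fromhex("0001f203f4f5f6f7"))
print(hex(csum))  # 0x220d
```

The result is the same as copying first and checksumming afterwards, but each byte is read from memory only once.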

When a packet arrives, the NIC raises an interrupt and the driver's cp_interrupt is called. cp_interrupt does very little: it performs the necessary checks and returns. Most of the work is done in cp_rx_poll, which runs in a soft interrupt to make the driver more efficient. cp_rx_poll allocates an skb and copies the data into it. netif_rx moves the data onto a queue in the network core layer, and netif_receive_skb takes it off that queue and calls ip_rcv. Together, ip_rcv and ip_rcv_finish validate the packet, look up its route, and call the corresponding input function. For a locally destined packet, ip_local_deliver reassembles any IP fragments and ip_local_deliver_finish passes the packet on to TCP. tcp_v4_rcv verifies the checksum and performs some simple checks; the main work is done in tcp_v4_do_rcv. tcp_v4_do_rcv first checks whether the connection is in the established state. If so, it calls tcp_rcv_established; otherwise it calls tcp_rcv_state_process to update the connection's state machine. tcp_rcv_established also implements header prediction. Along the way, the user process waiting in tcp_recvmsg is woken up; tcp_recvmsg copies the data from the skb into the user process's buffer, and the calls then return one by one.
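The split between a short interrupt handler and a polling routine that runs later in a soft interrupt can be modelled in a few lines of Python. Everything here is a hypothetical stand-in rather than driver code: a deque plays the hardware receive ring, another deque plays the list of devices waiting to be polled, and the budget parameter (an assumption of this sketch) limits how many frames one poll may process.

```python
from collections import deque


class Nic:
    """Hypothetical NIC model: a receive ring plus a 'poll scheduled' flag."""

    def __init__(self):
        self.hw_ring = deque()
        self.poll_scheduled = False


def interrupt_handler(nic, softirq_queue):
    # Hard interrupt: do the minimum, then defer the real work.
    if nic.hw_ring and not nic.poll_scheduled:
        nic.poll_scheduled = True
        softirq_queue.append(nic)


def rx_poll(nic, budget, deliver):
    # Soft interrupt: copy up to `budget` frames into fresh buffers.
    done = 0
    while nic.hw_ring and done < budget:
        frame = nic.hw_ring.popleft()
        deliver(bytes(frame))  # stands in for filling a buffer and passing it up
        done += 1
    if not nic.hw_ring:
        nic.poll_scheduled = False  # ring drained, wait for the next interrupt
    return done


def run_softirqs(softirq_queue, budget, deliver):
    while softirq_queue:
        nic = softirq_queue.popleft()
        rx_poll(nic, budget, deliver)
        if nic.poll_scheduled:      # work left over: poll again later
            softirq_queue.append(nic)


nic = Nic()
nic.hw_ring.extend([b"frame-1", b"frame-2", b"frame-3"])
pending = deque()
interrupt_handler(nic, pending)
interrupt_handler(nic, pending)  # a second interrupt does not queue the NIC twice
delivered = []
run_softirqs(pending, budget=2, deliver=delivered.append)
print(delivered)
```

Keeping the interrupt handler this small means the time spent with interrupts disabled stays short, while the bulk of the per-packet work runs later in a context that can be rescheduled.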

The analysis above suggests that the Linux code is messier and less readable than FreeBSD's. For example, Linux skips the Ethernet layer and processes received data through several asynchronous steps, which may affect kernel stability. FreeBSD's code is cleaner and its processing flow is clear and easy to read, so its reputation as the most stable operating system is well earned. The origins of the two systems help explain this. Linux grew up in the Internet era and was written by many enthusiasts without an overall plan; the code has been reworked many times by contributors of varying skill, which is why it looks the way it does today. FreeBSD has long been maintained by a dedicated group and has changed little over the years apart from a few optimizations, so its code is highly readable. On the other hand, the constantly updated Linux code is more aggressive. For example, Linux uses its skb buffers more efficiently than FreeBSD uses its mbufs, although we will not go into the details here. In addition, when sending data, Linux computes the TCP checksum while copying the data, which saves one traversal and improves efficiency.

From the comparison above, it is easy to conclude that Linux is the better choice if efficiency comes first, while FreeBSD is the better choice if stability and code clarity matter most. Both operating systems are excellent and have stood the test of time, though, and the differences show up only in a close reading of the code. Whichever one you choose, you will not regret it.
