Linux "zero copy" sendfile Function Description and Actual Operation Analysis

Source: Internet
Author: User
Tags sendfile
Sendfile Function Description
#include <sys/sendfile.h>
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

sendfile() copies data between two file descriptors. The copy is performed inside the kernel, which is why it is called "zero copy". sendfile() is more efficient than a read()/write() loop, because read() and write() have to copy the data through a buffer in the user's application layer.

Parameter description:
out_fd is a file descriptor that is open for writing;
in_fd is a file descriptor that is open for reading;
offset points to a variable holding the offset in in_fd at which sendfile() starts reading. A value of zero means reading starts at the beginning of the file; otherwise reading starts at the given offset. When sendfile() returns, it has already advanced this variable past the last byte read, so a loop can pass the same variable again without adding the return value by hand. If offset is NULL, data is read from the current file position of in_fd, which is then updated;
count is the number of bytes to copy between the two descriptors.

Return Value:
On success, sendfile() returns the number of bytes written to out_fd. On error, it returns -1 and sets errno accordingly:

EAGAIN: non-blocking I/O was selected with O_NONBLOCK and the write would block.
EBADF: the input file descriptor is not open for reading, or the output file descriptor is not open for writing.
EFAULT: bad address.
EINVAL: a descriptor is invalid or locked, or an mmap()-like operation is not available for in_fd.
EIO: unspecified error while reading from in_fd.
ENOMEM: insufficient memory to read from in_fd.
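
Because sendfile() may transfer fewer bytes than requested, a caller usually wraps it in a loop. The following is a minimal sketch of such a loop, not the code used in the test further down; the helper name send_range and its return convention are assumptions made for illustration.

```c
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/types.h>

/* Hypothetical helper (not part of any library): copy up to `count` bytes
 * from in_fd, starting at byte `start`, to out_fd.
 * Returns the number of bytes transferred, or -1 with errno set. */
ssize_t send_range(int out_fd, int in_fd, off_t start, size_t count)
{
    off_t offset = start;      /* sendfile() advances this after each call */
    size_t remaining = count;

    while (remaining > 0) {
        ssize_t n = sendfile(out_fd, in_fd, &offset, remaining);
        if (n > 0) {
            remaining -= (size_t)n;   /* short transfer: loop for the rest */
        } else if (n == 0) {
            break;                    /* end of in_fd reached early */
        } else if (errno == EINTR) {
            continue;                 /* interrupted by a signal: retry */
        } else {
            return -1;                /* EAGAIN, EBADF, EINVAL, EIO, ... */
        }
    }
    return (ssize_t)(count - remaining);
}
```

On a non-blocking socket the EAGAIN case would normally be handled by waiting until the socket is writable and then calling the helper again with the remaining range; that part is left out of the sketch.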

To speed up the file transfer module in an existing system and reduce its resource usage, I ran a performance test of sendfile(). The test did not show the expected gain, although sendfile() is still used in the module. The unsuccessful test is recorded below.

Test platform: the client and server are both P4 computers with IDE hard drives, ora5 releases, and mb lan;

The receiver program is as follows:
FILE *fp = fopen(FILENAME, "wb");

/* write each received chunk to the file until the peer closes the connection */
while ((len = recv(sockfd, buff, sizeof(buff), 0)) > 0)
{
    fwrite(buff, 1, len, fp);
}
fclose(fp);

A. The traditional sending code is as follows:
fd = open(FILENAME, O_RDONLY);
/* read the file into a user-space buffer, then copy it back into the kernel */
while ((len = read(fd, buff, sizeof(buff))) > 0)
{
    send(sockfd, buff, len, 0);
}
close(fd);

Because the block size chosen when the disk was partitioned is 4096 bytes, buff is set to 4096 bytes for optimal disk reads. However, I found that setting it to 1024 or 8192 does not affect the transfer speed.

File size: 9 MB; time consumed: 0.71-0.76 seconds;
File size: 32 MB; time consumed: 2.64-2.68 seconds;
File size: 64 MB; time consumed: 5.36-5.43 seconds;

B. The sending code using sendfile() is as follows:
struct stat filestat;
off_t offset = 0;
stat(FILENAME, &filestat);

fd = open(FILENAME, O_RDONLY);
/* one call for the whole file; the return value is not checked in this test */
sendfile(sockfd, fd, &offset, filestat.st_size);
close(fd);

File size: 9 MB; time consumed: 0.71-1.08 seconds;
File size: 32 MB; time consumed: 2.66-2.74 seconds;
File size: 64 MB; time consumed: 5.43-6.64 seconds;

The speed actually seems to drop slightly. Following the sendfile man page, I added this call before calling sendfile():

int on = 1;
/* prints 0 if the option was set successfully */
printf("%d\n", setsockopt(sockfd, IPPROTO_TCP, TCP_CORK, (char *)&on, sizeof(int)));

File size: 9 MB; time consumed: 0.72-0.75 seconds;
File size: 32 MB; time consumed: 2.66-2.68 seconds;
File size: 64 MB; time consumed: 5.38-5.60 seconds;

This only seems to bring the speed back to that of the traditional approach. In every case I captured packets with ethereal, which showed that the payload of each TCP packet usually reaches the maximum of 1448 bytes.

My test does not seem to bear out the claim that "copying the data twice through the application layer costs a lot". Assuming sendfile() exists for a reason, I expect it to show its advantage in two situations, neither of which I have an environment to test:

1. File servers or HTTP servers with a large number of concurrent connections;
2. Embedded systems with limited memory;

In addition, many descriptions of the TCP_CORK option found online are outdated. As the man page says, this option can be combined with TCP_NODELAY. However, setting TCP_NODELAY forces pending output to be sent immediately, whether or not TCP_CORK is set.

Supplement:
TCP_NODELAY and TCP_CORK both control how packets are "Nagled", that is, how the Nagle algorithm assembles small packets into larger frames. The algorithm is named after its inventor, John Nagle, who first used it in 1984 to fix network congestion at Ford Aerospace (for more information, see IETF RFC 896). The problem he addressed, which RFC 896 calls the small-packet problem and which is often described as silly window syndrome, is that a typical terminal application sends a packet for every keystroke. Such a packet usually carries one byte of data and a 40-byte header, a 4000% overhead that can easily congest a network. The Nagle algorithm became a standard and was quickly implemented across the Internet. It is now the default behavior, but in some situations it is worth turning off.
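
The 4000% figure is simply the header size divided by the payload size. The small sketch below reproduces that arithmetic; the function name header_overhead_percent is an assumption for illustration.

```c
/* Header overhead expressed as a percentage of the payload size. */
double header_overhead_percent(unsigned header_bytes, unsigned payload_bytes)
{
    return 100.0 * (double)header_bytes / (double)payload_bytes;
}
```

header_overhead_percent(40, 1) gives 4000.0 for the one-byte keystroke packet, while a 40-byte header on a full 1448-byte payload costs under 3%.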

Now suppose an application issues a request to send a small piece of data. We can either send the data immediately or wait for more data to accumulate and then send it all at once. Interactive and client/server applications benefit greatly from sending immediately. For example, when we send a short request and wait for a large response, the relative overhead is low compared with the total amount of data transferred, and the response arrives sooner if the request goes out at once. This is done by setting the TCP_NODELAY option on the socket, which disables the Nagle algorithm.
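
Setting TCP_NODELAY is a single setsockopt() call. The wrapper below is a minimal sketch; the name set_nodelay is an assumption, while IPPROTO_TCP and TCP_NODELAY are the standard constants from the socket headers.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable (on = 1) or disable (on = 0) TCP_NODELAY.
 * Returns 0 on success, -1 with errno set on failure. */
int set_nodelay(int sockfd, int on)
{
    return setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}
```

Calling set_nodelay(sockfd, 1) right after the connection is established would suit the short-request, large-response case described above.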

In the other case, we want to wait until the amount of data reaches the maximum packet size before sending it over the network in one go. This mode benefits bulk data transfer; a typical application is a file server. The Nagle algorithm causes problems in this case. However, when sending a large amount of data you can set the TCP_CORK option, which controls Nagle-style coalescing in exactly the opposite way from TCP_NODELAY (the two options were mutually exclusive before Linux 2.5.71; since then they can be combined). Next, let's take a closer look at how it works.

Suppose the application uses sendfile() to transfer a large amount of data. Application protocols usually require some information to be sent first to describe the data, namely a header. Typically the header is small and TCP_NODELAY is set on the socket, so the packet carrying the header is transmitted immediately. In some cases (depending on an internal packet counter), the peer must acknowledge this packet before more is sent. The bulk data transfer is then delayed and unnecessary network traffic is generated.

However, if we set the TCP_CORK option on the socket (think of it as putting a cork in the pipe), the packet carrying the header is filled up with bulk data, and all data is sent out automatically in full-sized packets. When the transfer is complete, it is best to clear the TCP_CORK option to "remove the cork" from the connection so that any remaining partial frame is sent out. This is just as important as corking the connection in the first place.
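
The cork, header, file, uncork sequence described above might look like the following sketch. The helper names set_cork and send_header_and_file are assumptions for illustration, and for brevity the sketch treats a short write as an error instead of looping.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: set (on = 1) or clear (on = 0) TCP_CORK. */
int set_cork(int sockfd, int on)
{
    return setsockopt(sockfd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Hypothetical helper: send a small header followed by `size` bytes of
 * file_fd while the socket is corked. Returns 0 on success, -1 on error. */
int send_header_and_file(int sockfd, const void *hdr, size_t hdr_len,
                         int file_fd, size_t size)
{
    off_t offset = 0;
    int rc = -1;

    if (set_cork(sockfd, 1) != 0)          /* insert the cork */
        return -1;
    if (write(sockfd, hdr, hdr_len) == (ssize_t)hdr_len &&
        sendfile(sockfd, file_fd, &offset, size) == (ssize_t)size)
        rc = 0;
    if (set_cork(sockfd, 0) != 0)          /* remove the cork, flush the tail */
        rc = -1;
    return rc;
}
```

The cork is removed even when the write or the sendfile() call fails, so a partial frame is never left waiting on the socket.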

All in all, if you know you will send several pieces of data together (such as an HTTP response header and body), we recommend setting the TCP_CORK option so that there is no delay between them. This can greatly improve the performance of WWW, FTP, and file servers, and it also simplifies your work.