How fd values are allocated in Linux

Over the past few days, a lot of network communication code has been written at my company, which naturally raises the question of how to monitor IO events. I was surprised to find that polling with select is still quite popular. I told my colleagues that select should be abandoned on both Linux and Windows, because the select call has a fatal pitfall on both platforms.

On Windows, a single fd_set cannot hold more than FD_SETSIZE socket handles (FD_SETSIZE is defined as 64 in the win32 winsock2.h shipped with VS2010). The fd_set structure stores the socket handles in an array, and each FD_SET call appends one handle to that array, but only while the number of stored handles is below FD_SETSIZE. For details, see the definition of the FD_SET macro in winsock2.h.

The problem here is:

If the number of socket handles in an fd_set has already reached FD_SETSIZE, subsequent FD_SET operations silently have no effect, and the IO events of those socket handles will be missed!
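To make the silent drop concrete, here is a minimal, portable sketch of an array-based set that behaves the way the text describes. It is a simplified model, not the Winsock code: the names model_fd_set, model_fd_set_add and MODEL_SETSIZE are hypothetical, and the return value exists only so the effect can be observed (the real macro reports nothing).

```c
/* Hypothetical model of an array-based fd_set; MODEL_SETSIZE stands in
 * for FD_SETSIZE (64 in the header discussed above). */
#define MODEL_SETSIZE 64

typedef struct {
    unsigned int count;                  /* number of handles stored so far */
    unsigned int handles[MODEL_SETSIZE]; /* the stored socket handles */
} model_fd_set;

/* Adds a handle the way the text describes FD_SET working on Windows.
 * Returns 1 if the handle is in the set afterwards, 0 if it was dropped
 * because the array is already full. */
static int model_fd_set_add(model_fd_set *set, unsigned int handle)
{
    unsigned int i;

    /* A handle that is already present is not stored twice. */
    for (i = 0; i < set->count; i++) {
        if (set->handles[i] == handle)
            return 1;
    }

    /* Array full: the handle is ignored and the caller is not told. */
    if (set->count >= MODEL_SETSIZE)
        return 0;

    set->handles[set->count++] = handle;
    return 1;
}
```

In this model, the 65th distinct handle is never stored, so a select call built on such a set would never report events for it.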

On Linux, the problem also lies in the fd_set structure and the FD_SET macro. Here the fd_set structure is a bit array that records each fd to be monitored. The way a bit is recorded is slightly more involved, as shown below:

In /usr/include/sys/select.h:

    typedef long int __fd_mask;
    #define __NFDBITS    (8 * sizeof (__fd_mask))
    #define __FDELT(d)   ((d) / __NFDBITS)

    #define __FDMASK(d)  ((__fd_mask) 1 << ((d) % __NFDBITS))

    typedef struct
      {
        /* XPG4.2 requires this member name.  Otherwise avoid the name
           from the global namespace.  */
    #ifdef __USE_XOPEN
        __fd_mask fds_bits[__FD_SETSIZE / __NFDBITS];
    # define __FDS_BITS(set) ((set)->fds_bits)
    #else
        __fd_mask __fds_bits[__FD_SETSIZE / __NFDBITS];
    # define __FDS_BITS(set) ((set)->__fds_bits)
    #endif
      } fd_set;

    #define FD_SET(fd, fdsetp)    __FD_SET (fd, fdsetp)

In /usr/include/bits/select.h:

    #define __FD_SET(d, set)    (__FDS_BITS (set)[__FDELT (d)] |= __FDMASK (d))

From the above we can see that the position of each bit in the fd_set bit array corresponds to the value of an fd: FD_SET(fd, set) sets bit number fd. The number of bits in the fd_set structure is given by __FD_SETSIZE, which is defined as 1024 in /usr/include/bits/typesizes.h (the inclusion chain is sys/socket.h -> bits/types.h -> bits/typesizes.h).
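The index arithmetic can be reproduced outside the system headers. The sketch below mirrors __FDELT and __FDMASK using hypothetical demo_ names so that it does not depend on glibc internals; DEMO_SETSIZE is assumed to be 1024, matching the value quoted above.

```c
#include <limits.h>
#include <stddef.h>

typedef long int demo_fd_mask;                       /* same type as __fd_mask */
#define DEMO_NFDBITS (CHAR_BIT * sizeof(demo_fd_mask))
#define DEMO_SETSIZE 1024                            /* assumed __FD_SETSIZE */

/* Index of the array word that holds the bit for fd (mirrors __FDELT). */
static size_t demo_word_index(int fd)
{
    return (size_t)fd / DEMO_NFDBITS;
}

/* Position of the bit inside that word (the shift used by __FDMASK). */
static unsigned int demo_bit_index(int fd)
{
    return (unsigned int)((size_t)fd % DEMO_NFDBITS);
}

/* Number of words in the bit array. */
static size_t demo_word_count(void)
{
    return DEMO_SETSIZE / DEMO_NFDBITS;
}
```

On a platform where long is 64 bits, fd 70 lands in word 1 at bit 6. For fd 1024 the word index equals demo_word_count(), which is one element past the end of the array.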

The problem is that when fd >= 1024, the FD_SET macro actually writes outside the bounds of the fd_set buffer. In fact, man select describes this clearly:

NOTES

An fd_set is a fixed size buffer. Executing FD_CLR() or FD_SET() with a value of fd that is negative or is equal to or larger than FD_SETSIZE will result in undefined behavior. Moreover, POSIX requires fd to be a valid file descriptor.

I had not noticed this before. It is also described in the blog post "A crash caused by select" by Cloud Wind.

So select is not safe on Linux. If you want to use it, you must carefully check that no fd reaches 1024, which is hard to guarantee in practice. Otherwise, use poll or epoll.
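If select has to stay, the check mentioned above can be wrapped in a small helper. This is a minimal sketch; checked_fd_set is a hypothetical name, not a libc function.

```c
#include <sys/select.h>

/* Adds fd to set only when it fits into the fixed-size fd_set.
 * Returns 0 on success, -1 if fd is negative or >= FD_SETSIZE. */
static int checked_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;      /* FD_SET would be undefined behavior here */
    FD_SET(fd, set);
    return 0;
}
```

A caller has to treat -1 as a real error (for example by closing the connection), because an fd that cannot be added to the set will never be reported by select.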

This brings us to the actual topic of this article: how are fd values allocated in Linux? We all know that an fd is of type int, but how does its value grow? I analyze this below, using kernel version 2.6.30 as an example. Corrections are welcome.

First, we need to know which function performs fd allocation. Take pipe as an example: it is a typical syscall that allocates fds. fs/pipe.c defines the syscall implementations of pipe and pipe2 as follows:

    SYSCALL_DEFINE2(pipe2, int __user *, fildes, int, flags)
    {
        int fd[2];
        int error;

        error = do_pipe_flags(fd, flags);
        if (!error) {
            if (copy_to_user(fildes, fd, sizeof(fd))) {
                sys_close(fd[0]);
                sys_close(fd[1]);
                error = -EFAULT;
            }
        }
        return error;
    }

    SYSCALL_DEFINE1(pipe, int __user *, fildes)
    {
        return sys_pipe2(fildes, 0);
    }
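For orientation, this is what the same syscall looks like from user space. The sketch is an illustration only: pipe_roundtrip is a hypothetical helper that creates a pipe, hands back the two fd values the kernel allocated, and pushes one byte through to show that both ends are live.

```c
#include <unistd.h>

/* Creates a pipe, copies its two fds into out_fds, sends one byte through
 * it and closes both ends. Returns 0 on success, -1 on any failure. */
static int pipe_roundtrip(int out_fds[2])
{
    int fds[2];
    char sent = 'x';
    char received = 0;
    int ok;

    if (pipe(fds) != 0)
        return -1;

    out_fds[0] = fds[0];    /* read end */
    out_fds[1] = fds[1];    /* write end */

    ok = write(fds[1], &sent, 1) == 1 &&
         read(fds[0], &received, 1) == 1 &&
         received == sent;

    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : -1;
}
```

Printing out_fds in a freshly started program is an easy way to observe which values the kernel hands out.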

Analyzing the implementation of do_pipe_flags() further, we find that it allocates fds with get_unused_fd_flags(flags), which is a macro:

    #define get_unused_fd_flags(flags) alloc_fd(0, (flags))

It is located in include/linux/fs.h.

Now we have found the main character: alloc_fd(), the function that actually performs fd allocation in the kernel. It is located in fs/file.c, and its implementation is quite simple:

    int alloc_fd(unsigned start, unsigned flags)
    {
        struct files_struct *files = current->files;
        unsigned int fd;
        int error;
        struct fdtable *fdt;

        spin_lock(&files->file_lock);
    repeat:
        fdt = files_fdtable(files);
        fd = start;
        if (fd < files->next_fd)
            fd = files->next_fd;

        if (fd < fdt->max_fds)
            fd = find_next_zero_bit(fdt->open_fds->fds_bits,
                                    fdt->max_fds, fd);

        error = expand_files(files, fd);
        if (error < 0)
            goto out;

        /*
         * If we needed to expand the fs array we
         * might have blocked - try again.
         */
        if (error)
            goto repeat;

        if (start <= files->next_fd)
            files->next_fd = fd + 1;

        FD_SET(fd, fdt->open_fds);
        if (flags & O_CLOEXEC)
            FD_SET(fd, fdt->close_on_exec);
        else
            FD_CLR(fd, fdt->close_on_exec);
        error = fd;
    #if 1
        /* Sanity check */
        if (rcu_dereference(fdt->fd[fd]) != NULL) {
            printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
            rcu_assign_pointer(fdt->fd[fd], NULL);
        }
    #endif

    out:
        spin_unlock(&files->file_lock);
        return error;
    }

For the pipe system call, start is always 0. The key function in the middle, expand_files(), decides based on the given fd value whether the open file table of the process needs to be expanded. Its header comment reads:

    /*
     * Expand files.
     * This function will expand the file structures, if the requested size exceeds
     * the current capacity and there is room for expansion.
     * Return <0 error code on error; 0 when nothing done; 1 when files were
     * expanded and execution may have blocked.
     * The files->file_lock should be held on entry, and will be held on exit.
     */

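We will not go into its implementation here. Back in alloc_fd(), we can see that its principle for allocating an fd is: starting from the larger of start and files->next_fd, take the first fd value whose bit in open_fds is not set, that is, the smallest unused fd at or above that starting point.

If the allocation fails, the error code EMFILE is returned, indicating that the current process has too many open fds.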
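The allocation rule can be tried out with a small user-space model. This is a sketch under stated assumptions, not kernel code: the sim_ names and the fixed SIM_MAX_FDS table are hypothetical (the kernel grows its table through expand_files() instead), and the way sim_close_fd lowers next_fd is my understanding of how the kernel releases an fd, which the excerpt above does not show.

```c
#include <stdbool.h>

#define SIM_MAX_FDS 64  /* hypothetical fixed table size */

struct sim_files {
    bool open_fds[SIM_MAX_FDS]; /* stands in for the open_fds bitmap */
    unsigned int next_fd;       /* hint: no free slot below this index */
};

/* Mirrors the search in alloc_fd(): start at max(start, next_fd) and take
 * the first free slot. Returns the fd, or -1 when the table is full
 * (the kernel would return -EMFILE). */
static int sim_alloc_fd(struct sim_files *files, unsigned int start)
{
    unsigned int fd = start;

    if (fd < files->next_fd)
        fd = files->next_fd;

    /* Equivalent of find_next_zero_bit() on the bitmap. */
    while (fd < SIM_MAX_FDS && files->open_fds[fd])
        fd++;
    if (fd >= SIM_MAX_FDS)
        return -1;

    if (start <= files->next_fd)
        files->next_fd = fd + 1;

    files->open_fds[fd] = true;
    return (int)fd;
}

/* Assumption: releasing an fd pulls the hint back so the slot is found again. */
static void sim_close_fd(struct sim_files *files, unsigned int fd)
{
    if (fd >= SIM_MAX_FDS || !files->open_fds[fd])
        return;
    files->open_fds[fd] = false;
    if (fd < files->next_fd)
        files->next_fd = fd;
}
```

Starting from an empty table, four allocations with start 0 yield 0, 1, 2 and 3. After sim_close_fd(&files, 1), the next allocation yields 1 again and the one after that yields 4.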

This also matches how the fd values of client connections change in the server program written at my company (running on kernel 2.6.18): if a new connection is assigned fd 8 and is then closed, the next new connection is assigned fd 8 again, whereas if connections keep arriving without being closed, the fd value of each new connection increases by 1.
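The same behavior can be observed from user space without a server. The sketch below assumes a single-threaded process (so nothing else opens or closes fds in between) and uses /dev/null as a convenient file to open. Both helper names are hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Opens, closes, and opens again. Returns 1 if the second open got the
 * same fd value as the first, 0 if not, -1 on error. */
static int fd_is_reused_after_close(void)
{
    int first, second;

    first = open("/dev/null", O_RDONLY);
    if (first < 0)
        return -1;
    close(first);

    second = open("/dev/null", O_RDONLY);
    if (second < 0)
        return -1;
    close(second);

    return first == second;
}

/* Opens twice without closing in between. Returns 1 if the second fd is
 * larger than the first, 0 if not, -1 on error. */
static int fd_grows_while_open(void)
{
    int a, b, grew;

    a = open("/dev/null", O_RDONLY);
    if (a < 0)
        return -1;
    b = open("/dev/null", O_RDONLY);
    if (b < 0) {
        close(a);
        return -1;
    }

    grew = b > a;
    close(a);
    close(b);
    return grew;
}
```

Both helpers are expected to return 1, because open() hands out the lowest-numbered fd that is not currently open in the process.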

For this reason, I went on to look for the fd allocation path used by sockets and found that it is also alloc_fd(0, (flags)). The call sequence is as follows:
socket (syscall) -> sock_map_fd() -> sock_alloc_fd() -> get_unused_fd_flags()
get_unused_fd_flags() is also used by the open system call.
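Because sockets and regular files draw from the same allocator, a socket can receive an fd value that a file has just released. The following sketch checks this from user space. It assumes a single-threaded process and that creating an AF_UNIX socket is permitted. socket_reuses_file_fd is a hypothetical helper name.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Opens and closes a file, then creates a socket. Returns 1 if the socket
 * received the fd value the file just released, 0 if not, -1 on error. */
static int socket_reuses_file_fd(void)
{
    int file_fd, sock_fd;

    file_fd = open("/dev/null", O_RDONLY);
    if (file_fd < 0)
        return -1;
    close(file_fd);

    /* No address is bound; only the fd value matters here. */
    sock_fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sock_fd < 0)
        return -1;
    close(sock_fd);

    return sock_fd == file_fd;
}
```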

Now let's return to select, discussed at the beginning. Because of the fd allocation rules in Linux, fd values are always kept as small as possible. In a system that is not IO-intensive, the probability that an fd value reaches 1024 within a process is indeed fairly small. Therefore, whether to abandon select has no absolute answer. If the system design has other measures to guarantee that fd values stay below 1024, using select is understandable.
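One possible measure of this kind, offered here as a suggestion rather than something the analysis above prescribes, is to lower the soft RLIMIT_NOFILE limit of the process to FD_SETSIZE at startup. New fds are then always below FD_SETSIZE, and further allocations fail with EMFILE instead of producing a value that overflows an fd_set. cap_nofile_to_fd_setsize is a hypothetical helper name.

```c
#include <sys/resource.h>
#include <sys/select.h>

/* Lowers the soft limit on open files to FD_SETSIZE if it is higher.
 * Returns 0 on success, -1 if the limit could not be read or changed. */
static int cap_nofile_to_fd_setsize(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return -1;

    if (lim.rlim_cur > (rlim_t)FD_SETSIZE) {
        lim.rlim_cur = FD_SETSIZE;  /* the hard limit is left unchanged */
        if (setrlimit(RLIMIT_NOFILE, &lim) != 0)
            return -1;
    }
    return 0;
}
```

This only constrains fds allocated after the call. Descriptors that were already open or inherited with larger values remain valid, so the helper belongs at the very start of the program.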

However, for network communication programs we should never make this assumption, so we should avoid select as much as possible!
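As a point of comparison, here is a minimal sketch of waiting for readability with poll. Each fd is passed as a plain integer in a struct pollfd, so no fixed-size bitmap is involved and fd values of 1024 or more are fine. wait_readable and poll_pipe_demo are hypothetical helper names.

```c
#include <poll.h>
#include <unistd.h>

/* Waits up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error or if only an
 * error/hangup condition was reported. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int n;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    if (n == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Small self-check on a pipe: not readable while empty, readable after
 * one byte is written. Returns 0 if both observations hold. */
static int poll_pipe_demo(void)
{
    int fds[2];
    int before, after = -1;

    if (pipe(fds) != 0)
        return -1;

    before = wait_readable(fds[0], 0);
    if (write(fds[1], "x", 1) == 1)
        after = wait_readable(fds[0], 100);

    close(fds[0]);
    close(fds[1]);
    return (before == 0 && after == 1) ? 0 : -1;
}
```

For servers with many connections, epoll follows the same idea of passing fd values directly, but keeps the interest list in the kernel between calls.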
