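Last night I analyzed poll, and reading its code shows that the poll operation leaves a lot of room for optimization. epoll, short for eventpoll, is very efficient, so today let's look at how it is implemented. The implementation lives in fs/eventpoll.c and runs to more than 1,500 lines. Heh, scared yet?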
As everyone knows, epoll has three system calls, which the C library wraps as the following three functions:
1. int epoll_create(int size);
2. int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
3. int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
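Before going into the kernel side, it may help to see how the three calls fit together from user space. The sketch below assumes Linux with `<sys/epoll.h>`. It watches the read end of a pipe, writes one byte, and returns how many ready events `epoll_wait` reports. The function name `epoll_pipe_demo` is made up for illustration.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Sketch: register the read end of a pipe with an epoll instance,
 * make it readable, and return how many ready events epoll_wait()
 * reports (-1 on any failure). Linux-only.
 */
int epoll_pipe_demo(void)
{
    int pfd[2], epfd, n = -1;
    struct epoll_event ev, ready[8];

    if (pipe(pfd) < 0)
        return -1;

    /* 1. epoll_create(): size must be > 0; it is only a sizing hint. */
    epfd = epoll_create(1);
    if (epfd < 0)
        goto out_pipe;

    /* 2. epoll_ctl(): ask to be told when pfd[0] becomes readable. */
    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) < 0)
        goto out_ep;

    /* Write one byte so that the read end becomes ready. */
    if (write(pfd[1], "x", 1) != 1)
        goto out_ep;

    /* 3. epoll_wait(): wait at most 1000 ms for up to 8 events. */
    n = epoll_wait(epfd, ready, 8, 1000);
    if (n == 1 && (ready[0].data.fd != pfd[0] || !(ready[0].events & EPOLLIN)))
        n = -1; /* some other, unexpected event */

out_ep:
    close(epfd);
out_pipe:
    close(pfd[0]);
    close(pfd[1]);
    return n;
}
```

Calling `epoll_pipe_demo()` should return 1: the single registered descriptor is reported as readable.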
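With this much epoll source, let's simply follow these three calls and see where they lead. Today we'll deal with the first one, epoll_create.
Its kernel side is sys_epoll_create: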
/*
 * It opens an eventpoll file descriptor by suggesting a storage of "size"
 * file descriptors. The size parameter is just an hint about how to size
 * data structures. It won't prevent the user to store more than "size"
 * file descriptors inside the epoll interface. It is the kernel part of
 * the userspace epoll_create(2).
 */
asmlinkage long sys_epoll_create(int size)
{
	int error, fd;
	struct inode *inode;
	struct file *file;

	DNPRINTK(3, (KERN_INFO "[%p] eventpoll: sys_epoll_create(%d)\n",
		     current, size));

	/* Sanity check on the size parameter */
	error = -EINVAL;
	if (size <= 0)
		goto eexit_1;

	/*
	 * Creates all the items needed to setup an eventpoll file. That is,
	 * a file structure, and inode and a free file descriptor.
	 */
	error = ep_getfd(&fd, &inode, &file); //(1)
	if (error)
		goto eexit_1;

	/* Setup the file internal data structure ( "struct eventpoll" ) */
	error = ep_file_init(file); //(2)
	if (error)
		goto eexit_2;

	DNPRINTK(3, (KERN_INFO "[%p] eventpoll: sys_epoll_create(%d) = %d\n",
		     current, size, fd));

	return fd;

eexit_2:
	sys_close(fd);
eexit_1:
	DNPRINTK(3, (KERN_INFO "[%p] eventpoll: sys_epoll_create(%d) = %d\n",
		     current, size, error));
	return error;
}
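Two details of sys_epoll_create can be checked from user space. A non-positive size is rejected with EINVAL (the `if (size <= 0)` check), and, as the comment says, size does not limit how many descriptors can be registered. On current kernels size is ignored entirely apart from that positivity check. The sketch below assumes Linux. `epoll_create_errno` and `add_more_than_size` are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_PIPES 16

/* Returns errno if epoll_create(size) fails, or 0 if it succeeds. */
int epoll_create_errno(int size)
{
    int epfd = epoll_create(size);
    if (epfd < 0)
        return errno;
    close(epfd);
    return 0;
}

/*
 * Creates an epoll instance with a size hint of 1, then registers the
 * read ends of `count` pipes on it. Returns how many EPOLL_CTL_ADD
 * calls succeeded, or -1 on a setup failure.
 */
int add_more_than_size(int count)
{
    int fds[MAX_PIPES][2];
    int opened = 0, added = 0, epfd;

    if (count < 0 || count > MAX_PIPES)
        return -1;
    epfd = epoll_create(1);
    if (epfd < 0)
        return -1;

    for (; opened < count; opened++) {
        struct epoll_event ev;
        if (pipe(fds[opened]) < 0)
            break;
        ev.events = EPOLLIN;
        ev.data.fd = fds[opened][0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[opened][0], &ev) == 0)
            added++;
    }

    /* Release every pipe we managed to open, then the epoll fd. */
    for (int i = 0; i < opened; i++) {
        close(fds[i][0]);
        close(fds[i][1]);
    }
    close(epfd);
    return added;
}
```

With these helpers, `epoll_create_errno(0)` yields EINVAL, while `add_more_than_size(10)` registers all ten descriptors despite the hint of 1.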
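(1) This step calls a function named ep_getfd. According to its comment, it creates the file that eventpoll needs. A file involves a file descriptor, an inode and a file object, and those are exactly the three parameters we pass in. Enough talk; here is the source: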
/*
 * Creates the file descriptor to be used by the epoll interface.
 */
static int ep_getfd(int *efd, struct inode **einode, struct file **efile)
{
	struct qstr this;
	char name[32];
	struct dentry *dentry;
	struct inode *inode;
	struct file *file;
	int error, fd;

	/* Get an ready to use file */
	error = -ENFILE;
	file = get_empty_filp();
	if (!file)
		goto eexit_1;

	/* Allocates an inode from the eventpoll file system */
	inode = ep_eventpoll_inode();
	error = PTR_ERR(inode);
	if (IS_ERR(inode))
		goto eexit_2;

	/* Allocates a free descriptor to plug the file onto */
	error = get_unused_fd();
	if (error < 0)
		goto eexit_3;
	fd = error;

	/*
	 * Link the inode to a directory entry by creating a unique name
	 * using the inode number.
	 */
	error = -ENOMEM;
	sprintf(name, "[%lu]", inode->i_ino);
	this.name = name;
	this.len = strlen(name);
	this.hash = inode->i_ino;
	dentry = d_alloc(eventpoll_mnt->mnt_sb->s_root, &this);
	if (!dentry)
		goto eexit_4;
	dentry->d_op = &eventpollfs_dentry_operations;
	d_add(dentry, inode);
	file->f_vfsmnt = mntget(eventpoll_mnt);
	file->f_dentry = dentry;
	file->f_mapping = inode->i_mapping;

	file->f_pos = 0;
	file->f_flags = O_RDONLY;
	file->f_op = &eventpoll_fops;
	file->f_mode = FMODE_READ;
	file->f_version = 0;
	file->private_data = NULL;

	/* Install the new setup file into the allocated fd. */
	fd_install(fd, file);

	*efd = fd;
	*einode = inode;
	*efile = file;
	return 0;

eexit_4:
	put_unused_fd(fd);
eexit_3:
	iput(inode);
eexit_2:
	put_filp(file);
eexit_1:
	return error;
}
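This function is well commented, so I'll only go over it briefly. It also calls many functions, and examining each of them would take a lot of background knowledge, so I won't list all their code. Still, I consider this function a classic, because it shows the standard sequence for creating a file.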
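The steps are:
1. get_empty_filp() has the kernel allocate a struct file for us;
2. ep_eventpoll_inode() gives us an inode from the eventpoll file system;
3. get_unused_fd() reserves a free file descriptor;
4. d_alloc() allocates a dentry, named after the inode number ("[i_ino]");
5. d_add(dentry, inode) adds the dentry to the dentry hash and binds it to the inode;
6. the remaining fields of the file object are filled in (mount, dentry, mapping, f_op = &eventpoll_fops, read-only mode, and so on);
7. fd_install(fd, file) registers the file with the process, which ties the file descriptor to the file object.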
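ep_getfd acquires its resources one after another. If a later step fails, the labels eexit_4 through eexit_1 release the resources already acquired, in reverse order, and each label falls through to the next one. The user-space model below isolates that goto-unwind idiom. The resources are simulated with a counter. `create_file_model`, `acquire`, `release` and `held_resources` are hypothetical names, not kernel functions. The comments map each step to the kernel call it stands for.

```c
#include <assert.h>

/* Number of simulated resources currently held. */
static int live;

/* Simulates acquiring resource number `step`; fails when step == fail_at. */
static int acquire(int step, int fail_at)
{
    if (step == fail_at)
        return -1;
    live++;
    return 0;
}

static void release(void)
{
    assert(live > 0); /* never release more than we hold */
    live--;
}

int held_resources(void)
{
    return live;
}

/*
 * Acquires four resources in the order ep_getfd() does. fail_at (1..4)
 * makes that step fail; 0 means everything succeeds. On failure,
 * everything acquired so far is released in reverse order.
 */
int create_file_model(int fail_at)
{
    int error = -1;

    if (acquire(1, fail_at)) goto eexit_1;  /* get_empty_filp()     */
    if (acquire(2, fail_at)) goto eexit_2;  /* ep_eventpoll_inode() */
    if (acquire(3, fail_at)) goto eexit_3;  /* get_unused_fd()      */
    if (acquire(4, fail_at)) goto eexit_4;  /* d_alloc()            */
    return 0;                               /* all four are now held */

eexit_4:
    release();  /* put_unused_fd() */
eexit_3:
    release();  /* iput() */
eexit_2:
    release();  /* put_filp() */
eexit_1:
    return error;
}
```

Whichever step fails, `held_resources()` drops back to 0. On success all four resources remain held, just as the real function keeps the file, inode, fd and dentry alive.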
(2) Before tracing ep_file_init, let's first look at the eventpoll structure:
/*
 * This structure is stored inside the "private_data" member of the file
 * structure and rapresent the main data sructure for the eventpoll
 * interface.
 */
struct eventpoll {
	/* Protect the this structure access */
	rwlock_t lock;

	/*
	 * This semaphore is used to ensure that files are not removed
	 * while epoll is using them. This is read-held during the event
	 * collection loop and it is write-held during the file cleanup
	 * path, the epoll file exit code and the ctl operations.
	 */
	struct rw_semaphore sem;

	/* Wait queue used by sys_epoll_wait() */
	wait_queue_head_t wq;

	/* Wait queue used by file->poll() */
	wait_queue_head_t poll_wait;

	/* List of ready file descriptors */
	struct list_head rdllist;

	/* RB-Tree root used to store monitored fd structs */
	struct rb_root rbr;
};
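The comments here are also quite clear. eventpoll is the core of epoll. It stores the descriptors you want to monitor (in the red-black tree rbr) and keeps the ones that are ready on rdllist. With poll, the caller passes the whole set of descriptors in on every call. Here the set stays in the kernel between calls, and that is a large part of why epoll is efficient.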
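One practical consequence of keeping the monitored set inside struct eventpoll is that a descriptor is registered once with epoll_ctl and then stays registered across any number of epoll_wait calls. The sketch below assumes Linux and the default level-triggered mode. It registers a pipe once and then polls it several times without re-registering. It also shows that adding the same descriptor a second time fails with EEXIST, because the instance already has an entry for it. `interest_set_demo` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Registers a pipe's read end once, makes it readable, then calls
 * epoll_wait() `rounds` times without re-registering anything.
 * Returns how many calls reported exactly one ready fd (-1 on setup
 * failure) and stores the errno of a duplicate EPOLL_CTL_ADD in *dup_errno.
 */
int interest_set_demo(int rounds, int *dup_errno)
{
    int pfd[2], epfd, hits = -1;
    struct epoll_event ev, ready[4];

    *dup_errno = 0;
    if (pipe(pfd) < 0)
        return -1;
    if ((epfd = epoll_create(1)) < 0)
        goto out_pipe;

    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) < 0)
        goto out_ep;

    /* A second ADD of the same fd is rejected: it is already in the set. */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) < 0)
        *dup_errno = errno;

    /* One byte that is never read, so level-triggered epoll keeps reporting it. */
    if (write(pfd[1], "x", 1) != 1)
        goto out_ep;

    hits = 0;
    for (int i = 0; i < rounds; i++)
        if (epoll_wait(epfd, ready, 4, 0) == 1) /* timeout 0: do not block */
            hits++;

out_ep:
    close(epfd);
out_pipe:
    close(pfd[0]);
    close(pfd[1]);
    return hits;
}
```

Each of the epoll_wait calls reports the same descriptor, although epoll_ctl registered it only once.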
Now back to sys_epoll_create to trace ep_file_init:
static int ep_file_init(struct file *file)
{
	struct eventpoll *ep;

	if (!(ep = kmalloc(sizeof(struct eventpoll), GFP_KERNEL)))
		return -ENOMEM;

	memset(ep, 0, sizeof(*ep));
	rwlock_init(&ep->lock);
	init_rwsem(&ep->sem);
	init_waitqueue_head(&ep->wq);
	init_waitqueue_head(&ep->poll_wait);
	INIT_LIST_HEAD(&ep->rdllist);
	ep->rbr = RB_ROOT;

	file->private_data = ep;

	DNPRINTK(3, (KERN_INFO "[%p] eventpoll: ep_file_init() ep=%p\n",
		     current, ep));
	return 0;
}
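All ep_file_init does is allocate and initialize the eventpoll structure. It zeroes the structure, sets up the locks, the two wait queues, the empty ready list and the empty red-black tree, and then stores it in file->private_data.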
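For readers unfamiliar with the kernel helpers, here is a user-space model of the state ep_file_init leaves behind. It is only an approximation. The locks and wait queues are omitted, calloc stands in for kmalloc plus memset, and `eventpoll_model`, `file_model` and their functions are hypothetical names. It does reproduce two real kernel conventions: INIT_LIST_HEAD makes an empty list head point to itself, and RB_ROOT is a tree whose root pointer is NULL.

```c
#include <assert.h>
#include <stdlib.h>

/* Circular doubly linked list head, same shape as the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

/* Like INIT_LIST_HEAD: an empty list points to itself in both directions. */
void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
int list_is_empty(const struct list_head *h) { return h->next == h; }

/* Stand-in for struct rb_root: an empty tree has a NULL root node. */
struct rb_node;
struct rb_root { struct rb_node *rb_node; };

/* Hypothetical model of struct eventpoll (locks and wait queues omitted). */
struct eventpoll_model {
    struct list_head rdllist; /* ready descriptors */
    struct rb_root rbr;       /* monitored descriptors */
};

/* Hypothetical model of struct file: only the field ep_file_init touches. */
struct file_model { void *private_data; };

/* Mirrors ep_file_init(): allocate zeroed, initialize, attach to the file. */
int ep_file_init_model(struct file_model *file)
{
    struct eventpoll_model *ep = calloc(1, sizeof(*ep)); /* kmalloc + memset */
    if (!ep)
        return -1;                 /* the kernel returns -ENOMEM here */
    init_list_head(&ep->rdllist);  /* INIT_LIST_HEAD(&ep->rdllist) */
    ep->rbr.rb_node = NULL;        /* ep->rbr = RB_ROOT */
    file->private_data = ep;       /* file->private_data = ep */
    return 0;
}
```

After `ep_file_init_model(&f)`, `f.private_data` points to a model whose ready list and tree are both empty, which is the starting state that epoll_ctl works from.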
That covers sys_epoll_create. Tomorrow we'll move on to sys_epoll_ctl.