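I have read many walkthroughs of the Redis event library online and studied it several times myself, so I am writing my notes down. My understanding is limited, but writing things out always helps.
The network event library wraps the epoll operations (on Linux, that is, where epoll is the I/O multiplexing backend) and also implements a timer. Timers are a cornerstone of server programs, and many problems are solved with them.
(1) Data structures plus algorithms make a complete program, so to get a look at the Redis networking library we start with the data structures.
1. The whole event loop is described by one global data structure, aeEventLoop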
/* State of an event based program */
typedef struct aeEventLoop {
    int maxfd;   /* highest file descriptor currently registered */
    int setsize; /* max number of file descriptors tracked */
    long long timeEventNextId;
    time_t lastTime;     /* Used to detect system clock skew */
    aeFileEvent *events; /* Registered events */
    aeFiredEvent *fired; /* Fired events */
    aeTimeEvent *timeEventHead;
    int stop;
    void *apidata; /* This is used for polling API specific data */
    aeBeforeSleepProc *beforesleep;
} aeEventLoop;
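maxfd: the highest file descriptor currently registered.
setsize: the maximum number of file descriptors tracked, which is also the size of both the file event array and the fired event array. Every operation on an fd has to be bounds-checked against it.
timeEventNextId: every time event that is added gets an ID. The time event list is not kept sorted, but IDs only ever increase. This field holds the next ID to hand out, so the largest ID in use is timeEventNextId-1.
lastTime: the time of the previous time-event pass, used to detect system clock skew (the clock being moved backwards).
events and fired: the registered file events and the fired (ready) events.
stop: the flag that ends the main loop.
apidata: data specific to the polling backend. For epoll it is this structure: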
typedef struct aeApiState {
    int epfd;
    struct epoll_event *events;
} aeApiState;
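beforesleep: a callback run on every iteration of the main loop, just before the loop polls for events. It does a lot of work, which we will run into later.
2. File events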
/* File event structure */
typedef struct aeFileEvent {
    int mask; /* one of AE_(READABLE|WRITABLE) */
    aeFileProc *rfileProc;
    aeFileProc *wfileProc;
    void *clientData;
} aeFileEvent;
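A file event holds the callbacks and the event mask for its fd. You might wonder why the fd itself is not stored here: the fd is the index into the events array, and the fired event structure carries the fd of each ready event.
3. Fired events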
/* A fired event */
typedef struct aeFiredEvent {
    int fd;
    int mask;
} aeFiredEvent;
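Each entry of the fired array records the fd that became ready. That fd is in turn the index into the events array, which is how the matching callbacks are found. Here is the relevant code: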
aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd];
fe->rfileProc(eventLoop,fd,fe->clientData,mask);
fe->wfileProc(eventLoop,fd,fe->clientData,mask);
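The two-array lookup can be sketched in isolation. This is a minimal illustration with simplified, hypothetical types (`FileEvent`, `FiredEvent`, `Loop`, `dispatch_fired`, `count_handler` are not the real ae.c names); it only shows how a fired entry's fd selects the slot in the events array.

```c
#include <stddef.h>

#define EV_NONE     0
#define EV_READABLE 1
#define EV_WRITABLE 2

typedef void handler_fn(int fd, void *data, int mask);

/* Hypothetical, simplified stand-ins for aeFileEvent / aeFiredEvent. */
typedef struct { int mask; handler_fn *rproc; handler_fn *wproc; void *data; } FileEvent;
typedef struct { int fd; int mask; } FiredEvent;

typedef struct {
    int setsize;        /* size of both arrays */
    FileEvent *events;  /* indexed by fd */
    FiredEvent *fired;  /* indexed by 0..numfired-1, NOT by fd */
} Loop;

/* Example handler: counts invocations through the data pointer. */
void count_handler(int fd, void *data, int mask) {
    (void)fd; (void)mask;
    (*(int *)data)++;
}

/* Dispatch the first numfired entries of loop->fired.
 * Returns the number of handler calls made. */
int dispatch_fired(Loop *loop, int numfired) {
    int calls = 0;
    for (int j = 0; j < numfired; j++) {
        int fd = loop->fired[j].fd;
        int mask = loop->fired[j].mask;
        if (fd < 0 || fd >= loop->setsize) continue; /* bounds check */
        FileEvent *fe = &loop->events[fd];           /* fd is the index */
        int rfired = 0;
        if ((fe->mask & mask & EV_READABLE) && fe->rproc) {
            rfired = 1;
            fe->rproc(fd, fe->data, mask);
            calls++;
        }
        if ((fe->mask & mask & EV_WRITABLE) && fe->wproc) {
            /* Do not call the same handler twice for one fd. */
            if (!rfired || fe->wproc != fe->rproc) {
                fe->wproc(fd, fe->data, mask);
                calls++;
            }
        }
    }
    return calls;
}
```

Checking `fe->mask & mask` rather than `mask` alone means an event that was unregistered by an earlier handler in the same pass is silently skipped.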
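4. Time events
I think the Redis implementation is not ideal here, and the source comments say as much: time events are kept in a linked list, so finding the nearest one is O(n). I have heard that a min-heap can be used instead. For now, here is a brief look at the structure.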
/* Time event structure */
typedef struct aeTimeEvent {
    long long id; /* time event identifier. */
    long when_sec; /* seconds */
    long when_ms; /* milliseconds */
    aeTimeProc *timeProc;
    aeEventFinalizerProc *finalizerProc;
    void *clientData;
    struct aeTimeEvent *next;
} aeTimeEvent;
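For comparison with the linked list above, here is a rough sketch of the min-heap idea. This is not how ae.c stores its timers; `Timer`, `TimerHeap`, `heap_push`, `heap_peek` and `heap_pop` are hypothetical names, and the heap uses caller-provided storage of fixed capacity to keep the example short.

```c
#include <stddef.h>

/* Hypothetical timer record: only the fields needed for ordering. */
typedef struct { long long id; long when_sec; long when_ms; } Timer;

/* Binary min-heap over caller-provided storage. */
typedef struct { Timer *items; size_t len; size_t cap; } TimerHeap;

static int timer_before(const Timer *a, const Timer *b) {
    return a->when_sec < b->when_sec ||
           (a->when_sec == b->when_sec && a->when_ms < b->when_ms);
}

static void timer_swap(Timer *a, Timer *b) { Timer t = *a; *a = *b; *b = t; }

/* Insert a timer in O(log n). Returns 0 on success, -1 if the heap is full. */
int heap_push(TimerHeap *h, Timer t) {
    if (h->len == h->cap) return -1;
    size_t i = h->len++;
    h->items[i] = t;
    while (i > 0) {                       /* sift up */
        size_t parent = (i - 1) / 2;
        if (!timer_before(&h->items[i], &h->items[parent])) break;
        timer_swap(&h->items[i], &h->items[parent]);
        i = parent;
    }
    return 0;
}

/* Nearest timer in O(1), or NULL if the heap is empty. */
const Timer *heap_peek(const TimerHeap *h) {
    return h->len ? &h->items[0] : NULL;
}

/* Remove the nearest timer in O(log n). Returns 0, or -1 if empty. */
int heap_pop(TimerHeap *h, Timer *out) {
    if (h->len == 0) return -1;
    *out = h->items[0];
    h->items[0] = h->items[--h->len];
    size_t i = 0;
    for (;;) {                            /* sift down */
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < h->len && timer_before(&h->items[l], &h->items[m])) m = l;
        if (r < h->len && timer_before(&h->items[r], &h->items[m])) m = r;
        if (m == i) break;
        timer_swap(&h->items[i], &h->items[m]);
        i = m;
    }
    return 0;
}
```

With this layout the "find the nearest timer" step becomes a peek at index 0, at the cost of O(log n) insertion and a more involved delete-by-id.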
That covers the data structures. Next comes the implementation, starting with the main loop:
void aeMain(aeEventLoop *eventLoop) {
    eventLoop->stop = 0;
    while (!eventLoop->stop) {
        if (eventLoop->beforesleep != NULL)
            eventLoop->beforesleep(eventLoop);
        aeProcessEvents(eventLoop, AE_ALL_EVENTS);
    }
}
All of the logic is in aeProcessEvents:
int aeProcessEvents(aeEventLoop *eventLoop, int flags)
{
    int processed = 0, numevents;

    /* Nothing to do? return ASAP */
    if (!(flags & AE_TIME_EVENTS) && !(flags & AE_FILE_EVENTS)) return 0;

    /* Note that we want call select() even if there are no
     * file events to process as long as we want to process time
     * events, in order to sleep until the next time event is ready
     * to fire. */
    if (eventLoop->maxfd != -1 ||
        ((flags & AE_TIME_EVENTS) && !(flags & AE_DONT_WAIT))) {
        int j;
        aeTimeEvent *shortest = NULL;
        struct timeval tv, *tvp;

        if (flags & AE_TIME_EVENTS && !(flags & AE_DONT_WAIT))
            shortest = aeSearchNearestTimer(eventLoop);
        if (shortest) {
            long now_sec, now_ms;

            /* Calculate the time missing for the nearest
             * timer to fire. */
            aeGetTime(&now_sec, &now_ms);
            tvp = &tv;
            tvp->tv_sec = shortest->when_sec - now_sec;
            if (shortest->when_ms < now_ms) {
                tvp->tv_usec = ((shortest->when_ms+1000) - now_ms)*1000;
                tvp->tv_sec --;
            } else {
                tvp->tv_usec = (shortest->when_ms - now_ms)*1000;
            }
            if (tvp->tv_sec < 0) tvp->tv_sec = 0;
            if (tvp->tv_usec < 0) tvp->tv_usec = 0;
        } else {
            /* If we have to check for events but need to return
             * ASAP because of AE_DONT_WAIT we need to set the timeout
             * to zero */
            if (flags & AE_DONT_WAIT) {
                tv.tv_sec = tv.tv_usec = 0;
                tvp = &tv;
            } else {
                /* Otherwise we can block */
                tvp = NULL; /* wait forever */
            }
        }

        numevents = aeApiPoll(eventLoop, tvp);
        for (j = 0; j < numevents; j++) {
            aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd];
            int mask = eventLoop->fired[j].mask;
            int fd = eventLoop->fired[j].fd;
            int rfired = 0;

            /* note the fe->mask & mask & ... code: maybe an already processed
             * event removed an element that fired and we still didn't
             * processed, so we check if the event is still valid. */
            if (fe->mask & mask & AE_READABLE) {
                rfired = 1;
                fe->rfileProc(eventLoop,fd,fe->clientData,mask);
            }
            if (fe->mask & mask & AE_WRITABLE) {
                if (!rfired || fe->wfileProc != fe->rfileProc)
                    fe->wfileProc(eventLoop,fd,fe->clientData,mask);
            }
            processed++;
        }
    }
    /* Check time events */
    if (flags & AE_TIME_EVENTS)
        processed += processTimeEvents(eventLoop);

    return processed; /* return the number of processed file/time events */
}
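The code is a bit long, but it boils down to three steps:
1. Work out how long epoll_wait should wait, based on flags. There are a few cases:
If time events are being processed and waiting is allowed, find the time event that expires soonest and wait only that long. This is a neat trick.
If AE_DONT_WAIT is set, return immediately.
Otherwise (there is no time event to wait for), block indefinitely until an event fires.
2. Wait for events to occur and handle them through their callbacks.
3. Process time events.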
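The timeout choice in step 1 can be sketched as two small functions. This is an illustration only, working in milliseconds rather than with struct timeval; `ms_until`, `choose_timeout_ms` and `WAIT_FOREVER` are hypothetical names. One deliberate difference from the quoted code: there tv_sec and tv_usec are clamped separately, so an overdue timer can still leave a nonzero tv_usec, whereas this sketch clamps the whole difference to zero.

```c
#define WAIT_FOREVER (-1LL)

/* Milliseconds from (now_sec, now_ms) until (when_sec, when_ms),
 * clamped to zero when the deadline has already passed. */
long long ms_until(long now_sec, long now_ms, long when_sec, long when_ms) {
    long long diff = (long long)(when_sec - now_sec) * 1000
                   + (long long)(when_ms - now_ms);
    return diff > 0 ? diff : 0;
}

/* Pick the poll timeout for the three cases:
 *   dont_wait set      -> 0 (return immediately)
 *   a timer is pending -> time until the nearest timer
 *   otherwise          -> WAIT_FOREVER (block until a file event fires)
 * has_timer and ms_to_timer describe the nearest timer, if any. */
long long choose_timeout_ms(int has_timer, long long ms_to_timer, int dont_wait) {
    if (dont_wait) return 0;
    if (has_timer) return ms_to_timer;
    return WAIT_FOREVER;
}
```

For example, with the clock at 10 s 700 ms and the nearest timer at 11 s 200 ms, `ms_until(10, 700, 11, 200)` gives 500, so the loop sleeps at most half a second before the timer is checked.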
aeApiPoll itself is just a wrapper around epoll_wait:
static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    aeApiState *state = eventLoop->apidata;
    int retval, numevents = 0;

    retval = epoll_wait(state->epfd,state->events,eventLoop->setsize,
            tvp ? (tvp->tv_sec*1000 + tvp->tv_usec/1000) : -1);
    if (retval > 0) {
        int j;

        numevents = retval;
        for (j = 0; j < numevents; j++) {
            int mask = 0;
            struct epoll_event *e = state->events+j;

            if (e->events & EPOLLIN) mask |= AE_READABLE;
            if (e->events & EPOLLOUT) mask |= AE_WRITABLE;
            if (e->events & EPOLLERR) mask |= AE_WRITABLE;
            if (e->events & EPOLLHUP) mask |= AE_WRITABLE;
            eventLoop->fired[j].fd = e->data.fd;
            eventLoop->fired[j].mask = mask;
        }
    }
    return numevents;
}
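The two conversions inside aeApiPoll, from struct timeval to the millisecond timeout epoll_wait expects and from epoll event bits to the library's own mask, can be pulled out as pure functions. A minimal Linux-only sketch; `timeval_to_epoll_timeout`, `epoll_events_to_mask` and the `EV_*` constants are hypothetical names standing in for the AE_* ones.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/time.h>

#define EV_READABLE 1
#define EV_WRITABLE 2

/* NULL means "block forever", which epoll_wait spells as -1.
 * Sub-millisecond remainders are truncated, as in the quoted code. */
int timeval_to_epoll_timeout(const struct timeval *tvp) {
    if (tvp == NULL) return -1;
    return (int)(tvp->tv_sec * 1000 + tvp->tv_usec / 1000);
}

/* Errors and hangups are reported as writable so that the write
 * handler runs and sees the failure from its own write call. */
int epoll_events_to_mask(unsigned int events) {
    int mask = 0;
    if (events & EPOLLIN) mask |= EV_READABLE;
    if (events & (EPOLLOUT | EPOLLERR | EPOLLHUP)) mask |= EV_WRITABLE;
    return mask;
}
```

Note the truncation: a remaining wait of 999 microseconds becomes a timeout of 0, so the loop may spin once or twice just before a timer is due.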
Since epoll_wait is wrapped here, it is natural to look for the wrappers around epoll_ctl and epoll_create as well. The following creates the epoll handle:
static int aeApiCreate(aeEventLoop *eventLoop) {
    aeApiState *state = zmalloc(sizeof(aeApiState));

    if (!state) return -1;
    state->events = zmalloc(sizeof(struct epoll_event)*eventLoop->setsize);
    if (!state->events) {
        zfree(state);
        return -1;
    }
    state->epfd = epoll_create(1024); /* 1024 is just a hint for the kernel */
    if (state->epfd == -1) {
        zfree(state->events);
        zfree(state);
        return -1;
    }
    eventLoop->apidata = state;
    return 0;
}
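The epoll_ctl wrapper sits underneath the interfaces the library exposes to the outside, namely those for creating file events and time events.
For example, creating a file event: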
int aeCreateFileEvent(aeEventLoop *eventLoop, int fd, int mask,
        aeFileProc *proc, void *clientData)
{
    if (fd >= eventLoop->setsize) {
        errno = ERANGE;
        return AE_ERR;
    }
    aeFileEvent *fe = &eventLoop->events[fd];

    if (aeApiAddEvent(eventLoop, fd, mask) == -1)
        return AE_ERR;
    fe->mask |= mask;
    if (mask & AE_READABLE) fe->rfileProc = proc;
    if (mask & AE_WRITABLE) fe->wfileProc = proc;
    fe->clientData = clientData;
    if (fd > eventLoop->maxfd)
        eventLoop->maxfd = fd;
    return AE_OK;
}
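It first bounds-checks fd, as mentioned earlier, then takes fe and calls aeApiAddEvent, which is where epoll_ctl is actually called. Only after that succeeds does it fill in fe according to mask.
Finally it tidies up by raising maxfd if needed.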
static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
    aeApiState *state = eventLoop->apidata;
    struct epoll_event ee;
    /* If the fd was already monitored for some event, we need a MOD
     * operation. Otherwise we need an ADD operation. */
    int op = eventLoop->events[fd].mask == AE_NONE ?
            EPOLL_CTL_ADD : EPOLL_CTL_MOD;

    ee.events = 0;
    mask |= eventLoop->events[fd].mask; /* Merge old events */
    if (mask & AE_READABLE) ee.events |= EPOLLIN;
    if (mask & AE_WRITABLE) ee.events |= EPOLLOUT;
    ee.data.u64 = 0; /* avoid valgrind warning */
    ee.data.fd = fd;
    if (epoll_ctl(state->epfd,op,fd,&ee) == -1) return -1;
    return 0;
}
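It first checks whether the fd is already registered: if not, it adds it, otherwise it modifies the existing registration. Calls to the low-level interface are all made in the layer directly above it, ae_epoll, while the ae layer above that only maintains its own data structures. The layering is rigorous, which shows the author's skill.
Another example: deleting a file event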
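The ADD-versus-MOD decision and the merging of old and new masks can be isolated into a function that only prepares the epoll_ctl arguments. A Linux-only sketch with hypothetical names (`build_epoll_request`, `EV_*`); it does not call epoll_ctl itself.

```c
#include <string.h>
#include <sys/epoll.h>

#define EV_NONE     0
#define EV_READABLE 1
#define EV_WRITABLE 2

/* Fill *ee for registering addmask on fd, given that oldmask is
 * already registered. Returns the epoll_ctl operation to use:
 * EPOLL_CTL_ADD for a new fd, EPOLL_CTL_MOD for a known one. */
int build_epoll_request(int oldmask, int addmask, int fd,
                        struct epoll_event *ee) {
    int merged = oldmask | addmask;   /* keep the events already watched */
    memset(ee, 0, sizeof(*ee));       /* zero the whole data union too */
    if (merged & EV_READABLE) ee->events |= EPOLLIN;
    if (merged & EV_WRITABLE) ee->events |= EPOLLOUT;
    ee->data.fd = fd;
    return oldmask == EV_NONE ? EPOLL_CTL_ADD : EPOLL_CTL_MOD;
}
```

The merge matters because EPOLL_CTL_MOD replaces the fd's event set rather than adding to it: passing only the new bits would silently drop the ones registered earlier.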
void aeDeleteFileEvent(aeEventLoop *eventLoop, int fd, int mask)
{
    if (fd >= eventLoop->setsize) return;
    aeFileEvent *fe = &eventLoop->events[fd];
    if (fe->mask == AE_NONE) return;

    aeApiDelEvent(eventLoop, fd, mask);
    fe->mask = fe->mask & (~mask);
    if (fd == eventLoop->maxfd && fe->mask == AE_NONE) {
        /* Update the max fd */
        int j;

        for (j = eventLoop->maxfd-1; j >= 0; j--)
            if (eventLoop->events[j].mask != AE_NONE) break;
        eventLoop->maxfd = j;
    }
}
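It first bounds-checks fd and returns at once if nothing is registered on it. Otherwise it removes the given events from the fd, updates fe, and finally may have to lower maxfd.
As for the implementation of aeApiDelEvent: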
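The maxfd bookkeeping on delete can be sketched on a plain array of masks. A minimal illustration with hypothetical names (`remove_mask`, `EV_NONE`); the caller is assumed to have bounds-checked fd already.

```c
#define EV_NONE 0

/* masks[fd] holds the events registered for each fd.
 * Clears delmask from masks[fd] and returns the new maxfd:
 * unchanged in most cases, lowered to the next registered fd
 * (or -1 if none is left) when the highest fd becomes empty. */
int remove_mask(int *masks, int maxfd, int fd, int delmask) {
    masks[fd] &= ~delmask;
    if (fd == maxfd && masks[fd] == EV_NONE) {
        int j;
        /* Scan downwards for the next fd that still has events. */
        for (j = maxfd - 1; j >= 0; j--)
            if (masks[j] != EV_NONE) break;
        maxfd = j; /* j is -1 when the loop ran off the bottom */
    }
    return maxfd;
}
```

The downward scan is linear in the worst case, but it only runs when the highest fd is fully unregistered.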
static void aeApiDelEvent(aeEventLoop *eventLoop, int fd, int delmask) {
    aeApiState *state = eventLoop->apidata;
    struct epoll_event ee;
    int mask = eventLoop->events[fd].mask & (~delmask);

    ee.events = 0;
    if (mask & AE_READABLE) ee.events |= EPOLLIN;
    if (mask & AE_WRITABLE) ee.events |= EPOLLOUT;
    ee.data.u64 = 0; /* avoid valgrind warning */
    ee.data.fd = fd;
    if (mask != AE_NONE) {
        epoll_ctl(state->epfd,EPOLL_CTL_MOD,fd,&ee);
    } else {
        /* Note, Kernel < 2.6.9 requires a non null event pointer even for
         * EPOLL_CTL_DEL. */
        epoll_ctl(state->epfd,EPOLL_CTL_DEL,fd,&ee);
    }
}
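As you can see, it either modifies the fd's registration or removes the fd from epoll altogether, depending on whether any events remain on it.
Processing time events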
static int processTimeEvents(aeEventLoop *eventLoop) {
    int processed = 0;
    aeTimeEvent *te;
    long long maxId;
    time_t now = time(NULL);

    if (now < eventLoop->lastTime) {
        te = eventLoop->timeEventHead;
        while(te) {
            te->when_sec = 0;
            te = te->next;
        }
    }
    eventLoop->lastTime = now;

    te = eventLoop->timeEventHead;
    maxId = eventLoop->timeEventNextId-1;
    while(te) {
        long now_sec, now_ms;
        long long id;

        if (te->id > maxId) {
            te = te->next;
            continue;
        }
        aeGetTime(&now_sec, &now_ms);
        if (now_sec > te->when_sec ||
            (now_sec == te->when_sec && now_ms >= te->when_ms))
        {
            int retval;

            id = te->id;
            retval = te->timeProc(eventLoop, id, te->clientData);
            processed++;
            if (retval != AE_NOMORE) {
                aeAddMillisecondsToNow(retval,&te->when_sec,&te->when_ms);
            } else {
                aeDeleteTimeEvent(eventLoop, id);
            }
            te = eventLoop->timeEventHead;
        } else {
            te = te->next;
        }
    }
    return processed;
}
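The reschedule-or-delete decision in processTimeEvents depends on the handler's return value: a number of milliseconds means "run me again after that long", AE_NOMORE means "remove me". A minimal sketch of that contract on a single timer, using hypothetical names (`TimerSlot`, `fire_if_due`, `tick_three_times`, `NOMORE`) and an `active` flag in place of real list deletion:

```c
#define NOMORE (-1)

typedef int timer_fn(long long id, void *data);

/* Hypothetical single timer; when_ms is an absolute time in ms. */
typedef struct {
    long long id;
    long long when_ms;
    timer_fn *proc;
    void *data;
    int active;
} TimerSlot;

/* Example handler: asks to be re-run every 100 ms, three times in total. */
int tick_three_times(long long id, void *data) {
    int *count = data;
    (void)id;
    (*count)++;
    return *count < 3 ? 100 : NOMORE;
}

/* Run the timer if it is due at now_ms. Returns 1 if the handler ran. */
int fire_if_due(TimerSlot *t, long long now_ms) {
    if (!t->active || now_ms < t->when_ms) return 0;
    int retval = t->proc(t->id, t->data);
    if (retval != NOMORE)
        t->when_ms = now_ms + retval; /* periodic: reschedule from now */
    else
        t->active = 0;                /* one-shot: stands in for deletion */
    return 1;
}
```

Rescheduling from the current time rather than from the old deadline means a handler that runs late pushes all of its later runs back by the same amount.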
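It is simply search, then process. After a handler runs, the event is rescheduled if the handler returned an interval, or deleted if it returned AE_NOMORE. The deletion is done in aeDeleteTimeEvent: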
int aeDeleteTimeEvent(aeEventLoop *eventLoop, long long id)
{
    aeTimeEvent *te, *prev = NULL;

    te = eventLoop->timeEventHead;
    while(te) {
        if (te->id == id) {
            if (prev == NULL)
                eventLoop->timeEventHead = te->next;
            else
                prev->next = te->next;
            if (te->finalizerProc)
                te->finalizerProc(eventLoop, te->clientData);
            zfree(te);
            return AE_OK;
        }
        prev = te;
        te = te->next;
    }
    return AE_ERR; /* NO event with the specified ID found */
}
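The unlinking step above tracks a prev pointer and special-cases the head. As a variation, and not what ae.c does, the same removal can be written with a pointer-to-pointer so that the head needs no special case. `Node` and `list_unlink` are hypothetical names; the node is returned to the caller instead of being freed.

```c
#include <stddef.h>

typedef struct Node {
    long long id;
    struct Node *next;
} Node;

/* Unlink the first node whose id matches and return it, or NULL if
 * no such node exists. pp always points at the link that leads to
 * the current node: either *head or some node's next field. */
Node *list_unlink(Node **head, long long id) {
    for (Node **pp = head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->id == id) {
            Node *found = *pp;
            *pp = found->next;  /* bypass the node, head or not */
            found->next = NULL;
            return found;
        }
    }
    return NULL;
}
```

Either form is O(n) in the length of the list, which is the cost the linked-list design pays for every delete by id.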
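That is as far as the analysis goes for now. A few questions remain for later reading of the source:
What exactly does beforesleep do?
How does the whole program flow, and how is the network connection handling organized around epoll?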
Redis source code: the event library