I have read many write-ups of the Redis event library online and have studied it several times myself, so I am recording my notes here. My understanding is limited, but there is always progress.
The event library wraps the platform's I/O multiplexing (epoll, on Linux) and implements timers. Timers are a cornerstone of server programs, and many problems are solved with them.
(1) Data structures + algorithms = programs. To get a feel for the Redis network library, start with its data structures.
1. The entire event loop is described by a single global structure, aeEventLoop:
/* State of an event based program */
typedef struct aeEventLoop {
    int maxfd;   /* highest file descriptor currently registered */
    int setsize; /* max number of file descriptors tracked */
    long long timeEventNextId;
    time_t lastTime;     /* Used to detect system clock skew */
    aeFileEvent *events; /* Registered events */
    aeFiredEvent *fired; /* Fired events */
    aeTimeEvent *timeEventHead;
    int stop;
    void *apidata; /* This is used for polling API specific data */
    aeBeforeSleepProc *beforesleep;
} aeEventLoop;
maxfd: the highest file descriptor currently registered.
setsize: the maximum number of file descriptors tracked. It is also the size of the events (registered) and fired (ready) arrays, so every fd-based operation needs a bounds check against it.
timeEventNextId: the ID given to the next time event. The time event list is not sorted, but IDs only ever increase, so the largest ID handed out so far is timeEventNextId - 1.
lastTime: used to detect system clock skew, i.e. the clock being set backwards.
events and fired: the registered file events and the fired (ready) events.
stop: the flag that ends the main loop.
apidata: data specific to the polling backend, so it differs between implementations. For epoll it is this structure:
typedef struct aeApiState {
    int epfd;
    struct epoll_event *events;
} aeApiState;
beforesleep: a callback run on every iteration of the main loop, before the loop waits for events. Redis does quite a lot of work in it.
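To make the role of setsize concrete, here is a minimal sketch of how such a loop might be set up. The names (demo_loop, demo_loop_create, demo_fd_in_range) are hypothetical and only a few fields are kept, so this is not the Redis code; like aeCreateEventLoop, though, it sizes both arrays by setsize, marks every slot unregistered and starts maxfd at -1.

```c
#include <stdlib.h>

#define DEMO_NONE 0 /* hypothetical "no events registered" mask */

typedef struct { int mask; } demo_file_event;
typedef struct { int fd; int mask; } demo_fired_event;

/* Hypothetical, trimmed-down mirror of aeEventLoop. */
typedef struct {
    int maxfd;               /* highest registered fd, -1 if none */
    int setsize;             /* capacity of events[] and fired[] */
    demo_file_event *events; /* indexed directly by fd */
    demo_fired_event *fired; /* filled by the poller on each iteration */
} demo_loop;

demo_loop *demo_loop_create(int setsize) {
    demo_loop *l = malloc(sizeof(*l));
    if (!l) return NULL;
    l->events = malloc(sizeof(demo_file_event) * setsize);
    l->fired = malloc(sizeof(demo_fired_event) * setsize);
    if (!l->events || !l->fired) { /* undo partial allocation */
        free(l->events);
        free(l->fired);
        free(l);
        return NULL;
    }
    l->setsize = setsize;
    l->maxfd = -1;
    for (int i = 0; i < setsize; i++) l->events[i].mask = DEMO_NONE;
    return l;
}

/* The bounds check every fd-based operation needs. */
int demo_fd_in_range(const demo_loop *l, int fd) {
    return fd >= 0 && fd < l->setsize;
}

void demo_loop_free(demo_loop *l) {
    if (!l) return;
    free(l->events);
    free(l->fired);
    free(l);
}
```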
2. File events
/* File event structure */
typedef struct aeFileEvent {
    int mask; /* one of AE_(READABLE|WRITABLE) */
    aeFileProc *rfileProc;
    aeFileProc *wfileProc;
    void *clientData;
} aeFileEvent;
A file event stores the event mask, the read and write callbacks, and the client data for its fd. You may wonder why the fd itself is not stored: the events array is indexed by fd, so a slot's position is its fd, and the fired event structure records which fd became ready.
3. Readiness events
/* A fired event */
typedef struct aeFiredEvent {
    int fd;
    int mask;
} aeFiredEvent;
Each fired event carries the fd that became ready, and that fd is the index into the events array, so the matching callbacks can be found directly. Here is that part of the code in advance:
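aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd]; /* look up the slot by fd */
int mask = eventLoop->fired[j].mask;
int fd = eventLoop->fired[j].fd;
fe->rfileProc(eventLoop,fd,fe->clientData,mask);
fe->wfileProc(eventLoop,fd,fe->clientData,mask);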
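The fd-as-index idea can be shown in isolation. This is a hypothetical, self-contained sketch of the dispatch step (demo_loop, demo_dispatch, demo_count_proc and the fixed 16-slot arrays are my own simplifications): each fired entry names an fd, the fd selects the slot, and a callback runs only if the slot still has the fired bit registered.

```c
#include <stddef.h>

#define DEMO_READABLE 1
#define DEMO_WRITABLE 2

struct demo_loop;
typedef void demo_file_proc(struct demo_loop *l, int fd, void *data, int mask);

typedef struct {
    int mask;              /* which events are registered */
    demo_file_proc *rproc; /* read callback */
    demo_file_proc *wproc; /* write callback */
    void *data;            /* no fd field: the array index is the fd */
} demo_file_event;

typedef struct { int fd; int mask; } demo_fired_event;

typedef struct demo_loop {
    demo_file_event events[16]; /* events[fd]; assumes fds below 16 */
    demo_fired_event fired[16]; /* fired[j].fd says which slot to use */
} demo_loop;

/* Dispatch the first n fired entries; returns how many were handled. */
int demo_dispatch(demo_loop *l, int n) {
    int processed = 0;
    for (int j = 0; j < n; j++) {
        int fd = l->fired[j].fd;
        int mask = l->fired[j].mask;
        demo_file_event *fe = &l->events[fd]; /* O(1) lookup by fd */
        int rfired = 0;
        /* Re-check fe->mask: an earlier callback may have unregistered it. */
        if (fe->mask & mask & DEMO_READABLE) {
            rfired = 1;
            fe->rproc(l, fd, fe->data, mask);
        }
        if (fe->mask & mask & DEMO_WRITABLE) {
            /* Don't call the same function twice for one report. */
            if (!rfired || fe->wproc != fe->rproc)
                fe->wproc(l, fd, fe->data, mask);
        }
        processed++;
    }
    return processed;
}

/* Example callback: counts how often it is invoked. */
void demo_count_proc(struct demo_loop *l, int fd, void *data, int mask) {
    (void)l; (void)fd; (void)mask;
    ++*(int *)data;
}
```

When the same function is registered for both reading and writing, the rfired check means it runs once per readiness report, not twice.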
4. Time event
I don't think Redis's implementation here is ideal. As the comments in the source themselves acknowledge, keeping time events in an unsorted linked list makes finding the nearest one O(N); a min-heap would be a better fit. Let's look at it briefly.
/* Time event structure */
typedef struct aeTimeEvent {
    long long id; /* time event identifier. */
    long when_sec; /* seconds */
    long when_ms; /* milliseconds */
    aeTimeProc *timeProc;
    aeEventFinalizerProc *finalizerProc;
    void *clientData;
    struct aeTimeEvent *next;
} aeTimeEvent;
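As a sketch of the min-heap alternative (my own illustration, not something this version of Redis does), timers can be kept in an array-backed binary heap ordered by (when_sec, when_ms). The nearest timer is then always at the root, so finding it is O(1), and inserting or removing the nearest timer is O(log N). The type and function names are hypothetical, and the fixed capacity is a simplification.

```c
#include <stddef.h>

/* Hypothetical timer entry with the same time fields as aeTimeEvent. */
typedef struct {
    long long id;
    long when_sec;
    long when_ms;
} demo_timer;

#define DEMO_HEAP_CAP 64

typedef struct {
    demo_timer items[DEMO_HEAP_CAP];
    size_t len;
} demo_timer_heap;

/* Does a fire before b? */
static int demo_earlier(const demo_timer *a, const demo_timer *b) {
    return a->when_sec < b->when_sec ||
           (a->when_sec == b->when_sec && a->when_ms < b->when_ms);
}

static void demo_swap(demo_timer *a, demo_timer *b) {
    demo_timer t = *a; *a = *b; *b = t;
}

/* O(log n) insert: append, then sift up. Returns 0, or -1 if full. */
int demo_heap_push(demo_timer_heap *h, demo_timer t) {
    if (h->len == DEMO_HEAP_CAP) return -1;
    size_t i = h->len++;
    h->items[i] = t;
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (!demo_earlier(&h->items[i], &h->items[parent])) break;
        demo_swap(&h->items[i], &h->items[parent]);
        i = parent;
    }
    return 0;
}

/* O(1): the nearest timer is always at the root. */
const demo_timer *demo_heap_peek(const demo_timer_heap *h) {
    return h->len ? &h->items[0] : NULL;
}

/* O(log n): remove the nearest timer (move last to root, sift down). */
int demo_heap_pop(demo_timer_heap *h, demo_timer *out) {
    if (h->len == 0) return -1;
    *out = h->items[0];
    h->items[0] = h->items[--h->len];
    size_t i = 0;
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < h->len && demo_earlier(&h->items[l], &h->items[m])) m = l;
        if (r < h->len && demo_earlier(&h->items[r], &h->items[m])) m = r;
        if (m == i) break;
        demo_swap(&h->items[i], &h->items[m]);
        i = m;
    }
    return 0;
}
```

Deleting an arbitrary timer by id would still need a search, or an extra index from id to heap position; that is the usual price of a heap.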
With the data structures covered, let's look at the implementation, starting with the main loop:
void aeMain(aeEventLoop *eventLoop) {
    eventLoop->stop = 0;
    while (!eventLoop->stop) {
        if (eventLoop->beforesleep != NULL)
            eventLoop->beforesleep(eventLoop);
        aeProcessEvents(eventLoop, AE_ALL_EVENTS);
    }
}
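The shape of aeMain can be reproduced in a few lines. In this hypothetical sketch, demo_process_events stands in for aeProcessEvents and simply sets stop after a fixed number of iterations (in Redis, aeStop() sets the same flag). The point is that the beforesleep hook runs once per iteration, before the loop would block.

```c
#include <stddef.h>

struct demo_loop;
typedef void demo_before_sleep_proc(struct demo_loop *l);

typedef struct demo_loop {
    int stop;
    demo_before_sleep_proc *beforesleep;
    int iterations; /* hypothetical counter standing in for real work */
    int stop_after; /* hypothetical: stop once this many iterations ran */
} demo_loop;

/* Stand-in for aeProcessEvents: here it only counts iterations. */
static void demo_process_events(demo_loop *l) {
    l->iterations++;
    if (l->iterations >= l->stop_after) l->stop = 1; /* like aeStop() */
}

void demo_main(demo_loop *l) {
    l->stop = 0;
    while (!l->stop) {
        if (l->beforesleep != NULL) l->beforesleep(l); /* before every poll */
        demo_process_events(l);
    }
}

/* Example hook: counts how many times it was called. */
static int demo_hook_calls = 0;
void demo_count_hook(demo_loop *l) {
    (void)l;
    demo_hook_calls++;
}
```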
The real work happens in aeProcessEvents:
int aeProcessEvents(aeEventLoop *eventLoop, int flags)
{
    int processed = 0, numevents;
    /* Nothing to do? return ASAP */
    if (!(flags & AE_TIME_EVENTS) && !(flags & AE_FILE_EVENTS)) return 0;
    /* Note that we want call select() even if there are no
     * file events to process as long as we want to process time
     * events, in order to sleep until the next time event is ready
     * to fire. */
    if (eventLoop->maxfd != -1 ||
        ((flags & AE_TIME_EVENTS) && !(flags & AE_DONT_WAIT))) {
        int j;
        aeTimeEvent *shortest = NULL;
        struct timeval tv, *tvp;
        if (flags & AE_TIME_EVENTS && !(flags & AE_DONT_WAIT))
            shortest = aeSearchNearestTimer(eventLoop);
        if (shortest) {
            long now_sec, now_ms;
            /* Calculate the time missing for the nearest
             * timer to fire. */
            aeGetTime(&now_sec, &now_ms);
            tvp = &tv;
            tvp->tv_sec = shortest->when_sec - now_sec;
            if (shortest->when_ms < now_ms) {
                tvp->tv_usec = ((shortest->when_ms+1000) - now_ms)*1000;
                tvp->tv_sec --;
            } else {
                tvp->tv_usec = (shortest->when_ms - now_ms)*1000;
            }
            if (tvp->tv_sec < 0) tvp->tv_sec = 0;
            if (tvp->tv_usec < 0) tvp->tv_usec = 0;
        } else {
            /* If we have to check for events but need to return
             * ASAP because of AE_DONT_WAIT we need to set the timeout
             * to zero */
            if (flags & AE_DONT_WAIT) {
                tv.tv_sec = tv.tv_usec = 0;
                tvp = &tv;
            } else {
                /* Otherwise we can block */
                tvp = NULL; /* wait forever */
            }
        }
        numevents = aeApiPoll(eventLoop, tvp);
        for (j = 0; j < numevents; j++) {
            aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd];
            int mask = eventLoop->fired[j].mask;
            int fd = eventLoop->fired[j].fd;
            int rfired = 0;
            /* note the fe->mask & mask & ... code: maybe an already processed
             * event removed an element that fired and we still didn't
             * processed, so we check if the event is still valid. */
            if (fe->mask & mask & AE_READABLE) {
                rfired = 1;
                fe->rfileProc(eventLoop,fd,fe->clientData,mask);
            }
            if (fe->mask & mask & AE_WRITABLE) {
                if (!rfired || fe->wfileProc != fe->rfileProc)
                    fe->wfileProc(eventLoop,fd,fe->clientData,mask);
            }
            processed++;
        }
    }
    /* Check time events */
    if (flags & AE_TIME_EVENTS)
        processed += processTimeEvents(eventLoop);
    return processed; /* return the number of processed file/time events */
}
The code is a little long, but it boils down to three steps:
1. Work out how long epoll_wait should wait, based on the flags:
If time events are being processed, find the timer that fires soonest and wait only until then. This policy is clever: the loop sleeps as long as it can without missing a timer.
If AE_DONT_WAIT is set, use a zero timeout so the call returns immediately.
Otherwise, block indefinitely until an event fires.
2. Wait for events, then dispatch each one to its callback.
3. Process time events.
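Step 1 can be sketched as two small functions. This is a simplified, hypothetical version: instead of borrowing a second as the code above does, demo_ms_until computes the remaining time as a single millisecond value and clamps it at zero. As far as I can tell, that also avoids a corner case in the borrow-based version: if a timer is already overdue (say it was due at 10 s 100 ms and it is now 10 s 200 ms), tv_sec goes negative and is clamped to 0, but tv_usec stays positive (900 ms here), so the loop can sleep past a timer that should already have fired. demo_poll_timeout returns the value epoll_wait expects: 0 for AE_DONT_WAIT, the timer delay if there is one, and -1 to block.

```c
#include <limits.h>

/* Milliseconds until a timer due at (when_sec, when_ms) fires, given the
 * current time (now_sec, now_ms). Overdue timers give 0. */
long long demo_ms_until(long when_sec, long when_ms, long now_sec, long now_ms) {
    long long ms = (long long)(when_sec - now_sec) * 1000 + (when_ms - now_ms);
    return ms > 0 ? ms : 0; /* overdue: don't sleep at all */
}

/* Pick the epoll_wait timeout in milliseconds from the three cases. */
int demo_poll_timeout(int dont_wait, int have_timer, long long ms_until_timer) {
    if (dont_wait) return 0;  /* AE_DONT_WAIT: poll and return at once */
    if (have_timer)           /* sleep until the nearest timer is due */
        return ms_until_timer > INT_MAX ? INT_MAX : (int)ms_until_timer;
    return -1;                /* no timer: block until an fd is ready */
}
```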
aeApiPoll wraps epoll_wait and translates its results into fired events:
static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    aeApiState *state = eventLoop->apidata;
    int retval, numevents = 0;
    retval = epoll_wait(state->epfd,state->events,eventLoop->setsize,
            tvp ? (tvp->tv_sec*1000 + tvp->tv_usec/1000) : -1);
    if (retval > 0) {
        int j;
        numevents = retval;
        for (j = 0; j < numevents; j++) {
            int mask = 0;
            struct epoll_event *e = state->events+j;
            if (e->events & EPOLLIN) mask |= AE_READABLE;
            if (e->events & EPOLLOUT) mask |= AE_WRITABLE;
            if (e->events & EPOLLERR) mask |= AE_WRITABLE;
            if (e->events & EPOLLHUP) mask |= AE_WRITABLE;
            eventLoop->fired[j].fd = e->data.fd;
            eventLoop->fired[j].mask = mask;
        }
    }
    return numevents;
}
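The translation from epoll's bits to the loop's own mask is easy to isolate and test. In this hypothetical, Linux-only sketch (demo_epoll_to_mask, demo_fill_fired and the DEMO_ constants are my names), the mapping follows aeApiPoll: EPOLLIN becomes readable, while EPOLLOUT, EPOLLERR and EPOLLHUP all become writable.

```c
#include <stdint.h>
#include <sys/epoll.h>

#define DEMO_READABLE 1
#define DEMO_WRITABLE 2

typedef struct { int fd; int mask; } demo_fired_event;

/* Same mapping as aeApiPoll. */
int demo_epoll_to_mask(uint32_t events) {
    int mask = 0;
    if (events & EPOLLIN)  mask |= DEMO_READABLE;
    if (events & EPOLLOUT) mask |= DEMO_WRITABLE;
    if (events & EPOLLERR) mask |= DEMO_WRITABLE;
    if (events & EPOLLHUP) mask |= DEMO_WRITABLE;
    return mask;
}

/* Copy n ready entries returned by epoll_wait into the fired array. */
int demo_fill_fired(const struct epoll_event *evs, int n, demo_fired_event *fired) {
    for (int j = 0; j < n; j++) {
        fired[j].fd = evs[j].data.fd; /* registered with data.fd = fd */
        fired[j].mask = demo_epoll_to_mask(evs[j].events);
    }
    return n;
}
```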
Since epoll_wait is wrapped, epoll_create and epoll_ctl are wrapped too. Here is the wrapper that creates the epoll instance:
static int aeApiCreate(aeEventLoop *eventLoop) {
    aeApiState *state = zmalloc(sizeof(aeApiState));
    if (!state) return -1;
    state->events = zmalloc(sizeof(struct epoll_event)*eventLoop->setsize);
    if (!state->events) {
        zfree(state);
        return -1;
    }
    state->epfd = epoll_create(1024); /* 1024 is just a hint for the kernel */
    if (state->epfd == -1) {
        zfree(state->events);
        zfree(state);
        return -1;
    }
    eventLoop->apidata = state;
    return 0;
}
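Here is a standalone, Linux-only sketch of the same create-with-cleanup pattern. demo_api_state and its functions are hypothetical, and plain malloc/free replace zmalloc/zfree. One deliberate change: it calls epoll_create1(0) rather than epoll_create(1024). Since Linux 2.6.8 the size argument of epoll_create is ignored (it only has to be greater than zero), which is why the original comment calls 1024 a hint.

```c
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical mirror of aeApiState. */
typedef struct {
    int epfd;
    struct epoll_event *events; /* buffer epoll_wait fills, setsize entries */
} demo_api_state;

/* Allocate the buffer, then the epoll instance, undoing earlier steps
 * if a later one fails. */
demo_api_state *demo_api_create(int setsize) {
    demo_api_state *st = malloc(sizeof(*st));
    if (!st) return NULL;
    st->events = malloc(sizeof(struct epoll_event) * setsize);
    if (!st->events) { free(st); return NULL; }
    st->epfd = epoll_create1(0);
    if (st->epfd == -1) { free(st->events); free(st); return NULL; }
    return st;
}

void demo_api_free(demo_api_state *st) {
    if (!st) return;
    close(st->epfd);
    free(st->events);
    free(st);
}
```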
The epoll_ctl wrappers sit underneath the interfaces the library exposes for registering and removing file events (time events do not touch epoll).
For example, creating a file event:
int aeCreateFileEvent(aeEventLoop *eventLoop, int fd, int mask,
        aeFileProc *proc, void *clientData)
{
    if (fd >= eventLoop->setsize) {
        errno = ERANGE;
        return AE_ERR;
    }
    aeFileEvent *fe = &eventLoop->events[fd];
    if (aeApiAddEvent(eventLoop, fd, mask) == -1)
        return AE_ERR;
    fe->mask |= mask;
    if (mask & AE_READABLE) fe->rfileProc = proc;
    if (mask & AE_WRITABLE) fe->wfileProc = proc;
    fe->clientData = clientData;
    if (fd > eventLoop->maxfd)
        eventLoop->maxfd = fd;
    return AE_OK;
}
First, check the fd against setsize. Then take fe from the events array (indexed by fd, as mentioned above) and call aeApiAddEvent, which calls epoll_ctl. Only after that are fe's mask, callbacks and client data set according to mask.
Finally, tidy up: maxfd may need to be raised.
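The order of those steps matters, so here is a hypothetical, epoll-free sketch of the same sequence. demo_api_add is a stub standing in for aeApiAddEvent, the fixed DEMO_SETSIZE is a simplification, and I also reject negative fds, which the original does not check. The key detail is that the backend is called before fe->mask is updated, because aeApiAddEvent reads the old mask to decide between adding and modifying.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_NONE 0
#define DEMO_READABLE 1
#define DEMO_WRITABLE 2
#define DEMO_OK 0
#define DEMO_ERR -1
#define DEMO_SETSIZE 16

typedef void demo_file_proc(int fd, void *data, int mask);

typedef struct {
    int mask;
    demo_file_proc *rproc, *wproc;
    void *data;
} demo_file_event;

typedef struct {
    int maxfd;
    demo_file_event events[DEMO_SETSIZE];
} demo_loop;

/* Hypothetical stub for aeApiAddEvent; a real one would call epoll_ctl. */
static int demo_api_add(demo_loop *l, int fd, int mask) {
    (void)l; (void)fd; (void)mask;
    return 0;
}

int demo_create_file_event(demo_loop *l, int fd, int mask,
                           demo_file_proc *proc, void *data) {
    if (fd < 0 || fd >= DEMO_SETSIZE) { errno = ERANGE; return DEMO_ERR; }
    demo_file_event *fe = &l->events[fd];
    /* Backend first: it must still see the *old* fe->mask. */
    if (demo_api_add(l, fd, mask) == -1) return DEMO_ERR;
    fe->mask |= mask;
    if (mask & DEMO_READABLE) fe->rproc = proc;
    if (mask & DEMO_WRITABLE) fe->wproc = proc;
    fe->data = data;
    if (fd > l->maxfd) l->maxfd = fd; /* the tidy-up step */
    return DEMO_OK;
}
```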
static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
    aeApiState *state = eventLoop->apidata;
    struct epoll_event ee;
    /* If the fd was already monitored for some event, we need a MOD
     * operation. Otherwise we need an ADD operation. */
    int op = eventLoop->events[fd].mask == AE_NONE ?
            EPOLL_CTL_ADD : EPOLL_CTL_MOD;
    ee.events = 0;
    mask |= eventLoop->events[fd].mask; /* Merge old events */
    if (mask & AE_READABLE) ee.events |= EPOLLIN;
    if (mask & AE_WRITABLE) ee.events |= EPOLLOUT;
    ee.data.u64 = 0; /* avoid valgrind warning */
    ee.data.fd = fd;
    if (epoll_ctl(state->epfd,op,fd,&ee) == -1) return -1;
    return 0;
}
aeApiAddEvent first checks whether the fd is already registered. If it is not, it uses EPOLL_CTL_ADD; if it is, it merges the old events with the new ones and uses EPOLL_CTL_MOD, because MOD replaces the whole event set. Note the layering: the ae layer calls the backend interface directly and maintains only its own data structures, while everything epoll-specific stays in ae_epoll. The logic is tight, and it shows the author's skill.
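The ADD-versus-MOD decision can be checked against a real epoll instance (Linux only). This hypothetical demo_api_add takes the previously registered mask as a parameter instead of reading it from the loop, but otherwise follows aeApiAddEvent. Choosing the wrong operation fails: EPOLL_CTL_ADD on an fd that is already registered returns EEXIST, and EPOLL_CTL_MOD on one that is not returns ENOENT.

```c
#include <string.h>
#include <sys/epoll.h>

#define DEMO_NONE 0
#define DEMO_READABLE 1
#define DEMO_WRITABLE 2

/* Register `mask` on fd, given the mask already registered (old_mask).
 * ADD the first time; otherwise MOD with old and new events merged,
 * since MOD replaces the event set. Returns epoll_ctl's result. */
int demo_api_add(int epfd, int fd, int old_mask, int mask) {
    struct epoll_event ee;
    int op = old_mask == DEMO_NONE ? EPOLL_CTL_ADD : EPOLL_CTL_MOD;
    memset(&ee, 0, sizeof(ee)); /* zero the data union, as data.u64 = 0 does */
    mask |= old_mask;
    if (mask & DEMO_READABLE) ee.events |= EPOLLIN;
    if (mask & DEMO_WRITABLE) ee.events |= EPOLLOUT;
    ee.data.fd = fd;
    return epoll_ctl(epfd, op, fd, &ee);
}
```

Any pollable fd works as a target; the check below registers one epoll instance inside another only to avoid extra setup.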
Deleting a file event works the same way:
void aeDeleteFileEvent(aeEventLoop *eventLoop, int fd, int mask)
{
    if (fd >= eventLoop->setsize) return;
    aeFileEvent *fe = &eventLoop->events[fd];
    if (fe->mask == AE_NONE) return;
    aeApiDelEvent(eventLoop, fd, mask);
    fe->mask = fe->mask & (~mask);
    if (fd == eventLoop->maxfd && fe->mask == AE_NONE) {
        /* Update the max fd */
        int j;
        for (j = eventLoop->maxfd-1; j >= 0; j--)
            if (eventLoop->events[j].mask != AE_NONE) break;
        eventLoop->maxfd = j;
    }
}
First, check the fd against setsize and whether it is registered at all; if not, return immediately. Otherwise, remove the requested events from the epoll registration and clear the corresponding bits in fe->mask. Finally, if this fd was the highest one and no events remain on it, recompute maxfd.
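The maxfd bookkeeping on deletion can be sketched on its own. demo_update_maxfd and demo_delete_file_event are hypothetical and leave out the epoll call; they only show when maxfd is recomputed (the highest fd lost its last event) and how (scan downwards to the next registered slot, ending at -1 if none is left).

```c
#define DEMO_NONE 0

typedef struct { int mask; } demo_file_event;

/* Recompute maxfd after bits were cleared on `fd`. */
int demo_update_maxfd(const demo_file_event *events, int maxfd, int fd) {
    if (fd != maxfd || events[fd].mask != DEMO_NONE) return maxfd;
    int j;
    for (j = maxfd - 1; j >= 0; j--)
        if (events[j].mask != DEMO_NONE) break;
    return j; /* -1 when no fd is registered any more */
}

/* Hypothetical helper: clear `mask` on fd and return the new maxfd. */
int demo_delete_file_event(demo_file_event *events, int setsize,
                           int maxfd, int fd, int mask) {
    if (fd < 0 || fd >= setsize) return maxfd;
    if (events[fd].mask == DEMO_NONE) return maxfd; /* not registered */
    /* A real loop would call aeApiDelEvent here, before touching the mask. */
    events[fd].mask &= ~mask;
    return demo_update_maxfd(events, maxfd, fd);
}
```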
The implementation of aeApiDelEvent:
static void aeApiDelEvent(aeEventLoop *eventLoop, int fd, int delmask) {
    aeApiState *state = eventLoop->apidata;
    struct epoll_event ee;
    int mask = eventLoop->events[fd].mask & (~delmask);
    ee.events = 0;
    if (mask & AE_READABLE) ee.events |= EPOLLIN;
    if (mask & AE_WRITABLE) ee.events |= EPOLLOUT;
    ee.data.u64 = 0; /* avoid valgrind warning */
    ee.data.fd = fd;
    if (mask != AE_NONE) {
        epoll_ctl(state->epfd,EPOLL_CTL_MOD,fd,&ee);
    } else {
        /* Note, Kernel < 2.6.9 requires a non null event pointer even for
         * EPOLL_CTL_DEL. */
        epoll_ctl(state->epfd,EPOLL_CTL_DEL,fd,&ee);
    }
}
So deletion either re-registers the fd with the events that remain (EPOLL_CTL_MOD) or, if none remain, removes it from epoll entirely (EPOLL_CTL_DEL).
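Here is a hypothetical, Linux-only sketch of the same MOD-or-DEL choice against a real epoll instance. Unlike aeApiDelEvent, demo_api_del takes the current mask as a parameter and reports epoll_ctl failures by returning -1 instead of ignoring them.

```c
#include <string.h>
#include <sys/epoll.h>

#define DEMO_NONE 0
#define DEMO_READABLE 1
#define DEMO_WRITABLE 2

/* Remove `delmask` from an fd registered with `cur_mask`: MOD if some
 * events remain, DEL if none do. Returns the remaining mask, or -1 if
 * epoll_ctl failed. */
int demo_api_del(int epfd, int fd, int cur_mask, int delmask) {
    struct epoll_event ee;
    int mask = cur_mask & ~delmask;
    /* Pass a real struct even for DEL: kernels before 2.6.9 need it. */
    memset(&ee, 0, sizeof(ee));
    if (mask & DEMO_READABLE) ee.events |= EPOLLIN;
    if (mask & DEMO_WRITABLE) ee.events |= EPOLLOUT;
    ee.data.fd = fd;
    int op = mask != DEMO_NONE ? EPOLL_CTL_MOD : EPOLL_CTL_DEL;
    if (epoll_ctl(epfd, op, fd, &ee) == -1) return -1;
    return mask;
}
```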
Processing time events:
static int processTimeEvents(aeEventLoop *eventLoop) {
    int processed = 0;
    aeTimeEvent *te;
    long long maxId;
    time_t now = time(NULL);
    if (now < eventLoop->lastTime) {
        te = eventLoop->timeEventHead;
        while(te) {
            te->when_sec = 0;
            te = te->next;
        }
    }
    eventLoop->lastTime = now;
    te = eventLoop->timeEventHead;
    maxId = eventLoop->timeEventNextId-1;
    while(te) {
        long now_sec, now_ms;
        long long id;
        if (te->id > maxId) {
            te = te->next;
            continue;
        }
        aeGetTime(&now_sec, &now_ms);
        if (now_sec > te->when_sec ||
            (now_sec == te->when_sec && now_ms >= te->when_ms))
        {
            int retval;
            id = te->id;
            retval = te->timeProc(eventLoop, id, te->clientData);
            processed++;
            if (retval != AE_NOMORE) {
                aeAddMillisecondsToNow(retval,&te->when_sec,&te->when_ms);
            } else {
                aeDeleteTimeEvent(eventLoop, id);
            }
            te = eventLoop->timeEventHead;
        } else {
            te = te->next;
        }
    }
    return processed;
}
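It is essentially a linear scan that runs every timer that is due. A few details are worth noting: if the system clock has moved backwards (now < lastTime), every timer's when_sec is set to 0 so they all fire as soon as possible; timers created during this pass (id > maxId) are skipped; and after each callback the scan restarts from the head, because the callback may have changed the list. A callback's return value is the number of milliseconds until it should run again; if it returns AE_NOMORE, its node is deleted, as aeDeleteTimeEvent shows: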
int aeDeleteTimeEvent(aeEventLoop *eventLoop, long long id)
{
    aeTimeEvent *te, *prev = NULL;
    te = eventLoop->timeEventHead;
    while(te) {
        if (te->id == id) {
            if (prev == NULL)
                eventLoop->timeEventHead = te->next;
            else
                prev->next = te->next;
            if (te->finalizerProc)
                te->finalizerProc(eventLoop, te->clientData);
            zfree(te);
            return AE_OK;
        }
        prev = te;
        te = te->next;
    }
    return AE_ERR; /* NO event with the specified ID found */
}
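To see the whole timer life cycle in one place, here is a hypothetical sketch that combines the scan and the deletion. To keep it deterministic it uses a fake clock (now_ms) and a single millisecond timestamp per timer instead of when_sec/when_ms, and it leaves out the clock-skew check and the finalizer. Otherwise it follows the logic above: new timers are prepended to an unsorted list, timers added during a pass are skipped via max_id, a callback's return value is the delay before it runs again, DEMO_NOMORE removes it, and the scan restarts from the head after each callback. All names are my own.

```c
#include <stdlib.h>

#define DEMO_NOMORE -1

struct demo_loop;
typedef int demo_time_proc(struct demo_loop *l, long long id, void *data);

typedef struct demo_timer {
    long long id;
    long long when_ms;  /* one ms timestamp instead of sec + ms */
    demo_time_proc *proc;
    void *data;
    struct demo_timer *next;
} demo_timer;

typedef struct demo_loop {
    demo_timer *head;
    long long next_id;
    long long now_ms;   /* fake clock so the sketch is deterministic */
} demo_loop;

long long demo_add_timer(demo_loop *l, long long delay_ms,
                         demo_time_proc *proc, void *data) {
    demo_timer *te = malloc(sizeof(*te));
    if (!te) return -1;
    te->id = l->next_id++;
    te->when_ms = l->now_ms + delay_ms;
    te->proc = proc;
    te->data = data;
    te->next = l->head; /* prepend: the list stays unsorted */
    l->head = te;
    return te->id;
}

int demo_delete_timer(demo_loop *l, long long id) {
    demo_timer *te = l->head, *prev = NULL;
    while (te) {
        if (te->id == id) {
            if (prev == NULL) l->head = te->next;
            else prev->next = te->next;
            free(te);
            return 0;
        }
        prev = te;
        te = te->next;
    }
    return -1; /* no timer with that id */
}

int demo_process_timers(demo_loop *l) {
    int processed = 0;
    long long max_id = l->next_id - 1; /* skip timers added in this pass */
    demo_timer *te = l->head;
    while (te) {
        if (te->id > max_id) { te = te->next; continue; }
        if (l->now_ms >= te->when_ms) {
            long long id = te->id;
            int ret = te->proc(l, id, te->data);
            processed++;
            /* With a frozen clock, a callback returning 0 would run forever. */
            if (ret != DEMO_NOMORE) te->when_ms = l->now_ms + ret;
            else demo_delete_timer(l, id); /* frees te; not used again */
            te = l->head; /* the callback may have changed the list */
        } else {
            te = te->next;
        }
    }
    return processed;
}

/* Example callbacks: count their calls through `data`. */
int demo_once(struct demo_loop *l, long long id, void *data) {
    (void)l; (void)id;
    ++*(int *)data;
    return DEMO_NOMORE;
}

int demo_every_100ms(struct demo_loop *l, long long id, void *data) {
    (void)l; (void)id;
    ++*(int *)data;
    return 100;
}
```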
That is as far as my analysis goes for now. I will go back to the source code with a few questions:
What does beforesleep do?
How is the flow of a real program, including network connections, organized around epoll?
Redis source code: event library