For a server program, network I/O handling is the key. Unlike memcached, which uses libevent, Redis implements its own I/O event framework, which is simple and compact. The underlying implementation can be select, epoll, kqueue, and so on.

As an I/O event framework, it has to abstract what the various I/O models have in common. The whole process breaks down into:
1) Initialization
2) Adding and deleting events
3) Waiting for events to occur

The code is analyzed below in the same order.

(1) Initialization
Recall that during Redis initialization, the initServer function calls aeCreateEventLoop to create the event loop object. Here is the aeEventLoop structure, which stores the properties of the event loop.
    typedef struct aeEventLoop {
        int maxfd;   /* highest file descriptor currently registered */
        int setsize; /* max number of file descriptors tracked */
        long long timeEventNextId;
        // <MM>
        // Time at which timer events were last processed
        // </MM>
        time_t lastTime;     /* Used to detect system clock skew */
        aeFileEvent *events; /* Registered events */
        aeFiredEvent *fired; /* Fired events */
        // <MM>
        // All timer events, organized as a linked list
        // </MM>
        aeTimeEvent *timeEventHead;
        // <MM>
        // Whether to stop the event loop
        // </MM>
        int stop;
        void *apidata; /* This is used for polling API specific data */
        // <MM>
        // The event loop calls beforesleep on every iteration
        // </MM>
        aeBeforeSleepProc *beforesleep;
    } aeEventLoop;
setsize: the size of the set of file descriptors that the event loop monitors. This value is related to maxclients in the configuration file.
events: holds all registered read/write events; it is an array of size setsize. The kernel guarantees that a new connection's fd is the lowest-numbered descriptor currently available, so when at most setsize descriptors are monitored, the largest fd is setsize-1. The advantage of this layout is that the fd can be used directly as the array index, so the event for an fd can be found quickly once it is triggered.
fired: stores the triggered read/write events. It is also an array of size setsize.
timeEventHead: Redis organizes timer events into a linked list; this field points to the list head.
apidata: stores the data specific to the epoll, select, or other implementation.
beforesleep: the event loop calls beforesleep before each iteration to perform some asynchronous processing.
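The fd-as-index layout described for the events array can be shown in isolation. The following is a minimal sketch, not Redis code: the names fdTable, slotEvent, fdTableAdd, fdTableLookup and the EV_* constants are all made up for illustration, and plain malloc stands in for zmalloc.

```c
#include <stdlib.h>

#define EV_NONE 0      /* hypothetical stand-in for AE_NONE */
#define EV_READABLE 1  /* hypothetical stand-in for AE_READABLE */

typedef struct {
    int mask;          /* EV_NONE means "slot not in use" */
    void *clientData;
} slotEvent;

typedef struct {
    int setsize;       /* number of slots; valid fds are 0..setsize-1 */
    slotEvent *events; /* indexed directly by fd */
} fdTable;

/* Allocate a table with one slot per possible fd; returns NULL on failure. */
fdTable *fdTableCreate(int setsize) {
    fdTable *t = malloc(sizeof(*t));
    if (t == NULL) return NULL;
    t->events = malloc(sizeof(slotEvent) * setsize);
    if (t->events == NULL) { free(t); return NULL; }
    t->setsize = setsize;
    for (int i = 0; i < setsize; i++) {
        t->events[i].mask = EV_NONE;
        t->events[i].clientData = NULL;
    }
    return t;
}

/* Register interest in fd; fails if fd does not fit in the table. */
int fdTableAdd(fdTable *t, int fd, int mask, void *clientData) {
    if (fd < 0 || fd >= t->setsize) return -1;
    t->events[fd].mask |= mask;
    t->events[fd].clientData = clientData;
    return 0;
}

/* O(1) lookup: the fd itself is the array index. */
slotEvent *fdTableLookup(fdTable *t, int fd) {
    if (fd < 0 || fd >= t->setsize) return NULL;
    if (t->events[fd].mask == EV_NONE) return NULL;
    return &t->events[fd];
}

void fdTableFree(fdTable *t) {
    free(t->events);
    free(t);
}
```

The trade-off is memory for speed: the table always occupies setsize slots, but no hashing or searching is needed when an event fires.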
The function that abstracts I/O model initialization is aeApiCreate. The aeCreateEventLoop function creates and initializes the global event loop structure, and calls aeApiCreate to initialize the data structures that the concrete implementation relies on.
    aeEventLoop *aeCreateEventLoop(int setsize) {
        aeEventLoop *eventLoop;
        int i;

        // <MM>
        // setsize specifies how many fds the event loop monitors.
        // Because the kernel guarantees that a newly created fd is the
        // lowest number available, an array of setsize elements is
        // created directly to store the corresponding events.
        // </MM>
        if ((eventLoop = zmalloc(sizeof(*eventLoop))) == NULL) goto err;
        eventLoop->events = zmalloc(sizeof(aeFileEvent)*setsize);
        eventLoop->fired = zmalloc(sizeof(aeFiredEvent)*setsize);
        if (eventLoop->events == NULL || eventLoop->fired == NULL) goto err;
        eventLoop->setsize = setsize;
        eventLoop->lastTime = time(NULL);
        eventLoop->timeEventHead = NULL;
        eventLoop->timeEventNextId = 0;
        eventLoop->stop = 0;
        eventLoop->maxfd = -1;
        eventLoop->beforesleep = NULL;
        if (aeApiCreate(eventLoop) == -1) goto err;
        /* Events with mask == AE_NONE are not set. So let's initialize the
         * vector with it. */
        for (i = 0; i < setsize; i++)
            eventLoop->events[i].mask = AE_NONE;
        return eventLoop;

    err:
        if (eventLoop) {
            zfree(eventLoop->events);
            zfree(eventLoop->fired);
            zfree(eventLoop);
        }
        return NULL;
    }
For epoll, aeApiCreate mainly creates the epoll fd and the array of epoll_event structures, which are held in:
    typedef struct aeApiState {
        int epfd;
        struct epoll_event *events;
    } aeApiState;
Here the events array has setsize elements, the same size as the arrays in the event loop, but it is not indexed by fd: it is the buffer that epoll_wait fills with triggered events. aeApiCreate initializes these fields and stores the aeApiState structure in eventLoop->apidata.
    static int aeApiCreate(aeEventLoop *eventLoop) {
        aeApiState *state = zmalloc(sizeof(aeApiState));

        if (!state) return -1;
        state->events = zmalloc(sizeof(struct epoll_event)*eventLoop->setsize);
        if (!state->events) {
            zfree(state);
            return -1;
        }
        state->epfd = epoll_create(1024); /* 1024 is just a hint for the kernel */
        if (state->epfd == -1) {
            zfree(state->events);
            zfree(state);
            return -1;
        }
        eventLoop->apidata = state;
        return 0;
    }
(2) Adding and deleting events
Redis supports two types of events: network I/O events and timer events. Adding and deleting timer events is relatively simple and mainly comes down to maintaining the timer event list. First, the structure that represents a timer event:
    /* Time event structure */
    typedef struct aeTimeEvent {
        long long id; /* time event identifier. */
        long when_sec; /* seconds */
        long when_ms; /* milliseconds */
        aeTimeProc *timeProc;
        aeEventFinalizerProc *finalizerProc;
        void *clientData;
        struct aeTimeEvent *next;
    } aeTimeEvent;
when_sec and when_ms: the timestamp at which the timer is due. After an event loop iteration returns, the handler is called back if the current timestamp has reached or passed this value.
timeProc: the event handler function.
finalizerProc: the cleanup function, called when the timer is removed.
clientData: the argument passed to the event handler function.
next: timer events are organized into a linked list, and next points to the next event.
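Because the timer events form a singly linked list, adding one is a head insertion and deleting one is an unlink by id followed by the cleanup callback. The following is a minimal sketch of those two operations, not the Redis implementation: timerNode, timerList, timerAdd, timerDelete and countFinalizer are hypothetical names, malloc stands in for zmalloc, and the trigger time is passed in directly rather than computed from the current time.

```c
#include <stdlib.h>

typedef struct timerNode {
    long long id;
    long when_sec, when_ms;
    void (*finalizer)(void *clientData); /* may be NULL */
    void *clientData;
    struct timerNode *next;
} timerNode;

typedef struct {
    timerNode *head;
    long long nextId;
} timerList;

/* Insert a new timer at the head of the list; returns its id, or -1. */
long long timerAdd(timerList *l, long when_sec, long when_ms,
                   void (*finalizer)(void *), void *clientData) {
    timerNode *n = malloc(sizeof(*n));
    if (n == NULL) return -1;
    n->id = l->nextId++;
    n->when_sec = when_sec;
    n->when_ms = when_ms;
    n->finalizer = finalizer;
    n->clientData = clientData;
    n->next = l->head;
    l->head = n;
    return n->id;
}

/* Unlink the timer with the given id, run its finalizer, and free it.
 * Returns 0 on success, -1 if no timer has that id. */
int timerDelete(timerList *l, long long id) {
    timerNode **link = &l->head;
    while (*link != NULL) {
        timerNode *n = *link;
        if (n->id == id) {
            *link = n->next;
            if (n->finalizer) n->finalizer(n->clientData);
            free(n);
            return 0;
        }
        link = &n->next;
    }
    return -1;
}

/* Example finalizer: bumps the int counter passed as clientData. */
void countFinalizer(void *clientData) {
    (*(int *)clientData)++;
}
```

Head insertion is O(1), while deletion has to walk the list; with only a handful of timers that cost is negligible.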
Timer events are added with aeCreateTimeEvent. The logic is simple: compute the next trigger time from the current time, fill in the event's fields, and insert the event at the head of the timer list. Deletion is done with aeDeleteTimeEvent, which locates the event by id, removes the node from the linked list, and calls back the cleanup function. Timer event processing is covered below; first, the I/O events.

I/O events are added with aeCreateFileEvent. The logic is simple: look up the event slot for the fd being registered, call aeApiAddEvent to add the fd to the underlying I/O model, and set the event's fields.
    int aeCreateFileEvent(aeEventLoop *eventLoop, int fd, int mask,
            aeFileProc *proc, void *clientData)
    {
        if (fd >= eventLoop->setsize) {
            errno = ERANGE;
            return AE_ERR;
        }
        aeFileEvent *fe = &eventLoop->events[fd];

        if (aeApiAddEvent(eventLoop, fd, mask) == -1)
            return AE_ERR;
        fe->mask |= mask;
        if (mask & AE_READABLE) fe->rfileProc = proc;
        if (mask & AE_WRITABLE) fe->wfileProc = proc;
        fe->clientData = clientData;
        if (fd > eventLoop->maxfd)
            eventLoop->maxfd = fd;
        return AE_OK;
    }
mask: the type of event to register, which can be readable or writable.
proc: the event handler function.
The following is the structure of the IO event, including the registered event type mask, the read and write event handler, and the corresponding parameters.
    /* File event structure */
    typedef struct aeFileEvent {
        int mask; /* one of AE_(READABLE|WRITABLE) */
        aeFileProc *rfileProc;
        aeFileProc *wfileProc;
        void *clientData;
    } aeFileEvent;
Next, the epoll implementation of adding an event, which mainly calls epoll_ctl.
    static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
        aeApiState *state = eventLoop->apidata;
        struct epoll_event ee;
        /* If the fd was already monitored for some event, we need a MOD
         * operation. Otherwise we need an ADD operation. */
        int op = eventLoop->events[fd].mask == AE_NONE ?
                EPOLL_CTL_ADD : EPOLL_CTL_MOD;

        ee.events = 0;
        mask |= eventLoop->events[fd].mask; /* Merge old events */
        if (mask & AE_READABLE) ee.events |= EPOLLIN;
        if (mask & AE_WRITABLE) ee.events |= EPOLLOUT;
        ee.data.u64 = 0; /* avoid valgrind warning */
        ee.data.fd = fd;
        if (epoll_ctl(state->epfd,op,fd,&ee) == -1) return -1;
        return 0;
    }
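The two decisions made in this function, which epoll_ctl operation to use and how the AE_* mask maps to epoll bits, can be pulled out into pure functions to make them easier to see. This is a sketch for Linux only; aeMaskToEpoll and chooseEpollOp are hypothetical helper names, and the AE_* constants are defined locally for the example.

```c
#include <sys/epoll.h>
#include <stdint.h>

/* Defined locally for this sketch. */
#define AE_NONE 0
#define AE_READABLE 1
#define AE_WRITABLE 2

/* Translate an AE_* mask into the corresponding epoll event bits. */
uint32_t aeMaskToEpoll(int mask) {
    uint32_t events = 0;
    if (mask & AE_READABLE) events |= EPOLLIN;
    if (mask & AE_WRITABLE) events |= EPOLLOUT;
    return events;
}

/* Pick the epoll_ctl operation from the mask already registered for the fd:
 * ADD if nothing was registered yet, MOD otherwise. */
int chooseEpollOp(int oldMask) {
    return oldMask == AE_NONE ? EPOLL_CTL_ADD : EPOLL_CTL_MOD;
}
```

Merging the old mask before translating matters because EPOLL_CTL_MOD replaces the whole event set for the fd rather than adding to it.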
struct epoll_event specifies the events to listen for, along with data bound to the file descriptor that is handed back when an event is triggered. Here the data field simply stores the fd, which is enough to find the corresponding event and invoke its handler. Deleting an event with epoll works much like adding one, so it is not repeated here.

(3) Waiting for event triggering
The event loop is entered by calling the aeMain function:
    void aeMain(aeEventLoop *eventLoop) {
        eventLoop->stop = 0;
        while (!eventLoop->stop) {
            if (eventLoop->beforesleep != NULL)
                eventLoop->beforesleep(eventLoop);
            aeProcessEvents(eventLoop, AE_ALL_EVENTS);
        }
    }
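The shape of this loop (a stop flag, an optional hook run before each iteration, then one round of event processing) can be tried out without any real I/O. The sketch below is a toy under stated assumptions: miniLoop, miniMain and the two example hooks are invented names, and the process callback stands in for aeProcessEvents.

```c
#include <stddef.h>

typedef struct miniLoop miniLoop;
typedef void beforeSleepProc(miniLoop *loop);
typedef int processProc(miniLoop *loop);

struct miniLoop {
    int stop;                     /* set to 1 to leave the loop */
    int iterations;               /* bookkeeping for the example */
    int beforeSleepCalls;         /* bookkeeping for the example */
    beforeSleepProc *beforesleep; /* optional hook, may be NULL */
    processProc *process;         /* one round of event processing */
};

/* Run until some callback sets loop->stop. */
void miniMain(miniLoop *loop) {
    loop->stop = 0;
    while (!loop->stop) {
        if (loop->beforesleep != NULL) loop->beforesleep(loop);
        loop->process(loop);
    }
}

/* Example hook: counts how often it runs. */
void countBeforeSleep(miniLoop *loop) {
    loop->beforeSleepCalls++;
}

/* Example processing step: asks the loop to stop after three rounds. */
int stopAfterThree(miniLoop *loop) {
    if (++loop->iterations == 3) loop->stop = 1;
    return 1;
}
```

Since the flag is only checked at the top of the loop, a stop request made inside a handler takes effect after the current round finishes.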
The function body is a while loop that keeps calling aeProcessEvents to wait for events to occur. beforesleep is called before each iteration to handle asynchronous tasks; it is covered later together with serverCron.

aeProcessEvents first uses the nearest timer event to work out how long to sleep, then handles I/O events, and finally handles timer events. Its implementation is described below.

First, variables are declared to record the number of events processed and the number of events triggered. flags indicates which event types this round needs to handle; if neither timer events nor I/O events need handling, the function returns immediately.
    int processed = 0, numevents;

    /* Nothing to do? return ASAP */
    if (!(flags & AE_TIME_EVENTS) && !(flags & AE_FILE_EVENTS)) return 0;
Timer events in Redis are implemented on top of epoll. The general idea is that each iteration of the event loop passes a sleep time to epoll_wait. If no I/O event occurs, the call returns once the sleep time has elapsed. The sleep time is the interval between the current time and the timer event that is due soonest, which guarantees that the handler is called back after the trigger time arrives. However, because I/O events are processed first after each return, timer triggering is not exact: a timer always fires somewhat later than its scheduled time. The concrete implementation follows.

The first step is to find the timer event that is due soonest, in order to determine how long to sleep. If there is no timer event, the flags passed in decide whether to block until an I/O event occurs or to check without blocking and return immediately. The soonest timer is found by calling aeSearchNearestTimer, which does a linear scan with O(n) complexity; the timer events could be organized into a heap to speed up the lookup. However, Redis has only one timer event, serverCron, so there is no need to optimize for now.
    /* Note that we want call select() even if there are no
     * file events to process as long as we want to process time
     * events, in order to sleep until the next time event is ready
     * to fire. */
    // <MM>
    // Enter poll and block waiting for events in two cases:
    // 1) there are descriptors to monitor (maxfd != -1)
    // 2) timer events need handling and the AE_DONT_WAIT switch is off
    // </MM>
    if (eventLoop->maxfd != -1 ||
        ((flags & AE_TIME_EVENTS) && !(flags & AE_DONT_WAIT))) {
        int j;
        aeTimeEvent *shortest = NULL;
        struct timeval tv, *tvp;

        // <MM>
        // The time poll blocks for depends on when the soonest timer
        // event is due; a linear scan finds that timer event
        // </MM>
        if (flags & AE_TIME_EVENTS && !(flags & AE_DONT_WAIT))
            shortest = aeSearchNearestTimer(eventLoop);
        if (shortest) {
            // <MM>
            // A timer event exists: compute the sleep time from its
            // trigger time (in ms)
            // </MM>
            long now_sec, now_ms;

            /* Calculate the time missing for the nearest
             * timer to fire. */
            aeGetTime(&now_sec, &now_ms);
            tvp = &tv;
            tvp->tv_sec = shortest->when_sec - now_sec;
            if (shortest->when_ms < now_ms) {
                tvp->tv_usec = ((shortest->when_ms+1000) - now_ms)*1000;
                tvp->tv_sec --;
            } else {
                tvp->tv_usec = (shortest->when_ms - now_ms)*1000;
            }
            if (tvp->tv_sec < 0) tvp->tv_sec = 0;
            if (tvp->tv_usec < 0) tvp->tv_usec = 0;
        } else {
            // <MM>
            // No timer event: return immediately or block forever,
            // depending on the flags
            // </MM>
            /* If we have to check for events but need to return
             * ASAP because of AE_DONT_WAIT we need to set the timeout
             * to zero */
            if (flags & AE_DONT_WAIT) {
                tv.tv_sec = tv.tv_usec = 0;
                tvp = &tv;
            } else {
                /* Otherwise we can block */
                tvp = NULL; /* wait forever */
            }
        }
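The borrow in the millisecond subtraction is easy to misread, so here is a minimal standalone sketch of the same arithmetic, followed by the conversion to the millisecond timeout that epoll_wait takes. The helper names timeUntil and timevalToMs are made up for illustration and are not part of Redis.

```c
#include <sys/time.h>

/* Time left until (when_sec, when_ms), given the current time
 * (now_sec, now_ms). Mirrors the excerpt: each field that comes out
 * negative is set to zero individually. */
struct timeval timeUntil(long when_sec, long when_ms,
                         long now_sec, long now_ms) {
    struct timeval tv;
    tv.tv_sec = when_sec - now_sec;
    if (when_ms < now_ms) {
        /* Borrow one second so the millisecond difference is positive. */
        tv.tv_usec = ((when_ms + 1000) - now_ms) * 1000;
        tv.tv_sec--;
    } else {
        tv.tv_usec = (when_ms - now_ms) * 1000;
    }
    if (tv.tv_sec < 0) tv.tv_sec = 0;
    if (tv.tv_usec < 0) tv.tv_usec = 0;
    return tv;
}

/* Convert a timeval to whole milliseconds, as passed to epoll_wait. */
long timevalToMs(struct timeval tv) {
    return (long)tv.tv_sec * 1000 + (long)tv.tv_usec / 1000;
}
```

For example, with a timer due at 12 s + 100 ms and a current time of 10 s + 400 ms, the millisecond part needs a borrow: the result is 1 s + 700000 µs, which converts to a timeout of 1700 ms.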
Next, aeApiPoll is called with the sleep time computed above to wait for I/O events to occur. When the function returns, the triggered events have already been filled into the fired array of eventLoop. The epoll implementation calls epoll_wait, which on return stores the triggered events in the first numevents elements of the state->events array. The fired array is then populated with the fd and event type of each triggered event.
    static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
        aeApiState *state = eventLoop->apidata;
        int retval, numevents = 0;

        // <MM>
        // Call epoll_wait; state->events receives the returned events
        // </MM>
        retval = epoll_wait(state->epfd,state->events,eventLoop->setsize,
                tvp ? (tvp->tv_sec*1000 + tvp->tv_usec/1000) : -1);
        if (retval > 0) {
            int j;

            numevents = retval;
            // <MM>
            // Events occurred: store them in the fired array
            // </MM>
            for (j = 0; j < numevents; j++) {
                int mask = 0;
                struct epoll_event *e = state->events+j;

                if (e->events & EPOLLIN) mask |= AE_READABLE;
                if (e->events & EPOLLOUT) mask |= AE_WRITABLE;
                if (e->events & EPOLLERR) mask |= AE_WRITABLE;
                if (e->events & EPOLLHUP) mask |= AE_WRITABLE;
                eventLoop->fired[j].fd = e->data.fd;
                eventLoop->fired[j].mask = mask;
            }
        }
        return numevents;
    }
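To see the round trip of data.fd through epoll_wait and the effect of the timeout, the following Linux-only sketch registers a single fd for reading and waits on it. waitReadable is a hypothetical helper written for this example; it creates and closes its own epoll instance on every call, which a real event loop would not do.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Wait up to timeoutMs for readFd to become readable.
 * Returns the fd reported by epoll, -1 on timeout, -2 on error. */
int waitReadable(int readFd, int timeoutMs) {
    struct epoll_event ee, fired[1];
    int result = -1;
    int epfd = epoll_create(1024); /* the size is only a hint */
    if (epfd == -1) return -2;

    ee.events = EPOLLIN;
    ee.data.u64 = 0;
    ee.data.fd = readFd;           /* handed back by epoll_wait */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, readFd, &ee) == -1) {
        close(epfd);
        return -2;
    }

    int n = epoll_wait(epfd, fired, 1, timeoutMs);
    if (n == 1 && (fired[0].events & EPOLLIN))
        result = fired[0].data.fd; /* same value stored above */
    else if (n < 0)
        result = -2;
    close(epfd);
    return result;
}
```

With a pipe, calling waitReadable on the read end with a timeout of 0 returns -1 while the pipe is empty; after one byte is written to the write end, the call returns the read end's fd.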
Once the events have been returned, they need to be handled. The code iterates over the fired array, looks up the event registered for each fd, and calls back its handler according to the type of event triggered.
    for (j = 0; j < numevents; j++) {
        // <MM>
        // After poll returns, all triggered events are stored in the
        // fired array
        // </MM>
        aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd];
        int mask = eventLoop->fired[j].mask;
        int fd = eventLoop->fired[j].fd;
        int rfired = 0;

        /* note the fe->mask & mask & ... code: maybe an already processed
         * event removed an element that fired and we still didn't
         * processed, so we check if the event is still valid. */
        // <MM>
        // Call back the handler registered for the fd's event
        // </MM>
        if (fe->mask & mask & AE_READABLE) {
            rfired = 1;
            fe->rfileProc(eventLoop,fd,fe->clientData,mask);
        }
        if (fe->mask & mask & AE_WRITABLE) {
            if (!rfired || fe->wfileProc != fe->rfileProc)
                fe->wfileProc(eventLoop,fd,fe->clientData,mask);
        }
        processed++;
    }
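One detail in this loop is the rfired check: when an fd fires for both reading and writing and the same function is registered for both, the function is called only once. A minimal sketch of that dispatch rule, with simplified handler signatures and the hypothetical names fileEvent, dispatchFired, handlerA and handlerB:

```c
/* Defined locally for this sketch. */
#define AE_READABLE 1
#define AE_WRITABLE 2

typedef void fileProc(int fd, void *clientData, int mask);

typedef struct {
    int mask;            /* events the caller registered for */
    fileProc *rfileProc;
    fileProc *wfileProc;
    void *clientData;
} fileEvent;

/* Dispatch one fired event; returns how many handler calls were made. */
int dispatchFired(fileEvent *fe, int fd, int firedMask) {
    int calls = 0, rfired = 0;
    if (fe->mask & firedMask & AE_READABLE) {
        rfired = 1;
        fe->rfileProc(fd, fe->clientData, firedMask);
        calls++;
    }
    if (fe->mask & firedMask & AE_WRITABLE) {
        /* Skip the write call if the same handler already ran for read. */
        if (!rfired || fe->wfileProc != fe->rfileProc) {
            fe->wfileProc(fd, fe->clientData, firedMask);
            calls++;
        }
    }
    return calls;
}

/* Example handlers: each adjusts the int counter passed as clientData. */
void handlerA(int fd, void *clientData, int mask) {
    (void)fd; (void)mask;
    (*(int *)clientData)++;
}

void handlerB(int fd, void *clientData, int mask) {
    (void)fd; (void)mask;
    (*(int *)clientData) += 10;
}
```

The triple AND with fe->mask also means an event type that fired but is not registered for the fd (for instance because an earlier handler removed it) is silently skipped.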
That covers the processing of I/O events; next comes timer event processing, which is done by calling the processTimeEvents function.

First, it checks whether system clock skew has occurred, that is, whether the system time has been moved back into the past. If it has, the trigger time of every timer event is set to 0 so that all of them fire immediately.
    /* If the system clock is moved to the future, and then set back to the
     * right value, time events may be delayed in a random way. Often this
     * means that scheduled operations will not be performed soon enough.
     *
     * Here we try to detect system clock skews, and force all the time
     * events to be processed ASAP when this happens: the idea is that
     * processing events earlier is less dangerous than delaying them
     * indefinitely, and practice suggests it is. */
    if (now < eventLoop->lastTime) {
        te = eventLoop->timeEventHead;
        while(te) {
            te->when_sec = 0;
            te = te->next;
        }
    }
    eventLoop->lastTime = now;
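The skew check can be exercised on its own with a small list of timers. The sketch below follows the same logic under stated assumptions: skewTimer and handleClockSkew are hypothetical names, and the current time is passed in as a parameter so the behavior can be tested without touching the system clock.

```c
#include <stddef.h>
#include <time.h>

typedef struct skewTimer {
    long when_sec;
    struct skewTimer *next;
} skewTimer;

/* If `now` is earlier than `*lastTime`, mark every timer as due by
 * zeroing its trigger time. Always records `now` as the new lastTime.
 * Returns 1 if a backwards jump was detected, 0 otherwise. */
int handleClockSkew(skewTimer *head, time_t *lastTime, time_t now) {
    int skewed = 0;
    if (now < *lastTime) {
        skewed = 1;
        for (skewTimer *te = head; te != NULL; te = te->next)
            te->when_sec = 0;
    }
    *lastTime = now;
    return skewed;
}
```

Only a backwards jump is detectable this way; a forward jump simply makes timers look due earlier than intended, which the check does not need to correct.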
Next, all timer events are traversed to find the ones that are due, and their handlers are called back. The return value of the handler determines whether the event is one-shot or periodic. If AE_NOMORE is returned, it is a one-shot event and is deleted once the call completes. Otherwise, the return value gives the number of milliseconds from now until the next trigger.
    te = eventLoop->timeEventHead;
    maxId = eventLoop->timeEventNextId-1;
    while(te) {
        long now_sec, now_ms;
        long long id;

        if (te->id > maxId) {
            te = te->next;
            continue;
        }
        aeGetTime(&now_sec, &now_ms);
        if (now_sec > te->when_sec ||
            (now_sec == te->when_sec && now_ms >= te->when_ms))
        {
            // <MM>
            // The timer's trigger time has passed: call back the
            // registered event handler
            // </MM>
            int retval;

            id = te->id;
            retval = te->timeProc(eventLoop, id, te->clientData);
            processed++;
            /* After an event is processed our time event list may
             * no longer be the same, so we restart from head.
             * Still we make sure to don't process events registered
             * by event handlers itself in order to don't loop forever.
             * To do so we saved the max ID we want to handle.
             *
             * FUTURE OPTIMIZATIONS:
             * Note that this is NOT great algorithmically. Redis uses
             * a single time event so it's not a problem but the right
             * way to do this is to add the new elements on head, and
             * to flag deleted elements in a special way for later
             * deletion (putting references to the nodes to delete into
             * another linked list). */
            // <MM>
            // The handler's return value decides whether the timer is
            // deleted. If retval is not -1 (AE_NOMORE), the trigger time
            // is changed to now + retval (ms)
            // </MM>
            if (retval != AE_NOMORE) {
                aeAddMillisecondsToNow(retval,&te->when_sec,&te->when_ms);
            } else {
                // <MM>
                // AE_NOMORE was returned: delete the timer
                // </MM>
                aeDeleteTimeEvent(eventLoop, id);
            }
            te = eventLoop->timeEventHead;
        } else {
            te = te->next;
        }
    }
A callback handler may add new timer events. If handlers kept adding events and the loop kept processing them, it could loop forever. To avoid this, each pass skips events added during that pass, which is what the following code does.
    if (te->id > maxId) {
        te = te->next;
        continue;
    }
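The effect of this guard is easiest to see with a handler that registers a new, already-due timer every time it runs. The sketch below is a simplified model, not the Redis code: tlist, tnode, addTimer, deleteTimer, processTimers and spawningHandler are hypothetical names, NOMORE stands in for AE_NOMORE, and a plain due flag replaces the timestamp comparison.

```c
#include <stdlib.h>

#define NOMORE -1  /* stand-in for AE_NOMORE */

typedef struct tlist tlist;
typedef int timeProc(tlist *l, long long id);

typedef struct tnode {
    long long id;
    int due;          /* 1 if the timer should fire now */
    timeProc *proc;
    struct tnode *next;
} tnode;

struct tlist {
    tnode *head;
    long long nextId;
};

/* Head-insert a timer; returns its id, or -1 on allocation failure. */
long long addTimer(tlist *l, int due, timeProc *proc) {
    tnode *n = malloc(sizeof(*n));
    if (n == NULL) return -1;
    n->id = l->nextId++;
    n->due = due;
    n->proc = proc;
    n->next = l->head;
    l->head = n;
    return n->id;
}

void deleteTimer(tlist *l, long long id) {
    for (tnode **link = &l->head; *link; link = &(*link)->next) {
        if ((*link)->id == id) {
            tnode *dead = *link;
            *link = dead->next;
            free(dead);
            return;
        }
    }
}

/* One pass over the list. Timers whose id is above maxId were created
 * during this pass and are left for the next call. */
int processTimers(tlist *l) {
    int processed = 0;
    long long maxId = l->nextId - 1;
    tnode *te = l->head;
    while (te) {
        if (te->id > maxId || !te->due) { te = te->next; continue; }
        long long id = te->id;
        int retval = te->proc(l, id);
        processed++;
        if (retval == NOMORE) deleteTimer(l, id);
        else te->due = 0;  /* periodic: treated as re-armed for later */
        te = l->head;      /* the list may have changed: restart */
    }
    return processed;
}

/* Example handler: spawns another due timer, then asks to be removed. */
int spawningHandler(tlist *l, long long id) {
    (void)id;
    addTimer(l, 1, spawningHandler);
    return NOMORE;
}
```

Starting from a single timer that uses spawningHandler, each call to processTimers handles exactly one timer and returns, leaving the freshly spawned one for the next pass. Without the id comparison, the restart from the head would find the new timer immediately and the pass would never end.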
That completes the analysis of the event loop. The code is intuitive and clear, and could be extracted wholesale and used as a standalone library. Request processing is described in the next section.
Redis Source Analysis (2)--Event loop