Read memcached source code ---- use libevent

Source: Internet
Author: User


This article reads through the memcached source code, focusing on what happens at process startup and how the network layer is set up.


I. Using libevent

First, memcached uses libevent as its network framework. libevent is an event-driven, asynchronous I/O library whose event loop runs in a single thread and, on Linux, is built on epoll. The basic idea is to bind a callback function to an event on a file descriptor (readable, writable, timeout, signal, and so on) and have the library call that function when the event occurs.

Let's take a look at the basic libevent api calls.

    struct event_base *base;
    base = event_base_new(); // initialize libevent

Compared with epoll, event_base_new can be understood as epoll_create in epoll.

An event_base runs a loop that blocks in the underlying epoll call and dispatches events as they occur. Events are registered with the event_base; each one is represented by a struct event, which can, for example, wrap the listening fd.

A struct event is created and bound with event_new, then activated with event_add. For example:

    struct event *listener_event;
    listener_event = event_new(base, listener, EV_READ|EV_PERSIST, do_accept, (void*)base);


Parameter description:

base: the struct event_base * returned by event_base_new

listener: the listening fd

EV_READ | EV_PERSIST: the event type and attributes

do_accept: the callback function bound to the event

(void *)base: the argument passed to the callback function

    event_add(listener_event, NULL);

Compared with epoll:

event_new roughly corresponds to filling in a struct epoll_event for an fd: it describes the event but does not register it yet. The while loop around epoll_wait in an epoll program corresponds to event_base_dispatch in libevent.

event_add is equivalent to epoll_ctl with EPOLL_CTL_ADD: it registers the event.

Note: libevent supports the following event types and attributes (they are bit flags, so combine them with |):
EV_TIMEOUT: the timeout expired
EV_READ: the callback is triggered whenever the fd is readable, that is, there is data in the receive buffer
EV_WRITE: the callback is triggered whenever the fd is writable, that is, there is room in the send buffer
EV_SIGNAL: a POSIX signal (not a semaphore)
EV_PERSIST: keep the event registered; if this attribute is not specified, the event is removed after the callback is triggered
EV_ET: edge-triggered, equivalent to epoll's ET mode
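
To make the bit-flag idea concrete, here is a minimal sketch. The DEMO_* constants and demo_has_flag are hypothetical names invented for this illustration; they mimic how flags such as EV_READ and EV_PERSIST are combined with | and tested with &, but they are not libevent's own definitions.

```c
#include <assert.h>

/* Hypothetical flags for illustration only; each one occupies its own bit. */
enum {
    DEMO_TIMEOUT = 0x01,
    DEMO_READ    = 0x02,
    DEMO_WRITE   = 0x04,
    DEMO_PERSIST = 0x10
};

/* Returns 1 if every bit of `flag` is set in `flags`, otherwise 0. */
int demo_has_flag(short flags, short flag) {
    assert(flag != 0);
    return (flags & flag) == flag;
}
```

For example, demo_has_flag(DEMO_READ | DEMO_PERSIST, DEMO_PERSIST) is true, while testing the same value for DEMO_WRITE is false.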

After the events are created and added, they can be processed, which corresponds to the epoll_wait loop in epoll. In libevent, call event_base_dispatch to run the event_base loop; it returns when there are no more registered events.
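
For readers who know epoll better than libevent, the following sketch shows the same create / register / dispatch sequence written directly against epoll (Linux only). It is an illustration under stated assumptions, not libevent's implementation: demo_event, demo_cb, demo_on_readable and demo_dispatch_once are hypothetical names, and a pipe stands in for a listening socket so the example is self-contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical callback type, loosely shaped like a libevent callback. */
typedef void (*demo_cb)(int fd, unsigned events, void *arg);

struct demo_event {
    int fd;
    demo_cb cb;
    void *arg;
};

/* Callback: consume one byte and count the invocation. */
static void demo_on_readable(int fd, unsigned events, void *arg) {
    char byte;
    (void)events;
    if (read(fd, &byte, 1) == 1)
        ++*(int *)arg;
}

/* Registers the read end of a pipe, makes it readable, and runs one
 * turn of the loop. Returns the number of callbacks fired, or -1. */
int demo_dispatch_once(void) {
    int fired = 0, result = -1;
    int pipefd[2], epfd, n, i;
    struct demo_event ev;
    struct epoll_event ee, ready[8];

    if (pipe(pipefd) == -1)
        return -1;
    epfd = epoll_create1(0);                 /* role of event_base_new */
    if (epfd == -1)
        goto out_pipe;

    ev.fd = pipefd[0];                       /* role of event_new */
    ev.cb = demo_on_readable;
    ev.arg = &fired;
    ee.events = EPOLLIN;
    ee.data.ptr = &ev;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ee) == -1)  /* role of event_add */
        goto out_epoll;

    if (write(pipefd[1], "x", 1) != 1)       /* make the fd readable */
        goto out_epoll;

    n = epoll_wait(epfd, ready, 8, 1000);    /* one turn of the dispatch loop */
    for (i = 0; i < n; i++) {
        struct demo_event *e = ready[i].data.ptr;
        assert(e != NULL);
        e->cb(e->fd, ready[i].events, e->arg);
    }
    result = fired;

out_epoll:
    close(epfd);
out_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

A real server would call epoll_wait in a loop instead of once; libevent hides that loop behind event_base_dispatch.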


Putting the above together with the earlier epoll server program, a server built on libevent basically does the following:

1. Create the socket, bind, listen, and set it to non-blocking mode.

2. Create an event_base:

    struct event_base *event_base_new(void)

3. Create an event that attaches the socket to the event_base, specifying the event types to watch and the callback function (and its argument):

    struct event *event_new(struct event_base *base, evutil_socket_t fd, short events, void (*cb)(evutil_socket_t, short, void *), void *arg)

4. Enable the event:

    int event_add(struct event *ev, const struct timeval *tv)

5. Enter the event loop:

    int event_base_dispatch(struct event_base *event_base)

With these basics in place, we can read the memcached source.

II. memcached source code analysis

When main starts, it initializes a lot of data. Here we only cover the networking part; the rest will be analyzed later.

1. Initialize the libevent instance of the main thread

    /* initialize main thread libevent instance */
    main_base = event_init();

Later, main calls

    /* enter the event loop */
    if (event_base_loop(main_base, 0) != 0) {
        retval = EXIT_FAILURE;
    }

to loop inside this instance without exiting.
2. Initialize the connection objects

    static void conn_init(void) {
        freetotal = 200;
        freecurr = 0;
        if ((freeconns = calloc(freetotal, sizeof(conn *))) == NULL) {
            fprintf(stderr, "Failed to allocate connection structures\n");
        }
        return;
    }

This allocates space for 200 conn * pointers in advance. When a connection arrives, a conn is taken from freeconns if one is available, as in the following code:

    /*
     * Returns a connection from the freelist, if any.
     */
    conn *conn_from_freelist() {
        conn *c;

        pthread_mutex_lock(&conn_lock);
        if (freecurr > 0) {
            c = freeconns[--freecurr];
        } else {
            c = NULL;
        }
        pthread_mutex_unlock(&conn_lock);

        return c;
    }
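
The freelist pattern can be tried in isolation with the sketch below. It is a simplified illustration, not memcached's code: demo_obj, demo_freelist_get, demo_freelist_put and the fixed capacity of 4 are all assumptions made for the example. It shows both directions (taking an object and handing one back), each under the same mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_FREELIST_CAP 4   /* hypothetical capacity for the example */

struct demo_obj { int sfd; };

static struct demo_obj *demo_free[DEMO_FREELIST_CAP];
static int demo_free_count = 0;
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Pops a cached object, or returns NULL if the freelist is empty. */
struct demo_obj *demo_freelist_get(void) {
    struct demo_obj *o = NULL;

    pthread_mutex_lock(&demo_lock);
    if (demo_free_count > 0)
        o = demo_free[--demo_free_count];
    pthread_mutex_unlock(&demo_lock);
    return o;
}

/* Caches an object for reuse. Returns 0 on success, or -1 if the
 * freelist is full and the caller has to release the object itself. */
int demo_freelist_put(struct demo_obj *o) {
    int rc = -1;

    assert(o != NULL);
    pthread_mutex_lock(&demo_lock);
    if (demo_free_count < DEMO_FREELIST_CAP) {
        demo_free[demo_free_count++] = o;
        rc = 0;
    }
    pthread_mutex_unlock(&demo_lock);
    return rc;
}
```

The point of the design is that closing a connection does not have to free its structure and buffers; the next accepted connection can reuse them.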


3. What does the conn structure look like?

    typedef struct conn conn;
    struct conn {
        int    sfd;
        sasl_conn_t *sasl_conn;
        enum conn_states  state;
        enum bin_substates substate;
        struct event event;
        short  ev_flags;
        short  which;   /** which events were just triggered */

        char   *rbuf;   /** buffer to read commands into */
        char   *rcurr;  /** but if we parsed some already, this is where we stopped */
        int    rsize;   /** total allocated size of rbuf */
        int    rbytes;  /** how much data, starting from rcur, do we have unparsed */

        char   *wbuf;
        char   *wcurr;
        int    wsize;
        int    wbytes;
        /** which state to go into after finishing current write */
        enum conn_states  write_and_go;
        void   *write_and_free; /** free this memory after finishing writing */

        char   *ritem;  /** when we read in an item's value, it goes here */
        int    rlbytes;

        /* data for the nread state */

        /**
         * item is used to hold an item structure created after reading the command
         * line of set/add/replace commands, but before we finished reading the actual
         * data. The data is read into ITEM_data(item) to avoid extra copying.
         */
        void   *item;     /* for commands set/add/replace */

        /* data for the swallow state */
        int    sbytes;    /* how many bytes to swallow */

        /* data for the mwrite state */
        struct iovec *iov;
        int    iovsize;   /* number of elements allocated in iov[] */
        int    iovused;   /* number of elements used in iov[] */

        struct msghdr *msglist;
        int    msgsize;   /* number of elements allocated in msglist[] */
        int    msgused;   /* number of elements used in msglist[] */
        int    msgcurr;   /* element in msglist[] being transmitted now */
        int    msgbytes;  /* number of bytes in current msg */

        item   **ilist;   /* list of items to write out */
        int    isize;
        item   **icurr;
        int    ileft;

        char   **suffixlist;
        int    suffixsize;
        char   **suffixcurr;
        int    suffixleft;

        enum protocol protocol;   /* which protocol this connection speaks */
        enum network_transport transport; /* what transport is used by this connection */

        /* data for UDP clients */
        int    request_id; /* Incoming UDP request ID, if this is a UDP "connection" */
        struct sockaddr request_addr; /* Who sent the most recent request */
        socklen_t request_addr_size;
        unsigned char *hdrbuf; /* udp packet headers */
        int    hdrsize;   /* number of headers' worth of space is allocated */

        bool   noreply;   /* True if the reply should not be sent. */
        /* current stats command */
        struct {
            char *buffer;
            size_t size;
            size_t offset;
        } stats;

        /* Binary protocol stuff */
        /* This is where the binary header goes */
        protocol_binary_request_header binary_header;
        uint64_t cas; /* the cas to return */
        short cmd; /* current command being processed */
        int opaque;
        int keylen;
        conn   *next;     /* Used for generating a list of conn structures */
        LIBEVENT_THREAD *thread; /* Pointer to the thread object serving this connection */
    };

All of these fields are used during request processing. They are not described in detail here and will be covered in later articles.

Because memcached is multi-threaded, taking an object from freeconns has to be done under a lock (conn_lock above).

Ignore the SIGPIPE signal so that writing to a connection the peer has already reset (RST) does not terminate the process:

    if (sigignore(SIGPIPE) == -1) {
        perror("failed to ignore SIGPIPE; sigaction");
        exit(EX_OSERR);
    }
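
The effect of ignoring SIGPIPE can be observed with a small experiment. The sketch below swaps in sigaction with SIG_IGN instead of sigignore, and uses a pipe whose read end is closed in place of a reset TCP connection; demo_write_after_peer_close is a hypothetical helper written for this illustration. With the signal ignored, write fails with EPIPE and the process keeps running.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Ignores SIGPIPE, then writes `len` bytes to a pipe whose read end is
 * already closed. Returns the errno reported by write(), 0 if the write
 * unexpectedly succeeds, or -1 if the setup fails. */
int demo_write_after_peer_close(const char *data, size_t len) {
    struct sigaction sa;
    int pipefd[2];
    int err;

    assert(data != NULL && len > 0);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPIPE, &sa, NULL) == -1)
        return -1;

    if (pipe(pipefd) == -1)
        return -1;
    close(pipefd[0]);                 /* simulate the peer going away */

    if (write(pipefd[1], data, len) == -1)
        err = errno;                  /* an error code instead of a fatal signal */
    else
        err = 0;
    close(pipefd[1]);
    return err;
}
```

Without the sigaction call, the default action for SIGPIPE would terminate the process at the write.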

Next, the multi-threaded model is initialized. Each worker thread gets its own libevent instance, created by calling event_init.

    /* start up worker threads if MT mode */
    thread_init(settings.num_threads, main_base);

The internal implementation is not covered in detail here; it mainly calls pthread_create.
4. Start listening on the configured port

The code is as follows:

    if (settings.port && server_sockets(settings.port, tcp_transport,
                                       portnumber_file)) {
        vperror("failed to listen on TCP port %d", settings.port);
        exit(EX_OSERR);
    }

This in turn calls the following function:

    static int server_socket(const char *interface,
                             int port,
                             enum network_transport transport,
                             FILE *portnumber_file)

Because a host may have several network interfaces and addresses (for example, a data center connected to both China Unicom and China Telecom lines), the implementation loops over every address returned by getaddrinfo:

    for (next= ai; next; next= next->ai_next) {
        conn *listen_conn_add;
        if ((sfd = new_socket(next)) == -1) {
            /* getaddrinfo can return "junk" addresses,
             * we make sure at least one works before erroring.
             */
            if (errno == EMFILE) {
                /* ...unless we're out of fds */
                perror("server_socket");
                exit(EX_OSERR);
            }
            continue;
        }
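
The loop above walks a linked list of struct addrinfo results. The sketch below shows one way such a list can be obtained and traversed. The hints used here (AI_PASSIVE, AF_UNSPEC, SOCK_STREAM) and the helper name demo_count_listen_addrs are assumptions for the illustration, not taken from the excerpt; the sketch only counts the candidates instead of opening sockets.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Counts the candidate listening addresses getaddrinfo() reports for
 * `port` on the wildcard address. Returns -1 if the lookup fails. */
int demo_count_listen_addrs(int port) {
    struct addrinfo hints, *ai, *next;
    char port_buf[16];
    int count = 0;

    assert(port > 0 && port < 65536);
    snprintf(port_buf, sizeof(port_buf), "%d", port);

    memset(&hints, 0, sizeof(hints));
    hints.ai_flags = AI_PASSIVE;      /* wildcard address, suitable for bind() */
    hints.ai_family = AF_UNSPEC;      /* allow both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    if (getaddrinfo(NULL, port_buf, &hints, &ai) != 0)
        return -1;

    /* A real server would create, bind and listen on a socket for each
     * entry here, skipping the ones that fail. */
    for (next = ai; next; next = next->ai_next)
        count++;

    freeaddrinfo(ai);
    return count;
}
```

On a typical dual-stack host this yields one IPv4 and one IPv6 entry, which is why the server ends up with a list of listening sockets instead of a single one.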

Here,

    static int new_socket(struct addrinfo *ai)

calls socket() and sets the new socket to non-blocking mode.
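
Setting a descriptor to non-blocking mode is usually done with fcntl. The following is a minimal sketch of that step; demo_set_nonblocking is a hypothetical helper, not the body of new_socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>

/* Adds O_NONBLOCK to the file status flags of `fd`.
 * Returns 0 on success, -1 on error. */
int demo_set_nonblocking(int fd) {
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL, 0);    /* read the current flags first */
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}
```

Reading the existing flags and OR-ing in O_NONBLOCK keeps any flags that were already set on the descriptor.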

5. Create a conn object for the listening socket

The code is as follows:

    if (!(listen_conn_add = conn_new(sfd, conn_listening,
                                     EV_READ | EV_PERSIST, 1,
                                     transport, main_base))) {
        fprintf(stderr, "failed to create listening connection\n");
        exit(EXIT_FAILURE);
    }
    listen_conn_add->next = listen_conn;
    listen_conn = listen_conn_add;

    static conn *listen_conn = NULL;

listen_conn is a file-scope static variable. It is the head pointer of a singly linked list with no dummy head node; each new listening conn is inserted at the front.
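
The two assignments that link a new listening conn into the list are the classic "insert at the front" idiom. A standalone sketch, with hypothetical names (demo_node, demo_list_prepend, demo_list_head):

```c
#include <assert.h>
#include <stddef.h>

struct demo_node {
    int sfd;
    struct demo_node *next;
};

/* Head pointer only: an empty list is simply NULL, with no dummy node. */
static struct demo_node *demo_listen_head = NULL;

/* Inserts `n` at the front of the list in constant time. */
void demo_list_prepend(struct demo_node *n) {
    assert(n != NULL);
    n->next = demo_listen_head;   /* new node points at the old head */
    demo_listen_head = n;         /* new node becomes the head */
}

/* Returns the current head, or NULL if the list is empty. */
struct demo_node *demo_list_head(void) {
    return demo_listen_head;
}
```

Because insertion happens at the front, the most recently added socket is the first one visited when the list is walked.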
Let's look further into the conn_new function.

    conn *conn_new(const int sfd, enum conn_states init_state,
                   const int event_flags,
                   const int read_buffer_size, enum network_transport transport,
                   struct event_base *base) {
        conn *c = conn_from_freelist();

What does this function mainly do?

First, it takes a conn * from the freelist (allocating a new one if the freelist is empty) and initializes the relevant fields based on the configuration.

Second, it registers the connection with libevent:

    event_set(&c->event, sfd, event_flags, event_handler, (void *)c);
    event_base_set(base, &c->event);
    c->ev_flags = event_flags;

    if (event_add(&c->event, 0) == -1) {
        if (conn_add_to_freelist(c)) {
            conn_free(c);
        }
        perror("event_add");
        return NULL;
    }

This step binds event_handler to sfd, so that it is called back when a connection arrives or data becomes readable.
7. The state machine

The event_handler function calls

    static void drive_machine(conn *c)

So what does this function do?

For the listening socket, it waits for connections, that is, it calls accept. It therefore starts in the conn_listening state:

    while (!stop) {
        switch(c->state) {
        case conn_listening:
            addrlen = sizeof(addr);
            if ((sfd = accept(c->sfd, (struct sockaddr *)&addr, &addrlen)) == -1)


The accepted sfd is of course also set to non-blocking.

At this point, data can start arriving on the new connection, so it is given an initial state for reading commands and handed off by calling the following function:

    /*
     * Dispatches a new connection to another thread. This is only ever called
     * from the main thread, either during initialization (for UDP) or because
     * of an incoming connection.
     */
    void dispatch_conn_new(int sfd, enum conn_states init_state, int event_flags,
                           int read_buffer_size, enum network_transport transport) {
        CQ_ITEM *item = cqi_new();
        char buf[1];
        int tid = (last_thread + 1) % settings.num_threads;

        LIBEVENT_THREAD *thread = threads + tid;

        last_thread = tid;

        item->sfd = sfd;
        item->init_state = init_state;
        item->event_flags = event_flags;
        item->read_buffer_size = read_buffer_size;
        item->transport = transport;

        cq_push(thread->new_conn_queue, item);

        MEMCACHED_CONN_DISPATCH(sfd, thread->thread_id);
        buf[0] = 'c';
        if (write(thread->notify_send_fd, buf, 1) != 1) {
            perror("Writing to thread notify pipe");
        }
    }



From the comment, we can see that this function hands a new connection over to another thread.

From the code, we can see that it first allocates a CQ_ITEM and stores in it the fd of the accepted socket together with the initial state, event flags, read buffer size, and transport. It then picks a worker thread in round-robin order and pushes the item onto that thread's new-connection queue.

Finally, it writes the character 'c' to the thread's notification pipe to wake up the other end. At that point the connection has been handed off.
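
The handoff combines three small ideas: round-robin selection, a per-worker queue, and a one-byte wake-up over a pipe. The sketch below exercises them in a single thread so that it stays self-contained. All names (demo_item, demo_worker, demo_next_worker, demo_dispatch, demo_process) are hypothetical, and the queue has no lock, which a real multi-threaded version would need.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

struct demo_item {
    int sfd;
    struct demo_item *next;
};

struct demo_worker {
    struct demo_item *head, *tail;  /* FIFO of pending items */
    int notify_receive_fd;          /* read end of the pipe */
    int notify_send_fd;             /* write end of the pipe */
};

/* Round-robin choice of the next worker index. */
int demo_next_worker(int last, int num_workers) {
    assert(num_workers > 0);
    return (last + 1) % num_workers;
}

/* Creates the notification pipe and an empty queue. */
int demo_worker_init(struct demo_worker *w) {
    int fds[2];

    assert(w != NULL);
    if (pipe(fds) == -1)
        return -1;
    w->head = w->tail = NULL;
    w->notify_receive_fd = fds[0];
    w->notify_send_fd = fds[1];
    return 0;
}

/* Producer side: enqueue the item, then wake the worker with one byte. */
int demo_dispatch(struct demo_worker *w, struct demo_item *item) {
    item->next = NULL;
    if (w->tail)
        w->tail->next = item;
    else
        w->head = item;
    w->tail = item;
    return write(w->notify_send_fd, "c", 1) == 1 ? 0 : -1;
}

/* Consumer side: consume one wake-up byte and pop one item.
 * Returns NULL if the byte is not 'c' or the queue is empty. */
struct demo_item *demo_process(struct demo_worker *w) {
    char buf[1];
    struct demo_item *item;

    if (read(w->notify_receive_fd, buf, 1) != 1 || buf[0] != 'c')
        return NULL;
    item = w->head;
    if (item) {
        w->head = item->next;
        if (!w->head)
            w->tail = NULL;
    }
    return item;
}
```

Using a pipe as the wake-up channel means the worker can wait for new connections with the same event loop it uses for client sockets: the read end of the pipe is just one more readable fd.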
Now let's look at how the worker thread handles it.

When a worker thread is initialized, the read end of its notification pipe is registered with libevent, as in the following code:

    /* Listen for notifications from other threads */
    event_set(&me->notify_event, me->notify_receive_fd,
              EV_READ | EV_PERSIST, thread_libevent_process, me);
    event_base_set(me->base, &me->notify_event);

    if (event_add(&me->notify_event, 0) == -1) {
        fprintf(stderr, "Can't monitor libevent notify pipe\n");
        exit(1);
    }

The bound callback function is:

    static void thread_libevent_process(int fd, short which, void *arg)


When the character 'c' is read, an item is taken from the queue and passed to the following function:

    conn *conn_new(const int sfd, enum conn_states init_state,
                   const int event_flags,
                   const int read_buffer_size, enum network_transport transport,
                   struct event_base *base)

As before,

    conn *c = conn_from_freelist();

takes a conn * and initializes it, the same as for the listening socket. Only the initial state differs, which is why a state machine is used here.
The following states are available:

    enum conn_states {
        conn_listening,  /**< the socket which listens for connections */
        conn_new_cmd,    /**< Prepare connection for next command */
        conn_waiting,    /**< waiting for a readable socket */
        conn_read,       /**< reading in a command line */
        conn_parse_cmd,  /**< try to parse a command from the input buffer */
        conn_write,      /**< writing out a simple response */
        conn_nread,      /**< reading in a fixed number of bytes */
        conn_swallow,    /**< swallowing unnecessary bytes w/o storing */
        conn_closing,    /**< closing this connection */
        conn_mwrite,     /**< writing out many items sequentially */
        conn_max_state   /**< Max state value (used for assertion) */
    };

These states drive the core logic of

    static void drive_machine(conn *c)

which runs different code depending on the current state. When one state finishes, the following function is always used to move to the next:

    /*
     * Sets a connection's current state in the state machine. Any special
     * processing that needs to happen on certain state transitions can
     * happen here.
     */
    static void conn_set_state(conn *c, enum conn_states state) {
        assert(c != NULL);
        assert(state >= conn_listening && state < conn_max_state);

        if (state != c->state) {
            if (settings.verbose > 2) {
                fprintf(stderr, "%d: going from %s to %s\n",
                        c->sfd, state_text(c->state),
                        state_text(state));
            }

            if (state == conn_write || state == conn_mwrite) {
                MEMCACHED_PROCESS_COMMAND_END(c->sfd, c->wbuf, c->wbytes);
            }
            c->state = state;
        }
    }
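
The shape of such a state machine (a loop around a switch, with every transition going through one setter) can be shown with a toy example. The states and the "protocol" below are invented for illustration and are far simpler than memcached's: every newline in the input counts as one command, and each command produces one response. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum demo_state {
    DEMO_READ,       /* pull the next byte of input */
    DEMO_PARSE,      /* decide what the byte means */
    DEMO_WRITE,      /* produce a response */
    DEMO_CLOSING,    /* no more input; stop */
    DEMO_MAX_STATE   /* used only for the range check */
};

struct demo_sm_conn {
    enum demo_state state;
    const char *input;   /* NUL-terminated pending input */
    char current;        /* byte most recently read */
    int responses;       /* number of responses produced */
};

/* Single place where the state changes, so checks or logging can live here. */
static void demo_set_state(struct demo_sm_conn *c, enum demo_state state) {
    assert(c != NULL);
    assert(state < DEMO_MAX_STATE);
    if (state != c->state)
        c->state = state;
}

/* Runs the machine until it reaches DEMO_CLOSING. */
void demo_drive_machine(struct demo_sm_conn *c) {
    int stop = 0;

    while (!stop) {
        switch (c->state) {
        case DEMO_READ:
            if (*c->input == '\0') {
                demo_set_state(c, DEMO_CLOSING);
            } else {
                c->current = *c->input++;
                demo_set_state(c, DEMO_PARSE);
            }
            break;
        case DEMO_PARSE:
            /* A newline completes a command; anything else needs more input. */
            demo_set_state(c, c->current == '\n' ? DEMO_WRITE : DEMO_READ);
            break;
        case DEMO_WRITE:
            c->responses++;
            demo_set_state(c, DEMO_READ);
            break;
        case DEMO_CLOSING:
            stop = 1;
            break;
        default:
            assert(0);
            stop = 1;
            break;
        }
    }
}
```

Starting in DEMO_READ with the input "a\nb\n", the machine produces two responses and ends in DEMO_CLOSING.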



At this point, the basic network framework has been covered. The startup flow is simple and practical. Redis is built on the same basic idea, but with a single thread, whereas memcached uses multiple threads. Both are worth using as references when designing a server.

This is an original article. For more articles, see http://blog.csdn.net/wallwind









