memcached Architecture Analysis (Part 1): Threading Model

Last updated: 2018-12-04. Source: Internet

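Preface:

I have recently been reading the memcached source code and decided to write up what I learned. This is my first serious look at memcached and my understanding is limited, so corrections and discussion are welcome. This series divides memcached into several modules, following my own understanding, and analyzes each in turn. The examples use memcached-1.4.6.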

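I. Introduction to libevent

Network data transfer and processing in memcached depend entirely on libevent. I will cover libevent in detail in another article; here I only outline how it is used. First, some definitions.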

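1) A file descriptor is readable or writable when a user thread can perform IO on it and the read/write system call returns immediately after reading from or writing to the kernel buffer. In any other state the call would block, because there is no data to read or no room to write, until the descriptor satisfies the IO condition (IO conditions are ready), that is, until it becomes readable or writable.

2) An IO event is one state change of a file descriptor from not readable to readable, or from not writable to writable. It follows that every IO event is associated with exactly one file descriptor, and that events come in different types, such as readable events and writable events.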
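The two definitions above can be observed directly on a pipe. The following is a minimal sketch that uses only POSIX calls, not libevent; fd_is_readable() and demo_readable_event() are hypothetical helper names introduced for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Returns 1 if a read() on fd would return immediately, 0 if it would
 * block, and -1 on error. A zero timeout makes poll() report the current
 * state without waiting. */
int fd_is_readable(int fd)
{
    assert(fd >= 0);
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, 0);
    if (n < 0)
        return -1;
    return (n == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Shows one "readable" IO event: the read end of a pipe goes from not
 * readable to readable once a byte is written to the write end.
 * Returns 0 if both observations match, -1 otherwise. */
int demo_readable_event(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    int before = fd_is_readable(fds[0]);  /* empty pipe: not readable */
    ssize_t w = write(fds[1], "x", 1);    /* this state change is the event */
    int after = fd_is_readable(fds[0]);   /* data buffered: readable */

    close(fds[0]);
    close(fds[1]);
    return (before == 0 && w == 1 && after == 1) ? 0 : -1;
}
```

The write to the pipe plays the role that an arriving network packet plays for a socket: it is what moves the descriptor into the readable state.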

A user thread typically uses libevent in the following steps:

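1) The user thread calls event_init() to create an event_base object. The event_base manages every IO event registered with it. In a multithreaded program an event_base must not be shared between threads: each event_base belongs to exactly one thread.
2) The thread then describes each IO event it cares about with event_set(), which names the file descriptor, the event type and the event handler to call when the event occurs, and registers it with the event_base through event_add(). Server programs usually watch for readable events on sockets. For example, a server thread registers the EV_READ event of socket sock1 and names event_handler1() as its callback. libevent wraps each IO event in a struct event object and marks the event type with constants such as EV_READ and EV_WRITE.
3) After registering its events, the thread calls event_base_loop() to enter the monitoring loop. Inside the loop, libevent calls an IO multiplexing function such as epoll and blocks until one of the registered events occurs on a descriptor. The thread then calls the handler registered for that event. For example, when sock1 becomes readable, that is, when data arrives in sock1's kernel buffer, the blocked thread wakes up and calls event_handler1() to handle the event.
4) After handling the events from this round, the thread blocks again and waits for the next event.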
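To make the four steps concrete without depending on libevent itself, here is a rough sketch of a tiny event loop built on poll(). It is not libevent's implementation; every mini_* name is hypothetical, and it handles readable events only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MINI_MAX_EVENTS 8

typedef void (*mini_handler)(int fd, void *arg);

/* One registered "readable" event: a descriptor plus its callback. */
struct mini_event {
    int fd;
    mini_handler handler;
    void *arg;
};

/* Plays the role of an event_base: owns every event registered on it. */
struct mini_base {
    struct mini_event events[MINI_MAX_EVENTS];
    int count;
};

/* Step 1: create an empty base. */
void mini_base_init(struct mini_base *base) { base->count = 0; }

/* Step 2: register interest in fd becoming readable. Returns 0 or -1. */
int mini_event_add(struct mini_base *base, int fd, mini_handler h, void *arg)
{
    assert(base != NULL && h != NULL);
    if (base->count == MINI_MAX_EVENTS)
        return -1;
    base->events[base->count++] = (struct mini_event){ fd, h, arg };
    return 0;
}

/* Steps 3 and 4, one iteration: block in poll() until a registered
 * descriptor is readable (or timeout_ms elapses), then call the handlers.
 * Returns the number of handlers invoked, or -1 on error. */
int mini_base_loop_once(struct mini_base *base, int timeout_ms)
{
    struct pollfd pfds[MINI_MAX_EVENTS];
    for (int i = 0; i < base->count; i++) {
        pfds[i].fd = base->events[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)base->count, timeout_ms) < 0)
        return -1;

    int fired = 0;
    for (int i = 0; i < base->count; i++) {
        if (pfds[i].revents & POLLIN) {
            base->events[i].handler(base->events[i].fd, base->events[i].arg);
            fired++;
        }
    }
    return fired;
}

/* Example handler: consume one byte and count the invocation. */
void count_handler(int fd, void *arg)
{
    char c;
    if (read(fd, &c, 1) == 1)
        (*(int *)arg)++;
}
```

A caller would register the read end of a pipe or a socket with mini_event_add() and then call mini_base_loop_once() repeatedly; a real loop such as event_base_loop() keeps iterating until it is told to stop.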

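II. The memcached threading model
1. Initializing and starting the threads
memcached is a typical single-process, multithreaded server. After startup, the main thread initializes each module, for example by calling slabs_init() to initialize the memory management module. It also creates the worker threads and initializes their data, and finally calls event_base_loop() to enter its monitoring loop.
This section covers how the threads and their data are created and initialized. Before the code, here is the main data structure. memcached wraps the raw thread id (pthread_t) in a LIBEVENT_THREAD object, one per thread, defined as follows: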

    /*
     * File: memcached.h
     */
    typedef struct {
        pthread_t thread_id;        /* thread id */
        struct event_base *base;    /* event_base that manages all of this thread's IO events */
        struct event notify_event;  /* event object tied to notify_receive_fd below */
        int notify_receive_fd;      /* receiving end of the pipe used to talk to the main thread */
        int notify_send_fd;         /* sending end of the pipe used to talk to the main thread */
        struct thread_stats stats;  /* Stats generated by this thread */
        struct conn_queue *new_conn_queue; /* lock-protected queue that carries the data needed to
                                              initialize a conn object from the main thread to
                                              this worker thread */
        cache_t *suffix_cache;      /* suffix cache */
    } LIBEVENT_THREAD;

    /*
     * File: thread.c
     * Array of thread objects, one per worker thread
     */
    static LIBEVENT_THREAD *threads;

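The fields to focus on in LIBEVENT_THREAD are thread_id, base, notify_event, notify_receive_fd, notify_send_fd and new_conn_queue; their comments describe what each one is for.
The main thread creates and initializes the worker threads mainly through thread_init() and setup_thread(). The core of thread_init() is: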

    /*
     * File: thread.c
     * thread_init()
     */
    // 1) This for loop initializes the array of worker thread objects.
    for (i = 0; i < nthreads; i++) {
        // 1.1) Create the pipe used to talk to the main thread and fill in
        //      the notify_*_fd descriptors.
        int fds[2];
        if (pipe(fds)) {
            perror("Can't create notify pipe");
            exit(1);
        }

        threads[i].notify_receive_fd = fds[0];
        threads[i].notify_send_fd = fds[1];

        // 1.2) Mainly registers the IO event for the notify_receive_fd
        //      descriptor of threads[i].
        setup_thread(&threads[i]);
    }

    // 2) This for loop starts the worker threads. worker_libevent() mainly
    //    calls event_base_loop(), i.e. it loops monitoring the IO events
    //    registered by that thread.
    /* Create threads after we've done all the libevent setup. */
    for (i = 0; i < nthreads; i++) {
        create_worker(worker_libevent, &threads[i]);
    }

    // 3) Return only after all child threads, i.e. the worker threads,
    //    have started.
    /* Wait for all the threads to set themselves up before returning. */
    pthread_mutex_lock(&init_lock);
    while (init_count < nthreads) {
        pthread_cond_wait(&init_cond, &init_lock);
    }
    pthread_mutex_unlock(&init_lock);

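The key point of thread_init() is that it calls setup_thread() to register, for every worker thread, the IO event associated with that thread's notify_receive_fd descriptor, which is the receiving end of the pipe between the worker thread and the main thread. By registering this event, the worker thread can monitor the data (events) that the main thread sends to it. The core of setup_thread() is: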

    /*
     * File: thread.c
     * setup_thread()
     */
    // 1.2.1) Initialize the notify_event object in the thread object and
    //        register it with the event_base.
    /* Listen for notifications from other threads */
    event_set(&me->notify_event, me->notify_receive_fd,
              EV_READ | EV_PERSIST, thread_libevent_process, me);
    event_base_set(me->base, &me->notify_event);

    if (event_add(&me->notify_event, 0) == -1) {
        fprintf(stderr, "Can't monitor libevent notify pipe\n");
        exit(1);
    }

    // 1.2.2) Create and initialize the new_conn_queue queue.
    me->new_conn_queue = malloc(sizeof(struct conn_queue));
    if (me->new_conn_queue == NULL) {
        perror("Failed to allocate memory for connection queue");
        exit(EXIT_FAILURE);
    }
    cq_init(me->new_conn_queue);
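The article does not show how new_conn_queue is implemented, only that it is a lock-protected queue. A minimal sketch of such a queue, assuming a singly linked FIFO guarded by a pthread mutex, might look like the following. The demo_* names and the reduced item type are hypothetical and are not memcached's definitions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for CQ_ITEM: only the fields this sketch needs. */
struct demo_item {
    int sfd;
    struct demo_item *next;
};

/* Hypothetical stand-in for struct conn_queue: a FIFO guarded by a mutex. */
struct demo_queue {
    struct demo_item *head;
    struct demo_item *tail;
    pthread_mutex_t lock;
};

void demo_cq_init(struct demo_queue *q)
{
    assert(q != NULL);
    pthread_mutex_init(&q->lock, NULL);
    q->head = NULL;
    q->tail = NULL;
}

/* Producer side (main thread): append an item at the tail. */
void demo_cq_push(struct demo_queue *q, struct demo_item *item)
{
    item->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail == NULL)
        q->head = item;
    else
        q->tail->next = item;
    q->tail = item;
    pthread_mutex_unlock(&q->lock);
}

/* Consumer side (worker thread): detach the head item, or return NULL
 * when the queue is empty. Never blocks waiting for data. */
struct demo_item *demo_cq_pop(struct demo_queue *q)
{
    pthread_mutex_lock(&q->lock);
    struct demo_item *item = q->head;
    if (item != NULL) {
        q->head = item->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    pthread_mutex_unlock(&q->lock);
    return item;
}
```

Because the pop never blocks, the consumer needs some other signal telling it when to look at the queue; in memcached that signal is the notify pipe.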

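The snippet at 1.2.1) shows that the worker thread monitors readable events on the notify_receive_fd descriptor, that is, on the pipe it shares with the main thread, and handles them with thread_libevent_process().
Once the snippet at 3) in thread_init() has finished, every worker thread is initialized and running, and each one is monitoring and waiting to handle IO events on its notify_receive_fd descriptor.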
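The wait at step 3) is a common startup handshake built from a mutex, a condition variable and a counter. The sketch below reproduces the pattern in isolation. It assumes each worker increments the counter and signals the condition variable when it is ready, which is not shown in the excerpts above; all demo_* names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t demo_init_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t demo_init_cond = PTHREAD_COND_INITIALIZER;
static int demo_init_count = 0;

/* Worker entry point: report "I am set up" and return. A real worker
 * would go on to run its event loop instead of returning. */
static void *demo_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_init_lock);
    demo_init_count++;
    pthread_cond_signal(&demo_init_cond);
    pthread_mutex_unlock(&demo_init_lock);
    return NULL;
}

/* Starts nthreads workers and blocks until every one has reported in.
 * Returns the number of workers that reported, or -1 on failure. */
int demo_start_workers(int nthreads)
{
    pthread_t tids[16];
    assert(nthreads > 0 && nthreads <= 16);

    pthread_mutex_lock(&demo_init_lock);
    demo_init_count = 0;
    pthread_mutex_unlock(&demo_init_lock);

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, demo_worker, NULL) != 0)
            return -1;
    }

    /* The while loop re-checks the counter after every wakeup, so
     * spurious wakeups and workers that signal early are both safe. */
    pthread_mutex_lock(&demo_init_lock);
    while (demo_init_count < nthreads)
        pthread_cond_wait(&demo_init_cond, &demo_init_lock);
    int ready = demo_init_count;
    pthread_mutex_unlock(&demo_init_lock);

    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return ready;
}
```

The counter is checked while holding the lock, so a worker that finishes before the main thread starts waiting is still counted.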

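After the worker threads have started, the main thread creates the listening socket and waits for client connection requests. Note that listening for client connections (listen) is not the same thing as monitoring IO events in libevent (monitor). In memcached, sockets, like thread ids, are wrapped in a further layer: a socket is wrapped in a conn object that represents the connection to a client. The structure is large, so only the fields relevant here are shown: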

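    /*
     * File: memcached.h
     */
    typedef struct conn conn;
    struct conn {
        int    sfd;                     // raw socket
        sasl_conn_t *sasl_conn;
        enum conn_states  state;        // state variable of this connection; it marks the state the
                                        // connection is in while running. This field is important.
                                        // Its values are defined by the conn_states enum.
        enum bin_substates substate;    // similar to the state field
        struct event event;             // event object tied to this socket, i.e. the sfd field
        short  ev_flags;                // related to the previous field; the event types to monitor, e.g. EV_READ
        short  which;   /** which events were just triggered */
        // remaining fields omitted
    };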

The main thread creates the listening socket here:

    /*
     * File: memcached.c
     * server_socket()
     */
    // 4) The main thread creates and initializes the listening socket here,
    //    including registering the IO event for this conn object. Note the
    //    conn_listening argument, which sets the initial state of the conn object.
    if (!(listen_conn_add = conn_new(sfd, conn_listening,
                                     EV_READ | EV_PERSIST, 1,
                                     transport, main_base))) {
        fprintf(stderr, "failed to create listening connection\n");
        exit(EXIT_FAILURE);
    }
    listen_conn_add->next = listen_conn;
    listen_conn = listen_conn_add;

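conn_new() is an important function in memcached. It wraps a raw socket in a conn object, registers the IO event for that conn object, and sets the initial state of the connection (conn). Note that the conn object of the listening socket is initialized to the conn_listening state; this detail matters later. Part of conn_new() is shown below: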

    /*
     * File: memcached.c
     * conn_new()
     */
    // 4.1) Initialize the fields of the conn object. Note the state field.
    c->sfd = sfd;
    c->state = init_state;

    // intermediate initialization steps omitted

    // 4.2) Register the IO event for this connection.
    event_set(&c->event, sfd, event_flags, event_handler, (void *)c);
    event_base_set(base, &c->event);
    c->ev_flags = event_flags;

    if (event_add(&c->event, 0) == -1) {
        if (conn_add_to_freelist(c)) {
            conn_free(c);
        }
        perror("event_add");
        return NULL;
    }

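Once more: the state field of a connection object is an important variable. It marks the state the conn object is in while running, and its values are defined by the conn_states enum. The conn_listening constant passed to conn_new() at 4) shows that the main thread creates a connection whose initial state is conn_listening. As a preview, after a worker thread receives a dispatch from the main thread (covered in the next section), it creates a conn object whose initial state is conn_new_cmd.
Registering IO events should be familiar by now, so I will not repeat it. The point to note is that every conn object in memcached uses the same handler, event_handler(), which hands the main event processing to drive_machine(). That function is fully responsible for handling events on client connections. After the main thread finishes initialization, it enters its monitoring loop through event_base_loop() and starts waiting for connection requests on the listening socket.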

2. Creating and dispatching client connections

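Once the startup steps from the previous section are complete, the memcached main thread monitors readable events on the listening socket, that is, it waits for client connection requests, while each worker thread monitors readable events on its own notify_receive_fd descriptor, that is, it waits for data from the main thread. Now let us see how memcached handles a connection request from a client. From the earlier discussion of the listening socket, we know that when a client sends a connection request, the main thread wakes up because the listening socket has become readable and calls event_handler() to handle the request. That function calls drive_machine(), and the part that handles client connection requests is: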

    /*
     * File: memcached.c
     * drive_machine()
     */
    switch(c->state) {
    case conn_listening:
        // 5) The following lines establish the connection to the client
        //    and obtain the sfd socket.
        addrlen = sizeof(addr);
        if ((sfd = accept(c->sfd, (struct sockaddr *)&addr, &addrlen)) == -1) {
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                /* these are transient, so don't log anything */
                stop = true;
            } else if (errno == EMFILE) {
                if (settings.verbose > 0)
                    fprintf(stderr, "Too many open connections\n");
                accept_new_conns(false);
                stop = true;
            } else {
                perror("accept()");
                stop = true;
            }
            break;
        }
        if ((flags = fcntl(sfd, F_GETFL, 0)) < 0 ||
            fcntl(sfd, F_SETFL, flags | O_NONBLOCK) < 0) {
            perror("setting O_NONBLOCK");
            close(sfd);
            break;
        }

        // 6) This call passes the raw socket created by the main thread,
        //    together with some initialization data, to a chosen worker thread.
        dispatch_conn_new(sfd, conn_new_cmd, EV_READ | EV_PERSIST,
                          DATA_BUFFER_SIZE, tcp_transport);
        stop = true;
        break;

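This is where the state field of the conn object comes into play. drive_machine() is one huge switch statement that picks a branch according to the current state of the conn object, that is, the value of its state field. Because the conn object of the listening socket was initialized to the conn_listening state, drive_machine() executes the case conn_listening branch, which creates and dispatches the client connection. See the snippet at 5).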
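The idea of one shared handler whose behavior depends only on the connection's state can be sketched in a few lines. This is an illustration only: the three states and the transitions below are hypothetical and much simpler than the real conn_states enum and drive_machine().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced set of states; the real conn_states enum is larger. */
enum demo_state { DEMO_LISTENING, DEMO_NEW_CMD, DEMO_CLOSING };

struct demo_conn {
    enum demo_state state;
    int accepted;   /* how many times the listening branch ran */
    int commands;   /* how many times the command branch ran */
};

/* Every connection shares this one handler; the state field alone decides
 * which branch runs. Returns the state the connection ends up in. */
enum demo_state demo_drive_machine(struct demo_conn *c)
{
    assert(c != NULL);
    switch (c->state) {
    case DEMO_LISTENING:
        /* A listening connection accepts a client and stays listening. */
        c->accepted++;
        break;
    case DEMO_NEW_CMD:
        /* A client connection handles one command and then closes
         * (a simplification made only for this sketch). */
        c->commands++;
        c->state = DEMO_CLOSING;
        break;
    case DEMO_CLOSING:
        /* Nothing left to do. */
        break;
    }
    return c->state;
}
```

Passing a connection in the DEMO_LISTENING state runs only the accept branch, while one in DEMO_NEW_CMD runs only the command branch, even though both go through the same function.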

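In the conn_listening branch, the main thread uses dispatch_conn_new() to pass the client connection socket (still just a raw socket at this point) and other initialization data to a worker thread. This is where the pipe between the main thread and the worker thread, and the new_conn_queue queue in the thread object, both introduced in the previous section, are used. The code is: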

    void dispatch_conn_new(int sfd, enum conn_states init_state, int event_flags,
                           int read_buffer_size, enum network_transport transport) {
        // 6.1) Create a CQ_ITEM object and pick the worker thread that will
        //      receive it with a simple modulo scheme.
        CQ_ITEM *item = cqi_new();
        int tid = (last_thread + 1) % settings.num_threads;

        LIBEVENT_THREAD *thread = threads + tid;

        last_thread = tid;

        // 6.2) Initialize the new CQ_ITEM object.
        item->sfd = sfd;
        item->init_state = init_state;
        item->event_flags = event_flags;
        item->read_buffer_size = read_buffer_size;
        item->transport = transport;

        // 6.3) Push the CQ_ITEM object onto the new_conn_queue queue.
        cq_push(thread->new_conn_queue, item);

        // 6.4) Write one byte to the pipe connected to the worker thread.
        MEMCACHED_CONN_DISPATCH(sfd, thread->thread_id);
        if (write(thread->notify_send_fd, "", 1) != 1) {
            perror("Writing to thread notify pipe");
        }
    }
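The modulo scheme at 6.1) is plain round-robin selection. Isolated from the rest of the function it might be sketched as follows; demo_last_thread and demo_pick_worker() are hypothetical names, and the starting value of -1 is an assumption of this sketch rather than something the excerpt shows.

```c
#include <assert.h>

/* Index of the worker chosen most recently. -1 means "none yet", so the
 * first call returns worker 0 (starting value assumed for this sketch). */
static int demo_last_thread = -1;

/* Returns the index of the next worker in round-robin order. */
int demo_pick_worker(int num_threads)
{
    assert(num_threads > 0);
    int tid = (demo_last_thread + 1) % num_threads;
    demo_last_thread = tid;
    return tid;
}
```

With three workers, successive calls return 0, 1, 2 and then 0 again, so new connections are spread evenly by count. The scheme does not take into account how busy each worker currently is.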

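dispatch_conn_new() mainly creates and initializes a CQ_ITEM object, which holds the data needed to initialize a conn object, such as the raw socket (sfd) and the initial state (init_state), and then passes the CQ_ITEM to the selected worker thread. As noted when the LIBEVENT_THREAD object was introduced, new_conn_queue carries data between the two threads; here it carries a CQ_ITEM to the worker thread. Note also that the main thread writes one byte to the pipe connected to the worker thread. The purpose is to trigger a readable event at the other end of the pipe, the notify_receive_fd descriptor. Now let us see what happens in the worker thread at that end.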
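We know that once memcached has started, each worker thread monitors readable events on its notify_receive_fd descriptor. Because the main thread wrote one byte to the pipe, the worker thread wakes up on the readable event and calls thread_libevent_process(), the handler named at registration, to process it. The core of that function is: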

    /*
     * File: thread.c
     * thread_libevent_process()
     */
    // 7) Read one byte from the pipe. This is the byte the main thread
    //    wrote to the notify_send_fd descriptor earlier.
    if (read(fd, buf, 1) != 1)
        if (settings.verbose > 0)
            fprintf(stderr, "Can't read from libevent pipe\n");

    // 8) Pop one CQ_ITEM object from the new_conn_queue queue. This is the
    //    object the main thread pushed onto the queue earlier.
    item = cq_pop(me->new_conn_queue);

    // 9) Use the CQ_ITEM object to create and initialize a conn object,
    //    which handles communication between the client and this worker thread.
    if (NULL != item) {
        conn *c = conn_new(item->sfd, item->init_state, item->event_flags,
                           item->read_buffer_size, item->transport, me->base);
        // rest omitted

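Note that the byte read from the pipe at 7) is the byte the main thread wrote at 6.4). The byte itself carries no meaning; its only purpose is to trigger the readable event on the worker thread's notify_receive_fd descriptor. The worker thread then pops the CQ_ITEM at 8) and uses it at 9) to create and initialize a conn object. Because the main thread passed conn_new_cmd as init_state at 6), the state field of the conn object created by the worker thread is initialized to conn_new_cmd.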

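This completes the whole flow: the client sends a connection request, the main thread creates the raw socket, the raw socket and other initialization data are dispatched to a worker thread, and finally the worker thread creates the conn object and takes over communication with the client. From this point the worker thread monitors readable events on that client connection and is ready to process incoming client data with event_handler().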
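The hand-off as a whole, publishing the data first and then writing one byte to wake the other thread, can be reproduced in a small self-contained sketch. It makes several simplifying assumptions: the worker uses a blocking read() instead of an event loop, the queue is reduced to a single slot, and the value passed as sfd is just an integer rather than a real socket. All demo_* names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical worker object: a notify pipe plus a one-slot "queue". */
struct demo_worker {
    int notify_receive_fd;
    int notify_send_fd;
    pthread_mutex_t lock;
    int pending_sfd;    /* -1 when the slot is empty */
    int received_sfd;   /* what the worker picked up */
};

/* Worker side: block until the pipe is readable, consume the wake-up
 * byte, then take the pending descriptor out of the slot. */
static void *demo_worker_main(void *arg)
{
    struct demo_worker *me = arg;
    char buf[1];
    if (read(me->notify_receive_fd, buf, 1) != 1)
        return NULL;
    pthread_mutex_lock(&me->lock);
    me->received_sfd = me->pending_sfd;
    me->pending_sfd = -1;
    pthread_mutex_unlock(&me->lock);
    return NULL;
}

/* Main-thread side: publish the descriptor first, then write one byte so
 * the worker's end of the pipe becomes readable. */
static int demo_dispatch(struct demo_worker *w, int sfd)
{
    pthread_mutex_lock(&w->lock);
    w->pending_sfd = sfd;
    pthread_mutex_unlock(&w->lock);
    return write(w->notify_send_fd, "", 1) == 1 ? 0 : -1;
}

/* Runs one hand-off and returns the value the worker received, or -1. */
int demo_handoff(int sfd)
{
    struct demo_worker w;
    int fds[2];
    pthread_t tid;

    assert(sfd >= 0);
    if (pipe(fds) != 0)
        return -1;
    w.notify_receive_fd = fds[0];
    w.notify_send_fd = fds[1];
    w.pending_sfd = -1;
    w.received_sfd = -1;
    pthread_mutex_init(&w.lock, NULL);

    if (pthread_create(&tid, NULL, demo_worker_main, &w) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    int rc = demo_dispatch(&w, sfd);
    close(fds[1]);              /* the worker's read() cannot block forever now */
    pthread_join(tid, NULL);
    close(fds[0]);
    pthread_mutex_destroy(&w.lock);
    return rc == 0 ? w.received_sfd : -1;
}
```

The ordering matters: the data is placed in the slot before the byte is written, so by the time the worker wakes up there is always something to pick up.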

Reference: 1) http://bachmozart.iteye.com/blog/344172
