Redis source code analysis 23-VM (I)

Source: Internet
Author: User
VM (virtual memory) is a new feature in Redis 2.0. Without VM, Redis keeps all the data of every db in memory, so memory usage keeps growing as Redis runs, even though clients access some data far more frequently than the rest. Redis introduces VM to address this problem. In short, VM lets Redis save infrequently accessed values to disk. The keys of all values stay in memory, so looking up a key performs about the same with or without VM enabled.

VM is one of the most complex modules in Redis, so we will cover it in three parts. This part describes the main VM data structures, the next part describes the non-blocking mode, and the last part describes the multi-threaded mode.

Let's take a look at redis's general object structure redisObject:

    // When the VM is enabled, the location of the object
    #define REDIS_VM_MEMORY 0       /* The object is on memory */
    #define REDIS_VM_SWAPPED 1      /* The object is on disk */
    #define REDIS_VM_SWAPPING 2     /* Redis is swapping this object on disk */
    #define REDIS_VM_LOADING 3      /* Redis is loading this object from disk */

    /* The VM object structure */
    struct redisObjectVM {
        off_t page;         /* the page at which the object is stored on disk */
        off_t usedpages;    /* Number of pages used on disk */
        time_t atime;       /* Last access time */
    } vm;

    /* The actual Redis Object */
    // Common object type, also used for keys
    typedef struct redisObject {
        void *ptr;
        unsigned char type;
        unsigned char encoding;
        unsigned char storage;  /* If this object is a key, where is the value?
                                 * REDIS_VM_MEMORY, REDIS_VM_SWAPPED, ... */
        unsigned char vtype;    /* If this object is a key, and value is swapped out,
                                 * this is the type of the swapped out object. */
        int refcount;
        /* VM fields, this are only allocated if VM is active, otherwise the
         * object allocation function will just allocate
         * sizeof(redisObject) minus sizeof(redisObjectVM), so using
         * Redis without VM active will not have any overhead. */
        struct redisObjectVM vm;
    } robj;

The type field in robj stores the object type, such as string, list, or set. storage records where the value corresponding to a key object currently lives: in memory, on disk, being swapped out to disk, or being loaded from disk. vtype is the type of the value corresponding to the key object. page and usedpages record where the value is stored in the swap file: the first page and the number of pages it occupies. atime is the last access time of the value. Therefore, when the storage field of a key object is REDIS_VM_SWAPPED, the value of that key is no longer in memory and must be loaded from the swap file starting at page; its type is vtype and it occupies usedpages pages.
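To make the role of page and usedpages concrete, here is a minimal sketch that computes the byte range a swapped-out value would occupy in the swap file. It assumes that page N starts at byte N * page_size, which is what a page-based layout suggests. The names key_vm_info, swap_extent and swapped_extent, and the page_size parameter, are hypothetical and not part of Redis.

```c
#include <assert.h>
#include <sys/types.h>

#define REDIS_VM_MEMORY  0
#define REDIS_VM_SWAPPED 1

/* Hypothetical, trimmed-down stand-in for the VM-related fields of robj. */
struct key_vm_info {
    unsigned char storage;  /* REDIS_VM_MEMORY, REDIS_VM_SWAPPED, ... */
    off_t page;             /* first page of the value in the swap file */
    off_t usedpages;        /* number of consecutive pages it occupies */
};

/* Byte range inside the swap file (hypothetical helper type). */
struct swap_extent {
    off_t offset;           /* where the value starts */
    off_t length;           /* upper bound on the bytes it may occupy */
};

/* Returns 1 and fills *out when the value is on disk, 0 otherwise.
 * Assumption: page N begins at byte N * page_size. */
static int swapped_extent(const struct key_vm_info *k, off_t page_size,
                          struct swap_extent *out) {
    assert(page_size > 0);
    if (k->storage != REDIS_VM_SWAPPED) return 0;
    out->offset = k->page * page_size;
    out->length = k->usedpages * page_size;
    return 1;
}
```

For example, with a page size of 32 bytes, a value stored at page 10 and using 3 pages would start at byte 320 and occupy at most 96 bytes.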

When an object is created, createObject allocates a robj of the appropriate size depending on whether VM is enabled.

    static robj *createObject(int type, void *ptr) {
        ---
        else {
            if (server.vm_enabled) {
                pthread_mutex_unlock(&server.obj_freelist_mutex);
                o = zmalloc(sizeof(*o));
            } else {
                o = zmalloc(sizeof(*o)-sizeof(struct redisObjectVM));
            }
        }
        ---
        if (server.vm_enabled) {
            /* Note that this code may run in the context of an I/O thread
             * and accessing to server.unixtime in theory is an error
             * (no locks). But in practice this is safe, and even if we read
             * garbage Redis will not fail, as it's just a statistical info */
            o->vm.atime = server.unixtime;
            o->storage = REDIS_VM_MEMORY;
        }
        return o;
    }
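The allocation trick above works only because the VM fields are the last member of the struct. The following standalone sketch illustrates the idea with plain malloc. The names obj, obj_vm, obj_alloc_size and obj_create are hypothetical stand-ins for illustration, not the Redis functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical stand-in for struct redisObjectVM. */
struct obj_vm {
    off_t page;
    off_t usedpages;
    time_t atime;
};

/* Hypothetical stand-in for robj. */
typedef struct obj {
    void *ptr;
    unsigned char type;
    unsigned char encoding;
    unsigned char storage;
    unsigned char vtype;
    int refcount;
    struct obj_vm vm;   /* must remain the LAST member for the trick to work */
} obj;

/* Bytes to allocate: the trailing VM fields are dropped when VM is off. */
static size_t obj_alloc_size(int vm_enabled) {
    return vm_enabled ? sizeof(obj) : sizeof(obj) - sizeof(struct obj_vm);
}

static obj *obj_create(int vm_enabled, void *ptr) {
    obj *o = malloc(obj_alloc_size(vm_enabled));
    if (o == NULL) return NULL;
    o->ptr = ptr;
    o->type = 0;
    o->encoding = 0;
    o->storage = 0;     /* 0 plays the role of REDIS_VM_MEMORY here */
    o->vtype = 0;
    o->refcount = 1;
    if (vm_enabled) {
        /* Only touch o->vm when the memory for it was actually allocated. */
        o->vm.page = 0;
        o->vm.usedpages = 0;
        o->vm.atime = time(NULL);
    }
    return o;
}
```

The design choice trades safety for memory: code running with VM disabled must never read or write o->vm, because those bytes are not part of the allocation. Some compilers may warn about the undersized allocation.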

All relevant VM structures are stored in the following fields of redisServer.

    /* Global server state structure */
    struct redisServer {
        ---
        /* Virtual memory state */
        FILE *vm_fp;
        int vm_fd;
        off_t vm_next_page;         /* Next probably empty page */
        off_t vm_near_pages;        /* Number of pages allocated sequentially */
        unsigned char *vm_bitmap;   /* Bitmap of free/used pages */
        time_t unixtime;            /* Unix time sampled every second. */
        /* Virtual memory I/O threads stuff */
        /* An I/O thread process an element taken from the io_jobs queue and
         * put the result of the operation in the io_done list. While the
         * job is being processed, it's put on io_processing queue. */
        list *io_newjobs;       /* List of VM I/O jobs yet to be processed */
        list *io_processing;    /* List of VM I/O jobs being processed */
        list *io_processed;     /* List of VM I/O jobs already processed */
        list *io_ready_clients; /* Clients ready to be unblocked. All keys loaded */
        pthread_mutex_t io_mutex;           /* lock to access io_jobs/io_done/io_thread_job */
        pthread_mutex_t obj_freelist_mutex; /* safe redis objects creation/free */
        pthread_mutex_t io_swapfile_mutex;  /* So we can lseek + write */
        pthread_attr_t io_threads_attr;     /* attributes for threads creation */
        int io_active_threads;  /* Number of running I/O threads */
        int vm_max_threads;     /* Max number of I/O threads running at the same time */
        /* Our main thread is blocked on the event loop, locking for sockets ready
         * to be read or written, so when a threaded I/O operation is ready to be
         * processed by the main thread, the I/O thread will use a unix pipe to
         * awake the main thread. The followings are the two pipe FDs. */
        int io_ready_pipe_read;
        int io_ready_pipe_write;
        /* Virtual memory stats */
        unsigned long long vm_stats_used_pages;
        unsigned long long vm_stats_swapped_objects;
        unsigned long long vm_stats_swapouts;
        unsigned long long vm_stats_swapins;
        ---
    };

vm_fp and vm_fd both refer to the swap file on disk (a FILE pointer and its file descriptor) and are used to read and write that file. vm_bitmap tracks the allocation and release of each page in the swap file (a 0 bit means the page is free, a 1 bit means it is in use). The size of each page is configured by vm-page-size, and the number of pages by vm-pages. It is worth mentioning that a page can hold only one object, while one object may span several consecutive pages. unixtime is just a cached time value, used when computing how recently a value was accessed. The remaining fields relate to swapping values in and out with multiple threads. In multi-threaded mode, each swap-in or swap-out of a value is treated as a job. The job types are as follows:

    /* VM threaded I/O request message */
    #define REDIS_IOJOB_LOAD 0          /* Load from disk to memory */
    #define REDIS_IOJOB_PREPARE_SWAP 1  /* Compute needed pages */
    #define REDIS_IOJOB_DO_SWAP 2       /* Swap from memory to disk */

    typedef struct iojob {
        int type;       /* Request type, REDIS_IOJOB_* */
        redisDb *db;    /* Redis database */
        robj *key;      /* This I/O request is about swapping this key */
        robj *val;      /* the value to swap for REDIS_IOREQ_*_SWAP, otherwise this
                         * field is populated by the I/O thread for REDIS_IOREQ_LOAD. */
        off_t page;     /* Swap page where to read/write the object */
        off_t pages;    /* Swap pages needed to save object. PREPARE_SWAP return val */
        int canceled;   /* True if this command was canceled by blocking side of VM */
        pthread_t thread;   /* ID of the thread processing this entry */
    } iojob;

A job of type REDIS_IOJOB_LOAD loads a value from disk into memory, and a job of type REDIS_IOJOB_DO_SWAP swaps a value out to disk. Before a value is swapped out, a job of type REDIS_IOJOB_PREPARE_SWAP is created to compute the number of pages needed to store it.
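The page count and the page bitmap can be illustrated with a small sketch. It assumes that len is the serialized length of the value in bytes and uses a plain first-fit search for a run of free pages; Redis itself keeps hints such as vm_next_page and vm_near_pages, which this sketch ignores. All function names here (pages_needed, page_is_used, mark_pages, find_free_run) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pages needed to hold len bytes: integer division rounded up. */
static size_t pages_needed(size_t len, size_t page_size) {
    assert(page_size > 0);
    return (len + page_size - 1) / page_size;
}

/* One bit per page: 0 = free, 1 = used. */
static int page_is_used(const unsigned char *bitmap, size_t page) {
    return (bitmap[page / 8] >> (page % 8)) & 1;
}

/* Mark count pages starting at first as used (used != 0) or free. */
static void mark_pages(unsigned char *bitmap, size_t first, size_t count,
                       int used) {
    for (size_t p = first; p < first + count; p++) {
        unsigned char mask = (unsigned char)(1u << (p % 8));
        if (used) bitmap[p / 8] |= mask;
        else      bitmap[p / 8] &= (unsigned char)~mask;
    }
}

/* First-fit search for count consecutive free pages.
 * Returns 1 and stores the first page in *first, or 0 if none exists. */
static int find_free_run(const unsigned char *bitmap, size_t total_pages,
                         size_t count, size_t *first) {
    size_t run = 0;
    assert(count > 0);
    for (size_t p = 0; p < total_pages; p++) {
        run = page_is_used(bitmap, p) ? 0 : run + 1;
        if (run == count) {
            *first = p + 1 - count;
            return 1;
        }
    }
    return 0;
}
```

With 32-byte pages, a 100-byte value needs 4 pages; since an object must sit on consecutive pages, the search has to find 4 adjacent free bits rather than any 4 free bits.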

Whichever of the three types it is, a newly created job is added to the io_newjobs queue by queueIOJob. The thread entry function IOThreadEntryPoint moves the job from io_newjobs to server.io_processing, performs the work for its type (loading the value, computing the number of pages needed to swap the value out, or swapping the value out), and then moves the job from server.io_processing to io_processed. Finally, it writes one byte to server.io_ready_pipe_write (io_ready_pipe_read and io_ready_pipe_write are the two ends of a pipe), which wakes the main thread sleeping in the event loop so that vmThreadedIOCompletedJob can run.
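The wake-up mechanism can be sketched in isolation from the job queues: one byte written to the pipe stands for one completed job, and the reader drains the pipe without blocking. This is a simplified illustration using POSIX pipe, fcntl, read and write; the type io_ready_pipe and the functions io_ready_pipe_open, io_ready_notify and io_ready_drain are hypothetical names, and reading in batches is a simplification rather than a claim about how Redis reads the pipe.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical holder for the two ends of the notification pipe. */
struct io_ready_pipe {
    int read_fd;    /* watched by the main thread's event loop */
    int write_fd;   /* written by I/O threads */
};

/* Create the pipe and make the read end non-blocking. 0 on success. */
static int io_ready_pipe_open(struct io_ready_pipe *p) {
    int fds[2];
    int flags;
    if (pipe(fds) == -1) return -1;
    flags = fcntl(fds[0], F_GETFL);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    p->read_fd = fds[0];
    p->write_fd = fds[1];
    return 0;
}

/* I/O thread side: signal one finished job by writing a single byte. */
static int io_ready_notify(const struct io_ready_pipe *p) {
    ssize_t n;
    do {
        n = write(p->write_fd, "x", 1);
    } while (n == -1 && errno == EINTR);
    return n == 1 ? 0 : -1;
}

/* Main thread side: read until the pipe is empty and return how many
 * notifications (bytes) were pending. */
static int io_ready_drain(const struct io_ready_pipe *p) {
    char buf[16];
    int completed = 0;
    for (;;) {
        ssize_t n = read(p->read_fd, buf, sizeof(buf));
        if (n > 0) { completed += (int)n; continue; }
        if (n == -1 && errno == EINTR) continue;
        break;  /* pipe empty (EAGAIN), end of file, or another error */
    }
    return completed;
}
```

Because the read end is an ordinary file descriptor, the event loop can wait on it together with client sockets, so the main thread needs no separate mechanism to learn that an I/O thread has finished a job.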

io_ready_clients holds the list of clients that can resume running (they were blocked earlier while waiting for a value to be loaded). The remaining fields are the mutexes and thread attributes used in multi-threaded mode and the global VM statistics.

The main task of VM initialization in vmInit is to initialize the structures described above. In addition, its most important step is to register vmThreadedIOCompletedJob as the read-event handler for the pipe. This function runs whenever the pipe becomes readable and is closely tied to multi-threaded operation.

    static void vmInit(void) {
        off_t totsize;
        int pipefds[2];
        size_t stacksize;
        struct flock fl;

        if (server.vm_max_threads != 0)
            zmalloc_enable_thread_safeness(); /* we need thread safe zmalloc() */

        redisLog(REDIS_NOTICE,"Using '%s' as swap file",server.vm_swap_file);
        /* Try to open the old swap file, otherwise create it */
        if ((server.vm_fp = fopen(server.vm_swap_file,"r+b")) == NULL) {
            server.vm_fp = fopen(server.vm_swap_file,"w+b");
        }
        if (server.vm_fp == NULL) {
            redisLog(REDIS_WARNING,
                "Can't open the swap file: %s. Exiting.",
                strerror(errno));
            exit(1);
        }
        server.vm_fd = fileno(server.vm_fp);
        /* Lock the swap file for writing, this is useful in order to avoid
         * another instance to use the same swap file for a config error. */
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start = fl.l_len = 0;
        if (fcntl(server.vm_fd,F_SETLK,&fl) == -1) {
            redisLog(REDIS_WARNING,
                "Can't lock the swap file at '%s': %s. Make sure it is not used by another Redis instance.",
                server.vm_swap_file, strerror(errno));
            exit(1);
        }
        /* Initialize */
        server.vm_next_page = 0;
        server.vm_near_pages = 0;
        server.vm_stats_used_pages = 0;
        server.vm_stats_swapped_objects = 0;
        server.vm_stats_swapouts = 0;
        server.vm_stats_swapins = 0;
        totsize = server.vm_pages*server.vm_page_size;
        redisLog(REDIS_NOTICE,"Allocating %lld bytes of swap file",totsize);
        if (ftruncate(server.vm_fd,totsize) == -1) {
            redisLog(REDIS_WARNING,"Can't ftruncate swap file: %s. Exiting.",
                strerror(errno));
            exit(1);
        } else {
            redisLog(REDIS_NOTICE,"Swap file allocated with success");
        }
        server.vm_bitmap = zmalloc((server.vm_pages+7)/8);
        redisLog(REDIS_VERBOSE,"Allocated %lld bytes page table for %lld pages",
            (long long) (server.vm_pages+7)/8, server.vm_pages);
        memset(server.vm_bitmap,0,(server.vm_pages+7)/8);

        /* Initialize threaded I/O (used by Virtual Memory) */
        server.io_newjobs = listCreate();
        server.io_processing = listCreate();
        server.io_processed = listCreate();
        server.io_ready_clients = listCreate();
        pthread_mutex_init(&server.io_mutex,NULL);
        pthread_mutex_init(&server.obj_freelist_mutex,NULL);
        pthread_mutex_init(&server.io_swapfile_mutex,NULL);
        server.io_active_threads = 0;
        if (pipe(pipefds) == -1) {
            redisLog(REDIS_WARNING,"Unable to intialized VM: pipe(2): %s. Exiting."
                ,strerror(errno));
            exit(1);
        }
        server.io_ready_pipe_read = pipefds[0];
        server.io_ready_pipe_write = pipefds[1];
        redisAssert(anetNonBlock(NULL,server.io_ready_pipe_read) != ANET_ERR);
        /* LZF requires a lot of stack */
        pthread_attr_init(&server.io_threads_attr);
        pthread_attr_getstacksize(&server.io_threads_attr, &stacksize);
        /* Solaris may report a stacksize of 0, let's set it to 1 otherwise
         * multiplying it by 2 in the while loop later will not really help */
        if (!stacksize) stacksize = 1;
        while (stacksize < REDIS_THREAD_STACK_SIZE) stacksize *= 2;
        pthread_attr_setstacksize(&server.io_threads_attr, stacksize);
        /* Listen for events in the threaded I/O pipe */
        if (aeCreateFileEvent(server.el, server.io_ready_pipe_read, AE_READABLE,
            vmThreadedIOCompletedJob, NULL) == AE_ERR)
            oom("creating file event");
    }

Original article address: redis source code analysis 23-VM (I), thanks to the original author for sharing.
