memcached Source Analysis ----- Item Lock Levels and Item Reference Counting




Reprint: please credit the source: http://blog.csdn.net/luotuo44/article/details/42913549



Lock levels:

As the previous post on hash table expansion explained: while the hash table is being expanded, a dedicated thread is responsible for migrating items from the old hash table to the new one (this thread is also called the migration thread). Meanwhile, worker threads still access items from time to time (insertions, deletions, and lookups). These threads' operations on items are mutually exclusive and must be controlled by locks.

If only one lock were used, a thread could touch the hash table only after grabbing that lock, and could do nothing without it. That would make memcached quite inefficient. To avoid this, memcached uses different lock granularities, much like a database. It defines two lock levels: segment level and global level. In normal operation (when the hash table is not being expanded), segment-level locks are used. While the hash table is being expanded, the global lock is used.

What is the segment level? Divide the hash table's buckets into groups: a segment consists of multiple buckets, so the whole hash table is covered by multiple segment-level locks. The number of segment locks is fixed when the program starts and never changes afterwards, while expansion increases the number of buckets. Therefore, as the hash table grows, more and more buckets map to each segment, that is, more and more buckets share each lock.


While the hash table is being expanded, both the migration thread and the worker threads use the global lock. They compete for it, and only the thread holding the lock may manipulate items in the hash table. When no expansion is in progress, the migration thread sleeps, and a worker thread uses segment-level locks: after grabbing a segment lock, it may access the buckets belonging to that segment. This way, worker threads accessing different segments can proceed at the same time, increasing concurrency.

Here are the definitions of the segment-level and global locks. The thread_init function allocates and initializes the segment-level locks.

static pthread_mutex_t *item_locks;      /* pointer to the segment-lock array */
/* size of the item lock hash table */
static uint32_t item_lock_count;         /* number of segment locks */
static unsigned int item_lock_hashpower;
static pthread_mutex_t item_global_lock; /* global lock */

#define hashsize(n) ((unsigned long int)1 << (n))

void thread_init(int nthreads, struct event_base *main_base) {
    int i;
    int power;

    pthread_mutex_init(&cache_lock, NULL);
    pthread_mutex_init(&init_lock, NULL);
    pthread_cond_init(&init_cond, NULL);

    /* nthreads is the number of worker threads passed in from main() */
    if (nthreads < 3) {
        power = 10;
    } else if (nthreads < 4) {
        power = 11;
    } else if (nthreads < 5) {
        power = 12;
    } else {
        /* max. 8192 buckets, and central locks don't scale much past 5 threads */
        power = 13;
    }

    /* the lock count is a power of 2 */
    item_lock_count = hashsize(power);
    item_lock_hashpower = power;

    /* segment-level locks: not one lock per bucket; multiple buckets share each lock */
    item_locks = calloc(item_lock_count, sizeof(pthread_mutex_t));
    if (!item_locks) {
        perror("Can't allocate item locks");
        exit(1);
    }
    for (i = 0; i < item_lock_count; i++) {
        pthread_mutex_init(&item_locks[i], NULL);
    }

    pthread_mutex_init(&item_global_lock, NULL);
    ...
}


Switching lock levels:

Now look at how the segment-level and global locks are used. The migration thread never uses segment-level locks: in the assoc_maintenance_thread function in assoc.c, it only calls item_lock_global() to take the global lock item_global_lock. The interesting part is how the worker threads choose between segment-level and global locks.


Lock level for worker threads:

When a worker thread wants to access an item in the hash table, it first calls item_lock to take a lock. item_lock automatically chooses between the segment-level and global locks as needed. Here is the code.

//memcached.h
/* item lock levels */
enum item_lock_types {
    ITEM_LOCK_GRANULAR = 0, /* segment level */
    ITEM_LOCK_GLOBAL        /* global level */
};

//thread.c
static pthread_key_t item_lock_type_key; /* key for the thread-private data */

void item_lock(uint32_t hv) {
    /* read this thread's private variable */
    uint8_t *lock_type = pthread_getspecific(item_lock_type_key);

    /* likely() is a branch-prediction hint: it tells the compiler that
     * *lock_type == ITEM_LOCK_GRANULAR is by far the common case */
    if (likely(*lock_type == ITEM_LOCK_GRANULAR)) {
        /* lock only the segment covering this item's bucket */
        mutex_lock(&item_locks[hv & hashmask(item_lock_hashpower)]);
    } else {
        /* lock all items */
        mutex_lock(&item_global_lock);
    }
}

void item_unlock(uint32_t hv) {
    uint8_t *lock_type = pthread_getspecific(item_lock_type_key);
    if (likely(*lock_type == ITEM_LOCK_GRANULAR)) {
        mutex_unlock(&item_locks[hv & hashmask(item_lock_hashpower)]);
    } else {
        mutex_unlock(&item_global_lock);
    }
}

As you can see, memcached decides which lock to use based on a thread-private variable (stored under the key item_lock_type_key). Each worker thread sets its own thread-private data under that key, so switching locks is just a matter of modifying the thread's private variable. Now look at how the workers' private data is initialized.

static LIBEVENT_THREAD *threads;

void thread_init(int nthreads, struct event_base *main_base) {
    ...
    pthread_key_create(&item_lock_type_key, NULL);

    for (i = 0; i < nthreads; i++) {
        /* create a worker thread; the thread function is worker_libevent,
         * the thread argument is &threads[i] */
        create_worker(worker_libevent, &threads[i]);
    }
    ...
}

/* each worker thread calls this function at initialization */
static void *worker_libevent(void *arg) {
    LIBEVENT_THREAD *me = arg;

    /* start out with segment-level locks */
    me->item_lock_type = ITEM_LOCK_GRANULAR;

    /* Every worker thread runs this, so every worker stores its own
     * item_lock_type as thread-private data under the same key. */
    pthread_setspecific(item_lock_type_key, &me->item_lock_type);
    ...
}


Performing the switch:

As the code shows, each thread's private data points to the item_lock_type member of that thread's own LIBEVENT_THREAD structure. Switching locks simply means changing each worker thread's item_lock_type variable as needed. The migration thread calls the switch_item_lock_type function (from assoc_maintenance_thread in assoc.c) to make all worker threads switch to segment-level or global locks. Here is its implementation.

void switch_item_lock_type(enum item_lock_types type) {
    char buf[1];
    int i;

    switch (type) {
        case ITEM_LOCK_GRANULAR:
            buf[0] = 'l'; /* 'l' means switch to segment-level locks */
            break;
        case ITEM_LOCK_GLOBAL:
            buf[0] = 'g'; /* 'g' means switch to the global lock */
            break;
        default:
            fprintf(stderr, "Unknown lock type: %d\n", type);
            assert(1 == 0);
            break;
    }

    pthread_mutex_lock(&init_lock);
    init_count = 0;
    for (i = 0; i < settings.num_threads; i++) {
        /* notify each worker by writing one character to the pipe it listens on */
        if (write(threads[i].notify_send_fd, buf, 1) != 1) {
            perror("Failed writing to notify pipe");
            /* TODO: this is a fatal problem. Can it ever happen temporarily? */
        }
    }
    /* wait until every worker has switched to the lock type given by type */
    wait_for_thread_registration(settings.num_threads);
    pthread_mutex_unlock(&init_lock);
}

/* called with init_lock held */
static void wait_for_thread_registration(int nthreads) {
    while (init_count < nthreads) {
        pthread_cond_wait(&init_cond, &init_lock);
    }
}

Because every worker thread sits in an event_base loop, a worker can be notified simply by writing one byte to the pipe it listens on.

Why does the migration thread switch the workers' lock type in such a roundabout way? Couldn't it just directly modify the item_lock_type member in every thread's LIBEVENT_THREAD structure?

Mainly because the migration thread does not know what a worker thread is doing at that moment. Suppose a worker is accessing an item and holds a segment-level lock. If the migration thread switched that worker's lock type to global at this point, then when the worker unlocks, it would release the global lock instead (refer back to the item_lock and item_unlock code), and the program would crash. So the migration thread cannot do the switch itself; it can only notify the workers, and each worker switches on its own, naturally after finishing whatever it is doing. That is why, after notifying all workers, the migration thread calls wait_for_thread_registration and sleeps until every worker has switched to the requested lock type.

Now look at how a worker thread performs the switch. Since the migration thread writes a character into the pipe the worker listens on, we go straight to thread_libevent_process, the pipe event callback registered by the worker.

static void thread_libevent_process(int fd, short which, void *arg) {
    LIBEVENT_THREAD *me = arg;
    char buf[1];

    if (read(fd, buf, 1) != 1)
        if (settings.verbose > 0)
            fprintf(stderr, "Can't read from libevent pipe\n");

    switch (buf[0]) {
    ...
    case 'l':
        /* switch item locking to the segment level */
        me->item_lock_type = ITEM_LOCK_GRANULAR;
        /* wake the migration thread sleeping on the init_cond condition variable */
        register_thread_initialized();
        break;
    case 'g':
        /* switch item locking to the global level */
        me->item_lock_type = ITEM_LOCK_GLOBAL;
        register_thread_initialized();
        break;
    }
}

static void register_thread_initialized(void) {
    pthread_mutex_lock(&init_lock);
    init_count++;
    pthread_cond_signal(&init_cond);
    pthread_mutex_unlock(&init_lock);
}


Switching on demand:

With the infrastructure in place, let's see how the migration thread orchestrates everything.

void item_lock_global(void) {
    mutex_lock(&item_global_lock);
}
void item_unlock_global(void) {
    mutex_unlock(&item_global_lock);
}

static void *assoc_maintenance_thread(void *arg) {
    /* do_run_maintenance_thread is a global variable with an initial value of 1;
     * stop_assoc_maintenance_thread() sets it to 0 to terminate the migration
     * thread */
    while (do_run_maintenance_thread) {
        int ii = 0;

        /* Lock the cache, and bulk move multiple buckets to the new
         * hash table. */
        item_lock_global(); /* take the global lock: every item is now under it */
        /* Also lock the hash table itself; otherwise concurrent inserts or
         * deletes would leave the data inconsistent. do_item_link and
         * do_item_unlink in items.c take cache_lock too. */
        mutex_lock(&cache_lock);

        ... /* migrate one bucket of data to the new hash table */

        /* after walking all items of the bucket, release the locks */
        mutex_unlock(&cache_lock);
        item_unlock_global();

        if (!expanding) { /* no more data needs migrating */
            /* Finished expanding. Tell all threads to use fine-grained
             * (segment-level) locks again. This call blocks until every
             * worker has switched. */
            switch_item_lock_type(ITEM_LOCK_GRANULAR);
            slabs_rebalancer_resume();

            /* We are done expanding. Just wait for the next invocation. */
            mutex_lock(&cache_lock);
            started_expanding = false; /* reset */
            /* Sleep until another thread inserts an item, finds the item count
             * has reached 1.5 times the hash table size, and calls
             * assoc_start_expand(), which calls pthread_cond_signal to wake
             * this thread up */
            pthread_cond_wait(&maintenance_cond, &cache_lock);

            /* Before doing anything, tell threads to use the global lock */
            mutex_unlock(&cache_lock);
            slabs_rebalancer_pause();

            /* Woken up on maintenance_cond: start expanding the hash table and
             * migrating data again. While migrating a bucket the migration
             * thread holds the global lock, so workers must stop using
             * segment-level locks and compete with it for the global lock;
             * whichever thread grabs the lock may access items. The next line
             * notifies all workers to switch their item locking to the global
             * level; switch_item_lock_type sleeps on the condition variable
             * until every worker has switched. */
            switch_item_lock_type(ITEM_LOCK_GLOBAL);

            mutex_lock(&cache_lock);
            assoc_expand(); /* allocate a larger hash table and set expanding = true */
            mutex_unlock(&cache_lock);
        }
    }
    return NULL;
}

Sharp-eyed readers will also have noticed mutex_lock(&cache_lock) and slabs_rebalancer_resume(): two more locks at work. Why are they needed? Because threads other than the workers also manipulate the LRU queue and the hash table, and those threads cannot be notified the way workers are, so a separate big lock has to be used. They sleep most of the time, so they do not hurt performance much. Since they involve other subsystems, this post will not discuss those two locks further.




Reference counting:


Why reference counting is required:

Readers familiar with C++'s shared_ptr will find what follows easy, because shared_ptr is built on the same reference-counting concept.

To guarantee thread safety, a lock must be held while accessing and manipulating an item, and locking inevitably costs performance. If a thread held the lock for the entire duration of a read (full locking), hot items would become hard to update (write): reads of hot data are so frequent that writes would be starved. But without full locking, one worker thread might be reading an item while another worker deletes it; once the item is deleted, the reader would be operating on invalid memory. To get performance while still handling this case, memcached uses reference counting, on the same principle as C++'s shared_ptr: an item is deleted (its memory returned to the slab allocator) only when no thread references it anymore.

To implement reference counting, memcached defines a refcount member in the item structure, recording how many references (uses by worker threads) the item currently has. Incrementing and decrementing an item's reference count must of course be atomic, so memcached defines two functions for it.

unsigned short refcount_incr(unsigned short *refcount) {
#ifdef HAVE_GCC_ATOMICS
    return __sync_add_and_fetch(refcount, 1);
#elif defined(__sun)
    return atomic_inc_ushort_nv(refcount);
#else
    unsigned short res;
    mutex_lock(&atomics_mutex);
    (*refcount)++;
    res = *refcount;
    mutex_unlock(&atomics_mutex);
    return res;
#endif
}

unsigned short refcount_decr(unsigned short *refcount) {
#ifdef HAVE_GCC_ATOMICS
    return __sync_sub_and_fetch(refcount, 1);
#elif defined(__sun)
    return atomic_dec_ushort_nv(refcount);
#else
    unsigned short res;
    mutex_lock(&atomics_mutex);
    (*refcount)--;
    res = *refcount;
    mutex_unlock(&atomics_mutex);
    return res;
#endif
}

/* typical call sites:
 * refcount_incr(&it->refcount);
 * refcount_decr(&it->refcount);
 */

If __sync_add_and_fetch and __sync_sub_and_fetch are new to you, look them up: they are important GCC atomic builtins that can even be used to build lock-free queues. Both return the value after the operation.


How to use reference counting:

Even with reference counting, locking is still necessary: between looking up an item and incrementing its reference count, another thread might delete it. So the usual sequence is: the worker thread takes the lock, looks up the item, increments its reference count, and finally releases the lock. The worker now holds a reference to the item, and any worker performing a deletion must check whether the item's reference count has dropped to 0, that is, whether any other worker is still using (referencing) the item. Here is an example.

item *item_get(const char *key, const size_t nkey) {
    item *it;
    uint32_t hv;
    hv = hash(key, nkey);

    item_lock(hv);
    it = do_item_get(key, nkey, hv);
    item_unlock(hv);
    return it;
}

/** wrapper around assoc_find which does the lazy expiration logic */
/* the caller (item_get) already holds the segment-level or global lock
 * taken via item_lock(hv) */
item *do_item_get(const char *key, const size_t nkey, const uint32_t hv) {
    item *it = assoc_find(key, nkey, hv); /* assoc_find itself does no locking */
    if (it != NULL) { /* found; the item's reference count is at least 1 */
        refcount_incr(&it->refcount); /* thread-safe increment */
        ...
    }
    ...
    return it;
}

The code above runs when a get command is processed; the overall flow is just as described. Of course, the worker must eventually decrement the item's reference count. For the get command, that happens in a final call to item_remove. A function called remove for this job may seem odd; look at its implementation.

void item_remove(item *item) {
    uint32_t hv;
    hv = hash(ITEM_key(item), item->nkey);

    item_lock(hv);
    do_item_remove(item);
    item_unlock(hv);
}

void do_item_remove(item *it) {
    assert((it->it_flags & ITEM_SLABBED) == 0);
    assert(it->refcount > 0);

    if (refcount_decr(&it->refcount) == 0) { /* no references left */
        item_free(it); /* return the item's memory to the slab allocator */
    }
}

As you can see, decrementing an item's reference count may end up deleting the item. Why? Consider this scenario: thread A wants to read an item, so it increments the item's reference count; then thread B comes along and deletes the item. The delete command is definitely carried out; it is not skipped just because another thread references the item. But the item cannot be freed immediately either, since thread A is still using it, so memcached uses deferred deletion. When thread B executes the delete command, it decrements the item's reference count once more, so that when thread A later releases its own reference, the count drops to 0 and the item is freed (returned to the slab allocator).

One thing to note: once an item has been inserted into the hash table and the LRU queue, it is referenced by them. If no thread references it, its reference count is 1 (the hash table and LRU queue together count as one reference). So when a worker thread wants to delete an item (after taking possession of it, of course), it must decrement the reference count twice: once for the hash table/LRU queue reference, and once for its own. That is why the code that deletes an item typically calls both do_item_unlink(it, hv) and do_item_remove(it).


tail_repair_time:

Consider this situation: a worker thread increments an item's reference count with refcount_incr, but for some reason (perhaps a kernel problem) dies before it can call refcount_decr. The item's reference count then never reaches 0; it looks permanently occupied by a worker thread that is in fact already dead. This situation needs repair, and the fix is simple: set the item's reference count back to 1.

How does memcached decide that a worker thread has died? In memcached, generally no function call takes very long, even one that must take a lock. So if an item was last accessed a long time ago yet is still referenced by a worker thread, that worker has almost certainly died. Before version 1.4.16, this time threshold was fixed at 3 hours. Since 1.4.16 it is stored in settings.tail_repair_time and can be set when starting memcached, with a default of 1 hour. As of version 1.4.21 the repair is disabled by default (settings.tail_repair_time defaults to 0), because memcached's authors rarely see this bug anymore; presumably operating systems have become more stable. These release notes come from link 1 and link 2.

That was the theory; now look at memcached's implementation.

item *do_item_alloc(char *key, const size_t nkey, const int flags,
                    const rel_time_t exptime, const int nbytes,
                    const uint32_t cur_hv) {
    uint8_t nsuffix;
    item *it = NULL;
    char suffix[40];

    /* total space this item needs */
    size_t ntotal = item_make_header(nkey + 1, flags, nbytes, suffix, &nsuffix);
    if (settings.use_cas) {
        ntotal += sizeof(uint64_t);
    }

    /* pick a slab class by size */
    unsigned int id = slabs_clsid(ntotal);

    item *search;
    item *next_it;
    search = tails[id];
    for (; search != NULL; search = next_it) {
        next_it = search->prev;
        uint32_t hv = hash(ITEM_key(search), search->nkey);

        /* Now see if the item is refcount locked */
        if (refcount_incr(&search->refcount) != 2) {
            refcount_decr(&search->refcount);
            /* Old rare bug could cause a refcount leak. We haven't seen
             * it in years, but we leave this code in to prevent failures
             * just in case */
            if (settings.tail_repair_time &&        /* detection enabled */
                search->time + settings.tail_repair_time < current_time) {
                /* not accessed within tail_repair_time: assume the thread
                 * holding the extra reference has died */
                search->refcount = 1; /* drop the dead thread's reference */
                /* remove the item from the hash table and LRU queue, which
                 * decrements the reference count once more */
                do_item_unlink_nolock(search, hv);
            }
            continue;
        }
        ...
    }
    ...
}

settings.tail_repair_time in the code controls whether this detection is enabled; by default it is not (the value is 0). It can be turned on with the -o tail_repair_time option when starting memcached. See the documentation on memcached startup parameters for details and default values of the key settings.







