memcached Source Analysis: Basic Hash Table Operations and the Expansion Process



When reprinting, please credit the source: http://blog.csdn.net/luotuo44/article/details/42773231


Note: This article uses some global variables that can be set when memcached is started. Their meaning is covered in detail in the article on memcached startup parameters. As described in the article on how to read the memcached source code, this walkthrough simply assumes that these global variables hold their default values.


The code in assoc.c implements a hash table. One reason memcached is fast is that it uses a hash table. Let's look at how memcached uses it.


Hash structure:

The main function calls assoc_init to allocate and initialize the hash table. To reduce the likelihood of hash collisions, memcached's hash table is fairly long, and its length is always a power of 2. The global variable hashpower records the exponent. main passes the global variable settings.hashpower_init to assoc_init as the argument that gives the exponent of the table length at initialization. settings.hashpower_init can be set when memcached is started; see the articles on memcached startup parameters and on the default values of key configuration settings.

//memcached.h
#define HASHPOWER_DEFAULT 16

//assoc.c
unsigned int hashpower = HASHPOWER_DEFAULT;

#define hashsize(n) ((ub4)1<<(n)) //shift 1 left by n bits, i.e. 2 to the power n

//hashsize(n) is a power of 2, so the binary form of hashmask(n) is a run of
//n 1 bits, which makes it well suited to the bitwise & operation.
//The result of value & hashmask(n) is always smaller than hashsize(n),
//so it always falls inside the hash table.
//hashmask(n) can be thought of as the hash mask
#define hashmask(n) (hashsize(n)-1)

//the hash table: an array of pointers
static item** primary_hashtable = 0;

//Called by main. The default value of the argument is 0.
void assoc_init(const int hashtable_init) {
    if (hashtable_init) {
        hashpower = hashtable_init;
    }

    //Because the hash table grows over time, it is allocated dynamically.
    //The table stores pointers rather than the data itself, which saves space.
    //hashsize(hashpower) is the length of the hash table
    primary_hashtable = calloc(hashsize(hashpower), sizeof(void *));
    if (! primary_hashtable) {
        fprintf(stderr, "Failed to init hashtable.\n");
        exit(EXIT_FAILURE); //memcached cannot work without the hash table, so on failure it can only exit
    }
}

Any hash table raises two questions: which hash algorithm is used, and how collisions are resolved.

For the hash function, memcached directly uses one of two open-source algorithms, MurmurHash3 and jenkins_hash. jenkins is the default; MurmurHash3 can be selected when memcached is started. memcached feeds the key supplied by the client into the hash algorithm and obtains a 32-bit unsigned integer (stored in the variable hv). Because the hash table is nowhere near 2^32 buckets long, hv has to be truncated with the hashmask macro from the preceding code. Since this is a bit operation, the expression hv & hashmask(hashpower) appears often in the memcached code.
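As a small illustration of the truncation step described above, the following sketch maps a 32-bit hash value to a bucket index. It is not taken from memcached: bucket_index is a hypothetical helper, and uint32_t stands in for the ub4 type used in the macros.

```c
#include <assert.h>
#include <stdint.h>

/* Table length for a given exponent n: 2^n buckets. */
#define hashsize(n) ((uint32_t)1 << (n))
/* Mask with the low n bits set: 2^n - 1. */
#define hashmask(n) (hashsize(n) - 1)

/* Hypothetical helper (not part of memcached): keep only the low
 * `hashpower` bits of hv, which yields an index in [0, 2^hashpower). */
static uint32_t bucket_index(uint32_t hv, unsigned int hashpower) {
    assert(hashpower < 32); /* shifting a 32-bit value by 32 or more is undefined */
    return hv & hashmask(hashpower);
}
```

With the default hashpower of 16, bucket_index(0xDEADBEEF, 16) keeps the low 16 bits and gives 0xBEEF. That is the same result as 0xDEADBEEF % 65536, but it is computed with a single AND, which is why the table length is kept at a power of 2.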

memcached resolves collisions with the most common approach, separate chaining. As the previous code shows, primary_hashtable is a pointer to a pointer: it points to a one-dimensional array of pointers, and each element of that array points to a linked list (all item nodes on one list map to the same position in the table). In memcached each element of the array is also called a bucket, so that term is used from here on. For example, picture a hash table in which bucket 0 holds two items and buckets 2, 3, and 5 each hold one item. item is the structure used to store user data.




Basic operation:


Insert item:

Next, look at how an item is inserted into the hash table. The hash value directly gives the position in the table (that is, the bucket), and the item is then inserted at the head of that bucket's collision chain. The item struct has a dedicated pointer member, h_next, that links the nodes of a collision chain together.

static unsigned int hash_items = 0; //number of items in the hash table

/* Note: this isn't an assoc_update.  The key must not already exist to call this */
//hv is the hash value of this item's key
int assoc_insert(item *it, const uint32_t hv) {
    unsigned int oldbucket;

    //On a first reading of this function, go straight to the else branch
    if (expanding &&
        (oldbucket = (hv & hashmask(hashpower - 1))) >= expand_bucket)
    {
        ...
    } else {
        //insert at the head of the bucket's collision chain
        it->h_next = primary_hashtable[hv & hashmask(hashpower)];
        primary_hashtable[hv & hashmask(hashpower)] = it;
    }

    hash_items++; //one more item in the hash table
    ...
    return 1;
}
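Head insertion can be tried out in isolation with the sketch below. It assumes a deliberately tiny table of 8 buckets and a stripped-down node type; toy_item, toy_table, and toy_insert are names made up for this illustration and carry none of the other fields of memcached's real item.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_POWER 3                   /* assumed: a table of 2^3 buckets */
#define TOY_SIZE  (1u << TOY_POWER)
#define TOY_MASK  (TOY_SIZE - 1)

/* Hypothetical node type: only the chain pointer and the hash value. */
typedef struct toy_item {
    struct toy_item *h_next;          /* next node on the collision chain */
    uint32_t hv;                      /* precomputed hash value of the key */
} toy_item;

/* File-scope array, so every bucket starts out as NULL. */
static toy_item *toy_table[TOY_SIZE];

/* Insert at the head of the chain: the new node points at the old head,
 * then the bucket points at the new node. No traversal is needed. */
static void toy_insert(toy_item *it) {
    uint32_t bucket = it->hv & TOY_MASK;
    assert(it->h_next == NULL);       /* the node must not already be linked */
    it->h_next = toy_table[bucket];
    toy_table[bucket] = it;
}
```

Inserting an item with hv 8 and then one with hv 16 puts both in bucket 0. The second item becomes the head of the chain and its h_next points at the first, so insertion costs the same no matter how long the chain already is.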


Find Item:

Once items have been inserted into the hash table, they can be looked up. Here is how an item is found. The hash value hv of an item's key only locates the bucket in the hash table, and the collision chain of a bucket may hold more than one item, so a lookup needs the item's key in addition to hv.

//The hash value only determines which bucket of the hash table to look in, and a
//bucket holds a collision chain, so the actual key is needed to walk the chain
//and compare against every node. Even where key is a string terminated by '\0',
//calling strlen on it is somewhat costly (the key string has to be traversed),
//so an extra parameter nkey gives the length of the key
item *assoc_find(const char *key, const size_t nkey, const uint32_t hv) {
    item *it;
    unsigned int oldbucket;

    //go straight to the else branch
    if (expanding &&
        (oldbucket = (hv & hashmask(hashpower - 1))) >= expand_bucket)
    {
        it = old_hashtable[oldbucket];
    } else {
        //the hash value determines which bucket this key belongs to
        it = primary_hashtable[hv & hashmask(hashpower)];
    }

    //The bucket for this key is now known; just walk its collision chain
    item *ret = NULL;
    while (it) {
        //compare the lengths first and only then call memcmp, which is more efficient
        if ((nkey == it->nkey) && (memcmp(key, ITEM_key(it), nkey) == 0)) {
            ret = it;
            break;
        }
        it = it->h_next;
    }
    return ret;
}
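The chain walk at the end of the lookup can be sketched on its own as follows. This is only an illustration under simplifying assumptions: key_item and chain_find are hypothetical names, and the key is held through a plain pointer instead of being read with memcached's ITEM_key macro.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node type: chain pointer plus a key and its length. */
typedef struct key_item {
    struct key_item *h_next;   /* next node on the collision chain */
    const char *key;           /* key bytes */
    size_t nkey;               /* key length in bytes */
} key_item;

/* Walk one collision chain and return the node whose key matches,
 * or NULL if no node matches. */
static key_item *chain_find(key_item *head, const char *key, size_t nkey) {
    key_item *it;
    for (it = head; it != NULL; it = it->h_next) {
        /* Cheap length check first; memcmp runs only for equal lengths. */
        if (it->nkey == nkey && memcmp(it->key, key, nkey) == 0) {
            return it;
        }
    }
    return NULL;
}
```

Because the lengths are compared before the bytes, a lookup for "alph" (4 bytes) skips a node holding "alpha" (5 bytes) without calling memcmp at all.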


Delete item:

Here is how an item is removed from the hash table. The usual way to remove a node from a linked list is to locate the node's predecessor and then use the predecessor's next pointer to unlink the node and splice the list back together. memcached's approach is similar and is implemented as follows:

void assoc_delete(const char *key, const size_t nkey, const uint32_t hv) {
    item **before = _hashitem_before(key, nkey, hv); //address of the predecessor's h_next member

    if (*before) { //found
        item *nxt;
        hash_items--;
        //before is a pointer to a pointer whose value is the address of the h_next
        //member of the predecessor of the item being looked up, so *before points
        //to the item being looked up. Because before is a pointer to a pointer,
        //*before can be used as an lvalue to assign to that h_next member.
        //The three lines below therefore keep the items before and after the
        //deleted item linked together.
        nxt = (*before)->h_next;
        (*before)->h_next = 0;   /* probably pointless, but whatever. */
        *before = nxt;
        return;
    }
    /* Note:  we never actually get here.  the callers don't delete things
       they can't find. */
    assert(*before != 0);
}

//Find an item. Returns the address of the predecessor's h_next member. If the
//lookup fails, returns the address of the h_next member of the last node in the
//collision chain. Since the last node's h_next is NULL, applying the * operator
//to the return value tells whether the lookup succeeded.
static item** _hashitem_before(const char *key, const size_t nkey, const uint32_t hv) {
    item **pos;
    unsigned int oldbucket;

    //again, on a first reading go straight to the else branch
    if (expanding && //the hash table is being expanded
        (oldbucket = (hv & hashmask(hashpower - 1))) >= expand_bucket)
    {
        pos = &old_hashtable[oldbucket];
    } else {
        //find the corresponding bucket in the hash table
        pos = &primary_hashtable[hv & hashmask(hashpower)];
    }

    //walk the bucket's collision chain looking for the item
    while (*pos && ((nkey != (*pos)->nkey) || memcmp(key, ITEM_key(*pos), nkey))) {
        pos = &(*pos)->h_next;
    }
    //*pos tells whether the lookup succeeded: if *pos is NULL the lookup failed,
    //otherwise it succeeded.
    return pos;
}
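The pointer-to-pointer technique used by the delete code is easier to see on a bare linked list. The sketch below is an assumption-laden miniature, not memcached code: del_item, find_slot, and chain_delete are hypothetical names, and nodes are matched by an integer id instead of a key.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type, identified by an integer instead of a key. */
typedef struct del_item {
    struct del_item *h_next;
    int id;
} del_item;

/* Return the address of the pointer that points at the matching node:
 * either the bucket slot itself or the h_next member of the predecessor.
 * If nothing matches, the returned slot holds NULL. */
static del_item **find_slot(del_item **head, int id) {
    del_item **pos = head;
    while (*pos != NULL && (*pos)->id != id) {
        pos = &(*pos)->h_next;
    }
    return pos;
}

/* Unlink the node with the given id. Returns 1 if a node was removed. */
static int chain_delete(del_item **head, int id) {
    del_item **before = find_slot(head, id);
    del_item *victim = *before;
    if (victim == NULL) {
        return 0;                 /* not found, nothing to unlink */
    }
    *before = victim->h_next;     /* predecessor (or bucket) skips the victim */
    victim->h_next = NULL;
    return 1;
}
```

Working with the address of the link means the head of the chain needs no special case: deleting the first node rewrites the bucket slot, deleting any other node rewrites its predecessor's h_next, and the same two assignments cover both.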



Expanding the hash table:

When the number of items in the hash table exceeds 1.5 times the length of the hash table, the table is expanded to a greater length. Each time an item is inserted, memcached checks whether the total item count has passed 1.5 times the table length. Because the hash values of items are fairly uniform, the collision chain of a bucket holds at most about 1.5 nodes on average, so memcached's hash lookups remain fast.
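The arithmetic behind the threshold can be checked with a short sketch. needs_expand is a hypothetical helper written for this illustration; the comparison it performs mirrors the one in assoc_insert, hash_items > (hashsize(hashpower) * 3) / 2.

```c
#include <assert.h>
#include <stdint.h>

/* Table length for a given exponent n: 2^n buckets. */
#define hashsize(n) ((uint32_t)1 << (n))

/* Hypothetical helper: 1 when the item count has passed 1.5 times the
 * table length, 0 otherwise. */
static int needs_expand(unsigned int hash_items, unsigned int hashpower) {
    assert(hashpower < 31);   /* keeps hashsize(hashpower) * 3 within 32 bits */
    return hash_items > (hashsize(hashpower) * 3) / 2;
}
```

With the default hashpower of 16 the table has 65536 buckets, so the threshold is 65536 * 3 / 2 = 98304 items. The 98304th item does not trigger an expansion; the 98305th does.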


The migration thread:

Expanding the hash table raises a big problem: once the table length changes, the bucket an item maps to can change as well (recall how memcached determines the bucket from the hash value of the key). So expanding the hash table means recomputing, for every item in the table, its position (bucket) in the new table and then moving the item to that bucket. Since this is done for all items, it is bound to be a time-consuming operation. The rest of this article calls this operation data migration.
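Why the bucket can change follows directly from the mask. The sketch below uses two hypothetical helpers, bucket_of and moves_on_expand, to compare the bucket index of the same hash value before and after the exponent grows by one; uint32_t again stands in for ub4.

```c
#include <assert.h>
#include <stdint.h>

#define hashsize(n) ((uint32_t)1 << (n))
#define hashmask(n) (hashsize(n) - 1)

/* Hypothetical helper: bucket of hv in a table of 2^power buckets. */
static uint32_t bucket_of(uint32_t hv, unsigned int power) {
    assert(power < 32);
    return hv & hashmask(power);
}

/* Hypothetical helper: 1 if hv lands in a different bucket once the table
 * grows from 2^power to 2^(power + 1) buckets, 0 if it stays put. */
static int moves_on_expand(uint32_t hv, unsigned int power) {
    assert(power < 31);
    return bucket_of(hv, power + 1) != bucket_of(hv, power);
}
```

Growing the exponent by one adds a single bit to the mask, so an item either keeps its old index or moves to the old index plus the old table length. For example, hv = 0x1FFFF sits in bucket 0xFFFF of a 2^16 table and in bucket 0x1FFFF of a 2^17 table, while hv = 0x0FFFF stays in bucket 0xFFFF. Which case applies depends on the key, so every item still has to be examined.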

Because data migration is time-consuming, the work is done by a dedicated thread (called the migration thread in this article). The migration thread is created by a function that main calls. Look at the following code:

#define DEFAULT_HASH_BULK_MOVE 1
int hash_bulk_move = DEFAULT_HASH_BULK_MOVE;

//main calls this function to start the data migration thread
int start_assoc_maintenance_thread() {
    int ret;
    char *env = getenv("MEMCACHED_HASH_BULK_MOVE");
    if (env != NULL) {
        //The role of hash_bulk_move is explained later. Here it is set
        //from an environment variable
        hash_bulk_move = atoi(env);
        if (hash_bulk_move == 0) {
            hash_bulk_move = DEFAULT_HASH_BULK_MOVE;
        }
    }
    if ((ret = pthread_create(&maintenance_tid, NULL,
                              assoc_maintenance_thread, NULL)) != 0) {
        fprintf(stderr, "Can't create thread: %s\n", strerror(ret));
        return -1;
    }
    return 0;
}

Once the migration thread has been created, it goes to sleep (by waiting on a condition variable). When a worker thread inserts an item and finds that the hash table needs to be expanded, it calls assoc_start_expand to wake the migration thread.

static bool started_expanding = false;

//assoc_insert calls this function once the item count passes 1.5 times the length of the hash table
static void assoc_start_expand(void) {
    if (started_expanding)
        return;
    started_expanding = true;
    pthread_cond_signal(&maintenance_cond);
}

static bool expanding = false; //whether the hash table is currently being expanded
static volatile int do_run_maintenance_thread = 1;

static void *assoc_maintenance_thread(void *arg) {
    //do_run_maintenance_thread is a global variable with an initial value of 1. It is
    //set to 0 in stop_assoc_maintenance_thread, which terminates the migration thread
    while (do_run_maintenance_thread) {
        int ii = 0;

        //take the locks
        item_lock_global();
        mutex_lock(&cache_lock);

        //migrate items
        //release the locks once the traversal is done
        mutex_unlock(&cache_lock);
        item_unlock_global();

        if (!expanding) { //no (more) data needs to be migrated
            /* We are done expanding.. just wait for next invocation */
            mutex_lock(&cache_lock);
            started_expanding = false; //reset

            //Suspend the migration thread until a worker thread inserts data and finds
            //that the item count has passed 1.5 times the table length. The worker
            //thread then calls assoc_start_expand, which calls pthread_cond_signal
            //to wake the migration thread
            pthread_cond_wait(&maintenance_cond, &cache_lock);
            mutex_unlock(&cache_lock);

            mutex_lock(&cache_lock);
            assoc_expand(); //allocate a larger hash table and set expanding to true
            mutex_unlock(&cache_lock);
        }
    }
    return NULL;
}


Migrating data incrementally:

To keep worker threads from inserting into, deleting from, or searching the hash table in the middle of a migration step, the migration thread holds a lock while it migrates data, and a worker thread must acquire that lock before it can insert, delete, or look up. For memcached to respond quickly (that is, for worker threads to finish their insert, delete, and find operations quickly), the migration thread must not hold the lock for long. But data migration is itself time-consuming, so the two requirements conflict.

To resolve this conflict, memcached migrates incrementally. It works in a loop: lock -> migrate a small amount of data -> unlock. The effect is that although the migration thread takes the lock many times, it holds the lock only briefly each time. This raises the chance that a worker thread gets the lock and lets worker threads finish their operations quickly. How much is a small amount? The global variable hash_bulk_move mentioned above gives the number of buckets whose items are migrated per pass. The default is 1 bucket, and for convenience the rest of this article assumes hash_bulk_move is 1.

Concretely, incremental migration works like this: assoc_expand allocates a new, larger hash table; each pass then migrates the items of only one bucket of the old table to the new table, and the lock is released after that bucket has been migrated. Both an old and a new hash table therefore exist at this point. In memcached's implementation, primary_hashtable is the new table (some posts call it the main table) and old_hashtable is the old table (the secondary table).
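The one-bucket-per-pass idea can be sketched without any threads or locks. Everything below is a hypothetical miniature: mig_item, mig_state, and migrate_one_bucket are made-up names, the caller owns both tables, and each node carries its precomputed hv instead of having the hash recomputed from the key. In memcached each such step would run with the lock held.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type carrying a precomputed hash value. */
typedef struct mig_item {
    struct mig_item *h_next;
    uint32_t hv;
} mig_item;

/* Hypothetical migration state, owned by the caller. */
typedef struct mig_state {
    mig_item **old_table;        /* 2^(power - 1) buckets, being drained */
    mig_item **new_table;        /* 2^power buckets, being filled */
    unsigned int power;          /* exponent of the new table */
    unsigned int expand_bucket;  /* next old bucket to migrate */
    int expanding;               /* nonzero while migration is in progress */
} mig_state;

/* Migrate exactly one bucket of the old table, then return.
 * Returns 1 if a bucket was processed, 0 if there was nothing to do. */
static int migrate_one_bucket(mig_state *s) {
    mig_item *it, *next;
    if (!s->expanding) {
        return 0;
    }
    for (it = s->old_table[s->expand_bucket]; it != NULL; it = next) {
        uint32_t bucket = it->hv & ((1u << s->power) - 1);
        next = it->h_next;                 /* save before relinking */
        it->h_next = s->new_table[bucket]; /* head insertion in new table */
        s->new_table[bucket] = it;
    }
    s->old_table[s->expand_bucket] = NULL; /* the old chain is now empty */
    s->expand_bucket++;
    if (s->expand_bucket == (1u << (s->power - 1))) {
        s->expanding = 0;                  /* every old bucket has been moved */
    }
    return 1;
}
```

Between two calls the state is always consistent: old buckets below expand_bucket are empty and their items live in the new table, while old buckets at or above expand_bucket are untouched. That invariant is what lets other operations run between passes.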

As mentioned earlier, the migration thread sleeps after it is created until a worker thread wakes it. Once awake, it calls assoc_expand to enlarge the hash table. The assoc_expand function is as follows:

static void assoc_expand(void) {
    old_hashtable = primary_hashtable;

    //allocate a new hash table; old_hashtable now points to the old one
    primary_hashtable = calloc(hashsize(hashpower + 1), sizeof(void *));
    if (primary_hashtable) {
        hashpower++;
        expanding = true; //the table is now in the expanding state
        expand_bucket = 0; //start migrating data from bucket 0
    } else {
        primary_hashtable = old_hashtable;
        /* Bad news, but we can keep running. */
    }
}


Now look at the largely complete assoc_maintenance_thread thread function to see how the migration thread migrates data step by step. Why only largely complete? Because the function contains a few things that this post does not explain, although they do not get in the way of reading it. Later posts will explain the rest of this thread function.

static unsigned int expand_bucket = 0; //the next bucket to be migrated

#define DEFAULT_HASH_BULK_MOVE 1
int hash_bulk_move = DEFAULT_HASH_BULK_MOVE;

static volatile int do_run_maintenance_thread = 1;

static void *assoc_maintenance_thread(void *arg) {
    //do_run_maintenance_thread is a global variable with an initial value of 1. It is
    //set to 0 in stop_assoc_maintenance_thread, which terminates the migration thread
    while (do_run_maintenance_thread) {
        int ii = 0;

        //take the locks
        item_lock_global();
        mutex_lock(&cache_lock);

        //hash_bulk_move controls how many buckets of items are moved per pass; the default is one.
        //The loop body runs only if expanding is true, so it is not entered
        //when the migration thread has just been created
        for (ii = 0; ii < hash_bulk_move && expanding; ++ii) {
            item *it, *next;
            int bucket;

            //assoc_expand sets expand_bucket to 0.
            //Walk the bucket of the old table indicated by expand_bucket and
            //migrate all of its items to the new hash table.
            for (it = old_hashtable[expand_bucket]; NULL != it; it = next) {
                next = it->h_next;

                //recompute the hash value to get the position in the new hash table
                bucket = hash(ITEM_key(it), it->nkey) & hashmask(hashpower);

                //insert this item into the new hash table
                it->h_next = primary_hashtable[bucket];
                primary_hashtable[bucket] = it;
            }

            //No need to clear the old bucket item by item;
            //just set the head of its collision chain to NULL
            old_hashtable[expand_bucket] = NULL;

            //one bucket migrated; point expand_bucket at the next bucket to be migrated
            expand_bucket++;
            if (expand_bucket == hashsize(hashpower - 1)) { //all data has been migrated
                expanding = false; //clear the expanding flag
                free(old_hashtable);
            }
        }

        //release the locks after traversing all items in hash_bulk_move buckets
        mutex_unlock(&cache_lock);
        item_unlock_global();

        if (!expanding) { //no (more) data needs to be migrated
            /* finished expanding. tell all threads to use fine-grained locks */
            //reaching here means no data migration is needed (expansion has stopped)
            ...
            mutex_lock(&cache_lock);
            started_expanding = false; //reset

            //Suspend the migration thread until a worker thread inserts data and finds
            //that the item count has passed 1.5 times the table length. The worker
            //thread then calls assoc_start_expand, which calls pthread_cond_signal
            //to wake the migration thread
            pthread_cond_wait(&maintenance_cond, &cache_lock);
            /* Before doing anything, tell threads to use a global lock */
            mutex_unlock(&cache_lock);

            mutex_lock(&cache_lock);
            assoc_expand(); //allocate a larger hash table and set expanding to true
            mutex_unlock(&cache_lock);
        }
    }
    return NULL;
}


Revisiting the basic operations:

Now go back to the insert, delete, and find operations on the hash table, since they may run while the table is being migrated. Note that the insert, delete, and find functions in assoc.c contain no locking. Yet, as already said, these operations must compete with the migration thread for the lock and may proceed only after acquiring it. In fact, the lock is taken by the callers of the insert, delete, and find functions, which is why it does not appear in this code.

Because the hash table may be expanding at the time of an insertion, there is a choice to make: insert into the new table or the old one? memcached's approach is this: if the bucket the item maps to has not yet been migrated to the new table, the item is inserted into the old table; otherwise it is inserted into the new table. The following is the code for the insertion.

/* Note: this isn't an assoc_update.  The key must not already exist to call this */
//hv is the hash value of this item's key
int assoc_insert(item *it, const uint32_t hv) {
    unsigned int oldbucket;

    if (expanding && //the hash table is currently being expanded
        (oldbucket = (hv & hashmask(hashpower - 1))) >= expand_bucket) //migration has not reached this bucket yet
    {
        //insert into the old table
        it->h_next = old_hashtable[oldbucket];
        old_hashtable[oldbucket] = it;
    } else {
        //insert into the new table
        it->h_next = primary_hashtable[hv & hashmask(hashpower)];
        primary_hashtable[hv & hashmask(hashpower)] = it;
    }

    hash_items++; //one more item in the hash table

    //Once the item count passes 1.5 times the table length, the table is expanded.
    //If it is already being expanded, of course, no new expansion is started
    if (! expanding && hash_items > (hashsize(hashpower) * 3) / 2) {
        assoc_start_expand(); //wake the migration thread to expand the hash table
    }
    return 1;
}

This raises a question: why not always insert directly into the new table? Doing so would cause no problem at all for data consistency. Some sources online say the reason is to preserve the order of items within a bucket, but because it is uncertain whether the migration thread or an inserting thread wins the lock, assoc_insert cannot guarantee any order. This article takes the view that the reason is fast lookup. If items were inserted directly into the new table, a lookup might have to search both the old and the new table to find an item. Searching one table, failing to find the item, and then searching the other table is not considered fast enough.

With assoc_insert implemented this way, an item can be found without searching both tables. See the lookup function below.

//The hash value only determines which bucket of the hash table to look in, and a
//bucket holds a collision chain, so the actual key is needed to walk the chain
//and compare against every node. key is not treated as a string terminated by
//'\0', so an extra parameter nkey gives the length of the key
item *assoc_find(const char *key, const size_t nkey, const uint32_t hv) {
    item *it;
    unsigned int oldbucket;

    if (expanding && //the hash table is being expanded
        (oldbucket = (hv & hashmask(hashpower - 1))) >= expand_bucket) //the item is still in the old table
    {
        it = old_hashtable[oldbucket];
    } else {
        //the hash value determines which bucket this key belongs to
        it = primary_hashtable[hv & hashmask(hashpower)];
    }

    //Both the table and the bucket of the item being looked up are now known.
    //Just walk the collision chain of that bucket
    item *ret = NULL;
    while (it) {
        //compare the lengths first and only then call memcmp, which is more efficient
        if ((nkey == it->nkey) && (memcmp(key, ITEM_key(it), nkey) == 0)) {
            ret = it;
            break;
        }
        it = it->h_next;
    }
    return ret;
}


Deletion is almost the same as lookup, so the code is posted here without much further comment. A delete also has to perform a lookup first.

void assoc_delete(const char *key, const size_t nkey, const uint32_t hv) {
    item **before = _hashitem_before(key, nkey, hv); //address of the predecessor's h_next member

    if (*before) { //found
        item *nxt;
        hash_items--;
        //before is a pointer to a pointer whose value is the address of the h_next
        //member of the predecessor of the item being looked up, so *before points
        //to the item being looked up. Because before is a pointer to a pointer,
        //*before can be used as an lvalue to assign to that h_next member.
        //The three lines below therefore keep the items before and after the
        //deleted item linked together.
        nxt = (*before)->h_next;
        (*before)->h_next = 0;   /* probably pointless, but whatever. */
        *before = nxt;
        return;
    }
    /* Note:  we never actually get here.  the callers don't delete things
       they can't find. */
    assert(*before != 0);
}

//Find an item. Returns the address of the predecessor's h_next member. If the
//lookup fails, returns the address of the h_next member of the last node in the
//collision chain. Since the last node's h_next is NULL, applying the * operator
//to the return value tells whether the lookup succeeded.
static item** _hashitem_before(const char *key, const size_t nkey, const uint32_t hv) {
    item **pos;
    unsigned int oldbucket;

    if (expanding && //the hash table is being expanded
        (oldbucket = (hv & hashmask(hashpower - 1))) >= expand_bucket)
    {
        pos = &old_hashtable[oldbucket];
    } else {
        //find the corresponding bucket in the hash table
        pos = &primary_hashtable[hv & hashmask(hashpower)];
    }

    //Both the table and the bucket of the item being looked up are now known.
    //Walk the bucket's collision chain looking for the item
    while (*pos && ((nkey != (*pos)->nkey) || memcmp(key, ITEM_key(*pos), nkey))) {
        pos = &(*pos)->h_next;
    }
    //*pos tells whether the lookup succeeded: if *pos is NULL the lookup failed,
    //otherwise it succeeded.
    return pos;
}


As the discussion above shows, inserting, finding, and deleting an item all require knowing whether the item's bucket has already been migrated to the new table.
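The test shared by the insert, find, and delete code can be isolated into a single predicate, sketched below. lives_in_old_table is a hypothetical helper written for this illustration, and the global state of assoc.c is passed in as parameters so that the function stands alone.

```c
#include <assert.h>
#include <stdint.h>

#define hashsize(n) ((uint32_t)1 << (n))
#define hashmask(n) (hashsize(n) - 1)

/* Hypothetical helper. hashpower is the exponent of the NEW table, so the
 * old table has 2^(hashpower - 1) buckets. Returns 1 if the item with hash
 * value hv must be handled in the old table, 0 if in the new table. */
static int lives_in_old_table(uint32_t hv, unsigned int hashpower,
                              int expanding, unsigned int expand_bucket) {
    uint32_t oldbucket;
    assert(hashpower >= 1 && hashpower < 32);
    if (!expanding) {
        return 0;                 /* no migration in progress: one table only */
    }
    oldbucket = hv & hashmask(hashpower - 1);
    /* Old buckets below expand_bucket have already been migrated. */
    return oldbucket >= expand_bucket;
}
```

Suppose the table is growing from 2^16 to 2^17 buckets and expand_bucket is 100. A key with hv = 50 maps to old bucket 50, which has already been migrated, so it is handled in the new table. A key with hv = 200 maps to old bucket 200, which has not been reached yet, so it is handled in the old table. Each operation therefore touches exactly one table.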




