Memcached source code analysis ----- basic operations on the hash table and the resizing process -----



Please indicate the source when reprinting: http://blog.csdn.net/luotuo44/article/details/42773231


Tip: this article refers to several global variables that can be set when memcached is started. For their meanings, see the post on memcached startup parameters. As suggested in "how to read memcached source code", these globals are simply taken at their default values here.


The code in the assoc.c file builds and maintains a hash table. The hash table is one of the reasons memcached is fast. Let's now look at how memcached uses it.


Hash structure:

The main function calls assoc_init to allocate and initialize the hash table. To reduce the chance of collisions, memcached's hash table is fairly long, and its length is always a power of 2. The global variable hashpower records that exponent. When the main function calls assoc_init, the global settings.hashpower_init is passed as the argument to specify the initial exponent. settings.hashpower_init can be set when memcached is started; see the post on memcached startup parameters and key configuration default values.

// memcached.h
#define HASHPOWER_DEFAULT 16

// assoc.c
unsigned int hashpower = HASHPOWER_DEFAULT;

// hashsize(n) is 1 shifted left by n bits, i.e. 2 to the power n
#define hashsize(n) ((ub4)1<<(n))

// Because hashsize(n) is a power of 2, hashmask(n) in binary is a string of
// ones. value & hashmask(n) is therefore always smaller than hashsize(n),
// i.e. the result is a valid index into the hash table.
// hashmask(n) can also be called the hash mask.
#define hashmask(n) (hashsize(n)-1)

// the hash table itself: an array of item pointers
static item** primary_hashtable = 0;

// Called by the main function; the argument defaults to 0.
void assoc_init(const int hashtable_init) {
    if (hashtable_init) {
        hashpower = hashtable_init;
    }

    // The hash table will grow over time, so it is allocated dynamically.
    // What it stores are pointers, which saves space.
    // hashsize(hashpower) is the length (number of buckets) of the table.
    primary_hashtable = calloc(hashsize(hashpower), sizeof(void *));
    if (! primary_hashtable) {
        fprintf(stderr, "Failed to init hashtable.\n");
        // the hash table is the foundation memcached works on;
        // if the allocation fails the only option is to exit
        exit(EXIT_FAILURE);
    }
}

With any hash table, two questions arise: which hash function (algorithm) to use, and how to resolve collisions.

For the hash function, memcached directly uses one of two open-source implementations: MurmurHash3 and jenkins_hash. Jenkins is used by default; you can switch to MurmurHash3 when starting memcached. Memcached feeds the key supplied by the client straight into the hash function and gets back a 32-bit unsigned integer (stored in the variable hv). Because the table length is far smaller than 2^32 - 1, the hashmask macro from the code above is used to truncate hv. Since it is a bit operation, the expression hv & hashmask(hashpower) appears all over the memcached code.
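To make the truncation concrete, here is a minimal, self-contained sketch (not memcached code; the value of hv is made up) showing how a 32-bit hash value is masked down to a bucket index with the macros above:

#include <stdint.h>
#include <stdio.h>

/* same macros as above, with uint32_t standing in for memcached's ub4 */
#define hashsize(n) ((uint32_t)1 << (n))
#define hashmask(n) (hashsize(n) - 1)

int main(void) {
    unsigned int hashpower = 16;   /* table length 2^16 = 65536 buckets */
    uint32_t hv = 0xCAFEBABEu;     /* pretend this came out of jenkins_hash */
    uint32_t bucket = hv & hashmask(hashpower);
    printf("bucket = %u (always < %u)\n", bucket, hashsize(hashpower));
    return 0;
}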

Memcached resolves collisions with the most common technique, separate chaining. From the code above, primary_hashtable is a pointer-to-pointer variable: it points to a one-dimensional array of pointers, and each element of the array points to a linked list (all the items on one list hash to the same bucket). In memcached, each array element is also called a bucket, so that term is used from here on. The original post includes a figure of such a table, in which bucket 0 holds two items and buckets 2, 3 and 5 hold one item each. item is the struct used to store the user's data.
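As a rough picture of the chaining, here is a heavily trimmed sketch of the item struct (the real definition in memcached.h has many more fields and a different layout; only the parts relevant to the hash table are kept here): h_next is the pointer that strings together the items in one bucket.

typedef struct _stritem {
    struct _stritem *h_next;   /* next item in the same bucket's collision chain */
    uint8_t          nkey;     /* length of the key */
    /* ... in the real struct: LRU pointers, reference count, flags,
       the key bytes and the cached data follow ... */
} item;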




Basic operations:


Insert item:

Next, let's look at how an item is inserted into the hash table. The hash value directly gives the item's position in the table (that is, the corresponding bucket), and the item is then inserted at the head of that bucket's collision chain. The item struct has a dedicated pointer member, h_next, used to link the hash collision chain.

static unsigned int hash_items = 0; // number of items in the hash table

/* Note: this isn't an assoc_update.  The key must not already exist to call this */
// hv is the hash value of this item's key
int assoc_insert(item *it, const uint32_t hv) {
    unsigned int oldbucket;

    // Insert the item at the head of its bucket's collision chain.
    // On a first reading, look only at the else branch.
    if (expanding &&
        (oldbucket = (hv & hashmask(hashpower - 1))) >= expand_bucket)
    {
        ...
    } else {
        // insert into the table
        it->h_next = primary_hashtable[hv & hashmask(hashpower)];
        primary_hashtable[hv & hashmask(hashpower)] = it;
    }

    hash_items++; // one more item in the hash table
    ...
    return 1;
}


Find item:

Once items can be inserted into the hash table, they can be looked up. The following shows how an item is found. The hash value hv of the item's key only determines which bucket to look in, but that bucket's collision chain may hold several items, so the key itself is needed in addition to hv.

// The hash value only determines which bucket of the hash table to look in,
// but a bucket holds a chain of colliding items, so the key itself is needed
// to walk the chain and compare the nodes one by one. Although the key is a
// '\0'-terminated string, calling strlen would still cost a pass over it,
// so the extra parameter nkey gives the key's length.
item *assoc_find(const char *key, const size_t nkey, const uint32_t hv) {
    item *it;
    unsigned int oldbucket;

    // on a first reading, look only at the else branch
    if (expanding &&
        (oldbucket = (hv & hashmask(hashpower - 1))) >= expand_bucket)
    {
        it = old_hashtable[oldbucket];
    } else {
        // the hash value tells which bucket the key belongs to
        it = primary_hashtable[hv & hashmask(hashpower)];
    }

    // The bucket is now known; walk its collision chain.
    item *ret = NULL;
    while (it) {
        // call memcmp only when the lengths already match -- more efficient
        if ((nkey == it->nkey) && (memcmp(key, ITEM_key(it), nkey) == 0)) {
            ret = it;
            break;
        }
        it = it->h_next;
    }
    return ret;
}


Delete item:

The following shows how an item is deleted from the hash table. To delete a node from a linked list, you first locate its predecessor and then use the predecessor's next pointer to unlink the node and splice the list back together. memcached's implementation is similar:

void assoc_delete(const char *key, const size_t nkey, const uint32_t hv) {
    // get the address of the pointer (the predecessor's h_next member or the
    // bucket slot) that points at the item to delete
    item **before = _hashitem_before(key, nkey, hv);

    if (*before) { // found it
        item *nxt;
        hash_items--; // one item fewer in the hash table

        // before is a pointer to pointer: it holds the address of the h_next
        // member that points at the item we want, so *before is that item.
        // Because before is a pointer to pointer, using *before as an lvalue
        // rewrites that h_next member. The three lines below therefore unlink
        // the item and splice its predecessor and successor back together.
        nxt = (*before)->h_next;
        (*before)->h_next = 0;   /* probably pointless, but whatever. */
        *before = nxt;
        return;
    }
    /* Note:  we never actually get here.  the callers don't delete things
       they can't find. */
    assert(*before != 0);
}

// Look for the item. On success, returns the address of the pointer that
// points at the wanted node (the bucket slot or the previous node's h_next
// member). On failure, returns the address of the last node's h_next member,
// whose value is NULL, so the caller can check success by dereferencing the
// return value with the * operator.
static item** _hashitem_before(const char *key, const size_t nkey, const uint32_t hv) {
    item **pos;
    unsigned int oldbucket;

    // again, jump straight to the else branch on a first reading
    if (expanding && // the hash table is being expanded
        (oldbucket = (hv & hashmask(hashpower - 1))) >= expand_bucket)
    {
        pos = &old_hashtable[oldbucket];
    } else {
        // locate the bucket in the hash table
        pos = &primary_hashtable[hv & hashmask(hashpower)];
    }

    // walk the chain looking for the item
    while (*pos && ((nkey != (*pos)->nkey) || memcmp(key, ITEM_key(*pos), nkey))) {
        pos = &(*pos)->h_next;
    }
    // dereferencing pos shows whether the search succeeded:
    // if *pos is NULL the search failed, otherwise it succeeded
    return pos;
}



Expanding the hash table:

When the number of items in the hash table reaches 1.5 times the table length, the table is expanded, i.e. its length is increased. Every time an item is inserted, memcached checks whether the total item count has reached 1.5 times the table length. Because the items' hash values are fairly evenly distributed, the collision chain in each bucket averages roughly 1.5 nodes, so memcached's hash lookups remain very fast.
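As a quick back-of-the-envelope check of that threshold (a sketch, not memcached code): with the default hashpower of 16 the table has 65536 buckets, so expansion is triggered once the item count exceeds 65536 * 3 / 2 = 98304, i.e. about 1.5 items per bucket on average.

#include <stdint.h>
#include <stdio.h>

#define hashsize(n) ((uint32_t)1 << (n))

int main(void) {
    unsigned int hashpower = 16;                        /* default table size */
    uint32_t threshold = (hashsize(hashpower) * 3) / 2; /* 1.5 x table length */
    printf("buckets = %u, expand once hash_items > %u\n",
           hashsize(hashpower), threshold);
    return 0;
}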


Migration thread:

Expanding the hash table raises a big problem: after the expansion the table length changes, so the bucket an item hashes to also changes (recall how memcached derives the bucket position from the hash value of the key). Expanding the table therefore requires recomputing the new bucket position of every item in the table and moving each item into that bucket. Touching every item is inevitably time consuming. This operation is referred to below as data migration.
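A tiny illustration (again not memcached code, with a made-up hash value) of why the bucket must be recomputed: once hashpower grows from 16 to 17, the extra mask bit can send the same hv to a different bucket.

#include <stdint.h>
#include <stdio.h>

#define hashsize(n) ((uint32_t)1 << (n))
#define hashmask(n) (hashsize(n) - 1)

int main(void) {
    uint32_t hv = 0xDEADBEEFu;                                    /* some item's hash value */
    printf("bucket before expansion: %u\n", hv & hashmask(16));   /* 48879 */
    printf("bucket after expansion:  %u\n", hv & hashmask(17));   /* 114415 */
    return 0;
}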

Because data migration is a time-consuming operation, the task is handed to a dedicated thread (call it the migration thread). The main function creates this thread by calling start_assoc_maintenance_thread. See the code below:

#define DEFAULT_HASH_BULK_MOVE 1
int hash_bulk_move = DEFAULT_HASH_BULK_MOVE;

// the main function calls this to start the data-migration thread
int start_assoc_maintenance_thread() {
    int ret;
    char *env = getenv("MEMCACHED_HASH_BULK_MOVE");
    if (env != NULL) {
        // the role of hash_bulk_move is discussed later;
        // here it is taken from the environment variable
        hash_bulk_move = atoi(env);
        if (hash_bulk_move == 0) {
            hash_bulk_move = DEFAULT_HASH_BULK_MOVE;
        }
    }

    if ((ret = pthread_create(&maintenance_tid, NULL,
                              assoc_maintenance_thread, NULL)) != 0) {
        fprintf(stderr, "Can't create thread: %s\n", strerror(ret));
        return -1;
    }
    return 0;
}

After the migration thread is created, it goes to sleep (by waiting on a condition variable). When a worker thread inserts an item and finds that the table needs expanding, it calls assoc_start_expand to wake the migration thread.

static bool started_expanding = false;

// assoc_insert calls this function when the number of items
// reaches 1.5 times the length of the hash table
static void assoc_start_expand(void) {
    if (started_expanding)
        return;
    started_expanding = true;
    pthread_cond_signal(&maintenance_cond);
}

static bool expanding = false; // is the hash table currently being expanded?

static void *assoc_maintenance_thread(void *arg) {
    // do_run_maintenance_thread is a global variable with initial value 1;
    // the stop_assoc_maintenance_thread function sets it to 0 to end the
    // migration thread.
    while (do_run_maintenance_thread) {
        int ii = 0;

        // grab the locks
        item_lock_global();
        mutex_lock(&cache_lock);

        ... // migrate items

        // release the locks after this pass
        mutex_unlock(&cache_lock);
        item_unlock_global();

        if (!expanding) { // no data to migrate (for now)
            /* We are done expanding.. just wait for next invocation */
            mutex_lock(&cache_lock);
            started_expanding = false; // reset

            // Put the migration thread to sleep here. When a worker thread
            // later inserts an item and finds that the item count has reached
            // 1.5 times the table length, it calls assoc_start_expand, which
            // calls pthread_cond_signal to wake this thread up.
            pthread_cond_wait(&maintenance_cond, &cache_lock);
            mutex_unlock(&cache_lock);
            ...
            mutex_lock(&cache_lock);
            assoc_expand(); // allocate a larger table and set expanding to true
            mutex_unlock(&cache_lock);
        }
    }
    return NULL;
}


Step-by-Step data migration:

To keep worker threads from adding or deleting hash table entries while items are being moved, the table must be locked during data migration; a worker thread may only touch the table once it holds the lock. At the same time, memcached wants fast responses (worker threads should finish their add, delete and lookup operations quickly), so the migration thread must not hold the lock for long. But data migration is a time-consuming operation. These two requirements conflict.

To resolve the conflict, memcached migrates gradually. The idea is a loop of "lock, migrate a small portion of the data, unlock". The effect is that although the migration thread grabs the lock many times, it holds it only briefly each time, which raises the probability that a worker thread gets the lock and lets worker threads finish their operations quickly. How much is "a small portion"? The global variable hash_bulk_move mentioned earlier gives the number of buckets moved per pass; the default is one bucket. For convenience, treat hash_bulk_move as 1 below.

Concretely, gradual migration works like this: assoc_expand is called to allocate a new, larger hash table; then, on each pass, only the items in one bucket of the old table are moved to the new table, and the lock is released once that bucket has been migrated. This requires keeping both an old and a new hash table around. In memcached's implementation, primary_hashtable refers to the new table (some posts call it the primary table) and old_hashtable refers to the old one.

As mentioned above, the migration thread sleeps after it is created until a worker thread wakes it. Once awake, it calls assoc_expand to grow the hash table. The assoc_expand function is as follows:

static void assoc_expand(void) {
    old_hashtable = primary_hashtable;

    // allocate a new hash table twice as long;
    // old_hashtable keeps pointing at the old one
    primary_hashtable = calloc(hashsize(hashpower + 1), sizeof(void *));
    if (primary_hashtable) {
        hashpower++;
        expanding = true;   // now in the expanding state
        expand_bucket = 0;  // data migration starts from bucket 0
    } else {
        primary_hashtable = old_hashtable;
        /* Bad news, but we can keep running. */
    }
}


Now let's look at the complete assoc_maintenance_thread function to see how the migration thread moves the data step by step. Why "complete"? The function contains a few things this post does not explain, but they do not get in the way of reading it; later posts will cover the rest of this thread function.

static unsigned int expand_bucket = 0; // the next bucket to be migrated

#define DEFAULT_HASH_BULK_MOVE 1
int hash_bulk_move = DEFAULT_HASH_BULK_MOVE;

static volatile int do_run_maintenance_thread = 1;

static void *assoc_maintenance_thread(void *arg) {
    // do_run_maintenance_thread is a global variable with initial value 1;
    // stop_assoc_maintenance_thread sets it to 0 to end the migration thread.
    while (do_run_maintenance_thread) {
        int ii = 0;

        // grab the locks
        item_lock_global();
        mutex_lock(&cache_lock);

        // hash_bulk_move controls how many buckets are moved per pass; the
        // default is one. The loop body is entered only while expanding is
        // true, so right after the thread is created nothing happens here.
        for (ii = 0; ii < hash_bulk_move && expanding; ++ii) {
            item *it, *next;
            int bucket;

            // move every item in bucket expand_bucket of the old table
            // to its new bucket in the new (primary) table
            for (it = old_hashtable[expand_bucket]; NULL != it; it = next) {
                next = it->h_next;

                bucket = hash(ITEM_key(it), it->nkey) & hashmask(hashpower);
                it->h_next = primary_hashtable[bucket];
                primary_hashtable[bucket] = it;
            }

            old_hashtable[expand_bucket] = NULL;

            expand_bucket++;
            // hashsize(hashpower - 1) is the length of the old table;
            // once its last bucket has been moved, the expansion is over
            if (expand_bucket == hashsize(hashpower - 1)) {
                expanding = false;
                free(old_hashtable);
            }
        }

        // release the locks so the worker threads can get in
        mutex_unlock(&cache_lock);
        item_unlock_global();

        if (!expanding) {
            /* We are done expanding.. just wait for next invocation */
            mutex_lock(&cache_lock);
            started_expanding = false;
            pthread_cond_wait(&maintenance_cond, &cache_lock);
            mutex_unlock(&cache_lock);
            ...
            mutex_lock(&cache_lock);
            assoc_expand(); // allocate a larger table and set expanding to true
            mutex_unlock(&cache_lock);
        }
    }
    return NULL;
}


Looking back:

Now let's look back at the insert, delete and lookup operations on the hash table, because they may happen while the table is being migrated. Notice that no locking appears anywhere in the insert, delete and lookup code in assoc.c. As explained above, these operations must contend with the migration thread for the lock and may only proceed once they hold it. In fact the lock is taken by the callers of the insert, delete and lookup functions, which is why it is not visible in this code.
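For illustration, here is a hedged sketch of the caller-side locking the paragraph describes. The wrapper function is hypothetical and cache_lock is only a stand-in (the real call sites live elsewhere in memcached, e.g. items.c, and differ by version); the point is simply that the lock is taken around the assoc_* call, not inside it.

#include <pthread.h>
#include <stdint.h>

struct _stritem;                       /* memcached's item struct (opaque here) */

/* stand-ins for memcached's own lock and hash-table function */
extern pthread_mutex_t cache_lock;
extern int assoc_insert(struct _stritem *it, const uint32_t hv);

/* hypothetical caller: the lock contended with the migration thread is taken
   here, around the call, which is why assoc.c itself contains no locking */
static int insert_item_locked(struct _stritem *it, const uint32_t hv) {
    pthread_mutex_lock(&cache_lock);
    int ret = assoc_insert(it, hv);
    pthread_mutex_unlock(&cache_lock);
    return ret;
}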

During an insertion, the hash table may be in the middle of an expansion, so a choice has to be made: insert into the new table or the old one? memcached's rule is: if the item's bucket in the old table has not yet been migrated to the new table, insert into the old table; otherwise insert into the new table. Here is the insert code:

/* Note: this isn't an assoc_update.  The key must not already exist to call this */
// hv is the hash value of this item's key
int assoc_insert(item *it, const uint32_t hv) {
    unsigned int oldbucket;

    // insert the item at the head of its bucket's collision chain
    if (expanding && // the table is currently being expanded
        (oldbucket = (hv & hashmask(hashpower - 1))) >= expand_bucket)
        // this bucket has not been migrated to the new table yet
    {
        // so insert into the old table
        it->h_next = old_hashtable[oldbucket];
        old_hashtable[oldbucket] = it;
    } else {
        // insert into the new table
        it->h_next = primary_hashtable[hv & hashmask(hashpower)];
        primary_hashtable[hv & hashmask(hashpower)] = it;
    }

    hash_items++; // one more item in the hash table

    // When the number of items reaches 1.5 times the table length, expand
    // the table -- unless, of course, an expansion is already in progress.
    if (! expanding && hash_items > (hashsize(hashpower) * 3) / 2) {
        assoc_start_expand(); // wake the migration thread to expand the table
    }
    return 1;
}

A question arises here: why not simply insert into the new table every time? Doing so would be perfectly correct as far as data consistency goes. Some people online say the rule preserves the order of items within a bucket, but because the migration thread and the inserting thread contend for the lock unpredictably, assoc_insert cannot guarantee any ordering. This post's view is that the rule exists to keep lookups fast: if new items always went into the new table, finding an item might require searching both the new and the old table (search one, miss, then search the other), and that is not fast enough.

With the rule implemented by assoc_insert, an item can always be found without searching two tables. See the lookup function below.

// The hash value only determines which bucket of the hash table to look in,
// but a bucket holds a chain of colliding items, so the key itself is needed
// to walk the chain and compare the nodes one by one. Because the key is not
// a '\0'-terminated string, the extra parameter nkey gives its length.
item *assoc_find(const char *key, const size_t nkey, const uint32_t hv) {
    item *it;
    unsigned int oldbucket;

    if (expanding && // the hash table is being expanded
        (oldbucket = (hv & hashmask(hashpower - 1))) >= expand_bucket)
        // the item is still in the old table
    {
        it = old_hashtable[oldbucket];
    } else {
        // the hash value tells which bucket of the new table the key belongs to
        it = primary_hashtable[hv & hashmask(hashpower)];
    }

    // Both the table the item belongs to and the bucket position are now
    // determined; walk the bucket's collision chain.
    item *ret = NULL;
    while (it) {
        // call memcmp only when the lengths already match -- more efficient
        if ((nkey == it->nkey) && (memcmp(key, ITEM_key(it), nkey) == 0)) {
            ret = it;
            break;
        }
        it = it->h_next;
    }
    return ret;
}


Deletion is similar to lookup (deleting also requires finding the item first), so the code is simply listed here.

void assoc_delete(const char *key, const size_t nkey, const uint32_t hv) {
    // get the address of the pointer (the predecessor's h_next member or the
    // bucket slot) that points at the item to delete
    item **before = _hashitem_before(key, nkey, hv);

    if (*before) { // found it
        item *nxt;
        hash_items--;

        // before is a pointer to pointer: it holds the address of the h_next
        // member that points at the item we want, so *before is that item.
        // Using *before as an lvalue rewrites that h_next member, so the three
        // lines below unlink the item and splice predecessor and successor.
        nxt = (*before)->h_next;
        (*before)->h_next = 0;   /* probably pointless, but whatever. */
        *before = nxt;
        return;
    }
    /* Note:  we never actually get here.  the callers don't delete things
       they can't find. */
    assert(*before != 0);
}

// Look for the item. On success, returns the address of the pointer that
// points at the wanted node; on failure, returns the address of the last
// node's h_next member, whose value is NULL, so the caller can check success
// by dereferencing the return value with the * operator.
static item** _hashitem_before(const char *key, const size_t nkey, const uint32_t hv) {
    item **pos;
    unsigned int oldbucket;

    if (expanding && // the hash table is being expanded
        (oldbucket = (hv & hashmask(hashpower - 1))) >= expand_bucket)
    {
        pos = &old_hashtable[oldbucket];
    } else {
        // locate the bucket in the new table
        pos = &primary_hashtable[hv & hashmask(hashpower)];
    }

    // Both the table and the bucket are now determined;
    // walk the bucket's collision chain looking for the item.
    while (*pos && ((nkey != (*pos)->nkey) || memcmp(key, ITEM_key(*pos), nkey))) {
        pos = &(*pos)->h_next;
    }
    // if *pos is NULL the search failed, otherwise it succeeded
    return pos;
}


From the discussion above, whether an item lives in the old table or the new one is determined simply by whether its old-table bucket has already been migrated.




