Redis Underlying Data Structures: the dict Dictionary (Part 2)

Source: Internet
Author: User
Tags: rehash






This article answers the questions raised in the previous one.






As the rehash process shows, both ht[0] and ht[1] hold entries while a rehash is in progress; that is, the entries of the dictionary are spread across ht[0] and ht[1].



That is where the trouble starts. The main problems, and how each is solved, are as follows:






1. How to find a key.

2. How to insert a new key.

3. How to delete a key.

4. How to ensure that entries can keep being inserted and deleted while a rehash is in progress without breaking the rehash.

5. How to traverse all entries of a dict, and what traversal order is guaranteed.

6. How to ensure that an iterator stays valid and correct.






1. How to find a key


dictEntry *dictFind(dict *d, const void *key)
{
    dictEntry *he;
    unsigned int h, idx, table;
    if (d->ht[0].size == 0) return NULL; /* We don't have a table at all */
    if (dictIsRehashing(d)) _dictRehashStep(d); // If a rehash is in progress, perform one rehash step
    h = dictHashKey(d, key); // Compute the hash of the key
    // Look in ht[0] first
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        while (he) {
            if (dictCompareKeys(d, key, he->key))
                return he;
            he = he->next;
        }
        // Not found in ht[0]: if a rehash is in progress the key may be in ht[1], so search ht[1] as well
        if (!dictIsRehashing(d)) return NULL;
    }
    return NULL;
}





Because both ht[0] and ht[1] hold entries during a rehash, both tables must be searched before concluding that a key does not exist. Which table is searched first does not affect the result.



If a rehash is in progress, the lookup also performs one rehash step. This matches how rehash is implemented: it is not completed in one go but split into many small steps. So how is the work split, and when is a step run? From the dictRehash function we already know how the work is split into steps; the steps themselves are spread across ordinary operations such as lookups. Because each step is small, it adds little cost to any single query, which keeps query performance stable.
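The lookup path described above can be sketched in a few dozen lines. This is a minimal illustration, not the Redis implementation: MiniDict, table_new, table_link, table_push, rehash_step and mini_find are hypothetical names, keys are plain unsigned integers, and the hash is the identity function masked by the table size.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Entry { unsigned key; struct Entry *next; } Entry;
typedef struct Table { Entry **slot; unsigned long size, used; } Table;
typedef struct MiniDict {
    Table ht[2];
    long rehashidx;              /* next ht[0] bucket to migrate, -1 when idle */
} MiniDict;

/* Allocate an empty table; size must be a power of two. */
static Table table_new(unsigned long size) {
    Table t;
    t.slot = calloc(size, sizeof(Entry *));
    assert(t.slot != NULL);
    t.size = size;
    t.used = 0;
    return t;
}

/* Link an existing entry at the head of its bucket (identity hash, assumed). */
static void table_link(Table *t, Entry *e) {
    unsigned long idx = e->key & (t->size - 1);
    e->next = t->slot[idx];
    t->slot[idx] = e;
    t->used++;
}

/* Allocate a new entry for key and link it into the table. */
static void table_push(Table *t, unsigned key) {
    Entry *e = malloc(sizeof(*e));
    assert(e != NULL);
    e->key = key;
    table_link(t, e);
}

/* Move one non-empty ht[0] bucket to ht[1]; swap tables once ht[0] is empty. */
static void rehash_step(MiniDict *d) {
    Entry *e, *next;
    if (d->rehashidx == -1) return;
    /* Buckets below rehashidx are empty, so a non-empty one lies ahead. */
    while (d->ht[0].used > 0 && d->ht[0].slot[d->rehashidx] == NULL)
        d->rehashidx++;
    if (d->ht[0].used > 0) {
        for (e = d->ht[0].slot[d->rehashidx]; e != NULL; e = next) {
            next = e->next;
            d->ht[0].used--;
            table_link(&d->ht[1], e);
        }
        d->ht[0].slot[d->rehashidx++] = NULL;
    }
    if (d->ht[0].used == 0) {    /* everything moved: ht[1] becomes ht[0] */
        free(d->ht[0].slot);
        d->ht[0] = d->ht[1];
        d->ht[1].slot = NULL;
        d->ht[1].size = 0;
        d->ht[1].used = 0;
        d->rehashidx = -1;
    }
}

/* Look the key up in ht[0], then in ht[1] if a rehash is still running. */
static Entry *mini_find(MiniDict *d, unsigned key) {
    int t;
    if (d->ht[0].size == 0) return NULL;
    rehash_step(d);              /* every lookup pays for one small step */
    for (t = 0; t <= 1; t++) {
        Entry *e = d->ht[t].slot[key & (d->ht[t].size - 1)];
        for (; e != NULL; e = e->next)
            if (e->key == key) return e;
        if (d->rehashidx == -1) break;
    }
    return NULL;
}
```

With keys 1, 2 and 6 in a 4-bucket ht[0] and an empty 8-bucket ht[1], the first lookup of key 1 migrates bucket 1 and then finds the key in ht[1]; the next lookup migrates the last bucket and completes the rehash.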






2. How to insert a new key


// Add an entry to the dictionary
/* Add an element to the target hash table */
int dictAdd(dict *d, void *key, void *val)
{
    dictEntry *entry = dictAddRaw(d, key); // Insert the key
    if (!entry) return DICT_ERR;
    dictSetVal(d, entry, val); // Set the value for the key
    return DICT_OK;
}
/* Low level add. This function adds the entry but instead of setting
 * a value returns the dictEntry structure to the user, that will make
 * sure to fill the value field as he wishes.
 *
 * This function is also directly exposed to the user API to be called
 * mainly in order to store non-pointers inside the hash value, example:
 *
 * entry = dictAddRaw(dict, mykey);
 * if (entry != NULL) dictSetSignedIntegerVal(entry, 1000);
 *
 * Return values:
 *
 * If key already exists NULL is returned.
 * If key was added, the hash entry is returned to be manipulated by the caller.
 */
dictEntry *dictAddRaw(dict *d, void *key)
{
    int index;
    dictEntry *entry;
    dictht *ht;
    if (dictIsRehashing(d)) _dictRehashStep(d); // Perform one rehash step
    // If the key already exists, return NULL
    /* Get the index of the new element, or -1 if
     * the element already exists. */
    if ((index = _dictKeyIndex(d, key)) == -1)
        return NULL;
    // If a rehash is in progress, insert the new element into ht[1]; otherwise insert it into ht[0]
    /* Allocate the memory and store the new entry */
    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
    entry = zmalloc(sizeof(*entry));
    entry->next = ht->table[index];
    ht->table[index] = entry;
    ht->used++;
    /* Set the hash entry fields. */
    dictSetKey(d, entry, key); // Store the key
    return entry;
}





When the dict is not rehashing, insertion is simple: the element goes into ht[0]. While a rehash is in progress, however, the element is inserted into ht[1]. Why ht[1] rather than ht[0]? The reason lies in the rehash process itself. Rehash moves entries from ht[0] to ht[1], and it is complete once every entry has been moved. For the rehash to be guaranteed to finish, several conditions must hold:



A. The number of elements in ht[0] must not keep growing, not even at a rate slower than the rate at which elements are moved to ht[1].

B. The next entry to move must be well defined (that is, whatever method picks the next entry must be able to reach every entry in ht[0]).

C. It must be possible to tell when all entries have been moved.






Elements cannot be inserted into ht[0] precisely in order to guarantee B. During a rehash, rehashidx records how far the processed buckets extend. Because rehashidx grows linearly, it eventually passes over every bucket of ht[0]; but for the rehash to reach every entry, buckets that have already been processed must also never receive new elements. New elements can therefore only go into ht[1]. And since nothing new is inserted into ht[0], A is guaranteed as well.
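The rule can be made concrete with a small sketch. It assumes a simplified dictionary (the names MiniDict, mini_add and migrated_buckets_stay_empty are hypothetical, keys are unsigned integers, and the hash is the identity function); it only illustrates which table receives a new entry and the invariant that choice protects.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Entry { unsigned key; struct Entry *next; } Entry;
typedef struct Table { Entry **slot; unsigned long size, used; } Table;
typedef struct MiniDict {
    Table ht[2];
    long rehashidx;              /* next ht[0] bucket to migrate, -1 when idle */
} MiniDict;

/* Insert a key that is assumed to be absent.
 * While a rehash is running the entry goes to ht[1], never to ht[0]. */
static Entry *mini_add(MiniDict *d, unsigned key) {
    Table *t = (d->rehashidx != -1) ? &d->ht[1] : &d->ht[0];
    unsigned long idx;
    Entry *e;
    assert(t->slot != NULL);
    idx = key & (t->size - 1);   /* identity hash, size is a power of two */
    e = malloc(sizeof(*e));
    if (e == NULL) return NULL;
    e->key = key;
    e->next = t->slot[idx];      /* push at the head of the chain */
    t->slot[idx] = e;
    t->used++;
    return e;
}

/* The invariant the rehash relies on: every ht[0] bucket below
 * rehashidx has been migrated and must stay empty. */
static int migrated_buckets_stay_empty(const MiniDict *d) {
    long i;
    for (i = 0; i < d->rehashidx; i++)
        if (d->ht[0].slot[i] != NULL) return 0;
    return 1;
}
```

If mini_add wrote to ht[0] during a rehash, a key hashing to a bucket below rehashidx would land behind the cursor and never be migrated; sending it to ht[1] keeps the invariant true by construction.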






3. How to delete a key.


// Look in ht[0] first; if the key is not there, look in ht[1]; delete it if found.
/* Search and remove an element */
static int dictGenericDelete(dict *d, const void *key, int nofree)
{
    unsigned int h, idx;
    dictEntry *he, *prevHe;
    int table;
    if (d->ht[0].size == 0) return DICT_ERR; /* d->ht[0].table is NULL */
    if (dictIsRehashing(d)) _dictRehashStep(d);
    h = dictHashKey(d, key);
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        prevHe = NULL;
        while (he) {
            if (dictCompareKeys(d, key, he->key)) {
                /* Unlink the element from the list */
                if (prevHe)
                    prevHe->next = he->next;
                else
                    d->ht[table].table[idx] = he->next;
                if (!nofree) {
                    dictFreeKey(d, he);
                    dictFreeVal(d, he);
                }
                zfree(he);
                d->ht[table].used--;
                return DICT_OK;
            }
            prevHe = he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) break;
    }
    return DICT_ERR; /* not found */
}
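Deletion follows the same two-table probe as lookup. The sketch below is a simplified illustration, not the Redis code: MiniDict, table_push and mini_delete are hypothetical names, keys are unsigned integers with an identity hash, and the chain is unlinked through a pointer-to-pointer instead of the prevHe variable used in the listing above.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Entry { unsigned key; struct Entry *next; } Entry;
typedef struct Table { Entry **slot; unsigned long size, used; } Table;
typedef struct MiniDict {
    Table ht[2];
    long rehashidx;              /* next ht[0] bucket to migrate, -1 when idle */
} MiniDict;

/* Push a key at the head of its bucket (identity hash, power-of-two size). */
static void table_push(Table *t, unsigned key) {
    Entry *e = malloc(sizeof(*e));
    unsigned long idx = key & (t->size - 1);
    assert(e != NULL);
    e->key = key;
    e->next = t->slot[idx];
    t->slot[idx] = e;
    t->used++;
}

/* Remove key from whichever table holds it; 1 if removed, 0 if absent. */
static int mini_delete(MiniDict *d, unsigned key) {
    int t;
    if (d->ht[0].size == 0) return 0;
    for (t = 0; t <= 1; t++) {
        Table *tab = &d->ht[t];
        /* link always points at the pointer that leads to the current entry */
        Entry **link = &tab->slot[key & (tab->size - 1)];
        for (; *link != NULL; link = &(*link)->next) {
            if ((*link)->key == key) {
                Entry *victim = *link;
                *link = victim->next;    /* unlink from the chain */
                free(victim);
                tab->used--;
                return 1;
            }
        }
        if (d->rehashidx == -1) break;   /* ht[1] is only used while rehashing */
    }
    return 0;
}
```

Because the used counter of the table that actually held the key is decremented, a delete during a rehash can only shrink ht[0], which is what the rehash needs in order to terminate.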





4. How to ensure that entries can keep being inserted and deleted while a rehash is in progress without breaking the rehash






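As the insert and delete code shows, inserting and deleting entries during a rehash does not break it: inserts only ever go to ht[1], and deletes only shrink whichever table holds the key, so ht[0] never grows and rehashidx still reaches every remaining entry.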






5. How to traverse all entries of a dict, and what traversal order is guaranteed

6. How to ensure that an iterator stays valid and correct






A dict is traversed with an iterator. There are two kinds: the ordinary iterator and the safe iterator. Compared with the safe iterator, the ordinary iterator is the unsafe one.






Iterators are the tool that many data structures (containers) provide for traversing their elements. There are a few issues to be aware of when using them:



A. The traversal order of the iterator.

B. Whether the elements of the container may be modified during traversal and, if so, how the traversal is affected, for example in its order or through iterator invalidation.






Now look at the dict iterator.

The traversal order is not defined; it should basically be treated as unordered.

Ordinary iterators do not allow the dict to be modified during traversal. Safe iterators do.

Consider the code below:


// Create an ordinary iterator
dictIterator *dictGetIterator(dict *d)
{
    dictIterator *iter = zmalloc(sizeof(*iter));
    iter->d = d; // Record the dict
    iter->table = 0;
    iter->index = -1;
    iter->safe = 0; // Ordinary iterator
    iter->entry = NULL;
    iter->nextEntry = NULL;
    return iter;
}
// Create a safe iterator
dictIterator *dictGetSafeIterator(dict *d) {
    dictIterator *i = dictGetIterator(d);
    i->safe = 1; // Safe iterator
    return i;
}
// Traversal
dictEntry *dictNext(dictIterator *iter)
{
    while (1) {
        if (iter->entry == NULL) {
            // entry is NULL: the iterator was just created, the bucket is empty, the end of a bucket's chain was reached, or all buckets have been traversed
            dictht *ht = &iter->d->ht[iter->table];
            if (iter->index == -1 && iter->table == 0) {
                // The iterator was just created
                if (iter->safe)
                    iter->d->iterators++; // Safe iterator: register it on the dict
                else
                    iter->fingerprint = dictFingerprint(iter->d); // Ordinary iterator: record the fingerprint
            }
            iter->index++; // Next bucket
            if (iter->index >= (long) ht->size) {
                // This table is finished; if a rehash is in progress and the table was ht[0], continue with ht[1]
                if (dictIsRehashing(iter->d) && iter->table == 0) {
                    iter->table++;
                    iter->index = 0;
                    ht = &iter->d->ht[1];
                } else {
                    break; // Done
                }
            }
            // Record the current entry
            iter->entry = ht->table[iter->index];
        } else {
            // Move to the next entry
            iter->entry = iter->nextEntry;
        }
        if (iter->entry) {
            // Entry found: remember its successor
            /* We need to save the 'next' here, the iterator user
             * may delete the entry we are returning. */
            iter->nextEntry = iter->entry->next;
            return iter->entry; // Return the entry found
        }
    }
    // No entry left: the dict has been fully traversed
    return NULL;
}





The traversal code above shows three ordering rules:

A. ht[0] is traversed first; if a rehash is in progress, ht[1] is traversed after all buckets of ht[0].

B. Within one table, buckets are traversed in ascending index order.

C. Entries in the same bucket are traversed from the head of the chain to its tail, but the position of an entry within the chain is itself not defined.

From these three rules it follows that the iteration order is effectively unordered.
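The three rules can be seen in a compact sketch that collects keys in visiting order. It is an illustration under simplifying assumptions: MiniDict, table_push and mini_collect are hypothetical names, keys are unsigned integers, the hash is the identity function, and new entries are pushed at the head of the chain as in dictAddRaw.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Entry { unsigned key; struct Entry *next; } Entry;
typedef struct Table { Entry **slot; unsigned long size, used; } Table;
typedef struct MiniDict {
    Table ht[2];
    long rehashidx;              /* -1 when no rehash is in progress */
} MiniDict;

/* Push a key at the head of its bucket (identity hash, power-of-two size). */
static void table_push(Table *t, unsigned key) {
    Entry *e = malloc(sizeof(*e));
    unsigned long idx = key & (t->size - 1);
    assert(e != NULL);
    e->key = key;
    e->next = t->slot[idx];
    t->slot[idx] = e;
    t->used++;
}

/* Copy keys into out[] in visiting order; returns how many were written. */
static size_t mini_collect(const MiniDict *d, unsigned *out, size_t cap) {
    size_t n = 0;
    int t;
    for (t = 0; t <= 1; t++) {                     /* rule A: ht[0], then ht[1] */
        unsigned long i;
        for (i = 0; i < d->ht[t].size; i++) {      /* rule B: ascending buckets */
            const Entry *e;
            for (e = d->ht[t].slot[i]; e != NULL; e = e->next) {  /* rule C */
                assert(n < cap);
                out[n++] = e->key;
            }
        }
        if (d->rehashidx == -1) break;             /* ht[1] only while rehashing */
    }
    return n;
}
```

With keys 1 and then 3 pushed into bucket 1 of a 2-bucket ht[0], and key 2 already migrated to a 4-bucket ht[1], the visiting order is 3, 1, 2: neither insertion order nor key order.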






Next, consider whether an iterator can visit every entry. Ordinary and safe iterators are discussed separately here.






An ordinary iterator, as the code shows, computes the fingerprint of the dict when it starts traversing. Nothing in the traversal code stops entries from being inserted or deleted, or a rehash from running, during the traversal. When the iterator is released, however, the fingerprint of the dict is computed again and compared with the one taken before the traversal, and the program aborts if the two differ. So an ordinary iterator does not allow the dict to be modified during traversal: the traversal code does not prevent it, but the program will eventually abort with an error. Note also that identical fingerprints do not prove that the dict is unchanged; one can only say that if the fingerprints differ, the dict must have changed.






void dictReleaseIterator(dictIterator *iter)
{
    if (!(iter->index == -1 && iter->table == 0)) {
        if (iter->safe)
            iter->d->iterators--;
        else
            assert(iter->fingerprint == dictFingerprint(iter->d));
    }
    zfree(iter);
}
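Why equal fingerprints do not prove that nothing changed can be shown with a toy fingerprint. The sketch assumes, as in dict.c of this era, that the fingerprint covers only the table pointer, size and used count of each table. The mixing function here is a simple FNV-style multiply-and-xor chosen for brevity, not the mixer Redis uses, and Shape, mix, fingerprint and release_check are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* The parts of one hash table that the fingerprint looks at. */
typedef struct Shape { uintptr_t table; uint64_t size, used; } Shape;

/* FNV-style mixing step (an assumption for this sketch, not Redis's mixer). */
static uint64_t mix(uint64_t h, uint64_t v) {
    h ^= v;
    h *= 0x100000001b3ULL;       /* 64-bit FNV prime */
    return h;
}

/* Fold the six integers describing both tables into one 64-bit value. */
static uint64_t fingerprint(const Shape ht[2]) {
    uint64_t h = 0xcbf29ce484222325ULL;   /* 64-bit FNV offset basis */
    int t;
    for (t = 0; t < 2; t++) {
        h = mix(h, (uint64_t)ht[t].table);
        h = mix(h, ht[t].size);
        h = mix(h, ht[t].used);
    }
    return h;
}

/* What releasing an ordinary iterator checks: abort if the shape changed. */
static void release_check(uint64_t fp_at_start, const Shape ht[2]) {
    assert(fp_at_start == fingerprint(ht));
}
```

Inserting one key changes used and therefore the fingerprint, but inserting one key and deleting another restores used to its old value, so the check passes even though the contents of the dict changed.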






A safe iterator registers itself on the dict when the traversal begins; apart from that, its traversal is no different from that of an ordinary iterator. So what is the point of registering a safe iterator on the dict? Searching the code shows that the safe-iterator counter of the dict is used in the _dictRehashStep function.






/* This function performs just a step of rehashing, and only if there are
 * no safe iterators bound to our hash table. When we have iterators in the
 * middle of a rehashing we can't mess with the two hash tables otherwise
 * some element can be missed or duplicated.
 *
 * This function is called by common lookup or update operations in the
 * dictionary so that the hash table automatically migrates from H1 to H2
 * while it is actively used. */
static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d, 1); // A rehash step is allowed only if the safe-iterator counter is 0
}






The iterator release function dictReleaseIterator shows that no fingerprint check is made for a safe iterator. From this we can work out what a so-called safe iterator actually means:

A. Entries may be inserted and deleted during the iteration.

B. No rehash is performed during the iteration. If a rehash was already in progress when the iteration began, it is paused and resumes after the iteration completes.
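Point B can be modeled with nothing more than a counter. In this sketch the actual bucket migration is replaced by advancing a cursor, and MiniDict, migrate_one_bucket, rehash_step, safe_iter_begin and safe_iter_end are hypothetical names standing in for the dict, dictRehash(d, 1), _dictRehashStep and the first and last use of a safe iterator.

```c
#include <assert.h>

typedef struct MiniDict {
    long rehashidx;   /* next ht[0] bucket to migrate, -1 when idle */
    int iterators;    /* number of safe iterators currently running */
} MiniDict;

/* Stand-in for dictRehash(d, 1): here it only advances the bucket cursor. */
static void migrate_one_bucket(MiniDict *d) {
    d->rehashidx++;
}

/* Called from lookups and updates; does nothing while a safe iterator runs. */
static void rehash_step(MiniDict *d) {
    if (d->rehashidx != -1 && d->iterators == 0)
        migrate_one_bucket(d);
}

/* A safe iterator registers itself when traversal starts ... */
static void safe_iter_begin(MiniDict *d) {
    d->iterators++;
}

/* ... and unregisters when it is released, which lets the rehash resume. */
static void safe_iter_end(MiniDict *d) {
    assert(d->iterators > 0);
    d->iterators--;
}
```

Any number of rehash_step calls made between safe_iter_begin and safe_iter_end leave rehashidx unchanged; the first call after safe_iter_end continues from where the rehash stopped.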






Since insertions and deletions are allowed during traversal, how do they affect the traversal?

Inserting an element has no significant impact on the traversal, but whether the newly inserted element will be visited is not deterministic.

For deletion there are four cases: deleting an element that has already been visited, deleting the current element, deleting the next element to be visited, and deleting an unvisited element that is not the next one.

Deleting an element that has already been visited has no effect on the traversal.

Deleting the current element also has no effect on the traversal, because the current element has already been visited and the iterator does not rely on it to advance: its successor was saved in nextEntry before the current element was returned.

Deleting the next element to be visited splits into two cases: either the next element has already been recorded in the iterator's nextEntry, or it has not. If it has not been recorded in nextEntry, the traversal is unaffected. If it has, the iterator is now invalid, and trying to advance to the next element has unpredictable results.

Deleting an unvisited element that is not the next one does not affect the traversal either, except that the deleted element will not be visited.

From the discussion above, the safe iterator is not truly safe: deleting elements can still invalidate it.
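The one deletion that is always safe, deleting the element just returned, works because the successor is saved before the element is handed out. The sketch below shows this on a single chain; Node, Iter, push_front, chain_remove, iter_start, iter_next and delete_all_while_iterating are hypothetical names, and a real dict would walk many such chains.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node { int key; struct Node *next; } Node;
typedef struct Iter { Node *entry; Node *next_entry; } Iter;

/* Prepend a new node and return the new head of the chain. */
static Node *push_front(Node *head, int key) {
    Node *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->key = key;
    n->next = head;
    return n;
}

/* Unlink and free victim if it is on the chain starting at *head. */
static void chain_remove(Node **head, Node *victim) {
    Node **link = head;
    while (*link != NULL && *link != victim)
        link = &(*link)->next;
    if (*link != NULL) {
        *link = victim->next;
        free(victim);
    }
}

static void iter_start(Iter *it, Node *head) {
    it->entry = NULL;
    it->next_entry = head;
}

/* Return the next node, remembering its successor before handing it out. */
static Node *iter_next(Iter *it) {
    it->entry = it->next_entry;
    if (it->entry != NULL)
        it->next_entry = it->entry->next;
    return it->entry;
}

/* Visit every node and delete each one as soon as it is returned. */
static int delete_all_while_iterating(Node **head) {
    Iter it;
    Node *n;
    int visited = 0;
    iter_start(&it, *head);
    while ((n = iter_next(&it)) != NULL) {
        chain_remove(head, n);   /* fine: the iterator no longer needs n */
        visited++;
    }
    return visited;
}
```

The same structure also shows the hazard: if the caller freed the node stored in next_entry instead of the one just returned, the following iter_next call would read freed memory.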






Now consider why safe iterators must not allow a rehash during traversal. If rehash were allowed, the traversal could no longer guarantee that every element is visited exactly once: some elements might be visited more than once and others not at all. Two scenarios:

A. The iterator has reached an element x of ht[0], in bucket 2. Since rehash is allowed, suppose a rehash step now moves an element y from bucket 1 of ht[0] to ht[1]. After the iterator finishes ht[0] it goes on to traverse ht[1], where it visits y a second time.

B. The iterator is at bucket 4 of ht[1], and the later buckets have not been visited yet. The rehash now proceeds and moves all remaining elements of ht[0] to ht[1]; the rehash completes and ht[1] becomes ht[0]. Because the iterator records that it is traversing ht[1], it finishes the entries of bucket 4 (which now belongs to ht[0]) and then the traversal ends, although some elements have never been visited.






As you can see from the above discussion, rehash cannot be allowed during traversal.






As the discussion above shows, with a safe iterator the traversal is basically problem-free as long as no elements are deleted, and every element that existed when the traversal began is visited. Using a safe iterator does, however, affect the dict itself: the rehash is suspended for the duration of the traversal, and if a safe iterator is held without being released, the rehash can never continue.






This article is from the "Chhquan" blog, make sure to keep this source http://chhquan.blog.51cto.com/1346841/1827440


