Redis Research 3.2: Data Structures, the Associative Array (Dictionary)

Source: Internet
Author: User
Tags: rehash

The source code studied in this chapter lives in two files, dict.h and dict.c.

Anyone who has used Java, or any other language that supports associative arrays, knows that an associative array (dictionary) is an "array" of key-value pairs. So how does Redis implement one, step by step? Let's break it down: since a dictionary is an "array" of key-value pairs, the first thing we need is the key-value structure itself.

Key-value structure:

    typedef struct dictEntry {
        // Key
        void *key;
        // Value
        union {
            void *val;
            uint64_t u64;
            int64_t s64;
        } v;
        // Why is this needed? next is used to resolve key conflicts
        struct dictEntry *next;
    } dictEntry;


In the structure above, key holds the key, and the value v can be a pointer, a uint64_t integer, or an int64_t integer. So what is next for? This pointer links together the key-value pairs whose keys hash to the same bucket, which is how key conflicts are resolved.
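To make the role of the union concrete, here is a minimal sketch of an entry with the same layout. The names sketchEntry, setSigned and setPointer are hypothetical and exist only for this illustration; they are not part of dict.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Same shape as the entry shown above, under a hypothetical name. */
typedef struct sketchEntry {
    void *key;
    union {
        void *val;      /* value stored as a pointer */
        uint64_t u64;   /* or as an unsigned 64-bit integer */
        int64_t s64;    /* or as a signed 64-bit integer */
    } v;
    struct sketchEntry *next;
} sketchEntry;

/* Store a signed integer directly in the entry, with no extra allocation. */
static void setSigned(sketchEntry *e, int64_t n)
{
    e->v.s64 = n;
}

/* Store a pointer value in the same storage. */
static void setPointer(sketchEntry *e, void *p)
{
    e->v.val = p;
}
```

All three members of v share the same storage, so an entry holds exactly one of them at a time, and the caller has to remember which one it stored.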

The next question is, how do you build an "array"? The definition in Redis is shown in the following code:

    typedef struct dictht {
        // The array of buckets
        dictEntry **table;
        // Size of the array
        unsigned long size;
        unsigned long sizemask;
        // Number of nodes already stored
        unsigned long used;
    } dictht;



The table field above is an array, and each element of the array is a pointer to a dictEntry. The size field records the length of table. Why is it needed? You have probably heard the term "hash bucket"; size says how many buckets this hash table has. What about used? It records the number of nodes (key-value pairs) currently stored in the table. Because colliding nodes are chained within a single bucket, used counts nodes, not occupied indexes. And sizemask? It is tied to hashing: its value is always size - 1, and it is used further below to turn a hash value into an array index.
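Why keep sizemask equal to size - 1? When size is a power of two, masking with size - 1 picks the same bucket as taking the hash modulo size, at the cost of a single AND. A minimal sketch, where bucketIndex is a hypothetical helper and not a Redis function:

```c
#include <assert.h>

/* Hypothetical helper (not part of dict.c): map a hash value to a bucket
 * index for a table whose size is a power of two. */
static unsigned long bucketIndex(unsigned long hash, unsigned long size)
{
    unsigned long sizemask = size - 1;

    /* The mask trick is only valid for non-zero power-of-two sizes. */
    assert(size != 0 && (size & sizemask) == 0);
    return hash & sizemask;
}
```

For instance, with size 8 the mask is binary 111, so a hash of 13 (binary 1101) lands in bucket 5, the same as 13 % 8.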

Next comes our ultimate goal, the associative array (dictionary) itself, which Redis defines as follows:

    typedef struct dict {
        dictType *type;
        void *privdata;
        dictht ht[2];
        int rehashidx; /* rehashing not in progress if rehashidx == -1 */
        int iterators; /* number of iterators currently running */
    } dict;

A generic dictionary cannot commit to concrete key or value types in its definition, and therefore cannot hard-code type-specific operations either. The Redis dictionary instead lets each kind of dictionary plug in its own operations. That is the job of the type field, which is defined as follows:

    // Binds different operation functions for different dictionary types
    typedef struct dictType {
        // Function that computes the hash value
        unsigned int (*hashFunction)(const void *key);
        // Function that copies a key
        void *(*keyDup)(void *privdata, const void *key);
        // Function that copies a value
        void *(*valDup)(void *privdata, const void *obj);
        // Function that compares two keys
        int (*keyCompare)(void *privdata, const void *key1, const void *key2);
        // Function that destroys a key
        void (*keyDestructor)(void *privdata, void *key);
        // Function that destroys a value
        void (*valDestructor)(void *privdata, void *obj);
    } dictType;

So what is the privdata field for? As the signatures above show, it is passed as the first argument to most of the type-specific functions, so think of it as optional private data that those functions can use.
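The following sketch shows the idea of binding operations through a table of function pointers and handing privdata to them. It is a cut-down illustration with only two callbacks; demoType, demoStringHash, demoStringCompare and demoKeysEqual are hypothetical names, and using privdata as a comparison counter is an assumption made only for this example.

```c
#include <assert.h>
#include <string.h>

/* Cut-down version of the type table: two of the six callbacks. */
typedef struct demoType {
    unsigned int (*hashFunction)(const void *key);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
} demoType;

/* Hypothetical string hash (djb2 style), used only for this sketch. */
static unsigned int demoStringHash(const void *key)
{
    const unsigned char *p = key;
    unsigned int h = 5381;

    while (*p) h = h * 33 + *p++;
    return h;
}

/* Returns non-zero when the keys are equal. Here privdata, if given,
 * points to a counter of how many comparisons were performed. */
static int demoStringCompare(void *privdata, const void *key1, const void *key2)
{
    if (privdata) (*(unsigned long *)privdata)++;
    return strcmp(key1, key2) == 0;
}

/* One "type" of dictionary: keys are C strings. */
static demoType demoStringType = { demoStringHash, demoStringCompare };

/* Generic code calls through the table and never sees the key type. */
static int demoKeysEqual(demoType *type, void *privdata,
                         const void *key1, const void *key2)
{
    return type->keyCompare(privdata, key1, key2);
}
```

A dictionary of integers would supply a different demoType instance with its own hash and compare functions, while demoKeysEqual stays unchanged.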

The data itself lives in the ht array, which has two elements of type dictht. Why two? One stores the actual key-value pairs, and the other is used during rehash.

What is the integer rehashidx for? It records the progress of a rehash; when the dictionary is not rehashing, its value is -1.

The iterators integer records the number of safe iterators currently running on the dictionary.

From the key-value structure, to the key-value array (table), to the dictionary itself, the path of the implementation is clear. Looking at the definitions above, three key concepts remain unexplained: hashing, hash conflicts, and rehashing.

What is a hash?

A simple example: suppose we want to add the key-value pair k1-v1 to a dictionary dict. From the above we know that the data actually lives in the dict's ht array, whose elements are dictht structures, each holding an array (table). The most commonly used property of an array is its index, so to add the pair to the dictionary's array we need to compute which index it should be placed at.

Adding a key-value pair to the dictionary therefore takes the following steps:

1. Use the hashFunction in the type of this dict (dictionary) to compute the hash value of the key:

    keyHashValue = dict->type->hashFunction(k1);

2. As mentioned earlier, a hash table has two very important fields: size (the number of hash buckets) and sizemask (whose value equals size - 1). Combining sizemask with the hash value obtained above gives the index into the array:

    index = keyHashValue & ht[0].sizemask; // assuming ht[0] is the hash table that stores the data

As the two steps show, performance and data distribution here depend mainly on the hash function you bind.
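To see how much the bound hash function matters, the sketch below runs the two steps above over a set of keys and counts how many distinct buckets get used. Everything here is illustrative: lengthHash is a deliberately poor hash, djb2Hash is a simple string hash chosen for the example (not the function Redis ships), and bucketsUsed is a hypothetical helper limited to tables of at most 64 buckets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int (*demoHashFn)(const void *key);

/* Poor hash: only the string length, so same-length keys always collide. */
static unsigned int lengthHash(const void *key)
{
    return (unsigned int)strlen(key);
}

/* Better hash for this sketch: djb2-style string hash. */
static unsigned int djb2Hash(const void *key)
{
    const unsigned char *p = key;
    unsigned int h = 5381;

    while (*p) h = h * 33 + *p++;
    return h;
}

/* Step 1: hash each key. Step 2: mask the hash with size - 1.
 * Returns the number of distinct buckets the keys fall into. */
static unsigned long bucketsUsed(demoHashFn hash, const char **keys,
                                 size_t nkeys, unsigned long size)
{
    unsigned char seen[64] = {0};
    unsigned long sizemask = size - 1, used = 0;

    assert(size != 0 && size <= 64 && (size & sizemask) == 0);
    for (size_t i = 0; i < nkeys; i++) {
        unsigned long index = hash(keys[i]) & sizemask;
        if (!seen[index]) {
            seen[index] = 1;
            used++;
        }
    }
    return used;
}
```

With the keys "k1" to "k4" and 8 buckets, lengthHash sends every key to the same bucket, while djb2Hash spreads them over four different buckets.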

What is a hash conflict?
Why do hash conflicts happen? When adding new key-value pairs as described above, different keys will quite likely compute to the same array index; when that happens we say there is a hash conflict. How does Redis solve this? The answer is the next pointer defined in dictEntry. Through this pointer, the key-value pairs that land on the same index form a linked list. Note that this list keeps no tail pointer, so for performance reasons a new pair is always inserted at the head of the list, which keeps insertion at O(1).
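Head insertion into a bucket's chain can be sketched as follows. This is a simplified stand-alone illustration using malloc and string keys; chainEntry, chainPush and chainFree are hypothetical names, not Redis functions.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct chainEntry {
    const char *key;
    struct chainEntry *next;
} chainEntry;

/* Insert a new node at the head of the chain stored in *bucket.
 * No traversal is needed, so the cost is O(1) regardless of chain length.
 * Returns the new node, or NULL if allocation fails. */
static chainEntry *chainPush(chainEntry **bucket, const char *key)
{
    chainEntry *entry = malloc(sizeof(*entry));

    if (!entry) return NULL;
    entry->key = key;
    entry->next = *bucket;  /* the old head (or NULL) becomes the second node */
    *bucket = entry;        /* the bucket now points at the new node */
    return entry;
}

/* Release every node in a chain. */
static void chainFree(chainEntry *head)
{
    while (head) {
        chainEntry *next = head->next;
        free(head);
        head = next;
    }
}
```

After pushing "k1" and then "k2" into the same bucket, the bucket points at the "k2" node, whose next pointer leads to "k1".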


What is rehashing?

Before talking about rehashing, we should understand the load factor. The load factor is the number of nodes stored in the hash table (N) divided by the number of buckets in the hash table (M), that is, N/M. This ratio shows how full the hash table is. Because colliding nodes are chained, N can exceed M, so the load factor can be greater than 1.
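In terms of the dictht fields, N corresponds to used and M to size. A minimal sketch of the calculation, where loadFactor is a hypothetical helper written for this illustration:

```c
#include <assert.h>

/* Load factor N/M, with N = number of stored nodes (used)
 * and M = number of buckets (size). */
static double loadFactor(unsigned long used, unsigned long size)
{
    assert(size != 0);  /* an unallocated table has no load factor */
    return (double)used / (double)size;
}
```

A table with 8 buckets holding 4 nodes has a load factor of 0.5; a table with 4 buckets holding 16 chained nodes has a load factor of 4.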

Once you understand the load factor, it is easier to see why rehashing exists. Operations on the dictionary make it hold more or fewer key-value pairs, so the load factor can drift widely. To keep the load factor within an acceptable range, the table has to be rehashed. How is that done?

Under certain conditions (covered in later chapters), the program triggers a rehash, which consists of the following steps:

1. Allocate space for the dictionary's ht[1]. Its size is the first power of two, 2^n, that is greater than or equal to ht[0].used*2. (For example, if used=4, then 4*2=8, and 8 is exactly 2^3. If used=5, then 5*2=10, and the first power of two not below 10 is 2^4, so the size of ht[1] is 16, and so on.)

2. Recompute the hash of every key in ht[0] and move each key-value pair to its new position in ht[1].

3. When all key-value pairs in ht[0] have been moved to ht[1], release ht[0], make ht[1] the new ht[0], and create a fresh empty hash table at ht[1] for the next rehash.
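The sizing rule in step 1 can be sketched as below. The helper names and the minimum size of 4 are assumptions made for this illustration; the article itself only states the power-of-two rule.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_INITIAL_SIZE 4UL  /* assumed smallest table size for this sketch */

/* Return the first power of two that is >= size,
 * never going below DEMO_INITIAL_SIZE. */
static unsigned long nextPower(unsigned long size)
{
    unsigned long i = DEMO_INITIAL_SIZE;

    /* Guard against overflow while doubling. */
    if (size >= LONG_MAX) return LONG_MAX;
    while (i < size) i *= 2;
    return i;
}

/* Size of ht[1] when growing: first power of two >= used * 2. */
static unsigned long rehashTargetSize(unsigned long used)
{
    return nextPower(used * 2);
}
```

This reproduces the examples from step 1: used=4 gives 8, and used=5 gives 16.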

There is a problem, though: when ht[0] holds a huge number of key-value pairs, does Redis have to stop responding and do nothing but rehash? If it did, it would be of little use, so Redis uses progressive rehash instead. How does it work? The key is the counter dict->rehashidx.

1. Allocate space for ht[1], so that the dict holds both hash tables, ht[0] and ht[1];

2. While the rehash is in progress, rehashidx holds the index of the ht[0] bucket being migrated;

3. The key-value pairs in ht[0] are rehashed into ht[1] step by step, and when the rehash is complete, rehashidx is set to -1.

Therefore, while a rehash is in progress, operations have to take both hash tables into account.
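The progressive scheme can be illustrated with a toy model that migrates one bucket per step. This is a simplified sketch, not the Redis code: keys are plain integers that serve as their own hash, allocation failures simply abort, and all names (toyDict, toyRehashStep and so on) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toyNode { unsigned long key; struct toyNode *next; } toyNode;
typedef struct toyTable { toyNode **buckets; unsigned long size, used; } toyTable;
typedef struct toyDict { toyTable ht[2]; long rehashidx; } toyDict;

static void toyTableInit(toyTable *t, unsigned long size)
{
    t->buckets = size ? calloc(size, sizeof(toyNode *)) : NULL;
    if (size && !t->buckets) abort();
    t->size = size;
    t->used = 0;
}

/* Head-insert a key; the key itself is used as the hash value. */
static void toyInsert(toyTable *t, unsigned long key)
{
    toyNode *n = malloc(sizeof(*n));
    unsigned long idx = key & (t->size - 1);

    if (!n) abort();
    n->key = key;
    n->next = t->buckets[idx];
    t->buckets[idx] = n;
    t->used++;
}

/* Step 1 of the scheme: allocate ht[1] and start at bucket 0. */
static void toyStartRehash(toyDict *d, unsigned long newsize)
{
    toyTableInit(&d->ht[1], newsize);
    d->rehashidx = 0;
}

/* Migrate one non-empty bucket from ht[0] to ht[1].
 * Returns 1 while work may remain, 0 once the rehash has finished. */
static int toyRehashStep(toyDict *d)
{
    if (d->rehashidx == -1) return 0;
    if (d->ht[0].used == 0) {           /* everything moved: swap tables */
        free(d->ht[0].buckets);
        d->ht[0] = d->ht[1];
        toyTableInit(&d->ht[1], 0);
        d->rehashidx = -1;
        return 0;
    }
    /* used != 0 guarantees a non-empty bucket at or after rehashidx. */
    while (d->ht[0].buckets[d->rehashidx] == NULL) d->rehashidx++;
    toyNode *n = d->ht[0].buckets[d->rehashidx];
    while (n) {
        toyNode *next = n->next;        /* save before relinking */
        unsigned long idx = n->key & (d->ht[1].size - 1);
        n->next = d->ht[1].buckets[idx];
        d->ht[1].buckets[idx] = n;
        d->ht[0].used--;
        d->ht[1].used++;
        n = next;
    }
    d->ht[0].buckets[d->rehashidx] = NULL;
    d->rehashidx++;
    return 1;
}

/* Lookup checks ht[0] first and, only during a rehash, ht[1] as well. */
static int toyContains(toyDict *d, unsigned long key)
{
    for (int t = 0; t <= 1; t++) {
        if (d->ht[t].size != 0) {
            toyNode *n = d->ht[t].buckets[key & (d->ht[t].size - 1)];
            for (; n; n = n->next)
                if (n->key == key) return 1;
        }
        if (d->rehashidx == -1) break;
    }
    return 0;
}
```

Between two steps the keys are split across both tables, which is why toyContains has to look in ht[1] whenever rehashidx is not -1.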

With the overview done, here are the commonly used APIs.

    // Create a new dictionary
    dict *dictCreate(dictType *type, void *privDataPtr)
    {
        dict *d = zmalloc(sizeof(*d));

        _dictInit(d, type, privDataPtr);
        return d;
    }

The function above uses a private function, _dictInit, defined as follows:

    // Initialize the dictionary
    int _dictInit(dict *d, dictType *type, void *privDataPtr)
    {
        // Reset both hash tables; as the function below shows, no space is allocated here
        _dictReset(&d->ht[0]);
        _dictReset(&d->ht[1]);
        // Set the type-specific functions
        d->type = type;
        // Set the private data
        d->privdata = privDataPtr;
        // Set the rehash state of the hash table
        d->rehashidx = -1;
        // Set the number of safe iterators on the dictionary
        d->iterators = 0;
        return DICT_OK;
    }

It in turn uses the private function _dictReset:

    static void _dictReset(dictht *ht)
    {
        ht->table = NULL;
        ht->size = 0;
        ht->sizemask = 0;
        ht->used = 0;
    }



    // Add a new key-value pair
    int dictAdd(dict *d, void *key, void *val)
    {
        dictEntry *entry = dictAddRaw(d, key);

        // The key already exists
        if (!entry) return DICT_ERR;
        // The key did not exist: set its value
        dictSetVal(d, entry, val);
        // Added successfully
        return DICT_OK;
    }

    dictEntry *dictAddRaw(dict *d, void *key)
    {
        int index;
        dictEntry *entry;
        dictht *ht;

        // If the dict is rehashing, perform one rehash step
        if (dictIsRehashing(d)) _dictRehashStep(d);

        /* Get the index of the new element, or -1 if
         * the element already exists. */
        // A value of -1 means the key already exists
        if ((index = _dictKeyIndex(d, key)) == -1)
            return NULL;

        /* Allocate the memory and store the new entry */
        // If the dictionary is rehashing, add the new key to hash table 1;
        // otherwise, add it to hash table 0
        ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
        // Allocate space for the new node
        entry = zmalloc(sizeof(*entry));
        // Insert the new node at the head of the list
        entry->next = ht->table[index];
        ht->table[index] = entry;
        // Update the number of nodes used in the hash table
        ht->used++;

        /* Set the hash entry fields. */
        dictSetKey(d, entry, key);
        return entry;
    }

    static void _dictRehashStep(dict *d)
    {
        if (d->iterators == 0) dictRehash(d, 1);
    }


    // Note: not thread safe
    int dictRehash(dict *d, int n)
    {
        // Return immediately if the dict is not rehashing
        if (!dictIsRehashing(d)) return 0;

        // Migrate in n steps
        while (n--) {
            dictEntry *de, *nextde;

            /* Check if we already rehashed the whole table... */
            // If hash table 0 is empty, the rehash is complete
            if (d->ht[0].used == 0) {
                // Release hash table 0
                zfree(d->ht[0].table);
                // Make the former hash table 1 the new hash table 0
                d->ht[0] = d->ht[1];
                // Reset the old hash table 1
                _dictReset(&d->ht[1]);
                // Clear the rehash flag
                d->rehashidx = -1;
                // Rehash has completed
                return 0;
            }

            /* Note that rehashidx can't overflow as we are sure there are more
             * elements because ht[0].used != 0 */
            // Make sure rehashidx stays within bounds
            assert(d->ht[0].size > (unsigned)d->rehashidx);
            // Skip the empty indexes of the array and find the next non-empty one
            while (d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
            // Point to the head node of the list at that index
            de = d->ht[0].table[d->rehashidx];
            /* Move all the keys in this bucket from the old to the new hash HT */
            // Migrate every node of the list to the new hash table
            while (de) {
                unsigned int h;

                // Save the pointer to the next node
                nextde = de->next;
                /* Get the index in the new hash table */
                // Compute the hash and the insertion position in the new hash table
                h = dictHashKey(d, de->key) & d->ht[1].sizemask;
                // Insert the node into the new hash table
                de->next = d->ht[1].table[h];
                d->ht[1].table[h] = de;
                // Update the counters
                d->ht[0].used--;
                d->ht[1].used++;
                // Continue with the next node
                de = nextde;
            }
            // Set the pointer at the index just migrated to NULL
            d->ht[0].table[d->rehashidx] = NULL;
            // Advance the rehash index
            d->rehashidx++;
        }
        return 1;
    }

    dictEntry *dictFind(dict *d, const void *key)
    {
        dictEntry *he;
        unsigned int h, idx, table;

        // The dictionary is empty: return NULL directly
        if (d->ht[0].size == 0) return NULL; /* We don't have a table at all */
        // If the dict is rehashing, perform one rehash step
        if (dictIsRehashing(d)) _dictRehashStep(d);
        // Compute the hash value of the key
        h = dictHashKey(d, key);
        // Look for the key in the dictionary's hash tables (there are two)
        for (table = 0; table <= 1; table++) {
            // Compute the index
            idx = h & d->ht[table].sizemask;
            // Walk every node of the list at that index, looking for the key
            he = d->ht[table].table[idx];
            while (he) {
                // Found: return the node
                if (dictCompareKeys(d, key, he->key))
                    return he;
                he = he->next;
            }
            // Not found in this table. If the dict is rehashing, the other hash
            // table must be searched too; otherwise return NULL
            if (!dictIsRehashing(d)) return NULL;
        }
        // Reaching here means neither hash table contains the key
        return NULL;
    }

    // Get the value associated with the given key in the dict
    void *dictFetchValue(dict *d, const void *key)
    {
        dictEntry *he;

        he = dictFind(d, key);
        return he ? dictGetVal(he) : NULL;
    }

Adding and lookup have been covered above; what follows is deletion and replacement.

    static int dictGenericDelete(dict *d, const void *key, int nofree)
    {
        unsigned int h, idx;
        dictEntry *he, *prevHe;
        int table;

        // The dict is empty: return a delete error
        if (d->ht[0].size == 0) return DICT_ERR; /* d->ht[0].table is NULL */
        // Perform one rehash step
        if (dictIsRehashing(d)) _dictRehashStep(d);
        // Compute the hash value
        h = dictHashKey(d, key);

        // Walk the hash tables
        for (table = 0; table <= 1; table++) {
            // Compute the index
            idx = h & d->ht[table].sizemask;
            // Point to the list at that index (it may hold several nodes)
            he = d->ht[table].table[idx];
            prevHe = NULL;
            // Walk every node of the list
            while (he) {
                // Found the target node
                if (dictCompareKeys(d, key, he->key)) {
                    /* Unlink the element from the list */
                    if (prevHe)
                        prevHe->next = he->next;
                    else
                        d->ht[table].table[idx] = he->next;
                    // Call the key and value release functions?
                    if (!nofree) {
                        dictFreeKey(d, he);
                        dictFreeVal(d, he);
                    }
                    // Release the node itself
                    zfree(he);
                    // Update the used count. used counts individual nodes rather
                    // than buckets, so removing one node from a chain reduces it by one
                    d->ht[table].used--;
                    // Report that the key was found and deleted
                    return DICT_OK;
                }
                prevHe = he;
                he = he->next;
            }
            // Reaching here means the key was not found in hash table 0.
            // Whether hash table 1 is searched depends on whether a rehash is in progress
            if (!dictIsRehashing(d)) break;
        }
        // Not found
        return DICT_ERR; /* not found */
    }

    int dictDelete(dict *ht, const void *key)
    {
        // Call the functions that release the key and value
        return dictGenericDelete(ht, key, 0);
    }

    int dictDeleteNoFree(dict *ht, const void *key)
    {
        // Do not call the release functions
        return dictGenericDelete(ht, key, 1);
    }



    int dictReplace(dict *d, void *key, void *val)
    {
        dictEntry *entry, auxentry;

        /* Try to add the element. If the key
         * does not exist dictAdd will succeed. */
        // Try to add the key-value pair directly to the dictionary;
        // if the key does not exist, the add succeeds
        if (dictAdd(d, key, val) == DICT_OK)
            return 1;
        /* It already exists, get the entry */
        // Reaching here means the key already exists, so find the node that holds it
        entry = dictFind(d, key);
        /* Set the new value and free the old one. Note that it is important
         * to do that in this order, as the value may just be exactly the same
         * as the previous one. In this context, think to reference counting,
         * you want to increment (set), and then decrement (free), and not the
         * reverse. */
        // First save a copy of the original entry
        auxentry = *entry;
        // Then set the new value
        dictSetVal(d, entry, val);
        // Then release the old value
        dictFreeVal(d, &auxentry);
        return 0;
    }
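The comment in dictReplace about setting before freeing is easiest to see with reference counting. The sketch below is an illustration under assumed semantics: refObj and its helpers are hypothetical, and they stand in for whatever the bound valDup and valDestructor functions do with a reference-counted value.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct refObj { int refcount; int payload; } refObj;

/* Create an object owned by the caller (refcount 1). */
static refObj *objCreate(int payload)
{
    refObj *o = malloc(sizeof(*o));

    if (!o) abort();
    o->refcount = 1;
    o->payload = payload;
    return o;
}

static void objIncr(refObj *o) { o->refcount++; }

/* Drop one reference; returns 1 if the object was freed. */
static int objDecr(refObj *o)
{
    if (--o->refcount == 0) {
        free(o);
        return 1;
    }
    return 0;
}

/* Replace the value held in *slot, in the same order as dictReplace:
 * remember the old value, take a reference to the new one, and only
 * then release the old one. This stays safe when newval == *slot. */
static void slotReplace(refObj **slot, refObj *newval)
{
    refObj *old = *slot;     /* like auxentry = *entry */

    objIncr(newval);         /* set: increment first */
    *slot = newval;
    if (old) objDecr(old);   /* free: decrement afterwards */
}
```

If the order were reversed and the slot held the only reference, replacing a value with itself would drop the count to zero and free the object before it could be set again.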




When learning the Java collection classes, one of the most commonly used tools is the iterator. The Redis dict implements iterators as well, and they come in two kinds: safe and unsafe.

    typedef struct dictIterator {
        // The dictionary being iterated
        dict *d;
        // table: number of the hash table being iterated, either 0 or 1
        // index: index in the hash table that the iterator currently points to
        // safe: whether the iterator is safe; 1 means safe, anything else unsafe
        int table, index, safe;
        // entry: pointer to the node currently being iterated
        // nextEntry: the node after the current one. While a safe iterator is
        // running, the node that entry points to may be modified, so an extra
        // pointer keeps the position of the next node and prevents losing it
        dictEntry *entry, *nextEntry;
        long long fingerprint; /* unsafe iterator fingerprint for misuse detection */
    } dictIterator;


    // Create an unsafe iterator
    dictIterator *dictGetIterator(dict *d)
    {
        dictIterator *iter = zmalloc(sizeof(*iter));

        iter->d = d;
        iter->table = 0;
        iter->index = -1;
        iter->safe = 0;
        iter->entry = NULL;
        iter->nextEntry = NULL;
        return iter;
    }

    // Create a safe iterator
    dictIterator *dictGetSafeIterator(dict *d)
    {
        dictIterator *i = dictGetIterator(d);

        i->safe = 1;
        return i;
    }

That is all for this section, which ran a bit long; thanks for bearing with it. If you have any questions, please contact QQ: 359311095
