Redis Dictionary (dict) Rehash Process: Source Code Analysis

Redis's in-memory storage is built on one large dictionary, which is what we usually call a hash table. Used as a cache, Redis can hold anywhere from tens of thousands of records up to tens of millions or even hundreds of millions (depending on available memory), which is a large part of what makes it so powerful as a cache. The core data structure behind this is the dictionary (dict). As the data volume grows, hash(key) collisions become more likely: if the dict is not large enough, more and more elements pile up in each hash bucket, and lookups slow down. Conversely, if the data volume keeps shrinking, say from tens of millions of records down to tens of thousands, the dict's memory is wasted. Redis's dict design accounts for both expansion and contraction through a process called rehash.

There are two conditions that trigger a dict rehash:

1) Dividing the total number of elements by the number of buckets gives the average number of elements per bucket (pre_num). If pre_num > dict_force_resize_ratio, a dict expansion is triggered; dict_force_resize_ratio = 5.

2) If total elements * 10 < number of buckets, i.e. the fill rate is below 10%, the dict shrinks, bringing total / bk_num close to 1:1.


dict rehash expansion process:



Source code call chain and analysis:

dictAddRaw -> _dictKeyIndex -> _dictExpandIfNeeded -> dictExpand. This is the call chain that expands a dict.
_dictKeyIndex function code:


static int _dictKeyIndex(dict *d, const void *key)
{
    unsigned int h, idx, table;
    dictEntry *he;

    // Expand the dictionary if needed
    if (_dictExpandIfNeeded(d) == DICT_ERR)
        return -1;

    // Compute the hash value of the key
    h = dictHashKey(d, key);

    // Look for the given key in both hash tables
    for (table = 0; table <= 1; table++) {

        // Use the hash value and the table's sizemask to compute
        // the index where the key may appear in the table array
        idx = h & d->ht[table].sizemask;

        // Search the bucket's linked list for the given key.
        // Since the list usually holds one element, or a very small number,
        // this operation can be treated as O(1)
        he = d->ht[table].table[idx];
        while (he) {
            // The key already exists
            if (dictCompareKeys(d, key, he->key))
                return -1;

            he = he->next;
        }

        // Reaching this point on the first iteration means ht[0] has been
        // searched; if the hash table is not rehashing, there is no need
        // to search ht[1]
        if (!dictIsRehashing(d)) break;
    }

    return idx;
}
_dictExpandIfNeeded function code analysis:
static int _dictExpandIfNeeded(dict *d)
{
    // A progressive rehash is already in progress; return directly
    if (dictIsRehashing(d)) return DICT_OK;

    // If the hash table is empty, expand it to the initial size
    // O(N)
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    // If the number of used nodes >= the size of the hash table,
    // and either of the following conditions holds:
    // 1) dict_can_resize is true
    // 2) the ratio of used nodes to table size is greater than
    //    dict_force_resize_ratio
    // then call dictExpand to expand the hash table
    // to at least twice the number of used nodes
    // O(N)
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used / d->ht[0].size > dict_force_resize_ratio))
    {
        return dictExpand(d, d->ht[0].used * 2);
    }

    return DICT_OK;
}

dict rehash shrinking process:


Source code call chain and analysis:

serverCron -> tryResizeHashTables -> dictResize -> dictExpand

serverCron is the heartbeat function; the section of it that calls tryResizeHashTables is:


int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    ....
    if (server.rdb_child_pid == -1 && server.aof_child_pid == -1) {
        // Keep the hash table's fill ratio near 1:1
        tryResizeHashTables();
        if (server.activerehashing) incrementallyRehash(); // Rehash step
    }
    ....
}
Code analysis of tryResizeHashTables function:

void tryResizeHashTables(void) {
    int j;

    for (j = 0; j < server.dbnum; j++) {

        // Shrink the key space dictionary
        if (htNeedsResize(server.db[j].dict))
            dictResize(server.db[j].dict);

        // Shrink the expiration time dictionary
        if (htNeedsResize(server.db[j].expires))
            dictResize(server.db[j].expires);
    }
}



The htNeedsResize function determines whether the dict meets the shrinking condition: the fill rate must stay above 10%, otherwise the table is shrunk. The code is as follows:
int htNeedsResize(dict *dict) {
    long long size, used;

    // Hash table size
    size = dictSlots(dict);

    // Number of used nodes in the hash table
    used = dictSize(dict);

    // Returns 1 when the hash table's size is greater than
    // DICT_HT_INITIAL_SIZE and the dictionary's fill rate
    // is below REDIS_HT_MINFILL
    return (size && used && size > DICT_HT_INITIAL_SIZE &&
            (used * 100 / size < REDIS_HT_MINFILL));
}
dictResize function code:
int dictResize(dict *d)
{
    int minimal;

    // Must not be called while dict_can_resize is false
    // or while the dictionary is rehashing
    if (!dict_can_resize || dictIsRehashing(d)) return DICT_ERR;

    minimal = d->ht[0].used;

    if (minimal < DICT_HT_INITIAL_SIZE)
        minimal = DICT_HT_INITIAL_SIZE;

    return dictExpand(d, minimal);
}

Both of the above paths ultimately call the dictExpand function. This function mainly allocates a new hash table (dictht) and sets dict.rehashidx = 0, which marks the start of the rehash. The rehash itself re-inserts the data of ht[0] into ht[1] according to the new table's hash rules. The code is as follows:
int dictExpand(dict *d, unsigned long size)
{
    dictht n; /* the new hash table that will receive the data */

    // Compute the real size of the hash table
    unsigned long realsize = _dictNextPower(size);

    if (dictIsRehashing(d) || d->ht[0].used > size || d->ht[0].size == realsize)
        return DICT_ERR;

    // Create and initialize the new hash table
    n.size = realsize;
    n.sizemask = realsize - 1;
    n.table = zcalloc(realsize * sizeof(dictEntry *));
    n.used = 0;

    // If ht[0] is empty, this is the first-time creation of a hash table:
    // set the new table as ht[0] and return
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }

    /* Prepare a second hash table for incremental rehashing */
    // If ht[0] is not empty, this is a resize of the dictionary:
    // set the new table as ht[1] and turn on the rehash flag
    d->ht[1] = n;
    d->rehashidx = 0;

    return DICT_OK;
}

Once the dictionary's rehashidx is set to 0, the rehash has started. The heartbeat function checks this flag on each tick, and if a rehash is in progress it performs one progressive rehash slice. The call chain is:
serverCron -> incrementallyRehash -> dictRehashMilliseconds -> dictRehash

incrementallyRehash function code:
/*
 * Called from the Redis cron: performs 1 ms of progressive rehash
 * on the first database hash table that needs rehashing
 */
void incrementallyRehash(void) {
    int j;

    for (j = 0; j < server.dbnum; j++) {
        /* Keys dictionary */
        if (dictIsRehashing(server.db[j].dict)) {
            dictRehashMilliseconds(server.db[j].dict, 1);
            break; /* the allotted CPU milliseconds are exhausted */
        }
...
}

The dictRehashMilliseconds function runs the rehash for the specified number of CPU milliseconds, in batches of 100 steps at a time.

The code example is as follows:
/*
 * Rehash the dictionary in batches of 100 steps, within the given
 * number of milliseconds.
 */
int dictRehashMilliseconds(dict *d, int ms) {
    long long start = timeInMilliseconds();
    int rehashes = 0;

    while (dictRehash(d, 100)) { /* 100 steps of data at a time */
        rehashes += 100;
        if (timeInMilliseconds() - start > ms) break; /* time budget used up; pause the rehash */
    }
    return rehashes;
}
/*
 * Run N steps of progressive rehash.
 *
 * Returns 1 if elements still remain to be rehashed,
 * and 0 if all elements have been migrated.
 *
 * Each step migrates the entire linked list stored at one index
 * of the hash table array, so a single step may move more than
 * one key from ht[0] to ht[1].
 */
int dictRehash(dict *d, int n) {
    if (!dictIsRehashing(d)) return 0;

    while (n--) {
        dictEntry *de, *nextde;

        // If ht[0] is already empty, the migration is complete:
        // replace the original ht[0] with ht[1]
        if (d->ht[0].used == 0) {

            // Free the hash table array of ht[0]
            zfree(d->ht[0].table);

            // Point ht[0] at ht[1]
            d->ht[0] = d->ht[1];

            // Reset the pointers of ht[1]
            _dictReset(&d->ht[1]);

            // Turn off the rehash flag
            d->rehashidx = -1;

            // Notify the caller that the rehash is complete
            return 0;
        }

        assert(d->ht[0].size > (unsigned)d->rehashidx);

        // Advance to the first index in the array whose list is not NULL
        while (d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;

        // Point at the head of the list
        de = d->ht[0].table[d->rehashidx];

        // Migrate every element in this list from ht[0] to ht[1].
        // Since a bucket usually holds only one element, or a small
        // bounded number, this can be treated as O(1)
        while (de) {
            unsigned int h;

            nextde = de->next;

            /* Get the index in the new hash table */
            // Compute the element's hash value in ht[1]
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;

            // Add the node to ht[1], adjusting the pointers
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;

            // Update the counters
            d->ht[0].used--;
            d->ht[1].used++;

            de = nextde;
        }

        // Set the bucket pointer to NULL so it is skipped by the next rehash step
        d->ht[0].table[d->rehashidx] = NULL;

        // Advance to the next index
        d->rehashidx++;
    }

    // Notify the caller that elements are still waiting to be rehashed
    return 1;
}


To sum up: the rehash is a core operation of Redis's memory and data management. Because Redis handles data and messaging on a single main thread, its rehash migrates data incrementally, in a progressive model, to prevent a long rehash from blocking the data-processing thread. It does not use a multi-threaded migration model the way memcached does; memcached's rehash process will be introduced later.

Redis's rehash process is clever and elegant. It is worth noting that while a migration is in progress, lookups search both the source table ht[0] and the destination table ht[1] at the same time, so that data is not missed mid-migration.










