Redis design and implementation [PART I] Data structures and objects-C source Reading (i)

Source: Internet
Author: User
Tags: rehash, redis, server

I. Simple Dynamic String (SDS)

Keywords: space pre-allocation, lazy space release, binary security

C strings are not easy to modify, so Redis uses C strings only in places where the string value never needs to change, such as string literals (string literal) for printing logs:

    redisLog(REDIS_WARNING, "Redis is now ready to exit, bye bye...");

In the Redis database, key-value pairs containing string values are implemented with SDS at the bottom.

SDS is also used as a buffer: the AOF buffer in the AOF module and the input buffer in the client state are both implemented with SDS.

Source

The SDS structure is defined in sds.h:

    /*
     * Structure holding a string object
     */
    struct sdshdr {

        // length of the used portion of buf
        int len;

        // length of the remaining available space in buf, i.e. unused space
        int free;

        // data space
        char buf[];
    };

The complexity of obtaining an SDS length is O(1): the length is set and updated automatically by the SDS API at execution time, so using SDS requires no manual bookkeeping of the length.
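The O(1) length lookup can be sketched in a few lines of C. This is an illustrative simplification, not the real sds.c code; the `_sketch` function names are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified sdshdr mirroring the struct above. */
struct sdshdr {
    int len;    /* used bytes in buf */
    int free;   /* unused bytes in buf */
    char buf[]; /* data, kept NUL-terminated for C compatibility */
};

/* Allocate header + data in one block and return a pointer to buf,
 * as Redis does, so the value can still be treated as a C string. */
char *sdsnew_sketch(const char *init) {
    size_t initlen = strlen(init);
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + initlen + 1);
    sh->len = (int)initlen;
    sh->free = 0;
    memcpy(sh->buf, init, initlen + 1);
    return sh->buf;
}

/* O(1) length: step back from buf to the header and read len —
 * no strlen()-style scan over the bytes is needed. */
size_t sdslen_sketch(const char *s) {
    const struct sdshdr *sh =
        (const struct sdshdr *)(s - sizeof(struct sdshdr));
    return (size_t)sh->len;
}
```

Because the header sits immediately before `buf`, the length is one pointer subtraction and one field read away, regardless of how long the string is.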

Space allocation

The SDS space allocation strategy is: when an SDS API needs to modify an SDS, it first checks whether the SDS has enough space for the modification; if not, the API automatically expands the SDS to the size required before performing the actual modification. This eliminates the possibility of a buffer overflow.

With unused space, SDS implements two optimization strategies: space pre-allocation and lazy space release.

    • Space Pre-allocation

Space pre-allocation is used to reduce the number of memory allocations required to perform continuous string growth operations.
With this pre-allocation strategy, the number of memory reallocations required for N consecutive string growth operations drops from a guaranteed N to at most N.
The amount of additional unused space to allocate is determined by the following rules:

    1. If after the modification the SDS length (i.e. the value of the len property) is less than 1MB, allocate the same amount of unused space as len; that is, the len and free properties get the same value.
    2. If after the modification the SDS length is greater than or equal to 1MB, allocate 1MB of unused space.
    • Lazy space release

Lazy space release optimizes the memory reallocation needed when shortening an SDS string: when an SDS API needs to shorten the string held by an SDS, the program does not immediately reallocate memory to reclaim the shortened bytes; instead it records their number in the free property and keeps them for future use.
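Both strategies can be sketched in a few lines of C. `SDS_MAX_PREALLOC` matches the 1MB constant defined in sds.h; the function names and the trimming helper are hypothetical simplifications, not the real sds.c API:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SDS_MAX_PREALLOC (1024 * 1024) /* 1MB, the same constant sds.h uses */

/* Space pre-allocation: total data capacity to reserve when a
 * modification leaves the string newlen bytes long. */
size_t sds_prealloc_size(size_t newlen) {
    if (newlen < SDS_MAX_PREALLOC)
        return newlen * 2;            /* free == len */
    return newlen + SDS_MAX_PREALLOC; /* a fixed 1MB of unused space */
}

/* Lazy space release: shortening only moves bytes from len to free;
 * no reallocation takes place. */
struct sdshdr { int len; int free; char buf[]; };

void sds_trim_to(struct sdshdr *sh, int newlen) {
    if (newlen >= sh->len) return;
    sh->free += sh->len - newlen; /* remember the bytes for future growth */
    sh->len = newlen;
    sh->buf[newlen] = '\0';
}
```

A later growth operation can then consume the recorded free bytes without touching the allocator at all.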

SDS APIs are binary-safe: all SDS APIs treat the data stored in the buf array as raw binary, and the program places no restrictions, filters, or assumptions on the data when writing it or reading it back.

Redis uses the buf array of an SDS to store binary data, not characters.

SDS can be compatible with some C string functions.

II. Linked List

Keywords: polymorphic

When a list key contains a large number of elements, or the elements in the list are long strings, Redis uses a linked list as the underlying implementation of the list key.

The underlying implementation of the integers list key is a linked list, where each node in the list holds one integer value.

Besides list keys, features such as publish/subscribe, slow query, and monitor also use linked lists; the Redis server itself uses a linked list to hold the state information of multiple connected clients, and linked lists are also used to build client output buffers.

Source

The definition of a linked list structure is in adlist.h:

    /* Double-ended linked list node */
    typedef struct listNode {

        // previous node
        struct listNode *prev;

        // next node
        struct listNode *next;

        // value of the node
        void *value;

    } listNode;

    /* Double-ended linked list iterator */
    typedef struct listIter {

        // next node in the iteration
        listNode *next;

        // direction of iteration
        int direction;

    } listIter;

    /* Double-ended linked list structure */
    typedef struct list {

        // head node
        listNode *head;

        // tail node
        listNode *tail;

        // node value duplication function
        void *(*dup)(void *ptr);

        // node value deallocation function
        void (*free)(void *ptr);

        // node value comparison function
        int (*match)(void *ptr, void *key);

        // number of nodes in the list
        unsigned long len;

    } list;

The list structure provides the head pointer head, the tail pointer tail, and the list-length counter len; the dup, free, and match members are the type-specific functions required to implement a polymorphic list:

    • The dup function is used to duplicate the value held by a list node;
    • The free function is used to release the value held by a list node;
    • The match function is used to compare whether the value held by a list node equals another given value.

The features of the Redis list implementation are as follows:

    • Double-ended, acyclic, with head and tail pointers, a list-length counter, and polymorphism (via dup/free/match)
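The polymorphism point can be made concrete: the list stores `void *` values and delegates comparison to the match callback. The helper below is a hypothetical sketch modelled on adlist.c's `listSearchKey`, not the real function:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal doubly linked list node, as in adlist.h. */
typedef struct listNode {
    struct listNode *prev;
    struct listNode *next;
    void *value;
} listNode;

/* Search using a type-specific match callback. */
listNode *list_search(listNode *head, void *key,
                      int (*match)(void *ptr, void *key)) {
    for (listNode *n = head; n != NULL; n = n->next)
        if (match(n->value, key)) return n;
    return NULL;
}

/* Example match function: compare node values as C strings. */
int str_match(void *ptr, void *key) {
    return strcmp((const char *)ptr, (const char *)key) == 0;
}
```

Swapping in a different match (or dup/free) function lets the same list code handle integers, SDS strings, or any other node value type.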
III. Dictionary

Keywords: polymorphism, progressive rehash, MurmurHash2

The Redis database itself uses a dictionary as its underlying implementation; the add, delete, update, and lookup operations on the database are all built on dictionary operations.

A dictionary is also one of the underlying implementations of hash keys: Redis uses a dictionary as the underlying implementation of a hash key when the hash contains a large number of key-value pairs, or when the elements in the key-value pairs are long strings.

The Redis dictionary uses a hash table as its underlying implementation; a hash table can have multiple hash table nodes, and each node holds one key-value pair in the dictionary.

Source

The hash table used by the dictionary is defined in dict.h:

    /*
     * Hash table
     *
     * Each dictionary uses two hash tables, to implement progressive rehash.
     */
    typedef struct dictht {

        // hash table array; each element is a pointer to a dictEntry structure
        dictEntry **table;

        // hash table size
        unsigned long size;

        // size mask, used to compute index values
        // always equal to size - 1
        unsigned long sizemask;

        // number of nodes the hash table already holds
        unsigned long used;

    } dictht;
    • The table property is an array; each element in the array is a pointer to a dictEntry structure, and each dictEntry holds one key-value pair.
    • The size property records the size of the hash table, i.e. the size of the table array.
    • The used property records the number of nodes (key-value pairs) the hash table currently holds.
    • The sizemask property, together with a hash value, determines which index of the table array a key should be placed at.
    /*
     * Hash table node
     */
    typedef struct dictEntry {

        // key
        void *key;

        // value
        union {
            void *val;
            uint64_t u64;
            int64_t s64;
        } v;

        // pointer to the next hash table node, forming a linked list
        struct dictEntry *next;

    } dictEntry;
    • The key property holds the key in the key-value pair.
    • The v property holds the value in the key-value pair, which can be a pointer, a uint64_t integer, or an int64_t integer.
    • The next property is a pointer to another hash table node; the chaining (separate chaining) method is used to resolve key collisions.
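Chaining can be sketched concretely. The block below is a simplified illustration (int keys, hypothetical helper names), not dict.c itself; like Redis's `dictAdd`, it pushes new entries at the head of the bucket's list, which is O(1):

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal dictEntry, with int keys for brevity. */
typedef struct dictEntry {
    int key;
    int val;
    struct dictEntry *next;
} dictEntry;

/* Insert with separate chaining: the new entry becomes the head of
 * its bucket's list; any prior occupant is linked behind it. */
void chain_insert(dictEntry **table, unsigned long sizemask,
                  int key, int val) {
    unsigned long idx = (unsigned long)key & sizemask;
    dictEntry *e = malloc(sizeof(*e));
    e->key = key;
    e->val = val;
    e->next = table[idx]; /* link to the previous head, if any */
    table[idx] = e;
}

/* Lookup walks the chain at the key's bucket. */
dictEntry *chain_find(dictEntry **table, unsigned long sizemask, int key) {
    for (dictEntry *e = table[(unsigned long)key & sizemask]; e; e = e->next)
        if (e->key == key) return e;
    return NULL;
}
```

Two keys whose hashes share the same masked index land in the same bucket and are distinguished by walking the chain.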
    /*
     * Dictionary
     */
    typedef struct dict {

        // type-specific functions
        dictType *type;

        // private data
        void *privdata;

        // hash tables
        dictht ht[2];

        // rehash index
        // -1 when no rehash is in progress
        int rehashidx; /* rehashing not in progress if rehashidx == -1 */

        // number of safe iterators currently running
        int iterators; /* number of iterators currently running */

    } dict;

The type property and the privdata property are set for different types of key-value pairs, to create polymorphic dictionaries:

    • The type property is a pointer to a dictType structure; each dictType holds a cluster of functions for operating on a particular type of key-value pair, and Redis sets different type-specific functions for dictionaries with different purposes.
    • The privdata property holds optional arguments to be passed to those type-specific functions.
    /*
     * Dictionary type-specific functions
     */
    typedef struct dictType {

        // function to compute a hash value
        unsigned int (*hashFunction)(const void *key);

        // key duplication function
        void *(*keyDup)(void *privdata, const void *key);

        // value duplication function
        void *(*valDup)(void *privdata, const void *obj);

        // key comparison function
        int (*keyCompare)(void *privdata, const void *key1, const void *key2);

        // key destructor
        void (*keyDestructor)(void *privdata, void *key);

        // value destructor
        void (*valDestructor)(void *privdata, void *obj);

    } dictType;
    • The ht property is an array of two items; each item is a dictht hash table. Normally the dictionary uses only the ht[0] hash table; ht[1] is used only while rehashing ht[0].
    • The rehashidx property records the current progress of a rehash; when no rehash is in progress, its value is -1.
    /*
     * Dictionary iterator
     *
     * - If the safe property is 1, then during the iteration the program
     *   may still call dictAdd, dictFind, and other functions that modify
     *   the dictionary.
     *
     * - If safe is not 1, the program may only call dictNext to iterate
     *   over the dictionary, and must not modify it.
     */
    typedef struct dictIterator {

        // the dictionary being iterated
        dict *d;

        // table: number of the hash table being iterated, 0 or 1
        // index: index in the hash table the iterator currently points to
        // safe: whether this is a safe iterator
        int table, index, safe;

        // entry: pointer to the node currently being iterated
        // nextEntry: the node after the current one
        // Because a safe iterator may modify the node entry points to,
        // an extra pointer is needed to record the next node's position,
        // to avoid losing it.
        dictEntry *entry, *nextEntry;

        long long fingerprint; /* unsafe iterator fingerprint for misuse detection */

    } dictIterator;
Hash

Redis computes hash and index values in the following ways:

    // Compute the hash of key using the hash function set on the dictionary
    hash = dict->type->hashFunction(key);

    // Compute the index using the hash table's sizemask property and the hash value
    // Depending on the situation, ht[x] can be ht[0] or ht[1]
    index = hash & dict->ht[x].sizemask;
    /* -------------------------- hash functions -------------------------------- */

    /* Thomas Wang's 32 bit Mix Function */
    unsigned int dictIntHashFunction(unsigned int key)
    {
        key += ~(key << 15);
        key ^=  (key >> 10);
        key +=  (key << 3);
        key ^=  (key >> 6);
        key += ~(key << 11);
        key ^=  (key >> 16);
        return key;
    }

    /* Identity hash function for integer keys */
    unsigned int dictIdentityHashFunction(unsigned int key)
    {
        return key;
    }

    static uint32_t dict_hash_function_seed = 5381;

    void dictSetHashFunctionSeed(uint32_t seed) {
        dict_hash_function_seed = seed;
    }

    uint32_t dictGetHashFunctionSeed(void) {
        return dict_hash_function_seed;
    }

    /* MurmurHash2, by Austin Appleby
     * Note - This code makes a few assumptions about how your machine behaves -
     * 1. We can read a 4-byte value from any address without crashing
     * 2. sizeof(int) == 4
     *
     * And it has a few limitations -
     *
     * 1. It will not work incrementally.
     * 2. It will not produce the same results on little-endian and big-endian
     *    machines. */
    unsigned int dictGenHashFunction(const void *key, int len) {
        /* 'm' and 'r' are mixing constants generated offline.
         * They're not really 'magic', they just happen to work well. */
        uint32_t seed = dict_hash_function_seed;
        const uint32_t m = 0x5bd1e995;
        const int r = 24;

        /* Initialize the hash to a 'random' value */
        uint32_t h = seed ^ len;

        /* Mix 4 bytes at a time into the hash */
        const unsigned char *data = (const unsigned char *)key;

        while(len >= 4) {
            uint32_t k = *(uint32_t*)data;

            k *= m;
            k ^= k >> r;
            k *= m;

            h *= m;
            h ^= k;

            data += 4;
            len -= 4;
        }

        /* Handle the last few bytes of the input array */
        switch(len) {
        case 3: h ^= data[2] << 16;
        case 2: h ^= data[1] << 8;
        case 1: h ^= data[0]; h *= m;
        };

        /* Do a few final mixes of the hash to ensure the last few
         * bytes are well-incorporated. */
        h ^= h >> 13;
        h *= m;
        h ^= h >> 15;

        return (unsigned int)h;
    }

    /* And a case insensitive hash function (based on djb hash) */
    unsigned int dictGenCaseHashFunction(const unsigned char *buf, int len) {
        unsigned int hash = (unsigned int)dict_hash_function_seed;

        while (len--)
            hash = ((hash << 5) + hash) + (tolower(*buf++)); /* hash * 33 + c */
        return hash;
    }

When the dictionary is used as the underlying implementation of the database, or the underlying implementation of the hash key, Redis uses the MurmurHash2 algorithm to calculate the hash value of the keys:

    • The advantage of this algorithm is that even when the input keys are regular, it still produces a good random distribution, and it is very fast.

To keep the load factor of the hash table within a reasonable range, the program needs to expand or contract the size of the hash table appropriately when the hash table holds too many or too few key-value pairs.

    • Load factor calculation formula for a hash table: Load_factor = ht[0].used/ht[0].size
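The load factor formula above translates directly into code (a hypothetical helper, guarding against a zero-sized table):

```c
#include <assert.h>

/* Load factor of a hash table: used entries divided by table size. */
double dict_load_factor(unsigned long used, unsigned long size) {
    return size ? (double)used / (double)size : 0.0;
}
```

For example, a table of size 4 holding 20 entries (5 per bucket on average) has a load factor of 5.0, which is the threshold Redis uses for forced expansion during a BGSAVE, as described below.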
Rehash

The work of expanding and shrinking a hash table is done by performing a rehash (re-hashing) operation. Redis rehashes a dictionary's hash tables in the following steps:

    • Allocate space for the dictionary's ht[1] hash table. Its size depends on the operation to be performed and on the number of key-value pairs currently held by ht[0] (i.e. the value of the ht[0].used property):

      1. For an expand operation, the size of ht[1] is the first 2^n (2 to the power n) greater than or equal to ht[0].used * 2;
      2. For a shrink operation, the size of ht[1] is the first 2^n greater than or equal to ht[0].used.
    • Rehash all key-value pairs saved in ht[0] onto ht[1]: rehashing means recalculating a key's hash and index values, then placing the key-value pair at the computed position in the ht[1] hash table.

    • When all key-value pairs held by ht[0] have been migrated to ht[1] (and ht[0] has become an empty table), release ht[0], make ht[1] the new ht[0], and create a new blank hash table at ht[1], in preparation for the next rehash.
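The sizing rules in step one can be sketched as follows; `dict_next_power` is modelled on dict.c's `_dictNextPower` (which starts from an initial size of 4 and doubles), while `rehash_size` is a hypothetical helper name:

```c
#include <assert.h>

#define DICT_HT_INITIAL_SIZE 4 /* same starting size dict.h uses */

/* First power of two >= target. */
unsigned long dict_next_power(unsigned long target) {
    unsigned long size = DICT_HT_INITIAL_SIZE;
    while (size < target) size *= 2;
    return size;
}

/* Size of ht[1] for an expand vs. a shrink, per the two rules above. */
unsigned long rehash_size(unsigned long used, int expanding) {
    return dict_next_power(expanding ? used * 2 : used);
}
```

So a table holding 5 entries expands into a ht[1] of size 16 (the first power of two >= 10), and shrinks into one of size 8 (the first power of two >= 5).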

When either of the following conditions is met, the program automatically begins to expand the hash table:

    • The server is not currently executing the BGSAVE command or the BGREWRITEAOF command, and the hash table's load factor is greater than or equal to 1;
    • The server is currently executing the BGSAVE command or the BGREWRITEAOF command, and the hash table's load factor is greater than or equal to 5.

While executing the BGSAVE or BGREWRITEAOF command, Redis needs to fork a child of the current server process, and most operating systems use copy-on-write to optimize the use of child processes. So while a child process exists, the server raises the load factor required to trigger an expand operation, minimizing hash table expansions during that period; this avoids unnecessary memory writes and saves as much memory as possible.

When the load factor of a hash table drops below 0.1, the program automatically begins to shrink the hash table.
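The expand and shrink triggers described above can be captured in two small predicates. These are hypothetical helpers restating the conditions in code, not the actual dict.c functions:

```c
#include <assert.h>

/* Should the hash table expand? Threshold 1 normally, 5 while a
 * BGSAVE/BGREWRITEAOF child process exists. */
int dict_should_expand(unsigned long used, unsigned long size,
                       int child_running) {
    unsigned long threshold = child_running ? 5 : 1;
    return size > 0 && used >= size * threshold;
}

/* Should it shrink? Load factor below 0.1 (used * 10 < size avoids
 * floating point). */
int dict_should_shrink(unsigned long used, unsigned long size) {
    return size > 0 && used * 10 < size;
}
```

Note how the same table state (used == size) triggers an expansion normally but is left alone while a child process is running, which is exactly the copy-on-write optimization described above.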

Progressive Rehash

To avoid impacting server performance, the server does not rehash all the key-value pairs in ht[0] to ht[1] in one go; instead it rehashes them gradually, over multiple steps.

Here are the detailed steps for the hash table progressive rehash:

    1. Allocate space for ht[1], so that the dictionary holds both the ht[0] and ht[1] hash tables.

    2. Maintain an index counter variable rehashidx in the dictionary and set its value to 0, indicating that rehash work has formally begun.

    3. During the rehash, every time the dictionary is added to, deleted from, looked up, or updated, the program performs the requested operation and, in passing, rehashes all the key-value pairs in the ht[0] bucket at index rehashidx onto ht[1]; when that bucket's rehash work is complete, the program increments the rehashidx property by one.

    4. As dictionary operations keep executing, at some point in time all the key-value pairs of ht[0] will have been rehashed onto ht[1]; at that point the program sets the rehashidx property to -1, indicating that the rehash operation is complete.

Progressive rehash takes a divide-and-conquer approach, spreading the work of rehashing the key-value pairs across every add, delete, lookup, and update operation on the dictionary, thereby avoiding the heavy computation a centralized, one-shot rehash would cause.

During a progressive rehash, the dictionary uses both the ht[0] and ht[1] hash tables, so deletes, finds, and updates operate on both tables: for example, a lookup first searches ht[0] and, if the key is not found there, goes on to search ht[1].

During a progressive rehash, any new key-value pair added to the dictionary is saved only into ht[1]; nothing is ever added to ht[0]. This guarantees that the number of key-value pairs in ht[0] only ever decreases, so that it eventually becomes an empty table as the rehash proceeds.
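The per-operation rehash step can be illustrated with a toy model. This is a deliberately simplified sketch, not dict.c: buckets hold single int keys (-1 means empty) rather than dictEntry chains, and the real `dictRehash` migrates whole collision chains per bucket:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int *table;          /* one int per bucket, -1 = empty */
    unsigned long size;  /* power of two */
    unsigned long used;
} toy_ht;

typedef struct {
    toy_ht ht[2];
    long rehashidx;      /* -1 when no rehash is in progress */
} toy_dict;

static unsigned long toy_index(int key, unsigned long size) {
    return (unsigned long)key & (size - 1); /* hash & sizemask */
}

/* Move the bucket at rehashidx from ht[0] to ht[1]; once ht[0] is
 * drained, swap the tables and set rehashidx back to -1. */
void toy_rehash_step(toy_dict *d) {
    if (d->rehashidx == -1) return;
    int key = d->ht[0].table[d->rehashidx];
    if (key != -1) {
        d->ht[1].table[toy_index(key, d->ht[1].size)] = key;
        d->ht[0].table[d->rehashidx] = -1;
        d->ht[0].used--;
        d->ht[1].used++;
    }
    d->rehashidx++;
    if (d->ht[0].used == 0) {   /* migration complete */
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];    /* ht[1] becomes the new ht[0] */
        d->ht[1].table = NULL;
        d->ht[1].size = d->ht[1].used = 0;
        d->rehashidx = -1;
    }
}
```

Calling `toy_rehash_step` once per dictionary operation is exactly the "in passing" migration described in step 3 above: the cost of the rehash is amortized over ordinary adds, deletes, and lookups.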
