Redis dictionary

Source: Internet
Author: User
Tags: rehash

There are two things you should know before reading this article. First, Redis is a key-value database. Second, a dictionary is an abstract data structure that stores key-value pairs. It is not difficult to guess that the dictionary is widely used in Redis. In fact, the Redis database itself is implemented on top of the dictionary, and adding, deleting, querying, and modifying data are all built on dictionary operations. If you want to understand Redis in depth, understanding the dictionary is essential. Let's take a look at it.
First, let's see where the dictionary is used in Redis.
I. Database key space
Redis is a key-value database server. Each database on the server is represented by a RedisDb structure. The dict field of that structure is a dictionary that stores all key-value pairs in the database; we call this dictionary the key space. The key space corresponds directly to the database as the user sees it.
II. Expires dictionary
The RedisDb structure has another attribute, expires, which is also a dictionary. It stores the expiration time of every key in the database that has one set. We call this dictionary the expiration dictionary.
To summarize, the RedisDb structure holds two dictionaries: dict, the key space, and expires, the expiration dictionary.
III. The dictionary is one of the underlying implementations of the Hash type
It is only one of them because the Hash type has more than one underlying implementation, chosen according to the situation. When a Hash key contains many key-value pairs, or when the keys or values of those pairs are long strings, the dictionary is used as the underlying implementation; otherwise, the compressed list (ziplist) is used.
[Note] A key in the key space and the same key in the expiration dictionary point to the same key object, so no object is duplicated and no memory is wasted.
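The layout described in sections I and II, together with the note above, can be sketched as follows. This is a toy model, not Redis code: demoDb, demoEntry, and demo_set_with_expire are hypothetical names, and each "dictionary" is reduced to a single entry so that only the pointer sharing is visible.

```c
#include <stdint.h>

/* Toy stand-in for a dictionary entry (hypothetical, not the Redis type). */
typedef struct demoEntry {
    void *key;
    union {
        void *val;    /* used by the key space            */
        int64_t s64;  /* used by the expiration dictionary */
    } v;
} demoEntry;

/* Toy stand-in for RedisDb: one slot per dictionary. */
typedef struct demoDb {
    demoEntry keyspace;  /* key -> value object           */
    demoEntry expires;   /* key -> expiration time, in ms */
} demoDb;

/* Store a value and give the same key an expiration time.
 * Both entries hold the same key pointer, so the key is not copied. */
static void demo_set_with_expire(demoDb *db, void *key, void *val,
                                 int64_t when_ms) {
    db->keyspace.key = key;
    db->keyspace.v.val = val;
    db->expires.key = key;  /* same pointer as in the key space */
    db->expires.v.s64 = when_ms;
}
```

After one call to demo_set_with_expire, db.keyspace.key and db.expires.key compare equal as pointers, which is the property the note describes.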
Then let's take a look at how the dictionary is implemented in Redis.

The definition of the dictionary is given in dict.h/dict, as follows:

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    int iterators; /* number of iterators currently running */
} dict;


The dictht structure below is a hash table. Each element of the table array is a pointer to a dictEntry structure. size is the size of the hash table, that is, the length of the table array. sizemask is always equal to size - 1; combined with a key's hash value (hash & sizemask), it determines the index in the table array where the key is placed. used is the number of nodes currently stored in the hash table. used/size is the load factor of the hash table, and this factor determines when the table is expanded or contracted.

typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;
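To make the role of sizemask and the load factor concrete, here is a minimal sketch. The helper names bucket_index and load_factor are made up for illustration and do not appear in dict.h. The sketch assumes size is a power of two, which is what makes size - 1 usable as a bit mask.

```c
#include <stdint.h>

/* Map a hash value to an index in the table array.
 * Assumes size is a power of two, so size - 1 has all low bits set. */
static unsigned long bucket_index(uint64_t hash, unsigned long size) {
    unsigned long sizemask = size - 1;
    return (unsigned long)(hash & sizemask);
}

/* Load factor as defined in the text: used / size. */
static double load_factor(unsigned long used, unsigned long size) {
    return (double)used / (double)size;
}
```

With size 4, a hash value of 13 lands at index 1 (13 & 3), and a table with used = 8 has a load factor of 2.0.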


The following is a hash table node. Each dictEntry structure holds one key-value pair. The next pointer links together key-value pairs whose keys hash to the same index, which brings us to hash functions and the handling of hash conflicts. Redis resolves conflicts with separate chaining (the link address method): if several keys are mapped to the same index by the hash function, their nodes are linked into a list at that index. (Another way to resolve conflicts is open addressing: when a conflict occurs, the key-value pair is probed or hashed again until an unused slot is found. Both schemes have advantages and disadvantages: with separate chaining a bucket may degrade into a long linked list, while with open addressing later insertions become increasingly likely to conflict.)

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;
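Separate chaining through the next pointer can be illustrated with a small, self-contained sketch. It is not the Redis implementation: chainEntry, chainTable, chain_hash, chain_add, and chain_find are hypothetical names, the table size is fixed at 4, the djb2 string hash is chosen only for brevity, and duplicate keys are not checked.

```c
#include <stdlib.h>
#include <string.h>

#define CHAIN_SIZE 4  /* power of two, so CHAIN_SIZE - 1 works as a mask */

typedef struct chainEntry {
    const char *key;
    int val;
    struct chainEntry *next;  /* next entry stored at the same index */
} chainEntry;

typedef struct chainTable {
    chainEntry *table[CHAIN_SIZE];
    unsigned long used;
} chainTable;

/* djb2 string hash, used here only to keep the sketch short. */
static unsigned long chain_hash(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Insert at the head of the chain for the key's index.
 * Returns 0 on success, -1 if allocation fails. */
static int chain_add(chainTable *t, const char *key, int val) {
    unsigned long idx = chain_hash(key) & (CHAIN_SIZE - 1);
    chainEntry *e = malloc(sizeof(*e));
    if (e == NULL) return -1;
    e->key = key;
    e->val = val;
    e->next = t->table[idx];  /* link in front of any colliding entries */
    t->table[idx] = e;
    t->used++;
    return 0;
}

/* Walk the chain at the key's index until the key matches. */
static chainEntry *chain_find(const chainTable *t, const char *key) {
    unsigned long idx = chain_hash(key) & (CHAIN_SIZE - 1);
    for (chainEntry *e = t->table[idx]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) return e;
    }
    return NULL;
}
```

Adding five keys to this four-slot table forces at least two of them onto the same chain, and chain_find still returns each one because it follows next until the key matches.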

Another thing to mention is the rehash of the hash table.

As operations continue, the number of key-value pairs stored in a hash table grows or shrinks. Both too many and too few key-value pairs are bad. With too many, the chains grow long and the table behaves like a handful of linked lists. With too few, most slots of the table array sit empty and memory is wasted. It is best to keep the load factor (used/size) of the hash table within a reasonable range, so when a hash table holds too many or too few key-value pairs, the program expands or contracts it.

Expansion is easy to understand. If size is 4 but used is 8, every slot carries a chain of two nodes on average, which slows down lookups. The fix is a rehash. Note this field in the dict data structure:

dictht ht[2]. There are two dictht tables here, and ht[1] is normally idle. During expansion, ht[1] is allocated with roughly twice the capacity of ht[0] (in the Redis 3.x dict.c, the first power of two that is greater than or equal to ht[0].used * 2). The key-value pairs in ht[0] are then rehashed into ht[1]. Finally, ht[1] becomes the new ht[0], and ht[1] is reset to empty for the next rehash.
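As a minimal sketch of how the size of ht[1] might be chosen, assume the rule from the Redis 3.x sources: the new size is the first power of two that is at least ht[0].used * 2, starting from an initial size of 4. The names next_power and expand_target are hypothetical, and the sketch ignores overflow for very large tables.

```c
/* Smallest power of two that is >= wanted, never smaller than 4.
 * Overflow for huge values of wanted is not handled in this sketch. */
static unsigned long next_power(unsigned long wanted) {
    unsigned long size = 4;
    while (size < wanted) size *= 2;
    return size;
}

/* Target size of ht[1] for an expansion, given ht[0].used (assumed rule). */
static unsigned long expand_target(unsigned long used) {
    return next_power(used * 2);
}
```

For the example in the text (size 4, used 8), expand_target(8) gives 16, so after the rehash the load factor drops from 2 to 0.5.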

Note the timing of a rehash. The hash table is expanded when the load factor reaches 1 under normal conditions, or when it exceeds 5 while a BGSAVE or BGREWRITEAOF child process is running; it is contracted when the load factor drops below 0.1. The dictionary also has a rehashidx attribute that records the rehash progress. It is set to 0 when a rehash officially starts. Each time the key-value pairs at one index of ht[0] have been rehashed, the value is incremented by one. When all key-value pairs in ht[0] have been moved to ht[1], the value is set to -1, indicating that the rehash is complete.
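The decision of when to resize can be sketched as a small function. The thresholds are assumptions taken from the Redis 3.x sources rather than from this article: expand at a load factor of at least 1 in normal operation, expand only above 5 while a background save child is running, and shrink below 10% fill when no child is running. The names resizeAction and resize_action are hypothetical.

```c
#define INITIAL_SIZE 4        /* the sketch never shrinks a table this small */
#define FORCE_RESIZE_RATIO 5  /* expand during a background save above this  */
#define MIN_FILL_PERCENT 10   /* shrink when under 10% of the slots are used */

typedef enum { RESIZE_NONE, RESIZE_EXPAND, RESIZE_SHRINK } resizeAction;

/* Decide what to do with a table holding `used` entries in `size` slots.
 * child_running is nonzero while a background save child exists. */
static resizeAction resize_action(unsigned long used, unsigned long size,
                                  int child_running) {
    if (size == 0) return RESIZE_EXPAND;  /* nothing allocated yet */
    if (used >= size &&
        (!child_running || used / size > FORCE_RESIZE_RATIO)) {
        return RESIZE_EXPAND;
    }
    if (!child_running && size > INITIAL_SIZE &&
        used * 100 / size < MIN_FILL_PERCENT) {
        return RESIZE_SHRINK;
    }
    return RESIZE_NONE;
}
```

Raising the expansion threshold while a child process exists is commonly explained as a way to avoid touching memory pages that the child shares through copy-on-write.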

There is more to say, for example about progressive rehash: the rehash is not completed in one go but incrementally, over many steps. While a rehash is in progress, deletions, lookups, and updates are performed on both hash tables. For example, if an element is not found in ht[0], the search continues in ht[1]. All newly added key-value pairs go into ht[1] only, and nothing is added to ht[0].
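The progressive rehash described above can be sketched with a toy dictionary. This is a simplified model under stated assumptions, not the code in dict.c: miniDict, mini_init, mini_start_rehash, mini_rehash_step, mini_add, and mini_find are hypothetical names, keys are integers hashed by identity, sizes must be powers of two, and each step moves exactly one index of ht[0]. In Redis the ordinary dictionary operations also perform a small rehash step themselves; here that is left to the caller.

```c
#include <stdlib.h>

typedef struct node { unsigned long key; int val; struct node *next; } node;
typedef struct table { node **slots; unsigned long size, used; } table;
typedef struct miniDict { table ht[2]; long rehashidx; /* -1 when idle */ } miniDict;

/* Identity hash: the key itself, masked by size - 1. */
static unsigned long slot_of(const table *t, unsigned long key) {
    return key & (t->size - 1);
}

static int mini_init(miniDict *d, unsigned long size) {
    d->ht[0].slots = calloc(size, sizeof(node *));
    if (d->ht[0].slots == NULL) return -1;
    d->ht[0].size = size;
    d->ht[0].used = 0;
    d->ht[1].slots = NULL;
    d->ht[1].size = 0;
    d->ht[1].used = 0;
    d->rehashidx = -1;
    return 0;
}

/* Allocate ht[1] and mark the rehash as started (rehashidx = 0). */
static int mini_start_rehash(miniDict *d, unsigned long new_size) {
    d->ht[1].slots = calloc(new_size, sizeof(node *));
    if (d->ht[1].slots == NULL) return -1;
    d->ht[1].size = new_size;
    d->ht[1].used = 0;
    d->rehashidx = 0;
    return 0;
}

/* Move the chain at index rehashidx from ht[0] to ht[1].
 * Returns 1 while work remains, 0 once the rehash has finished. */
static int mini_rehash_step(miniDict *d) {
    if (d->rehashidx == -1) return 0;
    table *src = &d->ht[0], *dst = &d->ht[1];
    node *n = src->slots[d->rehashidx];
    while (n != NULL) {
        node *next = n->next;
        unsigned long i = slot_of(dst, n->key);
        n->next = dst->slots[i];
        dst->slots[i] = n;
        src->used--;
        dst->used++;
        n = next;
    }
    src->slots[d->rehashidx] = NULL;
    d->rehashidx++;
    if (src->used == 0) {          /* everything moved: swap the tables */
        free(src->slots);
        d->ht[0] = d->ht[1];
        d->ht[1].slots = NULL;
        d->ht[1].size = 0;
        d->ht[1].used = 0;
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}

/* New pairs go to ht[1] while rehashing, otherwise to ht[0]. */
static int mini_add(miniDict *d, unsigned long key, int val) {
    table *t = (d->rehashidx != -1) ? &d->ht[1] : &d->ht[0];
    node *n = malloc(sizeof(*n));
    if (n == NULL) return -1;
    unsigned long i = slot_of(t, key);
    n->key = key;
    n->val = val;
    n->next = t->slots[i];
    t->slots[i] = n;
    t->used++;
    return 0;
}

/* Look in ht[0] first, then in ht[1] if a rehash is in progress. */
static node *mini_find(const miniDict *d, unsigned long key) {
    for (int k = 0; k < 2; k++) {
        const table *t = &d->ht[k];
        for (node *n = t->slots[slot_of(t, key)]; n != NULL; n = n->next) {
            if (n->key == key) return n;
        }
        if (d->rehashidx == -1) break;  /* ht[1] is unused when idle */
    }
    return NULL;
}
```

A caller would fill the dictionary, call mini_start_rehash with a larger size, and then interleave mini_rehash_step with ordinary lookups and inserts; every key stays reachable throughout because mini_find consults both tables.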




