Redis Dictionary Data Structure

Source: Internet
Author: User

Redis is essentially a dictionary: key ----> value is the typical dictionary structure.

[Of course, values can have different in-memory organizations. That is not the topic here.]

Imagine a storage scenario like this:

Key: "city"

Value: "Beijing"

If there are many such key-value pairs, how do you store them so that both writes and queries are fast?

Setting redis aside, if you want storage with fast lookups, hashing is the fastest approach. An ideal hash function gives O(1) lookups, and redis does use this method.

However, hashing has two serious weaknesses: 1) the fill factor cannot be too high; 2) a bad hash function can cause a very high collision rate.

The fill factor cannot be too high. The fill factor is the number of stored keys divided by the number of hash slots, and in theory a value much above about 0.5 is already too high. As the slots fill up, key collisions cannot be avoided even with a good hash distribution function.

Bad hash distribution function

If a bad hash distribution function spreads keys unevenly, so that the slots computed for many keys all land in the same bucket, that is the worst case. The ideal is for every key to fall into a different hash slot, which requires more hash slots than keys. [hash slots > keys]
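The two quantities discussed above, the slot a key falls into and the fill factor, can be sketched in a few lines of C. This is only an illustration: the names ff_hash, ff_slot, and ff_fill_factor are hypothetical, and the string hash shown (djb2) is an assumption chosen for brevity, not the function redis uses.

```c
#include <assert.h>

/* Hypothetical string hash (djb2); an assumption for illustration only. */
static unsigned long ff_hash(const char *key) {
    unsigned long h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h;
}

/* Map a key to a slot index. size must be a power of two so the mask works. */
static unsigned long ff_slot(const char *key, unsigned long size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    return ff_hash(key) & (size - 1);
}

/* Fill factor = number of stored keys / number of slots. */
static double ff_fill_factor(unsigned long used, unsigned long size) {
    return (double)used / (double)size;
}
```

For example, ff_fill_factor(4, 8) is 0.5, and ff_slot("city", 8) always yields the same index below 8 for the same key. Two different keys may still map to the same index, which is the collision case described above.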

Hash storage design in practice

1) Once the hash distribution function is fixed, the factor you can still control is the fill factor: when it exceeds a threshold you have chosen, the hash structure must be migrated to a larger set of hash slots. Many people have derived distribution functions with various mathematical methods, and the function itself is not studied in depth here. The fill-factor threshold, however, needs to be chosen carefully.

2) During the migration, in both single-threaded and multi-threaded environments, the service stops briefly. This has a short-lived impact on production. Personally, I think that a dedicated storage server process, which is meant for large volumes of highly concurrent storage, should start with a larger number of hash slots so that migrations are avoided as much as possible.

Redis Hash Storage Design

The scenarios above are problems that any hash storage engine faces. Redis addresses them as follows:

1) At the code level, I think the coding style of the redis developers is really impressive. The code is readable enough to learn from step by step:

Redis is written in C, but it borrows many design ideas from the STL: iterators, dynamic memory management, and so on.

If you write hash storage yourself, the following basic data structures are required:

Each basic element

struct DictElement
{
    /* Data */
    void *key;
    void *value;
    struct DictElement *next;   /* next element in the same slot (chaining) */
};

Hash slot

struct DictElement *hashtable[hashslot];   /* one chain head per hash slot */
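To show how an element type and a slot array like these fit together, here is a minimal sketch of a fixed-size chained hash table. Everything in it is an assumption for illustration: the names (chain_entry, chain_put, chain_get, CHAIN_SLOTS), the djb2 hash, and the choice to store string pointers without copying them. It is not the redis implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHAIN_SLOTS 8   /* hypothetical fixed slot count, a power of two */

struct chain_entry {
    const char *key;
    const char *value;
    struct chain_entry *next;   /* next entry in the same slot */
};

static struct chain_entry *chain_table[CHAIN_SLOTS];

/* Hypothetical string hash (djb2). */
static unsigned long chain_hash(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Insert or overwrite. Returns 0 on success, -1 if allocation fails. */
static int chain_put(const char *key, const char *value) {
    unsigned long i = chain_hash(key) & (CHAIN_SLOTS - 1);
    for (struct chain_entry *e = chain_table[i]; e; e = e->next)
        if (strcmp(e->key, key) == 0) { e->value = value; return 0; }
    struct chain_entry *e = malloc(sizeof *e);
    if (!e) return -1;
    e->key = key;
    e->value = value;
    e->next = chain_table[i];   /* push onto the head of the chain */
    chain_table[i] = e;
    return 0;
}

/* Return the stored value, or NULL if the key is absent. */
static const char *chain_get(const char *key) {
    unsigned long i = chain_hash(key) & (CHAIN_SLOTS - 1);
    for (struct chain_entry *e = chain_table[i]; e; e = e->next)
        if (strcmp(e->key, key) == 0) return e->value;
    return NULL;
}
```

After chain_put("city", "Beijing"), a call to chain_get("city") returns "Beijing", while a key that was never stored returns NULL. Keys that collide simply share one chain, which is why a bad distribution function degrades lookups toward a linear scan.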

 

The real redis source looks like this. In the real entry type, dictEntry, the value is stored in a union that holds either a pointer or a 64-bit integer. The hash table itself is defined as:

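typedef struct dictht {
    dictEntry **table;        /* array of slots, each the head of an entry chain */
    unsigned long size;       /* number of slots */
    unsigned long sizemask;   /* size - 1, used to map a hash to a slot index */
    unsigned long used;       /* number of entries stored */
} dictht;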

dictht is one complete hash table. It records the number of entries stored in the table [used] and the number of hash slots [size].

The two data structures above are enough for a static hash store, but redis also supports dynamic resizing. Similar to the growth policy of the STL vector, the table is doubled when a critical threshold is reached.
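A doubling policy of this kind can be sketched as follows. The helper names (grow_next_power, grow_needed), the initial size of 4, and the "grow when used >= size" threshold are assumptions made for this illustration; the exact threshold in a given redis version should be checked against its source.

```c
#include <assert.h>
#include <limits.h>

#define GROW_INITIAL_SIZE 4UL   /* assumed smallest table size */

/* Smallest power of two that is >= wanted, never below the initial size. */
static unsigned long grow_next_power(unsigned long wanted) {
    unsigned long size = GROW_INITIAL_SIZE;
    if (wanted >= (unsigned long)LONG_MAX) return (unsigned long)LONG_MAX + 1UL;
    while (size < wanted) size *= 2;
    return size;
}

/* Assumed policy: grow once there are at least as many entries as slots. */
static int grow_needed(unsigned long used, unsigned long size) {
    return size == 0 || used >= size;
}
```

With these definitions, grow_next_power(5) returns 8 and grow_next_power(16) returns 16. Keeping the size a power of two is what allows a mask such as size - 1 to replace a modulo when mapping a hash to a slot.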

The real structure of dict is as follows:

typedef struct dict {
    /* Table of function pointers: the typical C way of encapsulating
       per-type behavior. In C++ this would be a class, which is easier to read. */
    dictType *type;
    void *privdata;
    /* Two hash tables. ht[1] stays empty until a rehash is in progress. */
    dictht ht[2];
    /* Rehash progress; -1 means no rehash is in progress. */
    int rehashidx;
    /* Number of iterators currently running. */
    int iterators;
} dict;

rehashidx is the index of the ht[0] slot that will be migrated next; it is -1 when the dictionary is not being rehashed.
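The role of rehashidx and the two tables can be made concrete with a simplified sketch of one incremental migration step. The types and names here (rh_entry, rh_table, rh_dict, rh_step) are hypothetical, keys are plain integers used directly as their own hash, and details of the real dictRehash, such as its limit on how many empty slots it visits per call, are left out.

```c
#include <assert.h>
#include <stdlib.h>

struct rh_entry { unsigned long key; struct rh_entry *next; };

struct rh_table {
    struct rh_entry **slots;
    unsigned long size;   /* number of slots, a power of two */
    unsigned long used;   /* number of stored entries */
};

struct rh_dict {
    struct rh_table ht[2];
    long rehashidx;       /* -1 when no rehash is in progress */
};

/* Move up to n non-empty slots from ht[0] to ht[1].
 * Returns 1 if more work remains, 0 once the rehash is finished. */
static int rh_step(struct rh_dict *d, int n) {
    if (d->rehashidx == -1) return 0;
    while (n-- > 0 && d->ht[0].used != 0) {
        /* used != 0 means a non-empty slot exists at or after rehashidx. */
        while (d->ht[0].slots[d->rehashidx] == NULL) d->rehashidx++;
        struct rh_entry *e = d->ht[0].slots[d->rehashidx];
        while (e) {
            struct rh_entry *next = e->next;
            unsigned long i = e->key & (d->ht[1].size - 1);
            e->next = d->ht[1].slots[i];   /* push onto the new chain */
            d->ht[1].slots[i] = e;
            d->ht[0].used--;
            d->ht[1].used++;
            e = next;
        }
        d->ht[0].slots[d->rehashidx] = NULL;
        d->rehashidx++;
    }
    if (d->ht[0].used == 0) {
        /* Done: release the old slot array and let ht[1] become ht[0]. */
        free(d->ht[0].slots);
        d->ht[0] = d->ht[1];
        d->ht[1].slots = NULL;
        d->ht[1].size = 0;
        d->ht[1].used = 0;
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}
```

Calling rh_step(d, 1) repeatedly spreads the cost of a resize over many operations instead of paying it all at once, which is the point of keeping two tables and a progress index.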

Digression: this kind of code is easier to understand when written in C++.

Dictionary iterator discussion

typedef struct dictIterator {
    dict *d;                /* dictionary being iterated */
    int table,              /* which hash table: ht[0] or ht[1] */
        index,              /* hash slot currently being iterated */
        safe;               /* whether this is a safe iterator */
    dictEntry *entry,       /* current hash node */
              *nextEntry;   /* the next one */
} dictIterator;

 

Here the iterator introduces the safe field: iterator safety.

Iterator safety: redis does not migrate all data at once; it migrates incrementally, a little at a time. If entries are inserted or deleted through an iterator while a migration is still unfinished, the iteration may miss entries or visit some more than once.

The approach taken is to record the number of safe iterators operating on the dict. Migration steps are performed only when no safe iterator is active.

What if the hash table were used from multiple threads in production? Reads and writes from multiple threads would become uncontrollable, and it is unclear how consistency could be guaranteed across threads.
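The counting idea can be sketched as a guard around the migration step. The struct and function names below (it_dict, it_begin_safe, it_end_safe, it_rehash_step) are hypothetical, and the steps counter merely stands in for the real migration work so that the effect of the guard is observable.

```c
#include <assert.h>

struct it_dict {
    long rehashidx;   /* -1 when no rehash is in progress */
    int iterators;    /* number of safe iterators currently active */
    int steps;        /* stand-in for migration work actually performed */
};

/* A safe iterator registers itself when it starts... */
static void it_begin_safe(struct it_dict *d) { d->iterators++; }

/* ...and unregisters when it is released. */
static void it_end_safe(struct it_dict *d) {
    assert(d->iterators > 0);
    d->iterators--;
}

/* Called on every add/delete/lookup: migrate only if nobody is iterating. */
static void it_rehash_step(struct it_dict *d) {
    if (d->rehashidx != -1 && d->iterators == 0) d->steps++;
}
```

While a safe iterator is open, it_rehash_step does nothing, so entries do not move between tables under the iterator; once the iterator is released, migration resumes on the next operation.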

  • After a migration completes, the memory of the old table is released and ht[1] is reset to empty. Until the migration is complete, lookups check both hash tables.
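A lookup that consults both tables while a migration is in progress might look like the sketch below. The types and the function lk_find are hypothetical, and integer keys are used directly as their own hash to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

struct lk_entry { unsigned long key; int value; struct lk_entry *next; };
struct lk_table { struct lk_entry **slots; unsigned long size; };
struct lk_dict  { struct lk_table ht[2]; long rehashidx; };

/* Search ht[0] first; search ht[1] as well only while rehashing. */
static struct lk_entry *lk_find(const struct lk_dict *d, unsigned long key) {
    for (int t = 0; t <= 1; t++) {
        const struct lk_table *ht = &d->ht[t];
        if (ht->size != 0) {
            struct lk_entry *e = ht->slots[key & (ht->size - 1)];
            for (; e; e = e->next)
                if (e->key == key) return e;
        }
        if (d->rehashidx == -1) break;   /* ht[1] is unused when not rehashing */
    }
    return NULL;
}
```

Because an entry lives in exactly one of the two tables at any moment, checking both is enough to find it regardless of how far the migration has progressed.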

Redis hash slot resizing design

1) Each add, delete, and lookup operation calls the _dictRehashStep function once, which in turn calls dictRehash(d, 1). Starting from rehashidx, this finds the next non-empty slot and migrates that one slot to ht[1]. Migrating one slot per call is a deliberate design choice that keeps redis from pausing service for too long. The precondition is that the number of safe iterators is 0. A safe iterator is one whose iteration may be interleaved with add, delete, and modify operations; if a migration step ran while entries were being added, deleted, or modified during such an iteration, entries could be missed.

2) The second migration method runs rehash operations for a given number of milliseconds, migrating ht[0] to ht[1] in batches of 100 slots; the time budget is controlled by the server. Because each call moves many slots, it takes longer, so the ms budget needs to be evaluated carefully. If the rehash does not finish within the budget, the function stops and the remaining work is left for a later call.

int dictRehashMilliseconds(dict *d, int ms) {
    long long start = timeInMilliseconds();
    int rehashes = 0;

    /* Migrate 100 slots at a time until done or the time budget is used up. */
    while (dictRehash(d, 100)) {
        rehashes += 100;
        if (timeInMilliseconds() - start > ms) break;
    }
    return rehashes;
}

 

 

 
