NoSQL database: Redis memory usage optimization and storage

Source: Internet
Author: User

Common Redis data types

The five most commonly used Redis data types are:

String

Hash

List

Set

Sorted Set

Before describing these data types, let's look at how they are represented in Redis's internal memory management:

[Figure: how Redis represents the different data types in its internal memory management]

First, Redis internally uses a redisObject to represent every key and value. The fields of a redisObject include type and encoding: type indicates which data type the value object is, and encoding indicates how that data type is stored inside Redis. For example, type=string means the value is an ordinary string, and the corresponding encoding can be raw or int. If it is int, Redis actually stores and represents the string internally as a number, which is possible only when the string itself can be represented numerically, such as "123" or "456".
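The choice between the raw and int encodings can be pictured with a small Python sketch. It is a simplified illustration, not the Redis source: the function name choose_string_encoding and the signed 64-bit range check are assumptions made for this example.

```python
# Simplified sketch of picking an encoding for a string value.
# The function name and the 64-bit range are assumptions for illustration.

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def choose_string_encoding(value: str) -> str:
    """Return "int" if value is a canonical integer string, else "raw"."""
    try:
        number = int(value)
    except ValueError:
        return "raw"
    # Only the canonical form round-trips exactly: "007" or " 12" must stay
    # raw, otherwise the original text could not be rebuilt from the number.
    if str(number) != value:
        return "raw"
    if not INT64_MIN <= number <= INT64_MAX:
        return "raw"
    return "int"


if __name__ == "__main__":
    for sample in ["123", "456", "007", "hello", "12.5"]:
        print(sample, "->", choose_string_encoding(sample))
```

The round-trip check is the important design point: a value may be stored as a number only if converting it back yields exactly the original string.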

The vm field needs a special note: memory is actually allocated for this field only when the virtual memory feature of Redis is turned on. The feature is off by default and is described later. We can see that using a redisObject to represent all key/value data costs extra memory. This overhead exists mainly so that Redis can provide a unified management interface for its different data types. The author of Redis also offers several ways to save as much memory as possible, which we discuss in detail later.

  Let's start by analyzing how these five data types are used and how they are implemented internally:

  1. String

Common commands:

SET, GET, DECR, INCR, MGET, and so on.

Application Scenarios:

String is the most commonly used data type. Ordinary key/value storage falls into this category, so it is not explained further here.

Implementation method:

By default, a string is stored inside Redis as a plain string referenced by a redisObject. When operations such as INCR and DECR are applied, the value is converted to a numeric type for the calculation, and the encoding field of the redisObject is then int.

  2. Hash

Common commands:

HGET, HSET, HGETALL, and so on.

Application Scenarios:

A simple example illustrates the application scenario for a hash. Suppose we want to store user information objects:

The user ID is the lookup key, and the stored user object contains the name, age, birthday, and other information. With an ordinary key/value structure, there are two main ways to store it:


The first method uses the user ID as the lookup key and stores the remaining information as one serialized object. Its disadvantages are the added cost of serialization and deserialization, and the need to retrieve the entire object whenever a single piece of information is modified. The modification also requires concurrency protection, which introduces complex problems such as CAS.

The second method stores one key/value pair for each member of the user information object, using the user ID plus the attribute name as the unique key for that attribute's value. This avoids the serialization cost and the concurrency problem, but the user ID is stored repeatedly. With a large amount of such data, the memory waste is considerable.

The hash type provided by Redis solves this problem well. A Redis hash stores its value internally as a HashMap and provides an interface for accessing the members of that map directly.

That is, the key is still the user ID and the value is a map. The map's key is the attribute name of a member and the map's value is the attribute value, so data can be read and modified directly through the key of the internal map (Redis calls this internal key a field). In other words, the attribute data can be manipulated with key (user ID) + field (attribute name), with no repeated storage of data and no serialization or concurrent-modification problems.
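The three layouts can be compared side by side in a minimal Python sketch. Plain dicts stand in for the Redis keyspace, and the key names, the sample user, and the helper functions are all hypothetical; the point is only to show which data each update has to touch.

```python
import json

# Plain dicts stand in for the Redis keyspace. Key names such as "user:1"
# and the sample values are hypothetical.

# Layout 1: one key per user, the value is a serialized object.
blob_store = {"user:1": json.dumps({"name": "Tom", "age": 25})}


def set_age_blob(store, user_key, age):
    """Read, deserialize, modify, serialize, write back the whole object."""
    user = json.loads(store[user_key])
    user["age"] = age
    store[user_key] = json.dumps(user)


# Layout 2: one key per attribute, the user ID is repeated in every key.
flat_store = {"user:1:name": "Tom", "user:1:age": 25}


def set_age_flat(store, user_id, age):
    """Touch only one key, at the price of repeating the user ID."""
    store[f"user:{user_id}:age"] = age


# Layout 3: hash style, one key per user, the value is a field -> value map.
hash_store = {"user:1": {"name": "Tom", "age": 25}}


def hset(store, key, field, value):
    """Modify a single field in place, similar in spirit to HSET."""
    store.setdefault(key, {})[field] = value


def hget(store, key, field):
    """Read a single field, similar in spirit to HGET."""
    return store.get(key, {}).get(field)
```

Only the first layout has to move the whole object for a one-field change, and only the second repeats the user ID in every key.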

Note also that Redis provides an interface (HGETALL) that fetches all of the attribute data at once. If the internal map has many members, this means traversing the entire map, and because of Redis's single-threaded model the traversal can be time-consuming. During that time, requests from other clients get no response at all, so this requires extra attention.

Implementation method:

As noted above, the value of a Redis hash is internally a HashMap, but there are actually two different implementations. When the hash has few members, Redis saves memory by storing it compactly in a structure similar to a one-dimensional array instead of a real HashMap, and the encoding of the value's redisObject is zipmap. When the number of members grows, it is automatically converted to a real HashMap, and the encoding becomes ht.

  3. List

Common commands:

LPUSH, RPUSH, LPOP, RPOP, LRANGE, and so on.

Application Scenarios:

The Redis list has many application scenarios and is one of the most important Redis data structures. For example, Twitter's following list and follower list can be implemented with the Redis list structure. This is easy to understand, so it is not elaborated here.

Implementation method:

The Redis list is implemented as a doubly linked list, which supports reverse lookup and traversal and is convenient to operate on, at the cost of some additional memory overhead. Many internal parts of Redis, including the send buffer queue, also use this data structure.
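The behavior of push and pop at both ends can be sketched in Python. Here collections.deque stands in for the doubly linked list; the function names mirror the Redis commands but are only an illustration, and lrange handles non-negative indexes only.

```python
from collections import deque
from itertools import islice

# collections.deque stands in for the doubly linked list. The helper names
# mirror Redis commands for readability; they are not a Redis client API.


def lpush(lst, value):
    """Insert at the head."""
    lst.appendleft(value)


def rpush(lst, value):
    """Insert at the tail."""
    lst.append(value)


def lpop(lst):
    """Remove and return the head, or None if the list is empty."""
    return lst.popleft() if lst else None


def rpop(lst):
    """Remove and return the tail, or None if the list is empty."""
    return lst.pop() if lst else None


def lrange(lst, start, stop):
    """Return items from start to stop inclusive (non-negative indexes)."""
    return list(islice(lst, start, stop + 1))


if __name__ == "__main__":
    followers = deque()
    rpush(followers, "a")
    rpush(followers, "b")
    lpush(followers, "z")
    print(lrange(followers, 0, 1))  # ['z', 'a']
```

Both ends are cheap to modify, which is why this structure suits queues and timelines, while reaching an element in the middle requires walking the list.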

  4. Set

Common commands:

SADD, SPOP, SMEMBERS, SUNION, and so on.

Application Scenarios:

The functionality a Redis set provides is similar to that of a list, except that a set removes duplicates automatically. A set is a good choice when you need to store a list of data and do not want duplicates. A set also provides an important interface for checking whether a member is in the set, which a list does not offer.

Implementation method:

Internally, a set is implemented as a HashMap whose values are always null. Duplicates are removed quickly by computing hashes, which is also why a set can check whether a member is in the collection.
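The idea of a map whose values are always null can be shown with a short Python sketch. The class name HashSet and its method names are hypothetical and only echo the Redis commands.

```python
class HashSet:
    """A set built on a dict whose values are always None."""

    def __init__(self):
        self._table = {}

    def sadd(self, member):
        """Add member; return 1 if it was new, 0 if already present."""
        if member in self._table:
            return 0
        self._table[member] = None  # the value carries no information
        return 1

    def sismember(self, member):
        """Membership test is a single hash lookup."""
        return member in self._table

    def smembers(self):
        """Return all members as a list."""
        return list(self._table)


if __name__ == "__main__":
    tags = HashSet()
    print(tags.sadd("redis"), tags.sadd("redis"))  # 1 0
    print(tags.sismember("redis"))                 # True
```

The hash lookup serves both purposes at once: it rejects duplicates on insert and answers membership queries.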

  5. Sorted Set

Common commands:

ZADD, ZRANGE, ZREM, ZCARD, and so on.

Application Scenarios:

The usage scenario for a Redis sorted set is similar to that of a set. The difference is that a set is not ordered automatically, whereas a sorted set orders its members by an additional priority (score) parameter supplied by the user and keeps them sorted on insertion. When you need an ordered collection without duplicates, choose the sorted set. For example, Twitter's public timeline can be stored with the publication time as the score, so that it is automatically sorted by time.

Implementation method:

Internally, a Redis sorted set uses a HashMap and a skip list (skiplist) to store the data and keep it ordered. The HashMap holds the mapping from member to score, and the skip list holds all the members, sorted by the scores stored in the HashMap. The skip list gives relatively efficient lookups and is relatively simple to implement.
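The pairing of two structures can be sketched in Python. Note the substitution: a sorted Python list maintained with bisect replaces the skip list here, so insertion costs differ from a real skip list. The class and method names are hypothetical, and zrange accepts non-negative indexes only.

```python
import bisect


class SortedSet:
    """Member -> score dict plus an ordered list of (score, member) pairs.

    The ordered list stands in for the skip list; it is not a skip list.
    """

    def __init__(self):
        self._scores = {}   # member -> score
        self._ordered = []  # (score, member) pairs kept in sorted order

    def _remove_pair(self, score, member):
        index = bisect.bisect_left(self._ordered, (score, member))
        del self._ordered[index]

    def zadd(self, member, score):
        """Insert a member, or move it if its score changed."""
        if member in self._scores:
            self._remove_pair(self._scores[member], member)
        self._scores[member] = score
        bisect.insort(self._ordered, (score, member))

    def zrem(self, member):
        """Remove a member; return 1 if it existed, else 0."""
        if member not in self._scores:
            return 0
        self._remove_pair(self._scores.pop(member), member)
        return 1

    def zscore(self, member):
        return self._scores.get(member)

    def zrange(self, start, stop):
        """Members from rank start to stop inclusive, lowest score first."""
        return [member for _, member in self._ordered[start:stop + 1]]

    def zcard(self):
        return len(self._scores)
```

The dict answers "what is this member's score" directly, and the ordered structure answers "which members fall in this rank range", which is why both are kept.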

Common memory optimization methods and parameters

From the implementations described above, we can see that the memory management cost of Redis is quite high, that is, it uses a lot of memory. The author is well aware of this and provides a series of parameters and techniques for controlling and saving memory, which we discuss in turn.

First and most important, do not turn on the Redis VM option, that is, the virtual memory feature. It was designed as a persistence strategy that swaps data between memory and disk so that Redis can store more data than fits in physical memory, but its memory management cost is also very high, and as we analyze later, this persistence strategy is immature. To turn the VM feature off, check that vm-enabled is set to no in your redis.conf file.

Next, it is best to set the maxmemory option in redis.conf. This option tells Redis how much physical memory it may use before it starts rejecting subsequent write requests. It is a good way to keep Redis from using so much physical memory that the machine starts swapping, which would eventually hurt performance seriously or even cause a crash.

In addition, Redis provides a set of parameters for different data types to control memory usage. As analyzed in detail above, a Redis hash is internally a HashMap, and if the map has few members it is stored in a compact format similar to a one-dimensional linear array, which eliminates the memory overhead of a large number of pointers. This behavior is controlled by the following two entries in the redis.conf configuration file:

hash-max-zipmap-entries 64

hash-max-zipmap-value 512

hash-max-zipmap-entries means that the linear compact format is used as long as the map has no more than this many members. The default is 64, so a value with up to 64 members uses linear compact storage, and beyond that it is automatically converted to a real HashMap.

hash-max-zipmap-value means that linear compact storage is used as long as no member of the map is longer than this many bytes.

If either of these two conditions exceeds its configured value, the hash is converted to a real HashMap and the memory saving is lost. So is it better to set these values as large as possible? Of course not. The advantage of a HashMap is that lookups and operations take O(1) time, whereas a hash stored in the one-dimensional compact format takes O(n) time. If the number of members is small, the impact is not significant, but otherwise performance suffers seriously. These values therefore have to be weighed carefully; fundamentally it is a tradeoff between time cost and space cost.
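The tradeoff can be made concrete with a Python sketch of a hash that starts in a compact linear form and converts once either threshold is exceeded. The class name SmallHash is hypothetical, a Python list of pairs only imitates the compact layout, and the size check is simplified to the value's UTF-8 length.

```python
class SmallHash:
    """Hash that starts as a flat list of pairs and converts to a dict."""

    def __init__(self, max_entries=64, max_value_bytes=512):
        self.max_entries = max_entries
        self.max_value_bytes = max_value_bytes
        self.encoding = "zipmap"  # compact form; "ht" after conversion
        self._pairs = []          # [(field, value), ...] while compact
        self._table = None        # dict once converted

    def hset(self, field, value):
        if self.encoding == "ht":
            self._table[field] = value
            return
        for index, (existing, _) in enumerate(self._pairs):  # O(n) scan
            if existing == field:
                self._pairs[index] = (field, value)
                break
        else:
            self._pairs.append((field, value))
        too_many = len(self._pairs) > self.max_entries
        too_long = len(value.encode("utf-8")) > self.max_value_bytes
        if too_many or too_long:
            # One-way conversion: the memory saving is given up for speed.
            self._table = dict(self._pairs)
            self._pairs = []
            self.encoding = "ht"

    def hget(self, field):
        if self.encoding == "ht":
            return self._table.get(field)    # O(1) on average
        for existing, value in self._pairs:  # O(n) scan
            if existing == field:
                return value
        return None
```

With small thresholds the linear scan stays cheap; raising them keeps more hashes compact but makes every lookup on those hashes proportionally slower.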

Similar parameters are:

list-max-ziplist-entries 512

Description: a list with no more than this many nodes is stored in the compact, pointer-free format.

list-max-ziplist-value 64

Description: a list whose node values are each smaller than this many bytes is stored in the compact format.

set-max-intset-entries 512

Description: a set whose members are all integers is stored in the compact format as long as it has no more than this many nodes.

The last point is that Redis's internal implementation does not do much to optimize memory allocation, although in most cases this does not become a performance bottleneck. However, if most of the data stored in Redis is numeric, Redis uses shared integers to avoid the overhead of allocating memory. At startup it allocates a pool of integer objects covering a range of small values bounded by n. If a stored value falls within that range, the object is taken directly from the pool and shared through reference counting. When the system stores a large number of numeric values, this saves some memory and improves performance. Changing n requires modifying one macro definition in the source code, REDIS_SHARED_INTEGERS, which is 10000 by default. You can change it to suit your needs and then recompile.
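The shared-integer idea can be sketched in Python as a preallocated pool with reference counts. The class IntObject and the function get_int_object are hypothetical, and the pooled range of 0 up to SHARED_INTEGERS - 1 is an assumption of this sketch.

```python
SHARED_INTEGERS = 10000  # mirrors the default value mentioned in the text


class IntObject:
    """Stand-in for a value object that holds one integer."""

    def __init__(self, value):
        self.value = value
        self.refcount = 0


# Allocated once at startup; the range 0..SHARED_INTEGERS-1 is an assumption.
_shared_pool = [IntObject(i) for i in range(SHARED_INTEGERS)]


def get_int_object(value):
    """Return a pooled object for small values, a fresh one otherwise."""
    if 0 <= value < SHARED_INTEGERS:
        obj = _shared_pool[value]  # no new allocation
    else:
        obj = IntObject(value)     # outside the pool: allocate
    obj.refcount += 1
    return obj


if __name__ == "__main__":
    a = get_int_object(42)
    b = get_int_object(42)
    print(a is b, a.refcount)  # True 2
```

Every key that stores a pooled value points at the same object, so a million counters equal to 42 cost one object plus a million references.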

The persistence mechanism of Redis

Because Redis supports a rich set of in-memory data structures, persisting these complex memory layouts to disk is a challenge, and Redis persistence therefore differs considerably from that of traditional databases. Redis supports four persistence modes:

Timed snapshot mode (snapshot)

Statement-based append-only file mode (AOF)

Virtual memory (VM)

Diskstore mode

In terms of design, the first two assume that all data fits in memory, that is, they provide on-disk persistence for small data volumes. The last two are the author's attempts to store more data than physical memory can hold, that is, persistence for large data volumes. As of this writing, the last two modes are still experimental and the VM mode has basically been abandoned by the author, so only the first two can actually be used in production. In other words, Redis is currently suitable only for small data volumes (all data can be loaded in memory); massive data storage is not an area where Redis excels. The persistence modes are described one by one below:

  Timed snapshot mode (snapshot):

This persistence mode is actually a timer event inside Redis. At fixed intervals it checks whether the number of changes to the data and the elapsed time satisfy the configured trigger condition. If they do, Redis creates a child process through the operating system's fork call. The child initially shares the same memory pages as the parent, so it can traverse the entire memory and write it to storage while the main process continues to serve requests. When a write occurs, the operating system performs copy-on-write in units of memory pages, which ensures that parent and child do not affect each other.
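The trigger check run by the timer can be sketched in Python. The function name should_snapshot and the (seconds, changes) pairs in SAVE_POINTS are illustrative assumptions; real values come from the save lines in redis.conf. The fork and the write to disk are not modeled.

```python
# (seconds, changes) pairs. These values are illustrative assumptions;
# the real ones are configured in redis.conf.
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]


def should_snapshot(changes_since_save, seconds_since_save,
                    save_points=SAVE_POINTS):
    """Return True if any configured save point is satisfied.

    A save point (seconds, changes) is satisfied when more than `seconds`
    have passed since the last snapshot and at least `changes` writes
    have happened in that time.
    """
    return any(
        seconds_since_save > seconds and changes_since_save >= changes
        for seconds, changes in save_points
    )


if __name__ == "__main__":
    print(should_snapshot(5, 100))      # False: too few changes, too soon
    print(should_snapshot(10000, 61))   # True: busy server, 60 s rule
    print(should_snapshot(1, 901))      # True: quiet server, 900 s rule
```

Combining several pairs lets a busy server snapshot often while a quiet one snapshots rarely.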

The main disadvantage of this persistence mode is that a timed snapshot only represents the memory image at one point in time, so a system restart loses all data written between the last snapshot and the restart.

  Statement-based append mode (AOF):

The AOF mode is similar to MySQL's statement-based binlog: every command that changes data in Redis memory is appended to a log file, which means this log file is the persisted data of Redis.

The main disadvantage of the AOF mode is that continual appending can make the log file excessively large. When the system restarts and recovers data from the AOF, loading can be very slow: tens of gigabytes of data may take several hours to load. This time is not spent on slow disk reads but on executing every command that is read back in memory. In addition, because every command has to be written to the log, using AOF reduces Redis's read and write performance.
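Why recovery time grows with the log can be seen in a small Python sketch of append and replay. An in-memory list stands in for the log file, only three commands are modeled, and the names AppendOnlyStore, apply_command, and replay are hypothetical.

```python
def apply_command(data, command):
    """Apply one write command to the in-memory dict."""
    name, *args = command
    if name == "SET":
        key, value = args
        data[key] = value
    elif name == "INCR":
        (key,) = args
        data[key] = int(data.get(key, 0)) + 1
    elif name == "DEL":
        (key,) = args
        data.pop(key, None)
    else:
        raise ValueError(f"unsupported command: {name}")


class AppendOnlyStore:
    """Executes writes in memory and appends each one to a log."""

    def __init__(self):
        self.data = {}
        self.log = []  # stands in for the append-only file

    def execute(self, *command):
        apply_command(self.data, command)
        self.log.append(command)  # the log grows with every write


def replay(log):
    """Rebuild the dataset by re-executing every logged command."""
    data = {}
    for command in log:
        apply_command(data, command)
    return data
```

In this sketch four writes leave one key in memory but four entries in the log, and replay has to execute all four, which illustrates why a long-lived log is slow to load.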

  Virtual memory mode:

The virtual memory mode is a strategy by which Redis swaps data in and out in user space. It performs poorly in practice; the main problems are complex code, slow restarts, and slow replication, and the author has abandoned it.

  Diskstore mode:

The diskstore mode is the new implementation the author chose after abandoning the virtual memory mode, and it follows the traditional B-tree approach. It is still in the experimental stage, so we can wait and see whether it becomes usable.

Redis persistence disk I/O and the problems it brings

People with experience operating Redis in production will have noticed that when Redis uses a lot of physical memory, but still less than the machine's total physical memory, it can become unstable or even crash. Some attribute this to the fork system call used by snapshot persistence doubling the memory consumption. This view is inaccurate: the copy-on-write mechanism of fork works in units of operating system pages, so only dirty pages that are written to get copied, and a system generally does not write to every page within a short period and force them all to be copied. So what causes Redis to crash?

The answer is that Redis persistence uses buffered I/O, which means that reads and writes of the persistence files go through the page cache in physical memory. Most database systems instead use direct I/O to bypass the page cache and maintain their own data cache. When a Redis persistence file is very large (especially a snapshot file) and is read or written, the data in the disk file is also loaded into physical memory as the operating system's cache of that file, and this cached data duplicates the data Redis already manages in memory. The kernel does evict page cache when physical memory is tight, but it may well decide that some piece of page cache is more important and let your process start swapping instead, at which point the system becomes unstable or crashes. Our experience is that once Redis uses more than 3/5 of the total physical memory, it starts to become dangerous.
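The 3/5 rule of thumb is simple arithmetic, shown here as a small Python check. The function name exceeds_safe_share and the sample sizes are hypothetical; integer arithmetic is used so that the comparison is exact.

```python
GIB = 1024 ** 3


def exceeds_safe_share(used_bytes, total_bytes, numerator=3, denominator=5):
    """Return True if used memory is above numerator/denominator of total.

    Cross-multiplying avoids floating-point rounding in the comparison.
    """
    return used_bytes * denominator > total_bytes * numerator


if __name__ == "__main__":
    total = 16 * GIB
    print(exceeds_safe_share(9 * GIB, total))   # False: 9/16 is below 3/5
    print(exceeds_safe_share(10 * GIB, total))  # True: 10/16 is above 3/5
```

On a 16 GiB machine the threshold falls at 9.6 GiB, so a Redis instance using 10 GiB is already in the range the article describes as dangerous.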

[Figure: Redis memory usage after reading or writing the snapshot file dump.rdb]


Summary:

1. Choose the right data type for your business needs, and set the appropriate compact storage parameters for each scenario.

2. When the business scenario does not require data persistence, turning off all persistence modes gives you the best performance and the most usable memory.

3. If you need persistence, do not use the virtual memory or diskstore mode. Choose between snapshot mode and statement append (AOF) mode according to whether you can tolerate losing some data on restart.

4. Do not let Redis use more than 3/5 of the machine's total physical memory.
