Redis data types
The five most commonly used data types are String, Hash, List, Set, and Sorted Set. Internally, Redis uses a redisObject structure to represent every key and value. The main fields of redisObject are as follows:
The type field indicates which data type a value object is, and the encoding field indicates how that data type is stored inside Redis. For example, type=string means the value is an ordinary string, and the corresponding encoding can be raw or int. If it is int, Redis actually stores and represents the string internally as a number, provided the string itself can be represented numerically.
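The relationship between type and encoding can be illustrated with a small Python sketch. The class RedisObjectSketch and the helper make_string_object are hypothetical names invented for this example, and the integer check is deliberately simplified; this is not the actual Redis C code.

```python
from dataclasses import dataclass


@dataclass
class RedisObjectSketch:
    """Hypothetical, simplified stand-in for redisObject."""
    type: str       # logical data type, e.g. "string"
    encoding: str   # internal representation, e.g. "raw" or "int"
    value: object   # the stored payload


def make_string_object(text: str) -> RedisObjectSketch:
    """Pick the 'int' encoding when the string is a canonical integer."""
    try:
        number = int(text)
    except ValueError:
        return RedisObjectSketch("string", "raw", text)
    # Only use 'int' when converting back gives the identical string,
    # so inputs like "007" or " 5" stay as raw strings.
    if str(number) == text:
        return RedisObjectSketch("string", "int", number)
    return RedisObjectSketch("string", "raw", text)
```

In this sketch, make_string_object("12345") is held as a number while make_string_object("hello") stays a raw string, yet both report type "string" to the caller.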
The vm field deserves a special note: memory is actually allocated for it only when the Redis virtual memory feature is turned on, and that feature is off by default. The VM feature is discussed later. From the above we can see that using redisObject to represent all key-value data wastes some memory; this memory management cost is the price of providing a unified management interface for the different Redis data types. The author also offers several ways to help us save as much memory as possible, which are discussed later as well. Let us first analyze how these five data types are used and how they are implemented internally.
String
Common commands: SET, GET, DECR, INCR, MSET, MGET, and so on.
Application scenario: String is the most commonly used data type; ordinary key/value storage falls into this category.
Implementation: By default a String is stored inside Redis as a string referenced by a redisObject. When operations such as INCR and DECR are applied, the value is converted to a numeric type for the calculation, and the encoding field of the redisObject is then int.
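The conversion on INCR and DECR can be imitated with a toy in-memory store. This is only a sketch: StringStoreSketch is a hypothetical class, the methods mirror the command names for readability, and treating a missing key as 0 follows the documented behaviour of INCR.

```python
class StringStoreSketch:
    """Toy key-value store; values are kept as Python str or int."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Values arrive as strings by default.
        self._data[key] = str(value)

    def get(self, key):
        value = self._data.get(key)
        return None if value is None else str(value)

    def incr(self, key, amount=1):
        # A missing key is treated as 0 before incrementing.
        current = self._data.get(key, 0)
        try:
            number = int(current)
        except ValueError:
            raise ValueError("value is not an integer") from None
        number += amount
        self._data[key] = number   # now held in numeric form
        return number

    def decr(self, key, amount=1):
        return self.incr(key, -amount)
```

After set("hits", "10"), a call to incr("hits") returns 11 and the value is held as a number, while get("hits") still hands back a string to the caller.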
Hash
Common commands: HMSET, HMGET, HDEL, HLEN, and so on.
Application scenario:
Let us take a simple example to describe where a Hash is useful. Suppose we store a user information object that contains the following information:
The user ID is the lookup key, and the stored value is a user object containing the name, age, birthday, and other information. With an ordinary key/value structure there are mainly two ways to store it:
The first way uses the user ID as the key and stores the remaining information as a single serialized object. The drawbacks are the added cost of serialization and deserialization, and the fact that modifying a single item requires fetching the whole object; the modification must also be protected against concurrent updates, which introduces complications such as CAS.
The second way stores as many key/value pairs as the user object has members, using the user ID plus the attribute name as the unique key for each attribute value. This avoids the serialization cost and the concurrency problem, but the user ID is stored repeatedly, and with a large amount of such data the wasted memory is considerable.
The Hash provided by Redis solves this problem well. A Redis Hash stores its value internally as a hash map and provides an interface for accessing the members of that map directly.
That is, the key is still the user ID and the value is a map whose keys are the attribute names and whose values are the attribute values. Data can then be modified and accessed directly through the key of the internal map (Redis calls this key a field); in other words, an attribute can be manipulated with key (user ID) + field (attribute name). The data does not need to be stored repeatedly, and the problems of serialization and concurrent modification control do not arise.
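The three storage layouts discussed above can be compared side by side using plain Python dictionaries as a stand-in for the Redis keyspace. The key names (user:1 and so on), the use of JSON as the serialization format, and the helper functions are all assumptions made for illustration.

```python
import json

# Approach 1: one key, whole object serialized (JSON used as an example).
blob_store = {"user:1": json.dumps({"name": "Tom", "age": 30})}


def update_age_blob(store, key, age):
    user = json.loads(store[key])      # must fetch and decode everything
    user["age"] = age
    store[key] = json.dumps(user)      # and write the whole object back


# Approach 2: one key per attribute; the user ID is repeated in every key.
flat_store = {"user:1:name": "Tom", "user:1:age": "30"}

# Approach 3: hash; key -> {field: value}, fields are addressed directly.
hash_store = {"user:1": {"name": "Tom", "age": "30"}}


def hset(store, key, field, value):
    store.setdefault(key, {})[field] = value


def hget(store, key, field):
    return store.get(key, {}).get(field)
```

With the third layout, hset(hash_store, "user:1", "age", "31") touches only one field, without decoding the rest of the object and without repeating the user ID in every key.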
Note also that Redis provides an interface (HGETALL) that fetches all attribute data at once. If the internal map has many members, this means traversing the entire map, and because of Redis's single-threaded model the traversal can be time-consuming, during which other client requests get no response at all. This requires extra care.
Implementation: As described above, the value of a Redis Hash is internally a hash map, but there are actually two different implementations. When the Hash has few members, Redis saves memory by storing it compactly in a structure similar to a one-dimensional array instead of a true hash map; the encoding of the corresponding value redisObject is zipmap. When the number of members grows, it is automatically converted to a true hash map, and the encoding becomes ht.
List
Common commands: LPUSH, RPUSH, LPOP, RPOP, LRANGE, and so on.
Application scenario: The Redis List has many uses and is one of the most important Redis data structures. For example, Twitter's following list and follower list can both be implemented with the Redis List structure. This is easy to understand and is not elaborated here.
Implementation: The Redis List is implemented as a doubly linked list, which supports reverse lookup and traversal and is convenient to operate on, at the cost of some extra memory overhead. Many internal parts of Redis, including the send buffer queue, also use this data structure.
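The behaviour of pushing and popping at both ends can be sketched in Python with collections.deque, used here as a stand-in for a doubly linked list. ListSketch is a hypothetical class; its method names mirror the commands, and lrange handles only non-negative indexes.

```python
from collections import deque


class ListSketch:
    """Toy list supporting pushes and pops at both ends."""

    def __init__(self):
        self._items = deque()

    def lpush(self, value):
        self._items.appendleft(value)
        return len(self._items)

    def rpush(self, value):
        self._items.append(value)
        return len(self._items)

    def lpop(self):
        return self._items.popleft() if self._items else None

    def rpop(self):
        return self._items.pop() if self._items else None

    def lrange(self, start, stop):
        # Inclusive stop, non-negative indexes only in this sketch.
        return list(self._items)[start:stop + 1]
```

Operations at either end are cheap, which is what makes the structure convenient for queues and for lists that are always read from the newest entry.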
Set
Common commands: SADD, SPOP, SMEMBERS, SUNION, and so on.
Application scenario: A Redis Set offers functionality similar to a List, except that a Set automatically removes duplicates. When you need to store a list of data and do not want duplicates, a Set is a good choice. A Set also provides an important interface for determining whether a member is in the set, which a List does not provide.
Implementation: Internally a Set is a hash map whose values are always null (the same as the implementation of the set in Java). Duplicates are removed quickly by computing hashes, which is also why a Set can determine whether a member is in the set.
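A hash map whose values are always null can be imitated directly with a Python dict. SetSketch is a hypothetical class written for this illustration, with method names that mirror the corresponding commands.

```python
class SetSketch:
    """Toy set backed by a dict whose values are always None."""

    def __init__(self):
        self._table = {}

    def sadd(self, member):
        # The hash lookup makes the duplicate check cheap.
        if member in self._table:
            return 0
        self._table[member] = None
        return 1

    def sismember(self, member):
        return member in self._table

    def smembers(self):
        return set(self._table)

    def scard(self):
        return len(self._table)
```

Adding the same member twice leaves a single entry, and the membership test is the same hash lookup that performs the deduplication.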
Sorted Set
Common commands: ZADD, ZRANGE, ZREM, ZCARD, and so on.
Application scenario: The usage scenario for a Redis Sorted Set is similar to that of a Set. The difference is that a Set is not automatically ordered, whereas a Sorted Set orders its members by an additional priority (score) parameter supplied by the user, and keeps them ordered on insertion, that is, sorted automatically. When you need an ordered collection without duplicates, choose the Sorted Set; for example, Twitter's public timeline can be stored with the publication time as the score, so that it is automatically sorted by time.
Implementation: Internally a Redis Sorted Set uses a hash map and a skip list to store the data and keep it ordered. The hash map holds the mapping from member to score, and the skip list holds all the members, ordered by the score stored in the hash map. The skip-list structure gives relatively high lookup efficiency and is relatively simple to implement.
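The two-structure design can be sketched in Python. Note the substitution: a sorted Python list maintained with the bisect module stands in for the skip list, so the sketch shows the division of labour between the two structures rather than skip-list performance. SortedSetSketch is a hypothetical class, and zrange accepts only non-negative indexes.

```python
import bisect


class SortedSetSketch:
    """Toy sorted set: dict for member -> score, sorted list for order."""

    def __init__(self):
        self._scores = {}    # member -> score
        self._ordered = []   # sorted list of (score, member) tuples

    def zadd(self, member, score):
        old = self._scores.get(member)
        if old is not None:
            # Remove the previous entry so each member appears once.
            index = bisect.bisect_left(self._ordered, (old, member))
            del self._ordered[index]
        self._scores[member] = score
        bisect.insort(self._ordered, (score, member))

    def zrem(self, member):
        score = self._scores.pop(member, None)
        if score is None:
            return 0
        index = bisect.bisect_left(self._ordered, (score, member))
        del self._ordered[index]
        return 1

    def zrange(self, start, stop):
        # Inclusive stop, non-negative indexes only in this sketch.
        return [member for _, member in self._ordered[start:stop + 1]]

    def zcard(self):
        return len(self._scores)
```

The dict answers "what is this member's score" directly, and the ordered structure answers range queries, which is why both are kept.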
Common memory optimization methods and parameters
From the implementations above we can see that the memory management cost of Redis is very high, that is, it uses a lot of memory. The author is well aware of this and provides a series of parameters and techniques to control and save memory, which we discuss in turn.
First and most important, do not turn on the Redis VM option, the virtual memory feature. It was intended as a persistence strategy that swaps data between memory and disk so that Redis can store more data than fits in physical memory, but its memory management cost is also very high, and as analyzed later this persistence strategy is immature. To turn the VM feature off, check that vm-enabled is set to no in your redis.conf file.
Second, it is best to set the maxmemory option in redis.conf. This option tells Redis how much physical memory it may use before it starts rejecting subsequent write requests. It protects Redis well against using so much physical memory that the system starts swapping, which would eventually hurt performance badly or even cause a crash.
In addition, Redis provides a set of parameters for controlling the memory usage of the different data types. As analyzed in detail above, the value of a Redis Hash is internally a hash map, and if the map has few members it is stored in a compact format similar to a one-dimensional linear array, which eliminates the memory overhead of a large number of pointers. This is controlled by the following two parameters in the redis.conf configuration file:
hash-max-zipmap-entries 64
hash-max-zipmap-value 512
hash-max-zipmap-entries means that when the number of members in the value's map does not exceed this number, the map is stored in the linear compact format. The default is 64, so a value with no more than 64 members uses linear compact storage, and beyond that it is automatically converted to a true hash map.
hash-max-zipmap-value means that when the length of every member value in the map does not exceed this many bytes, the linear compact format is used to save space.
If either of the two limits is exceeded, the value is converted to a true hash map and the memory saving is lost. Does that mean the larger these values the better? Of course not. The advantage of a hash map is that lookups and operations take O(1) time, whereas abandoning hashing for one-dimensional storage makes them O(n). With few members the impact is small, but otherwise performance suffers seriously, so the settings must be weighed carefully. Fundamentally this is a tradeoff between time cost and space cost.
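The tradeoff between the compact format and a true hash map can be made concrete with a Python sketch. CompactHashSketch is a hypothetical class: it keeps fields and values in one flat list with linear lookup, and switches to a dict once either limit is exceeded. Measuring length in characters rather than bytes, and checking the field length as well as the value length, are simplifications assumed here.

```python
class CompactHashSketch:
    """Toy hash: flat list while small, dict after either limit is hit."""

    def __init__(self, max_entries=64, max_value=512):
        self.max_entries = max_entries
        self.max_value = max_value
        self.encoding = "zipmap"
        self._pairs = []     # flat [field, value, field, value, ...]
        self._table = None   # dict, used only after conversion

    def hset(self, field, value):
        if self.encoding == "ht":
            self._table[field] = value
            return
        # Linear scan: O(n) in the number of fields.
        for i in range(0, len(self._pairs), 2):
            if self._pairs[i] == field:
                self._pairs[i + 1] = value
                break
        else:
            self._pairs.extend([field, value])
        too_many = len(self._pairs) // 2 > self.max_entries
        too_long = len(field) > self.max_value or len(value) > self.max_value
        if too_many or too_long:
            self._convert()

    def _convert(self):
        # One-way switch to a real hash table with O(1) lookups.
        self._table = dict(zip(self._pairs[0::2], self._pairs[1::2]))
        self._pairs = []
        self.encoding = "ht"

    def hget(self, field):
        if self.encoding == "ht":
            return self._table.get(field)
        for i in range(0, len(self._pairs), 2):
            if self._pairs[i] == field:
                return self._pairs[i + 1]
        return None
```

Raising the limits keeps more hashes in the flat form and saves pointers, but every hget and hset on those hashes then pays for the linear scan.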
Similar parameters are:
list-max-ziplist-entries 512
Description: A List with fewer nodes than this uses the pointer-free compact storage format.
list-max-ziplist-value 64
Description: A List whose node values are smaller than this many bytes uses the compact storage format.
set-max-intset-entries 512
Description: A Set whose members are all integers and whose number of nodes is below this value is stored in the compact format.
Finally, the internal implementation of Redis does little to optimize memory allocation, but in most cases this is not a performance bottleneck. If most of the data stored in Redis is numeric, however, Redis uses shared integers to avoid the overhead of allocating memory: at startup it allocates numeric objects for the values 1 to n and places them in a pool, and if a stored value falls within that range the object is taken directly from the pool and shared through reference counting. When the system stores a large number of numeric values, this saves some memory and improves performance. The value of n is set by the macro REDIS_SHARED_INTEGERS, defined in one line of the source code; it defaults to 10000 and can be changed as needed, after which Redis must be recompiled.
Redis persistence mechanism
Because Redis supports a rich variety of in-memory data structures, persisting these complex structures to disk is a challenge, and Redis therefore persists data rather differently from traditional databases. Redis supports four persistence modes:
- Timed snapshot (snapshot)
- Statement-based append-only file (AOF)
- Virtual memory (VM)
- Diskstore
In terms of design, the first two assume that all data fits in memory, that is, they provide on-disk persistence for small data volumes. The last two are the author's attempts to store more data than fits in physical memory, that is, persistence for large data volumes. As of this writing the last two modes are still experimental, and the VM mode has essentially been abandoned by the author, so only the first two can actually be used in production. In other words, Redis is currently suitable only for small-volume data storage (all data must fit in memory); massive data storage is not an area where Redis excels. The four persistence modes are described below:
Timed snapshot mode (snapshot):
This mode is actually a timer event inside Redis. At fixed intervals it checks whether the number of changes to the data and the elapsed time satisfy the configured persistence trigger conditions. If they do, Redis creates a child process through the operating system's fork call. By default the child shares the same address space as the parent, so it can traverse the entire memory and write it to storage while the main process continues to serve requests. When a write occurs, the operating system performs copy-on-write in units of memory pages, ensuring that parent and child do not affect each other.
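The trigger check described above can be sketched as a small Python function. The function name should_snapshot and its parameters are hypothetical; the rule values follow the form of the "save <seconds> <changes>" lines in redis.conf, and the comparison is simplified.

```python
def should_snapshot(save_points, seconds_since_last_save, changes_since_last_save):
    """Return True if any (seconds, changes) rule is satisfied."""
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in save_points
    )


# Example rules in the style of redis.conf "save <seconds> <changes>" lines.
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]
```

With these rules, a busy instance snapshots after about a minute, while a nearly idle one waits up to fifteen minutes; any writes made between two snapshots exist only in memory.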
The main drawback of this mode is that a timed snapshot only represents the memory image at a point in time, so a system restart loses all data written between the last snapshot and the restart.
Statement-based append mode (AOF):
The AOF approach is similar to MySQL's statement-based binlog: every command that changes data in Redis memory is appended to a log file, and this log file is the persistent data of Redis.
The main drawback of the AOF approach is that continual appending can make the log file very large, and recovering from AOF after a system restart is then very slow; tens of gigabytes of data may take several hours to load. This is not because reading the disk file is slow, but because every command that is read has to be executed again in memory. In addition, because every command must be written to the log, the read and write performance of Redis drops when AOF is used.
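Why the log grows and why recovery is slow can be seen in a toy append-and-replay model. AofStoreSketch is a hypothetical class that supports only SET and DEL, and a Python list stands in for the append-only file.

```python
class AofStoreSketch:
    """Toy store that logs every write and can rebuild itself by replay."""

    def __init__(self):
        self.data = {}
        self.log = []   # stands in for the append-only file

    def _apply(self, command):
        name, *args = command
        if name == "SET":
            key, value = args
            self.data[key] = value
        elif name == "DEL":
            self.data.pop(args[0], None)
        else:
            raise ValueError(f"unsupported command: {name}")

    def execute(self, *command):
        self._apply(command)
        self.log.append(command)   # every write command is appended

    @classmethod
    def replay(cls, log):
        restored = cls()
        for command in log:
            restored._apply(command)   # recovery re-executes each command
        restored.log = list(log)
        return restored
```

Four writes that leave a single key behind still produce four log entries, and replay has to execute all four to arrive at that one key.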
Virtual memory mode:
Virtual memory mode is a strategy in which Redis swaps data in and out in user space. It works poorly in practice; the main problems are complex code, slow restarts, and slow replication, and it has been abandoned by the author.
Diskstore mode:
Diskstore is a new implementation approach that the author chose after abandoning virtual memory, namely the traditional B-tree approach. It is still experimental, and we can wait and see whether it becomes usable.
Redis persistence disk I/O and the problems it brings
People with experience operating Redis in production will find that when Redis uses a lot of physical memory, it can become unstable or even crash before usage has reached the total physical memory capacity. Some people attribute this to the fork system call used for snapshot persistence doubling memory consumption. This view is inaccurate: the copy-on-write mechanism of fork works in units of operating system pages, so only dirty pages that are written get copied, and a system generally does not write to all of its pages within a short period. So what causes Redis to crash?
The answer is that Redis persistence uses buffered I/O: writes and reads on the persistence file go through the page cache in physical memory, whereas most database systems use direct I/O to bypass the page cache and maintain a data cache of their own. When a Redis persistence file is very large (especially a snapshot file) and is read or written, the data in the disk file is also loaded into physical memory as the operating system's cache of the file, and this cached data duplicates the data Redis manages in memory. Although the kernel evicts page cache when physical memory is tight, it may well decide that some piece of page cache is more important and let your process start swapping instead, at which point your system becomes unstable or crashes. Our experience is that it starts to become dangerous once Redis uses more than 3/5 of the total physical memory capacity.
Figure: memory data of Redis after reading or writing the snapshot file dump.rdb.
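The 3/5 rule of thumb is easy to turn into a check. The function name exceeds_safe_memory is hypothetical, and how the two byte counts are obtained (for example from monitoring data) is left open; exact fractions are used so that a value sitting precisely on the boundary is not misjudged by rounding.

```python
from fractions import Fraction

DANGER_RATIO = Fraction(3, 5)   # rule of thumb quoted in the text


def exceeds_safe_memory(used_bytes, total_bytes, ratio=DANGER_RATIO):
    """Return True when used memory is above the given share of the total."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return Fraction(used_bytes, total_bytes) > ratio
```

On a machine with 16 GiB of physical memory, this flags a Redis instance once it holds more than 9.6 GiB.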
Summary
- Choose the right data types for your business needs, and set the compact storage parameters appropriately for each scenario.
- When a business scenario does not require persistence, turning off all persistence modes gives the best performance and the most usable memory.
- If you need persistence, choose between snapshot mode and statement append mode according to whether you can tolerate losing some data on restart, and do not use the virtual memory or Diskstore modes.
- Do not let Redis use more than 3/5 of the machine's total physical memory.
Reprint Address: http://www.infoq.com/cn/articles/tq-redis-memory-usage-optimization-storage#anch104989
Redis memory usage optimization and storage