Common Redis Data Types
Redis has the following data types:
- String
- Hash
- List
- Set
- Sorted set
Before describing these data types, let's look at how Redis's internal memory management represents them:
First, Redis uses a redisObject structure to represent every key and value. The most important fields of a redisObject are type and encoding: type records the concrete data type of the value object, and encoding records how that data type is stored inside Redis. For example, type = string means the value is stored as an ordinary string, and the corresponding encoding can be raw or int; int means the Redis instance actually stores and represents the string as a number, which is only possible when the string itself can be interpreted numerically, for example "123" or "456".
Also note the vm field: it is allocated only when Redis's virtual memory feature is enabled, and that feature is off by default (it is described later). Representing every key/value with a redisObject does waste some memory, but this overhead mainly buys a unified management interface across the different Redis data types. The author also provides a variety of ways to minimize memory usage, which we will discuss in detail later.
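These two fields can be observed from a client with the TYPE and OBJECT ENCODING commands. A minimal sketch, with hypothetical key names and the replies shown as `#` annotations (the exact encoding names vary slightly across Redis versions):

```
SET page:views 42
TYPE page:views               # -> string   (the "type" field of the redisObject)
OBJECT ENCODING page:views    # -> "int"    (the "encoding" field)
HSET book:1 title "Redis"
TYPE book:1                   # -> hash
OBJECT ENCODING book:1        # -> "zipmap" (a compact encoding; name differs by version)
```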
Next we will analyze the usage and internal implementation of these five data types one by one:
- String
Common commands:
SET, GET, DECR, INCR, MGET, etc.
Application scenarios:
String is the most common data type; ordinary key/value storage falls into this category, so it needs no further explanation here.
Implementation Method:
A string is stored in Redis as a string by default and is referenced by a redisObject. When an INCR, DECR, or similar operation is performed, the value is converted to a numeric type for the calculation, and in that case the encoding field of the redisObject is int.
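A small sketch of that conversion, using hypothetical keys; replies are shown as `#` annotations and may differ slightly by version:

```
SET counter 100            # -> OK
OBJECT ENCODING counter    # -> "int": the string "100" is stored as a number
INCR counter               # -> (integer) 101
SET greeting "hello"
OBJECT ENCODING greeting   # -> "raw" (or "embstr" on newer versions)
INCR greeting              # -> (error) ERR value is not an integer or out of range
```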
- Hash
Common commands:
HGET, HSET, HGETALL, etc.
Application scenarios:
Let's describe a hash application scenario with an example. Suppose we want to store a user-information object with the following data:
The user ID is the lookup key, and the stored user object contains the name, age, birthday, and other fields. With an ordinary key/value structure there are two ways to store this:
The first is to use the user ID as the key and serialize all the other information into a single object for storage. The drawback is the serialization/deserialization overhead; moreover, modifying a single field means retrieving the entire object, and the update has to be protected against concurrent modification, which introduces complications such as CAS.
The second is to store one key/value pair per member of the user object, using user ID + attribute name as the unique key for each attribute value. This avoids the serialization and concurrency problems, but the user ID is repeated in every key; with a large amount of such data the wasted memory is still considerable.
The hash provided by Redis solves this problem nicely. The value of a Redis hash is internally a HashMap, and Redis provides commands (such as HGET and HSET) for accessing the members of that map directly.
That is to say, the key is still the user ID, while the value is a map whose keys are the attribute names and whose values are the attribute values. Data can then be read and modified directly through the map's internal key (called a field in Redis): key (user ID) + field (attribute name) addresses the corresponding attribute. This neither stores the data redundantly nor introduces serialization or concurrent-modification problems; the problem is solved.
Note, however, that Redis also provides a command (HGETALL) that retrieves all of the attribute data at once. If the internal map has many members, this means traversing the entire map, and because of Redis's single-threaded model such a traversal can be time-consuming, during which requests from other clients get no response at all. This deserves special attention.
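A minimal sketch of the user-object scenario above; the key name user:1 and the field values are purely illustrative, with replies shown as `#` annotations:

```
HSET user:1 name "Alice"          # -> (integer) 1  (new field created)
HSET user:1 age "26"
HSET user:1 birthday "1990-01-01"
HGET user:1 age                   # -> "26"  (read one field, no deserialization)
HSET user:1 age "27"              # -> (integer) 0  (existing field updated in place)
HGETALL user:1                    # returns every field and value; avoid on very large hashes
```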
Implementation Method:
As mentioned above, the value of a Redis hash is effectively a HashMap, and there are two different implementations. When the hash has few members, Redis uses a compact, one-dimensional-array-like layout instead of a real HashMap structure in order to save memory; the encoding of the corresponding value's redisObject is then zipmap. When the number of members grows, the hash is automatically converted to a real HashMap, and the encoding becomes ht.
- List
Common commands:
LPUSH, RPUSH, LPOP, RPOP, LRANGE, etc.
Application scenarios:
Redis lists have many application scenarios and are also one of Redis's most important data structures. For example, Twitter's following and follower lists can both be implemented with the Redis list structure. This is easy to understand, so it is not elaborated here.
Implementation Method:
A Redis list is implemented as a doubly linked list, so it supports reverse lookup and traversal, which makes many operations convenient at the cost of some extra memory. Many of Redis's internal components also use this data structure, including the send buffer queue.
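A brief sketch of list usage along the lines of the follower-list example; key and member names are hypothetical, replies shown as `#` annotations:

```
LPUSH user:1:fans "user:2"     # -> (integer) 1
LPUSH user:1:fans "user:3"     # -> (integer) 2
LRANGE user:1:fans 0 -1        # -> "user:3", "user:2"  (newest first)
RPUSH jobs "task-1"            # a list also works as a simple FIFO queue
LPOP jobs                      # -> "task-1"
```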
- Set
Common commands:
SADD, SPOP, SMEMBERS, SUNION, etc.
Application scenarios:
The functionality provided by a Redis set is similar to that of a list; the special feature is that a set automatically removes duplicates. When you need to store a collection of data without duplicates, a set is a good choice, and a set also provides an important command for testing whether a member belongs to the set, which a list does not offer.
Implementation Method:
A set is internally implemented as a HashMap whose values are always null. In effect, hashing is what removes duplicates quickly, and it is also why a set can efficiently test whether a member is present.
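A short sketch of the de-duplication and membership test; the key and members are hypothetical, replies shown as `#` annotations:

```
SADD tags "redis"          # -> (integer) 1
SADD tags "cache"          # -> (integer) 1
SADD tags "redis"          # -> (integer) 0  (duplicate silently ignored)
SISMEMBER tags "cache"     # -> (integer) 1  (membership test a list does not offer)
SMEMBERS tags              # -> "redis", "cache"  (order not defined)
```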
- Sorted set
Common commands:
ZADD, ZRANGE, ZREM, ZCARD, etc.
Application scenarios:
The usage scenarios of a Redis sorted set are similar to those of a set. The difference is that a set is not ordered, whereas a sorted set orders its members by an extra priority parameter, the score, supplied by the user, so elements are sorted automatically as they are inserted. When you need an ordered, duplicate-free collection, choose the sorted set data structure. For example, Twitter's public timeline can be stored with the posting time as the score, so queries return results automatically sorted by time.
Implementation Method:
Internally, a Redis sorted set uses both a HashMap and a skip list to store the data and keep it ordered. The HashMap maps members to their scores, while the skip list stores all the members, sorted by the scores kept in the HashMap. The skip list structure gives high search efficiency while remaining relatively simple to implement.
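A small sketch of the timeline example above, with made-up timestamps as scores; replies shown as `#` annotations:

```
ZADD timeline 1293840000 "tweet:1"    # score = posting time
ZADD timeline 1293840060 "tweet:2"
ZADD timeline 1293840120 "tweet:3"
ZRANGE timeline 0 -1                  # -> tweet:1, tweet:2, tweet:3  (ascending by score)
ZREVRANGE timeline 0 9                # the latest ten entries, newest first
ZSCORE timeline "tweet:2"             # -> "1293840060"
```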
Common memory optimization methods and parameters
From the implementation analysis above we can see that Redis's memory management cost is actually quite high, that is, it uses a lot of memory. The author is well aware of this, so Redis provides a series of parameters and techniques for controlling and saving memory. Let's discuss them one by one.
First, do not enable Redis's VM option, i.e. the virtual memory feature. It was originally intended as a persistence strategy that swaps data between memory and disk so that datasets larger than physical memory can be stored, but its memory management cost is also very high. We will analyze this persistence strategy later; for now, make sure vm-enabled is set to no in your redis.conf.
Next, it is best to set the maxmemory option in redis.conf, which tells Redis to start rejecting subsequent write requests once it has used that amount of physical memory. This parameter effectively protects your Redis instance from using so much physical memory that the system starts swapping, which severely degrades performance or even causes a crash.
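A minimal redis.conf sketch for the two recommendations above; the maxmemory value is purely illustrative and should be sized for your own host:

```
# redis.conf
vm-enabled no        # keep the virtual memory feature disabled
maxmemory 1gb        # start rejecting write requests once this much memory is used
```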
In addition, Redis provides a set of parameters for different data types to control memory usage. We analyzed in detail above that the value of a Redis hash is internally a HashMap; if that map has few members, it is stored in a compact, one-dimensional linear format, saving the memory overhead of large numbers of pointers. This behavior is controlled by the following two items in the redis.conf configuration file:
hash-max-zipmap-entries 64
hash-max-zipmap-value 512
hash-max-zipmap-entries means that when the value map contains no more than this many members, it is stored in the linear compact format. The default is 64: if the value contains fewer than 64 members, linear compact storage is used; once this threshold is exceeded, it is automatically converted to a real HashMap.
hash-max-zipmap-value means that when the length of each member value in the map does not exceed this many bytes, linear compact storage is used to save space.
If either of the two thresholds above is exceeded, the hash is converted into a real HashMap and memory is no longer saved. Is a larger value therefore always better? Of course not: the advantage of a HashMap is that lookups and updates are O(1), while the compact format gives up hashing, so its operations are O(n). When the number of members is small the impact is negligible; otherwise performance suffers seriously. So this value has to be weighed carefully, and the choice is fundamentally a balance between time cost and space cost.
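The conversion can be observed with OBJECT ENCODING. The key name is hypothetical, and the reported encoding names differ across versions (zipmap in the versions discussed here, ziplist or listpack later):

```
HSET user:1 name "Alice"
OBJECT ENCODING user:1      # -> "zipmap"     (compact linear storage)
# ... after adding more than hash-max-zipmap-entries fields,
#     or a value longer than hash-max-zipmap-value bytes ...
OBJECT ENCODING user:1      # -> "hashtable"  (converted to a real HashMap)
```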
Similar parameters include:
list-max-ziplist-entries 512
Note: when a list has no more than this many nodes, the compact, pointer-free storage format is used.
list-max-ziplist-value 64
Note: when each node value of a list is no longer than this many bytes, the compact storage format is used.
set-max-intset-entries 512
Note: when all members of a set are numeric and the set has no more than this many members, the compact (intset) storage format is used.
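A quick check of the intset behaviour, with a hypothetical key; encoding names may vary by version:

```
SADD ids 1
SADD ids 2
SADD ids 3
OBJECT ENCODING ids     # -> "intset"     (all members numeric, count under the limit)
SADD ids "abc"
OBJECT ENCODING ids     # -> "hashtable"  (a non-numeric member forces the conversion)
```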
The last thing worth mentioning is that Redis's internal implementation does not do much optimization of memory allocation, so a certain amount of memory fragmentation will occur, though in most cases this does not become Redis's performance bottleneck. However, if most of the data you store is numeric, Redis uses shared integer objects internally to save memory allocation overhead: at system startup it pre-allocates a pool of small integer objects (0 through n-1), and if a stored value happens to fall within that range, the object is taken directly from the pool and shared via reference counting. When the system stores a large number of numeric values, this saves memory and improves performance to some extent. The pool size n is set by the macro REDIS_SHARED_INTEGERS in the source code; the default is 10000, and you can change it to suit your needs and recompile.
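The effect of the shared integer pool can be seen with OBJECT REFCOUNT; key names are hypothetical and the exact refcount reported depends on the Redis version:

```
SET a 100
SET b 100
OBJECT REFCOUNT a     # -> greater than 1: both keys reference the shared integer object
SET c "some text"
OBJECT REFCOUNT c     # -> 1: ordinary strings are not shared
```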
Redis persistence Mechanism
Redis supports rich in-memory data structures, and persisting these complex in-memory types to disk is not easy, so Redis's persistence methods differ in many ways from those of traditional databases. Redis supports four persistence methods:
- Timed snapshot method (snapshot)
- Statement-based file appending method (AOF)
- Virtual Memory (VM)
- Diskstore Method
In terms of design, the first two methods assume all data resides in memory, i.e. they provide disk persistence for small data volumes, while the last two are the author's attempts to store datasets larger than physical memory, i.e. big-data storage. As of this writing, the last two persistence methods are still experimental, and the VM method has essentially been abandoned by the author, so only the first two can be used in production. In other words, Redis is currently only suitable as storage for data volumes that fit entirely in memory; massive data storage is not what Redis is good at. The persistence methods are described below:
Scheduled snapshot method (snapshot):
This persistence method is actually a timer event inside Redis: at a fixed interval it checks whether the number of changes to the data and the elapsed time meet the configured trigger conditions, and if so it creates a child process via the operating system's fork call. The child process initially shares the same address space as the parent, so it can traverse the entire memory and write it out to disk while the main process continues to serve requests. When a write occurs, the operating system's copy-on-write mechanism, which works at memory-page granularity, ensures the parent and child processes do not affect each other.
The main disadvantage of this method is that a scheduled snapshot only captures the memory image at a point in time, so all data changed between the last snapshot and a restart is lost.
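The trigger conditions mentioned above are configured with save directives in redis.conf; the lines below are the commonly shipped defaults and are only illustrative:

```
# redis.conf
save 900 1          # snapshot if at least 1 key changed within 900 seconds
save 300 10         # snapshot if at least 10 keys changed within 300 seconds
save 60 10000       # snapshot if at least 10000 keys changed within 60 seconds
dbfilename dump.rdb
```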
Statement-based append (AOF):
The AOF method is similar to MySQL's statement-based binlog: every command that changes Redis's in-memory data is appended to a log file, so that log file is Redis's persistent data.
The main disadvantage of AOF is that the append-only log file can grow very large. When the system restarts and restores data from the AOF, loading is very slow; dozens of GB of data may take hours to load. This is not because reading the file from disk is slow, but because every command read from the log has to be re-executed in memory. In addition, because every command is written to the log, the read and write performance of Redis also decreases in AOF mode.
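A minimal redis.conf sketch for enabling AOF; appendfsync controls the durability/performance trade-off described above:

```
# redis.conf
appendonly yes            # enable AOF persistence
appendfsync everysec      # fsync once per second: a common safety/performance compromise
# appendfsync always      # fsync after every write: safest, slowest
# appendfsync no          # let the OS decide when to flush: fastest, least safe
```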
Virtual Memory mode:
Virtual memory mode was Redis's strategy for swapping data in and out in user space. Its implementation performed poorly; the main problems are complex code, slow restarts, and slow replication, so it has been abandoned by the author.
Diskstore mode:
Diskstore is the new implementation the author chose after giving up on virtual memory, i.e. the traditional B-tree approach. It is still experimental, so we will have to wait and see whether it becomes usable in the future.
Redis persistent disk I/O mode and its problems
People with experience operating Redis in production will find that when Redis uses a large amount of physical memory, it can become unstable or even crash even without exceeding the total physical memory capacity. Some believe that the fork system call used by snapshot persistence doubles memory usage; that is inaccurate, because fork's copy-on-write mechanism works at the granularity of operating-system pages, meaning only dirty pages that are written get copied, and in general your system will not write to every page in a short period and force all of them to be copied. So what does cause Redis to crash?
The answer is that Redis uses buffered I/O for persistence. Buffered I/O means Redis reads and writes its persistence files through the physical memory's page cache, whereas most database systems use direct I/O to bypass the page cache and maintain their own data cache. When Redis's persistence files are large (especially the snapshot file), reading and writing them causes the operating system to load the file's data into the page cache, and that layer of cache duplicates the data Redis already manages in memory. Although the kernel evicts page cache when physical memory runs short, it may decide some page cache is more important and push your process into swap instead, at which point your system becomes unstable or crashes. Our experience is that once Redis's physical memory usage exceeds 3/5 of total memory, things start to get dangerous.
(Figure: Redis memory layout after reading or writing the snapshot file dump.rdb.)
Summary:
- Select an appropriate data type based on business needs, and set corresponding compact storage parameters for different application scenarios.
- When data persistence is not required in business scenarios, disabling all persistence methods can achieve optimal performance and maximum memory usage.
- If persistence is required, choose between snapshot mode and statement-append (AOF) mode based on whether losing some data across a restart can be tolerated; do not use the virtual memory or diskstore modes.
- Do not let Redis's physical memory usage on its host exceed 3/5 of the total actual memory.