Redis Data Type

Source: Internet
Author: User
Common Redis Data Types

Redis has the following data types:

  • String
  • Hash
  • List
  • Set
  • Sorted set

Before describing these data types, let's look at how Redis's internal memory management represents them.

First, Redis uses a redisObject to represent every key and value. The most important fields of redisObject are type and encoding: type is the specific data type of the value object, and encoding is how that data type is stored inside Redis.

For example, type = string means the value is stored as an ordinary string or byte sequence, and the corresponding encoding can be raw or int. An encoding of int means the Redis instance actually stores and represents the string as a numeric type. Of course, this requires that the string itself can be represented as a number, for example "123" or "456".

Note the vm field of redisObject. This field is only allocated when the Redis virtual memory function is enabled. The function is disabled by default and is described later.

Representing every key and value with a redisObject does waste some memory. This memory management cost is paid mainly to give the different Redis data types a unified management interface. The author of Redis also provides a variety of ways to help us minimize memory usage, which we will discuss in detail later.
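
The relationship between type and encoding can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: RedisObjectSketch and make_string_object are hypothetical names, and the real redisObject is a C struct whose int encoding is additionally limited by the range of a machine integer.

```python
from dataclasses import dataclass


@dataclass
class RedisObjectSketch:
    """Illustrative stand-in for redisObject; not the real C struct."""
    type: str        # logical data type, for example "string"
    encoding: str    # internal storage form, for example "raw" or "int"
    value: object


def make_string_object(text: str) -> RedisObjectSketch:
    """Choose the int encoding only when the text is exactly an integer."""
    try:
        number = int(text)
    except ValueError:
        return RedisObjectSketch("string", "raw", text)
    # Forms such as "007" or "+5" would not survive a round trip, so keep them raw.
    if str(number) != text:
        return RedisObjectSketch("string", "raw", text)
    return RedisObjectSketch("string", "int", number)
```

With this sketch, make_string_object("123") is stored with the int encoding, while make_string_object("hello") stays raw.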

 

 

Next we will analyze the usage and internal implementation of these five data types one by one:

  • String

    Common commands:

    SET, GET, DECR, INCR, MGET, etc.

    Application scenarios:

    String is the most common data type. Ordinary key/value storage falls into this category, so we will not explain it further here.

    Implementation Method:

    A string is stored in Redis as a string by default and is referenced by a redisObject. When INCR, DECR, or a similar operation is performed on it, it is converted to a numeric type for the calculation. In that case the encoding field of the redisObject is int.
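
The conversion described above can be sketched without a Redis server. StringStoreSketch and incr_by are hypothetical names for a toy in-memory store; the sketch only mirrors the idea that the stored text is parsed as a number, updated, and stored again.

```python
class StringStoreSketch:
    """Toy key/value store; values are kept as text, like Redis strings."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = str(value)

    def get(self, key):
        return self._data.get(key)

    def mget(self, *keys):
        return [self._data.get(key) for key in keys]

    def incr_by(self, key, delta=1):
        """Parse the stored text as an integer, add delta, store the result."""
        current = self._data.get(key, "0")  # a missing key is treated as 0
        try:
            number = int(current)
        except ValueError:
            raise ValueError("value is not an integer") from None
        number += delta
        self._data[key] = str(number)
        return number
```

A negative delta plays the role of DECR, and a value that cannot be parsed as an integer is rejected instead of being silently overwritten.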

  • Hash

    Common commands:

    HGET, HSET, HGETALL, etc.

    Application scenarios:

    Let's use an example to describe the hash application scenario. Suppose we want to store a user information object. The user ID is the lookup key, and the stored user object contains the name, age, birthday, and other information. With an ordinary key/value structure there are two main ways to store it:

    The first method uses the user ID as the lookup key and packs the other information into an object that is stored in serialized form. The disadvantage of this method is that it adds serialization/deserialization overhead, the entire object has to be retrieved whenever one piece of information needs to be modified, and the modification has to be protected against concurrent access, which introduces complex problems such as CAS.

    The second method stores each member of the user information object as its own key-value pair, using the user ID plus the attribute name as the unique identifier for the attribute's value. Although this avoids the serialization overhead and the concurrency problems, the user ID is stored repeatedly. If there is a large amount of such data, the memory waste is considerable.

    The hash provided by Redis solves this problem very well. A Redis hash is a value that is internally a hashmap, and Redis provides interfaces for accessing the members of this map directly.

    That is to say, the key is still the user ID, the value is a map, the key of the map is the attribute name of the member, and the value is the attribute value. In this way, data can be modified and accessed directly through the key of the internal map (in Redis the key of the internal map is called a field). That is, key (user ID) + field (attribute name) is enough to operate on the corresponding attribute data. Data does not need to be stored repeatedly, and neither serialization nor concurrent modification control is involved, so the problem is solved.

    At the same time, note that Redis provides an interface (HGETALL) to retrieve all attribute data at once. However, if the internal map has many members, this involves traversing the entire internal map. Because of the Redis single-threaded model, this traversal may be time-consuming, and requests from other clients get no response in the meantime. This requires special attention.
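
The three layouts discussed above can be compared side by side with plain Python dictionaries. This is a sketch only: the key names such as "user:1" and the sample values are made up for illustration, and the helper functions merely imitate the shape of HSET, HGET, and HGETALL.

```python
import json

# Method 1: one key per user, the whole object stored in serialized form.
store_serialized = {"user:1": json.dumps({"name": "Tom", "age": 28})}


def set_age_serialized(store, key, age):
    user = json.loads(store[key])   # the whole object is read and decoded
    user["age"] = age
    store[key] = json.dumps(user)   # and the whole object is written back


# Method 2: one key per attribute; the user ID is repeated in every key.
store_flat = {"user:1:name": "Tom", "user:1:age": "28"}

# Hash layout: key -> {field: value}; a single field can be changed on its own.
store_hash = {"user:1": {"name": "Tom", "age": "28"}}


def hset(store, key, field, value):
    store.setdefault(key, {})[field] = value


def hget(store, key, field):
    return store.get(key, {}).get(field)


def hgetall(store, key):
    # Copies every field, so the cost grows with the number of fields.
    return dict(store.get(key, {}))
```

Changing the age in the serialized layout touches the whole object, while hset(store_hash, "user:1", "age", "29") touches one field only.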

    Implementation Method:

    As mentioned above, the value of a Redis hash is internally a hashmap, and there are two different implementations. When the hash has few members, Redis saves memory by using a compact storage layout similar to a one-dimensional array instead of a real hashmap structure, and the encoding of the value's redisObject is zipmap. When the number of members grows, the value is automatically converted to a real hashmap, and the encoding becomes ht.

  • List

    Common commands:

    LPUSH, RPUSH, LPOP, RPOP, LRANGE, etc.

    Application scenarios:

    The Redis list has many application scenarios and is one of the most important Redis data structures. For example, Twitter's following list and follower list can both be implemented with the Redis list structure. This is easy to understand, so we will not go into detail here.

    Implementation Method:

    The Redis list is implemented as a doubly linked list, which supports reverse lookup and traversal and makes operations more convenient, at the cost of some additional memory overhead. Many internal parts of Redis also use this data structure, including the send buffer queue.
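
The list commands named above can be imitated with Python's collections.deque, which supports cheap pushes and pops at both ends. This is a sketch under assumptions: ListSketch is a hypothetical class, and deque is used as a stand-in for the doubly linked list rather than a copy of the Redis implementation.

```python
from collections import deque


class ListSketch:
    """Toy list with push/pop at both ends, in the spirit of a Redis list."""

    def __init__(self):
        self._items = deque()

    def lpush(self, *values):
        for value in values:
            self._items.appendleft(value)  # each value becomes the new head
        return len(self._items)

    def rpush(self, *values):
        self._items.extend(values)
        return len(self._items)

    def lpop(self):
        return self._items.popleft() if self._items else None

    def rpop(self):
        return self._items.pop() if self._items else None

    def lrange(self, start, stop):
        """Items from start to stop, both inclusive; negatives count from the tail."""
        n = len(self._items)
        if start < 0:
            start = max(n + start, 0)
        if stop < 0:
            stop = n + stop
        stop = min(stop, n - 1)
        if start > stop:
            return []
        return list(self._items)[start:stop + 1]
```

Calling lrange(0, -1) returns the whole list, which is the usual way to read a complete following or follower list.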

  • Set

    Common commands:

    SADD, SPOP, SMEMBERS, SUNION, etc.

    Application scenarios:

    The Redis set offers functionality similar to the list. Its special feature is that a set removes duplicates automatically. When you need to store a list of data and do not want duplicates, a set is a good choice. A set also provides an important interface for checking whether a member is in the set, which a list does not provide.

    Implementation Method:

    Internally, a set is a hashmap whose values are always null. In effect, duplicates are removed quickly by computing hashes, which is also why a set can check whether a member belongs to it.
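
The "hashmap whose values are always null" idea translates directly into a Python dict whose values are all None. SetSketch is a hypothetical class written for illustration; the method names follow the Redis commands SADD, SISMEMBER, SMEMBERS, and SUNION.

```python
class SetSketch:
    """Toy set stored as a dict whose values are always None."""

    def __init__(self):
        self._members = {}  # member -> None

    def sadd(self, *members):
        """Add members and return how many were not already present."""
        added = 0
        for member in members:
            if member not in self._members:
                self._members[member] = None
                added += 1
        return added

    def sismember(self, member):
        # A single hash lookup answers the membership question.
        return member in self._members

    def smembers(self):
        return set(self._members)

    def sunion(self, other):
        return self.smembers() | other.smembers()
```

Adding the same member twice leaves one copy, because the second insert hits an existing key in the underlying map.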

  • Sorted set

    Common commands:

    ZADD, ZRANGE, ZREM, ZCARD, etc.

    Application scenarios:

    The usage scenario of the Redis sorted set is similar to that of the set. The difference is that a set is not ordered automatically, while a sorted set orders its members by an additional priority (score) parameter supplied by the user and keeps them ordered on insertion, that is, it sorts automatically. When you need an ordered list without duplicates, choose the sorted set data structure. For example, Twitter's public timeline can be stored with the posting time as the score, so that reads come back sorted by time automatically.

    Implementation Method:

    Internally, the Redis sorted set uses a hashmap and a skip list to store the data and keep it ordered. The hashmap maps members to scores, while the skip list stores all the members, sorted by the scores saved in the hashmap. The skip list structure gives high lookup efficiency and is relatively simple to implement.
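
The pairing of a member-to-score map with an ordered structure can be sketched as follows. Note the substitution: instead of a skip list, this sketch keeps a sorted Python list maintained with the bisect module, which preserves the ordering behaviour but not the skip list's insertion cost. SortedSetSketch is a hypothetical class.

```python
import bisect


class SortedSetSketch:
    """Toy sorted set: a dict for scores plus a sorted list for ordering."""

    def __init__(self):
        self._scores = {}   # member -> score (the hashmap part)
        self._ordered = []  # sorted (score, member) pairs (stand-in for the skip list)

    def _remove_pair(self, score, member):
        index = bisect.bisect_left(self._ordered, (score, member))
        del self._ordered[index]

    def zadd(self, score, member):
        """Insert or re-score a member; return True if the member is new."""
        old = self._scores.get(member)
        if old is not None:
            self._remove_pair(old, member)
        self._scores[member] = score
        bisect.insort(self._ordered, (score, member))
        return old is None

    def zrem(self, member):
        score = self._scores.pop(member, None)
        if score is None:
            return False
        self._remove_pair(score, member)
        return True

    def zcard(self):
        return len(self._scores)

    def zrange(self, start, stop):
        """Members from rank start to rank stop, both inclusive."""
        n = len(self._ordered)
        if start < 0:
            start = max(n + start, 0)
        if stop < 0:
            stop = n + stop
        stop = min(stop, n - 1)
        if start > stop:
            return []
        return [member for _, member in self._ordered[start:stop + 1]]
```

For a timeline, the score would be the posting time, so zrange returns posts in time order without any sorting at read time.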

Common memory optimization methods and parameters

From the implementation analysis above, we can see that Redis has a rather high memory management cost, that is, it uses a lot of memory. The author knows this very well and provides a series of parameters and techniques to control and save memory. Let's discuss them one by one.

First, do not enable the VM option of Redis, that is, the virtual memory function. It was originally intended as a persistence policy that swaps data between memory and disk so that Redis can store more data than fits in physical memory, but its memory management cost is also very high. We will analyze this persistence policy later. To make sure the VM function is disabled, check that vm-enabled is set to no in your redis.conf file.

Next, it is best to set the maxmemory option in redis.conf. This option tells Redis to start rejecting subsequent write requests once it has used the given amount of physical memory. It effectively protects Redis from swapping caused by excessive physical memory usage, which seriously hurts performance and can even lead to a crash.

In addition, Redis provides a set of parameters for the different data types to control memory usage. As analyzed above, the value of a Redis hash is internally a hashmap. If the map has few members, it is stored in a compact format similar to a one-dimensional linear layout, which saves the memory overhead of a large number of pointers. This behavior is controlled by the following two items in the redis.conf configuration file:

hash-max-zipmap-entries 64
hash-max-zipmap-value 512

hash-max-zipmap-entries means that when the value map has no more than this many members, it is stored in the linear compact format. The default value is 64: a value with no more than 64 members uses linear compact storage, and once this number is exceeded it is automatically converted to a real hashmap.

hash-max-zipmap-value means that linear compact storage is used only while the length of each member value in the map is no more than this many bytes.

If either of the two limits is exceeded, the value is converted to a real hashmap and no longer saves memory. So is a larger setting always better? The answer is no. The advantage of a hashmap is that lookups and updates take O(1) time, whereas giving up the hash for the compact format makes them O(n). If the number of members is small, the impact is minor; otherwise performance suffers seriously. The setting therefore has to be weighed carefully, and it comes down to the most fundamental trade-off between time cost and space cost.
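
The trade-off between the compact layout and a real hashmap can be made concrete with a small sketch. CompactHashSketch is a hypothetical class: it keeps fields and values in one flat list with an O(n) scan while both limits hold, then converts to a dict. The constructor defaults mirror the two redis.conf values quoted above, but the flat list is only an approximation of the real zipmap layout.

```python
class CompactHashSketch:
    """Hash value that starts as a flat list and upgrades to a dict."""

    def __init__(self, max_entries=64, max_value_len=512):
        self.max_entries = max_entries
        self.max_value_len = max_value_len
        self.encoding = "zipmap"
        self._flat = []     # [field, value, field, value, ...]
        self._table = None  # used once the encoding becomes "ht"

    def hset(self, field, value):
        if self.encoding == "ht":
            self._table[field] = value
            return
        # Linear scan for an existing field: O(n) in the number of members.
        for i in range(0, len(self._flat), 2):
            if self._flat[i] == field:
                self._flat[i + 1] = value
                break
        else:
            self._flat.extend([field, value])
        too_many = len(self._flat) // 2 > self.max_entries
        too_long = len(value.encode("utf-8")) > self.max_value_len
        if too_many or too_long:
            self._convert()

    def hget(self, field):
        if self.encoding == "ht":
            return self._table.get(field)  # O(1) on average
        for i in range(0, len(self._flat), 2):
            if self._flat[i] == field:
                return self._flat[i + 1]
        return None

    def _convert(self):
        """Move all pairs into a dict; the compact form is not restored later."""
        self._table = dict(zip(self._flat[0::2], self._flat[1::2]))
        self._flat = []
        self.encoding = "ht"
```

Raising either limit keeps more hashes in the compact form and saves memory, but every hget and hset on those hashes then pays for the linear scan.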

Similar parameters include:

list-max-ziplist-entries 512

Note: a list with no more than this many nodes is stored in a compact format with the pointers removed.

list-max-ziplist-value 64

Note: a list whose node values are each no longer than this many bytes is stored in the compact format.

set-max-intset-entries 512

Note: a set whose members are all numeric and which has no more than this many members is stored in a compact format.

The last thing to mention is that the internal implementation of Redis does not do much to optimize memory allocation, so some memory fragmentation will occur. In most cases this does not become a performance bottleneck for Redis.

However, if most of the data stored in Redis is numeric, Redis uses shared integers internally to save the memory allocation overhead. When the system starts, it allocates n numeric objects and places them in a pool. If a stored value falls within the range of the pool, the object is taken directly from the pool and shared through reference counting. When a large number of numeric values are stored, this saves memory and improves performance to some extent. To change n, you need to modify the macro REDIS_SHARED_INTEGERS in the source code, whose default value is 10000. You can change it to suit your needs and then recompile.

Redis Persistence Mechanism

Redis supports a rich set of in-memory data structures, and persisting these complex in-memory structures to disk is difficult. As a result, Redis persistence differs from that of traditional databases in many ways. Redis supports four persistence methods:

  • Timed snapshot method (snapshot)
  • Statement-based file appending method (AOF)
  • Virtual Memory (VM)
  • Diskstore Method

In terms of design, the first two methods assume that all data is in memory, that is, they provide a way to flush small data volumes to disk. The last two are the author's attempts to store more data than fits in physical memory, that is, large-scale data storage. At the time of writing, the last two persistence methods are still experimental, and the VM method has essentially been abandoned by the author, so only the first two can be used in production. In other words, Redis can only be used to store small data volumes (where all data fits in memory); massive data storage is not what Redis is good at. The persistence methods are described below:

Scheduled snapshot method (snapshot):

This persistence method is actually a timer event in Redis. At fixed intervals it checks whether the number of changes to the data and the elapsed time satisfy the configured persistence trigger conditions. If they do, a child process is created through the operating system's fork call. By default the child process shares the same address space as the parent process, so the child can traverse the whole of memory and write it to storage while the main process continues to serve requests. When a write occurs, the operating system performs copy-on-write at memory-page granularity, which ensures that the parent and child processes do not affect each other.

The main disadvantage of this method is that a scheduled snapshot only represents the memory image as of the moment it was taken, so after a restart the system loses all data written between the last snapshot and the restart.
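
The trigger check performed by the timer event can be sketched as a small predicate. The function name should_snapshot and the rule format of (seconds, minimum changes) pairs are assumptions made for illustration, and the values in SAVE_POINTS are sample thresholds, not settings taken from this article.

```python
def should_snapshot(changes, seconds_since_last_save, save_points):
    """Return True if any (seconds, min_changes) rule is satisfied."""
    return any(
        seconds_since_last_save >= seconds and changes >= min_changes
        for seconds, min_changes in save_points
    )


# Illustrative rules: (elapsed seconds, minimum number of changes).
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]
```

A rule fires only when both of its parts hold, so a busy instance snapshots sooner, while an idle instance with no changes never snapshots at all.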

Statement-based append (AOF):

The AOF method is similar to MySQL's statement-based binlog. Every command that changes data in Redis memory is appended to a log file, and this log file is the persistent data of Redis.

The main disadvantage of AOF is that the append-only log file can grow very large. When the system restarts and restores data from the AOF, loading is very slow: loading dozens of GB of data may take several hours. This is not because reading the file from disk is slow, but because every command that is read has to be executed again in memory. In addition, because every command has to be logged, the read and write performance of Redis also drops when AOF is used.
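
Statement-based appending and replay can be sketched with an in-memory text buffer standing in for the log file. AofStoreSketch is a hypothetical class, only SET and DEL are modelled, and the JSON-lines log format is an assumption chosen for readability; it is not the format Redis writes.

```python
import io
import json


class AofStoreSketch:
    """Toy store that appends every data-changing command to a log."""

    def __init__(self, log):
        self.log = log   # any writable text file object
        self.data = {}

    def _apply(self, command, key, value=None):
        if command == "SET":
            self.data[key] = value
        elif command == "DEL":
            self.data.pop(key, None)
        else:
            raise ValueError(f"unknown command: {command}")

    def execute(self, command, key, value=None):
        """Apply the command in memory, then append it to the log."""
        self._apply(command, key, value)
        self.log.write(json.dumps([command, key, value]) + "\n")

    @classmethod
    def replay(cls, log_text):
        """Rebuild the data by re-executing every logged command in order."""
        store = cls(io.StringIO())
        for line in log_text.splitlines():
            command, key, value = json.loads(line)
            store._apply(command, key, value)
        return store
```

The log keeps a line even for a key that was later deleted, which is why the file grows beyond the size of the data and why recovery has to execute every command again.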

Virtual memory mode:

The virtual memory mode is a Redis policy for swapping data in and out in user space. Its implementation works poorly in practice: the main problems are complex code, slow restarts, and slow replication. It has been abandoned by the author.

Diskstore mode:

The diskstore method is a new implementation that the author chose after giving up on virtual memory, and it uses the traditional B-tree approach. It is still experimental, and we will have to wait and see whether it becomes usable in the future.

Redis Persistence Disk I/O Mode and Problems

People with experience operating Redis in production will find that when Redis uses a lot of physical memory, it becomes unstable or even crashes even though it has not exceeded the total physical memory capacity. Some people believe that the fork system call used by snapshot persistence doubles the memory usage. This is inaccurate: the copy-on-write mechanism behind fork works at the level of operating system pages, so only dirty pages that have been written to are copied, and in general the system does not write to all pages within a short time and trigger copying of everything. So what causes Redis to crash?

The answer is that Redis uses buffered I/O for persistence. Buffered I/O means that Redis reads and writes its persistence files through the page cache in physical memory, whereas most database systems use direct I/O to bypass the page cache and maintain a data cache of their own. When a Redis persistence file is very large (especially the snapshot file), reading or writing it causes the operating system to load the file's data into physical memory as a cache for the file, and the data in this cache duplicates the data that Redis already manages in memory. The kernel does evict page cache when physical memory runs short, but it may also decide that some page cache is more important and make your process swap instead, at which point the system becomes unstable or crashes. Our experience is that it starts to become dangerous once the physical memory used by Redis exceeds 3/5 of the total memory capacity.
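
The 3/5 rule of thumb is simple arithmetic and can be written as a check. The function name memory_is_risky is hypothetical, and the threshold is the empirical figure from this article rather than a limit enforced by Redis.

```python
from fractions import Fraction

DANGER_RATIO = Fraction(3, 5)  # rule of thumb quoted above


def memory_is_risky(redis_used_bytes, total_physical_bytes, ratio=DANGER_RATIO):
    """Return True when Redis uses more than the given share of physical memory."""
    if total_physical_bytes <= 0:
        raise ValueError("total_physical_bytes must be positive")
    return Fraction(redis_used_bytes, total_physical_bytes) > ratio
```

On a host with 16 GiB of memory, for example, 9 GiB used by Redis is 9/16 of the total and stays below the threshold, while 10 GiB is 10/16 and exceeds it.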

Figure: Redis memory data after reading or writing the snapshot file dump.rdb.

Summary:
  1. Select an appropriate data type based on business needs, and set the corresponding compact storage parameters for each application scenario.
  2. When the business scenario does not require persistence, disabling all persistence methods gives the best performance and the highest memory utilization.
  3. If persistence is required, choose between snapshot mode and statement append (AOF) mode depending on whether losing some data on restart can be tolerated. Do not use virtual memory or diskstore mode.
  4. Do not let Redis use more than 3/5 of the total physical memory of its host.