Design and Implementation of Redis


I like how this book was created: it is hosted on Git and written in an open-source manner.
The author has read through the Redis source code and shared a detailed annotated copy of it, which makes Redis much easier for me to learn.
Reading excellent source code is a quick way to improve your coding skills, and a small, finely crafted code base such as Redis (a little over 20,000 lines) is of course not to be missed.
If you are interested, enjoy it.
Annotated source code: https://github.com/huangz1990/annotated_redis_source

The following are my reading notes on this book.

Internal string implementation of Redis

Redis uses its own SDS (simple dynamic string) type to represent strings.
Reason: SDS supports efficient append and length operations, and it is binary safe.
In Redis, appending to a string and computing its length are common operations, and the APPEND and STRLEN commands map directly onto them, so these two simple operations should not become a performance bottleneck. Besides C strings, Redis also has to handle raw byte arrays and the server protocol, so for convenience its string representation must also be binary safe:
the program should make no assumptions about the data stored in a string. The data can be a C string terminated by \0, a raw byte array, or data in any other format.
For more information about SDS, see:
http://origin.redisbook.com/en/latest/internal-datastruct/sds.html#sds
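
The idea can be illustrated with a small Python model. This is only a sketch, not the Redis C code: the class name SDS, its methods, and the "double the new length" growth policy are assumptions made for illustration. It keeps the length and the spare capacity next to the buffer, so length is O(1), append rarely reallocates, and embedded \0 bytes are harmless.

```python
class SDS:
    """Toy model of an SDS-style string (hypothetical names, not Redis code)."""

    def __init__(self, data: bytes = b""):
        self.buf = bytearray(data)  # raw bytes; may contain b"\x00"
        self.len = len(data)        # number of bytes in use, tracked explicitly
        self.free = 0               # unused bytes at the end of buf

    def length(self) -> int:
        # O(1): read the stored length instead of scanning for a terminator
        return self.len

    def append(self, data: bytes) -> None:
        need = len(data)
        if need > self.free:
            # Assumed growth policy: reserve twice the new length
            new_len = self.len + need
            self.buf.extend(bytes(new_len * 2 - len(self.buf)))
            self.free = len(self.buf) - self.len
        # Copy the new bytes into the spare room at the end
        self.buf[self.len:self.len + need] = data
        self.len += need
        self.free -= need

    def value(self) -> bytes:
        # Only the first `len` bytes are content; the rest is spare capacity
        return bytes(self.buf[:self.len])


s = SDS(b"ab")
s.append(b"\x00cd")   # binary safe: the zero byte is ordinary data
print(s.length(), s.value())
```

A second short append after this one fits into the spare capacity and does not grow the buffer.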

Differences between internal data structures and memory-mapped data structures

Memory-mapped data structures: integer set (intset), compressed list (ziplist)
Internal data structures: simple dynamic string (SDS), doubly linked list, dictionary, and skip list
The internal data structures are powerful, but building a full set of them also consumes a lot of memory. When an object contains only a few elements, or its elements are small, an expensive internal data structure is not the best choice. To address this, Redis replaces internal data structures with memory-mapped data structures when conditions permit. A memory-mapped data structure is a specially encoded sequence of bytes that uses far less memory than an internal data structure with similar functionality, so when used properly it can save users a lot of memory. However, because its encoding and operations are much more complex, a memory-mapped data structure takes up more CPU time than an internal data structure with similar functionality.
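
To make the trade-off concrete, here is a rough Python sketch in the spirit of an integer set. The class IntSet and its layout are assumptions for illustration and do not reproduce the real intset encoding. All values are packed into one byte string at a shared width, which saves memory, but every insert has to decode and re-encode the bytes, which costs CPU time.

```python
import bisect
import struct


class IntSet:
    """Sorted integers packed into one bytes object (illustrative only)."""

    FORMATS = {2: "h", 4: "i", 8: "q"}  # int16, int32, int64

    def __init__(self):
        self.width = 2   # bytes per element, upgraded when a value needs more
        self.data = b""  # the packed, sorted elements

    @staticmethod
    def _width_for(value: int) -> int:
        if -2**15 <= value < 2**15:
            return 2
        if -2**31 <= value < 2**31:
            return 4
        return 8

    def values(self):
        if not self.data:
            return []
        count = len(self.data) // self.width
        fmt = f"<{count}{self.FORMATS[self.width]}"
        return list(struct.unpack(fmt, self.data))

    def add(self, value: int) -> bool:
        items = self.values()                    # decode (CPU cost)
        pos = bisect.bisect_left(items, value)   # binary search
        if pos < len(items) and items[pos] == value:
            return False                         # already present
        items.insert(pos, value)
        # Upgrade the shared width if the new value does not fit
        self.width = max(self.width, self._width_for(value))
        fmt = f"<{len(items)}{self.FORMATS[self.width]}"
        self.data = struct.pack(fmt, *items)     # re-encode (CPU cost)
        return True


s = IntSet()
for n in (5, 1, 5):
    s.add(n)
print(s.values(), len(s.data))
```

Adding a value that does not fit in 16 bits forces every element to be re-encoded at the wider width.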

Set intersection and union

Sets are easy to use, and Redis sets support intersection and union operations, which greatly widens their range of application.
Note, however, that the book gives the complexity of the union algorithm as O(N) and that of the intersection algorithm as O(N²). (The Redis command documentation states the worst case of SINTER as O(N*M), where N is the cardinality of the smallest set and M is the number of sets.) When designing how to store data in sets, try to minimize the use of intersection operations.
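
A minimal Python sketch of why intersection costs more than union. The function names are made up for illustration, and this is one common approach (walk the smallest set and test membership in the others), not a copy of the Redis implementation. Union touches each element once, while intersection checks each candidate against every other set.

```python
def union(sets):
    """Visit every element of every set exactly once."""
    result = set()
    for s in sets:
        result.update(s)
    return result


def intersect(sets):
    """Walk the smallest set and test each member against all other sets."""
    if not sets:
        return set()
    ordered = sorted(sets, key=len)          # smallest set first
    smallest, rest = ordered[0], ordered[1:]
    return {m for m in smallest if all(m in other for other in rest)}


a = {1, 2, 3, 4}
b = {3, 4, 5}
c = {4, 3, 9, 10, 11}
print(union([a, b, c]))
print(intersect([a, b, c]))
```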

ACID properties of transactions

In traditional relational databases, ACID is commonly used to judge how safe a transaction implementation is. Redis transactions guarantee consistency (C) and isolation (I), but not atomicity (A) or durability (D).
A single Redis command executes atomically, but Redis adds no mechanism to keep a whole transaction atomic, so Redis transactions are not atomic. If every command in the transaction queue executes successfully, the transaction succeeds. If, on the other hand, the Redis server process is stopped while the transaction is executing (for example, it receives a KILL signal or the host shuts down), the transaction fails. When a transaction fails, Redis neither retries it nor rolls it back.
Because a transaction merely wraps a group of Redis commands in a queue and provides no persistence of its own, the durability of a transaction is determined by the persistence mode Redis is using.
See: http://origin.redisbook.com/en/latest/feature/transaction.html#acid
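
The lack of rollback can be pictured with a toy Python model. Everything here is hypothetical: the ServerStopped exception stands in for the server process being killed, and the queue holds simple key/value writes instead of real commands. Commands that ran before the interruption keep their effect, and nothing undoes them.

```python
class ServerStopped(Exception):
    """Stands in for the server process being killed mid-transaction."""


def exec_transaction(db, queue, stop_before=None):
    """Apply queued writes one by one; there is no undo log and no retry."""
    for index, (key, value) in enumerate(queue):
        if index == stop_before:
            raise ServerStopped(f"stopped before command {index}")
        db[key] = value  # each queued write takes effect immediately


db = {}
try:
    exec_transaction(db, [("a", 1), ("b", 2)], stop_before=1)
except ServerStopped:
    pass
print(db)  # the first write is still there, the second never ran
```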

Lua script support

Lua scripting is the biggest highlight of Redis 2.6. By embedding a Lua environment, Redis overcomes its long-standing inability to handle CAS (check-and-set) operations efficiently. Scripts can also combine multiple commands, making it easy to implement patterns that used to be difficult or inefficient to implement.

Interaction between Lua scripts and Redis through a pseudo client

Because Redis commands must be executed through a client, the server keeps a pseudo client (fake client) with no network connection in its state, which executes the Redis commands contained in Lua scripts. When a Lua script needs to execute a Redis command, it sends the command request to the server through the pseudo client. After executing the command, the server returns the result to the pseudo client, which then passes it back to the Lua script.
Note: this pseudo client has no network connection. How does it communicate with Redis? Is it all within a single process?

Eliminating nondeterminism in script execution

As with randomness, if a script's execution depends on any side effect, its result may differ from one execution to the next. To solve this problem, Redis strictly limits the scripts that can run in the Lua environment: all scripts must be pure functions without side effects. To this end, Redis takes the following measures in the Lua environment:
• Libraries that access system state (such as a system time library) are not provided.
• The loadfile function is disabled.
• If a script tries to execute a write command (such as SET) after executing a command with random behavior (such as RANDOMKEY) or a command with side effects (such as TIME), Redis stops the script and returns an error.
• If a script executes a read command whose output order is random (such as SMEMBERS), the output is automatically sorted lexicographically before it is returned to the script, so that the result is always ordered.
• Redis replaces the math.random and math.randomseed functions of the math table in the Lua environment with its own random functions. The new functions have this property: unless math.randomseed is called explicitly, math.random generates the same pseudo-random sequence on every script execution.
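
Two of these measures are easy to picture in a few lines of Python. This is only an analogy, not how Redis implements them: run_script, sample_script, and ordered_reply are hypothetical names, and the fixed seed value is an assumption. Each run gets a generator that starts from the same seed, and unordered results are sorted before the script sees them.

```python
import random


def run_script(script, seed=0):
    """Give every run a fresh generator with the same starting seed."""
    rng = random.Random(seed)
    return script(rng)


def sample_script(rng):
    # Without an explicit reseed, every run sees the same sequence
    return [rng.randint(1, 100) for _ in range(3)]


def ordered_reply(members):
    """Sort an unordered result so the script always sees the same order."""
    return sorted(members)


first = run_script(sample_script)
second = run_script(sample_script)
print(first == second)
print(ordered_reply({"pear", "apple", "fig"}))
```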

Key expiration time

With the EXPIRE, PEXPIRE, EXPIREAT, and PEXPIREAT commands, a client can set an expiration time on an existing key. Once the expiration time is reached, the key is no longer available.
When stored keys are used as a cache, we usually want to set an expiration time so that Redis deletes them after they expire.
This normally takes two steps:
SET key value
EXPIRE key seconds
With SETEX, the value and the expiration time are set in a single step:
SETEX key seconds value
I think it would be nice if every command that adds a value to the database had a counterpart that also sets the expiration time. Unfortunately that is not the case: apart from SETEX for strings, other write operations such as SADD for sets have no such one-step command.

Deleting expired keys

If a key has expired, when is it deleted?
There are three possible answers to this question:

  1. Timed deletion: when the key's expiration time is set, a timer event is created. When the expiration time is reached, the event handler deletes the key automatically.
  2. Lazy deletion: the key is left in place after it expires. Every time a key's value is fetched from the dict dictionary, check whether the key has expired. If it has, delete it and return null; otherwise return the value.
  3. Periodic deletion: at intervals, check the expires dictionary and delete the expired keys in it. Periodic deletion is a compromise between the other two policies:
    • It runs the delete operation at intervals and limits its duration and frequency, which reduces the impact of deletion on CPU time.
    • By deleting expired keys periodically, it also reduces the memory wasted by lazy deletion.

Timed deletion and lazy deletion each have an obvious drawback when used alone: timed deletion takes too much CPU time, and lazy deletion wastes too much memory.
The expired-key deletion policy that Redis uses is lazy deletion plus periodic deletion. Together, the two policies strike a good balance between reasonable use of CPU time and saving memory.
Reference: http://origin.redisbook.com/en/latest/internal/db.html#id20
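
The combination of lazy and periodic deletion can be sketched as follows. This is a simplified Python model under stated assumptions: the class ExpiringDB, its method names, and the max_checks limit are invented for illustration, and the periodic pass simply looks at the first few keys of the expires dictionary rather than sampling them the way a real server might.

```python
import time


class ExpiringDB:
    """Toy key space with a separate expires dictionary (illustrative)."""

    def __init__(self, clock=time.time):
        self.data = {}      # key -> value
        self.expires = {}   # key -> absolute expiration time
        self.clock = clock  # injectable so the example can control time

    def setex(self, key, seconds, value):
        self.data[key] = value
        self.expires[key] = self.clock() + seconds

    def _expired(self, key):
        deadline = self.expires.get(key)
        return deadline is not None and deadline <= self.clock()

    def _delete(self, key):
        self.data.pop(key, None)
        self.expires.pop(key, None)

    def get(self, key):
        # Lazy deletion: check expiration only when the key is accessed
        if self._expired(key):
            self._delete(key)
            return None
        return self.data.get(key)

    def periodic_cleanup(self, max_checks=20):
        # Periodic deletion: bounded work per call to limit CPU usage
        removed = 0
        for key in list(self.expires)[:max_checks]:
            if self._expired(key):
                self._delete(key)
                removed += 1
        return removed


now = [0.0]
db = ExpiringDB(clock=lambda: now[0])
db.setex("a", 10, "1")
db.setex("b", 5, "2")
now[0] = 6.0
print(db.get("b"), db.get("a"))   # "b" is removed lazily on access
now[0] = 11.0
print(db.periodic_cleanup())      # "a" is removed by the periodic pass
```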

RDB persistence

The rdbSave function saves the in-memory database to disk in RDB format. If an RDB file already exists, the new file replaces it. While the RDB file is being saved, the calling process is blocked until the save completes. Both the SAVE and BGSAVE commands call rdbSave, but in different ways:
• SAVE calls rdbSave directly, blocking the Redis main process until the save completes. While the main process is blocked, the server cannot handle any client requests.
• BGSAVE forks a child process. The child calls rdbSave and, once the save is done, sends a signal to the main process to report completion. Because rdbSave runs in the child process, the Redis server can keep handling client requests while BGSAVE is executing.

SAVE, BGSAVE, AOF writes, and BGREWRITEAOF

While SAVE is executing, the Redis server is blocked, so a new SAVE, BGSAVE, or BGREWRITEAOF call has no effect during that time. It is processed only after the running SAVE completes and Redis starts accepting requests again. However, because AOF writes are done by a background thread and BGREWRITEAOF is done by a child process, AOF writes and a running BGREWRITEAOF can proceed while SAVE executes.

Before running SAVE, the server checks whether BGSAVE is executing. If it is, the server does not call rdbSave and instead returns an error to the client, saying that SAVE cannot run while BGSAVE is executing. This prevents the two rdbSave calls made by SAVE and BGSAVE from interleaving and causing a race condition. Similarly, while BGSAVE is executing, a client that issues a new BGSAVE receives an error saying that BGSAVE is already in progress.
BGREWRITEAOF and BGSAVE cannot execute at the same time:
• If BGSAVE is executing, a BGREWRITEAOF request is postponed until BGSAVE finishes, and the client that issued BGREWRITEAOF receives a reply saying the request has been delayed.
• If BGREWRITEAOF is executing, a client that calls BGSAVE receives an error saying that the two commands cannot execute at the same time. BGREWRITEAOF and BGSAVE do not actually conflict with each other. Forbidding them from running together is purely a performance consideration: forking two child processes that both perform heavy disk writes at the same time is not a good idea.

In summary:
• rdbSave saves the database to an RDB file and blocks its caller until the save completes.
• SAVE calls rdbSave directly and blocks the Redis main process. BGSAVE calls rdbSave from a child process, so the main process can keep handling command requests.
• While SAVE is executing, AOF writes can proceed in a background thread and BGREWRITEAOF can proceed in a child process, so the three operations can run at the same time.
• To avoid race conditions, SAVE cannot be executed while BGSAVE is executing.
• To avoid performance problems, BGSAVE and BGREWRITEAOF cannot execute at the same time.
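
The rules above amount to a small decision table, sketched here in Python. The function decide and its return values "run", "error", and "delayed" are hypothetical labels, not Redis replies. The case of BGREWRITEAOF arriving while another rewrite is already running is not covered by these notes, so the error returned for it is an assumption.

```python
def decide(command, bgsave_running=False, rewrite_running=False):
    """Return what happens to a persistence command given what is running."""
    if command == "SAVE":
        # SAVE is refused while BGSAVE runs, to avoid two rdbSave calls racing
        return "error" if bgsave_running else "run"
    if command == "BGSAVE":
        # Refused if either background job is already in progress
        return "error" if (bgsave_running or rewrite_running) else "run"
    if command == "BGREWRITEAOF":
        if rewrite_running:
            return "error"  # assumption: not covered by the notes
        # Postponed until the running BGSAVE finishes
        return "delayed" if bgsave_running else "run"
    raise ValueError(f"not a persistence command: {command}")


print(decide("SAVE", bgsave_running=True))
print(decide("BGREWRITEAOF", bgsave_running=True))
print(decide("BGSAVE", rewrite_running=True))
```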

Handling requests that arrive during data loading

While loading data, the server pauses after every 1000 keys to process all requests that have arrived. However, only the PUBLISH, SUBSCRIBE, PSUBSCRIBE, UNSUBSCRIBE, and PUNSUBSCRIBE commands are processed normally; all other commands return an error. Once loading is complete, the server processes all commands as usual.

AOF takes priority over RDB

Because AOF files are usually saved more frequently than RDB files, the data in an AOF file is generally newer than the data in an RDB file. Therefore, if AOF is enabled when the server starts, the program prefers the AOF file for restoring data. Redis uses the RDB file to restore data only when AOF is not enabled.

Three phases of writing to the AOF file

The whole process of getting a command into the AOF file can be divided into three phases:

  1. Command propagation: Redis sends the executed command, its arguments, and the argument count to the AOF program.
  2. Cache append: based on the command data it receives, the AOF program converts the command into the network communication protocol format and appends the result to the server's AOF cache.
  3. File write and save: the contents of the AOF cache are written to the end of the AOF file. If the configured AOF save condition is met, fsync or fdatasync is called to save the written content to disk.
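
The second phase can be sketched in Python. The helper name to_protocol and the aof_cache variable are made up for this example. The byte layout follows the Redis multi-bulk request format: an argument count line starting with *, then for each argument a length line starting with $ followed by the argument itself.

```python
def to_protocol(*args) -> bytes:
    """Encode a command and its arguments in the multi-bulk request format."""
    parts = [b"*%d\r\n" % len(args)]            # number of arguments
    for arg in args:
        raw = arg if isinstance(arg, bytes) else str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(raw), raw))  # length, then bytes
    return b"".join(parts)


# Phase 2: append the encoded command to an in-memory cache
aof_cache = bytearray()
aof_cache += to_protocol("SET", "key", "value")
print(bytes(aof_cache))
```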

Effect of AOF save mode on performance and safety

Redis currently supports three AOF save modes:

  1. AOF_FSYNC_NO: do not save.
  2. AOF_FSYNC_EVERYSEC: save once per second.
  3. AOF_FSYNC_ALWAYS: save after every command executed.

How each of the three modes blocks the main process:

  1. Do not save (AOF_FSYNC_NO): both the write and the save are executed by the main process, and both block it.
  2. Save once per second (AOF_FSYNC_EVERYSEC): the write is executed by the main process and blocks it. The save is executed by a child thread and does not block the main process directly, but how fast the save completes affects how long the write blocks.
  3. Save after every command (AOF_FSYNC_ALWAYS): same as mode 1.

Because a blocking operation keeps the Redis main process from handling requests continuously, in general the fewer blocking operations there are and the faster they complete, the better Redis performs.

In mode 1, the save is performed only when AOF is turned off, when Redis shuts down, or when the operating system triggers it. Under normal conditions this mode blocks only for writes, so its write performance is higher than that of the other two modes. This performance comes at the cost of safety: if the machine goes down while running in this mode, the amount of data lost depends on the operating system's cache flushing policy.
Mode 2 performs better than mode 3, and under normal conditions it loses no more than 2 seconds of data, so it is safer than mode 1. It is a save policy that balances performance and safety.
Mode 3 is the safest but also the slowest, because the server must block until the command has been written and saved to disk before it can go on handling requests.
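
The difference between the three modes comes down to when fsync is called after a write. Below is a rough Python sketch; the AofWriter class, the policy strings "no", "everysec", and "always", and the sync_count counter are assumptions for illustration. Unlike the server, which hands the once-per-second save to a separate thread, this sketch calls fsync inline to keep the example short.

```python
import os
import tempfile
import time


class AofWriter:
    """Append-only writer with a configurable sync policy (illustrative)."""

    def __init__(self, path, policy):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.policy = policy              # "no", "everysec" or "always"
        self.last_sync = time.monotonic()
        self.sync_count = 0               # how many times fsync was called

    def _sync(self):
        os.fsync(self.fd)                 # force written data to disk
        self.last_sync = time.monotonic()
        self.sync_count += 1

    def write(self, payload: bytes):
        os.write(self.fd, payload)        # the write step always happens
        if self.policy == "always":
            self._sync()                  # save after every command
        elif self.policy == "everysec":
            if time.monotonic() - self.last_sync >= 1.0:
                self._sync()              # save at most once per second
        # "no": leave flushing to the operating system

    def close(self):
        os.close(self.fd)


with tempfile.TemporaryDirectory() as tmp:
    writer = AofWriter(os.path.join(tmp, "appendonly.aof"), "always")
    for _ in range(3):
        writer.write(b"*1\r\n$4\r\nPING\r\n")
    print(writer.sync_count)
    writer.close()
```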

AOF background rewriting

The AOF rewrite program does a good job of creating a new AOF file, but it blocks the thread that calls it. AOF rewriting is only an auxiliary maintenance task, so Redis does not want it to stop the server from handling requests. Redis therefore runs the AOF rewrite program in a (background) child process. The main benefits of this approach are:

  1. The main process can keep handling command requests while the AOF rewrite is in progress.
  2. The child process carries its own copy of the main process's data. Using a child process rather than a thread keeps the data safe without any locking.

Using a child process does leave one problem to solve: because the main process keeps handling commands during the AOF rewrite, new commands may modify existing data, which makes the data in the current database inconsistent with the data in the rewritten AOF file. To solve this, Redis adds an AOF rewrite cache, which is put into use once the child process has been forked. When the main process receives a new write command, it appends the protocol content of the command not only to the existing AOF file but also to this cache.
    Note: don't child processes and threads both need locks when accessing data?
    Ref: http://blog.csdn.net/wangkehuai/article/details/7089323
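
The role of the rewrite cache can be shown with a small Python simulation. This is a sketch under simplifying assumptions: the Server class and the helpers apply, rewrite, and replay are invented names, commands are plain tuples rather than protocol text, and a dictionary copy stands in for the child process's copy of the data.

```python
def apply(db, command):
    """Apply one write command, given as a tuple, to a dictionary."""
    op, key, *rest = command
    if op == "SET":
        db[key] = rest[0]
    elif op == "DEL":
        db.pop(key, None)


def replay(commands):
    """Rebuild a database by replaying a list of commands."""
    db = {}
    for command in commands:
        apply(db, command)
    return db


def rewrite(snapshot):
    """Produce the shortest command list that recreates the snapshot."""
    return [("SET", key, value) for key, value in snapshot.items()]


class Server:
    def __init__(self):
        self.db = {}
        self.aof = []            # stands in for the existing AOF file
        self.rewrite_cache = None  # None means no rewrite is in progress

    def start_rewrite(self):
        self.rewrite_cache = []
        return dict(self.db)     # copy standing in for the child's data

    def write(self, command):
        apply(self.db, command)
        self.aof.append(command)             # always goes to the current AOF
        if self.rewrite_cache is not None:
            self.rewrite_cache.append(command)  # and to the cache if rewriting


server = Server()
server.write(("SET", "a", "1"))
server.write(("SET", "b", "2"))
snapshot = server.start_rewrite()
server.write(("SET", "a", "3"))   # arrives while the rewrite is running
server.write(("DEL", "b"))
new_aof = rewrite(snapshot) + server.rewrite_cache
print(replay(new_aof) == server.db)
```

Replaying only the rewritten snapshot would miss the two commands that arrived during the rewrite; adding the cached commands closes that gap.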

When the child process finishes the AOF rewrite, it sends a completion signal to the parent process. On receiving the signal, the parent process calls a signal handler that does the following:

  1. Write everything in the AOF rewrite cache to the new AOF file.
  2. Rename the new AOF file so that it overwrites the original AOF file.

After step 1, the existing AOF file, the new AOF file, and the database are in a fully consistent state. After step 2, the swap between the old and new AOF files is complete.
During the whole AOF background rewrite, only the final cache write and the rename block the main process. At all other times the background rewrite does not block the main process, which keeps the performance impact of AOF rewriting to a minimum.

Trigger conditions for AOF background rewriting

When the serverCron function runs, it checks whether all of the following conditions are met. If they are, it triggers an automatic AOF rewrite:

  1. No BGSAVE is in progress.
  2. No BGREWRITEAOF is in progress.
  3. The current AOF file is larger than server.aof_rewrite_min_size (default: 1 MB).
  4. The growth of the current AOF file relative to its size after the last AOF rewrite is greater than or equal to the specified growth percentage. The default growth percentage is 100%: if the first three conditions are met and the current AOF file is at least twice the size it had after the last rewrite, an automatic AOF rewrite is triggered.
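
The four conditions can be written as one predicate. The Python function below is a sketch: the name should_auto_rewrite and its parameters are made up, and the defaults (1 MB minimum size, 100% growth) are the ones quoted in the list above. Treating a missing base size as 1 byte is an assumption to avoid dividing by zero.

```python
MB = 1024 * 1024


def should_auto_rewrite(current_size, base_size,
                        bgsave_running=False, rewrite_running=False,
                        min_size=1 * MB, growth_percent=100):
    """Check the four trigger conditions for an automatic AOF rewrite."""
    if bgsave_running or rewrite_running:   # conditions 1 and 2
        return False
    if current_size <= min_size:            # condition 3
        return False
    base = base_size if base_size else 1    # assumed guard against zero
    growth = current_size * 100 // base - 100
    return growth >= growth_percent         # condition 4


print(should_auto_rewrite(2 * MB, 1 * MB))        # file has doubled
print(should_auto_rewrite(3 * MB // 2, 1 * MB))   # only 50% growth
```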

Events

Events are the core of the Redis server. They handle two important tasks:

  1. File events: multiplex between multiple clients, accept the command requests they send, and return the command results to them.
  2. Time events: run the server's periodic tasks (server cron jobs).

File events

The Redis server handles command requests efficiently by multiplexing between multiple clients. Many clients connect to the Redis server through sockets, but the server interacts with a client only when its socket can be read or written without blocking.

When the server has command results to return to a client and the client also has a new command request coming in, the server handles the new command request first.

Event execution and scheduling

Redis has both file events and time events, so how to schedule the two becomes a key question. Put simply, the two kinds of events cooperate with each other, and their scheduling has the following three properties:

  1. An event starts only after the previous event has finished running; events never preempt one another.
  2. The event processor handles file events first (processing command requests) and then runs time events (calling serverCron).
  3. How long to wait for file events (the maximum blocking time of the poll function) is determined by the time event that is due soonest.

Notes:
• Time events are either one-shot or periodic. The server's regular serverCron operation is a periodic event.
• File events and time events cooperate: one event runs only after another has completed, and there is no preemption.
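
One iteration of such a loop can be sketched in Python. The functions poll_timeout and process_events are hypothetical, events are plain callables, and time is passed in as a number so the example stays deterministic. Re-registering periodic events after they fire is left out.

```python
def poll_timeout(time_events, now):
    """Longest time the multiplexing call may block: until the nearest time event."""
    if not time_events:
        return None                       # nothing scheduled: block indefinitely
    nearest = min(when for when, _ in time_events)
    return max(0.0, nearest - now)


def process_events(ready_file_events, time_events, now):
    """Run ready file events first, then every time event that is due."""
    log = []
    for handler in ready_file_events:     # file events come first
        log.append(handler())
    remaining = []
    for when, handler in time_events:     # then time events, one at a time
        if when <= now:
            log.append(handler())
        else:
            remaining.append((when, handler))
    return log, remaining


time_events = [(105.0, lambda: "cron"), (200.0, lambda: "later")]
print(poll_timeout(time_events, now=100.0))
log, remaining = process_events([lambda: "read client 1"], time_events, now=106.0)
print(log, len(remaining))
```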

Command request, processing, and result return

Redis uses multiplexing to serve multiple clients. To keep clients separate and prevent them from interfering with one another, the server maintains a redisClient structure for each connected client that holds that client's state.
Once connected, a client can send command requests to the server. The path from the client sending a command, through the server processing it, to the result being returned to the client consists of the following steps:

  1. The client sends the command's protocol data to the server through its socket.
  2. The server handles the incoming data in a read event and stores it in the query cache of the client's redisClient structure.
  3. Based on the content of the query cache, the program looks up the command's implementation function in the command table.
  4. The program runs the implementation function, modifies the server's global state variable (server), saves the command result in the reply cache of the client's redisClient structure, and then registers a write event for the client's FD.
  5. When the write event for the client's FD becomes ready, the command result in the reply cache is sent back to the client. At this point the command has been fully executed.
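
Steps 2 to 4 can be pictured with a toy dispatcher in Python. All names here (Client, COMMAND_TABLE, process) are invented for illustration, the request is parsed as space-separated text instead of the real protocol, and the sketch assumes the query cache holds exactly one complete, non-empty command.

```python
class Client:
    """Per-client state: an input cache and an output cache."""

    def __init__(self):
        self.query_cache = b""   # data read from the socket
        self.reply_cache = []    # results waiting to be written back


def cmd_set(db, args):
    db[args[0]] = args[1]
    return "OK"


def cmd_get(db, args):
    return db.get(args[0])


# Command table: maps a command name to its implementation function
COMMAND_TABLE = {"SET": cmd_set, "GET": cmd_get}


def process(db, client):
    """Look up the command, run it, and queue the reply for the client."""
    name, *args = client.query_cache.decode().split()
    client.query_cache = b""
    handler = COMMAND_TABLE.get(name.upper())
    if handler is None:
        client.reply_cache.append(f"ERR unknown command '{name}'")
    else:
        client.reply_cache.append(handler(db, args))


db = {}
client = Client()
client.query_cache = b"SET name redis"
process(db, client)
client.query_cache = b"GET name"
process(db, client)
print(client.reply_cache)
```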

Posted by: Large CC | 11 Jul 2014
Blog: blog.me115.com
