Implementing Lightweight Distributed System Coordination with Redis

Source: Internet
Author: User
Tags: lua

http://www.ibm.com/developerworks/cn/opensource/os-cn-redis-coordinate/index.html

In a distributed system, processes (this article uses "process" for any running body in a distributed system, whether on the same physical node or on different ones) usually need to coordinate with one another. Sometimes the data processed by different processes has dependencies and must be handled in a certain order; sometimes a process must handle certain transactions only during certain time periods; and so on. People often use techniques such as distributed locks and election algorithms to coordinate the behavior of processes. Because of the inherent complexity of distributed systems and their fault-tolerance requirements, these techniques are usually heavyweight, such as the Paxos algorithm, the bully election algorithm, and ZooKeeper. They focus on message passing rather than shared memory, are widely known to be complex and hard to understand, and pose real challenges when problems arise in concrete implementations and deployments.

Redis is often regarded as NoSQL software, but it is essentially a distributed data structure server that provides memory-based data structure storage as a network service. Internally, a single thread handles all operations on the in-memory data structures, which guarantees the atomicity of its data manipulation commands. Redis also supports Lua scripting: each Redis instance runs Lua scripts in a single interpreter, so each script likewise executes atomically. This atomicity makes it possible to coordinate distributed systems in a shared-memory style, which is attractive: compared with complex message-based mechanisms, shared-memory patterns are significantly easier to understand for many engineers, especially those already familiar with multithreaded or multiprocess programming. In practice, not every distributed system needs the strict model of a distributed database, and the techniques used do not always need a solid theoretical basis and mathematical proof, so coordination techniques for distributed systems based on Redis have real practical value; indeed, people have made many such attempts. This article introduces some of these coordination techniques.

Signal/Wait Operations (Publish/Subscribe)

In a distributed system, some processes need to wait for the state of other processes to change, or to notify other processes that their own state has changed. For example, when operations have a dependency order, some processes wait while others send signals notifying the waiters to proceed with subsequent operations. This can be done with the Redis Pub/Sub family of commands, for example:

import redis
import time

rc = redis.Redis()

def wait(wait_for):
    ps = rc.pubsub()
    ps.subscribe(wait_for)
    ps.get_message()  # consume the subscribe confirmation
    wait_msg = None
    while True:
        msg = ps.get_message()
        if msg and msg['type'] == 'message':
            wait_msg = msg
            break
        time.sleep(0.001)
    ps.close()
    return wait_msg

def signal_broadcast(wait_in, data):
    wait_count = rc.publish(wait_in, data)
    return wait_count

With this method it is easy to implement other wait policies, such as try-wait, wait with timeout, or waiting on multiple signals until all or any of them arrive. Because Redis natively supports pattern-based message subscription (the PSUBSCRIBE command), the wait signal can also be specified as a pattern.
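As one illustration, here is a minimal sketch of a wait-with-timeout variant of the wait() function above; the wait_timeout name and the 1 ms polling interval are assumptions, not from the original article:

import time

def wait_timeout(wait_for, timeout):
    # Like wait(), but give up after `timeout` seconds and return None.
    ps = rc.pubsub()
    ps.subscribe(wait_for)
    ps.get_message()  # consume the subscribe confirmation
    deadline = time.time() + timeout
    wait_msg = None
    while time.time() < deadline:
        msg = ps.get_message()
        if msg and msg['type'] == 'message':
            wait_msg = msg
            break
        time.sleep(0.001)
    ps.close()
    return wait_msg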

Unlike other data operations, subscription messages are transient: they are not stored in memory or persisted, and they are not re-sent if the connection between client and server is broken. However, when Master/Slave nodes are configured, the PUBLISH command is propagated to the slave nodes, so we can subscribe to the same channel on connections to both the master and a slave and receive the publisher's messages on both. Even if the master fails during use, or the connection to the master is lost, we can still obtain the subscribed messages from the slave node, which gives better robustness. In addition, because no data is written to disk, this method has a performance advantage.
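A minimal sketch of this dual subscription follows; the host names are assumptions and should be adapted to the actual deployment:

import time
import redis

rc_master = redis.Redis(host='redis-master')  # assumed host names
rc_slave = redis.Redis(host='redis-slave')

def wait_robust(wait_for):
    # Subscribe on both master and slave; return the first message from either.
    subs = [rc_master.pubsub(), rc_slave.pubsub()]
    for ps in subs:
        ps.subscribe(wait_for)
        ps.get_message()  # consume the subscribe confirmation
    try:
        while True:
            for ps in subs:
                msg = ps.get_message()
                if msg and msg['type'] == 'message':
                    return msg
            time.sleep(0.001)
    finally:
        for ps in subs:
            ps.close()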

The signal in the method above is a broadcast: all waiting processes receive it. If the signal should instead be unicast, so that only one waiter receives it, this can be implemented by agreeing on a channel-name pattern, for example:

channel name = channel name prefix (channel) + subscriber's globally unique ID (myid)

The unique ID can be a UUID or a random number string, as long as global uniqueness is guaranteed. Before sending a signal, use the "PUBSUB CHANNELS channel*" command to get the channels all subscribers are subscribed to, then send the signal to one randomly chosen channel. When waiting, pass your own unique ID, concatenate the channel-name prefix with it to form the channel name, and then wait as in the previous example. For example:

import random

single_cast_script = """
local channels = redis.call('pubsub', 'channels', ARGV[1] .. '*')
if #channels == 0 then
    return 0
end
local index = math.floor(tonumber(ARGV[2])) % #channels + 1
return redis.call('publish', channels[index], ARGV[3])
"""

def wait_single(channel, myid):
    return wait(channel + myid)

def signal_single(channel, data):
    rand_num = int(random.random() * 65535)
    return rc.eval(single_cast_script, 0, channel, str(rand_num), str(data))


Distributed Locks (SETNX, DEL)

Implementing distributed locks is one of the most explored directions. The official Redis website has a document on Redis-based distributed locks that presents the Redlock algorithm and lists implementations in multiple languages; a brief introduction follows.

The Redlock algorithm considers three properties that a distributed lock must satisfy:

    • Safety: mutual exclusion is guaranteed; at most one client can hold a lock at any time.
    • Deadlock freedom: even if the client currently holding the lock crashes or is partitioned from the cluster, other clients can eventually acquire the lock.
    • Fault tolerance: as long as the majority of Redis nodes are online, clients are able to acquire and release locks.

A simple and straightforward way to implement a lock is to use the SET ... NX command to set a key with a time-to-live (TTL): acquiring the lock sets the key, releasing the lock deletes the key, and the TTL guarantees that deadlock is avoided. However, this approach has a single point of failure, and if Master/Slave nodes are deployed, it can violate safety under certain conditions, for example:

    1. Client A acquires the lock on the master node.
    2. The master node crashes before the key is replicated to the slave node.
    3. The slave node is promoted to be the new master node.
    4. Client B acquires the same lock from the new master node, even though it is actually still held by client A. Two clients now hold the same mutex during the same time period, which violates the safety of mutual exclusion.

In the Redlock algorithm, a lock is acquired with a command similar to the following:

SET resource_name my_random_value NX PX 30000

Here my_random_value is a globally unique random value; each client must generate this value itself and remember it, since it is needed later to unlock.

Unlocking is done through a Lua script rather than simply deleting the key directly; otherwise a client might release a lock held by someone else:

if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end

The value of ARGV[1] here is the my_random_value that was used when the lock was acquired.
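Putting the acquisition command and the unlock script together, here is a minimal single-instance sketch using redis-py; the function names are assumptions, and the official document lists complete client implementations:

import uuid
import redis

rc = redis.Redis()

UNLOCK_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""

def acquire_lock(resource, ttl_ms=30000):
    # Try to acquire the lock once; return the token on success, None otherwise.
    token = str(uuid.uuid4())  # the my_random_value
    if rc.set(resource, token, nx=True, px=ttl_ms):
        return token
    return None

def release_lock(resource, token):
    # Release the lock only if we still hold it.
    return rc.eval(UNLOCK_SCRIPT, 1, resource, token)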

If better fault tolerance is needed, you can build a cluster of N (N odd) independent, redundant Redis nodes. In that case a client acquires and releases the lock with the following algorithm (a sketch follows the list):

    1. Get the current timestamp timestamp_1, in milliseconds.
    2. Using the same key and random value, try to acquire the lock on the N nodes sequentially. Each per-node acquisition uses a timeout that is much shorter than the lock's automatic release time, so that the client does not spend too long on an unreachable node; it is usually set relatively short.
    3. The client computes the total time spent acquiring the lock by subtracting timestamp_1 from the current timestamp. The client is considered to have successfully acquired the lock only if it acquired the lock on more than half of the nodes and the total time is less than the lock's lifetime.
    4. If the lock was acquired, its validity time is the preset lock lifetime minus the total time spent acquiring it.
    5. If the client could not acquire the lock, it should immediately unlock on all nodes.
    6. To retry, wait a random delay and then try to acquire the lock again.
    7. A client that has acquired the lock releases it by simply unlocking on all nodes.
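A condensed sketch of these steps, reusing the UNLOCK_SCRIPT above; the node addresses are assumptions, and this illustrates the algorithm rather than being a production Redlock implementation (the official document lists vetted libraries):

import time
import uuid
import redis

# Assumed addresses of N independent Redis nodes; the short socket timeout
# implements the per-node acquisition timeout of step 2.
nodes = [redis.Redis(port=p, socket_timeout=0.2) for p in (6379, 6380, 6381)]

def redlock_acquire(resource, ttl_ms):
    token = str(uuid.uuid4())
    start = time.monotonic()                       # step 1
    acquired = 0
    for node in nodes:                             # step 2
        try:
            if node.set(resource, token, nx=True, px=ttl_ms):
                acquired += 1
        except redis.RedisError:
            pass                                   # unreachable node: skip it
    elapsed_ms = (time.monotonic() - start) * 1000
    validity_ms = ttl_ms - elapsed_ms              # step 4
    if acquired > len(nodes) // 2 and validity_ms > 0:  # step 3
        return token, validity_ms
    redlock_release(resource, token)               # step 5
    return None, 0

def redlock_release(resource, token):              # step 7
    for node in nodes:
        try:
            node.eval(UNLOCK_SCRIPT, 1, resource, token)
        except redis.RedisError:
            pass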

The Redlock algorithm does not require the clocks of the Redis nodes to be synchronized (whether physical or logical), which differs from some traditional distributed lock algorithms based on synchronized clocks. For the specific details of the Redlock algorithm, see the official Redis documentation and the implementations in various languages listed there.


Election algorithm

In a distributed system, some transactions often need to be completed by a single process within a certain time period, or one process needs to act as leader to coordinate the others; this is where election algorithms are used. Traditional election algorithms include the bully election algorithm, the ring election algorithm, the Paxos algorithm, and the Zab algorithm (ZooKeeper). Some of them depend on reliable message delivery or clock synchronization, and some are too complex to implement and verify easily. The newer Raft algorithm is much easier to understand than the others, but it still relies on heartbeat broadcasts and a logical clock: the leader must continuously broadcast messages to the followers to maintain its leadership, and additional mechanisms are required to change the set of nodes.

Election algorithms are somewhat similar to distributed locks: at any time there is at most one leader. Of course, we could implement the election with the distributed lock described above: define a leader resource; whoever holds the resource's lock is the leader; when the lock's lifetime ends, the processes compete for the lock again. This is a competitive algorithm, but it leads to gap periods with no leader, and it does not support leader re-election. Re-election has clear advantages: for example, the leader's tasks run on a comparable schedule, and viewing logs and troubleshooting problems is much easier. If we need an algorithm in which the leader can be re-elected, we can use this approach:

import redis

rc = redis.Redis()
local_selector = 0
TERM = 30  # term of office in seconds (assumed value)

def master():
    global local_selector
    master_selector = rc.incr('master_selector')
    if master_selector == 1:
        # Initial election, or the counter expired/was reset: we become the leader.
        local_selector = master_selector
    else:
        if local_selector > 0:  # I was the master before
            if local_selector > master_selector:
                # Lost: the counter went backwards, maybe the db was failed over.
                local_selector = 0
            else:
                # Continue to be the master.
                local_selector = master_selector
    if local_selector > 0:
        # I am the current master: extend the term.
        rc.expire('master_selector', TERM)
    return local_selector > 0

This algorithm favors re-election: a new leader is elected only when the current leader fails, when the time it takes to perform a task exceeds the term of office, or when the Redis node fails and recovers. In Master/Slave mode, if the master node fails and a slave node is promoted to be the new master, then even if the master_selector value was not successfully synchronized, this does not produce two leaders. If a leader keeps being re-elected, the value of master_selector increments continuously; since master_selector is a 64-bit integer, it cannot overflow in any foreseeable time, and since it is reset to 1 whenever the leader changes, this is acceptable, although clients whose language does not support 64-bit integers (such as Node.js) need to handle the value specially. If the current leader process takes longer than its term, the other processes can elect a new leader; when the old leader process finishes its transaction, if the new leader has already been through a number of terms greater than or equal to that of the old leader, there may briefly be two leader processes. To avoid this, each leader process should check, after processing its term's transactions, whether the processing time exceeded the term of office; if so, it should set local_selector to 0 before calling master() again to check whether it is still the leader.
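As a hedged sketch of how a candidate process might drive this election, the loop below calls master() periodically and steps down when its work overruns the term; do_leader_work is a hypothetical task function and the sleep interval is an assumption:

import time

def leader_loop():
    global local_selector
    while True:
        if master():
            start = time.monotonic()
            do_leader_work()  # hypothetical: whatever the leader must do this term
            if time.monotonic() - start > TERM:
                # Term exceeded: step down so the next master() call re-checks.
                local_selector = 0
        time.sleep(TERM / 3)  # renew leadership well within the term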


Message Queuing (LIST)

Message queues are a basic communication facility between the parts of a distributed system; they can be used to build complex inter-process coordination and interaction. Redis provides primitives for building message queues. The Pub/Sub family of commands gives a subscribe/publish style of messaging, but Pub/Sub messages are not persisted inside Redis and are therefore not durable; they suit scenarios where a lost message does not matter.

If persistence matters, consider the LIST family of commands instead: push messages onto a list with the PUSH commands (LPUSH, RPUSH, etc.) and take messages off with the POP commands (LPOP, RPOP, BLPOP, BRPOP, etc.). Different combinations give FIFO or FILO behavior, for example:

import redis

rc = redis.Redis()

def fifo_push(q, data):
    rc.lpush(q, data)

def fifo_pop(q):
    return rc.rpop(q)

def filo_push(q, data):
    rc.lpush(q, data)

def filo_pop(q):
    return rc.lpop(q)

If you replace LPOP and RPOP with the BLPOP and BRPOP commands, popping also supports blocking waits when the list is empty. However, even with this persistent approach, a message can still be lost if a network failure occurs while the POP result is being returned. For this requirement Redis provides the RPOPLPUSH and BRPOPLPUSH commands, which atomically save the popped message into a second list. The client can first read and process the message data from this list and then delete it from the list, ensuring that no message is lost, as shown in the following example:

def safe_fifo_push(q, data):
    rc.lpush(q, data)

def safe_fifo_pop(q, cache):
    msg = rc.rpoplpush(q, cache)
    # Check and do something on msg, then remove it from the cache list.
    rc.lrem(cache, 1, msg)
    return msg

If you use the BRPOPLPUSH command instead of RPOPLPUSH, the pop can block and wait when q is empty.
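For instance, a small sketch of the blocking variant; the function name and timeout parameter are assumptions:

def safe_fifo_pop_blocking(q, cache, timeout=0):
    # Block until a message is available; timeout=0 waits indefinitely.
    msg = rc.brpoplpush(q, cache, timeout=timeout)
    if msg is not None:
        # Check and do something on msg, then remove it from the cache list.
        rc.lrem(cache, 1, msg)
    return msg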


Conclusion

Using Redis as the shared memory of a distributed system yields coordination techniques based on the shared-memory model. Although they lack the solid theoretical foundations of traditional message-based techniques, they are a simple and practical lightweight solution in less demanding situations. After all, not every system has strict fault-tolerance requirements, not every system frequently suffers process failures, and Redis itself has withstood many years of industry practice and testing. Using Redis also brings some side benefits: during development and in production, the contents of locks and queues can be observed directly; no additional special configuration or processes are required; the logic is simple enough that debugging is clear; and troubleshooting and ad-hoc intervention are convenient. Scalability is also good: the number of processes in the distributed system can be scaled dynamically, without having to fix the number of processes in advance.

Redis supports clustering based on hashed key values. When the techniques described in this article are used together with such a cluster, it is recommended to deploy a dedicated Redis node (or a redundant cluster of dedicated Redis nodes) for coordination, because in a hash-based cluster different keys are distributed to different cluster nodes, support for Lua scripting is limited, and the atomicity of some operations is hard to guarantee; this must be taken into account. The advantage of a dedicated node is that its data volume is much smaller: with Master/Slave deployment or AOF mode, the small data volume means much less synchronization traffic between master and slave and much less data written to disk in AOF mode, which can also greatly improve availability.
