Redis SETNX for distributed and stand-alone locks


In Redis, SETNX is short for "SET if Not eXists": it sets a key only when that key does not already exist. This makes it usable as a locking primitive, but many people do not realize that SETNX has traps.


For example: a database query interface receives heavy traffic, so a cache is added and refreshed after it expires. The problem is that under high concurrency, at the instant the cache expires, a large number of concurrent requests will penetrate the cache and query the database directly if there is no lock mechanism, causing an avalanche effect. With a lock mechanism, only one request is allowed to update the cache, while the other requests either wait or use the stale value.

The following demo code uses phpredis, currently the most popular Redis extension in the PHP community:

<?php

$ok = $redis->setnx($key, $value);

if ($ok) {
    $cache->update();
    $redis->del($key);
}

?>
When the cache expires, the lock is acquired via SETNX; if that succeeds, the cache is updated and then the lock is deleted. The logic looks simple, but there is a problem: if the request exits unexpectedly after creating the lock but before deleting it, the lock persists forever and the cache is never updated again. As a safeguard, we need to add an expiration time to the lock:

<?php

$redis->multi();
$redis->setnx($key, $value);
$redis->expire($key, $ttl);
$redis->exec();

?>
Because SETNX cannot set an expiration time itself, we use EXPIRE to set it, and we wrap the two commands in MULTI/EXEC so the pair runs atomically, avoiding the case where SETNX succeeds but EXPIRE is never executed. Unfortunately, there is still a problem: when multiple requests arrive, only one request's SETNX can succeed, but the EXPIRE of every request will succeed. This means the expiration time can be refreshed even by requests that did not acquire the lock; if requests are dense enough, the expiration keeps getting pushed back and the lock stays valid indefinitely. So we need to run EXPIRE conditionally while preserving atomicity, which leads to the following Lua script:

local key   = KEYS[1]
local value = ARGV[1]
local ttl   = tonumber(ARGV[2])

-- Set the key only if it does not exist, then attach the expiration
-- inside the same atomic script so no other request can refresh it.
local ok = redis.call('setnx', key, value)

if ok == 1 then
    redis.call('expire', key, ttl)
end

return ok
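For reference, a minimal sketch of invoking this script through phpredis (assuming the script text above is stored in $script):

<?php

// KEYS[1] = $key; ARGV[1] = $value, ARGV[2] = $ttl. The trailing 1
// tells phpredis how many of the arguments are key names.
$ok = $redis->eval($script, [$key, $value, $ttl], 1);

?>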
Needing a Lua script for such a seemingly simple feature is a bit of a hassle. In fact, Redis has taken this plight into account: starting with 2.6.12, SET covers the functionality of both SETNX and SETEX, since SET itself can refuse to overwrite an existing key and set the expiration time at the same time. In other words, everything we needed above can now be done with SET alone.

<?php

// NX: set only if the key does not exist; EX: expiration in seconds.
$ok = $redis->set($key, $value, ['nx', 'ex' => $ttl]);

if ($ok) {
    $cache->update();
    $redis->del($key);
}

?>
Is the code perfect now? The answer is: not quite. Imagine that a request takes a long time to update the cache, even longer than the lock's lifetime, so the lock expires while the cache is still being updated and another request acquires it. When the first request finishes updating the cache, if it deletes the lock without checking, it may mistakenly delete the lock created by the other request. So we need to introduce a random value when creating the lock:

<?php

$ok = $redis->set($key, $random, ['nx', 'ex' => $ttl]);

if ($ok) {
    $cache->update();

    // Only delete the lock if it is still the one we created.
    if ($redis->get($key) == $random) {
        $redis->del($key);
    }
}

?>
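Note that the GET-then-DEL check above is still two separate commands and therefore not atomic: the lock could expire between the two calls. A common refinement, not part of the original snippet, is to perform the compare-and-delete in a single Lua script:

-- Delete the lock only if it still holds our random value.
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
else
    return 0
end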

With that, we have basically implemented a single-machine lock. To implement a distributed lock, refer to: Distributed locks with Redis; we will not go deeper here. In summary: the best way to avoid falling into the SETNX trap is to never use SETNX.









So-called concurrency control means that the system must be able to control the interaction between concurrent operations and correctly coordinate their execution to obtain correct results. Operations executed serially never conflict, so the essence of any concurrency control mechanism is to serialize conflicting concurrent operations at some point.

The refund system uses a peer cluster structure with no master node: all nodes play the same role, each providing the refund service externally and able to complete a single refund request independently. Concurrency control therefore cannot be achieved by having a management node perform unified scheduling; serialization can only be guaranteed by a protocol or rule, called a concurrency control rule. If every concurrent operation adheres to this rule, all participating concurrent operations are guaranteed to be serialized.

2.1 Distributed Lock Service

When it comes to concurrency control, the first thing that comes to mind is of course a lock service. The refund system is a cluster, facing lock contention between multiple nodes, so what is needed is a lock service usable by different nodes of the cluster, that is, a distributed lock service.

In addition, the refund system currently processes a single refund request synchronously; if it cannot acquire the lock it must return immediately, i.e., it can only use a non-blocking lock, otherwise a long wait would make the refund call time out. The remaining work is then completed asynchronously by the refund process script.

2.1.1 Implementing the Distributed Lock Service with Redis

Redis is an open-source key-value database written in C that supports network access, operates in memory, and can persist data. It has no lock concept of its own; instead, its single-process, single-threaded structure queues concurrent accesses and turns them into serial accesses, which is what makes a distributed lock service possible.

2.1.1.1 Implementation Principle

The Redis SETNX command (SET if Not eXists) can be used as a locking primitive. SETNX key value sets key to value if and only if key does not exist; if the given key already exists, SETNX does nothing. It returns 1 on success and 0 on failure.

For example, to lock the resource (key) bank_id_4001, a refund node can try the following:

SETNX bank_id_4001 <current Unix timestamp + lock timeout + 1>

If SETNX returns 1, the refund node has obtained the lock; the timestamp stored as the key's value specifies when the lock expires, that is, the timeout. After use, the lock can be released via DEL bank_id_4001.

If SETNX returns 0, the key is already locked by another node.

2.1.1.2 Handling Deadlock

The locking logic above has a problem: if the refund node holding the lock fails, crashes, or times out during execution, it cannot release the lock.

This can be detected through the timeout timestamp stored in the key: if the current timestamp is greater than the key's value, the lock has expired and can be taken over.

However, when this happens, we cannot simply and brutally DEL the dead lock's key and then take it with SETNX, because a race condition forms when multiple nodes simultaneously detect that a lock has expired and try to release it:

1. Request 0 holds the lock, but crashes.

2. Request 1 and request 2 send GET key to fetch the timestamp, and both find it has timed out.

3. Request 1 sends DEL key.

4. Request 1 sends SETNX key and succeeds; request 1 obtains the lock.

5. Request 2 sends DEL key, deleting the lock request 1 just created.

6. Request 2 sends SETNX key and succeeds; request 2 obtains the lock.

Because of this race condition, request 1 and request 2 both end up holding the lock. To avoid this, the following procedure can be used:

1. Request 1 sends SETNX key to take the lock; because request 0 still holds the key, Redis returns 0.

2. Request 1 sends GET key to check whether the lock has timed out; if it has not, the call returns without the lock.

3. Conversely, if it has timed out, request 1 continues trying to acquire the lock with:

GETSET key <current Unix timestamp + lock timeout + 1>

This command sets the key to the given value and returns the key's old value; if the key has no old value, i.e., the key does not exist, it returns nil.

4. If the old timestamp request 1 gets back from GETSET is still the expired one, request 1 has obtained the lock.

5. If another request performed the same operation just before request 1, then the timestamp request 1 gets back is no longer expired, and request 1 has not obtained the lock as hoped. Note that although request 1 did not get the lock, its GETSET overwrote the timeout timestamp set by the other request; the effect of this very small error is negligible.
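Putting steps 1 through 5 together, a minimal sketch of the acquisition logic with phpredis (acquireLock is a hypothetical helper, not part of the original design):

<?php

// Sketch only: $redis is an assumed phpredis connection,
// $lockTimeout is the lock's lifetime in seconds.
function acquireLock($redis, $key, $lockTimeout)
{
    $expires = time() + $lockTimeout + 1;

    // Step 1: try to take the lock outright.
    if ($redis->setnx($key, $expires)) {
        return true;
    }

    // Step 2: the lock is held; check whether it has timed out.
    $current = $redis->get($key);
    if ($current !== false && time() <= (int) $current) {
        return false; // still valid: give up (non-blocking lock)
    }

    // Step 3: expired; race for it with GETSET.
    $old = $redis->getSet($key, $expires);

    // Steps 4-5: only the requester that saw the expired value wins.
    return $old === $current;
}

?>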

Note: to make the distributed lock algorithm more robust, the node holding the lock should check once more whether the lock has timed out before unlocking, and only then perform the DEL. The node may have been stalled by a time-consuming operation, and by the time it finishes, the lock may already have expired and been acquired by another node, in which case it must not unlock.

2.1.1.3 Advantages and Disadvantages

Advantages: simple to implement; a plain Redis deployment can be used directly; scale-out is easy, since adding a resource key just means storing another key-value pair in the cache.

Disadvantages: Redis is deployed on a single machine here, and its failure makes the whole lock service unavailable. Deploying it as a peer cluster instead requires additional mechanisms to keep the Redis cluster's data consistent, which increases implementation difficulty.

2.1.2 Implementing a Distributed Shared Lock with the Uniqueness of Zookeeper Node Names

Zookeeper is a reliable coordination system for large distributed systems, based on the Google Chubby design; its features include configuration maintenance, naming service, distributed synchronization, and group services.

2.1.2.1 Implementation Principle

Zookeeper abstracts its nodes into a small tree-like directory structure, similar to a Unix file system, and stipulates that node names within the same directory must be unique. For example: if two clients both try to create a node named bank_id_4001 under Zookeeper's /lock directory, only one will succeed.

Zookeeper also has a special node type, the ephemeral (temporary) node: it is created by a client and automatically deleted when that client disconnects from the Zookeeper cluster.

A distributed lock service can be implemented with Zookeeper's name uniqueness and ephemeral node features, as sketched after this list:

1. The client creates a key child node under a lock directory, with type EPHEMERAL (temporary node).

2. If creation succeeds, the lock is obtained; after processing completes, the node is deleted to unlock.

3. If creation fails, the lock has already been acquired by another request, so the call returns.
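As a sketch of these three steps, assuming the PECL zookeeper extension (the connection string and paths are illustrative, and processRefund is a hypothetical business function):

<?php

$zk  = new Zookeeper('127.0.0.1:2181');
$acl = [['perms' => Zookeeper::PERM_ALL, 'scheme' => 'world', 'id' => 'anyone']];

try {
    // Step 1: create an ephemeral child node under the lock directory.
    $zk->create('/lock/bank_id_4001', '1', $acl, Zookeeper::EPHEMERAL);

    // Step 2: creation succeeded, so we hold the lock.
    processRefund();                    // hypothetical business logic
    $zk->delete('/lock/bank_id_4001'); // unlock
} catch (ZookeeperException $e) {
    // Step 3: the node already exists; another request holds the lock.
}

?>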

2.1.2.2 Advantages and Disadvantages

Advantages: simple to implement and usable as soon as it is deployed; correctness and reliability are guaranteed by Zookeeper's own mechanisms; there is no Redis-style single point of failure; when the refund node holding the lock crashes or goes down, no deadlock occurs because the ephemeral node is deleted automatically; scale-out is easy, since different child nodes can be created for different keys.

Disadvantages: an additional service cluster is needed, increasing implementation cost; there is no timeout mechanism, so the refund node holding the lock must not hold it for long, otherwise other requests needing the resource are delayed.

2.1.3 Using a Database for Shared Locks

2.1.3.1 Implementation Principle

The InnoDB engine of MySQL has row-level locks, which can be used to implement the distributed lock service. First, create a table in the database to hold the lock identifiers; the SQL to create the table is as follows (the column length was garbled in the source and is assumed here):

CREATE TABLE mbs_locks
(
    lock_key VARCHAR(64) NOT NULL, -- length assumed; the original was garbled
    PRIMARY KEY (lock_key)
) ENGINE = InnoDB;

After the table is created, records can be inserted in advance for keys that are fixed, such as channel ID and merchant ID keys; a transaction-order key cannot be known before the refund, so its record is inserted only while the refund request executes. Note that the wallet system guarantees that keys of the same type do not collide, but to prevent keys of different types from colliding, a type-specific prefix such as trans_id_ or bank_id_ can be added to the front of the key. When a node needs to acquire a lock, it first queries the table for the lock corresponding to the operation's key; the query SQL looks like:

SELECT * FROM mbs_locks WHERE lock_key = 'trans_id_xxx' FOR UPDATE;

If the statement returns the locked row, the lock is obtained; if it does not, the lock was not obtained and the call returns. (Note that by default SELECT ... FOR UPDATE blocks until the row lock is released or innodb_lock_wait_timeout expires, so the wait timeout must be kept short for the call to remain effectively non-blocking.)
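A minimal sketch of acquiring and releasing such a row lock with PDO (the DSN, credentials, and key below are illustrative, and processRefund is a hypothetical business function):

<?php

$pdo = new PDO('mysql:host=127.0.0.1;dbname=refund', 'user', 'pass');

$pdo->beginTransaction();
$stmt = $pdo->prepare('SELECT * FROM mbs_locks WHERE lock_key = ? FOR UPDATE');
$stmt->execute(['trans_id_xxx']);

if ($stmt->rowCount() > 0) {
    // The row is now locked: we hold the lock until COMMIT/ROLLBACK.
    processRefund(); // hypothetical business logic
}

$pdo->commit(); // releases the row lock

?>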

2.1.3.2 Advantages and Disadvantages

Advantages: simple and convenient; it can be built on the existing system without adding extra services, requires few changes to the refund system, and has a low implementation cost; scale-out is easy, since adding a key only requires adding a record to the database.

Disadvantages: heavy database access, especially when doing concurrency control per transaction order, since almost every partial refund of a transaction needs to access the database. And because the online database does not support delete operations, the keys of already-refunded transaction orders must be cleaned up periodically, as a transaction order's key will never be used again once its refund completes.

2.2 Distributed Queues

Since the essence of concurrency control is to serialize conflicting concurrent operations, a FIFO queue is certainly a good choice. As with the lock service, the refund system needs a distributed queue that can be used across the cluster.

2.2.1 Distributed Queues with Zookeeper Sequential Nodes

2.2.1.1 Implementation Principle

Zookeeper has a node type called the sequential node which, as the name suggests, numbers nodes in creation order. If you create three child nodes under the /queue_fifo/key directory, the Zookeeper cluster will name them, in creation order, /queue_fifo/key/0000000001, /queue_fifo/key/0000000002, and /queue_fifo/key/0000000003.

Using this sequential node feature, a distributed queue can be implemented: when a refund node receives a refund request, it creates a child node of type SEQUENTIAL under the corresponding queue directory, guaranteeing that every member is numbered as it joins the queue. The consuming process on the refund node lists all child nodes of each queue directory; if child nodes exist, it consumes the request with the smallest number and deletes its node, ensuring FIFO order; otherwise it waits (a sketch follows).
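A sketch of enqueue and dequeue along these lines, again assuming the PECL zookeeper extension (the directory name and payload are illustrative):

<?php

$zk  = new Zookeeper('127.0.0.1:2181');
$acl = [['perms' => Zookeeper::PERM_ALL, 'scheme' => 'world', 'id' => 'anyone']];
$dir = '/queue_fifo/bank_id_4001';

// Enqueue: create a numbered child node holding the request payload.
$zk->create($dir . '/item-', $payload, $acl, Zookeeper::SEQUENCE);

// Dequeue: take the child with the smallest sequence number.
$children = $zk->getChildren($dir);
if (!empty($children)) {
    sort($children);                // zero-padded suffixes sort numerically
    $head    = $dir . '/' . $children[0];
    $payload = $zk->get($head);     // read the request
    $zk->delete($head);             // remove it from the queue (FIFO)
}

?>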

Note: the refund cluster nodes consume the distributed queue jointly. To achieve a global send interval, each request child node needs an extra time field marking the execution time of the previous request; whenever a request is taken from the queue, the current time is recorded in the next request's time field. This serializes the conflicting concurrent operations.

2.2.1.2 Advantages and Disadvantages

Advantages: besides concurrency control, the refund system also needs asynchronous processing. A single synchronous refund request involves too many RPC calls during execution and takes a long time, so the synchronous request should return once execution reaches a certain stage, with the remaining work guaranteed to complete asynchronously inside the refund system. A distributed queue therefore provides asynchronous processing and concurrency control at the same time: refund requests that do not need concurrency control go to a normal queue, while requests that do are added to the interval-send queue.

Disadvantages: an additional service cluster is needed, increasing implementation cost; scale-out is not easy, since adding a resource key requires creating a new queue directory.

3 Postscript

Concurrency control in the refund business is still at the research stage and inconclusive. The four concurrency control mechanisms above each have pros and cons; many factors must be weighed before a concrete implementation plan can be made. The following table compares the four schemes only on relatively weighty factors such as deployment difficulty, cost, and reliability:

Concurrency control mechanism          | Deployment difficulty | Deployment cost | Ease of scale-out | Reliability   | Impact on existing systems
Redis distributed lock service         | Smaller               | Smaller         | Easy              | More reliable | Small
Zookeeper distributed lock service     | Larger                | Larger          | Fairly easy       | Reliable      | Small
Database distributed lock service      | Small                 | Small           | Easy              | Reliable      | Smaller
Zookeeper distributed queue service    | Larger                | Larger          | More difficult    | Reliable      | Large (but it also meets the asynchrony need)

