Some ideas for handling high-concurrency DB queries caused by memcached cache expiry

Last Update:2018-07-26 Source: Internet

Author: User

Tags current time memcached dataloader


I recently came across Nginx's merging of back-to-origin requests, which is somewhat similar to the ideas below. However, Nginx's approach is to limit concurrent requests once the cache has expired, rather than to refresh the cache in time when it is about to expire.

For Nginx back-to-origin request merging, see: http://blog.csdn.net/brainkick/article/details/8570698

Update:2015-04-23

======================

When a memcached entry expires, many concurrent requests can hit the DB at once, causing a sudden spike in DB load.

This post focuses on how to refresh the cache in time when an entry is about to expire, rather than on how to prevent highly concurrent DB queries after the entry has already expired.

In my view, writing fresh data into memcached shortly before the entry expires is the best way to avoid the burst of concurrent DB queries at the moment of expiry. The question, then, is how to know in time that an entry is about to expire.

There are several ways to solve this problem.

For the examples below, assume a key AAA with an expiration time of 30s.

1. Periodically query the data from the DB and write it into memcached

One drawback of this approach is that some business keys are variable and cannot be determined in advance.

It is also hard to define which data should be queried and put into the cache, because hot and cold data are difficult to tell apart.

2. When the cache returns null, take a lock before querying the DB, so that only one thread queries the DB

This approach is not very reliable, so I won't discuss it much. With multiple web servers, concurrent DB queries are still possible, because an in-process lock only covers a single server.
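As a rough illustration of method 2 within a single process, the sketch below lets at most one thread per key run the loader on a cache miss. The class name `LockedLoadCache` and the `dbLoader` parameter are hypothetical, and a plain in-memory map stands in for memcached.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** In-process sketch: on a cache miss, at most one thread per key queries the DB. */
final class LockedLoadCache {
    // Stand-in for memcached; a plain map keeps the example self-contained.
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    // One lock object per key, so unrelated keys do not block each other.
    private final ConcurrentHashMap<String, Object> locks = new ConcurrentHashMap<>();

    String get(String key, Function<String, String> dbLoader) {
        String value = cache.get(key);
        if (value != null) {
            return value; // cache hit, no locking needed
        }
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            // Re-check: another thread may have loaded the value while we waited.
            value = cache.get(key);
            if (value == null) {
                value = dbLoader.apply(key);
                if (value != null) {
                    cache.put(key, value);
                }
            }
            return value;
        }
    }
}
```

The lock lives inside one JVM, so two web servers can still query the DB at the same time, which is the weakness noted above.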

3. When writing a value to memcached, also write the current machine time alongside it as the expiration timestamp

When a get returns data, if (current time - stored time) > 5s, start a background task to query the DB and update the cache.

Of course, the background task must guarantee that, for a given key, only one thread is querying the DB at a time. Otherwise this is still a burst of concurrent DB queries.

The drawback is that the timestamp has to be serialized together with the value, and the data has to be deserialized again after it is fetched, which is inconvenient.
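A minimal sketch of method 3, under some assumptions: the stored timestamp is taken to be the write time, the refresh threshold is a constructor parameter, and an in-memory map stands in for memcached. The names `TimestampedCache`, `refreshAfterMillis`, and `dbLoader` are hypothetical. The clock and executor are injected only so the behaviour can be exercised deterministically.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.function.LongSupplier;

final class TimestampedCache {
    /** The value and the machine time at which it was written, stored together. */
    private static final class Entry {
        final String value;
        final long writtenAtMillis;
        Entry(String value, long writtenAtMillis) {
            this.value = value;
            this.writtenAtMillis = writtenAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Entry> store = new ConcurrentHashMap<>();
    // Keys with a refresh in flight; add() returns false if the key is already present.
    private final Set<String> refreshing = ConcurrentHashMap.newKeySet();
    private final LongSupplier clockMillis;
    private final Executor background;
    private final long refreshAfterMillis;

    TimestampedCache(LongSupplier clockMillis, Executor background, long refreshAfterMillis) {
        this.clockMillis = clockMillis;
        this.background = background;
        this.refreshAfterMillis = refreshAfterMillis;
    }

    void put(String key, String value) {
        store.put(key, new Entry(value, clockMillis.getAsLong()));
    }

    String get(String key, Function<String, String> dbLoader) {
        Entry entry = store.get(key);
        if (entry == null) {
            return null;
        }
        boolean stale = clockMillis.getAsLong() - entry.writtenAtMillis > refreshAfterMillis;
        // Only the caller that manages to register the key starts the refresh task.
        if (stale && refreshing.add(key)) {
            background.execute(() -> {
                try {
                    put(key, dbLoader.apply(key));
                } finally {
                    refreshing.remove(key);
                }
            });
        }
        return entry.value; // the caller gets the current value without waiting
    }
}
```

The `refreshing` set only deduplicates refreshes inside one process; it does not coordinate across servers.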

Most articles online cover the first two methods, and a few cover the third. A method based on two keys is presented below:

4. Two keys, one key for storing data, and another to mark the expiration time

For example, if the key is AAA with an expiration time of 30s, then the other key is EXPIRE_AAA with an expiration time of 25s.

When fetching data, use multiget to read both AAA and EXPIRE_AAA. If the value of EXPIRE_AAA == NULL, start a background task to query the DB and update the cache, similar to method 3 above.

When the background task queries the DB and updates the cache, how do we ensure that only one thread is doing so for a given key?

Within a single process, simply use a lock. The thread that gets the lock queries the DB and updates the cache; threads that do not get the lock return immediately.

For a clustered deployment, how do we allow only one task to execute?

This is where memcached's add command comes in.

The add command stores the value and returns true if the key does not exist. If the key already exists, it does not store anything and returns false.

When get EXPIRE_AAA returns null, add EXPIRE_AAA with an expiration time chosen to suit your needs, for example 3 seconds.

If the add succeeds, query the DB. Once the data is fetched, set EXPIRE_AAA with an expiration time of 25 seconds and set AAA with an expiration time of 30 seconds.
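The two-key idea can be sketched as follows. To stay self-contained, the sketch uses a small in-memory stand-in (`FakeMemcached`, hypothetical) that only mimics get/set/add with TTLs in seconds; it is not a real client. `TwoKeyCache`, the TTL constants, and `dbLoader` are likewise illustrative names, and the two sequential gets take the place of a multiget.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** In-memory stand-in for memcached get/set/add with TTLs (illustrative only). */
final class FakeMemcached {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>();
    private final LongSupplier clockSeconds;

    FakeMemcached(LongSupplier clockSeconds) {
        this.clockSeconds = clockSeconds;
    }

    synchronized String get(String key) {
        Long until = expiresAt.get(key);
        return (until != null && until > clockSeconds.getAsLong()) ? values.get(key) : null;
    }

    synchronized void set(String key, int ttlSeconds, String value) {
        values.put(key, value);
        expiresAt.put(key, clockSeconds.getAsLong() + ttlSeconds);
    }

    /** Stores only when the key is absent or expired; returns true if it stored. */
    synchronized boolean add(String key, int ttlSeconds, String value) {
        if (get(key) != null) {
            return false;
        }
        set(key, ttlSeconds, value);
        return true;
    }
}

final class TwoKeyCache {
    static final int DATA_TTL = 30;   // AAA lives for 30s
    static final int MARKER_TTL = 25; // EXPIRE_AAA lives for 25s
    static final int LOCK_TTL = 3;    // short TTL while a refresh is in flight

    private final FakeMemcached mc;
    private final Executor background;

    TwoKeyCache(FakeMemcached mc, Executor background) {
        this.mc = mc;
        this.background = background;
    }

    void put(String key, String value) {
        mc.set(key, DATA_TTL, value);
        mc.set("EXPIRE_" + key, MARKER_TTL, "1");
    }

    String get(String key, Function<String, String> dbLoader) {
        String value = mc.get(key); // a real client could multiget both keys
        String marker = mc.get("EXPIRE_" + key);
        // Marker gone: the data is about to expire. Only the caller whose add succeeds refreshes.
        if (marker == null && mc.add("EXPIRE_" + key, LOCK_TTL, "1")) {
            background.execute(() -> put(key, dbLoader.apply(key)));
        }
        return value;
    }
}
```

If the refresh task fails, the marker written by add expires after `LOCK_TTL` seconds and a later caller can try again.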

To sum up, here is the whole process:

For example, a key is AAA with an expiration time of 30s, and querying the DB takes less than 1s.

When putting data, set AAA with an expiration time of 30s and set EXPIRE_AAA with an expiration time of 25s.

When getting data, multiget AAA and EXPIRE_AAA. If the value of EXPIRE_AAA != NULL, return the data for AAA directly to the user. If the value of EXPIRE_AAA == NULL, start a background task that tries to add EXPIRE_AAA with a timeout of 3s. The 3s timeout guards against the background task failing or blocking: if the task fails, then after 3 seconds the next user access can try to query the DB again. If the add succeeds, query the DB, update the cache for AAA, and set the timeout of EXPIRE_AAA to 25s.

5. Time is stored in value and combined with the add command to ensure that only one thread refreshes the data

update:2014-06-29

I have been rethinking this problem recently. I found that the two-key method (method 4) uses more memcached memory, because the number of keys doubles. Combining it with method 3, I redesigned it. The idea is as follows:

This scheme still uses two keys:

{key}

__load_{key}

Here __load_{key} acts as a lock: only the thread whose add succeeds is allowed to update the data. Its timeout is relatively short, so it does not occupy memcached memory for long.

In the value set into memcached, store a time alongside the actual value. This time is the future moment at which the key will expire in memcached, not the current system time. When the data is fetched, check whether it is about to expire: time - now < 5 * 1000, assuming the about-to-expire threshold is set to 5 seconds.

* If it is, a new thread is started in the background:
* It tries to add __load_{key}.
* If the add succeeds, it loads the new data and sets it into memcached.

* The original thread returns the value directly to the caller.
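The steps above can be sketched roughly as below. This is not the author's implementation: in-memory maps stand in for memcached, `addLoadLock` mimics an add on `__load_{key}`, and the 3s lock TTL is an assumption. The sketch also triggers a load on a cold miss, which the steps above do not spell out. All class, field, and parameter names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class RefreshAheadCache {
    static final long REFRESH_WINDOW_MILLIS = 5 * 1000; // "time - now < 5 * 1000"
    static final long LOAD_LOCK_TTL_MILLIS = 3 * 1000;  // assumed TTL of __load_{key}

    /** The value together with the future time at which its key expires. */
    private static final class Timed {
        final String value;
        final long expiresAtMillis;
        Timed(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Timed> data = new ConcurrentHashMap<>(); // stand-in for memcached values
    private final Map<String, Long> loadLocks = new HashMap<>();       // stand-in for __load_{key} entries
    private final LongSupplier clockMillis;
    private final Executor background;

    RefreshAheadCache(LongSupplier clockMillis, Executor background) {
        this.clockMillis = clockMillis;
        this.background = background;
    }

    void set(String key, long ttlMillis, String value) {
        data.put(key, new Timed(value, clockMillis.getAsLong() + ttlMillis));
    }

    /** Mimics memcached add on __load_{key}: succeeds only if the lock is absent or expired. */
    private synchronized boolean addLoadLock(String key) {
        long now = clockMillis.getAsLong();
        Long until = loadLocks.get("__load_" + key);
        if (until != null && until > now) {
            return false;
        }
        loadLocks.put("__load_" + key, now + LOAD_LOCK_TTL_MILLIS);
        return true;
    }

    String tryGet(String key, long ttlMillis, Supplier<String> loader) {
        long now = clockMillis.getAsLong();
        Timed timed = data.get(key);
        boolean live = timed != null && timed.expiresAtMillis > now;
        boolean aboutToExpire = !live || timed.expiresAtMillis - now < REFRESH_WINDOW_MILLIS;
        // Only the caller whose add succeeds schedules the reload.
        if (aboutToExpire && addLoadLock(key)) {
            background.execute(() -> set(key, ttlMillis, loader.get()));
        }
        return live ? timed.value : null; // the original caller is never blocked by the reload
    }
}
```

The lock is never released explicitly; like a short-lived memcached key, it simply times out.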

Following this idea, I wrapped it up with xmemcached as follows:

DataLoader, the callback interface that the user implements to load the data:

public interface DataLoader {
	// Loads fresh data from the backing store, e.g. the DB
	public <T> T load();
}

RefreshCacheManager, where the user only needs to care about these two functions:

public class RefreshCacheManager {
	// Signatures only; method bodies are omitted here
	static public <T> T tryGet(MemcachedClient memcachedClient, final String key, final int expire, final DataLoader dataLoader);
	static public <T> T autoRetryGet(MemcachedClient memcachedClient, final String key, final int expire, final DataLoader dataLoader);
}

If the value fetched by autoRetryGet is null, it automatically retries 4 times internally, waiting 500ms between attempts.
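The retry behaviour can be sketched generically as below. This is an illustration rather than the code from the gist: `RetryingGet` and its parameters are hypothetical, and "retry 4 times" is read here as up to 4 additional attempts after the first one.

```java
import java.util.function.Supplier;

final class RetryingGet {
    /**
     * Calls tryGet once, then retries up to maxRetries more times while the
     * result is null, sleeping intervalMillis between attempts.
     */
    static <T> T autoRetryGet(Supplier<T> tryGet, int maxRetries, long intervalMillis) {
        T value = tryGet.get();
        for (int attempt = 0; value == null && attempt < maxRetries; attempt++) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
                return null;
            }
            value = tryGet.get();
        }
        return value;
    }
}
```

With the values from the text, a call would look like `RetryingGet.autoRetryGet(() -> lookup(key), 4, 500)`, where `lookup` is whatever single-attempt get the caller already has.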

Internally, RefreshCacheManager automatically handles the logic of detecting data that is about to expire and refreshing it into memcached.

The detailed wrapper code is here: https://gist.github.com/hengyunabc/cc57478bfcb4cd0553c2

Summary:

I personally prefer method 5, because it is simple and intuitive. It uses less memory than method 4, and because it does not need mget, it avoids the trouble mget can cause when using a memcached cluster.

One advantage of this two-key approach is that it adapts naturally to hot and cold data. If the data is cold and nobody accesses it for 30 seconds, it simply expires.

If the data is hot and is accessed continuously, it keeps being refreshed and effectively never expires.

