If the cache fails, a large number of requests may hit the database directly at the same moment. How should I handle this at the code level?

I was recently asked this question, and I don't have hands-on experience in this area. My own idea is that since it takes some time to read the database and write the result back to the cache, some early requests may read the database directly. When one of those requests is about to write the data to the cache, it first checks whether the cache entry already exists: if it doesn't, write it; if it does, skip the write and just return the result. Roughly like this:
$cache = $mc->get($key);
if ($cache !== false) {
    return $cache;                        // cache hit
} else {
    $data = readFromDatabase($key);       // cache miss: read the database (placeholder helper)
    // re-check so a value written by an earlier request is not overwritten
    if ($mc->get($key) === false) {
        $mc->set($key, $data, 300);
    }
    return $data;
}

But after thinking it over, this doesn't really answer the problem of multiple requests reading the database at the same time. It stops the later requests from hitting the database directly, but the earlier requests still go straight to it. I wonder if you have a better solution. Any advice is appreciated.

Reply content:

The answer is in the question.

"If the cache fails"

The cache can be designed to be persistent from the start, so it cannot be penetrated simply because it fails. If the stability requirements are higher, add disaster recovery to the cache layer, such as Redis master-slave replication, RDB dumps, and so on.

If the system, or the feature that involves the cache, is going live for the first time, you need to preheat the cache in advance. There are many ways to do this, such as writing scripts or rolling out with a grayscale release.
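For example, a warm-up can be as simple as a one-off script run before traffic is switched over. A minimal sketch, assuming the PHP Memcached extension; fetchHotIds() and readFromDatabase() are hypothetical helpers:

// One-off warm-up script: preload hot keys before the feature takes traffic.
// fetchHotIds() and readFromDatabase() are hypothetical helpers for this sketch.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

foreach (fetchHotIds() as $id) {
    $data = readFromDatabase($id);
    $mc->set("item:$id", $data, 600);    // 10-minute TTL, adjust to the business
}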


"A large number of requests will access the database at a moment"

There are two questions here: one is how to handle a large number of requests arriving at once; the other is how to keep those requests from putting too much pressure on the database.

How to implement this depends on the business scenario, but the general approach is to avoid sudden bursts of requests in the first place, and then to make sure the requests that do get generated don't overload the database.

How to rate-limit depends on the business scenario; not every scenario is suitable for it, so there isn't much to say here. What you can always do is cap the number of connections going back to the database at what the database itself can accept. As long as the queries are reasonably designed, even high instantaneous concurrency generally won't cause any practical problems.
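One way to enforce such a cap in application code is a shared in-flight counter. A rough sketch, assuming the PHP Memcached extension; the key name and the limit are illustrative:

// Cap how many requests are allowed to query the database concurrently.
// Callers that fail to get a slot should fall back to cached data or an error.
function acquireDbSlot(Memcached $mc, int $max = 100): bool
{
    $mc->add('db:inflight', 0, 60);            // make sure the counter exists (resets at most every 60s)
    if ($mc->increment('db:inflight') > $max) {
        $mc->decrement('db:inflight');         // over the cap: back off
        return false;
    }
    return true;
}

function releaseDbSlot(Memcached $mc): void
{
    $mc->decrement('db:inflight');             // call after the query finishes
}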

How to solve this at the code level depends on how you design the system. Use a second-level cache: keep a copy of the data under a key with a longer expiration time. When an avalanche hits, take a lock so that only one PHP process reaches the database; the other processes see the lock, go no further, and return the data already in the cache. The effect is that a few users briefly see stale data and pick up the fresh data on the next refresh, while the process that grabbed the lock sees the latest data. The lock can be implemented with Memcache's add(). Back at my previous employer this setup carried NBA live streams with concurrency of a good few tens of thousands; the database was fine, bandwidth was the problem. Facebook devoted a paper, "Scaling Memcache at Facebook", to this issue:

3.2.1 Leases
We introduce a new mechanism we call leases to address two problems: stale sets and thundering herds.

Which "thundering herds" is the landlord mentioned the database penetration problem, a hot cache if the failure of the first access to the database request to get results write cache, the period of a large number of requests to wear to the database, and then "stale set" is a data consistency problem, If an instance updates the data to refresh the cache while another instance reads Miss attempts to read the database, the two cache write order is not guaranteed and may result in stale data being written to the cache.

Both of these issues are intrinsic to a look-aside cache, and some mechanism is needed to coordinate cache writes. The paper's answer is the lease mechanism: a key may only be written to the cache by the client holding a unique lease for it:
    • If a get on a key misses, the server returns the client a 64-bit lease token;
    • Until that key is written, any other get request receives a "hot miss" error; the client uses it to decide to retry a little later instead of going to the database for the data;
    • If the key receives a delete, the lease is invalidated. A set carrying the invalidated lease still succeeds, but subsequent get requests again return a hot-miss error together with a new lease. The hot-miss error also carries the last value, marked as stale, and leaves it to the client to decide whether to use it; where consistency requirements are loose, this further reduces database requests.

In this way the Memcache server coordinates access to the database and addresses both issues.

But the lease scheme is not perfect, because 1. it requires modifying Memcache; 2. it still leaks logic to the client, which has to follow the lease and hot-miss conventions.
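Without modifying Memcache, the simpler lock-plus-stale-copy approach described above can be sketched roughly like this, assuming the PHP Memcached extension; the "stale:" second-level key, the TTLs and readFromDatabase() are illustrative, not part of the paper:

function getWithMutex(Memcached $mc, string $key)
{
    $value = $mc->get($key);
    if ($value !== false) {
        return $value;                              // fresh cache hit
    }
    // Only the process that wins add() rebuilds the cache from the database.
    if ($mc->add("lock:$key", 1, 10)) {             // lock expires by itself after 10s
        $value = readFromDatabase($key);            // hypothetical helper
        $mc->set($key, $value, 300);                // primary key, short TTL
        $mc->set("stale:$key", $value, 3600);       // second-level key, longer TTL
        $mc->delete("lock:$key");
        return $value;
    }
    // Lost the race: return the stale second-level copy instead of hitting the database.
    return $mc->get("stale:$key");
}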

"TAO: Facebook's Distributed Data Store for the Social Graph", a later Facebook paper, describes this as one of the problems the TAO system tries to address:

Distributed Control Logic: In a lookaside cache architecture the control logic is run on clients that don't communicate with each other. This increases the number of failure modes, and makes it difficult to avoid thundering herds. Nishtala et al. provide an in-depth discussion of the problems and present leases, a general solution [21]. For objects and associations the fixed API allows us to move the control logic into the cache itself, where the problem can be solved more efficiently.

In other words, if we give up the look-aside pattern and wrap cache writes behind a single entry point (a write-through cache), there is no need to coordinate all the clients in a distributed fashion; queueing in one place is enough.

References
    • Scaling Memcache at Facebook
    • TAO: Facebook's Distributed Data Store for the Social Graph
    • https://www.quora.com/how-does-the-lease-token-solve-the-stale-sets-problem-in-facebooks-memcached-servers
Choose a scheme according to the actual scenario; putting the requests that do reach the database through a queue and locking identical requests is a relatively safe solution.
This problem has a name in the industry: the avalanche effect. A cluster is fine under normal load, but once a few servers go down, the overload pressure lands on the backend database and the whole cluster collapses like an avalanche.
Read-heavy, write-light business scenarios:
For this kind of problem, the cache has to shield the database server. The first part of the solution is to lock at the cache-server level on the SQL query conditions, so that for identical requests only one is forwarded to the backend database while the others block and wait for the data to be refreshed.
The second part is to smooth out the peak caused by these cache expirations instead of letting them query the database directly: every request the cache server makes to the database goes through a queue, spreading the peak over a longer period and sparing the backend database the spike.
In addition, to keep a large batch of caches from expiring and being regenerated together again, the cache server should, when setting an expiration, request a token from a time-token distribution server and add a per-token extra cache time on top of the base cache time, spreading the expirations over a longer window.
This kind of handling prevents peak requests from overwhelming the database.
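A minimal sketch of spreading out expirations, assuming the PHP Memcached extension; the random extra time stands in for the token-based extra cache time described above, and the numbers are illustrative:

// Add a per-key extra cache time on top of the base TTL so that a batch of keys
// written together does not expire at the same moment.
function setWithSpreadTtl(Memcached $mc, string $key, $value, int $baseTtl, int $maxExtra = 300): bool
{
    $ttl = $baseTtl + random_int(0, $maxExtra);   // e.g. 600s base plus up to 300s extra
    return $mc->set($key, $value, $ttl);
}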
Write-heavy business scenarios:
When data is updated frequently and real-time requirements are high, such as a live-streaming room or a flash sale, cache expiration times are very short. You need to think about splitting the business, routing the same business to its own servers, and concentrating the write load on a small dedicated cluster as much as possible. Which techniques apply depends heavily on the business. Something like a live room may not need to touch the database server at all: the business runs entirely on the cache, and whatever must be persisted to the database is thrown onto an asynchronous queue and processed slowly. E-commerce flash sales must be validated, which is different, but the idea is the same: when the instantaneous load is too large, add a CAPTCHA and a queue. In short, delay the moment the order is actually submitted as much as possible, and spread the instantaneous load over a period the system can accept. At the database level it is still locking plus queueing, but the queue itself also needs a lock: once the queue length exceeds a multiple of the total stock (the multiple is worked out from the historical order-to-payment success rate), lock it and stop letting later requests into the queue, as sketched below.
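A rough sketch of that queue cap, assuming phpredis; the key name, stock figure and multiple are illustrative business inputs:

// Stop enqueueing once the queue already holds enough orders for the available stock.
function tryEnqueueOrder(Redis $redis, string $queueKey, array $order, int $stock, float $multiple = 1.5): bool
{
    if ($redis->lLen($queueKey) >= (int) ($stock * $multiple)) {
        return false;                                // queue is "locked": reject further requests
    }
    $redis->rPush($queueKey, json_encode($order));
    return true;
}

Requests rejected here can be answered straight away with a "sold out" page and never touch the database.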
There are also extremely write-heavy scenarios, such as location updates in LBS products or message pushes in SNS products. The general idea is the same: lock and queue, and at the database level let write-optimized NoSQL take on the persistence as far as possible. Failing that, minimize the use of indexes, for example by using MySQL as a plain key-value store, to keep the extra write cost down. Whether you handle high concurrency at the DB level or at the interface layer is really a matter of macro versus micro. The concurrency you perceive is multiple requests reading the database at the same time, but to the computer they never arrive at exactly the same instant; they are only simultaneous at the macro level, say within 1-10 milliseconds or microseconds. How does the computer cope? Roughly, it hands computation, I/O and memory allocation to separate processes and schedules them, so while you are reading the database other I/O operations are in flight at the same time. You can imitate that design yourself.

For example, when you write the cache, push the write into a separate process: append it to a cache-write queue, and run a supervising process that orders the queue and handles the failure branches, such as moving tasks forward or backward, or moving on to the next task (or retrying) when a cache write fails. Every successfully written piece of cache data is pushed back into the cache pool, so the queue holds all the cache-update tasks without blocking the main process, that is, the process serving user requests. User requests only read from the cache pool and don't care how fresh it is. Even if writes are slow and the data is a little old, or the database goes down entirely, reading data is not affected.
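A minimal sketch of such a cache-write worker, assuming phpredis and the PHP Memcached extension; the queue name and payload format are illustrative:

// Worker process: consumes cache-update tasks from a queue and refreshes the cache pool,
// so the user-facing processes never block on cache writes.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

while (true) {
    $task = $redis->blPop(['cache:update:queue'], 5);     // wait up to 5s for a task
    if (!$task) {
        continue;                                          // nothing queued, keep polling
    }
    [, $payload] = $task;                                  // blPop returns [queue, value]
    $item = json_decode($payload, true);
    if (!$mc->set($item['key'], $item['value'], $item['ttl'] ?? 600)) {
        $redis->rPush('cache:update:queue', $payload);     // write failed: requeue and move on
    }
}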

Concurrency is all about which time scale you look at. The computer handles it by itself; you don't have to worry too much. The business code only needs to solve the blocking problem.

PS: I'm a front-end developer, just rambling out of boredom over the New Year... feel free to collapse this answer.

To add: if the cache fails, the thing to solve is cache disaster recovery, such as fast switching between multiple cache pools, load balancing and so on. A super-large number of requests hitting the database directly can be handled with multiple machines, multiple databases and multiple reading processes. I don't think this can be solved well purely at the code level...

Because the solution is to spread the requests across more machines; without scaling out you can't expect to support high concurrency and huge request volumes... You can't just make the donkey run without feeding it grass, and no amount of grooming the donkey will make up for it...

-------

Professionally speaking: after such an avalanche the cache is penetrated and traffic goes straight to the DB layer. In practice the first thing to fall over is the bandwidth; the DB is usually not the bottleneck. You can control reads to the database with a maximum connection count, absorb the overflow with a second-level cache, and follow up with alerting or a dynamic scaling mechanism. (Asked a back-end colleague.) The throttling solutions are in the books; that little book (I forget the title, the one with a car on the cover) introduced this question and proposed several solutions. Three books: operating systems, database principles, distributed systems. It's an old question.
1. Assuming a single-machine cache, a lock mechanism is enough, because your problem is reading data, nothing else. Whether it's multi-process or multi-threaded, locks are available.
2. Use a buffer pool to reduce waiting latency.
3. The above is operating-system material.
4. If there are concurrent updates as well, consider data consistency, i.e. pessimistic or optimistic locking.

1. The service itself should provide overload protection and a circuit-breaker mechanism; in the end it comes down to rate limiting: know how to protect yourself.
2. Lock when rebuilding the cache: a cache mutex.