[Original] Cache Micro-Application Experience in Distributed Systems (IV) [Interaction Scenarios]



 


Preface

I have been busy these past few months and have hardly had an idle moment all year. Now, deep into the autumn of 2018, I can only sigh that time slips away, and I am not sure what I have gained or lost. I recently took a short break and bought two books unrelated to technology. One of them, The High Mountains of Portugal by Yann Martel, turned out to require some patience to read: it is a deep metaphor for life and leaves plenty of blank space for the reader. If you are interested, take a look. Now let's get back to the subject and try to write down some practical experience and thoughts on caching technology from my work.


Body

In distributed web programming, the key technologies for handling high concurrency and internal decoupling are, almost without exception, caches and queues. The cache plays a role similar to the multi-level caches of a CPU in computer hardware. In today's Internet projects of even modest business scale, caching is usually planned for from the earliest development phase. However, many application scenarios also bring some costly technical problems that need to be weighed carefully. This series focuses on server-side caching technologies in distributed systems and discusses the related details with interested readers. If there is anything wrong in this article, please correct me.

  To keep each article in the series readable on its own, please forgive a bit of personal obsessiveness in the formatting of the content.

This fourth article is intended to be the final part of the series. Here we will talk about some of the cache's concurrent interaction scenarios, mainly its interaction with the database (the traditional RDBMS in particular), plus some supplementary handling for high-concurrency scenarios (where a concrete implementation is involved, Redis is used as the example).

For the previous article, see: Cache Micro-Application Experience in Distributed Systems (III) (data sharding and clusters)
https://yq.aliyun.com/u/autumnbing
https://www.cnblogs.com/bsfz

  I. A brief discussion of the interaction between the cache and the database

To facilitate the discussion that follows, the database referred to in this article is the traditional RDBMS, identified below as DB; note that this is different from the DB partition space inside the cache itself.

In the earlier article on cache design details, I covered some specifics of the cache's own CRUD operations. Here, bringing the database in, we will discuss CRUD operations performed on the DB and the cache in parallel. At the interaction level this certainly touches on consistency topics related to distributed transactions, but to avoid fuzzy wording and unnecessarily broadening the scope, I try to strip that apart here and focus on the cache-oriented handling.

Let us abstract a basic scenario in advance: the DB contains a funds association table (FT), and FT stores hotspot entries (data that is accessed frequently). In the system design, the data in FT is associated with a corresponding cache service C1 (referring here to a single level-1 cache only) to improve concurrent query performance.


   1.1 Adding (Create) a record to FT

Insert a record into FT via SQL. If the insertion fails, nothing needs to be done to C1. If the insertion succeeds, decide whether the data should also be written to C1 synchronously.

This scenario is generally simple. Unless there is a special reason, there is no need to actively insert into C1 at this moment; the entry will still be populated passively (as mentioned later, when it is first read). However, if the data just inserted into FT is afterwards only ever deleted rather than updated, and FT data is often operated on in batches, then synchronously inserting into C1 is advisable.

(PS: Incidentally, if you decide to insert into C1 but the write fails, add a retry mechanism appropriate to the business scenario; the cache operations discussed later all implicitly include this. If the retries still fail, for example when writing a single entry into C1, I suggest not over-engineering the handling: treat the overall operation as successful by default and return the corresponding status, rather than dragging in an analogy with distributed-transaction consistency.)
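To make the retry idea concrete, here is a minimal sketch in Python, assuming a redis-py client and hypothetical key names and TTLs (none of these appear in the original text): a small helper that tries to write the C1 entry a couple of times and then gives up quietly, since the FT insert has already succeeded.

    import time
    import redis

    # Hypothetical connection details; adjust to your environment.
    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def set_cache_with_retry(key, value, ttl_seconds=3600, retries=2):
        """Write an entry into C1, retrying a couple of times on failure.

        If the retries are exhausted, do not escalate: the FT insert already
        succeeded, so the overall operation is reported as successful and the
        entry will be populated passively on the next read.
        """
        for attempt in range(retries + 1):
            try:
                r.set(key, value, ex=ttl_seconds)
                return True
            except redis.RedisError:
                if attempt < retries:
                    time.sleep(0.05 * (attempt + 1))  # brief backoff before retrying
        return False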


   1.2 Updating (Update) a record in FT

When a record in FT needs to be updated, the corresponding data in C1 becomes invalid; but in a highly concurrent environment C1 cannot simply be updated blindly. The first thing to consider is whether C1 should be updated actively or passively. Active update means that after FT is updated, the new data is written over the entry in C1; passive update means that after FT is updated, the entry in C1 is immediately evicted and only re-written into C1 on the next query.

As soon as these requests run concurrently, for example two identical update actions, Action 1 and Action 2, arriving at the same time, inconsistency can occur: Action 1 operates on FT first and Action 2 second, but Action 2 then operates on C1 first and Action 1 last.

When more than one thread updates FT data concurrently, you cannot guarantee that the order of FT updates matches the order of C1 updates; the result is inevitably a fair amount of stale, dirty data in C1 relative to FT, and the probability rises sharply when the cache itself has master-slave replication (see the replication section in the previous article). The recommended approach is therefore to simply evict the entry in C1: it avoids overwriting the cache repeatedly while still letting the cache be fully hit at the appropriate time.

That is still not the end of it. Once we decide to evict the C1 entry, we have to choose when to evict: either update FT first and then evict from C1, or evict from C1 first and then update FT.

Each order has scenarios it suits, but there is a probability trade-off to weigh: if C1 is evicted first and a concurrent query on C1 arrives before FT has been updated, that query pulls the old data from the database and writes it back, leaving dirty data in C1; as concurrency grows, the probability that the data stays dirty grows with it. Therefore the former order (update FT first, then evict from C1) is recommended.

(Note: There are further details here that I do not intend to expand on in this topic, such as atomicity between C1 and FT, whether two-phase/three-phase commit can be used to simulate a transactional mode, and the impact on the business.)
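As an illustration of the recommended order (update FT first, then evict from C1), here is a minimal sketch. It assumes a generic DB-API connection `conn` with %s-style placeholders, the redis-py client `r` from the sketch in 1.1, and placeholder table and column names.

    def update_ft_record(conn, record_id, new_payload):
        """Passive update: write FT first, and only evict the C1 entry
        once the database update has committed successfully."""
        cur = conn.cursor()
        cur.execute(
            "UPDATE ft SET payload = %s WHERE id = %s",
            (new_payload, record_id),
        )
        conn.commit()

        # Evict rather than overwrite; the next read repopulates C1.
        try:
            r.delete(f"ft:{record_id}")
        except redis.RedisError:
            pass  # optionally retry, as in the helper shown in 1.1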


   1.3 Reading (Read) a record

There is nothing particularly special here; after all, the purpose of using a cache already implies the rule "read the cache first, then the DB". That is, the data is first fetched from C1; if C1 does not contain it, FT is queried; if the data does not exist there either, an empty result is returned directly; if it does exist, it is written back into C1 and the corresponding result is returned.
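The read path can be sketched as follows, again assuming the hypothetical `r` client from 1.1, a DB-API connection `conn`, and placeholder table and key names; the TTL is purely illustrative.

    import json

    CACHE_TTL = 600  # illustrative expiry for C1 entries

    def read_ft_record(conn, record_id):
        """Read C1 first, then FT; backfill C1 on a database hit."""
        key = f"ft:{record_id}"
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)

        cur = conn.cursor()
        cur.execute("SELECT id, payload FROM ft WHERE id = %s", (record_id,))
        row = cur.fetchone()
        if row is None:
            return None  # not in the DB either; see 2.2 for caching this miss

        data = {"id": row[0], "payload": row[1]}
        r.set(key, json.dumps(data), ex=CACHE_TTL)  # passive insertion into C1
        return data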

Some may say that in certain scenarios, even if C1 holds the data, it should be fetched from FT first. I agree, but be careful not to confuse the topic: that is essentially a business-result-driven choice, just as with traditional RDBMS read/write splitting, where verification of critical data goes directly to the master database. So it does not contradict what is described above; we should keep the two discussions separate to avoid confusion.


   1.4 Deleting (Delete) a record from FT

In contrast to Create, remove a record from FT via SQL: if the removal fails, nothing needs to be done to C1; if the removal succeeds, the corresponding entry in C1 is evicted as well (again, compare the details in 1.2).


  II. Problems related to cache breakdown, penetration, and avalanche


As a project evolves into its later stages, some business scenarios run under sustained high concurrency, and the large QPS places heavy demands on the overall service load. To avoid defeating the original intent of the architectural optimization, a great deal of advance avoidance and fine-grained control is also needed within the project.


   2.1 Preventing cache breakdown

Cache breakdown occurs when the key being queried does exist in the cache, but the entry expires at a moment when a large number of concurrent requests for it arrive: they all miss the cache and go to the database together, causing a sharp spike in DB pressure within a very short time.

Prevention and optimization of this problem usually proceed along three lines. First, use fine-grained locks/semaphores in the program so that only one request rebuilds the entry (last year I wrote an article about inventory concurrency control in a mall system that extends this specific topic, see: https://www.cnblogs.com/bsfz/); a sketch of this approach follows below. Second, keep the window between reading from the DB and writing back to the cache as small as possible. Third, for extremely hot data, use a larger expiration time and randomize it (this is not mandatory; weigh it yourself). In a few cases, within the constraints of the scenario, you can also add an appropriate automatic refresh policy, for example a dedicated thread in the program that refreshes and maintains the entry.
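Below is a minimal sketch of the first approach, using a short-lived Redis lock (SET with NX and EX) so that only one request rebuilds the expired hot key while the others wait briefly and re-check the cache. It reuses the hypothetical `r` client from 1.1 and the `read_ft_record` helper from 1.3; the lock TTL and wait interval are illustrative.

    import json
    import time

    def read_hot_key_with_lock(conn, record_id, lock_ttl=5, wait=0.05):
        """Rebuild an expired hot entry under a small-granularity lock."""
        key = f"ft:{record_id}"
        lock_key = f"lock:{key}"

        while True:
            cached = r.get(key)
            if cached is not None:
                return json.loads(cached)

            # Only the request that wins the lock queries FT and refills C1.
            if r.set(lock_key, "1", nx=True, ex=lock_ttl):
                try:
                    return read_ft_record(conn, record_id)  # sketch from 1.3
                finally:
                    r.delete(lock_key)

            time.sleep(wait)  # lost the race: pause briefly, then re-check C1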


   2.2 Preventing cache penetration

Cache penetration occurs when the key being queried is missing from the cache, so the lookup naturally goes to the DB; that by itself is fine, but if the key does not exist in the DB either, then every such request ends up hitting the database. This problem is common; even when concurrency is not very high, the number of DB connections can easily reach its upper limit, which defeats the design goal of improving QPS.

The remedy can also proceed along two lines. First, when the program looks the key up in the database for the first time and finds nothing, it can directly cache a null value or a sentinel marker (a sketch follows below); I also suggest giving it a small, randomized expiration time to avoid occupying memory for long. Second, the program can filter out keys that cannot possibly exist, for example using the Bloom-filter idea, and in particular make an intermediate judgment as early as possible when the front-end request reaches the back end (e.g. return null directly for an obviously invalid key).
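Here is a sketch of the first remedy, caching a sentinel for keys that are absent from the DB, with a short random expiration. The sentinel value and the 60–120 second range are illustrative assumptions; `r` and `read_ft_record` are the hypothetical client and helper from the earlier sketches.

    import json
    import random

    NULL_MARKER = "__nil__"  # hypothetical sentinel meaning "not in the DB"

    def read_with_null_guard(conn, record_id):
        """Stop repeated lookups of non-existent keys from reaching the DB."""
        key = f"ft:{record_id}"
        cached = r.get(key)
        if cached == NULL_MARKER:
            return None
        if cached is not None:
            return json.loads(cached)

        data = read_ft_record(conn, record_id)  # backfills C1 on a DB hit
        if data is None:
            # A short, randomized expiry keeps junk keys from lingering in memory.
            r.set(key, NULL_MARKER, ex=random.randint(60, 120))
        return data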


   2.3 Controlling cache avalanche

Some of the details resemble the previous cases, but not entirely. A cache avalanche occurs when the cache becomes unavailable as a whole, or when a large amount of data in the same scenario expires at the same moment, so that batches of requests hit the database directly; at that point it is as if no caching measures were in place at all.

To avoid this extreme problem, three aspects are worth considering. First, improve the high-availability mechanism of the cache itself, with separate operations monitoring and alerting. Second, as with the earlier point, randomize cache expiration times, especially in scenarios involving pushes and batch operations (PS: you can see similar designs in many places to reduce the probability, and a simplified version is usually included in the initial stage of a project design); a sketch follows below. Third, add a multi-level cache in some scenarios, though this often introduces other problems (such as synchronization between the levels), so I recommend adding the second level first and then adjusting its expiration time relative to the level-1 cache.
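Randomizing expiration times during batch writes is straightforward; a minimal sketch follows, where the base TTL and jitter range are illustrative and `r` is the hypothetical client from 1.1.

    import json
    import random

    BASE_TTL = 1800   # illustrative base expiry for a batch refresh
    TTL_JITTER = 300  # spread expirations over an extra 0-300 seconds

    def cache_batch_with_jitter(items):
        """Write a batch of entries whose expiries are staggered so they
        do not all lapse at the same instant and stampede the database."""
        for record_id, payload in items.items():
            ttl = BASE_TTL + random.randint(0, TTL_JITTER)
            r.set(f"ft:{record_id}", json.dumps(payload), ex=ttl)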

 


Conclusion

Since my personal ability and experience are limited, I am still learning and practicing; if there is anything wrong in this article, please correct me. This series has now come to an end, and as I am busy with other matters, I may not write about related topics for a while.

My current blog addresses:
Community 1: https://yq.aliyun.com/u/autumnbing
Community 2: https://www.cnblogs.com/bsfz/

 

End.

 

 

