[Original] Experience with the Micro-Application of Caching in Distributed Systems (Part 1): Design Details

Preface

I've been busy with all sorts of small things in recent months; in fact, I haven't had much idle time for nearly a year. The autumn of 2018 has been hectic, and I can't help sighing that time always flies, without quite knowing what I've gained and what I've lost. Recently I took a short break and bought two books unrelated to technology. One is Yann Martel's "The High Mountains of Portugal"; I found it takes some patience to read, its metaphors about life run deep, and it leaves plenty unsaid, so interested friends may want to savor it slowly. OK, down to business: let me try to write up some practical experience and thinking about caching technology from my work.


Body

In distributed web development, the key technologies for handling high concurrency and internal decoupling are inseparable from caches and queues, and the role of a cache is similar to that of the CPU cache in computer hardware. Larger internet projects today reserve room for caching in the design even in the initial beta version. But in many application scenarios there are costly technical problems that call for careful trade-offs. This series focuses on server-side caching technology in distributed systems, and also covers details I have thought through in exchanges with friends. If anything in the text is inaccurate, please correct me.

  This first article tries to cover, as much as possible, the underlying design of the cache itself and the relevant operational details, using Redis as the main example.

  I. A brief explanation of basic characteristics and technical costs (this article mainly refers to server-side data caching)


   1.1 A basic classification

Caches can be classified by various criteria; local cache versus distributed cache is one common classification, and each contains many finer-grained categories.

"Local" does not strictly mean that the program runs on the local server; more precisely, it refers to storage space inside the program's own process, whereas "distributed" emphasizes storage in separate processes on one or more servers that communicate with one another. In the design of real software projects, the two are mixed most of the time.

(Of course, I believe that understanding the nature of caching matters most; the conceptual classification is merely a difference in how one draws the lines.)

1.2 Some technical costs


In concrete project architecture design, the development cost of the former (a local cache) is undoubtedly very low; the main considerations are the local memory load and, occasionally, a very small amount of disk I/O. The latter is designed to let distributed programs share and manage cached data efficiently. Besides the memory load of the cache server itself, the design must account for network I/O, CPU load, and, in some scenarios, disk I/O costs, while balancing overall stability and efficiency as much as possible at design time; the expense is not just the cache server's hardware cost and technical maintenance. The underlying issues that need careful consideration include the trade-offs among cache communication, network load, and latency.

In fact, if you understand the nature of caching, you will know that in an appropriate scenario any storage medium can play the role of an efficient cache for a project and be clustered. The mainstream memcached and Redis both belong to the latter (distributed) category, and one can even include NoSQL document databases such as MongoDB (though that is from a role perspective; it is a disk-based store, which requires attention and specialized handling). Integrating these third-party caches into a project and running cache clusters raises issues that must be addressed. When a project iterates into later stages, it often requires dedicated expertise in operations, and the logical design and code implementation also add a certain amount of work. So when designing a specific project, one should on the one hand reserve room for growth, and on the other keep things as simple as the actual situation allows.

One more piece of experience: within my limited technical study and practice, when it comes to data interaction between nodes, especially inter-service communication, there is no perfect closed loop; in theory it is always a trade-off toward "high consistency" at the "current stage" (probably much like life itself, ha, digression over).

II. Some design details of the cache database structure

  

(Since Redis 3.x is what I currently use for most of my work, the feature references below are based on it.)


2.1 Instances (instance)

Depending on the business scenario, public data and business-coupled data must use separate instances. With a single instance, you can consider dividing data by db. In Redis, a db provides data isolation but no strict permission limits, so selecting a library is merely a namespace choice. In cluster mode Redis keeps only the default single db, but in practice I adjust this to the size of the project and build the reservation into the design at whatever stage of development the project has reached.
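As a minimal sketch of that separation (assuming the redis-py client; the hosts, ports, and key names here are placeholders, not from the original text):

import redis

# Public, shared data lives in its own instance (or at least its own db index).
public_cache = redis.Redis(host="127.0.0.1", port=6379, db=0)

# Business-coupled data is isolated in a separate instance.
order_cache = redis.Redis(host="127.0.0.1", port=6380, db=0)

public_cache.set("region:list", "...")        # shared reference data
order_cache.set("order:1001:status", "paid")  # data coupled to one business module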

It is important to note that, for a product so heavily dependent on server memory, if persistence is enabled (more on this later) and server hardware resources are heavily contended while supporting a massively concurrent service, you should consider whether the instance's persistence policy is configured appropriately, including the option of writing to a separate disk. The essence of persistence is that in-memory data is written synchronously to the hard disk (flushed), and disk I/O is limited; forced write blocking can cause thread blocking and service timeouts, plus additional exceptions, and can even ripple into other underlying dependent services. My advice is that, if conditions permit, it is best to plan and settle this when the project is first designed.

2.2 Caching "tables" (table)

A cache generally has no intuitive table concept as in a traditional RDBMS (data often takes the form of key-value pairs, "KV"), but structurally, key-value pairs can be assembled into a variety of table structures. Normally I first design a database table diagram, analyze when to store strings and when to store objects, and then use the cache key (key) to split tables and fields (columns).

Suppose you need to store one row of a login server table containing the fields (columns) name, sign, and addr; then consider splitting the data into the following form:
{key: "Server:name", Value: "XXXX"}
{key: "Server:sign", Value: "YYYY"}
{key: "Server:addr", Value: "Zzzz"}

It is important to note that distributed cache products such as Redis offer many data structures (string, hash, and so on), and you need to choose the appropriate structure according to the correlation between fields and the number of columns; the storage space and time complexity of different structures are completely different, and this is hard to feel in the early stages.
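As a rough sketch of that choice (again assuming redis-py; the key names follow the example above), the same row can be stored either as separate strings or as a single hash:

import redis

r = redis.Redis()

# Option 1: one string key per column -- simple, but N keys per row.
r.set("server:name", "xxxx")
r.set("server:sign", "yyyy")
r.set("server:addr", "zzzz")

# Option 2: one hash per row -- fewer keys, and small hashes are stored
# compactly (ziplist encoding in Redis 3.x).
r.hset("server", mapping={"name": "xxxx", "sign": "yyyy", "addr": "zzzz"})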

At the same time, even if the cache's memory limit is set large enough and plenty remains, you still need to consider something like the single-table capacity problem in an RDBMS: the number of entries cannot be allowed to grow without bound (for example, when the stored entries can foreseeably reach the millions). The design idea of "sharding databases and tables" applies here as well.
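One common way to apply that idea (a sketch only; the bucket count of 16 and the key scheme are arbitrary choices for illustration) is to spread one large logical table across several smaller hash keys:

import redis

r = redis.Redis()
BUCKETS = 16  # illustrative; size it against the expected number of entries

def put_server(server_id: int, payload: str) -> None:
    # Shard one logical "server" table into BUCKETS hash keys.
    r.hset(f"server:{server_id % BUCKETS}", server_id, payload)

def get_server(server_id: int):
    return r.hget(f"server:{server_id % BUCKETS}", server_id)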

2.3 Cache Key (key)

Designing the table around the cache key was discussed above; here, separately, are my personal conventions for keys themselves. On the premise that keys stay reasonably short, keys belonging to the same business module must start with the same identifier (codename), which makes lookup and statistical management easy.
For example, a user-login server list:
{key: "ul:server:a", value: "xxxx"}
{key: "ul:server:b", value: "yyyy"}

In addition, each independent business system may consider a unique generic prefix identifier. This is not required, of course, and can be skipped if in practice you already isolate systems in different libraries.
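A small helper along these lines (the function name is hypothetical; the "ul" prefix follows the example above) keeps the convention in one place:

def cache_key(system: str, module: str, *parts: str) -> str:
    # e.g. cache_key("ul", "server", "a") -> "ul:server:a"
    return ":".join((system, module, *parts))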

    2.4 Cache values (value)

The sizes of cached values (here meaning single entries) are never uniform, but smaller is naturally better (with Redis, a single operation on a large value directly affects the response time of the whole instance, not just the network I/O). If a value's storage footprint reaches 10 MB or more, it is recommended to consider whether the associated business scenario can be split into hot and non-hot data.

    2.5 Persistence (persistence)

As briefly mentioned above, persistence is, generally speaking, not directly part of caching itself; you can roughly picture it as a hard disk standing behind memory. But in today's web projects, some business scenarios depend heavily on the cache, and persistence helps the cache service recover quickly after a restart, while also providing durable storage in specific scenarios. Of course, persistence sacrifices some performance, including CPU contention and hard-disk I/O. Most of the time, though, the benefits outweigh the drawbacks, and my recommendation is: when a cache is in play and there is no special reason otherwise, enable persistence where possible, whether through the cache's own mechanism or a third-party implementation.

In the case of Redis, which ships with its own persistence policies, AOF and RDB, I have in most cases configured both (and, of course, the latest official versions provide a mixed mode). In some non-high-concurrency scenarios, or in the management modules of small and medium projects where the cache is merely an optimization, persistence is not strictly necessary and can simply be turned off to save the performance cost; but in that case I recommend labeling the instance in the program, so that everyone sharing the instance understands its volatile nature.
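As a sketch of enabling both policies (these are standard Redis configuration directives, applied here at runtime via redis-py's config_set; the RDB save thresholds are illustrative, and in production they would normally live in redis.conf):

import redis

r = redis.Redis()

# AOF (append-only file) persistence.
r.config_set("appendonly", "yes")
r.config_set("appendfsync", "everysec")  # fsync once per second: a common latency/durability trade-off

# RDB snapshots: snapshot after 1 change in 900s, 10 in 300s, or 10000 in 60s.
r.config_set("save", "900 1 300 10 60 10000")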

2.6 Eviction (eviction)

If the cache grows without limit, then even with short expirations (expiration) set, a highly concurrent batch of big data can hit the ceiling of available memory within a short time at some point, causing heavy delays and errors in the program's interaction with the cache server, and even seriously destabilizing the server itself. Therefore, in a production environment, always configure a maximum memory limit for the cache, along with an appropriate eviction policy.

If you are using Redis, you have flexible built-in eviction policies to choose from. My own design: when the data access pattern resembles a power-law distribution, there is always a large amount of rarely accessed data, so I configure allkeys-lru or volatile-lru to evict the least-accessed data. If, say, the cache is used for logging, I generally configure the project with noeviction at first and later switch to volatile-ttl. I have also seen one unusual business design that used the cache directly as a lightweight persistent database, end to end; it felt novel at first, but later proved to fit the business very well (almost no complex logic and no strong transactions). So it is reasonable not to be imprisoned by traditional designs; after all, architecture always arises from the real-time combination of business and change.
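A minimal sketch of that configuration (again via config_set; the 2 GB limit is a placeholder to be sized against your hardware):

import redis

r = redis.Redis()

r.config_set("maxmemory", "2gb")
r.config_set("maxmemory-policy", "allkeys-lru")  # or volatile-lru, volatile-ttl, noeviction ...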


III. Basics of cache CRUD and related details (here I mainly discuss the first-level cache)

  3.1 Creation (create)

Unless there is a special business requirement (as mentioned above), an insert must set an expiration time. At the same time, try to keep expirations randomized. When caching in bulk, my personal approach is to make sure the configured expiration times are at least spread out, in order to reduce the risk and impact of a cache avalanche (which I will try to explain in a later installment).

For example, suppose the bulk cache target is a result set of 100,000 entries with a base cache time of 60*60*2 (sec), all of which need to be cached at the same time. My practice is to generate a random number, say random (range 0-1000), and set the expiration to (60*60*2 + random).
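As a sketch of that jitter (assuming redis-py; the entries dict is a stand-in for the real result set):

import random
import redis

r = redis.Redis()
BASE_TTL = 60 * 60 * 2  # 2 hours, in seconds

entries = {"ul:server:a": "xxxx", "ul:server:b": "yyyy"}  # placeholder data

for key, value in entries.items():
    # Spread expirations over BASE_TTL .. BASE_TTL + 1000 to avoid a cache avalanche.
    r.set(key, value, ex=BASE_TTL + random.randint(0, 1000))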

3.2 Modification (Update)

When updating a piece of cached data, note whether the expiration time needs to be readjusted. Also, in many cases, such as keeping multiple caches in sync, it is recommended to delete the cache entry directly instead of updating it. Modification operations are often tied to synchronization with the DB and are comparatively delicate, with distributed-transaction trade-offs to weigh; a later article will cover this.
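The delete-instead-of-update pattern, as a minimal sketch (update_db is a hypothetical stand-in for the real database write):

import redis

r = redis.Redis()

def update_db(server_id: str, addr: str) -> None:
    ...  # placeholder for the real DB write

def update_server_addr(server_id: str, addr: str) -> None:
    update_db(server_id, addr)            # 1. write the source of truth first
    r.delete(f"server:{server_id}:addr")  # 2. drop the stale entry; the next read repopulates it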

    3.3 Reads (read)

When looking up cached data, if multiple entries are involved and you know the amount of data is small, be sure to use a pattern that matches the key as exactly as possible, and try not to use wildcards. Even though the key data sent with the command gets longer, this avoids unnecessary search cost inside the cache.
For example, do not simply trust Redis's own storage optimizations and use the KEYS pattern without limit, regardless of its time complexity; it causes a great deal of thread blocking (this is unrelated to master-slave replication). The compromise of paging through SCAN instead is not a "worry-free" implementation either: first, set a modest count in the wrapping program code; second, be sure to handle the quirks of the returned data, such as duplicates, in the program logic.
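A sketch of the SCAN approach (redis-py's scan_iter wraps the cursor loop; the match pattern and count value are illustrative):

import redis

r = redis.Redis()

seen = set()  # SCAN may return a key more than once; deduplicate in program logic
for key in r.scan_iter(match="ul:server:*", count=500):
    if key in seen:
        continue
    seen.add(key)
    ...  # process the key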

In addition, the analogy of operating on large tables in the DB applies here as well: return the matched hot data in distributed batches.

3.4 Delete/empty (delete/clear)

For deleting a cache entry there are generally two ways: remove it directly, or set it to expire at a point in time (not the sliding expiration that extends on every access); no need to elaborate. (I have heard of one special business case: for bulk requests of the same kind of data with low immediacy requirements, the expiration times were set and slightly scattered.)

As for emptying the cache, I currently do not use it in projects, and do not even advocate using it directly. But if you do, two things need careful consideration: first, when to clean up; second, how long the clearing takes (in Redis, both FLUSHDB and FLUSHALL cause a degree of blocking).


3.5 Lock/Signal (Locking)

This is independent of caching itself and belongs to concurrency features with certain applicable scenarios. Redis has some atomicity-based implementations of this, but they are not relevant to this series of discussions. I wrote a related share last year, "Inventory control for order placement in a mall system (ii)" (www.cnblogs.com/bsfz/p/7824428.html), so I will not repeat it here.
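For reference, the common atomicity-based pattern is a single SET with the NX and EX options (a sketch only, not the method from the linked article; the token check-and-delete on release is shown non-atomically here, whereas a production lock would release via a Lua script):

import uuid
import redis

r = redis.Redis()

def acquire_lock(name: str, ttl: int = 10):
    token = uuid.uuid4().hex
    # NX: only set if absent; EX: expire so a crashed holder cannot block forever.
    if r.set(f"lock:{name}", token, nx=True, ex=ttl):
        return token
    return None

def release_lock(name: str, token: str) -> None:
    # Non-atomic check-then-delete, for illustration only.
    if r.get(f"lock:{name}") == token.encode():
        r.delete(f"lock:{name}")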

    3.6 Publish-Subscribe (publish-subscribe)

Why mention this action, which relates to production and consumption (produce-consume)? The mechanism is not part of caching itself but is closer to message queuing. The reason for bringing it up is that today's mainstream cache products ship with this feature, and in many scenarios it is convenient, easy to configure, and efficient, which is exactly why it often gets abused. The key problem is that the unnecessary strong coupling reduces overall flexibility and performance, and scalability is limited. That, at least, is my current view.

My advice is: unless a special scenario calls for it, try not to use it. At the very least I would not recommend the cache's own publish/subscribe, and in a cache cluster it requires even more delicate handling. The recommended approach is a dedicated middleware solution, such as an MQ-based product. Concrete candidates include excellent open-source works such as RabbitMQ and Kafka, as well as RocketMQ, developed in China by Alibaba and mentioned by friends over the last two years; personally I still use RabbitMQ the most. I will not go into more detail here; select the most appropriate technical solution for the scenario at hand.

Conclusion

This article ends here for now; the next one will try to expand the discussion around related topics.

PS: My ability and experience are limited, and I am still learning and practicing; if anything in this text is inappropriate, please correct me.

"Reserved placeholder: Experience on micro-application of Distributed System Cache" (ii) "Interactive scenario chapter" Www.cnblogs.com/bsfz/p/9568951.html "

End.
