Distributed cache refers to a cache service deployed across a cluster of multiple servers. There are two main architectures: the synchronized distributed cache, represented by JBoss Cache, and the non-communicating distributed cache, represented by Memcached.
1. JBoss Cache
In JBoss Cache's distributed architecture, every server in the cluster caches the same data; when the cached data on one server is updated, that server notifies the other servers in the cluster to update or clear their copies. JBoss Cache is typically deployed on the same server as the application, so the application can retrieve cached data quickly from local memory. The drawbacks are that the amount of cacheable data is limited by the memory of a single server, and that in a large cluster, every cache update must be propagated to all the other machines, at a staggering cost in both server resources and network bandwidth. As a result, this scheme is mostly found in enterprise applications and is seldom used on large web sites.
2. Memcached
Memcached was once synonymous with distributed caching for web sites and was used by a large number of them. Its simple design, excellent performance, non-communicating server cluster, and architecture that scales to massive data volumes made site architects flock to it.
Remote communication design must consider two elements. One is the communication protocol, that is, the choice between TCP, UDP, or HTTP. The other is the serialization protocol: both ends of a transmission must use a mutually recognizable serialization format for communication to succeed, such as a text format like XML or JSON, or a binary format like Google's Protocol Buffers. Memcached communicates over TCP (UDP is also supported), and its serialization protocol is a simple, text-based custom protocol: a command keyword followed by a set of command operands. For example, the command to read a value is `get <key>`. Many NoSQL products since have borrowed or directly adopted this protocol.
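To make the text protocol concrete, here is a minimal sketch of how `set` and `get` commands are framed on the wire. The helper names are illustrative, not from any official client library; real clients also parse the server's replies (`STORED`, `VALUE ... END`), which this sketch omits.

```python
# Illustrative helpers that frame Memcached text-protocol commands.
# Every line on the wire is terminated with CRLF ("\r\n").

def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Frame a 'set' command: a header line, then the data block."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode()
    return header + value + b"\r\n"

def build_get(key: str) -> bytes:
    """Frame a 'get' command."""
    return f"get {key}\r\n".encode()

# The frames as they would appear on the wire:
print(build_set("user:42", b"alice"))  # b'set user:42 0 0 5\r\nalice\r\n'
print(build_get("user:42"))            # b'get user:42\r\n'
```

Because the framing is just plain text, a protocol exchange can be debugged by hand with `telnet`, which is a large part of why so many client libraries exist.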
Because the Memcached protocol is so simple, any client that supports it can communicate with a Memcached server. As a result, Memcached has developed a very rich ecosystem of client programs covering almost every major web programming language, which makes it right at home on sites built with a mix of languages.
Memcached's server-side communication module is based on libevent, a network programming library that supports event-driven I/O. Although libevent's design and implementation leave room for improvement, its performance over stable, long-lived connections is exactly what Memcached needs. We will cover libevent in more detail in a later article on Memcached.
As mentioned in the previous article, a cache stores data in a relatively high-speed storage medium, so Memcached stores its cached data in memory. Storing data in memory inevitably raises the issue of memory management, and the thorniest problem in memory management is fragmentation. Operating systems and virtual-machine garbage collectors have devised many remedies for it, such as compaction and copying. Memcached takes a very simple approach: fixed-size memory allocation.
Memcached divides its memory into a set of slabs; each slab contains a set of chunks, and all chunks within the same slab are the same fixed size. Slabs whose chunks are the same size are grouped together into a slab_class.
When storing data, Memcached finds the smallest chunk size larger than the data and writes the data into a chunk of that size. Because memory is allocated and released in whole chunks, this scheme avoids the fragmentation-management problem entirely. Like other caches, Memcached uses the LRU (Least Recently Used) algorithm to free the space occupied by the least recently accessed data; a released chunk is marked unused and waits for the next suitable piece of data to be written.
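The LRU bookkeeping described above can be sketched with an ordered map. This is a simplified illustration, assuming a fixed number of chunks per slab class; Memcached's actual implementation maintains its LRU lists per slab class in C, not like this.

```python
# A minimal sketch of per-slab-class LRU eviction.
from collections import OrderedDict

class SlabClassLRU:
    def __init__(self, max_chunks: int):
        self.max_chunks = max_chunks
        self.items = OrderedDict()  # key -> value, ordered oldest -> newest

    def set(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        elif len(self.items) >= self.max_chunks:
            # All chunks are in use: evict the least recently used item.
            self.items.popitem(last=False)
        self.items[key] = value

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

lru = SlabClassLRU(max_chunks=2)
lru.set("a", 1)
lru.set("b", 2)
lru.get("a")         # "a" is now the most recently used
lru.set("c", 3)      # evicts "b", the least recently used
print(lru.get("b"))  # None
print(lru.get("a"))  # 1
```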
Of course, this memory management scheme also wastes memory: data can only be stored in a chunk larger than itself, and each chunk holds exactly one item, so the remaining space in the chunk is wasted. If the startup parameters are configured poorly, the waste can be even more alarming, and memory may run out before much data has been cached.
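The waste comes from rounding each item up to the next chunk size. A rough illustration: Memcached generates its chunk sizes with a configurable growth factor (the `-f` startup option), so a poorly chosen factor leaves large gaps between adjacent sizes. The base size and limit below are simplified assumptions, not the server's exact values.

```python
# Illustrative chunk-size generation and per-item waste calculation.

def chunk_classes(base=80, factor=1.25, limit=1024):
    """Generate chunk sizes by repeatedly applying a growth factor."""
    sizes, size = [], float(base)
    while size <= limit:
        sizes.append(int(size))
        size *= factor
    return sizes

def wasted_bytes(item_size, sizes):
    """An item occupies the smallest chunk that fits it; the rest is wasted."""
    chunk = next(s for s in sizes if s >= item_size)
    return chunk - item_size

sizes = chunk_classes()
print(sizes[:4])                 # [80, 100, 125, 156]
print(wasted_bytes(101, sizes))  # a 101-byte item lands in a 125-byte chunk: 24 wasted
```

A larger growth factor means fewer slab classes but more waste per item, which is why tuning `-f` to the site's typical item sizes matters.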
Memcached's non-communicating servers distinguish it from many distributed cache products, such as JBoss Cache and OSCache, and allow it to meet a site's need for massive amounts of cached data. Its client-side routing algorithm, consistent hashing, has become a classic paradigm in the design of scalable data-storage architectures. It is precisely because the cache servers in the cluster do not communicate with each other that the cluster can scale almost linearly without limit, which is a basic architectural feature of many big-data technologies prevalent today.
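The client-side routing can be sketched as a consistent hash ring with virtual nodes. This is a simplified sketch, not the algorithm of any particular client library; real clients differ in their hash functions, replica counts, and failure handling.

```python
# A minimal consistent-hash ring for client-side key routing.
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers, replicas=100):
        # Each server is hashed to many "virtual node" points on the ring,
        # so keys spread evenly and removing a server only remaps its share.
        self.ring = {}    # hash point -> server
        self.points = []  # sorted hash points
        for server in servers:
            for i in range(replicas):
                point = self._hash(f"{server}#{i}")
                self.ring[point] = server
                self.points.append(point)
        self.points.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def route(self, key: str) -> str:
        # Walk clockwise to the first server point at or after the key's hash.
        idx = bisect.bisect(self.points, self._hash(key)) % len(self.points)
        return self.ring[self.points[idx]]

servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]
ring = ConsistentHashRing(servers)
print(ring.route("user:42"))  # the same key always routes to the same server
```

Because routing happens entirely in the client, the servers need no knowledge of each other, which is exactly the non-communicating property described above.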
Although many NoSQL products in recent years support data persistence and richer data structures, and some even outperform Memcached, Memcached still occupies an important position among distributed caches thanks to its simplicity, stability, and focus.
We will cover Memcached-related technology in more detail in future articles.