Author: Srini Penchikala. Translator: Song. Posted on October 10, 2008 10:6 P.M. Community: Java. Topics: Clustering and caching.
Memcached is a distributed in-memory object caching system that dynamic Web applications use to reduce database load. By caching data and objects in memory, it cuts the number of database reads and so speeds up dynamic, database-driven Web sites. Memcached is based on a hashmap that stores key/value pairs. Its daemon is written in C, but clients can be written in any language and talk to the daemon through the memcached protocol. However, it provides no redundancy (for example, by replicating its hashmap entries): when a server S stops running or crashes, all key/value pairs stored on S are lost.
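To make that failure mode concrete, here is a minimal Java sketch of a partitioned cache with no redundancy. It is not memcached's code: the class name NaiveShardedCache, the in-memory maps standing in for servers, and the simple modulo hashing are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: each "server" is a plain in-memory map,
// and every key lives on exactly one server (no replication).
final class NaiveShardedCache {
    private final List<Map<String, String>> servers = new ArrayList<>();

    NaiveShardedCache(int serverCount) {
        for (int i = 0; i < serverCount; i++) {
            servers.add(new HashMap<>());
        }
    }

    // Picks the single server responsible for a key (assumed modulo hashing).
    int serverFor(String key) {
        return Math.floorMod(key.hashCode(), servers.size());
    }

    void put(String key, String value) {
        servers.get(serverFor(key)).put(key, value);
    }

    // Returns null on a cache miss; the caller would then go to the database.
    String get(String key) {
        return servers.get(serverFor(key)).get(key);
    }

    // Simulates server s crashing: everything it held is gone.
    void crash(int s) {
        servers.get(s).clear();
    }

    public static void main(String[] args) {
        NaiveShardedCache cache = new NaiveShardedCache(3);
        cache.put("article:42", "cached body");
        cache.crash(cache.serverFor("article:42"));
        System.out.println(cache.get("article:42")); // prints "null"
    }
}
```

After the crash the only copy of the entry is gone, so the next read has to fall back to the database.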
Bela Ban, who leads the JGroups and Clustering team at JBoss, recently wrote a JGroups-based memcached implementation that allows Java clients to access memcached directly. The implementation is written entirely in Java and has a few advantages over the memcached framework:
Java clients and PartitionedHashMap (org.jgroups.blocks.PartitionedHashMap) can run in the same address space, so they do not need to communicate through the memcached protocol. This allows a servlet to access the cache directly, without serialization.
All PartitionedHashMap processes know about each other, and they can decide what to do when cluster membership changes. For example, a server that is shutting down can migrate all the keys it manages to the next server. With memcached, the entries stored on server S are lost when S shuts down.
When cluster membership changes (for example, a new server S is started), every server checks whether any of the entries it holds should now be stored on S, and transfers those entries to S. The advantage is that the entries do not have to be read from the DB again and reinserted into the cache (as they must be with memcached); the cache rebalances itself automatically.
PartitionedHashMap has a level 1 cache (L1 cache), which keeps cached data close to where it is really needed. For example, suppose we have servers A, B, C, D, and E, and a client adds a (heavily accessed) newspaper article to C. Memcached always sends every request for the article to C, so a client accessing D always triggers a GET request from D to C before the article is returned. JGroups caches the article in D's L1 cache on first access, so all other clients that request the article from D get the cached copy and another round trip to C is avoided.
Note that each entry has an expiration time; when it expires, the entry is removed from the L1 cache, so the next access has to fetch the article from C again and place it in D's L1 cache again. The expiration time is defined by the submitter of the article.
Because the RPCs for GET, SET, and REMOVE use JGroups as their transport, the type of transport and the quality of service can be controlled and customized through the XML file that defines the transport. For example, we can compress or encrypt all RPC traffic. It also lets us choose between UDP (IP multicast and/or UDP datagrams) and TCP.
The connector (org.jgroups.blocks.MemcachedConnector), which parses the memcached protocol and invokes requests on PartitionedHashMap; PartitionedHashMap itself, which represents the memcached implementation; the server (org.jgroups.demos.MemcachedServer); and the L1 and L2 caches (org.jgroups.blocks.Cache) can all be assembled or replaced at will. Customizing the JGroups memcached implementation is therefore simple; for example, a different MemcachedConnector could be used to handle the binary protocol (the client code would of course need to match).
All management information and operations are exposed via JMX.
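The L1 cache behaviour described above (serve locally until the entry expires, then go back to the owning node) can be sketched as follows. This is an illustration under stated assumptions, not the JGroups code: the class L1CachingNode is hypothetical, a plain function stands in for the GET RPC to the owner, and a single time-to-live is used for every entry, whereas the article says each entry carries its own expiration time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical illustration, not the JGroups classes: a node-local L1 cache
// in front of a lookup function that stands in for the RPC to the owning node.
final class L1CachingNode {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> l1 = new HashMap<>();
    private final Function<String, String> remoteGet; // stands in for GET to the owner
    private final LongSupplier clock;                 // injectable so expiry is testable
    private final long ttlMillis;                     // assumed: one TTL for all entries
    int remoteCalls = 0;                              // counts round trips to the owner

    L1CachingNode(Function<String, String> remoteGet, LongSupplier clock, long ttlMillis) {
        this.remoteGet = remoteGet;
        this.clock = clock;
        this.ttlMillis = ttlMillis;
    }

    String get(String key) {
        long now = clock.getAsLong();
        Entry cached = l1.get(key);
        if (cached != null && now < cached.expiresAtMillis) {
            return cached.value; // served locally, no round trip
        }
        l1.remove(key); // expired or absent
        remoteCalls++;
        String value = remoteGet.apply(key);
        if (value != null) {
            l1.put(key, new Entry(value, now + ttlMillis));
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> nodeC = new HashMap<>();
        nodeC.put("article", "headline text");
        long[] now = {0L};
        L1CachingNode nodeD = new L1CachingNode(nodeC::get, () -> now[0], 1000L);
        nodeD.get("article");
        nodeD.get("article");
        System.out.println(nodeD.remoteCalls); // prints 1
        now[0] = 1000L; // the entry has now expired
        nodeD.get("article");
        System.out.println(nodeD.remoteCalls); // prints 2
    }
}
```

Only the first read and the first read after expiry reach the owning node; every other request is answered from the local cache.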
The main class that starts the JGroups memcached implementation is org.jgroups.demos.MemcachedServer. It creates an L1 cache (if configured), an L2 cache (the default hashmap that stores all entries), and a MemcachedConnector. The API is very simple and consists of the following caching methods:
public void put(K key, V val): caches the key/value pair for the default caching time.
public void put(K key, V val, long caching_time): same as above, but lets you specify the caching time. 0 means cache forever, -1 means do not cache, and any positive value is the number of milliseconds to cache the entry.
public V get(K key): returns the value associated with key.
public void remove(K key): removes the key/value pair from the cache (L2 and L1, if enabled).
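The caching-time rules listed above are easy to misread, so here is a small stand-in that mirrors the four method signatures and applies those rules. It is a sketch only: TimedCacheSketch is a hypothetical single-JVM class, not org.jgroups.blocks.PartitionedHashMap, and the constructor arguments (a default caching time and an injectable clock) are assumptions. In particular, treating -1 as "silently skip the put" is one possible reading of "do not cache".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical stand-in that mirrors the four methods described in the article.
final class TimedCacheSketch<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis; // Long.MAX_VALUE means "never expires"

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long defaultCachingTime; // assumed constructor parameter
    private final LongSupplier clock;      // assumed, injectable for testing

    TimedCacheSketch(long defaultCachingTime, LongSupplier clock) {
        this.defaultCachingTime = defaultCachingTime;
        this.clock = clock;
    }

    // Caches the pair for the default caching time.
    public void put(K key, V val) {
        put(key, val, defaultCachingTime);
    }

    // 0 = cache forever, -1 = do not cache, positive = milliseconds to cache.
    public void put(K key, V val, long cachingTime) {
        if (cachingTime < 0) {
            return; // assumption: a negative caching time stores nothing
        }
        long expiresAt = cachingTime == 0
                ? Long.MAX_VALUE
                : clock.getAsLong() + cachingTime;
        entries.put(key, new Entry<>(val, expiresAt));
    }

    // Returns the cached value, or null if it is absent or has expired.
    public V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (clock.getAsLong() >= e.expiresAtMillis) {
            entries.remove(key);
            return null;
        }
        return e.value;
    }

    public void remove(K key) {
        entries.remove(key);
    }
}
```

A caller would write, for example, cache.put("user:7", user, 60000) to keep an entry for one minute, or cache.put("config", cfg, 0) to keep it until it is explicitly removed.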
InfoQ interviewed Bela Ban about the motivation behind the JGroups implementation of memcached. He said it lets the team experiment with distributed caching and see how different caching strategies fit into JBoss clustering. He also clarified how the new memcached implementation compares with the JBossCache caching framework: We think of caching as a continuum: from distributed caching (data is spread across multiple nodes in a cluster, but without redundancy) to a fully replicated data cache (every data entry is replicated to every cluster node). Between distribution and total replication we also have buddy replication, which replicates data only to selected backup nodes. This can be compared to RAID: RAID 0 has no redundancy (distributed), RAID 0+1 is fully redundant, and RAID 5 is partially redundant.
Currently, JGroups's PartitionedHashMap provides distributed caching, while JBossCache provides fully replicated and partially replicated (using buddy replication) caching. The idea is to let users define a value K (per data item) for the data they put into the cluster. K=0 means distributed: the data is lost if a node holding one or more entries crashes; K=X (here X
The JGroups implementation of memcached is a first step toward K=0, that is, a purely distributed data cache with no redundancy. It will eventually be incorporated into JBossCache.
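The continuum Bela describes, from no redundancy through a few backup nodes to full replication, can be pictured as a choice of how many nodes hold a copy of each entry. The sketch below is an assumption-laden illustration, not JGroups or JBossCache code: ReplicaPlacement and its backups parameter are this sketch's own names (not the article's K), and picking the next nodes in list order as backups is a simplification of how buddies would really be selected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the distribution-to-replication continuum.
final class ReplicaPlacement {
    // Returns the nodes that hold a copy of the entry for the given key.
    // backups = 0              -> one holder (distributed, no redundancy)
    // backups = nodes.size()-1 -> every node (total replication)
    // anything in between      -> an owner plus some backup nodes
    static List<String> holders(List<String> nodes, String key, int backups) {
        int copies = Math.min(nodes.size(), Math.max(0, backups) + 1);
        int first = Math.floorMod(key.hashCode(), nodes.size());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            // Assumed policy: backups are simply the next nodes in the list.
            result.add(nodes.get((first + i) % nodes.size()));
        }
        return result;
    }
}
```

With five nodes, holders(nodes, key, 0) returns a single node, holders(nodes, key, 1) returns the owner and one backup, and holders(nodes, key, 4) returns all five.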
Asked where the memcached implementation fits among the modules of the JBoss application server, Bela said: It will become part of the clustering subsystem, provided by JBossCache. Note that our implementation was really written with Java clients in mind, so there is no need to use the rather inefficient memcached protocol, with its marshalling, unmarshalling, and copying overhead.
On the typical use cases for the JGroups implementation of memcached, Bela said: Server-side code (such as servlets) running on a JBoss or Tomcat cluster accesses a DB and needs caching to improve speed and avoid DB bottlenecks. Other use cases are similar, except that what is accessed is not a DB but a file system. An example is an HTML page caching server (Squid immediately comes to mind).
Asked whether there are plans to bring memcached into the JBoss application server in the future, he answered: Of course. The data partitioning feature lets users configure the cache to suit their own needs. It makes distributed caching look less like a new feature and more like a JBossCache configuration option. The cool thing is that it is dynamic, so developers can decide which redundancy to use (none = distribution, full = total replication, or partial) for each data item they put into JBossCache.
As for the future direction of the project and its new features, Bela listed the following to-do items:
Provide an eviction policy based on the number of bytes in the cache rather than the number of elements.
Store an element received from a remote server as a byte[] buffer rather than as an object, and unmarshal the buffer into an object on first access. This is what JBoss's HTTP session replication code does, and it has always worked well, because unmarshalling that is never needed costs no performance.
Implement the full memcached protocol: right now I only provide GET, GET-MULTI, SET, and DELETE. Although the others (APPEND, PREPEND, CAS) are easy to implement, I haven't done so because the main use case is Java clients running in the same JVM as the memcached implementation, which don't need the memcached protocol.
Provide a better consistent hashing implementation.
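The second to-do item, keeping a remotely received element as a byte[] buffer and turning it into an object only on first access, can be sketched with standard Java serialization. This is an illustration, not the JBoss session replication code: LazyValue is a hypothetical class, and java.io serialization is assumed here only because it ships with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical illustration: holds the marshalled bytes and unmarshals lazily.
final class LazyValue<V extends Serializable> {
    private byte[] buffer;  // marshalled form, as received from a remote node
    private V value;        // unmarshalled form, filled in on first access
    int unmarshalCount = 0; // exposed only to show how often unmarshalling ran

    LazyValue(byte[] buffer) {
        this.buffer = buffer;
    }

    // Helper that produces the bytes a remote node would have sent.
    static byte[] marshal(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Unmarshals on the first call only; later calls return the cached object.
    @SuppressWarnings("unchecked")
    synchronized V get() {
        if (value == null && buffer != null) {
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer))) {
                value = (V) in.readObject();
            } catch (IOException | ClassNotFoundException e) {
                throw new IllegalStateException("could not unmarshal entry", e);
            }
            unmarshalCount++;
            buffer = null; // the bytes are no longer needed
        }
        return value;
    }
}
```

An entry that is stored but never read is never unmarshalled, which is the saving the to-do item is after.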
The JGroups implementation of memcached and the libraries it depends on can be downloaded from its SourceForge site. The following command runs the program:
java -jar memcached-jgroups.jar
Bela is looking forward to feedback from the community. He says this is an experimental feature, but it will become a supported feature of JBossCache, and community feedback will strongly influence its direction.
View English Original: JGroups implementation of Memcached Supports Failover and JMX