Ehcache is currently the most popular pure-Java open-source caching framework: simple to configure, cleanly structured, and powerful. I first came to know it through the Hibernate cache. Most Chinese-language material on Ehcache online covers its introduction and configuration; if that is what you need, please Google it. As for the API, the official site's introduction is very clear; please see the official website. What is scarce, however, is analysis of its features and of how they are implemented, so in this article I will detail and analyze Ehcache's characteristics, adding some of my own understanding and thinking. I hope friends interested in caching will gain something from it.
First, a feature list, simply translated from the official site:
1. Fast and lightweight
Over the past few years, many tests have shown that Ehcache is one of the fastest Java caches.
Ehcache's threading mechanism is designed for large, high concurrency systems.
A large number of performance test cases ensure consistency in performance between different versions of Ehcache.
Many users do not even know they are using Ehcache, because it needs no special configuration.
The API is easy to use, which makes it simple to deploy and run in production.
A very small jar: Ehcache 2.2.3 is only 668 KB.
Minimal dependencies: the only dependency is SLF4J.
2. Scalability
The memory and disk stores are scalable to several gigabytes; Ehcache is optimized for large data.
With large amounts of memory, all processes together can support hundreds of GB of throughput.
Optimized for high concurrency and large multi-CPU servers.
Thread safety and performance are always at odds; Ehcache's threading mechanism draws on Doug Lea's ideas to achieve higher performance.
Multiple cache managers are supported on a single virtual machine.
Via the Terracotta Server Array, it can scale to hundreds of nodes.
3. Flexibility
Ehcache 1.2 provides both an Object API and a Serializable API.
Objects that cannot be serialized can use every Ehcache feature except disk storage.
Apart from the element value accessors, the APIs are uniform; only these two methods differ: getObjectValue and getKeyValue. This makes it easy to cache objects or serializable objects and still pick up new features.
Cache-based and element-based expiration policies are supported; the lifetime of each cache can be set and controlled.
The LRU, LFU, and FIFO eviction algorithms are provided; Ehcache 1.2 introduced the least-frequently-used and first-in-first-out algorithms, completing the set of eviction algorithms.
Memory and disk stores are provided; like most cache solutions, Ehcache delivers high-performance memory and disk storage.
Dynamic, runtime cache configuration: time-to-live, time-to-idle, and the maximum number of cache entries in memory and on disk can all be modified at runtime.
4. Standards support
Ehcache provides the most complete implementation of the JSR107 JCache API; its implementation (net.sf.jsr107cache) was in fact released before JCache itself.
Implementing the JCache API eases future portability to other cache solutions.
Ehcache's maintainer, Greg Luck, is a member of the JSR107 expert group.
5. Extensibility
Listeners are pluggable. Ehcache 1.2 provides the CacheManagerEventListener and CacheEventListener interfaces, pluggable and configurable in ehcache.xml.
Node discovery, replicators, and listeners are all pluggable.
Distributed caching, introduced in Ehcache 1.2, involves many trade-offs. Ehcache's team believes no single configuration fits every case.
Implementations can use the built-in mechanisms or replace them entirely; there is a complete plugin development guide.
Cache extensions are pluggable. Create your own cache extension, which holds a reference to a cache and is bound to the cache's lifecycle.
Cache loaders are pluggable. Create your own cache loader, and use asynchronous methods to load data into the cache.
Cache exception handlers are pluggable. Create an exception handler that performs specific actions when an exception occurs.
6. Application persistence
After a VM restart, a disk-persistent store can recover its data.
Ehcache was the first open-source Java caching framework to introduce persistent storage of cached data, so cached data can be reloaded from disk after a machine restart.
Flush the cache to disk on demand. Flushing cache entries to disk can be triggered with cache.flush(), which greatly broadens Ehcache's usefulness; a hedged sketch follows.
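A hedged sketch of disk persistence, assuming standard Ehcache 2.x configuration attributes (the cache name and sizes are illustrative):

XML code

    <cache name="persistentCache"
           maxElementsInMemory="500"
           eternal="false"
           overflowToDisk="true"
           diskPersistent="true"
           diskExpiryThreadIntervalSeconds="120"/>

Java code

    // Entries flushed to disk survive a JVM restart when diskPersistent="true".
    Cache cache = CacheManager.getInstance().getCache("persistentCache");
    cache.put(new Element("key1", "value1"));
    cache.flush();   // push in-memory entries to the disk store on demand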
7. Listeners
CacheManager listeners. You can register listeners implementing the CacheManagerEventListener interface:
notifyCacheAdded()
notifyCacheRemoved()
Cache event listeners. You can register listeners implementing the CacheEventListener interface, which provides hooks for handling a cache event after it occurs:
notifyElementRemoved/Put/Updated/Expired
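As a minimal sketch, a logging CacheEventListener might look like this (Ehcache 2.x API; in practice it is registered through a CacheEventListenerFactory configured in ehcache.xml):

Java code

    import net.sf.ehcache.CacheException;
    import net.sf.ehcache.Ehcache;
    import net.sf.ehcache.Element;
    import net.sf.ehcache.event.CacheEventListener;

    public class LoggingCacheEventListener implements CacheEventListener {
        public void notifyElementPut(Ehcache cache, Element element) throws CacheException {
            System.out.println("put: " + element.getObjectKey());
        }
        public void notifyElementUpdated(Ehcache cache, Element element) throws CacheException {
            System.out.println("updated: " + element.getObjectKey());
        }
        public void notifyElementRemoved(Ehcache cache, Element element) throws CacheException {
            System.out.println("removed: " + element.getObjectKey());
        }
        public void notifyElementExpired(Ehcache cache, Element element) {
            System.out.println("expired: " + element.getObjectKey());
        }
        public void notifyElementEvicted(Ehcache cache, Element element) {
            System.out.println("evicted: " + element.getObjectKey());
        }
        public void notifyRemoveAll(Ehcache cache) { }
        public void dispose() { }
        public Object clone() throws CloneNotSupportedException {
            return super.clone();   // listeners are cloned once per cache
        }
    }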
8. JMX support
Ehcache's JMX feature is enabled by default, and you can monitor and manage the following MBeans:
CacheManager, Cache, CacheConfiguration, CacheStatistics
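A minimal sketch of exposing these MBeans through the platform MBean server, using Ehcache's ManagementService:

Java code

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.management.ManagementService;

    public class JmxSetup {
        public static void main(String[] args) {
            CacheManager manager = CacheManager.getInstance();
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // The four flags register the CacheManager, Cache, CacheConfiguration
            // and CacheStatistics MBeans respectively.
            ManagementService.registerMBeans(manager, server, true, true, true, true);
        }
    }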
9. Distributed caching
Starting with Ehcache 1.2, high-performance distributed caching is supported, with both flexibility and scalability.
Options for distributed caching include:
Cache clustering via Terracotta: set up and use Ehcache caches in Terracotta mode. Cache discovery happens automatically, and many options are available for tuning cache behavior and performance.
Redundant cache data via RMI, JGroups, or JMS: nodes can be configured manually or discovered via multicast. State updates are propagated asynchronously or synchronously over RMI connections.
Custom: a comprehensive plugin mechanism supports custom discovery and replication implementations.
Available cache replication options: asynchronous or synchronous replication via RMI, JGroups, or JMS.
Reliable delivery: the built-in delivery mechanism is TCP-based.
Node discovery: nodes can be configured manually or discovered automatically via multicast, with nodes added and removed automatically. Where multicast is blocked, manual configuration gives good control.
Distributed caches can join or leave the cluster at any time. A cache can be configured to run a bootstrap loader when it initializes.
The BootstrapCacheLoaderFactory abstract factory creates implementations of the BootstrapCacheLoader interface (an RMI implementation is provided).
Cache server. Ehcache provides a cache server, packaged as a WAR, supporting most web containers as well as standalone operation.
The cache server exposes two APIs: resource-oriented RESTful, and SOAP. Clients are not restricted to any implementation language.
RESTful cache server: the implementation strictly follows the RESTful, resource-oriented architectural style.
SOAP cache server: the Ehcache Web Services API exposes the singleton CacheManager, which can be configured in ehcache.xml or an IoC container.
The standard distribution contains an embedded GlassFish web container, packaged as a WAR that can be deployed to any web container supporting Servlet 2.5. GlassFish V2/V3, Tomcat 6, and Jetty 6 have all been tested.
10. Search
Standard distributed search uses a streaming query interface; see the documentation.
11. Java EE and application caching
Provides high-quality implementations of common cache scenarios and patterns.
Blocking cache: its mechanism avoids the problem of duplicate concurrent computation of the same entry.
SelfPopulatingCache is especially useful for caching expensive operations; it is a read-optimized cache. Callers need not know how a cached element is produced, and it supports refreshing cache entries without blocking reads (a sketch follows this feature list).
CachingFilter: an abstract, extensible caching filter.
SimplePageCachingFilter: caches pages keyed on the request URI and query string. Based on the HTTP request headers, it can send pages to the browser with or without gzip compression. You can use it to cache whole Servlet pages, whether rendered with JSP, Velocity, or any other templating technique.
SimplePageFragmentCachingFilter: caches page fragments, keyed on the request URI and query string. Include fragments via the jsp:include tag in JSPs.
Tested with Orion and Tomcat; compatible with the Servlet 2.3 and Servlet 2.4 specifications.
Cacheable commands: the classic Command pattern, with support for asynchronous execution and fault tolerance.
Works with Hibernate, and works with Google App Engine.
JTA-based transaction support: transactional resource management, two-phase commit and rollback, and local transactions.
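The SelfPopulatingCache sketch promised above, assuming a cache named "testCache" is already configured; loadFromDatabase is a hypothetical stand-in for the expensive operation:

Java code

    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.Ehcache;
    import net.sf.ehcache.Element;
    import net.sf.ehcache.constructs.blocking.CacheEntryFactory;
    import net.sf.ehcache.constructs.blocking.SelfPopulatingCache;

    public class ReadThroughExample {
        public static void main(String[] args) {
            Ehcache backing = CacheManager.getInstance().getEhcache("testCache");
            // On a miss, the factory fetches the value; concurrent readers of the
            // same missing key block until the first thread has populated it.
            SelfPopulatingCache cache = new SelfPopulatingCache(backing, new CacheEntryFactory() {
                public Object createEntry(Object key) throws Exception {
                    return loadFromDatabase(key);   // hypothetical SOR call
                }
            });
            Element element = cache.get("42");
            System.out.println(element.getObjectValue());
        }

        static Object loadFromDatabase(Object key) {
            return "value-for-" + key;   // placeholder for a real lookup
        }
    }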
12. Open-source license
Apache 2.0 License
second, Ehcache load module list , they are independent libraries, each add new features for Ehcache, you can download here:
ehcache-core: the API, standard cache engine, RMI replication, and Hibernate support
ehcache: distributed Ehcache, including ehcache-core and the Terracotta library
ehcache-monitor: enterprise-grade monitoring and management
ehcache-web: caching filters and gzip-compression support for Java Servlet containers
ehcache-jcache: the JSR107 JCache implementation
ehcache-jgroupsreplication: replication via JGroups
ehcache-jmsreplication: replication via JMS
ehcache-openjpa: OpenJPA plugin
ehcache-server: a RESTful cache server, deployed as a WAR or run standalone
ehcache-unlockedreadsview: allows a Terracotta cache to be read without locks
ehcache-debugger: logs RMI distributed call events
ehcache for ruby: JRuby and Rails support
An overview of Ehcache's structural design:
Third, core definitions:
CacheManager: the cache manager. Previously only a singleton was allowed, but multiple instances are now supported.
Cache: a CacheManager can hold several caches. The cache is where data actually lives; all caches implement the Ehcache interface.
Element: the unit making up a single piece of cached data.
System of Record (SOR): the component that holds the real data, such as the actual business logic, an external interface call, or the database of record; the cache is populated from the SOR and written back to the SOR.
Code example: Java code

    CacheManager manager = CacheManager.newInstance("src/config/ehcache.xml");
    manager.addCache("testCache");
    Cache test = manager.getCache("testCache");
    test.put(new Element("key1", "value1"));
    manager.shutdown();
Of course, this kind of DSL-like configuration is also supported, and the configuration can be modified dynamically at runtime: Java code

    Cache testCache = new Cache(
        new CacheConfiguration("testCache", maxElements)
            .memoryStoreEvictionPolicy(MemoryStoreEvictionPolicy.LFU)
            .overflowToDisk(true)
            .eternal(false)
            .timeToLiveSeconds(60)       // sample value
            .timeToIdleSeconds(30)       // sample value
            .diskPersistent(false)
            .diskExpiryThreadIntervalSeconds(0));
A transaction example: Java code

    Ehcache cache = cacheManager.getEhcache("xaCache");
    transactionManager.begin();
    try {
        Element e = cache.get(key);
        Object result = complexService.doStuff(e.getValue());
        cache.put(new Element(key, result));
        complexService.doMoreStuff(result);
        transactionManager.commit();
    } catch (Exception ex) {
        transactionManager.rollback();
    }
Fourth, the consistency model:
Speaking of consistency, what is database consistency? You may want to first review the database isolation levels:
Read Uncommitted: no locks are checked or honored when reading data, so uncommitted data may be read. Dirty reads, non-repeatable reads, and phantom reads can all occur.
Read Committed: only committed data is read, waiting for other transactions to release their exclusive locks. Shared locks taken for reads are released as soon as the read completes. Read Committed is the default isolation level of most databases. Non-repeatable reads and phantom reads can still occur.
Repeatable Read: reads data as in Read Committed, but holds the shared locks until the end of the transaction. Phantom reads can still occur.
Serializable: works much like Repeatable Read, but locks not only the affected data but also its range, preventing new rows from being inserted into the query's result set.
With the above in mind, consider the following consistency models:
1. Strong consistency: once an update to a piece of data succeeds (the transaction returns successfully), every subsequent read of that data sees the updated value. This is the model traditional relational databases provide, and one of the reasons they are so well loved. It usually also carries the largest performance cost.
2. Weak consistency: after a piece of data is updated, subsequent reads do not necessarily return the updated value. There is usually an "inconsistency window": only once the update has completed and the window has elapsed will subsequent reads return the updated value.
3. Eventual consistency: a form of weak consistency. Once a piece of data has been updated and is not updated again, eventually all reads will return the updated value.
The eventual consistency model includes the following easily understood properties:
Read-your-writes consistency: after thread A updates a piece of data, all of its subsequent reads see the updated data.
Session consistency: essentially the same as the above, scoped to a session: once a user changes data, everything they read for as long as the session lasts reflects the change.
Monotonic read consistency: once a process has seen the current value, subsequent reads cannot return an earlier value.
Monotonic write consistency: writes within one process must be applied in order; otherwise the results are unpredictable.
4. Bulk load: this model is optimized for bulk-loading data into the cache, dispensing with the locks and the usual eviction algorithms that drag down performance. It closely resembles the eventual consistency model, but with batching and high throughput, and weaker consistency guarantees.
These APIs also affect the consistency of the results:
1. Explicit locking: if we configure strong consistency, then naturally every cache operation is transactional. If we configure eventual consistency and then use the explicit locking API on top, we can also achieve transactional effects. Such locks are controlled at a finer granularity, but contention and thread blocking are still possible. A hedged sketch follows.
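A hedged sketch of the explicit-locking API (per-key lock methods available on Cache in Ehcache 2.x), making a read-modify-write sequence atomic:

Java code

    Cache cache = CacheManager.getInstance().getCache("testCache");
    String key = "counter";
    cache.acquireWriteLockOnKey(key);
    try {
        Element current = cache.get(key);
        int next = (current == null) ? 1 : (Integer) current.getObjectValue() + 1;
        cache.put(new Element(key, next));   // no other writer can interleave here
    } finally {
        cache.releaseWriteLockOnKey(key);
    }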
2. Unlocked reads view (UnlockedReadsView): a decorator that allows dirty reads. It can only be used with a strong-consistency configuration, and it gains performance over a fully strong-consistent setup by requesting a special write lock.
For example, the XML configures a strong consistency model (the in-memory size is a sample value): XML code

    <cache name="myCache" maxElementsInMemory="500" eternal="false" overflowToDisk="false">
        <terracotta clustered="true" consistency="strong"/>
    </cache>
But reads go through an UnlockedReadsView: Java code

    Cache cache = cacheManager.getCache("myCache");
    UnlockedReadsView unlockedReadsView = new UnlockedReadsView(cache, "myUnlockedCache");
3. Atomic methods: method execution is atomic, i.e. a CAS (Compare-and-Swap) operation. CAS also ultimately achieves strong consistency, but the difference is that it is implemented with optimistic rather than pessimistic locking. Under an optimistic locking mechanism, an update may fail because another thread has changed the same data in the meantime, and the operation must then be retried after the failure. Modern CPUs support CAS primitives. Java code

    cache.putIfAbsent(element);
    cache.replace(oldElement, newElement);
    cache.removeElement(element);
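A usage sketch of the CAS style: a lock-free increment that retries until replace() succeeds, assuming the key already holds an Integer value:

Java code

    Element oldElement;
    Element newElement;
    do {
        oldElement = cache.get("counter");
        newElement = new Element("counter", (Integer) oldElement.getObjectValue() + 1);
        // replace() fails if another thread changed the element in between,
        // in which case we re-read and try again (the classic CAS loop).
    } while (!cache.replace(oldElement, newElement));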
Fifth, cache topology types:
1. Standalone Ehcache: the caches on the application nodes are independent and do not communicate with one another.
2. Distributed Ehcache: data is stored in the Terracotta Server Array (TSA), but the most recently used data can also be held on the individual application nodes.
Logical perspective:
The L1 cache sits on each application node, while the L2 cache sits in the cache server array.
Network topology perspective:
Storage model perspective:
The L1 cache has no persistent storage. Also, in terms of the amount of cached data, the server side holds far more than the application nodes.
3. Replicated Ehcache: cached data is held on multiple application nodes at once, and data replication and invalidation events are transmitted synchronously or asynchronously between the cluster nodes. With synchronous transmission, the writing thread blocks until the events are delivered. Only a weak consistency model is available in this mode.
It offers the following event propagation mechanisms: RMI, JGroups, JMS, and the Cache Server.
In RMI mode, all nodes are peers; a hedged configuration sketch follows:
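An ehcache.xml sketch for RMI replication with automatic multicast peer discovery (addresses, ports, and the cache name are illustrative):

XML code

    <cacheManagerPeerProviderFactory
        class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory"
        properties="peerDiscovery=automatic, multicastGroupAddress=230.0.0.1,
                    multicastGroupPort=4446, timeToLive=32"/>

    <cacheManagerPeerListenerFactory
        class="net.sf.ehcache.distribution.RMICacheManagerPeerListenerFactory"
        properties="port=40001, socketTimeoutMillis=2000"/>

    <cache name="testCache" maxElementsInMemory="1000" eternal="false" timeToLiveSeconds="300">
        <!-- replicates puts/updates/removes on this cache to the discovered peers -->
        <cacheEventListenerFactory
            class="net.sf.ehcache.distribution.RMICacheReplicatorFactory"
            properties="replicateAsynchronously=true"/>
    </cache>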
JGroups mode: unicast or multicast can be configured; the protocol stack and configuration are very flexible.
XML code

    <cacheManagerPeerProviderFactory
        class="net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory"
        properties="connect=UDP(mcast_addr=231.12.21.132;mcast_port=45566):PING:MERGE2:FD_SOCK:VERIFY_SUSPECT:pbcast.NAKACK:UNICAST:pbcast.STABLE:FRAG:pbcast.GMS"
        propertySeparator="::"/>
JMS mode: the core of this pattern is a message queue: each application node subscribes to a predefined topic, and when a node has an element update, it publishes the updated element to the topic. Ehcache's compatibility has been tested against two JMS implementations: Open MQ and Active MQ.
Cache Server mode: this mode has master and slave nodes, and communication goes through the RESTful API or SOAP.
Whichever mode is used, update events can be propagated as updateViaCopy or updateViaInvalidate; the latter simply sends an invalidation message and is much more efficient.
Replicated caching is prone to data inconsistency; if that becomes a problem, consider a synchronous data-distribution mechanism instead.
Even without distributed or replicated caching, some undesirable behaviors remain, such as:
Cache drift: each application node manages only its own cache, so an update on one node does not affect the others, and the data may fall out of sync. This is especially a problem for web session data.
Database bottleneck: for a single-instance application, the cache shields the database from read storms; in a cluster, however, every application node must keep its own data current, and the more nodes there are, the greater the cost of keeping them consistent with the database.
Sixth, storage modes:
1. Heap store: fast, but limited in capacity.
2. Off-heap store (OffHeapStore): known as BigMemory and available only in the enterprise edition of Ehcache. It stores data in NIO DirectByteBuffers, which is faster than going to disk and completely unaffected by GC, so response times stay stable. But direct buffers are more expensive to allocate than heap buffers, and objects must be stored as byte arrays, so they are serialized when stored and deserialized when read; overall it is roughly an order of magnitude slower than the heap store (a small illustration follows the note below).
(Note: a direct buffer is not itself subject to GC, but the Java object that owns the direct buffer lives on the heap and can be collected; once it is collected, the JVM releases the buffer's off-heap memory.)
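The illustration mentioned above: a minimal, self-contained sketch of the underlying idea (plain NIO, not Ehcache's actual BigMemory code). The object is serialized to bytes and parked in a direct buffer outside the GC-managed heap.

Java code

    import java.io.*;
    import java.nio.ByteBuffer;

    public class OffHeapSketch {
        public static void main(String[] args) throws IOException, ClassNotFoundException {
            String value = "cached value";

            // Serialization is mandatory: off-heap storage holds only bytes.
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(value);
            oos.flush();
            byte[] bytes = bos.toByteArray();

            // Direct buffers cost more to allocate than heap buffers, but the
            // collector never scans or moves their contents.
            ByteBuffer offHeap = ByteBuffer.allocateDirect(bytes.length);
            offHeap.put(bytes);

            // Reads pay the reverse cost: copy out, then deserialize.
            offHeap.flip();
            byte[] out = new byte[offHeap.remaining()];
            offHeap.get(out);
            Object restored = new ObjectInputStream(new ByteArrayInputStream(out)).readObject();
            System.out.println(restored);
        }
    }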
3. Disk store.
Seventh, caching usage patterns:
Cache-aside: the application operates on the cache directly. It first asks the cache for the data; on a hit, the data comes straight back from the cache, bypassing the SOR; on a miss, the data is fetched from the SOR and then put into the cache.
Java code

    public V readSomeData(K key) {
        Element element;
        if ((element = cache.get(key)) != null) {
            return (V) element.getValue();
        }

        V value;
        if ((value = readDataFromDataStore(key)) != null) {
            cache.put(new Element(key, value));
        }
        return value;
    }
Cache-as-SOR: combines read-through with write-through or write-behind, adding a proxy layer in front of the SOR so that applications accessing it need not distinguish whether data comes from the cache or from the SOR.
Read-through.
Write-through.
Write-behind (write-back): makes the write asynchronous, further delaying the writing of data. A sketch of the write policies follows.
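The sketch promised above, showing the two write policies side by side. SorDao is a hypothetical system-of-record interface, not part of Ehcache (whose real support for these patterns is its CacheWriter API); the executor stands in for a real write-behind queue.

Java code

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import net.sf.ehcache.Cache;
    import net.sf.ehcache.Element;

    interface SorDao {                      // hypothetical SOR access
        void save(Object key, Object value);
    }

    class WritePolicyCache {
        private final Cache cache;
        private final SorDao sor;
        private final ExecutorService writeBehindQueue = Executors.newSingleThreadExecutor();

        WritePolicyCache(Cache cache, SorDao sor) {
            this.cache = cache;
            this.sor = sor;
        }

        // Write-through: the caller returns only after both cache and SOR are updated.
        void putWriteThrough(Object key, Object value) {
            cache.put(new Element(key, value));
            sor.save(key, value);
        }

        // Write-behind: the SOR write is queued and happens asynchronously,
        // trading durability guarantees for lower write latency.
        void putWriteBehind(final Object key, final Object value) {
            cache.put(new Element(key, value));
            writeBehindQueue.submit(new Runnable() {
                public void run() {
                    sor.save(key, value);
                }
            });
        }
    }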
Copy cache has two modes: copyOnRead and copyOnWrite.
copyOnRead: when a request to read cached data arrives and the data turns out to be expired, it must be fetched again from the source, triggering a copy-element operation (pull);
copyOnWrite: when real data is written into the cache, a copy-element update is pushed to the other nodes (push).
The former suits cases where multiple threads must not access the same element concurrently, while the latter lets you freely control the timing of cache-update notifications.
More on the changes and differences between push and pull can also be found here.
Eighth, the various configuration methods:
These include configuration files, declarative configuration, programmatic configuration, and even configuration through constructor arguments. The configuration design principles include (a minimal ehcache.xml follows this list):
All configuration is kept in one place
Cache configuration can easily be changed at development time and at runtime
Configuration errors are detected at program startup, and a runtime exception is thrown when a bad modification is made at runtime
Default configuration is provided; almost every setting is optional and has a default value
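The minimal ehcache.xml promised above (every attribute shown is optional and has a default; the values are illustrative):

XML code

    <ehcache>
        <diskStore path="java.io.tmpdir"/>
        <defaultCache
            maxElementsInMemory="10000"
            eternal="false"
            timeToIdleSeconds="120"
            timeToLiveSeconds="120"
            overflowToDisk="true"/>
        <cache name="testCache"
            maxElementsInMemory="1000"
            eternal="false"
            timeToLiveSeconds="300"
            overflowToDisk="false"/>
    </ehcache>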
Ninth, Automatic Resource Control (ARC):
It provides an intelligent way to control caches and tune performance. Features include:
Size control of in-memory cache objects, preventing OOM errors
Pooled sizing at the CacheManager level, avoiding per-cache calculation of size consumption
Flexible, independent sizing per tier; as the figure below shows, each tier's size can be controlled individually
Sizes can be counted in bytes, cache entries, or percentages
Optimized retrieval of frequently hit data to improve performance; see the next part, on how cached data flows between tiers
The flow of cached data between tiers involves several behaviors (a sizing and pinning example follows this list):
Flush: a cache entry moves to a lower tier.
Fault: an entry is copied from a lower tier into a higher one. A fault is triggered when, during a cache read, a tier finds the entry missing from its own store.
Eviction: a cache entry is removed.
Expiration: an entry goes stale.
Pinning: forces a cache entry to stay in a particular tier.
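The example promised above, a hedged sketch assuming the Ehcache 2.5+ ARC syntax (cache names and sizes are illustrative):

XML code

    <ehcache>
        <!-- per-tier, byte-based sizing for this cache -->
        <cache name="sizedCache" maxBytesLocalHeap="64M" maxBytesLocalDisk="1G"
               eternal="false" timeToLiveSeconds="300"/>
        <!-- pinned: entries are forced to stay in the local heap tier -->
        <cache name="pinnedCache" maxEntriesLocalHeap="1000">
            <pinning store="localHeap"/>
        </cache>
    </ehcache>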
The figure below shows the flow of data between the tiers, and also reflects the data's lifecycle:
Tenth, the monitoring feature:
The monitoring topology:
Each application node deploys a monitoring probe that talks to the monitoring server over TCP, ultimately feeding the data to a rich client or to an operations monitoring server.
Eleventh, WAN replication:
For cache data replication, Ehcache allows two geographically separated nodes to keep data consistent across a wide-area network, and it offers several schemes (note: the figures below show only two nodes, but each scheme extends to N nodes):
The first scheme: Terracotta Active/Mirror Replication.
Under this scheme, the server side comprises an active node and a backup node, and every application node is served reads and writes by the active node. This is the simplest and easiest to manage, but it counts on an ideal network situation; with a WAN both between the servers and between clients and servers, it is the least stable.
The second scheme: Transactional Cache Manager Replication.
In this scheme, reads never cross the WAN. Writes store two copies, handled by two cache managers: one on the local server and one on the remote server. The scheme has high throughput and low latency, but it requires an XA transaction manager, having two cache managers write two copies incurs a large write overhead, and write latency across the WAN can still become the system's bottleneck.
The third scheme: Messaging-based (AMQ) replication.
This scheme introduces batching and queueing to relieve the WAN bottleneck, while read handling and the replication logic are physically separated from the server array, so WAN degradation does not affect the nodes' read traffic. It has higher throughput and lower latency, and separating reads from replication provides complete message-delivery guarantees and conflict handling; but it is more complex and requires a message bus.
Some Ehcache features that are rarely used or rather marginal have not been covered, such as support for JMX; for others with close analogues already described, such as web support, see my interpretation of OSCache, whose web-support section contains a detailed analysis of the principles.
Finally, regarding Ehcache's performance, here is a claim from the blog of Ehcache's founder, Greg Luck:
put/get on Ehcache is 500-1000 times faster than memcached. Why? His own analysis: "In-process caching and asynchronous replication are a clear performance winner." For more about it, see his blog.