Ehcache is the most popular open-source Java caching framework today: simple to configure, well structured, and powerful, it first became widely known as Hibernate's default cache provider. Most Chinese-language material on Ehcache stops at a brief introduction and configuration walkthrough (if that is all you need, Google it; for the API, the official site is very clear), but descriptions of its features and analyses of its implementation principles are rare. So in this article I will introduce and analyze Ehcache's characteristics, adding some of my own understanding and thinking; I hope readers interested in caching will get something out of it.
I. Features at a glance (a simple translation from the official site):
1. Fast and lightweight
Over the past few years, many benchmarks have shown Ehcache to be one of the fastest Java caches.
Ehcache's threading mechanism is designed for large, high-concurrency systems.
A large body of performance test cases ensures that Ehcache performs consistently from version to version.
Many users do not even know they are using Ehcache, because it requires no special configuration.
The API is easy to use, which makes it easy to put into production.
A very small JAR: Ehcache 2.2.3 is only 668 KB.
Minimal dependencies: the only dependency is SLF4J.
2. Scalability
Both the memory store and the disk store can scale to several gigabytes; Ehcache is optimized for large data volumes.
With large memory, a single process can handle hundreds of gigabytes of data.
Optimized for high concurrency and large multi-CPU servers.
Thread safety and performance always pull against each other; Ehcache's threading design follows Doug Lea's ideas to achieve higher performance.
Multiple CacheManagers are supported in a single JVM.
With a Terracotta Server Array, it can scale to hundreds of nodes.
3. Flexibility
Ehcache 1.2 provides both an Object API and a Serializable API.
Objects that cannot be serialized can use every Ehcache feature except the disk store.
Apart from the element accessors, the two APIs are identical; only two methods differ: getObjectValue and getKeyValue. This makes caching objects, serializing objects, and adopting new features straightforward.
Cache-wide and per-element expiration policies are both supported; the time-to-live of each cache can be set and controlled.
LRU, LFU, and FIFO eviction algorithms are provided; Ehcache 1.2 introduced least-frequently-used and first-in-first-out eviction, completing the set of eviction algorithms.
Memory and disk stores are provided; like most caching solutions, Ehcache offers a high-performance memory tier and a disk tier.
Dynamic, runtime cache configuration: the maximum numbers of live and idle entries, in memory and on disk, can be modified at run time.
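For example, a cache's expiry settings can be adjusted while it is running; a minimal sketch (assuming a cache named "testCache" is already configured):

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;

CacheManager manager = CacheManager.getInstance();
Cache cache = manager.getCache("testCache");
// Change expiry settings at run time; subsequent reads use the new values.
cache.getCacheConfiguration().setTimeToLiveSeconds(120);
cache.getCacheConfiguration().setTimeToIdleSeconds(60);
```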
4. Standard Support
Ehcache provides the most complete implementation of the JSR107 JCache API, and JCache-style implementations (such as net.sf.jsr107cache) have shipped ever since JCache was published.
Implementing the JCache API will ease porting to other caching solutions in the future.
Ehcache's maintainer, Greg Luck, is a member of the JSR107 expert group.
5. Extensibility
Listeners are pluggable. Ehcache 1.2 provides the CacheManagerEventListener and CacheEventListener interfaces, which can be plugged in and configured in ehcache.xml.
Node discovery, redundancy, and listeners are all pluggable.
Distributed caching, introduced in Ehcache 1.2, involves many trade-off options; the Ehcache team believes there is no one-size-fits-all configuration.
Implementations can use the built-in mechanisms or be written entirely from scratch, since a complete plug-in development guide exists.
Cache extensions are pluggable: create your own extension, which can hold a reference to a cache and is bound to the cache's life cycle.
Cache loaders are pluggable: create your own loader, optionally using asynchronous methods to load data into the cache.
Cache exception handlers are pluggable: create a handler that performs specific actions when an exception occurs.
6. Application persistence
After a VM restart, a store persisted to disk can recover its data.
Ehcache was the first open-source Java caching framework to introduce persistent storage of cached data; the cached data can be re-read from disk after a machine restart.
The cache can be flushed to disk on demand: flushing a cache entry to disk is done through the cache.flush() method, which is very convenient.
7. Listeners
CacheManager listeners. A listener implementing the CacheManagerEventListener interface can be registered, with the callbacks:
notifyCacheAdded()
notifyCacheRemoved()
Cache event listeners. A listener implementing the CacheEventListener interface can be registered; it provides many hooks that run after a cache event occurs:
notifyElementRemoved / Put / Updated / Expired
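A minimal sketch of such a listener against the Ehcache 2.x interface (the logging bodies are illustrative):

```java
import net.sf.ehcache.CacheException;
import net.sf.ehcache.Ehcache;
import net.sf.ehcache.Element;
import net.sf.ehcache.event.CacheEventListener;

public class LoggingCacheEventListener implements CacheEventListener {
    public void notifyElementPut(Ehcache cache, Element element) throws CacheException {
        System.out.println("put: " + element.getObjectKey());
    }
    public void notifyElementUpdated(Ehcache cache, Element element) throws CacheException {
        System.out.println("updated: " + element.getObjectKey());
    }
    public void notifyElementRemoved(Ehcache cache, Element element) throws CacheException {
        System.out.println("removed: " + element.getObjectKey());
    }
    public void notifyElementExpired(Ehcache cache, Element element) {
        System.out.println("expired: " + element.getObjectKey());
    }
    public void notifyElementEvicted(Ehcache cache, Element element) { }
    public void notifyRemoveAll(Ehcache cache) { }
    public void dispose() { }
    public Object clone() throws CloneNotSupportedException { return super.clone(); }
}
```

It can then be declared in ehcache.xml through a cacheEventListenerFactory, or registered programmatically via cache.getCacheEventNotificationService().registerListener(new LoggingCacheEventListener());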
8. JMX support
Ehcache's JMX functionality is enabled by default, and the following MBeans can be monitored and managed:
CacheManager, Cache, CacheConfiguration, CacheStatistics
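One way to expose these MBeans on the platform MBean server, sketched with the Ehcache 2.x ManagementService (the four boolean flags choose which MBean types to register):

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.management.ManagementService;

CacheManager manager = CacheManager.getInstance();
MBeanServer mBeanServer = ManagementFactory.getPlatformMBeanServer();
// Register the CacheManager, Cache, CacheConfiguration and CacheStatistics MBeans.
ManagementService.registerMBeans(manager, mBeanServer, true, true, true, true);
```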
9. Distributed Cache
Distributed caching has been supported since Ehcache 1.2, offering high performance along with flexibility and extensibility.
The options for distributed caching include:
Cache clustering via Terracotta: configures and uses Ehcache caches in Terracotta mode. Cache discovery is automatic, and there are many options for tuning cache behavior and performance.
Redundant cache data via RMI, JGroups, or JMS: nodes can be configured manually or discovered via multicast. State updates are propagated asynchronously or synchronously over RMI connections.
Custom: a comprehensive plug-in mechanism supports custom discovery and replication implementations.
Available cache replication options: asynchronous or synchronous replication over RMI, JGroups, or JMS.
Reliable delivery: built on TCP's delivery mechanism.
Node discovery: nodes can be configured manually or auto-discovered via multicast, with nodes added and removed automatically. Where multicast is blocked, manual configuration gives precise control.
Distributed caches can join or leave the cluster at any time, and a cache can be configured to run a bootstrap loader at initialization.
BootstrapCacheLoaderFactory is the abstract factory for implementations of the BootstrapCacheLoader interface (an RMI implementation is provided).
Cache server: Ehcache provides a cache server, packaged as a WAR, that runs in the vast majority of web containers or standalone.
The cache server offers two APIs: resource-oriented RESTful, and SOAP. Clients can be written in any language.
RESTful cache server: the implementation strictly follows the RESTful, resource-oriented architectural style.
SOAP cache server: the Ehcache Web Services API exposes a singleton CacheManager, which can be configured in ehcache.xml or via an IoC container.
The standalone server includes an embedded GlassFish web container, packaged as a WAR that can be deployed to any container supporting Servlet 2.5; GlassFish v2/v3, Tomcat 6, and Jetty 6 have all been tested.
10. Search
Standard distributed search uses a fluent query interface; see the documentation.
11. Java EE and applied caching
High-quality implementations are provided for common caching scenarios and patterns.
Blocking cache: its locking mechanism prevents concurrent threads from redundantly populating the same entry.
SelfPopulatingCache is especially useful for caching expensive operations; it is a read-through cache. It does not require the caller to know how an element is fetched, and it supports refreshing entries without blocking reads.
CachingFilter: an abstract, extensible caching filter.
SimplePageCachingFilter: caches pages keyed on the request URI and query string. Based on the HTTP request headers it can choose whether to gzip-compress pages sent to the browser. It can cache an entire Servlet page, whether rendered with JSP, Velocity, or any other technology (a web.xml sketch follows this list).
SimplePageFragmentCachingFilter: caches page fragments, keyed on the request URI and query string; used with the jsp:include tag inside JSPs.
Tested with Orion and Tomcat; compatible with the Servlet 2.3 and 2.4 specifications.
Cacheable command: an implementation of the command pattern, with support for asynchronous behavior and fault tolerance.
Compatible with Hibernate and with Google App Engine.
JTA-based transaction support: transactional resource management, two-phase commit and rollback, and local transactions.
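A minimal web.xml sketch for SimplePageCachingFilter (the /pages/* mapping is an arbitrary example; by default the filter expects a cache named "SimplePageCachingFilter" to be defined in ehcache.xml):

```xml
<filter>
    <filter-name>SimplePageCachingFilter</filter-name>
    <filter-class>net.sf.ehcache.constructs.web.filter.SimplePageCachingFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>SimplePageCachingFilter</filter-name>
    <url-pattern>/pages/*</url-pattern>
</filter-mapping>
```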
12. Open-source license
Apache 2.0 License
II. Ehcache's modules at a glance. They are independent libraries, each adding new capabilities to Ehcache, and can be downloaded from the project site:
- ehcache-core: the API, the standard cache engine, RMI replication, and Hibernate support
- ehcache: distributed Ehcache, including the Ehcache core and the Terracotta libraries
- ehcache-monitor: enterprise-grade monitoring and management
- ehcache-web: caching and gzip-compression support for Java Servlet container filters
- ehcache-jcache: the JSR107 JCache implementation
- ehcache-jgroupsreplication: replication via JGroups
- ehcache-jmsreplication: replication via JMS
- ehcache-openjpa: OpenJPA plug-in
- ehcache-server: a RESTful cache server, deployed as a WAR or standalone
- ehcache-unlockedreadsview: allows lock-free reads from a Terracotta cache
- ehcache-debugger: logs RMI distributed call events
- Ehcache for JRuby: JRuby and Rails support
An overview of Ehcache's structural design (shown as a diagram in the original post).
III. Core concepts:
CacheManager: the cache manager. Previously only a singleton was allowed, though multiple instances are now supported.
Cache: the essence of what holds the data. Caches live inside a CacheManager, and all caches implement the Ehcache interface.
Element: the unit making up a single cache entry.
System of Record (SOR): the component that can fetch the real data, such as the real business logic, an external interface call, or the database holding the real data; the cache is read from the SOR and written back to the SOR.
A code example:

```java
CacheManager manager = CacheManager.newInstance("src/config/ehcache.xml");
manager.addCache("testCache");
Cache test = manager.getCache("testCache");
test.put(new Element("key1", "value1"));
manager.shutdown();
```
Of course, a DSL-like programmatic configuration is also supported, and the configuration can be modified dynamically at run time:
```java
Cache testCache = new Cache(
    new CacheConfiguration("testCache", maxElements) // maxElements: max in-memory entries
        .memoryStoreEvictionPolicy(MemoryStoreEvictionPolicy.LFU)
        .overflowToDisk(true)
        .eternal(false)
        .timeToLiveSeconds(60)
        .timeToIdleSeconds(30)
        .diskPersistent(false)
        .diskExpiryThreadIntervalSeconds(0));
```
An example with transactions:
```java
Ehcache cache = cacheManager.getEhcache("xaCache");
transactionManager.begin();
try {
    Element e = cache.get(key);
    Object result = complexService.doStuff(e.getValue());
    cache.put(new Element(key, result));
    complexService.doMoreStuff(result);
    transactionManager.commit();
} catch (Exception ex) {
    transactionManager.rollback();
}
```
IV. Consistency models:
Speaking of consistency, what does database consistency mean? You may want to review the database isolation levels first:
Read uncommitted: no locks are checked or used when reading data, so uncommitted data may be read at this level. Dirty reads, non-repeatable reads, and phantom reads can all occur.
Read committed: only committed data is read, waiting for other transactions to release their exclusive locks. Shared locks on the data being read are released as soon as the read completes. Read committed is the default isolation level of most databases. Non-repeatable reads and phantom reads can still occur.
Repeatable read: reads data like read committed, but holds the shared locks until the end of the transaction. Phantom reads can still occur.
Serializable: works like repeatable read, but locks not only the affected data but also the range, preventing new rows from being inserted into the query's result set.
With that background, consider the following consistency models:
1. Strong consistency: after a piece of data is successfully updated (the transaction returns successfully), every subsequent read of that data sees the updated value. This is the model traditional relational databases provide, and one of the reasons they are so well loved. It usually carries the largest performance cost.
2. Weak consistency: after the data is updated, subsequent reads are not guaranteed to return the new value. There is usually an "inconsistency window": only after the window has passed will subsequent reads return the updated value.
3. Eventual consistency: a kind of weak consistency guaranteeing that, once a piece of data has been updated and is not updated again, all reads will eventually return the updated value.
The eventual consistency model includes the following properties, which are easy to understand:
- Read-your-writes consistency: after thread A updates a piece of data, all of its subsequent reads obtain the updated data.
- Session consistency: essentially the same as the above; once a user changes data, all data they read within the same session is the changed data.
- Monotonic read consistency: if a process has seen the current value, subsequent reads cannot return an older value.
- Monotonic write consistency: writes within the same process must be applied in order; otherwise the result is unpredictable.
4. Bulk load: this model is optimized for bulk-loading data into the cache, without the locks and the usual eviction algorithms that drag performance down. It resembles eventual consistency, but with batching, high throughput, and only weak consistency guarantees.
These APIs also affect the consistency outcome:
1. Explicit locking: if we configure strong consistency, all cache operations are naturally transactional; if we configure eventual consistency, the explicit locking API can be used from outside to obtain transaction-like effects. Such locks can be finer grained, but contention and thread blocking are still possible.
2. Unlocked reads view (UnlockedReadsView): a decorator that allows dirty reads. It can only be used with a strong-consistency configuration, and it gains performance over a fully strong-consistent configuration by reading without locks.
For example, here the XML configures a strong consistency model:
```xml
<cache name="myCache"
       maxElementsInMemory="..."
       eternal="false"
       overflowToDisk="false">
    <terracotta clustered="true" consistency="strong"/>
</cache>
```
while adding an UnlockedReadsView on top:
```java
Cache cache = cacheManager.getCache("myCache");
UnlockedReadsView unlockedReadsView = new UnlockedReadsView(cache, "myUnlockedCache");
```
3. Atomic methods: method execution is atomic, i.e. a CAS (Compare-and-Swap) operation. CAS ultimately achieves strong consistency as well, but implemented with optimistic rather than pessimistic locking. Under optimistic locking, an update may fail because another thread may have changed the same data in the meantime, and the update must then be retried after the failure. Modern CPUs support the CAS primitive.
```java
cache.putIfAbsent(element);
cache.replace(oldElement, newElement);
cache.removeElement(element);
```
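For instance, an optimistic increment can be retried until the CAS succeeds (a sketch; the "counter" key and its Integer value are hypothetical):

```java
Element oldElement, newElement;
do {
    oldElement = cache.get("counter");
    newElement = new Element("counter", (Integer) oldElement.getObjectValue() + 1);
    // replace() succeeds only if the element was not changed since we read it.
} while (!cache.replace(oldElement, newElement));
```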
V. Cache topology types:
1. Standalone Ehcache: each application node's cache is independent; the nodes do not communicate with each other.
2. Distributed Ehcache: the data is stored in a Terracotta Server Array (TSA), while the most recently used data is also kept on each application node.
Logical view: the L1 cache sits on each application node, while the L2 cache lives on the cache server array (diagram in the original post).
Networking view: (diagram in the original post).
Storage view: the L1 cache is not persisted; and, measured by the amount of cached data, the server side holds far more than any application node.
3. Replicated Ehcache: the cached data is kept on several application nodes at once; data replication and invalidation events spread across the cluster nodes synchronously or asynchronously. In the synchronous form, the writing thread blocks until the events have been delivered. Only the weak consistency model is available in this mode.
It offers the following event propagation mechanisms: RMI, JGroups, JMS, and the cache server.
In RMI mode, all nodes are peers (topology diagram in the original post):
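A typical peer-provider configuration for automatic multicast discovery (a sketch; the multicast address and port values are illustrative):

```xml
<cacheManagerPeerProviderFactory
    class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory"
    properties="peerDiscovery=automatic, multicastGroupAddress=230.0.0.1,
                multicastGroupPort=4446, timeToLive=1"/>
```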
JGroups mode: can be configured for unicast or multicast; the protocol stack and the configuration are very flexible.
```xml
<cacheManagerPeerProviderFactory
    class="net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory"
    properties="connect=UDP(mcast_addr=231.12.21.132;mcast_port=45566;):PING:
                MERGE2:FD_SOCK:VERIFY_SUSPECT:pbcast.NAKACK:UNICAST:pbcast.STABLE:FRAG:pbcast.GMS"
    propertySeparator="::"/>
```
JMS mode: the core of this pattern is a message queue. Each application node subscribes to a predefined topic, and whenever a node updates an element, it also publishes the updated element to that topic. Among JMS implementations, both Open MQ and ActiveMQ have been tested for Ehcache compatibility.
Cache server mode: master and slave nodes exist in this mode, and communication goes through RESTful APIs or SOAP.
Whichever mode is used, update events can propagate as updateViaCopy or updateViaInvalidate; the latter merely sends an invalidation message and is far more efficient.
Replicated caching is prone to data inconsistency; if that becomes a problem, consider the synchronous data-delivery mechanisms instead.
Even without distributed or replicated caching, some undesirable behaviors remain, for example:
Cache drift: each application node manages its own cache, so an update on one node does not affect the others, and the data may fall out of sync. This is especially the case with cached web session data.
Database bottleneck: for a single-instance application, the cache shields the database from read storms; in a cluster, however, every application node must refresh its data at regular intervals, and the more nodes there are, the greater the cost to the database of keeping everything current.
VI. Storage tiers:
1. Heap store: fast, but limited in capacity.
2. Off-heap store (OffHeapStore): called BigMemory, available only in the Enterprise edition of Ehcache. It is implemented with NIO DirectByteBuffers, is faster than the disk store, and is entirely unaffected by GC, so response times stay stable; but direct buffers cost more to allocate than heap buffers, and the data must be held as byte arrays, so objects are serialized on write and deserialized on read. It is roughly an order of magnitude slower than the heap store. (A configuration sketch follows this list.)
(Note: the direct buffer itself is untouched by GC, but the Java object that owns the direct buffer lives on the heap and can be collected; once it is collected, the JVM frees the off-heap space behind the buffer.)
3. Disk store.
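A configuration sketch for the off-heap tier (an assumption: the overflowToOffHeap/maxMemoryOffHeap attributes follow the Ehcache 2.x BigMemory documentation and vary across versions; the JVM must also be started with -XX:MaxDirectMemorySize at least as large as the off-heap store):

```xml
<cache name="bigMemCache"
       maxElementsInMemory="10000"
       eternal="true"
       overflowToOffHeap="true"
       maxMemoryOffHeap="4g"/>
```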
VII. Cache usage patterns:
Cache-aside: the application operates on the cache directly. It asks the cache whether the data exists; if so, it returns the data straight from the cache, bypassing the SOR; if not, it fetches the data from the SOR and then puts it into the cache.
```java
public V readSomeData(K key) {
    Element element;
    if ((element = cache.get(key)) != null) {
        return (V) element.getValue();
    }

    V value;
    if ((value = readDataFromDataStore(key)) != null) {  // fetch from the SOR
        cache.put(new Element(key, value));
    }
    return value;
}
```
Cache-as-SOR: combined with read-through, write-through, or write-behind, a proxy layer is placed in front of the SOR, so the application accessing it cannot tell whether the data came from the cache or from the SOR. (A read-through sketch follows this list.)
Read-through.
Write-through.
Write-behind (write-back): makes the write asynchronous, and can delay the write to the SOR further still.
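A read-through sketch using SelfPopulatingCache (the userDao lookup stands in for a hypothetical SOR call; the "dataCache" name is assumed to exist in ehcache.xml):

```java
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Ehcache;
import net.sf.ehcache.Element;
import net.sf.ehcache.constructs.blocking.CacheEntryFactory;
import net.sf.ehcache.constructs.blocking.SelfPopulatingCache;

Ehcache underlying = CacheManager.getInstance().getEhcache("dataCache");
SelfPopulatingCache selfPopulatingCache = new SelfPopulatingCache(underlying,
        new CacheEntryFactory() {
            public Object createEntry(Object key) throws Exception {
                return userDao.findById(key); // hypothetical SOR lookup
            }
        });
// Callers never see a miss: on a miss the factory loads from the SOR,
// and other threads requesting the same key block until the value is ready.
Element element = selfPopulatingCache.get("user-42");
```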
Copy caches come in two modes: copyOnRead and copyOnWrite.
copyOnRead means that when a request to read cached data arrives and the data is found to have expired, the element is copied from the source (pull).
copyOnWrite means that when the real data is written into the cache, the copies on the other nodes are updated (push).
The former suits cases where multiple threads must not access the same element instance; the latter lets you control the timing of cache update notifications freely.
More on the changes and differences between push and pull can be found here (link in the original post).
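In configuration, both modes are switched per cache (a sketch; attribute support depends on the Ehcache 2.x version):

```xml
<cache name="copyCache"
       maxElementsInMemory="500"
       copyOnRead="true"
       copyOnWrite="true"/>
```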
VIII. Configuration methods:
These include configuration files, declarative configuration, programmatic configuration, and even configuration through constructor parameters. The design principles behind configuration include (a programmatic sketch follows this list):
Keep all configuration in one place.
Cache configuration can be modified easily, at development time and at run time.
Configuration errors are caught at program startup; invalid modifications made at run time throw runtime exceptions.
Default configurations are provided; almost every setting is optional and has a default value.
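A programmatic-configuration sketch (assuming the Configuration.addDefaultCache/addCache methods and the fluent CacheConfiguration API of Ehcache 2.x):

```java
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.config.CacheConfiguration;
import net.sf.ehcache.config.Configuration;

Configuration config = new Configuration();
config.addDefaultCache(new CacheConfiguration("default", 1000));
config.addCache(new CacheConfiguration("appCache", 500)
        .timeToLiveSeconds(300));
// Every setting not given here falls back to its default value.
CacheManager manager = new CacheManager(config);
```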
IX. Automatic Resource Control (ARC):
ARC provides an intelligent way to control caches and tune performance. Features include:
Control over the size of in-memory cache objects, avoiding OOM.
Pooled sizing at the CacheManager level, avoiding the cost of computing each cache's size separately.
Flexible, independent per-tier sizing, so the size of each tier can be controlled individually.
Sizes can be counted in bytes, in cache entries, or as percentages.
Optimized retrieval of frequently hit data; see the description of how cached data flows between tiers. (A sizing sketch follows this list.)
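A pooled-sizing sketch (an assumption: the maxBytesLocal* attributes introduced with ARC in Ehcache 2.5; caches without explicit sizes draw from the shared CacheManager-level pools):

```xml
<ehcache maxBytesLocalHeap="256M"
         maxBytesLocalOffHeap="2G"
         maxBytesLocalDisk="20G">
    <!-- this cache shares the pools declared above -->
    <cache name="appCache"/>
</ehcache>
```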
The flow of cached data involves several behaviors:
Flush: a cache entry moves to a lower tier.
Fault: a copy moves from a lower tier up to a higher one; faulting is triggered when a tier misses on an entry during a read.
Eviction: a cache entry is removed.
Expiration: an entry goes stale.
Pinning: a cache entry is forced to remain in a given tier.
The diagram in the original post shows how data flows between the tiers, which also reflects the life cycle of the cached data.
X. Monitoring:
The monitored topology: each application node deploys a monitoring probe that contacts the monitoring server over TCP; the server in turn feeds the data to a rich client or to an operations console.
XI. WAN replication:
For cache data replication, Ehcache lets two geographically separate nodes keep their data consistent across a wide-area network, and it offers several schemes (note: the diagrams in the original show only two nodes each, but the schemes generalize to N nodes):
Scheme 1: Terracotta Active/Mirror Replication.
In this scheme, the server array contains an active node and a mirror node, and every application node reads and writes through the active node. It is the simplest and the easiest to manage, but also the least robust, since it presumes near-ideal conditions on the WAN links between the server arrays and from clients to servers.
Scheme 2: Transactional Cache Manager Replication.
In this scheme, reads never cross the WAN; writes produce two copies of the data, handled by two cache managers, one on the local server and one on the remote server. Reads get higher throughput and lower latency, but an XA transaction manager must be introduced, the double write by the two cache managers adds considerable write overhead, and the write latency across the WAN can still bottleneck system response.
Scheme 3: Messaging-based (AMQ) replication.
In this scheme, batching and queuing are introduced to absorb the WAN bottleneck, while the read handling and the replication logic are physically split off from the server array, so a degraded WAN cannot affect the nodes' read traffic. It has higher throughput and lower latency, and the read/replication split comes with complete message delivery guarantees and conflict handling; but it is more complex and requires a message bus.
Some Ehcache features are minor or marginal and are not covered here, such as the details of JMX support; others have close analogues described elsewhere, such as web support: see my interpretation of OSCache, whose "web support" section analyzes the principles in detail.
Finally, on Ehcache's performance: the chart referenced below comes from the blog of Ehcache's founder, Greg Luck.
It shows puts/gets on Ehcache running 500-1000 times faster than memcached. Why? His own analysis: "In-process caching and asynchronous replication is a clear performance winner". See his blog for more information.
Original address: http://raychase.iteye.com/blog/1545906
Reposted at: http://blog.csdn.net/linlzk/article/details/47315805