Talking about Java caching

Tags: aop, memcached, jboss
Before we begin the discussion of caching, let's touch on another issue first: theory and practice. Among the programmers Ahuaxuan has worked with, some lean heavily toward practice and some toward theory, and neither extreme is healthy; theory and practice are equally important. Many of the core algorithms we rely on would not exist without theory, and years of practice mean little if we never distill them into theory and deepen our understanding. So theory and practice matter equally.
Before we discuss caching techniques, let's look at caching itself. Based on his own experience, Ahuaxuan breaks the topic of caching down into four smaller questions.

1. Why does the cache exist?
2. Where can the cache exist?
3. What are the properties of a cache?
4. What media can a cache use?
1. Why does the cache exist?
In general, a web site or application works roughly like this: the browser sends a request to the application server, the application server does a pile of computation and then queries the database, the database does its own pile of computation and returns data to the application server, and the application server does yet more computation and returns the result to the browser. That is the standard flow. But as the Internet has spread, more and more people are online and the amount of information keeps growing, so our applications have to support ever-larger concurrency. The application server and the database server therefore have more and more work to do, yet application server resources are limited, and the number of requests per second the database can accept is limited too (disk speed, after all, is finite). If we want limited resources to deliver as much throughput as possible, one approach is to reduce the amount of computation and shorten the request path (less network I/O or disk I/O), and that is exactly what caching does. The basic principle of caching is to short-circuit the standard flow described above: any link in that chain can be cut off, and a request can fetch data directly from the cache and return it. This saves time, improves response speed, and saves hardware resources, so the limited hardware we have can serve more users.
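
To make this concrete, here is a minimal sketch of the idea of cutting the chain short: the classic cache-aside pattern, where a request first tries the cache and only falls back to the database on a miss. The class, the map-based cache, and the loadUserNameFromDatabase method are hypothetical names used purely for illustration.

Java code

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class UserService {
    // Hypothetical in-memory cache; in practice this could be Ehcache, OSCache, memcached, etc.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public String getUserName(String userId) {
        // 1. Try the cache first: if the data is here, the database is never touched.
        String cached = cache.get(userId);
        if (cached != null) {
            return cached;
        }
        // 2. Cache miss: fall back to the expensive path (application logic + database).
        String fromDb = loadUserNameFromDatabase(userId);
        // 3. Store the result so the next request can skip the database.
        cache.put(userId, fromDb);
        return fromDb;
    }

    private String loadUserNameFromDatabase(String userId) {
        // Placeholder for the real database query.
        return "user-" + userId;
    }
}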

2. Where can the cache exist?
Browser ---> (cache between browser and app) ---> application (layered) ---> database

The line above shows the general path of a request. If we redraw it with the application itself split into layers (a tiered app), the structure becomes a little more complex, but the links stay the same: the browser, the space between the browser and the app, the layers inside the app, and the database.

In theory, a cache can sit at any point along this request path. The first link is the browser: if the data is already in the browser, it is fastest for the user, because no network request is needed at all. The second link is between the browser and the app: a cache added here is transparent to the app, and what it stores is complete pages. The third link is the app itself, which has several layers: the cache can be placed at different layers, and which layer is appropriate depends on the scenario, so some care is needed when choosing where to cache. The fourth link is the database, which can also have a cache, such as MySQL's query cache.
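
As a small illustration of the first two links (the browser, and anything sitting between the browser and the app), a servlet can tell the browser and intermediate caches how long a response may be reused by setting a standard Cache-Control header. This is only a sketch of the general idea; the servlet name and the 300-second lifetime are made-up examples.

Java code

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class NewsListServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // Allow the browser (and any intermediate cache) to reuse this page for 5 minutes,
        // so repeated requests never reach the application or the database.
        resp.setHeader("Cache-Control", "public, max-age=300");
        resp.setContentType("text/html;charset=UTF-8");
        resp.getWriter().write("<html><body>latest news ...</body></html>");
    }
}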

In other words, we can add a cache at any point in the request path. But can all data be put into a cache? Of course not. Data worth caching always has certain characteristics, and to judge whether data can be cached, and how, we have to start from the way that data changes.

How does data change? In the simplest terms there are two kinds: data that changes and data that does not. Data that never changes clearly does not need to be recomputed every time. The problem is that, in theory, all data eventually changes; change is the one constant. So to divide data into these two categories we need one more condition: time. We can then characterize data as either changing or unchanged over a given period of time, and based on that characteristic we can cache the data in the right place and with the right type of cache.
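
The simplest way to encode "unchanged for a period of time" is to attach a time-to-live (TTL) to each cached value. A minimal sketch, with hypothetical names:

Java code

// A cached value that is considered valid only for a fixed period of time (its TTL).
public class CacheEntry<V> {
    private final V value;
    private final long expiresAtMillis;

    public CacheEntry(V value, long ttlMillis) {
        this.value = value;
        this.expiresAtMillis = System.currentTimeMillis() + ttlMillis;
    }

    // The entry may be served from the cache only while it has not expired.
    public boolean isExpired() {
        return System.currentTimeMillis() > expiresAtMillis;
    }

    public V getValue() {
        return value;
    }
}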

3. What are the properties of a cache?
From an object-oriented point of view, a cache is an object, and an object has attributes. So let's look at what attributes a cache has. Here are the three we use most.
(1) Hit rate
The hit rate is the ratio of the number of times the cache returns a correct result to the number of requests made to the cache. The higher the ratio, the more effectively the cache is being used.

The hit rate is a very important issue in caching. We would all like the hit rate to reach 100%, but things often work out otherwise; even so, the hit rate is the key indicator for measuring how effective a cache is.
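
Measuring the hit rate is just a matter of counting lookups and hits; a minimal sketch (hypothetical names) follows.

Java code

import java.util.concurrent.atomic.AtomicLong;

public class CacheStats {
    private final AtomicLong requests = new AtomicLong();
    private final AtomicLong hits = new AtomicLong();

    public void recordRequest() { requests.incrementAndGet(); }
    public void recordHit()     { hits.incrementAndGet(); }

    // Hit rate = successful cache lookups / total cache lookups.
    public double hitRate() {
        long total = requests.get();
        return total == 0 ? 0.0 : (double) hits.get() / total;
    }
}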

(2) Maximum number of elements
This is the maximum number of elements the cache can hold. Once the number of elements in the cache exceeds this value, the cache's eviction policy is applied. Setting an appropriate maximum for the scenario at hand can often raise the hit rate to some extent, making the cache more effective.

(3) Eviction policy

1 FIFO (first in, first out): the data that entered the cache earliest is cleared first when cache space runs out (the maximum-element limit is exceeded).
2 LFU (least frequently used): the elements that have been used least are cleared. This requires each cached element to carry a hit count; when cache space runs out, the elements with the smallest hit counts are evicted.
3 LRU (least recently used): each cached element carries a timestamp; when the cache is full and room has to be made for new elements, the elements whose timestamps are furthest from the current time are evicted (see the sketch after this list).
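
An LRU cache with a maximum element count can be sketched in a few lines on top of java.util.LinkedHashMap, which supports access-order iteration and an eviction hook. This is a simplified, non-thread-safe illustration of the policy, not how any particular caching framework implements it.

Java code

import java.util.LinkedHashMap;
import java.util.Map;

// LinkedHashMap in access order + removeEldestEntry gives a basic LRU eviction policy.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxElements;

    public LruCache(int maxElements) {
        super(16, 0.75f, true);   // 'true' = order entries by most recent access
        this.maxElements = maxElements;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Once the maximum element count is exceeded, evict the least recently used entry.
        return size() > maxElements;
    }
}

For example, new LruCache<String, String>(1000) holds at most 1000 entries and silently evicts the one that has gone unused the longest.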

4. Cache media
In terms of hardware, the media come down to just two: memory and hard disk (an application-layer program does not need to think about registers and the like). But we usually don't classify caches by hardware; the common division is a technical one, into memory, hard-disk files, and databases.
(1) Memory. Putting the cache in memory is the fastest option: any program manipulates memory far faster than it does a hard disk. But you have to take machine failure into account, because data held only in memory is not persistent; without a backup on disk it is difficult or impossible to recover after the machine goes down.

(2) Hard disk. In general, many caching frameworks use memory and hard disk in combination: for example, when the space allocated in memory is full, they give you the option of persisting the data being evicted from memory to the hard disk. You can of course also choose to put the data on disk directly while keeping a copy in memory, so that a crash is nothing to fear. Other caches simply put their data straight onto the hard disk.


(3) Database. When it comes to databases, some people may wonder: didn't we just talk about reducing the number of database queries and the computational pressure on the database? How can we now use a database as a cache medium? The reason is that there are many kinds of databases. Berkeley DB, for example, does not support SQL statements: it has no SQL engine, just a key-value storage structure, so it is very fast. On a modern PC, more than 100,000 queries per second is not a problem (of course this depends on the characteristics of your workload; if the data you access is evenly distributed rather than concentrated on hot keys, Ahuaxuan cannot guarantee that speed).
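
As a sketch of how such a key-value database can serve as a cache medium, the Berkeley DB Java Edition API (package com.sleepycat.je) looks roughly like the code below. Treat the exact calls as an approximation and check the BDB JE documentation for the version you use; the path and keys are made up.

Java code

import java.io.File;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;

public class BdbCacheDemo {
    public static void main(String[] args) throws Exception {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        Environment env = new Environment(new File("/tmp/bdb-cache"), envConfig);

        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);
        Database db = env.openDatabase(null, "cache", dbConfig);

        // No SQL engine involved: just a key and a value, both as byte arrays.
        DatabaseEntry key = new DatabaseEntry("user:1".getBytes("UTF-8"));
        db.put(null, key, new DatabaseEntry("alice".getBytes("UTF-8")));

        DatabaseEntry result = new DatabaseEntry();
        if (db.get(null, key, result, LockMode.DEFAULT) == OperationStatus.SUCCESS) {
            System.out.println(new String(result.getData(), "UTF-8"));
        }

        db.close();
        env.close();
    }
}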

Besides the caching medium, Ahuaxuan also divides caches by how tightly the cache is coupled to the application: local caches and remote caches.
A local cache is a caching component that lives inside the application; a remote cache is a caching component deployed separately from, and decoupled from, the application. Typical local caches are Ehcache and OSCache, while the best-known remote cache is memcached.

The biggest advantage of a local cache is that the application and the cache run in the same process, so cache requests are very fast and there is no network overhead. A local cache is therefore the right fit for a single application, or for a clustered application whose cache nodes do not need to notify one another. This is why Ehcache and OSCache are so popular in the Java world.
But a local cache has drawbacks too. Frameworks of this kind (such as Ehcache or OSCache in Java) are local caches: the cache lives inside the application, so multiple applications cannot share it directly, and in a clustered deployment the problem is even more obvious. Some cache components do allow cluster nodes to notify each other of cache updates, but because this works by broadcasting or by looping over all nodes, the network I/O overhead becomes very large when the cache is updated frequently, and in severe cases it can disrupt the normal operation of the application. Moreover, if the cached data set is large, a local cache means every application instance holds a full copy of it, which is an outright waste of memory.

So in that case we tend to choose a remote cache such as memcached. In a clustered or distributed deployment, every application can share the data in memcached: each one connects to memcached directly over a socket, using the memcached protocol on top of TCP/IP, so when one application updates a value in memcached, all applications see the latest value. This does add a fair amount of network overhead, but the approach is usually far better than having local caches broadcast or loop-update one another, and its performance is higher. Because the data only has to be stored once, it also improves memory utilization.
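
With a Java client such as spymemcached, talking to a shared memcached node looks roughly like the sketch below (it assumes a memcached server running on localhost:11211; the key and value are made up). Other clients, such as xmemcached, work along the same lines.

Java code

import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class MemcachedDemo {
    public static void main(String[] args) throws Exception {
        // Every application instance in the cluster connects to the same memcached server(s).
        MemcachedClient client = new MemcachedClient(new InetSocketAddress("localhost", 11211));

        // One application writes a value with a 300-second expiry...
        client.set("user:1", 300, "alice");

        // ...and every other application immediately sees the same value.
        Object value = client.get("user:1");
        System.out.println(value);

        client.shutdown();
    }
}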

From the analysis above we can see that both local caches and remote caches have their place in the caching world. Ahuaxuan therefore recommends choosing which cache to use based on the characteristics of each cache and on your own business scenario; only then will caching deliver its full benefit.

Ahuaxuan believes that using caches well is an essential skill for an architect: a good architect can accurately determine, from the type of data and the business scenario, which type of cache to use and how to use it. There is no silver bullet in the world of caching; no single cache solves every business scenario or every data type, and if one did, architects would be worth far less.

OSCache
  
OSCache is a widely used, high-performance Java caching framework that can serve as a general-purpose caching solution for any Java application.
  
OSCache has the following features:
  
Caching of arbitrary objects: you are not restricted to caching portions of JSP pages or HTTP requests; any Java object can be cached.
 
A comprehensive API: the OSCache API gives you full programmatic control over all of OSCache's features.
  
Persistent caching: the cache can be written to the hard disk at will, so expensive-to-create data stays cached even across application restarts.
  
Cluster support: clustering of cached data can be enabled through configuration alone, with no code changes.
  
Expiry of cache entries: you have maximum control over how cached objects expire, including pluggable refresh policies when the default behavior is not enough.
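
Programmatic use of OSCache usually goes through GeneralCacheAdministrator, following the documented get / put / cancelUpdate pattern. The sketch below is an approximation; the key and the expensive loading step are hypothetical, and the details are worth checking against the OSCache documentation for your version.

Java code

import com.opensymphony.oscache.base.NeedsRefreshException;
import com.opensymphony.oscache.general.GeneralCacheAdministrator;

public class OsCacheDemo {
    private final GeneralCacheAdministrator admin = new GeneralCacheAdministrator();

    public String getReport(String key) {
        try {
            // Ask for the entry, treating it as stale after 300 seconds.
            return (String) admin.getFromCache(key, 300);
        } catch (NeedsRefreshException stale) {
            try {
                String fresh = buildReportExpensively(key);  // hypothetical expensive computation
                admin.putInCache(key, fresh);
                return fresh;
            } catch (RuntimeException e) {
                // Important: release the update lock so other threads are not blocked.
                admin.cancelUpdate(key);
                throw e;
            }
        }
    }

    private String buildReportExpensively(String key) {
        return "report for " + key;
    }
}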
  
Java Caching System
  
JCS (Java Caching System) is a distributed caching system written in Java for server-side applications. It speeds up dynamic web applications by providing management for various kinds of dynamic cache data.
  
Like other caching systems, JCS is designed for applications with a high ratio of reads to writes.
  
Dynamic content and reporting systems can achieve better performance.
  
If a web site has repetitive page structures, uses a database that is updated intermittently rather than continuously, and keeps searching for the same results over and over, it can improve its performance and scalability by adding a layer of caching.
  
EHCache
  
Ehcache is a pure Java, in-process cache with the following features: fast, simple, usable as a pluggable cache for Hibernate 2.1, minimal dependencies, comprehensive documentation and tests.
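
Basic in-process use of the classic Ehcache 2.x API looks roughly like the sketch below; it assumes a cache named "users" that is either added programmatically, as here, or defined in ehcache.xml.

Java code

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class EhcacheDemo {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.create();  // picks up ehcache.xml if present
        manager.addCache("users");                     // or configure the cache in ehcache.xml
        Cache cache = manager.getCache("users");

        cache.put(new Element("user:1", "alice"));

        Element hit = cache.get("user:1");
        if (hit != null) {
            System.out.println(hit.getObjectValue());  // cache hit: no database call needed
        }

        manager.shutdown();
    }
}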
  
Jcache
  
JCache is an open-source project working to become the JSR-107 open-source specification. The JSR-107 spec has gone unchanged for years, and this version is still built on the original functional definition.
  
ShiftOne
  
ShiftOne Java Object Cache is a Java library that implements a set of strict object-caching policies, along with a lightweight framework for configuring how the cache behaves.
  
SwarmCache
  
SwarmCache is a simple yet effective distributed cache. It uses IP multicast to communicate with other hosts on the same LAN and is designed specifically for clustered, database-driven web applications. SwarmCache offers the best performance boost to applications whose read operations far outnumber their writes.
  
SwarmCache uses JavaGroups to manage the membership of, and communication within, the distributed cache.
  
TreeCache/JBossCache
  
JBossCache is a replicated, transactional cache that lets you cache enterprise-level application data to improve performance. Cached data is replicated automatically, so it is easy to set up clustering between JBoss servers. JBossCache can run as an MBean service under the JBoss application server or another Java EE container, and it can also run standalone.
  
JBossCache consists of two modules: TreeCache and TreeCacheAOP.
  
TreeCache: a tree-structured, replicated, transactional cache.
  
TreeCacheAOP: an "object-oriented" cache that uses AOP to dynamically manage POJOs (Plain Old Java Objects).
  
Note: AOP, short for aspect-oriented programming, is an extension of OOP.
  
Whirlycache
  
Whirlycache is a fast, configurable, in-memory object cache. It can speed up a web site or application by caching objects that would otherwise have to be built by querying a database or through other expensive procedures.